The Splitter is the first module in Apollo's intelligent processing chain. It uses Computer Vision technology to automatically determine whether a file uploaded to our IDP platform contains one document or several separate documents.
If the Splitter identifies multiple documents, it separates them into distinct files, each containing a single document.
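To make the idea concrete, here is a minimal sketch of how per-page "first page" decisions could be turned into separate documents. It is an illustration only, not Apollo's actual implementation: the function name `split_into_documents` and the list-of-booleans input are assumptions made for this example.

```python
from typing import List


def split_into_documents(first_page_flags: List[bool]) -> List[List[int]]:
    """Group 1-based page numbers into documents.

    Hypothetical helper: a new document starts at every page flagged as a
    first page. The very first page of the file always starts a document,
    even if it was not flagged.
    """
    documents: List[List[int]] = []
    for page_number, is_first in enumerate(first_page_flags, start=1):
        if is_first or not documents:
            documents.append([page_number])      # start a new document
        else:
            documents[-1].append(page_number)    # continue the current one
    return documents


# A 6-page file where pages 1, 4, and 5 were recognized as first pages.
flags = [True, False, False, True, True, False]
print(split_into_documents(flags))  # [[1, 2, 3], [4], [5, 6]]
```

In this example the uploaded file would end up as three separate files: pages 1 to 3, page 4, and pages 5 to 6.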
To prepare and train a custom splitter, look through the documents you have received over time, identify the files that contain multiple documents, and upload them to the splitter for training. All files uploaded for training form the splitter's dataset. For better performance after training, we recommend using at least 100 pages that are the first page of a document and at least 100 pages that are NOT the first page of a document, meaning page 2, 3, or later.
It is important for the dataset to be balanced, meaning the number of first-page examples should be approximately equal to the number of examples that are not first pages (page 2, 3, and so on).
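If you keep your own list of page labels before uploading, a quick check like the one below can confirm that the dataset meets these recommendations. This is a sketch under assumptions: the label strings `"first"` and `"not_first"`, the function name `check_dataset`, and the 20% tolerance used for "approximately equal" are all choices made for this example, not values defined by Apollo.

```python
from collections import Counter
from typing import Iterable, List

MIN_PER_CLASS = 100   # recommended minimum per class, from the text above
MAX_IMBALANCE = 0.2   # assumed tolerance for "approximately equal"


def check_dataset(labels: Iterable[str]) -> List[str]:
    """Return a list of problems found in the dataset (empty if none)."""
    counts = Counter(labels)
    first = counts["first"]          # Counter returns 0 for missing keys
    other = counts["not_first"]

    problems: List[str] = []
    if first < MIN_PER_CLASS:
        problems.append(f"only {first} first-page examples (need {MIN_PER_CLASS})")
    if other < MIN_PER_CLASS:
        problems.append(f"only {other} non-first-page examples (need {MIN_PER_CLASS})")

    larger = max(first, other)
    if larger > 0 and abs(first - other) / larger > MAX_IMBALANCE:
        problems.append(f"dataset is unbalanced: {first} vs {other}")
    return problems


# 120 first pages and 110 other pages: large enough and balanced.
print(check_dataset(["first"] * 120 + ["not_first"] * 110))  # []
```

A dataset with 150 first pages and 90 other pages would be reported twice: once for having too few non-first-page examples and once for being unbalanced.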
Here are the steps you need to follow:
1. In the left menu, go to Documents Flow and activate the splitter module from the central configuration menu.
2. Click on Splitter, then Add a new splitter, and enter the name of the splitter you want to create.
3. Click on the name of the splitter, then upload the documents for learning. Don't forget the earlier recommendations regarding the required training dataset.
4. After all training examples have been uploaded to the splitter, mark each page as either First Page or NOT First Page.
5. Once you have marked the pages with the appropriate attributes, click on Auto-tags. Now you can start the training process, which takes approximately 4-5 hours.
When the training is complete, the splitter will automatically transition to the Ready state but remain Inactive. To use it, click on Edit, then Activate. You can test it on a few documents from the My Documents section, and you will see how they are automatically analyzed and separated into distinct files, each containing a single document.
It's crucial to remember that you cannot have more than one active splitter! If you activate a splitter while another one is already active, Apollo will deactivate the previously active one and activate the new one.
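The rule can be pictured with a small toy model. This is only an illustration of the behavior described above, not Apollo's API: the class `SplitterRegistry` and its methods are hypothetical names invented for this example.

```python
from typing import Optional, Set


class SplitterRegistry:
    """Toy model of the 'only one active splitter' rule (hypothetical)."""

    def __init__(self) -> None:
        self.names: Set[str] = set()
        self.active: Optional[str] = None

    def add(self, name: str) -> None:
        self.names.add(name)

    def activate(self, name: str) -> Optional[str]:
        """Activate `name` and return the splitter that was deactivated, if any."""
        if name not in self.names:
            raise KeyError(f"unknown splitter: {name}")
        previous = self.active if self.active != name else None
        self.active = name   # the new splitter replaces the previous one
        return previous


registry = SplitterRegistry()
registry.add("invoices")
registry.add("contracts")

registry.activate("invoices")
deactivated = registry.activate("contracts")
print(deactivated)      # invoices
print(registry.active)  # contracts
```

Activating "contracts" while "invoices" is active leaves only "contracts" active, which mirrors what happens when you click Activate on a second splitter.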
Have a great day! Apollo is here to handle Intelligent Document Processing for you! 😊