How to Accurately Classify Documents with Intelligent OCR? A Concrete Use Case on ID Documents
Case study
Last update:
April 16, 2025
5 minutes
Which OCR software is best for processing your invoices in 2025? We compared 10 data capture solutions, including Koncile, to help you pick the right invoice data extractor powered by AI and LLMs.
Thanks to advances in AI and LLMs, OCR tools are becoming more flexible and accurate, and can turn document management into a real time-saver. Koncile's fully modular OCR solution is among the innovative options that combine traditional OCR technology with LLMs for enhanced performance.
Why Extract Line-Item Details from Invoices?
Every invoice line contains strategic information: expenses, pricing, and cost variations. However, these valuable insights often remain unused because invoice formats are unstructured and vary between suppliers. Accurate data extraction optimizes accounting, financial control, and procurement management, facilitating analysis and negotiation. The key challenge is transforming invoice data into an actionable, structured database.
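To make the idea of a structured database concrete, here is a minimal Python sketch that maps supplier-specific column headers onto one shared line-item schema. The alias table, the `LineItem` fields, and the sample rows are all hypothetical and chosen for illustration; none of them come from the tools reviewed here.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, Optional

# Hypothetical header aliases. A real mapping would be built from the
# headers that actually appear on your suppliers' invoices.
HEADER_ALIASES = {
    "description": {"description", "designation", "item", "product"},
    "quantity": {"quantity", "qty"},
    "unit_price": {"unit price", "price", "pu ht", "unit cost"},
}


@dataclass(frozen=True)
class LineItem:
    supplier: str
    description: str
    quantity: Decimal
    unit_price: Decimal


def canonical_field(header: str) -> Optional[str]:
    """Return the unified field name for a raw column header, or None."""
    key = header.strip().lower()
    for field, aliases in HEADER_ALIASES.items():
        if key in aliases:
            return field
    return None


def normalize_row(supplier: str, row: Dict[str, str]) -> LineItem:
    """Convert one extracted table row into the unified schema."""
    mapped = {}
    for header, value in row.items():
        field = canonical_field(header)
        if field is not None:
            mapped[field] = value.strip()
    return LineItem(
        supplier=supplier,
        description=mapped["description"],
        quantity=Decimal(mapped["quantity"]),
        unit_price=Decimal(mapped["unit_price"]),
    )


# Two suppliers, two different header conventions, one output schema.
rows = [
    ("Supplier A", {"Designation": "Steel bolt M8", "Qty": "100", "PU HT": "0.12"}),
    ("Supplier B", {"Item": "Steel bolt M8", "Quantity": "250", "Unit Price": "0.11"}),
]
items = [normalize_row(supplier, row) for supplier, row in rows]
```

Once every supplier's rows share the same fields, comparing unit prices across suppliers becomes a simple query rather than a manual reconciliation.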
AWS Textract identified 43 invoice fields, including essential details such as names, addresses, net and gross totals, and even some predefined fields like shipping costs and payment terms. When these fields are present in an invoice, the success rate approaches 100%.
Textract offers a Line Item Fields section to recognize invoice line details. It performs well on simple invoices, extracting data into an Excel table without errors in 14 out of 15 cases, but struggles with complex invoices. Over 10 of the 15 complex invoices tested contained significant errors, such as missing lines, misclassified descriptions, or irrelevant added lines. The issue arises because recognition relies primarily on computer vision rather than on language understanding. Textract is best suited to simple invoices in native PDF format rather than scanned PDFs.
Textract does not allow custom field extraction, such as company-specific identifiers. However, users can leverage the AnalyzeDocument Queries feature to specify custom extractions. Additionally, if you work with multiple suppliers that use different invoice formats, Textract does not consolidate the extracted line-item data into a unified Excel file, which limits its analytical potential.
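As an illustration of the Queries feature, the sketch below builds an `analyze_document` request with the `QUERIES` feature type using boto3 and reads the answers back from the response blocks. The question text, the alias, and the helper names are hypothetical. The response parsing assumes the usual `QUERY` and `QUERY_RESULT` block layout, and the synchronous call assumes a single-page document, so treat it as a starting point rather than a complete integration.

```python
from typing import Any, Dict


def build_queries_request(document_bytes: bytes, questions: Dict[str, str]) -> Dict[str, Any]:
    """Build keyword arguments for Textract AnalyzeDocument with QUERIES.

    `questions` maps an alias (our own label) to a natural-language question.
    """
    queries = [{"Text": text, "Alias": alias} for alias, text in questions.items()]
    return {
        "Document": {"Bytes": document_bytes},
        "FeatureTypes": ["QUERIES"],
        "QueriesConfig": {"Queries": queries},
    }


def extract_answers(response: Dict[str, Any]) -> Dict[str, str]:
    """Map each query alias to the text of its answer block."""
    blocks = {block["Id"]: block for block in response.get("Blocks", [])}
    answers = {}
    for block in blocks.values():
        if block["BlockType"] != "QUERY":
            continue
        alias = block["Query"].get("Alias", block["Query"]["Text"])
        for relationship in block.get("Relationships", []):
            if relationship["Type"] == "ANSWER":
                for answer_id in relationship["Ids"]:
                    answers[alias] = blocks[answer_id].get("Text", "")
    return answers


def run_queries(path: str, questions: Dict[str, str]) -> Dict[str, str]:
    """Send one document to Textract. Requires boto3 and AWS credentials."""
    import boto3  # imported here so the helpers above work without AWS access

    client = boto3.client("textract")
    with open(path, "rb") as handle:
        request = build_queries_request(handle.read(), questions)
    return extract_answers(client.analyze_document(**request))


# Hypothetical company-specific identifier to look for.
example_questions = {"SUPPLIER_ID": "What is the supplier ID?"}
example_request = build_queries_request(b"...", example_questions)
```

Calling `run_queries("invoice.png", example_questions)` would return a dictionary keyed by alias. The file name is a placeholder.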
Koncile is a highly customizable OCR solution designed to automate and enhance the accuracy of invoice data extraction. Powered by an AI engine that combines computer vision and large language models (LLMs), it achieves near 100% accuracy in identifying all essential fields, including supplier details (name, address, company registration number), net and gross amounts, VAT rates, and payment terms. Unlike traditional OCR solutions that often miss key elements or misinterpret data formats, Koncile ensures consistent and reliable extraction, even for invoices with complex layouts.
Where many OCR tools struggle with detailed line-item extraction, Koncile excels by understanding invoice structures through computer vision and AI-powered text analysis. It accurately extracts product descriptions, SKUs, quantities, unit prices, VAT rates, and discounts, adapting to various supplier invoice formats. In our tests on complex invoices, Koncile achieved over 95% accuracy in line-item recognition, whereas other solutions failed to structure the data properly or produced errors in column recognition. This capability allows businesses to obtain structured, usable data without the need for extensive manual corrections.
Koncile offers an advanced level of customization, enabling businesses to tailor data extraction to their specific needs. Users can configure which fields to extract, perform natural language queries to retrieve specific information, and standardize invoice formats for seamless integration into accounting systems or ERPs. Unlike solutions that require extensive training on large datasets, Koncile dynamically adapts to different document structures, making it particularly effective for companies working with multiple suppliers. With API and SDK integration, it fits into existing workflows, providing significant time savings and fully automated invoice processing.
Mindee provides an off-the-shelf invoice OCR capable of detecting 16 primary fields. In our tests, the success rate for these core fields was nearly 100%, including for scanned invoices.
Mindee offers a default set of line-item fields, including description, product code, quantity, unit price, total price, and VAT. However, in 9 out of 15 complex invoices, the tool made errors when table formats became less standardized. Critical data, such as SKUs or EAN codes, was sometimes misclassified. Post-processing in Excel is required to correct errors.
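One way to catch misclassified codes during post-processing is to validate the check digit of anything placed in an EAN column. The sketch below implements the standard EAN-13 checksum in Python. It is not part of any tool reviewed here, and the function name is our own.

```python
def is_valid_ean13(code: str) -> bool:
    """Return True if `code` is 13 ASCII digits with a correct EAN-13 check digit."""
    if len(code) != 13 or not (code.isascii() and code.isdigit()):
        return False
    digits = [int(char) for char in code]
    # Weights alternate 1, 3, 1, 3, ... over the first 12 digits.
    weighted_sum = sum(
        digit * (3 if index % 2 else 1) for index, digit in enumerate(digits[:12])
    )
    expected_check_digit = (10 - weighted_sum % 10) % 10
    return expected_check_digit == digits[12]


# A value that fails this check was probably read from the wrong column
# (for example an internal SKU or an order number) or was misread by the OCR.
suspicious = [code for code in ["4006381333931", "REF-00123"] if not is_valid_ean13(code)]
```

A passing checksum does not prove the code belongs to the right product, but a failing one is a cheap signal that the row needs review.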
Mindee provides an API Builder for custom field extraction, but it requires training the model by annotating dozens of similar invoices. Unlike more advanced AI tools, it does not support natural language prompts for on-the-fly field extraction.
On average, Mindee processed one invoice page in about 5 seconds across our 30-invoice test set.
Affinda's OCR automatically detects common invoice fields. However, 5 out of 30 invoices had errors in key fields such as customer ID (SIRET) and total invoice amount.
Affinda uses table detection for line-item recognition. Among the 15 complex invoices, 7 produced usable results. However, when descriptions span multiple lines, spurious lines often appear, making the extracted data difficult to standardize.
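When a wrapped description produces extra rows, a common cleanup is to fold any row that has a description but no numeric values into the row above it. The Python sketch below shows that heuristic. The field names (`description`, `quantity`, `unit_price`, `total`) are assumptions about the exported table, not Affinda's actual output format.

```python
from typing import Dict, List

NUMERIC_FIELDS = ("quantity", "unit_price", "total")


def merge_continuation_rows(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Fold description-only rows into the preceding line item."""
    merged: List[Dict[str, str]] = []
    for row in rows:
        description = (row.get("description") or "").strip()
        has_numbers = any((row.get(field) or "").strip() for field in NUMERIC_FIELDS)
        if merged and description and not has_numbers:
            # Continuation of the previous description: append the text.
            previous = merged[-1]
            previous["description"] = (previous["description"] + " " + description).strip()
        elif description or has_numbers:
            merged.append({**row, "description": description})
        # Rows that are completely empty are dropped.
    return merged


extracted = [
    {"description": "Steel bolt M8", "quantity": "100", "unit_price": "0.12", "total": "12.00"},
    {"description": "zinc plated, box of 100", "quantity": "", "unit_price": "", "total": ""},
    {"description": "Washer", "quantity": "50", "unit_price": "0.02", "total": "1.00"},
]
cleaned = merge_continuation_rows(extracted)
```

The heuristic will wrongly merge genuine text-only lines such as section titles, so it is best applied per supplier after checking a few samples.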
Affinda offers custom field selection, including the ability to add or remove fields using a large language model (GPT). However, customizing line-item extraction is not possible.
The tool includes a correction feature for erroneous data and adaptive learning capabilities for company-specific needs (not tested).
Google's Invoice Parser extracts 37 predefined fields, but they cannot be modified.
The tool extracts 7 fixed line-item fields (quantity, description, product code, order number, unit, unit price). Because these fields are fixed, they cannot be customized, which makes the tool unsuitable for unique business requirements. For simple invoices, accuracy is high, but for complex invoices, key details are often missing and some lines are ignored.
Google Document AI supports custom training on invoice datasets, but we did not test this feature.
Nanonets is an OCR solution dedicated to document processing, including invoices. It extracts 28 default fields and allows format customization for each field (date, currency, etc.).
Nanonets extracts line-item details using table recognition, similar to Affinda. However, across the 15 complex invoices, some columns were excluded, sometimes affecting critical data such as product codes or unit prices (before tax).
The Pro version allows users to train the model on their own datasets to specify where information is located. While useful for long documents, this feature is less practical for line-item extraction from invoices.
Nanonets offers Google Drive integrations, easy Excel exports, and invoice approval workflows for seamless document processing.
Parsio's PDF Parser (pre-trained model) extracts a fixed set of invoice fields. For these general fields (excluding line items), it achieves near 100% accuracy for simple invoices and 97% for complex ones.
Among 15 complex invoices, 10 had precise line-item extraction. However, issues persist with scanned PDFs. Since customizing line-item extraction isn't possible, misinterpretations occur (e.g., numbers being confused between fields). Users cannot correct errors or train the system, making it difficult to build a structured price database from the extracted data.
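Numbers that land in the wrong field can often be detected with a simple arithmetic check: on a well-extracted line, quantity times unit price should equal the line total. The Python sketch below assumes each line exposes those three values as strings and ignores discounts and taxes. The function name and the tolerance of 0.01 are illustrative choices.

```python
from decimal import Decimal, InvalidOperation


def line_is_consistent(
    quantity: str,
    unit_price: str,
    total: str,
    tolerance: Decimal = Decimal("0.01"),
) -> bool:
    """Check that quantity * unit_price matches total within a rounding tolerance."""
    try:
        expected = Decimal(quantity) * Decimal(unit_price)
        actual = Decimal(total)
    except InvalidOperation:
        # A non-numeric value in a numeric field is itself a sign of misalignment.
        return False
    return abs(expected - actual) <= tolerance


ok = line_is_consistent("100", "0.12", "12.00")
# Unit price and total swapped by the extractor:
swapped = line_is_consistent("100", "12.00", "0.12")
```

The check cannot tell quantity and unit price apart when they are swapped with each other, since the product is the same, so it complements rather than replaces a manual review.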
Parsio offers GPT-4-based query search, allowing specific data extraction from documents. However, this feature cannot be used for line-item recognition, making it impossible to identify relevant fields across different invoice formats. Additionally, since it is not yet combined with OCR, it only processes native PDFs and ignores document structure.
The web app provides an email address where documents can be sent for processing. A wide range of integrations is available.
Airparser leverages GPT-4 technology to extract specific fields from various document types. It is developed by the same company as Parsio.
Airparser allows custom field selection. Using the “list and table” function, it can extract invoice line items by defining attributes for each row. Each field requires a description to refine extraction accuracy.
For simple invoices, results are satisfactory when field descriptions are detailed enough. However, for complex invoices, column misalignment issues arise, leading to higher error rates, especially with scanned invoices.
Base64.ai provides a ready-to-use invoice extraction tool, offering a standardized set of extracted fields.
Among 15 simple invoices, 14 were extracted accurately. However, for complex invoices, issues arose in 5 cases: multiple numbers caused misinterpretations, page breaks affected extraction, and title-based information was ignored.
The tool allows asking questions about the document or adding extracted fields, but it does not support modifying line-item fields or providing extraction instructions.
Processing time can reach up to 1 minute for long invoices. Base64.ai offers various integrations into document processing workflows.
Docsumo is a pre-configured OCR tool that extracts key invoice fields.
Docsumo extracts line items using table detection, similar to Nanonets and Affinda. It works well when data is properly aligned. However, for complex tables, it fails to extract relevant information.
A “ChatAI” function allows users to ask questions about the document. However, responses cannot yet be systematically integrated into extracted fields. Additionally, the tool does not allow modifying or refining either key-field or line-item extractions.
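For readers who want to compare the complex-invoice results side by side, the short Python sketch below converts the exact counts reported in this article into percentages. Only tools with an exact count out of 15 complex invoices are included. Note that the criteria differ slightly between tools (error-free for Mindee, usable for Affinda, precise for Parsio), so the figures are indicative rather than strictly comparable.

```python
COMPLEX_INVOICE_COUNT = 15

# Counts taken from the article. Mindee is derived from "errors in 9 out of 15".
complex_invoices_ok = {
    "Mindee": COMPLEX_INVOICE_COUNT - 9,
    "Affinda": 7,
    "Parsio": 10,
}


def success_rates(ok_counts: dict, total: int) -> dict:
    """Return the share of successfully extracted invoices per tool, in percent."""
    return {tool: round(100 * ok / total, 1) for tool, ok in ok_counts.items()}


rates = success_rates(complex_invoices_ok, COMPLEX_INVOICE_COUNT)
for tool, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{tool}: {rate}% of complex invoices")
```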