Generic approach for OCR performance evaluation
18 December 2001
Abdel Belaid, Laurent Pierron
Proceedings Volume 4670, Document Recognition and Retrieval IX; (2001) https://doi.org/10.1117/12.450729
Event: Electronic Imaging, 2002, San Jose, California, United States
Abstract
This paper presents the limits of character recognition engines (commercial OCRs) and how to overcome these limits to meet industrial goals for document capture and coding performance. The recent integration of these OCRs into several industrial capture chains suggests that it may be possible to reach, electronically, the same performance as human typists. After a global description of the problems and a presentation of the OCR limits, the paper focuses on the methodology used and details the steps proposed for improving individual performance. The first step is the individual evaluation of each OCR. This is done by comparing the OCR output with a ground truth, which highlights the OCR's defects and catalogues its main errors on the processed document. The second step increases these individual performances by combining the OCR with others. We chose to combine only two OCRs deemed highly efficient and complementary on the same class of documents. The residual errors are handled in the last step, which proposes a list of heuristics that fix specific OCR defects in limit cases. To validate the approach, the second part of the paper presents a practical experiment aimed at reaching industrial performance. The approach has been tested in an industrial application for automatic document capture, targeting the lowest error rate imposed on one specific document class: 1 error per 10,000 characters.
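The first step, comparing an OCR's output with a ground truth, can be illustrated with a character error rate. The abstract does not state which alignment or error measure the authors used, so the sketch below assumes a plain Levenshtein (edit) distance divided by the ground-truth length; the function names `levenshtein`, `character_error_rate` and `meets_target` are hypothetical, and only the 1-per-10,000 target comes from the text.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of character insertions, deletions and substitutions
    needed to turn `reference` into `hypothesis` (dynamic programming,
    keeping only two rows of the table in memory)."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_ch in enumerate(reference, start=1):
        curr = [i] + [0] * len(hypothesis)
        for j, hyp_ch in enumerate(hypothesis, start=1):
            cost = 0 if ref_ch == hyp_ch else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(ground_truth: str, ocr_output: str) -> float:
    """Assumed measure: edit errors per ground-truth character."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    return levenshtein(ground_truth, ocr_output) / len(ground_truth)


# Target quoted in the abstract: 1 error per 10,000 characters.
TARGET_RATE = 1 / 10_000


def meets_target(ground_truth: str, ocr_output: str,
                 target: float = TARGET_RATE) -> bool:
    """True if the OCR output's error rate does not exceed `target`."""
    return character_error_rate(ground_truth, ocr_output) <= target


if __name__ == "__main__":
    gt = "Invoice 2001-12-18"
    ocr = "Invoice 2OO1-12-18"  # two '0' -> 'O' confusions
    print(character_error_rate(gt, ocr), meets_target(gt, ocr))
```

The two-row table keeps memory linear in the output length, but the running time is still proportional to the product of both lengths. On full pages one would normally align line by line rather than compare whole documents at once.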
© (2001) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Abdel Belaid and Laurent Pierron "Generic approach for OCR performance evaluation", Proc. SPIE 4670, Document Recognition and Retrieval IX, (18 December 2001); https://doi.org/10.1117/12.450729
CITATIONS
Cited by 5 scholarly publications.
KEYWORDS
Optical character recognition
