30 March 1995 Spotting phrases in lines of imaged text
Proceedings Volume 2422, Document Recognition II; (1995) https://doi.org/10.1117/12.205828
Event: IS&T/SPIE's Symposium on Electronic Imaging: Science and Technology, 1995, San Jose, CA, United States
Abstract
A system that searches for user-specified phrases in imaged text is described. The search "phrases" can be word fragments, words, or groups of words. The imaged text can be composed of a number of different fonts and can contain graphics. A combination of morphology, simple statistical methods, and hidden Markov modeling is used to detect and locate the phrases. The image is deskewed, and then bounding boxes are found for text-lines in the image using multiresolution morphology. Baselines, toplines, and the x-height in a text-line are identified using simple statistical methods. The distance between baseline and x-height is used to normalize each hypothesized text-line bounding box, and the columns of pixel values in a normalized bounding box serve as the feature vector for that box. Hidden Markov models are created for each user-specified search string and to represent all text and graphics other than the search strings. Phrases are identified using Viterbi decoding on a spotting network created from the models. The operating point of the system can be varied to trade off the percentage of words correctly spotted and the percentage of false alarms. Results are given using a subset of the UW English Document Image Database I.
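The decoding step can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not give model topologies, probabilities, or how the operating point is set, so everything below is assumed for illustration. A single filler state `F` stands in for the model of all non-keyword text and graphics, two states `K1` and `K2` stand in for a search-string model, discrete symbols replace the pixel-column feature vectors, and the hypothetical parameter `p_enter` (the probability of entering the keyword model) plays the role of an operating-point control.

```python
import math

def log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")

def build_network(p_enter):
    """Toy spotting network (assumed): filler state F plus keyword states K1 -> K2."""
    states = ["F", "K1", "K2"]
    start = {"F": 1.0}
    trans = {
        "F": {"F": 1.0 - p_enter, "K1": p_enter},
        "K1": {"K1": 0.5, "K2": 0.5},
        "K2": {"K2": 0.5, "F": 0.5},
    }
    # Discrete symbols stand in for quantized pixel-column features.
    emit = {
        "F": {"a": 0.3, "b": 0.3, "c": 0.3, "x": 0.05, "y": 0.05},
        "K1": {"x": 0.9, "y": 0.1},
        "K2": {"x": 0.1, "y": 0.9},
    }
    return states, start, trans, emit

def viterbi(obs, states, start, trans, emit):
    """Return the most likely state sequence for obs, computed in the log domain."""
    # score[s] is the best log-probability of any path ending in state s.
    score = {s: log(start.get(s, 0)) + log(emit[s].get(obs[0], 0)) for s in states}
    back = []  # one back-pointer table per time step after the first
    for o in obs[1:]:
        new_score, ptr = {}, {}
        for s in states:
            prev = max(states, key=lambda p: score[p] + log(trans[p].get(s, 0)))
            new_score[s] = (score[prev] + log(trans[prev].get(s, 0))
                            + log(emit[s].get(o, 0)))
            ptr[s] = prev
        back.append(ptr)
        score = new_score
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path

def keyword_spans(path, keyword_states=("K1", "K2")):
    """Return (start, end) index pairs, end exclusive, where the path is in the keyword."""
    spans, begin = [], None
    for i, s in enumerate(path):
        if s in keyword_states and begin is None:
            begin = i
        elif s not in keyword_states and begin is not None:
            spans.append((begin, i))
            begin = None
    if begin is not None:
        spans.append((begin, len(path)))
    return spans

obs = list("abxxyyca")  # "xxyy" plays the role of the search phrase
likely = viterbi(obs, *build_network(p_enter=0.2))
strict = viterbi(obs, *build_network(p_enter=1e-5))
print(keyword_spans(likely), keyword_spans(strict))
```

With the permissive setting the decoded path passes through the keyword states over positions 2 to 5, and with the strict setting it stays in the filler state throughout. In this toy network, lowering `p_enter` therefore trades missed phrases for fewer false alarms.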
© (1995) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Francine R. Chen, Dan S. Bloomberg, and Lynn D. Wilcox "Spotting phrases in lines of imaged text", Proc. SPIE 2422, Document Recognition II, (30 March 1995); https://doi.org/10.1117/12.205828
KEYWORDS
Statistical modeling
Databases
Visualization
Data modeling
Performance modeling
Statistical methods
Image processing
