Paper
14 April 1993 Representing lexicons by modified trie for fast partial-string matching
Stephen W. Lam, Xin Shen, Sheila X. Zhao, Sargur N. Srihari
Author Affiliations +
Proceedings Volume 1906, Character Recognition Technologies; (1993) https://doi.org/10.1117/12.143624
Event: IS&T/SPIE's Symposium on Electronic Imaging: Science and Technology, 1993, San Jose, CA, United States
Abstract
A fast lexicon search based on trie is presented. Traditionally, a successful trie retrieval requires each element of the input string to find an exact match on a particular path in the trie. However, this approach cannot verify a garbled input string, which is generated due to imperfect character segmentation and recognition of an OCR. The string may contain multiple candidates for a character position or no candidate due to segmentation error. The proposed representation scheme, an extension of trie, allows lexicon look-up even with only some of the string elements specified. An input string is presented as a regular expression and all the paths in the trie that satisfy the expression will be considered as word candidates. The candidates can then be ranked based on character classifier decisions. Confusion in character recognition can be handled by allowing an expression component to have more than one character candidate while segmentation error can be overcome by postulating a word region to contain a certain number of unknown characters. Three extensions have been made to trie data structures for position independent access and fast exhaustive search of all valid paths: (1) bidirectional linkage between two nodes at adjacent levels to enable trie traversal in either direction, (2) the nodes with the same letter at the same word position are linked so that all the words which have the same letter at a specific position can be located immediately, and (3) an index table which records the entry points of letters at specific word positions in the trie in order to facilitate trie access at an arbitrary letter position. This representation scheme has been tested on postal address field recognition. The trie represents 11,157 words, the average access time is 0.02 sec on a SUN SPARC2 and with an average of 3 candidates.
© (1993) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Stephen W. Lam, Xin Shen, Sheila X. Zhao, and Sargur N. Srihari "Representing lexicons by modified trie for fast partial-string matching", Proc. SPIE 1906, Character Recognition Technologies, (14 April 1993); https://doi.org/10.1117/12.143624
Lens.org Logo
CITATIONS
Cited by 2 scholarly publications.
Advertisement
Advertisement
RIGHTS & PERMISSIONS
Get copyright permission  Get copyright permission on Copyright Marketplace
KEYWORDS
Optical character recognition

Image segmentation

Detection and tracking algorithms

Sun

Binary data

Information visualization

Image processing

RELATED CONTENT


Back to Top