Evaluation of an automatic markup system

Kazem Taghva; Allen Condit; Julie Borsack

doi:10.1117/12.205833

30 March 1995 Evaluation of an automatic markup system

Kazem Taghva, Allen Condit, Julie Borsack

Proceedings Volume 2422, Document Recognition II; (1995) https://doi.org/10.1117/12.205833
Event: IS&T/SPIE's Symposium on Electronic Imaging: Science and Technology, 1995, San Jose, CA, United States

Abstract

One predominant application of OCR is the recognition of full text documents for information retrieval. Modern retrieval systems exploit both the textual content of the document as well as its structure. The relationship between textual content and character accuracy have been the focus of recent studies. It has been shown that due to the redundancies in text, average precision and recall is not heavily affected by OCR character errors. What is not fully known is to what extent OCR devices can provide reliable information that can be used to capture the structure of the document. In this paper, we present a preliminary report on the design and evaluation of a system to automatically markup technical documents, based on information provided by an OCR device. The device we use differs from traditional OCR devices in that it not only performs optical character recognition, but also provides detailed information about page layout, word geometry, and font usage. Our automatic markup program, which we call Autotag, uses this information, combined with dictionary lookup and content analysis, to identify structural components of the text. These include the document title, author information, abstract, sections, section titles, paragraphs, sentences, and de-hyphenated words. A visual examination of the hardcopy is compared to the output of our markup system to determine its correctness.

Citation Download Citation

Kazem Taghva, Allen Condit, and Julie Borsack "Evaluation of an automatic markup system", Proc. SPIE 2422, Document Recognition II, (30 March 1995); https://doi.org/10.1117/12.205833

ACCESS THE FULL ARTICLE

INSTITUTIONAL
Select your institution to access the SPIE Digital Library.

SELECT YOUR INSTITUTION

PERSONAL
Sign in with your SPIE account to access your personal subscriptions or to use specific features such as save to my library, sign up for alerts, save searches, etc.

PERSONAL SIGN IN

No SPIE Account? Create one

PURCHASE THIS CONTENT

SUBSCRIBE TO DIGITAL LIBRARY

50 downloads per 1-year subscription

Members: $195

Non-members: $335 ADD TO CART

25 downloads per 1 - year subscription

Members: $145

Non-members: $250 ADD TO CART

PURCHASE SINGLE ARTICLE

Includes PDF, HTML & Video, when available