When traveling in a region where the local language is not written using a "Roman alphabet," translating
written text (e.g., documents, road signs, or placards) is a particularly difficult problem since the text cannot
be easily entered into a translation device or searched using a dictionary. To address this problem, we are
developing the "Rosetta Phone," a handheld device (e.g., PDA or mobile telephone) capable of acquiring an
image of the text, locating the region (word) of interest within the image, and producing both an audio and a
visual English interpretation of the text. This paper presents a system targeted for interpreting words written in
Arabic script. The goal of this work is to develop an autonomous, segmentation-free Arabic phrase recognizer,
with computational complexity low enough to deploy on a mobile device. A prototype of the proposed system
has been deployed on an iPhone with a suitable user interface. The system was tested on a set of noisy
images as well as on images acquired with the iPhone's camera. It identifies Arabic words or phrases by
extracting appropriate features and assigning "codewords" to each word or phrase. On a dictionary of 5,000
words, the system mapped 99.9% of the words to unique codewords. The system achieves an 82%
recognition accuracy on images of words captured with the iPhone's built-in camera.
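The abstract does not spell out the feature set behind the codeword mapping. Purely as a minimal sketch of the idea, the following Python assumes a binarized word image and hypothetical features (connected-component count plus diacritic dots above and below an estimated baseline); the function names and thresholds are illustrative, not from the paper.

```python
# Minimal sketch of the codeword idea (features are assumptions, not the
# paper's): map a binarized Arabic word image to a discrete feature string,
# then recognize by dictionary lookup against precomputed codewords.
import numpy as np
from scipy import ndimage

def word_to_codeword(img: np.ndarray) -> str:
    """img: 2-D binary array, 1 = ink, 0 = background."""
    labeled, n_components = ndimage.label(img)     # sub-words + diacritics
    # Heuristic baseline estimate: the row containing the most ink.
    baseline = int(np.argmax(img.sum(axis=1)))
    # Treat small components as diacritic dots, split by baseline side.
    sizes = ndimage.sum(img, labeled, range(1, n_components + 1))
    dot_threshold = max(4, 0.02 * img.sum())       # assumed size cutoff
    dots_above = dots_below = 0
    for comp_id, size in enumerate(sizes, start=1):
        if size < dot_threshold:
            rows = np.nonzero(labeled == comp_id)[0]
            if rows.mean() < baseline:             # row 0 is the image top
                dots_above += 1
            else:
                dots_below += 1
    n_strokes = n_components - dots_above - dots_below
    return f"S{n_strokes}-A{dots_above}-B{dots_below}"

def recognize(img: np.ndarray, codebook: dict):
    """codebook: {codeword: word}, precomputed offline per dictionary entry."""
    return codebook.get(word_to_codeword(img))
```

Because the mapping is segmentation-free, recognition reduces to a single dictionary lookup: codewords for all 5,000 entries can be computed offline, and a query image is matched by its own codeword.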
In this paper, we investigate spatial and temporal models for texture analysis and synthesis. The goal is to use
these models to increase the coding efficiency for video sequences containing textures. The models are used to
segment texture regions in a frame at the encoder and synthesize the textures at the decoder. These methods
can be incorporated into a conventional video coder (e.g., H.264), where the regions modeled as textures
are not coded in the usual manner; instead, the texture model parameters are sent to the decoder as side
information. We show that this approach can reduce the data rate by as much as 15%.
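The texture model itself is not specified in the abstract; as an illustrative sketch only, the code below substitutes a per-block mean/variance Gaussian model for whatever spatial/temporal model is actually used. It shows the coding-flow idea: texture blocks bypass the conventional coder, only their model parameters travel as side information, and the decoder synthesizes the pixels.

```python
# Illustrative coding flow only: the per-block mean/variance Gaussian model
# below is a stand-in for the paper's texture model. Texture blocks bypass
# the conventional coder; only their parameters are sent as side information,
# and the decoder synthesizes the pixels.
import numpy as np

BLOCK = 16  # macroblock size; frame dimensions assumed multiples of BLOCK

def encode_frame(frame: np.ndarray, texture_mask: np.ndarray):
    """texture_mask[i, j] is True if block (i, j) was segmented as texture."""
    coded_blocks, side_info = {}, {}
    for i in range(0, frame.shape[0], BLOCK):
        for j in range(0, frame.shape[1], BLOCK):
            blk = frame[i:i + BLOCK, j:j + BLOCK]
            if texture_mask[i // BLOCK, j // BLOCK]:
                # Two floats instead of per-pixel residual data.
                side_info[(i, j)] = (float(blk.mean()), float(blk.std()))
            else:
                coded_blocks[(i, j)] = blk  # stands in for normal H.264 coding
    return coded_blocks, side_info

def decode_frame(shape, coded_blocks, side_info, seed=0):
    rng = np.random.default_rng(seed)
    out = np.zeros(shape)
    for (i, j), blk in coded_blocks.items():
        out[i:i + BLOCK, j:j + BLOCK] = blk
    for (i, j), (mu, sigma) in side_info.items():
        # Synthesize the texture from its model parameters.
        out[i:i + BLOCK, j:j + BLOCK] = rng.normal(mu, sigma, (BLOCK, BLOCK))
    return out
```

The rate saving comes from replacing per-pixel residual data for texture regions with a handful of parameters per block; the synthesized texture only needs to be perceptually plausible rather than pixel-accurate, which is the usual rationale for this family of methods.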
Although considerable work has been done on the management of "structured" video such as movies, sports, and
television programs, which have known scene structures, "unstructured" video analysis is still a challenging problem
due to its unrestricted nature. The purpose of this paper is to address issues in the analysis of unstructured video,
in particular video shot by a typical unprofessional user (i.e., home video). We describe how one can make use
of camera motion information for unstructured video analysis. A new concept, "camera viewing direction," is
introduced as the building block of home video analysis. Motion displacement vectors are employed to temporally
segment the video based on this concept. We then relate the camera behavior to the subjective importance of
the information in each segment and describe how different patterns in the camera motion can indicate levels
of interest in a particular object or scene. By extracting these patterns, the most representative frames
(keyframes) for each scene are determined and aggregated to summarize the video
sequence.
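The abstract gives neither the segmentation rules nor any thresholds. As a hedged sketch under assumptions (one global displacement vector per frame, e.g., the mean of the block motion vectors, and an invented stillness threshold), the following segments the sequence into "still" and "pan" runs and takes the middle frame of each long still run as a keyframe, reflecting the intuition that holding the camera on a subject signals interest.

```python
# Hedged sketch: per-frame global displacement vectors (e.g., the mean of
# the block motion vectors) drive temporal segmentation; the stillness
# threshold and segment labels are invented for illustration.
import numpy as np

STILL_THRESH = 0.5    # pixels/frame; assumed, not from the paper
MIN_STILL_LEN = 15    # frames a "still" run must last to yield a keyframe

def segment_by_motion(displacements: np.ndarray):
    """displacements: (N, 2) array of per-frame (dx, dy).
    Returns [(start, end, label)] with label 'still' or 'pan'."""
    mag = np.linalg.norm(displacements, axis=1)
    labels = np.where(mag < STILL_THRESH, "still", "pan")
    segments, start = [], 0
    for t in range(1, len(labels) + 1):
        if t == len(labels) or labels[t] != labels[start]:
            segments.append((start, t, str(labels[start])))
            start = t
    return segments

def pick_keyframes(segments):
    """Holding the camera on a subject suggests interest, so take the
    middle frame of each sufficiently long 'still' segment."""
    return [(s + e) // 2 for s, e, lab in segments
            if lab == "still" and e - s >= MIN_STILL_LEN]
```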