Managing large document databases is an important task today. Being able to automatically compare document layouts and to classify and search documents by their visual appearance is desirable in many applications. We measure the similarity of single-page documents using distance functions between three document components: background, text, and saliency. Each component is represented as a Gaussian mixture distribution, and distances between different documents' components are calculated as probabilistic similarities between the corresponding distributions. The similarity measure between documents is a weighted sum of the component distances. Using this document similarity measure, we propose a browsing mechanism operating on a document dataset. For this purpose, we use a hierarchical browsing environment which we call the document similarity pyramid. It allows the user to browse a large document dataset and to search for documents in the dataset that are similar to a query. The user can browse the dataset at different levels of the pyramid and zoom into documents of interest.
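To make the weighted-sum measure concrete, here is a minimal sketch in which, as a simplification, each component's Gaussian mixture is moment-matched to a single Gaussian and compared with the closed-form Bhattacharyya distance; the component weights and the single-Gaussian reduction are illustrative assumptions, not the paper's exact probabilistic similarity.

```python
import numpy as np

def bhattacharyya_gaussian(mu1, cov1, mu2, cov2):
    """Closed-form Bhattacharyya distance between two Gaussians."""
    cov = (cov1 + cov2) / 2.0
    diff = mu1 - mu2
    term1 = 0.125 * diff @ np.linalg.solve(cov, diff)
    term2 = 0.5 * np.log(np.linalg.det(cov) /
                         np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2)))
    return term1 + term2

def document_distance(doc_a, doc_b, weights=(0.3, 0.4, 0.3)):
    """Weighted sum of per-component distances (background, text, saliency).
    Each component is given as a moment-matched (mean, covariance) pair;
    the weights here are placeholders, not the paper's learned values."""
    total = 0.0
    for w, comp in zip(weights, ("background", "text", "saliency")):
        mu1, cov1 = doc_a[comp]
        mu2, cov2 = doc_b[comp]
        total += w * bhattacharyya_gaussian(mu1, cov1, mu2, cov2)
    return total
```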
KEYWORDS: 3D modeling, Head, 3D image processing, Image segmentation, Nose, 3D image reconstruction, Ear, 3D acquisition, Facial recognition systems, Mouth
3D head models have many applications, such as virtual conferencing and 3D web games. Several existing web-based face modeling solutions can create a 3D face model from one or two user-uploaded face images, but they are limited to generating a 3D model of the face region only. The accuracy of such reconstructions is very limited for side views, as well as for hair regions. The goal of our research is to develop a framework for reconstructing a realistic 3D human head from two approximately orthogonal views. Our framework takes two images and goes through segmentation, feature point detection, 3D bald head reconstruction, 3D hair reconstruction, and texture mapping to create a 3D head model. The main contribution of the paper is that the processing steps are applied to both the face region and the hair region.
Even though technology has allowed us to measure many different aspects of images, it is still a challenge to
objectively measure their aesthetic appeal. A more complex challenge is presented when an arrangement of
images is to be analyzed, such as in a photo-book page. Several approaches have been proposed to measure the
appeal of a document layout that, in general, make use of geometric features such as the position and size of a
single object relative to the overall layout. Fewer efforts have been made to include in a metric the influence of
the content and composition of the images in the layout. Many of the aesthetic characteristics that graphic designers and artists use in their daily work have either been left out of the analysis or only roughly approximated in an effort to quantify these concepts.
Moreover, graphic design tools such as transparency and layering play an important role in the professional
creation of layouts for documents such as posters and flyers. The main goal of our study is to apply similar
techniques within an automated photo-layout generation tool. Among other design techniques, the tool makes
use of layering and transparency in the layout to produce a professional-looking arrangement of the pictures.
Two series of experiments, with participants at different levels of graphic design expertise, gave us the means to make the results of our system more appealing. In this paper, we discuss the results of our experiments in the context of distinct graphic design concepts.
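Since layering and transparency are central to the tool, the following sketch shows the basic operation behind them: Porter-Duff "over" compositing of a translucent image layer onto a page background. The array shapes and the 70% opacity are illustrative only.

```python
import numpy as np

def composite_over(fg_rgb, fg_alpha, bg_rgb, bg_alpha=None):
    """Porter-Duff 'over': layer a translucent foreground onto a background.
    fg_rgb/bg_rgb: float arrays in [0, 1], shape (H, W, 3);
    fg_alpha: scalar or (H, W, 1) opacity of the foreground layer."""
    if bg_alpha is None:
        bg_alpha = np.ones(bg_rgb.shape[:2] + (1,))
    fg_alpha = np.broadcast_to(np.asarray(fg_alpha, dtype=float),
                               bg_alpha.shape)
    out_alpha = fg_alpha + bg_alpha * (1.0 - fg_alpha)
    out_rgb = (fg_rgb * fg_alpha + bg_rgb * bg_alpha * (1.0 - fg_alpha)) \
              / np.clip(out_alpha, 1e-8, None)
    return out_rgb, out_alpha

# Example: place a photo at 70% opacity over a near-white page background.
page = np.ones((600, 400, 3)) * 0.95
photo = np.random.rand(600, 400, 3)   # stand-in for an image layer
blended, _ = composite_over(photo, 0.7, page)
```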
Web article pages usually have hyperlinks (or links) that lead to print-friendly web pages containing mainly the article content. Content extraction using these print-friendly pages is generally easier and more reliable, but there are many variations of print-link representations in HTML that make robust print-link detection more difficult than it first appears. First, a link can be text-based, image-based, or both. For example, there is a lexicon of phrases used to indicate print-friendly pages, such as "print", "print article", and "print-friendly version". In addition, some links use printer-resembling image icons, with or without a print phrase present. To complicate matters further, not all links contain a valid URL; instead, the pages are dynamically generated either by client-side JavaScript or by the server, so no URL is available for extraction. We estimate that more than 90% of web article pages have print-links, of which about 35% have valid print-friendly URLs. Our solution to the print-link extraction problem proceeds in two stages: (1) detection of the print-link, and (2) retrieval of the print-friendly page URL from the link attributes, including a test of its validity. Experimental results on roughly 2000 web article pages suggest our solution achieves over 99% precision and 97% recall.
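As a rough illustration of stage (1), the sketch below scans anchor elements for the print lexicon in link text and for printer-icon image attributes, using only Python's standard html.parser; the lexicon regex and the attribute choices are assumptions for illustration, not the paper's detector.

```python
import re
from html.parser import HTMLParser

PRINT_PHRASES = re.compile(
    r"\bprint(?:er)?(?:[-\s]?friendly)?(?:\s+(?:article|version|page))?\b",
    re.IGNORECASE)

class PrintLinkFinder(HTMLParser):
    """Flags <a> elements whose anchor text or icon suggests a print link."""
    def __init__(self):
        super().__init__()
        self.links, self._current = [], None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            self._current = {"href": attrs.get("href", ""), "text": "",
                             "icon": False}
        elif tag == "img" and self._current is not None:
            # Image-based links: a printer icon with or without a phrase.
            src_alt = attrs.get("src", "") + " " + attrs.get("alt", "")
            if PRINT_PHRASES.search(src_alt):
                self._current["icon"] = True

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            hit = self._current
            if hit["icon"] or PRINT_PHRASES.search(hit["text"]):
                # Stage (2) would test whether href holds a valid static
                # URL rather than a javascript: handler.
                self.links.append(hit)
            self._current = None

finder = PrintLinkFinder()
finder.feed('<a href="/story?print=1">Print-friendly version</a>')
print(finder.links)
```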
Object segmentation is important in image analysis for imaging tasks such as image rendering and image retrieval. Pet owners have been known to be quite vocal about how important it is to render their pets perfectly. We present here an algorithm for pet (mammal) fur color classification and an algorithm for pet (animal) fur texture classification. Pet fur color classification can be applied as a necessary condition for identifying regions in an image that may contain pets, much like skin tone classification for human flesh detection. As a result of evolution, the fur coloration of all mammals is produced by a natural organic pigment called melanin, which has only a very limited color range. We have conducted a statistical analysis and concluded that, after proper color quantization, mammal fur colors can only be levels of gray or at most two colors. This pet fur color classification algorithm has been applied to pet-eye detection. We also present an algorithm for animal fur texture classification using the recently developed multi-resolution directional sub-band contourlet transform. The experimental results are very promising, as these transforms can identify regions of an image that may contain the fur of mammals, the scales of reptiles, the feathers of birds, etc. Combining the color and texture classification, one can obtain a set of strong classifiers for identifying possible animals in an image.
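A minimal sketch of the gray-or-two-colors test might look as follows: after a coarse hue quantization, a region passes if its pixels are near-achromatic or concentrated in at most two hue bins. The bin count, saturation threshold, and coverage value are illustrative assumptions rather than the paper's calibrated classifier.

```python
import numpy as np

def is_plausible_fur(region_rgb, hue_bins=12, sat_thresh=0.15,
                     max_hues=2, coverage=0.9):
    """Test whether a region's colors are consistent with mammal fur:
    after coarse quantization the pixels should be levels of gray or
    concentrated in at most two hue bins (thresholds are assumptions)."""
    rgb = region_rgb.reshape(-1, 3).astype(float) / 255.0
    mx, mn = rgb.max(axis=1), rgb.min(axis=1)
    sat = np.where(mx > 0, (mx - mn) / np.maximum(mx, 1e-8), 0.0)
    gray = sat < sat_thresh                  # near-achromatic pixels
    chroma = rgb[~gray]
    if chroma.size == 0:
        return True                          # all gray-level fur
    # Coarse hue for the chromatic pixels (standard hue approximation).
    r, g, b = chroma.T
    hue = np.arctan2(np.sqrt(3) * (g - b), 2 * r - g - b)
    bins = ((hue + np.pi) / (2 * np.pi) * hue_bins).astype(int) % hue_bins
    counts = np.bincount(bins, minlength=hue_bins)
    top = np.sort(counts)[::-1][:max_hues].sum()
    # Fur-like if gray pixels plus the top 1-2 hues cover most of the region.
    return (gray.sum() + top) / len(rgb) >= coverage
```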
Many conventional image processing algorithms, such as noise filtering, sharpening, and deblurring, assume a noise model of Additive White Gaussian Noise (AWGN) with constant standard deviation throughout the image. However, this noise model does not hold for images captured by typical imaging devices such as digital cameras, scanners, and camera phones. The raw data from the image sensor goes through several image processing steps, such as demosaicing, color correction, gamma correction, and JPEG compression, and thus the noise characteristics in the final JPEG image deviate significantly from the widely used AWGN model. Consequently, when image processing algorithms are applied to digital photographs, they may not provide optimal image quality due to the inaccurate noise model. In this paper, we propose a noise model that better fits images captured by typical imaging devices and describe a simple method to extract the necessary parameters directly from the images, without any prior knowledge of the imaging pipeline algorithms implemented in the devices. We show experimental results of the noise parameters extracted from raw and processed digital images.
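As one way to picture such parameter extraction, the sketch below assumes a signal-dependent variance model sigma^2(I) = a*I + b (a common Poisson-Gaussian-style approximation, not necessarily the paper's model) and estimates it from the flattest blocks of a single image, where the measured variation is noise-dominated.

```python
import numpy as np

def estimate_noise_curve(img, block=8, keep_frac=0.1, nbins=16):
    """Estimate noise std as a function of intensity from one image.
    Assumes sigma^2(I) = a*I + b; the block size, kept fraction, and
    linear model are assumptions, not the paper's exact method."""
    h, w = img.shape[0] // block * block, img.shape[1] // block * block
    tiles = img[:h, :w].reshape(h // block, block, w // block, block)
    tiles = tiles.transpose(0, 2, 1, 3).reshape(-1, block * block)
    tiles = tiles.astype(float)
    means, stds = tiles.mean(axis=1), tiles.std(axis=1)
    # In each intensity bin, keep only the flattest blocks (least texture).
    edges = np.linspace(means.min(), means.max(), nbins + 1)
    xs, ys = [], []
    for i in range(nbins):
        sel = (means >= edges[i]) & (means < edges[i + 1])
        if sel.sum() < 10:
            continue
        s = stds[sel]
        cut = np.quantile(s, keep_frac)      # flattest = smallest std
        xs.append(means[sel][s <= cut].mean())
        ys.append(s[s <= cut].mean())
    xs, ys = np.array(xs), np.array(ys)
    a, b = np.polyfit(xs, ys ** 2, 1)        # fit sigma^2 = a*I + b
    return a, b, xs, ys
```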
In managing large collections of digital photographs, there have been many research efforts to compute low-level image features such as texture and color to aid various management tasks (e.g., query-by-example applications or scene classification for image clustering). In this paper, we focus on the assessment of image quality as a complementary feature to improve the manageability of images. Specifically, we propose an effective and efficient algorithm to analyze the focus quality of photographs and to provide a quantitative measurement of the assessment. In this algorithm, global figures of merit are computed from maps of local image statistics such as sharpness, brightness, and color saturation. The global figures of merit represent how well each image meets prior assumptions about the focus quality of natural images. A collection of these global figures of merit is then used to decide how well focused an image is. Experimental results show that the method can detect 90% of the out-of-focus photographs labeled by experts with a false-positive rate of 11%. We further apply this quantitative measure to image management tasks, including image content filtering/sorting based on focus quality, and image retrieval.
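A minimal sketch of this kind of assessment: compute local sharpness, brightness, and saturation maps, then pool sharpness only over well-exposed, colorful pixels into a single global figure of merit. The Laplacian measure, thresholds, and percentile pooling below are illustrative assumptions, not the paper's exact statistics.

```python
import numpy as np

def laplacian(img):
    """Simple 4-neighbour Laplacian (valid interior only)."""
    return (img[:-2, 1:-1] + img[2:, 1:-1] + img[1:-1, :-2]
            + img[1:-1, 2:] - 4 * img[1:-1, 1:-1])

def focus_score(rgb):
    """Global figure of merit for focus quality: local Laplacian energy
    pooled over pixels that plausibly contain the subject. Thresholds
    and the 95th-percentile pooling are assumptions."""
    rgb = rgb.astype(float) / 255.0
    luma = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    sat = rgb.max(axis=2) - rgb.min(axis=2)
    sharp = np.abs(laplacian(luma))
    # Pool sharpness over well-exposed, colourful pixels only.
    mask = (luma[1:-1, 1:-1] > 0.15) & (sat[1:-1, 1:-1] > 0.1)
    if mask.sum() == 0:
        return 0.0
    return float(np.percentile(sharp[mask], 95))
```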
Fixed pattern noise (FPN) or nonuniformity caused by device and interconnect parameter variations across an image sensor is a major source of image quality degradation especially in CMOS image sensors. In a CMOS image sensor, pixels are read out through different chains of amplifiers each with different gain and offset. Whereas offset variations can be significantly reduced using correlated double sampling (CDS), no widely used method exists for reducing gain FPN. In this paper, we propose to use a video sequence and its optical flow to estimate gain FPN for each pixel. This scheme can be used in a digital video or still camera by taking any video sequence with motion prior to capture and using it to estimate gain FPN. Our method assumes that brightness along the motion trajectory is constant over time. The pixels are grouped in blocks and each block's pixel gains are estimated by iteratively minimizing the sum of the squared brightness variations along the motion trajectories. We tested this method on synthetically generated sequences with gain FPN and obtained results that demonstrate significant reduction in gain FPN with modest computations.
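The brightness-constancy idea can be sketched as an alternating least-squares loop: given motion trajectories, estimate each trajectory's true brightness from gain-corrected samples, then update each pixel's gain to minimize the squared brightness variation along the trajectories. The trajectory representation and the per-pixel (rather than per-block) update below are simplifying assumptions, not the paper's implementation.

```python
import numpy as np

def estimate_gains(frames, trajectories, n_iter=20):
    """Estimate per-pixel gain FPN from a video and its motion trajectories.
    Model: observed = gain * scene brightness, constant along each path.
    `frames` has shape (T, H, W); `trajectories` is a list of
    [(t, y, x), ...] paths. All names are illustrative."""
    gain = np.ones(frames.shape[1:])
    for _ in range(n_iter):
        num = np.zeros_like(gain)
        den = np.zeros_like(gain)
        for traj in trajectories:
            obs = np.array([frames[t, y, x] for t, y, x in traj], float)
            g = np.array([gain[y, x] for _, y, x in traj])
            scene = np.mean(obs / g)         # brightness along the path
            for (t, y, x), o in zip(traj, obs):
                num[y, x] += o * scene       # least-squares gain update
                den[y, x] += scene * scene
        gain = np.where(den > 0, num / np.maximum(den, 1e-12), gain)
        gain /= gain.mean()                  # fix the global scale
    return gain
```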
An important trend in the design of digital cameras is the integration of capture and processing onto a single CMOS chip. Although integrating the components of a digital camera system onto a single chip significantly reduces system size and power, it does not fully exploit the potential advantages of integration. We argue that a key advantage of integration is the ability to exploit the high-speed imaging capability of CMOS image sensors to enable new applications, such as multiple capture for enhancing dynamic range, and to improve the performance of existing applications, such as optical flow estimation. Conventional digital cameras operate at low frame rates, and it would be too costly, if not infeasible, to operate their chips at high frame rates. Integration solves this problem. The idea is to capture images at a much higher frame rate than the standard frame rate, process the high frame rate data on chip, and output the video sequence and the application-specific data at the standard frame rate. We apply this idea to optical flow estimation, where significant performance improvements are demonstrated over methods using standard frame rate sequences. We then investigate the constraints on the memory size and processing power that can be integrated with a CMOS image sensor in a 0.18 micrometer process and below. We show that enough memory and processing power can be integrated not only to perform the functions of a conventional camera system but also to run applications such as real-time optical flow estimation.
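As a sketch of one application named above, multiple capture for dynamic range can be reduced to its core: sample each pixel several times within one standard frame and keep the longest unsaturated sample, normalized by its exposure. The linear exposure ladder, nondestructive monotone readout, and saturation threshold are assumptions for illustration.

```python
import numpy as np

def multiple_capture_hdr(samples, sat_level=0.95):
    """Combine k high-speed captures into one extended-dynamic-range frame.
    `samples` has shape (k, H, W) in [0, 1], taken at exposure times
    t, 2t, ..., kt within one standard frame; values are assumed to grow
    monotonically with exposure (nondestructive readout)."""
    k = samples.shape[0]
    times = np.arange(1, k + 1)                      # t, 2t, ..., kt (units of t)
    unsat = samples < sat_level
    last = np.clip(unsat.sum(axis=0) - 1, 0, None)   # last unsaturated index
    picked = np.take_along_axis(samples, last[None], axis=0)[0]
    return picked / times[last]                      # per-pixel radiance estimate
```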
CMOS image sensors have benefited from technology scaling down to 0.35 micrometers with only minor process modifications. Several studies have predicted that below 0.25 micrometers it will become difficult, if not impossible, to implement CMOS image sensors with acceptable performance without more significant process modifications. To explore the imaging performance of CMOS image sensors fabricated in a standard 0.18 micrometer technology, we designed a set of single-pixel photodiode and photogate APS test structures. The test structures include pixels with different-size n+/pwell and nwell/psub photodiodes and nMOS photogates. To reduce the leakage due to the in-pixel transistors, the follower, photogate, and transfer devices all use 3.3V thick-oxide transistors. The paper reports the key imaging parameters measured from these test structures, including conversion gain, dark current, and spectral response. We find that dark current density decreases super-linearly with reverse bias voltage, which suggests that it is desirable to run the photodetectors at low bias voltages. We find that QE is quite low due to the high pwell doping concentration. Finally, we find that the photogate circuit suffered from high transfer-gate off current; QE, however, is not significantly affected by this problem.