This PDF file contains the front matter associated with SPIE Proceedings Volume 6492, including the Title Page, Copyright information, Table of Contents, Introduction (if any), and the Conference Committee listing.
In this Keynote Address paper, we review early work on Image and Video Quality Assessment against the backdrop of
an interpretation of image perception as a visual communication problem. As a way of explaining our recent work on
Video Quality Assessment, we first describe our recent successful advances in QA algorithms for still images,
specifically the Structural SIMilarity (SSIM) Index and the Visual Information Fidelity (VIF) Index. We then describe
our efforts towards extending these Image Quality Assessment frameworks to the much more complex problem of
Video Quality Assessment. We also discuss our current efforts towards the design and construction of a generic and
publicly-available Video Quality Assessment database.
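To make the structural comparison behind the SSIM Index concrete, the following is a minimal single-scale sketch in NumPy; the 8x8 uniform window and the K1/K2 constants are common illustrative choices, not the authors' reference implementation.

import numpy as np
from scipy.ndimage import uniform_filter

def ssim(x, y, data_range=255.0, win=8, K1=0.01, K2=0.03):
    """Single-scale SSIM between two grayscale images (illustrative sketch)."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    C1, C2 = (K1 * data_range) ** 2, (K2 * data_range) ** 2

    mu_x = uniform_filter(x, win)
    mu_y = uniform_filter(y, win)
    # Local (co)variances from local second moments.
    var_x = uniform_filter(x * x, win) - mu_x ** 2
    var_y = uniform_filter(y * y, win) - mu_y ** 2
    cov_xy = uniform_filter(x * y, win) - mu_x * mu_y

    ssim_map = ((2 * mu_x * mu_y + C1) * (2 * cov_xy + C2)) / \
               ((mu_x ** 2 + mu_y ** 2 + C1) * (var_x + var_y + C2))
    return ssim_map.mean()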
Portrait artists working in oils, acrylics or pastels use a specific but open human-vision methodology to create a painterly
portrait of a live sitter. When they must use a photograph as the source, artists augment their process, since photographs
have different focusing (everything is in focus, or focus falls in vertical planes), value clumping (the camera darkens the
shadows and lightens the bright areas), as well as color and perspective distortion. In general, artistic methodology
attempts the following: starting from the photograph, the painting must 'simplify, compose and leave out what's irrelevant,
emphasizing what's important'. While seemingly a qualitative goal, artists use known techniques such as relying on
source tone over color to indirectly derive a semantic color temperature model, using brush and tonal "sharpness" to create a
center of interest, and using lost and found edges to move the viewer's gaze through the image towards the center of interest, as well
as other techniques to filter and emphasize. Our work attempts to create a knowledge domain of the portrait painter's
process and to incorporate this knowledge into a multi-space parameterized system that can create an array of NPR
painterly rendering outputs by analyzing the photographic input, which informs the semantic knowledge rules.
We consider the coding properties of multilayer LNL (linear-nonlinear-linear) systems. Such systems consist
of interleaved layers of linear transforms (or filter banks), nonlinear mappings, linear transforms, and so forth.
They can be used as models of visual processing in higher cortical areas (V2, V4), and are also interesting
with respect to image processing and coding. The linear filter operations in the different layers are optimized
for the exploitation of the statistical redundancies of natural images. We explain why even simple nonlinear
operations, such as ON/OFF rectification, can convert higher-order statistical dependencies remaining between
the linear filter coefficients of the first layer to a lower order. The resulting nonlinear coefficients can then
be linearly recombined by the second-level filtering stage, using the same principles as in the first stage. The
complete nonlinear scheme is invertible, i.e., information is preserved, if nonlinearities like ON/OFF rectification
or gain control are employed. In order to obtain insights into the coding efficiency of these systems we investigate
the feature selectivity of the resulting nonlinear output units and the use of LNL systems in image compression.
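A hypothetical sketch of a single LNL stage may help fix ideas: a linear filter bank, ON/OFF half-wave rectification (invertible, since the signed coefficient is recoverable as ON minus OFF), and a second linear recombination. The random matrices below stand in for the statistically optimized filters described above.

import numpy as np

rng = np.random.default_rng(0)

def lnl_stage(x, W1, W2):
    """One linear-nonlinear-linear stage with ON/OFF rectification."""
    c = W1 @ x                                     # first linear filtering stage
    on, off = np.maximum(c, 0), np.maximum(-c, 0)  # ON/OFF rectification
    z = np.concatenate([on, off])                  # invertible: c = on - off
    return W2 @ z                                  # second linear recombination

# Toy example with random filters standing in for learned ones.
x = rng.standard_normal(64)          # e.g. a vectorized 8x8 image patch
W1 = rng.standard_normal((32, 64))
W2 = rng.standard_normal((16, 64))   # acts on the 2 * 32 = 64 rectified channels
y = lnl_stage(x, W1, W2)
print(y.shape)                       # (16,)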
Motion coding in the brain undoubtedly reflects the statistics of retinal image motion occurring in the natural
environment. To characterize these statistics it is useful to measure motion in artificial movies derived from simulated
environments where the "ground truth" is known precisely. Here we consider the problem of coding retinal image
motion when an observer moves through an environment. Simulated environments were created by combining the
range statistics of natural scenes with the spatial statistics of natural images. Artificial movies were then created by
moving along a known trajectory at a constant speed through the simulated environments. We find that across a range
of environments the optimal integration area of local motion sensors increases logarithmically with the speed to which
the sensor is tuned. This result makes predictions for cortical neurons involved in heading perception and may find use
in robotics applications.
Previous work on unsupervised learning has shown that it is possible to learn Gabor-like feature representations,
similar to those employed in the primary visual cortex, from the statistics of natural images. However, such
representations are still not readily suited for object recognition or other high-level visual tasks because they
can change drastically as the image changes due to object motion, variations in viewpoint, lighting, and other
factors. In this paper, we describe how bilinear image models can be used to learn independent representations
of the invariances, and their transformations, in natural image sequences. These models provide the foundation
for learning higher-order feature representations that could serve as models of higher stages of processing in the
cortex, in addition to having practical merit for computer vision tasks.
We describe an invertible nonlinear image transformation that is well-matched to the statistical properties of
photographic images, as well as the perceptual sensitivity of the human visual system. Images are first decomposed
using a multi-scale oriented linear transformation. In this domain, we develop a Markov random field
model based on the dependencies within local clusters of transform coefficients associated with basis functions
at nearby positions, orientations and scales. In this model, division of each coefficient by a particular linear
combination of the amplitudes of others in the cluster produces a new nonlinear representation with marginally
Gaussian statistics. We develop a reliable and efficient iterative procedure for inverting the divisive transformation.
Finally, we probe the statistical and perceptual advantages of this image representation, examining
robustness to added noise, rate-distortion behavior, and artifact-free local contrast enhancement.
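The divisive step can be sketched as follows: each subband coefficient is divided by a constant plus a weighted neighborhood average of coefficient amplitudes. The uniform 3x3 neighborhood, weight and constant below are placeholders for the fitted model parameters, and the fixed-point inverse is included only to illustrate invertibility.

import numpy as np
from scipy.ndimage import uniform_filter

def divisive_normalization(coeffs, weight=0.1, const=1.0, size=3):
    """Divide each coefficient of one subband by a local combination of
    neighbor amplitudes; the uniform average stands in for the learned
    combination over position, orientation and scale."""
    neighborhood = uniform_filter(np.abs(coeffs), size=size)
    return coeffs / (const + weight * neighborhood)

def invert_divisive_normalization(normed, weight=0.1, const=1.0, size=3,
                                  n_iter=50):
    """Fixed-point iteration for the inverse: c = v * (const + w * E|c|).
    Converges for small weights; shown only to illustrate invertibility."""
    c = normed.copy()
    for _ in range(n_iter):
        c = normed * (const + weight * uniform_filter(np.abs(c), size=size))
    return c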
We examine the spatiotemporal power spectra of image sequences that depict dense motion parallax, namely the
parallax seen by an observer moving laterally in a cluttered 3D scene. Previous models of the spatiotemporal
power have accounted for effects such as a static 1/f spectrum in each image frame, a spreading of power at high
spatial frequencies in the direction of motion, and a bias toward either lower or higher image speeds depending
on the 3D density of objects in the scene. Here we use computer graphics to generate a parameterized set of image
sequences and qualitatively verify the main features of these models. The novel contribution is to discuss how
failures of 1/f scaling can occur in cluttered scenes. Such failures have been described for the spatial case, but
not for the spatiotemporal case. We find that when objects in the cluttered scene are visible over a wide range
of depths, and when the image size of objects is smaller than the image width, failures of 1/f scaling tend to
occur at certain critical frequencies, defined by a correspondence between object size and object speed.
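As a sketch of the measurement involved, the following estimates the spatiotemporal power spectrum of a (time, height, width) image sequence with a windowed 3-D FFT; the Hann window and mean removal are standard but illustrative choices.

import numpy as np

def spatiotemporal_power_spectrum(frames):
    """Estimate the power spectrum of a (T, H, W) image sequence.

    A Hann window along each dimension reduces edge artifacts before the
    3-D FFT; the result is |F(ft, fy, fx)|^2 with zero frequency centered."""
    frames = frames - frames.mean()
    T, H, W = frames.shape
    window = (np.hanning(T)[:, None, None] *
              np.hanning(H)[None, :, None] *
              np.hanning(W)[None, None, :])
    spectrum = np.fft.fftshift(np.fft.fftn(frames * window))
    return np.abs(spectrum) ** 2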
We report progress on an approach (Geometric Texton Theory, GTT) that, like Marr's 'primal sketch', aims to describe
image structure in a way that emphasises its qualitative aspects. In both approaches, image description is by labelling
points using a vocabulary of feature types, though compared to Marr we aim for a much larger feature vocabulary.
We base GTT on the Gaussian derivative (DtG) model of V1 measurement. Marr's primal sketch was based on DtG
filters of derivative order up to 2nd; for GTT we plan to extend this to the physiologically plausible limit of 4th order. This is how
we will achieve a larger feature vocabulary (we estimate 30-150) than Marr's 'edge', 'line' and 'blob'. The central
requirement of GTT, then, is a procedure for determining the feature vocabulary that will scale up to 4th order. We
have previously published feature category systems for 1-D 1st order, 1-D 2nd order, 2-D 1st order and 2-D pure 2nd order.
In this paper we will present results of GTT as applied to 2-D mixed 1st + 2nd order features.
We will review various approaches to defining the feature vocabulary, including ones based on (i) purely geometrical
considerations, and (ii) natural image statistics.
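A minimal sketch of the DtG measurement stage: Gaussian-derivative responses (the local jet) up to 4th order, computed with SciPy's gaussian_filter, whose order argument selects the derivative order along each axis (recent SciPy versions accept orders above 3). The scale sigma is an arbitrary illustrative value.

import numpy as np
from scipy.ndimage import gaussian_filter

def dtg_jet(image, sigma=2.0, max_order=4):
    """Gaussian-derivative responses up to a given total order.

    Returns a dict mapping (order_y, order_x) to the filtered image, for
    all derivative orders with order_y + order_x <= max_order."""
    image = image.astype(np.float64)
    jet = {}
    for oy in range(max_order + 1):
        for ox in range(max_order + 1 - oy):
            jet[(oy, ox)] = gaussian_filter(image, sigma, order=(oy, ox))
    return jet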
The independent components of natural images are a set of linear filters which are optimized for statistical independence.
With such a set of filters, images can be represented without loss of information. Intriguingly, the filter
shapes are localized, oriented, and bandpass, resembling important properties of V1 simple cell receptive fields.
Here we address the question of whether the independent components of natural images are also perceptually less
dependent than other image components. We compared the pixel basis, the ICA basis and the discrete cosine
basis by asking subjects to interactively predict missing pixels (for the pixel basis) or to predict the coefficients
of ICA and DCT basis functions in patches of natural images. Like Kersten (1987),1 we find the pixel basis to
be perceptually highly redundant but, perhaps surprisingly, the ICA basis showed significantly higher perceptual
dependencies than the DCT basis. This shows a dissociation between statistical and perceptual dependence
measures.
The optimal coding hypothesis proposes that the human visual system has adapted to the statistical properties
of the environment by the use of relatively simple optimality criteria.
Here we (i) discuss how the properties of different models of image coding, i.e. sparseness, decorrelation,
and statistical independence, are related to each other, (ii) propose to evaluate the different models by verifiable
performance measures, and (iii) analyse the classification performance on images of handwritten digits (the MNIST
database). We first employ the SPARSENET algorithm (Olshausen, 1998) to derive a local filter basis (on 13 × 13
pixel windows). We then filter the images in the database (28 × 28 pixel images of digits) and reduce the
dimensionality of the resulting feature space by selecting the locally maximal filter responses. We then train a
support vector machine on a training set to classify the digits and report results obtained on a separate test
set. Currently, the best state-of-the-art result on the MNIST database has an error rate of 0.4%. This result,
however, has been obtained by using explicit knowledge that is specific to the data (an elastic distortion model
for digits). We obtain an error rate of 0.55%, which is second best but does not use explicit data-specific
knowledge. In particular, it outperforms by far all methods that do not use data-specific knowledge.
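A compressed sketch of this pipeline, under stated assumptions: the 13 x 13 basis would come from SPARSENET, whereas random filters and tiny random digit stand-ins are used below only to keep the example self-contained, and the maximum absolute response per filter is a crude stand-in for selecting locally maximal responses.

import numpy as np
from scipy.signal import correlate2d
from sklearn.svm import SVC

def extract_features(images, basis):
    """Filter each 28x28 image with a local basis and keep the maximum
    absolute response per filter."""
    feats = []
    for img in images:
        responses = [np.abs(correlate2d(img, w, mode='valid')).max()
                     for w in basis]
        feats.append(responses)
    return np.array(feats)

rng = np.random.default_rng(0)
basis = rng.standard_normal((64, 13, 13))      # stand-in for SPARSENET filters

# Tiny random stand-ins for MNIST images and labels, just to run the pipeline.
train_images = rng.standard_normal((20, 28, 28))
train_labels = rng.integers(0, 10, 20)

clf = SVC(kernel='rbf', C=10.0)
clf.fit(extract_features(train_images, basis), train_labels)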
There are two aspects to unsupervised learning of invariant representations of images: First, we can reduce the
dimensionality of the representation by finding an optimal trade-off between temporal stability and informativeness.
We show that the answer to this optimization problem is generally not unique so that there is still
considerable freedom in choosing a suitable basis. Which of the many optimal representations should be selected?
Here, we focus on this second aspect, and seek to find representations that are invariant under geometrical transformations
occurring in sequences of natural images. We utilize ideas of 'steerability' and Lie groups, which have
been developed in the context of filter design. In particular, we show how an anti-symmetric version of canonical
correlation analysis can be used to learn a full-rank image basis which is steerable with respect to rotations. We
provide a geometric interpretation of this algorithm by showing that it finds the two-dimensional eigensubspaces
of the average bivector. For data which exhibits a variety of transformations, we develop a bivector clustering
algorithm, which we use to learn a basis of generalized quadrature pairs (i.e. 'complex cells') from sequences of
natural images.
A major challenge facing content-based image retrieval is bridging the
gap between low-level image primitives and high-level semantics.
We have proposed a new approach for semantic image classification that
utilizes the adaptive perceptual color-texture segmentation algorithm
by Chen et al., which segments natural scenes into perceptually
uniform regions. The color composition and spatial texture features
of the regions are used as medium level descriptors, based on which
the segments are classified into semantic categories. The segment
features consist of spatial texture orientation information and color
composition in terms of a limited number of spatially adapted
dominant colors. The feature selection and the performance of the
classification algorithms are based on the segment statistics.
We investigate the dependence of the segment statistics on
the segmentation algorithm. For this, we compare the statistics of
the segment features obtained using the Chen et al. algorithm to those
that correspond to human segmentations, and show that they are
remarkably similar. We also show that when human segmentations are
used instead of the automatically detected segments, the performance
of the semantic classification approach remains approximately the
same.
In wavelet-based image coding, a variety of masking properties have been exploited that result in spatially-adaptive quantization
schemes. It has been shown that carefully selecting uniform quantization step-sizes across entire wavelet subbands
or subband codeblocks results in considerable gains in efficiency with respect to visual quality. These gains have been
achieved through analysis of wavelet distortion additivity in the presence of a background image; in effect, how wavelet
distortions from different bands mask each other while being masked by the image itself at and above threshold. More
recent studies have illustrated how the contrast and structural class of natural image data influences masking properties
at threshold. Though these results have been extended in a number of methods to achieve supra-threshold compression
schemes, the relationship between inter-band and intra-band masking at supra-threshold rates is not well understood. This
work aims to quantify the importance of spatially-adaptive distortion as a function of compressed target rate. Two experiments
are performed that require the subject to specify the optimal balance between spatially-adaptive and non-spatially-adaptive
distortion. Analyses of the resulting data indicate that, on average, the balance between spatially-adaptive and
non-spatially-adaptive distortion is equally important across all tested rates. Furthermore, though it is known that mean-squared
error alone is not a good indicator of image quality, it can be used to predict the outcome of this experiment with
reasonable accuracy. This result has convenient implications for image coding that are also discussed.
We apply two recent non-linear, image-processing algorithms to colour image compression. The two algorithms are
colorization and joint bilateral filtering. Neither algorithm was designed for image compression. Our investigations
were to ascertain whether their mechanisms could be used to improve the image compression rate for the same level of
visual quality. Both show interesting behaviour, with the second showing a visible improvement in visual quality, over
JPEG, at the same compression rate. In both cases, we store luminance as a standard, lossily compressed, greyscale
image and store colour at a very low sampling rate. Each of the non-linear algorithms then uses the information from the
luminance channel to determine how to propagate the colour information appropriately to reconstruct a full colour
image.
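The second scheme can be sketched as a joint bilateral reconstruction of sparsely stored chroma, with the decoded luminance as the guide image; the window radius and the two Gaussian widths are illustrative parameters, and the brute-force loop is written for clarity rather than speed.

import numpy as np

def joint_bilateral_chroma(luma, chroma_sparse, mask, radius=8,
                           sigma_s=4.0, sigma_r=10.0):
    """Propagate sparsely stored chroma using the luminance channel as a guide.

    luma          : (H, W) full-resolution luminance (the guide image)
    chroma_sparse : (H, W, 2) chroma values, valid only where mask is True
    mask          : (H, W) boolean array marking the retained chroma samples
    Each output pixel is a weighted average of nearby retained samples,
    weighted by spatial distance and by luminance similarity."""
    H, W = luma.shape
    out = np.zeros_like(chroma_sparse, dtype=np.float64)
    ys, xs = np.nonzero(mask)
    for y in range(H):
        for x in range(W):
            sel = (np.abs(ys - y) <= radius) & (np.abs(xs - x) <= radius)
            sy, sx = ys[sel], xs[sel]
            if sy.size == 0:
                continue  # no chroma sample nearby; left as zero in this sketch
            d2 = (sy - y) ** 2 + (sx - x) ** 2
            dr = luma[sy, sx] - luma[y, x]
            w = np.exp(-d2 / (2 * sigma_s ** 2) - dr ** 2 / (2 * sigma_r ** 2))
            out[y, x] = (w[:, None] * chroma_sparse[sy, sx]).sum(0) / w.sum()
    return out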
This paper proposes to extend the Karhunen-Loeve compression algorithm to multiple images. The resulting
algorithm is compared against single-image Karhunen-Loeve coding as well as algorithms based on the Discrete Cosine
Transform (DCT).
Furthermore, various methods for obtaining compressible clusters from large image databases are evaluated.
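A minimal sketch of the multi-image Karhunen-Loeve idea, under simplifying assumptions: the images in a cluster share one eigenbasis obtained from the cluster covariance, and each image is stored as a few projection coefficients; cluster selection and entropy coding are omitted.

import numpy as np

def kl_compress_cluster(images, n_components):
    """Karhunen-Loeve (PCA) coding of a cluster of same-sized images."""
    X = np.stack([im.ravel().astype(np.float64) for im in images])
    mean = X.mean(axis=0)
    Xc = X - mean
    # SVD of the centered data gives the KL basis (rows of Vt).
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    basis = Vt[:n_components]
    coeffs = Xc @ basis.T          # per-image stored representation
    return mean, basis, coeffs

def kl_decompress(mean, basis, coeffs, shape):
    """Reconstruct the cluster from the shared basis and the coefficients."""
    return (coeffs @ basis + mean).reshape((-1,) + shape)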
Recently, flat-panel computer displays and notebook computers designed with a so-called glare panel, i.e. a highly
glossy screen, have emerged on the market. The shiny look of the display appeals to customers, and it is also
argued that contrast, colour saturation, etc. improve with a glare panel.
LCD displays often suffer from angle-dependent picture quality. This has become even more pronounced with the
introduction of Prism Light Guide plates into displays for notebook computers.
The TCO label is the leading labelling system for computer displays. Currently about 50% of all computer displays on
the market are certified according to the TCO requirements. The requirements are periodically updated to keep up with
the technical development and the latest research in e.g. visual ergonomics. The gloss level of the screen and its angular
dependence have recently been investigated by conducting user studies.
A study of the effect of highly glossy screens compared to matt screens has been performed. The results show a slight
advantage for the glossy screen when no disturbing reflections are present; however, the difference was not statistically
significant. When disturbing reflections are present, the advantage turns into a larger disadvantage, and this difference
is statistically significant. Another study, of angular dependence, has also been performed. Its results indicate a linear
relationship between the picture quality and the centre luminance of the screen.
To achieve the best image quality, noise and artifacts are generally removed at the cost of a loss of detail, which produces a
blur effect. To control and quantify the emergence of this blur effect, blur metrics have already been proposed in the
literature. By associating the blur effect with edge spreading, these metrics are sensitive not only to the threshold
chosen to classify edges, but also to the presence of noise, which can mislead the edge detection.
Based on the observation that we have difficulty perceiving differences between a blurred image and the same image re-blurred,
we propose a new approach which is not based on transient characteristics but on the discrimination
between different levels of blur perceptible in the same picture.
Using subjective tests and psychophysical functions, we validate our blur perception theory on a set of pictures which
are naturally unsharp or blurred to varying degrees by one- or two-dimensional low-pass filters. These tests show the
robustness of the metric and its ability to evaluate not only the blur introduced by restoration processing but also focal
blur and motion blur. Requiring no reference and only a low-cost implementation, this new perceptual blur metric is applicable
in a wide range of settings, from a simple metric to a means of fine-tuning artifact correction.
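An illustrative sketch of the discrimination principle described above, not the paper's exact metric: the image is re-blurred, and the smaller the change in local variations, the blurrier the original is judged to be. The filter size and the horizontal-only treatment are simplifications.

import numpy as np
from scipy.ndimage import uniform_filter1d

def reblur_metric(image, blur_size=9):
    """No-reference blur estimate based on re-blurring.

    A strong horizontal low-pass filter is applied to the image; the more
    the local horizontal variations shrink after this re-blur, the sharper
    the original was. Returns a value in [0, 1], larger meaning blurrier."""
    img = image.astype(np.float64)
    reblurred = uniform_filter1d(img, size=blur_size, axis=1)
    d_orig = np.abs(np.diff(img, axis=1))
    d_blur = np.abs(np.diff(reblurred, axis=1))
    # Variation removed by the re-blur, accumulated over the image.
    removed = np.maximum(d_orig - d_blur, 0).sum()
    return 1.0 - removed / (d_orig.sum() + 1e-12)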
Human perception of image distortions has been widely explored in recent years; however, research has not dealt
with distortions due to geometric operations. In this paper, we present the results we obtained by means of
psychovisual experiments aimed at evaluating the way the human visual system perceives geometric distortions
in images. A mathematical model of the geometric distortions is first introduced, then the impact of the model
parameters on the visibility of the distortion is measured by means of both objective metrics and subjective
tests.
In this work, we studied how video compression and color scaling interact to affect the overall video quality and the
color quality attributes. We examined three subjective attributes: perceived color preference, perceived color
naturalness, and overall annoyance, as digital videos were subjected to compression and chroma scaling. Our objectives
were: (1) to determine how the color chroma scaling of compressed digital videos affected the mean color preference
and naturalness and overall annoyance ratings across subjects and (2) to determine how preference, naturalness, and
annoyance were related. Psychophysical experiments were carried out in which naïve subjects made numerical
judgments of these three attributes. Preference and naturalness scores increased to a maximum and decreased as the
mean chroma of the videos increased. As compression increased, both preference and naturalness scores decreased and
they varied less with mean chroma. Naturalness scores tended to reach a maximum at lower mean chroma than
preference scores. Annoyance scores decreased to a minimum and then increased as mean chroma increased. The mean
chroma at which annoyance was minimum was less than the mean chroma at which naturalness and preference were
maximum. Preference, naturalness, and annoyance scores for individual videos were approximated relatively well by
Gaussian functions of mean chroma. Preference and naturalness scores decreased linearly as a function of the logarithm
of the total squared error, while annoyance scores increased as an S-shaped function of the logarithm of the total squared
error. A three-parameter model is shown to provide a good description of how each attribute depends on chroma and
compression for individual videos. Model parameters vary with video content.
To interpret the impressions of observers, it is necessary to understand the relationship between components that
influence perceived video quality. This paper addresses the effect of assessment methodology on the subjective
judgement of spatially and temporally impaired video material, caused by video adaptation methods that come into play
when there is variable throughput of video material (I-Frame Delay and Signal-to-Noise Ratio scalability). Judgement
strategies used are the double-stimulus continuous-quality scale (DSCQS) and the double stimulus impairment scale
(DSIS). Results show no evidence for an influence of spatial artifacts on perceived video quality with the presented
judgement strategies. Results for the influence of temporal artifacts are less easy to interpret, because it is not possible to
distinguish whether the non-linear relation between DSIS and DSCQS appeared because of the temporal artifacts
themselves or because of the presented scene content.
For members of the Deaf Community in the United States, current communication tools include TTY/TTD
services, video relay services, and text-based communication. With the growth of cellular technology, mobile
sign language conversations are becoming a possibility. Proper coding techniques must be employed to compress
American Sign Language (ASL) video for low-rate transmission while maintaining the quality of the conversation.
In order to evaluate these techniques, an appropriate quality metric is needed. This paper demonstrates that
traditional video quality metrics, such as PSNR, fail to predict subjective intelligibility scores. By considering
the unique structure of ASL video, an appropriate objective metric is developed. Face and hand segmentation
is performed using skin-color detection techniques. The distortions in the face and hand regions are optimally
weighted and pooled across all frames to create an objective intelligibility score for a distorted sequence. The
objective intelligibility metric performs significantly better than PSNR in terms of correlation with subjective
responses.
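A hedged sketch of region-weighted pooling: a rough skin-color rule stands in for the paper's face and hand segmentation, and a fixed 4:1 weight stands in for the optimal weighting; both are illustrative assumptions.

import numpy as np

def skin_mask(frame_rgb):
    """Very rough skin-color detector in normalized-RGB space."""
    rgb = frame_rgb.astype(np.float64) + 1e-6
    s = rgb.sum(axis=2, keepdims=True)
    r, g = rgb[..., 0:1] / s, rgb[..., 1:2] / s
    return ((r > 0.35) & (r < 0.55) & (g > 0.28) & (g < 0.38))[..., 0]

def region_weighted_score(ref_frames, dist_frames, face_hand_weight=4.0):
    """Pool squared error across frames, up-weighting skin (face/hand) regions."""
    total, weight_sum = 0.0, 0.0
    for ref, dist in zip(ref_frames, dist_frames):
        err = ((ref.astype(np.float64) - dist.astype(np.float64)) ** 2).mean(axis=2)
        w = np.where(skin_mask(ref), face_hand_weight, 1.0)
        total += (w * err).sum()
        weight_sum += w.sum()
    mse = total / weight_sum
    return -10.0 * np.log10(mse + 1e-12)   # higher = predicted more intelligible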
Traditionally, subjective quality assessments are made in isolation from mediating factors (e.g. interest in content, price).
This approach is useful for determining the pure perceptual quality of content. Recently, there has been a growing
interest in understanding users' quality of experience. To move from perceptual quality assessment to quality of
experience assessment, factors beyond reproduction quality must be considered. From a commercial perspective, content
and price are key determinants of success. This paper investigates the relationship between price and quality. Subjects
selected content that was of interest to them. Subjects were given a budget of ten pounds at the start of the test. When
viewing content, subjects were free to select different levels of quality. The lowest quality was free (and subjects left the
test with ten pounds). The highest quality used up the full budget (and subjects left the test with no money). A range of
pricing tariffs was used in the test. During the test, subjects were allowed to prioritise quality or price. The results of the
test found that subjects prioritised quality over price across all tariff levels. At the higher pricing tariffs, subjects became
more price sensitive. Using data from a number of subjective tests, a utility function describing the relationship between
price and quality was produced.
Optical modeling suggests that levels of retinal defocus routinely caused by presbyopia should produce phase
reversals (spurious resolution, SR) for spatial frequencies in the 2 cycles/letter range known to be critical for
reading. Simulations show that such reversals can have a decisive impact on character legibility, and that
correcting only this feature of defocused images (by re-reversing contrast sign errors created by defocus) can
make unrecognizably blurred letters completely legible. This deblurring impact of SR correction is remarkably
unaffected by the magnitude of defocus, as determined by blur-circle size. Both the deblurring itself and its
robustness can be understood from the effect that SR correction has on the defocused pointspread function, which
changes from a broad flat cake to a sharply pointed cone. This SR-corrected pointspread acts like a delta function,
preserving image shape during convolution regardless of blur-disk size. Curiously, such pointspread functions
always contain a narrow annulus of negative light-intensity values whose radius equals the diameter of the blur
circle. We show that these properties of SR-correction all stem from the mathematical nature of the Fourier
transform of the sign of the optical transfer function, which also accounts for the inevitable low contrast of
images pre-corrected for SR.
The diverse needs for digital auto-focusing systems have driven the development of a
variety of focus measures. The purpose of the current study was to investigate whether
any of these focus measures are biologically plausible; specifically whether they are
applicable to retinal images from which defocus information is extracted in the operation
of accommodation and emmetropization, two ocular auto-focusing mechanisms. Ten
representative focus measures were chosen for analysis, 6 in the spatial domain and 4
transform-based. Their performance was examined for combinations of non-defocus
aberrations and positive and negative defocus. For each combination, a wavefront was
reconstructed, the corresponding point spread function (PSF) computed using Fast
Fourier Transform (FFT), and then the blurred image obtained as the convolution of the
PSF and a perfect image. For each blurred image, a focus measure curve was derived for
each focus measure. Aberration data were either collected from 22 real eyes or randomly
generated based on Gaussian parameters describing data from a published large-scale
human study (n>100). For the latter data set, analyses made use of distributed computing
on a small inhomogeneous computer cluster. In the presence of small amounts of non-defocus
aberrations, all focus measures showed monotonic changes with positive or
negative defocus, and their curves generally remained unimodal, although there were
large differences in their variability, sensitivity to defocus and effective ranges. However,
the performance of a number of these focus measures became unacceptable when non-defocus
aberrations exceeded a certain level.
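The image-formation step can be sketched as follows: the PSF is the squared magnitude of the Fourier transform of the generalized pupil function, a test image is blurred by convolving with it, and a focus measure (gradient energy, one common spatial-domain choice) is evaluated on the result. Pupil sampling, wavelength scaling, and the ten specific focus measures of the study are omitted.

import numpy as np
from scipy.signal import fftconvolve

def psf_from_wavefront(wavefront, pupil_mask):
    """Point spread function from a wavefront error (in waves) over a pupil.

    The generalized pupil function is the aperture times exp(i*2*pi*W);
    its Fourier transform, squared in magnitude, gives the incoherent PSF."""
    pupil = pupil_mask * np.exp(1j * 2 * np.pi * wavefront)
    field = np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(pupil)))
    psf = np.abs(field) ** 2
    return psf / psf.sum()

def blurred(image, psf):
    """Blur a test image by convolution with the PSF."""
    return fftconvolve(image, psf, mode='same')

def gradient_energy(image):
    """One simple spatial-domain focus measure: summed squared gradients."""
    gy, gx = np.gradient(image.astype(np.float64))
    return (gx ** 2 + gy ** 2).sum()

# A focus-measure curve is obtained by adding varying amounts of defocus
# (a scaled r^2 term) to a fixed aberration map, computing the PSF, blurring
# a perfect test image with it, and evaluating the focus measure on the result.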
Vision scientists have segmented appearances into aperture and object modes, based on observations that scene stimuli
appear different in a black (no-light) surround. This is a 19th-century assumption that the stimulus determines the mode,
and sensory feedback determines the appearance. Since the 1960s there have been innumerable experiments on spatial
vision following the work of Hubel and Wiesel, Campbell, Gibson, Land and Zeki. The modern view of vision is that
appearance is generated by spatial interactions, or contrast. This paper describes experiments that provide a significant
increment of new data on the effects of contrast and constancy over a wider range of luminances than previously studied.
Matches are not consistent with discounting the illuminant. The observers' matches fit a simple two-step physical
description: The appearance of maxima is dependent on luminance, and less-luminous areas are dependent on spatial
contrast. Reliance on unspecified feedback processes, such as aperture mode and object mode, is no longer
necessary. Simple rules of maxima and spatial interactions account for all matches in flat 2D transparent targets,
complex 3D reflection prints and HDR displays.
The purpose of this study is to examine gray matching between dark and ambient viewing conditions and to use the results
of the gray-matching experiment to improve visibility on a mobile display; the target ambient level for the experiment is 30,000 lux. First,
to measure visibility under the ambient condition, a patch-count experiment was conducted to determine how
many patches could be seen in the original images under ambient light. Visibility in the ambient condition differed significantly
from that in the dark condition. Next, a gray-matching experiment was conducted by comparing gray patches between the
dark and ambient conditions using the method of adjustment. Participants reported that, for white or bright gray patches,
no patch of the same brightness could be found under the ambient condition. To confirm the visibility improvement implied by the results
of the gray-matching experiment, visibility was measured under ambient light after a simple implementation, following the same
procedure as the first visibility experiment. After applying the gray-matching curve, visibility improved further.
A t-test between the patches with the gray curve applied and the maximum of the dark condition was not significant, meaning
that visibility did not differ between the original patches in the dark condition and the curve-corrected patches in the ambient
condition.
Experiments using real images are conducted on a variety of color constancy algorithms (Chromagenic, Greyworld,
Max RGB, and a Maloney-Wandell extension called Subspace Testing) in order to determine whether or not extending
the number of channels from 3 to 6 to 9 would enhance the accuracy with which they estimate the scene illuminant
color. To create the 6- and 9-channel images, filters were placed over a standard 3-channel color camera. Although
some improvement is found with 6 channels, the results indicate that essentially the extra channels do not help as much
as might be expected.
Image acquisition devices do not inherently have a color constancy mechanism like that of the human visual system. The machine color constancy problem can be circumvented using a white-balancing technique based on accurate illumination estimation. Unfortunately, previous studies have not given satisfactory results for both accuracy and stability under various conditions. To overcome these problems, we suggest a new method: spatial and temporal illumination estimation. This method, an evolution of the Retinex and Color by Correlation methods, predicts an initial illuminant point and estimates the scene illumination between that point and sub-gamuts derived from luminance levels. The proposed method can raise the estimation probability not only by detecting motion of the scene reflectance but also by finding valid scenes using information that differs between sequential scenes. The proposed method outperforms recently developed algorithms.
It is well known that LEDs have problems with color consistency and color stability over time. Two perception
experiments were conducted in order to determine guidelines for the allowable color and luminance deviations between
LEDs. The first experiment determined the visibility threshold of hue, saturation, and luminance deviations of one
LED in an array of LEDs, and the second experiment measured the visibility threshold of hue, saturation, and luminance
gratings at different spatial frequencies. The results of the first experiment show that people are most sensitive to color
deviations between LEDs when a white color is generated. The visibility threshold for white was 0.004 Δu'v' for a
deviation in the hue of the LED primaries, 0.007 Δu'v' for a deviation in the saturation of the LED primaries and 0.006 Δu'v' for a deviation in the luminance of the LED primaries. The second experiment showed that the visibility of hue
gratings is independent of spatial frequency in the range of 0.4 to 1.2 cycles/degree. However, for saturation and
luminance gratings there was a significant effect of spatial frequency on the visibility threshold. Both experiments show
that observers are more sensitive to hue than to saturation deviations.
The whiteness level of a printing paper is considered as an important quality measure. High paper whiteness improves
the contrast to printed areas providing a more distinct appearance of printed text and colors and increases the number of
reproducible colors. Its influence on perceived color rendering quality is however not completely explained. The intuitive
interpretation of paper whiteness is a material with high light reflection for all wavelengths in the visual part of the color
spectrum. However, a slightly bluish shade is perceived as being whiter than a neutral white. Accordingly, papers with
high whiteness values incline toward bluish-white. In paper production, a high whiteness level is achieved by the use of
highly bleached pulp together with high light scattering filler pigment. To further increase whiteness levels expensive
additives such as Fluorescent Whitening Agents (FWA) and shading dyes are needed. In recent years, the CIE
whiteness level of some commercially available office papers has exceeded 170 CIE units, a level that can only be reached
by the addition of significant amounts of FWA. Although paper whiteness is considered as an important paper quality
criterion, its influence on printed color images is complicated. The dynamic mechanisms of the human visual system
strive to optimize the visual response to each particular viewing condition. One of these mechanisms is chromatic
adaptation, where colored objects get the same appearance under different light sources, i.e. a white paper appears white
under tungsten, fluorescent and daylight. In the process of judging printed color images, paper whiteness will be part of
the chromatic adaptation. This implies that variations in paper whiteness would be discounted by the human visual
system. On the other hand, high paper whiteness improves the contrast as well as the color gamut, both important
parameters for the perceived color reproduction quality. In order to quantify the influence of paper whiteness, pilot papers
with different amounts of FWA, but similar in all other respects, were produced on a small-scale experimental paper
machine. Because only the FWA content changes, the influence of properties other than paper whiteness is reduced
in the evaluation process. A set of images, all having characteristics with the potential to reveal the influence
of the varied whiteness level on color reproduction quality, were printed on the pilot papers in two different printers.
Prior to printing the test images in the experiment, ICC profiles were calculated for all printer-substrate
combinations used. A visual assessment study of the printed samples was carried out in order to relate the influence of the
paper whiteness level to perceived color reproduction quality. The results show an improved color rendering quality with
increased CIE whiteness value up to a certain level. Any further increase in paper whiteness does not contribute to an
improved color reproduction quality. Furthermore, the fact that some printing inks are UV-blocking while others are not
will introduce a non-uniform color shift in the printed image when the FWA activation changes. This non-uniform color
shift has been quantified both for variations in illuminant and for variations of FWA content in the paper.
Real time imaging applications such as interactive rendering and video conferencing face particularly challenging bandwidth problems, especially as we attempt to improve resolution to perceptual limits. Compression has been an amazing enabler of video streaming and storage, but in interactive settings, it can introduce application-killing latencies. Rather than synthesizing or capturing a verbose representation and then immediately converting it into its succinct form, we should generate the concise representation directly. Our research is inspired by human vision, which as Hoffman (1998) notes, constructs "continuous lines and surfaces...from discrete information." Our adaptive frameless renderer uses gradient samples and steerable filters to perform spatiotemporally adaptive reconstruction that preserves both edges and occlusion boundaries. Resulting RMS qualities are equivalent to traditionally synthesized imagery with 10 times more samples. Nevertheless in dynamic scenes, producing pleasing edges with so few samples is challenging. We are currently developing methods for reconstructing imagery using color samples supplemented with sparse edge information. Such higher-order representations will be a crucial enabler of interactive, hyper-resolution image synthesis, capture and display.
Composite images are synthesized from existing photographs by artists who make concept art, e.g. storyboards
for movies or architectural planning. Current techniques allow an artist to fabricate such an image by digitally
splicing parts of stock photographs. While these images serve mainly to "quickly" convey how a scene should
look, their production is laborious. We propose a technique that allows a person to design a new photograph
with substantially less effort. This paper presents a method that generates a composite image when a user
types in nouns, such as "boat" and "sand." The artist can optionally design an intended image by specifying
other constraints. Our algorithm formulates the constraints as queries to search an automatically annotated
image database. The desired photograph, not a collage, is then synthesized using graph-cut optimization,
optionally allowing for further user interaction to edit or choose among alternative generated photos. Our
results demonstrate our contributions of (1) a method of creating specific images with minimal human effort,
and (2) a combined algorithm for automatically building an image library with semantic annotations from any
photo collection.
Art conservators often explore X-ray images of paintings to help find pentimenti, the artist's revisions hidden
beneath the painting's visible first surfaces. X-ray interpretation is difficult due to artifacts in the image,
superimposed features from all paint layers, and because image intensity depends on both the paint layer thickness
and each pigment's opacity. We present a robust user-guided method to suppress clutter, find visually significant
differences between X-ray images and color photographs, and visualize them together. These tools allow domain
experts as well as museum visitors to explore the artist's creative decisions that led to a masterpiece.
High Dynamic Range displays offer higher brightness, higher contrast, better color reproduction and lower power
consumption compared to conventional displays available today. In addition to these benefits, it is possible to leverage
the unique design of HDR displays to overcome many of the calibration and lifetime degradation problems of liquid
crystal displays, especially those using light emitting diodes. This paper describes a combination of sensor mechanisms
and algorithms that reduce luminance and color variation for both HDR and conventional displays even with the use of
highly variable light elements.
We address the problem of re-rendering images to high dynamic range (HDR) displays, which were originally
tone-mapped to standard displays. As these new HDR displays have a much larger dynamic range than standard
displays, an image rendered to standard monitors is likely to look too bright when displayed on a HDR monitor.
Moreover, because of the operations performed during capture and rendering to standard displays, the specular
highlights are likely to have been clipped or compressed, which causes a loss of realism. We propose a tone
scale function, focused on the representation of specular highlights, to re-render images that were first
tone-mapped to standard displays. The shape of the tone scale function depends on the segmentation of the input image
into its diffuse and specular components. In this article, we describe a method to perform this segmentation
automatically. Our method detects specular highlights by using two low-pass filters of different sizes combined
with morphological operators. The results show that our method successfully detects small and medium-sized
specular highlights. The locations of specular highlights define a mask used for the construction of the tone scale
function. We then propose two ways of applying the tone scale, the global version that applies the same curve
to each pixel in the image and the local version that uses spatial information given by the mask to apply the
tone scale differently to diffuse and to specular pixels.
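One way to realize the detection step is sketched below, in the spirit of the method described above: two low-pass filters of different sizes are compared and morphological operators clean up the candidate mask. The filter sizes, threshold factor and structuring elements are illustrative assumptions, not the paper's values.

import numpy as np
from scipy.ndimage import uniform_filter, binary_opening, binary_dilation

def specular_highlight_mask(luma, small=5, large=31, k=1.15):
    """Detect small bright regions as candidate specular highlights.

    Pixels whose small-scale local average clearly exceeds the large-scale
    local average are kept; opening removes isolated pixels and dilation
    slightly grows the detected regions."""
    luma = luma.astype(np.float64)
    fine = uniform_filter(luma, small)
    coarse = uniform_filter(luma, large)
    mask = fine > k * coarse
    mask = binary_opening(mask, structure=np.ones((3, 3)))
    mask = binary_dilation(mask, structure=np.ones((5, 5)))
    return mask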
The advances in high dynamic range (HDR) imaging, especially in the display and camera technology, have a significant
impact on the existing imaging systems. The assumptions of the traditional low-dynamic range imaging, designed for
paper print as a major output medium, are ill suited for the range of visual material that is shown on modern displays. For
example, the common assumption that the brightest color in an image is white can hardly be justified for high-contrast
LCD displays, not to mention next generation HDR displays, that can easily create bright highlights and the impression
of self-luminous colors. We argue that high dynamic range representation can encode images regardless of the technology
used to create and display them, with the accuracy that is only constrained by the limitations of the human eye and
not a particular output medium. To facilitate the research on high dynamic range imaging, we have created a software
package (http://pfstools.sourceforge.net/) capable of handling HDR data on all stages of image and video processing. The
software package is available as open source under the General Public License and includes solutions for high quality
image acquisition from multiple exposures, a range of tone mapping algorithms and a visual difference predictor for HDR
images. Examples of shell scripts demonstrate how the software can be used for processing single images as well as video
sequences.
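As a concrete illustration of such a processing chain, the snippet below drives the pfstools command-line programs from Python. The command names (pfsin, pfstmo_drago03, pfsgamma, pfsout) are those distributed with pfstools/pfstmo; the file names and options are assumptions, and exact flags may vary between versions.

```python
import glob
import subprocess

# Tone-map a sequence of HDR frames with the pfstools command-line tools,
# chained exactly as in a shell pipeline, e.g.:
#   pfsin frame_0001.hdr | pfstmo_drago03 | pfsgamma -g 2.2 | pfsout frame_0001.ppm
# Command names are those shipped with pfstools/pfstmo; options and defaults
# may differ between versions.
for hdr in sorted(glob.glob("frames/*.hdr")):
    out = hdr.replace(".hdr", ".ppm")
    cmd = f"pfsin {hdr} | pfstmo_drago03 | pfsgamma -g 2.2 | pfsout {out}"
    subprocess.run(cmd, shell=True, check=True)
```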
High Dynamic Range (HDR) images are superior to conventional images. However, veiling glare is a physical limit to
HDR image acquisition and display. We performed camera calibration experiments using a single test target with 40
luminance patches covering a luminance range of 18,619:1. Veiling glare is a scene-dependent physical limit of the
camera and the lens. Multiple exposures cannot accurately reconstruct scene luminances beyond the veiling glare limit.
Human observer experiments, using the same targets, showed that image-dependent intraocular scatter changes identical
display luminances into different retinal luminances. Vision's contrast mechanism further distorts any correlation of
scene luminance and appearance.
There must be reasons, other than accurate luminance, that explain the improvement seen in HDR images. The multiple
exposure technique significantly improves digital quantization. The improved quantization allows displays to present
better spatial information to humans. When human vision looks at high-dynamic range displays, it processes them using
spatial comparisons.
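For readers unfamiliar with the multiple-exposure technique mentioned above, the sketch below shows a standard weighted-average fusion of bracketed 8-bit exposures; it is not the authors' procedure. The point it illustrates is the quantization benefit: the fused estimate has far finer tonal steps than any single exposure, even though veiling glare still bounds the accuracy of the recovered luminances.

```python
import numpy as np

def fuse_exposures(images, exposure_times):
    """Standard multiple-exposure fusion (not the authors' procedure): each
    8-bit frame is divided by its exposure time and averaged with a hat
    weighting that trusts mid-range pixels most."""
    acc = np.zeros(images[0].shape, dtype=np.float64)
    wsum = np.zeros_like(acc)
    for img, t in zip(images, exposure_times):
        z = img.astype(np.float64) / 255.0
        w = 1.0 - np.abs(2.0 * z - 1.0)   # hat weight: 0 at clipped ends, 1 at mid-grey
        acc += w * z / t                  # relative scene luminance estimate
        wsum += w
    return acc / np.maximum(wsum, 1e-6)
```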
Sequential methods for face recognition rely on the analysis of local facial features in a sequential manner,
typically with a raster scan. However, the distribution of discriminative information is not uniform over the facial
surface. For instance, the eyes and the mouth are more informative than the cheek. We propose an extension
to the sequential approach, where we take into account local feature saliency, and replace the raster scan with
a guided scan that mimics the scanpath of the human eye. The selective attention mechanism that guides the
human eye operates by coarsely detecting salient locations, and directing more resources (the fovea) at interesting
or informative parts. We simulate this idea by employing a computationally cheap saliency scheme, based on
Gabor wavelet filters. Hidden Markov models are used for classification, and the observations, i.e. features
obtained with the simulation of the scanpath, are modeled with Gaussian distributions at each state of the
model. We show that by visiting important locations first, our method is able to reach high accuracy with much
shorter feature sequences. We compare several features in observation sequences, among which DCT coefficients
result in the highest accuracy.
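A rough sketch of the pipeline described above is given below: a cheap Gabor-based saliency map orders image patches so the most salient locations are visited first, each patch is described by a few low-order DCT coefficients, and the resulting sequences could then be modeled with a Gaussian-observation HMM. It assumes OpenCV, SciPy, and the hmmlearn package; patch sizes, filter parameters, and the number of coefficients are illustrative, not the paper's settings.

```python
import numpy as np
import cv2
from scipy.fft import dctn
# from hmmlearn.hmm import GaussianHMM   # assumed available: pip install hmmlearn

def gabor_saliency(gray):
    """Cheap saliency: sum of Gabor responses over a few orientations."""
    sal = np.zeros_like(gray, dtype=np.float32)
    for theta in np.linspace(0, np.pi, 4, endpoint=False):
        k = cv2.getGaborKernel((21, 21), 4.0, theta, 10.0, 0.5)
        sal += np.abs(cv2.filter2D(gray.astype(np.float32), -1, k))
    return sal

def guided_scan_features(gray, patch=16, n_patches=20, n_dct=10):
    """Visit the most salient patches first (a crude stand-in for a scanpath)
    and describe each with a few DCT coefficients."""
    sal = gabor_saliency(gray)
    h, w = gray.shape
    scores = []
    for y in range(0, h - patch, patch):
        for x in range(0, w - patch, patch):
            scores.append((sal[y:y + patch, x:x + patch].sum(), y, x))
    scores.sort(reverse=True)                  # most salient locations first
    feats = []
    for _, y, x in scores[:n_patches]:
        block = dctn(gray[y:y + patch, x:x + patch].astype(np.float32), norm="ortho")
        feats.append(block.flatten()[:n_dct])  # first coefficients: rough low-frequency selection
    return np.array(feats)

# One HMM per gallery subject; each training face yields one feature sequence:
# model = GaussianHMM(n_components=5, covariance_type="diag")
# model.fit(np.vstack(sequences), lengths=[len(s) for s in sequences])
```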
As human eyes scan an image, each fixation captures high resolution visual information from a small region of that
image. The resulting intermittent visual stream is sent along two visual pathways to the visual centers of the brain
concurrently with eye movement information. The ventral stream (the what pathway) is associated with object
recognition, while the dorsal stream (the where pathway) is associated with spatial perception. This research employs
three experiments to compare the relative importance of eye movement information within these two visual pathways.
During Experiment 1 participants visually examine (a) outdoor scenery images, and (b) object images, while their
fixation sequences are captured. These fixation sequences are then used to generate sequences of foveated images, in
the form of videos. In Experiments 2 and 3 these videos are viewed by another set of participants. In doing so,
participants in Experiments 2 and 3 experience the same sequence of foveal stimuli as those in Experiment 1, but might or
might not experience the corresponding eye movement signals. The subsequent ability of the Experiment 2 and 3
participants to (a) recognize objects, and (b) locate landmarks in outdoor scenes provides information about the
importance of eye movement information in dorsal and ventral processing.
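The foveated-video stimuli can be approximated with a simple eccentricity-dependent blur, as in the sketch below. This is only an illustration of the general idea, not the stimulus-generation code used in Experiment 1; the fovea radius and blur levels are assumed values.

```python
import numpy as np
from scipy import ndimage

def foveate(gray, fix_y, fix_x, fovea_radius=60, max_sigma=8.0, levels=6):
    """Render one foveated frame for a given fixation: blur increases with
    distance from the fixation point, roughly mimicking the fall-off of
    retinal resolution. Parameters are illustrative, not the experiment's."""
    gray = gray.astype(np.float32)
    h, w = gray.shape
    yy, xx = np.mgrid[0:h, 0:w]
    dist = np.hypot(yy - fix_y, xx - fix_x)
    sigmas = np.linspace(0.0, max_sigma, levels)
    stack = [gray] + [ndimage.gaussian_filter(gray, s) for s in sigmas[1:]]
    # 0 inside the fovea, then one blur level per fovea_radius of eccentricity.
    idx = np.minimum(((dist - fovea_radius).clip(0) / fovea_radius).astype(int),
                     levels - 1)
    return np.choose(idx, stack)
```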
Earlier studies have shown that, while free-viewing images, people tend to gaze at regions with a high local density of bottom-up features such as contrast and edge density. In particular, this tendency seems to be more pronounced during the first few fixations after image onset. In this paper, we present a new method to investigate how gaze locations are chosen by introducing varying image resolution, and we measure how it affects eye-movement behavior during free viewing. Results show that gaze density overall is shifted toward regions presented in high resolution and away from those degraded in resolution. However, certain image regions seem to attract early fixations regardless of display resolution. These results suggest that top-down control of gaze guidance may be the dominant factor early in visual processing.
In this paper, a new implementation of the human attention map is introduced. Most conventional approaches
share two characteristics: the pooling rule is fixed, and prior knowledge of the camera's aim is discarded. Unlike
previous research, the proposed method allows more freedom at the feature integration stage, since human eyes
have different sensitivities to each feature under different camera-aiming scenarios. An intelligent mechanism
is designed to identify the importance of each feature, including a skin-tone feature, for each type of camera motion,
so that feature integration adapts to the content. With this framework, more important features are
emphasized and less important features are suppressed.
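A minimal sketch of such content-adaptive pooling is shown below. The weight tables and feature names are hypothetical placeholders, not the trained mechanism described in the paper; the sketch only illustrates how per-feature weights tied to the detected camera motion (plus a skin-tone channel) can be folded into a single attention map.

```python
import numpy as np

# Hypothetical weight tables: one set of per-feature weights per camera-motion type.
WEIGHTS = {
    "static":  {"intensity": 0.30, "color": 0.30, "motion": 0.10, "skin": 0.30},
    "panning": {"intensity": 0.20, "color": 0.20, "motion": 0.40, "skin": 0.20},
    "zooming": {"intensity": 0.25, "color": 0.25, "motion": 0.25, "skin": 0.25},
}

def attention_map(feature_maps, camera_motion):
    """feature_maps: dict of equally sized 2-D arrays, one per feature.
    Returns a weighted combination of normalized feature maps."""
    w = WEIGHTS[camera_motion]
    out = np.zeros_like(next(iter(feature_maps.values())), dtype=np.float64)
    for name, fmap in feature_maps.items():
        f = (fmap - fmap.min()) / (np.ptp(fmap) + 1e-9)   # normalize to [0, 1]
        out += w.get(name, 0.0) * f
    return out / max(sum(w.values()), 1e-9)
```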
In the last few decades several techniques for image content extraction, often based on segmentation, have been
proposed. It has been suggested that under the assumption of very general image content, segmentation becomes
unstable and classification becomes unreliable. According to recent psychological theories, certain image regions
attract the attention of human observers more than others and, generally, the image's main meaning appears
concentrated in those regions. Initially, regions attracting our attention are perceived as a whole and hypotheses
about their content are formulated; subsequently, the components of those regions are carefully analyzed and a more
precise interpretation is reached. It is interesting to observe that an image decomposition process performed
according to these psychological visual attention theories might present advantages with respect to a traditional
segmentation approach. In this paper we propose an automatic procedure generating image decomposition based
on the detection of visual attention regions. A new clustering algorithm taking advantage of the Delaunay-
Voronoi diagrams for achieving the decomposition target is proposed. By applying that algorithm recursively,
starting from the whole image, a transformation of the image into a tree of related meaningful regions is obtained
(Attention Tree). Subsequently, a semantic interpretation of the leaf nodes is carried out using a structure of
Neural Networks (Neural Tree) assisted by a knowledge base (Ontology Net). Starting from the leaf nodes, paths
toward the root node across the Attention Tree are attempted. The task of a path consists of relating the
semantics of each child-parent node pair and, consequently, of merging the corresponding image regions. The
relationship detected in this way between two tree nodes generates, as a result, the extension of the interpreted
image area through each step of the path. The construction of several Attention Trees has been performed and
partial results will be shown.
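The clustering step can be illustrated with a simple Delaunay-based procedure, sketched below under the assumption that the salient points have already been extracted. It builds the Delaunay triangulation (scipy.spatial), discards edges longer than a chosen threshold, and labels the connected components; the actual Delaunay-Voronoi algorithm of the paper is more elaborate.

```python
import numpy as np
from scipy.spatial import Delaunay
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components

def delaunay_clusters(points, max_edge):
    """Cluster 2-D attention points by building their Delaunay triangulation,
    discarding edges longer than max_edge, and taking connected components.
    A rough stand-in for the paper's Delaunay-Voronoi clustering step."""
    tri = Delaunay(points)
    edges = set()
    for simplex in tri.simplices:                 # collect unique triangle edges
        for a, b in ((0, 1), (1, 2), (0, 2)):
            i, j = sorted((simplex[a], simplex[b]))
            edges.add((i, j))
    rows, cols = [], []
    for i, j in edges:
        if np.linalg.norm(points[i] - points[j]) <= max_edge:
            rows += [i, j]
            cols += [j, i]
    n = len(points)
    graph = coo_matrix((np.ones(len(rows)), (rows, cols)), shape=(n, n))
    _, labels = connected_components(graph, directed=False)
    return labels                                  # cluster id per point
```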
Visual attention models mimic the ability of a visual system to detect potentially relevant parts of a scene. This
process of attentional selection is a prerequisite for higher-level tasks such as object recognition. Given the high
relevance of temporal aspects in human visual attention, dynamic information as well as static information must
be considered in computer models of visual attention. While some models have been proposed that extend the
classical static model to motion, a comparison of the performance of models integrating motion in different
manners is still not available. In this article, we present a comparative study of various visual attention models
combining both static and dynamic features. The considered models are compared by measuring their respective
performance with respect to the eye movement patterns of human subjects. Simple synthetic video sequences,
containing static and moving objects, are used to assess the model suitability. Qualitative and quantitative
results provide a ranking of the different models.
Typical studies of the visual motion of specularities have been concerned with how to discriminate the motion of
specularities from the motion of surface markings, and how to estimate the underlying surface shape. Here we
take a different approach and ask whether a field of specularities gives rise to motion parallax that is similar to
that of the underlying surface. The idea is that the caustics that are defined by specularities exist both in front of
and behind the underlying surface and hence define a range of depths relative to the observer. We asked whether
this range of depths leads to motion parallax. Our experiments are based on image sequences generated using
computer graphics and Phong shading. Using low relief undulating surfaces and assuming a laterally moving
observer, we compare the specular and diffuse components of the resulting image sequences. In particular, we
compare the image power spectra. We find that as long as the undulations are sufficiently large, the range of
speeds that are indicated in the power spectra of the diffuse and specular components will be similar to each
other. This suggests that specularities could provide reliable motion parallax information to a moving observer.
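The power-spectrum comparison can be sketched as follows, assuming the diffuse and specular components are available as separate (frames, height, width) arrays. The correlation score at the end is only one crude way to compare the two spectra and is not the analysis used in the paper.

```python
import numpy as np

def power_spectrum(seq):
    """seq: (frames, height, width) array for one image component (diffuse or
    specular). Returns the 3-D spatiotemporal power spectrum; how energy is
    distributed over temporal vs. spatial frequency reflects the range of
    image speeds present in the component."""
    seq = seq - seq.mean()
    F = np.fft.fftshift(np.fft.fftn(seq))
    return np.abs(F) ** 2

def spectrum_correlation(diffuse_seq, specular_seq):
    """A crude similarity score between the two components' spectra:
    correlation of log power (not the paper's analysis)."""
    a = np.log1p(power_spectrum(diffuse_seq)).ravel()
    b = np.log1p(power_spectrum(specular_seq)).ravel()
    return np.corrcoef(a, b)[0, 1]
```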
Near-regular textures feature a relatively high degree of regularity. They can be conveniently modeled by the
combination of a suitable set of textons and a placement rule. The main issues in this respect are the selection of
the minimum set of textons capturing the variability of the basic patterns; the identification and positioning of the
generating lattice; and the modeling of the variability in both the texton structure and the deviation from
periodicity of the lattice, which captures the naturalness of the considered texture. In this contribution, we provide
a fully automatic solution to both the analysis and the synthesis issues, leading to the generation of texture
samples that are perceptually indistinguishable from the original ones. The definition of an ad-hoc periodicity
index allows prediction of the suitability of the model for a given texture. The model is validated through psychovisual
experiments providing the conditions for subjective equivalence between the original and synthetic textures,
while allowing determination of the minimum number of textons needed to meet such a requirement for a given
texture class. This is of prime importance in model-based coding applications, such as the one we foresee, as it
makes it possible to minimize the amount of information to be transmitted to the receiver.
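As an illustration of what a periodicity index can look like, the sketch below scores a texture by the strength of its strongest off-origin autocorrelation peak. This is a generic measure, not the ad-hoc index defined in the paper; near-regular textures should score close to 1 and irregular ones much lower.

```python
import numpy as np

def periodicity_index(texture):
    """A simple autocorrelation-based periodicity score (not the paper's
    ad-hoc index): the largest autocorrelation peak away from the origin,
    normalized by the zero-lag value."""
    t = texture - texture.mean()
    F = np.fft.fft2(t)
    acorr = np.real(np.fft.ifft2(F * np.conj(F)))     # circular autocorrelation
    acorr = np.fft.fftshift(acorr)
    h, w = acorr.shape
    cy, cx = h // 2, w // 2
    peak0 = acorr[cy, cx]
    acorr[cy - 2:cy + 3, cx - 2:cx + 3] = -np.inf     # mask out the trivial zero-lag peak
    return float(acorr.max() / (peak0 + 1e-12))
```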
A machine vision system that must remain vigilant within its environment needs to perceive quickly both
clearly identifiable objects and those that are deceptive or camouflaged (attempting to blend into the background).
Humans accomplish this task early in the visual pathways, using five spatially defined forms of processing. These
forms are Luminance-defined, Color-defined, Texture-defined, Motion-defined, and Disparity-defined. This paper
discusses a visual sensor approach that combines a biological system's strategy to break down camouflage with simple
image processing algorithms that may be implemented for real time video. Thermal imaging is added to increase
sensing capability. Preliminary filters using MATLAB and operating on digital still images show somewhat
encouraging results. Current efforts include implementing the sensor for real-time video processing.
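The sketch below computes four simple per-pixel conspicuity channels loosely mirroring the luminance-, color-, texture-, and motion-defined forms of processing (disparity is omitted, and the thermal channel would be a separate input). Filter sizes and the opponency formula are illustrative stand-ins, not the paper's MATLAB filters.

```python
import numpy as np
from scipy import ndimage

def conspicuity_channels(frame_rgb, prev_gray=None):
    """Per-pixel channels loosely mirroring four of the five forms of
    processing discussed above; scales and formulas are illustrative.
    prev_gray is the previous frame's grey image, normalized the same way."""
    rgb = frame_rgb.astype(np.float32) / 255.0
    gray = rgb.mean(axis=2)
    luminance = np.abs(gray - ndimage.uniform_filter(gray, 31))      # local luminance contrast
    color = (np.abs(rgb[..., 0] - rgb[..., 1])
             + np.abs(rgb[..., 2] - (rgb[..., 0] + rgb[..., 1]) / 2))  # crude color opponency
    texture = ndimage.generic_filter(gray, np.std, size=9)           # local texture energy
    motion = np.abs(gray - prev_gray) if prev_gray is not None else np.zeros_like(gray)
    return {"luminance": luminance, "color": color, "texture": texture, "motion": motion}
```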
We use an information-theoretic distortion measure called the Normalized Compression Distance (NCD), first
proposed by M. Li et al., to determine whether two rectangular gray-scale images are visually distinguishable to
a human observer. Image distinguishability is a fundamental constraint on operations carried out by all players
in an image watermarking system.
The NCD between two binary strings is defined in terms of compressed sizes of the two strings and of their
concatenation; it is designed to be an effective approximation of the noncomputable but universal Kolmogorov
distance between two strings. We compare the effectiveness of different types of compression algorithms in
predicting image distinguishability when they are used to compute the NCD between a sample of images and
their watermarked counterparts. Our experiment shows that, as predicted by Li's theory, the NCD is largely
independent of the underlying compression algorithm.
However, in some cases the NCD fails as a predictor of image distinguishability, since it is designed to measure
the more general notion of similarity. We propose and study a modified version of the NCD to model distinguishability,
which requires not only that the change be small but also that it be, in some sense, random with respect to the original
image.
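The NCD itself is straightforward to compute once a compressor is chosen. The sketch below uses zlib as the compressor C(); any real compressor can be substituted, which is exactly the comparison the experiment performs.

```python
import zlib

def ncd(x: bytes, y: bytes, level: int = 9) -> float:
    """Normalized Compression Distance of Li et al.:
        NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    with C() approximated here by the zlib-compressed size."""
    cx = len(zlib.compress(x, level))
    cy = len(zlib.compress(y, level))
    cxy = len(zlib.compress(x + y, level))
    return (cxy - min(cx, cy)) / max(cx, cy)

# Example: compare a raw image buffer with its watermarked counterpart.
# distance = ncd(original_bytes, watermarked_bytes)
```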
To reveal the cortical network underlying figure/ground perception and to understand its neural dynamics, we
developed a novel paradigm that creates distinct and prolonged percepts of spatial structures by instantaneous refreshes
in random dot fields. Three different forms of spatial configuration were generated by: (i) updating the whole stimulus
field, (ii) updating the ground region only (negative-figure), and (iii) updating the figure and ground regions in brief
temporal asynchrony. fMRI responses were measured throughout the brain. As expected, activation by the
homogeneous whole-field update was focused on the posterior part of the brain, but distinct networks extending
beyond the occipital lobe into the parietal and frontal cortex were activated by the figure/ground and by the negative-figure
configurations. The instantaneous stimulus paradigm generated a wide variety of BOLD waveforms and
corresponding neural response estimates throughout the network. Such expressly different responses evoked by
differential stimulation of the identical cortical regions ensures that the differences can be securely attributed to the
neural dynamics, not to spatial variations in the HRF. The activation pattern for figure/ground implies a widely
distributed neural architecture, distinct from the control conditions. Even where activations are partially overlapping, an
integrated analysis of the BOLD response properties will enable the functional specificity of the cortical areas to be
distinguished.
Image resolution is one of the important factors for visual realness. We performed subjective assessments to examine
the realness of images at six different resolutions, ranging from 19.5 cpd (cycles per degree) to 156 cpd. A paired-comparison
procedure was used to quantify the realness of six images versus each other or versus the real object. Three
objects were used. Both real objects and images were viewed through a synopter, which removed horizontal disparity
and presented the same image to both eyes. For each pair of stimuli selected from the group of six images and the
real object, sixty-five observers were asked to choose the one that looked closer to the real object and appeared to
be naturally present. The observers were not told that real objects were included among the stimuli. The paired
comparison data were analyzed using the Bradley-Terry model. The results indicated that the realness of an image increased
as the image resolution increased up to about 40-50 cpd, which corresponded to the discrimination threshold calculated
based on the observers' visual acuity, and reached a plateau above this threshold.
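For reference, Bradley-Terry scores can be fitted from paired-comparison win counts with the standard minorization-maximization update, as sketched below. This is a generic fit (Hunter's MM algorithm), not necessarily the exact estimation procedure used in the study.

```python
import numpy as np

def bradley_terry(wins, n_iter=200):
    """Fit Bradley-Terry scores from a square win-count matrix, where
    wins[i, j] is how often stimulus i was preferred over stimulus j.
    Uses the standard minorization-maximization update (Hunter, 2004)."""
    n = wins.shape[0]
    comparisons = wins + wins.T                   # total pairings of i and j
    p = np.ones(n)
    for _ in range(n_iter):
        new_p = np.empty(n)
        for i in range(n):
            denom = sum(comparisons[i, j] / (p[i] + p[j]) for j in range(n) if j != i)
            new_p[i] = wins[i].sum() / max(denom, 1e-12)
        p = new_p / new_p.sum()                   # normalize away the scale ambiguity
    return p                                      # larger score = judged more "real"
```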
We investigate the hypothesis that the basic representation of space which underlies human navigation does not
resemble an image-like map and is not restricted by the laws of Euclidean geometry. For this we developed a
new experimental technique in which we use the properties of a virtual environment (VE) to directly influence
the development of the representation. We compared the navigation performance of human observers under two
conditions. Either the VE is consistent with the geometrical properties of physical space and could hence be
represented in a map-like fashion, or it contains severe violations of Euclidean metric and planar topology, and
would thus pose difficulties for the correct development of such a representation. Performance is not influenced
by this difference, suggesting that a map-like representation is not the major basis of human navigation. Rather,
the results are consistent with a representation which is similar to a non-planar graph augmented with path
length information, or with a sensorimotor representation which combines sensory properties and motor actions.
The latter may be seen as part of a revised view of perceptual processes due to recent results in psychology and
neurobiology, which indicate that the traditional strict separation of sensory and motor systems is no longer
tenable.
By adding an additional dimension to the traditional two dimensional art we make, we are able to expand our visual
experience, what we see, and thus what we might become. This visual expansion changes or adds to the patterns that
produce our thoughts and behavior. As 2D artists see and create in a more three dimensional space, their work may
generate within the viewer a deeper understanding of the thought processes in themselves and others.
This can be achieved by creating images in three dimensions. The work aligns more closely with natural physiology,
that is, it is seen with both eyes. Traditionally, color and rules of perspective trick the viewer into thinking in three
dimensions. By adding the stereoscopic element, an object is experienced in a naturally 3D space with the use of two
eyes. Further visual expansion is achieved with the use of ChromaDepth glasses to actually see the work in 3D as it is
being created. This cannot be done with other 3D methods that require two images or special programming to work.
Hence, the spontaneous creation of an image within a 3D space becomes a new reality for the artist. By working in a
truly three dimensional space that depends on two eyes to experience, an artist gains a new perspective on color,
transparency, overlapping, focus, etc. that allows him/her new ways of working and thus seeing: a new form of
expression.
We have developed a CMOS image sensor with a novel color filter array (CFA) in which one of the green pixels of the
Bayer pattern was replaced with a white pixel. A transparent layer was fabricated on the white pixel instead of a
color filter to realize over 95% transmission for visible light at wavelengths of 400-700 nm. The pixel pitch of the device
was 3.3 um and the number of pixels was 2 million (1600H x 1200V).
The novel Bayer-like WRGB (White-Red-Green-Blue) CFA achieved higher signal-to-noise ratios for the interpolated R, G,
and B values under low illumination (3 lux), by 6 dB, 1 dB, and 6 dB, respectively, compared with those of the Bayer pattern,
using low-noise pre-digital signal processing. Furthermore, there was no degradation of either resolution or color
representation in the interpolated image.
This new CFA has a great potential to significantly increase the sensitivity of CMOS/CCD image sensors with digital
signal processing technology.
Deaf and hearing-impaired people capture information in video through visual content and captions. These activities
require different visual attention strategies and, up to now, little is known about how caption readers balance these two
visual attention demands. Understanding these strategies could suggest more efficient ways of producing captions. Eye
tracking and attention-overload detection are used to study these strategies. Eye tracking is monitored using a pupil-center
corneal-reflection apparatus. Afterward, gaze fixations are analyzed for each region of interest, such as the caption area,
high-motion areas, and face locations. These data are also used to identify the scanpaths. The collected data are used to
establish specifications for a caption adaptation approach based on the location of visual action and the presence of character
faces. This approach is implemented in computer-assisted captioning software which uses a face detector and a motion
detection algorithm based on Lucas-Kanade optical flow. The different scanpaths obtained among the
subjects provide us with alternatives for resolving conflicting caption positioning. This implementation is now undergoing a user
evaluation with hearing-impaired participants to validate the efficiency of our approach.
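The two analysis components named above, face detection and Lucas-Kanade optical flow, are available in OpenCV, and a rough sketch of how they might be combined is shown below. Cascade choice, feature-tracking parameters, and the motion threshold are illustrative assumptions, not the settings of the captioning software.

```python
import cv2
import numpy as np

# Haar-cascade face detector shipped with OpenCV; a different detector could be used.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def analyze_pair(prev_bgr, curr_bgr):
    """Return detected faces and points with significant motion between two
    consecutive frames; both are candidate regions to keep clear of captions."""
    prev_gray = cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr_bgr, cv2.COLOR_BGR2GRAY)

    faces = face_cascade.detectMultiScale(curr_gray, scaleFactor=1.1, minNeighbors=5)

    pts = cv2.goodFeaturesToTrack(prev_gray, 200, 0.01, 7)
    moved = []
    if pts is not None:
        nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, pts, None)
        for p0, p1, ok in zip(pts.reshape(-1, 2), nxt.reshape(-1, 2), status.ravel()):
            if ok and np.linalg.norm(p1 - p0) > 2.0:   # displacement threshold (illustrative)
                moved.append(tuple(p1))
    return faces, moved
```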
The complexity of a polygonal mesh model is usually reduced by applying a simplification method, resulting in
a similar mesh having fewer vertices and faces. Although several such methods have been developed, only a few
observer studies comparing them with regard to the perceived quality of the simplified meshes have been reported,
and it is not yet clear how the choice of a given method, and the level of simplification achieved, influence the
quality of the resulting model, as perceived by the final users. Mesh quality indices are the obvious, less costly
alternative to user studies, but it is also not clear how they relate to perceived quality, and which indices best
describe the users' behavior.
Following on earlier work carried out by the authors, but only for mesh models of the lungs, a comparison
among the results of three simplification methods was performed through (1) quality indices and (2) a controlled
experiment involving 65 observers, for a set of five reference mesh models of different kinds. These were simplified
using two methods provided by the OpenMesh library - one using error quadrics, the other additionally using
a normal flipping criterion - and also by the widely used QSlim method, for two simplification levels: 50% and
20% of the original number of faces. The main goal was to ascertain whether the findings previously obtained
for lung models, through quality indices and a study with 32 observers, could be generalized to other types of
models and confirmed for a larger number of observers. Data obtained using the quality indices and the results
of the controlled experiment were compared and do confirm that some quality indices (e.g., geometric distance
and normal deviation, as well as a new proposed weighted index) can be used, in specific circumstances, as
reasonable estimators of the user perceived quality of mesh models.
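Two of the indices mentioned, geometric distance and normal deviation, can be approximated on a nearest-vertex basis as sketched below. This is a simplified stand-in (true surface-to-surface distances and the proposed weighted index are not computed here); it assumes vertex positions and per-vertex normals are available for both meshes.

```python
import numpy as np
from scipy.spatial import cKDTree

def quality_indices(ref_vertices, ref_normals, simp_vertices, simp_normals):
    """Nearest-vertex approximations of two quality indices: mean geometric
    distance and mean normal deviation (degrees) from the simplified mesh to
    the reference mesh."""
    tree = cKDTree(ref_vertices)
    dists, idx = tree.query(simp_vertices)         # closest reference vertex per simplified vertex
    geo = float(dists.mean())

    a = simp_normals / np.linalg.norm(simp_normals, axis=1, keepdims=True)
    b = ref_normals[idx] / np.linalg.norm(ref_normals[idx], axis=1, keepdims=True)
    cosang = np.clip((a * b).sum(axis=1), -1.0, 1.0)
    normal_dev = float(np.degrees(np.arccos(cosang)).mean())
    return geo, normal_dev
```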
In this paper, a supra-threshold spatio-velocity CSF experiment is described. It consists of a contrast-matching
task using a method-of-limits procedure. The results enable the determination of contrast perception functions which
give, for given spatial and temporal frequencies, the perceived contrast of a moving stimulus.
These contrast perception functions are then used to construct supra-threshold spatio-velocity CSFs. As for the
supra-threshold CSF in the spatial domain, the CSF shape changes from band-pass behaviour
at threshold to low-pass behaviour at supra-threshold levels along the spatial-frequency axis. However, supra-threshold CSFs
retain a band-pass behaviour along the temporal-frequency axis, as the threshold CSF does. This means that while spatial variations
can be neglected above the visibility threshold, temporal variations remain of primary importance.
Extracting key frames (KF) from video is of great interest in many applications, such as video summary, video
organization, video compression, and prints from video. KF extraction is not a new problem. However, current literature
has focused mainly on sports or news video. In the consumer video space, the biggest challenges for key frame
selection are the unconstrained content and the lack of any pre-imposed structure. In this study, we
conduct ground-truth collection of key frames from video clips taken by digital cameras (as opposed to camcorders)
using both first- and third-party judges. The goals of this study are: (1) to create a reference database of video clips
reasonably representative of the consumer video space; (2) to identify associated key frames against which automated
algorithms can be compared and judged for effectiveness; and (3) to uncover the criteria used by both first- and third-party
human judges so these criteria can influence algorithm design. The findings from these ground truths will be
discussed.
Modern algorithms that process images to be viewed by humans analyze the images strictly as signals, where
processing is typically limited to the pixel and frequency domains. The continuum of visual processing by the
human visual system (HVS) from signal analysis to cognition indicates that the signal-processing based model of
the HVS could be extended to include some higher-level, structural processing. An experiment was conducted to
study the relative importance of higher-level, structural representations and lower-level, signal-based representations
of natural images in a cognitive task. Structural representations preserve the overall image organization
necessary to recognize the image content and discard the finer details of objects, such as textures. Signal-based
representations (i.e. digital photographs) decompose an image in terms of its frequency, orientation, and contrast.
Participants viewed sequences of images from either structural or signal-based representations, where subsequent
images in the sequence reveal additional detail or visual information from the source image. When the content
was recognizable, participants were instructed to provide a description of that image in the sequence. The
descriptions were subjectively evaluated to identify a participant's recognition threshold for a particular image
representation. The results from this experiment suggest that signal-based representations possess meaning to
human observers when the proportion of high frequency content, which conveys shape information, exceeds a
seemingly fixed proportion. Additional comparisons among the representations chosen for this experiment provide
insight toward quantifying their significance in cognition and developing a rudimentary measure of visual
entropy.
Scalable Video Coding (SVC) is one of the promising techniques to ensure Quality of Service (QoS) in multimedia
communication through heterogeneous networks. SVC compresses a raw video into multiple bitstreams composed of a
base bitstream and enhancement bitstreams to support multiple scalabilities (SNR, temporal, and spatial). An
appropriate bitstream can therefore be extracted from the original coded bitstream without re-encoding, adapting the video to the user's
environment. In this flexible environment, QoS has emerged as an important issue for service acceptability. Therefore,
there is a need to measure the degree of video quality in order to guarantee the quality of a video streaming service.
Existing studies on video quality metrics have mainly focused on temporal and SNR scalability.
In this paper, we propose an efficient quality metric that accounts for spatial scalability as well as temporal and SNR
scalability. To this end, we study the effect of frame rate, SNR, spatial scalability, and motion characteristics using
subjective quality assessment, and then propose a new video quality metric supporting full scalability. Experimental
results show that this quality metric correlates highly with subjective quality. Because the proposed metric is able to
measure the degree of video quality as scalability varies, it will play an important role at the extraction
point in determining the quality of SVC streams.
Much research has been focused on the study of bottom-up, feature-based visual perception, as a means to generate
salience maps, and predict the distribution of fixations within images. However, it is plausible that the eventual
perception of distinct objects within a 3D scene (and the subsequent top-down effects) would also have a significant
effect on the distribution of fixations within that scene. This research is aimed at testing the hypothesis that there is a
switch from feature-based to object-based scanning of images as the viewer gains a higher-level understanding of
the image content, and that this switch can be detected by changes in the pattern of eye fixations within the image.
An eye tracker is used to monitor the fixations of human participants over time, as they view images, in an effort to
answer questions pertaining to (1) the nature of fixations during bottom-up and top-down scene-scanning scenarios, (2) the
ability to assess whether the subject is perceiving the scene content based on low-level visual features or distinct
objects, and (3) the identification of the participant's transition from a bottom-up, feature-based perception to a top-down,
object-based perception.
Color has been shown to be an important cue for object recognition and image indexing. We present a new
algorithm for color-based recognition of objects in cluttered scenes that also determines the 2D pose of each
object. As with many other color-based object recognition algorithms, color histograms are fundamental
to our new approach; however, we use histograms obtained from overlapping subwindows rather than the entire
image. An object from a database of prototypes is identified and located in an input image whenever there
are many good histogram matches between the respective subwindow histograms of the input image and the
image prototype from the database. In essence, local color histograms are the features to be matched. Once an
object's position in the image has been determined, its 2D pose is determined by approximating the geometrical
transformation most consistently mapping the locations of the prototype's subwindows to their matching
locations in the input image.
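A compact sketch of this matching scheme is given below, assuming OpenCV is available. Window size, histogram binning, the intersection threshold, and the use of cv2.estimateAffinePartial2D for the 2D pose are illustrative choices, not necessarily those of the paper.

```python
import cv2
import numpy as np

def subwindow_histograms(bgr, win=64, step=32, bins=8):
    """Color histograms of overlapping subwindows: the local features to match."""
    feats = []
    h, w = bgr.shape[:2]
    for y in range(0, h - win + 1, step):
        for x in range(0, w - win + 1, step):
            patch = bgr[y:y + win, x:x + win]
            hist = cv2.calcHist([patch], [0, 1, 2], None, [bins] * 3,
                                [0, 256, 0, 256, 0, 256]).flatten()
            hist /= (hist.sum() + 1e-9)            # L1-normalize for intersection matching
            feats.append(((x + win // 2, y + win // 2), hist))
    return feats

def match_and_estimate_pose(proto_feats, image_feats, thresh=0.6):
    """Match subwindows by histogram intersection and fit a similarity
    transform (2-D pose) to the matched centers."""
    src, dst = [], []
    for pc, ph in proto_feats:
        best = max(image_feats, key=lambda f: np.minimum(ph, f[1]).sum())
        if np.minimum(ph, best[1]).sum() > thresh:
            src.append(pc)
            dst.append(best[0])
    if len(src) < 3:
        return None
    M, _ = cv2.estimateAffinePartial2D(np.float32(src), np.float32(dst))
    return M       # 2x3 rotation/scale/translation mapping prototype to image
```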
We used adaptation to examine the relationship between perceptual norms (the stimuli observers describe as psychologically neutral) and response norms (the stimulus levels that leave visual sensitivity in a neutral or balanced state). Adapting to stimuli on opposite sides of a neutral point (e.g. redder or greener than white) biases appearance in opposite ways. Thus the adapting stimulus can be titrated to find the unique adapting level that does not bias appearance. We compared these response norms to subjectively defined neutral points both within the same observer (at different retinal eccentricities) and between observers. These comparisons were made for visual judgments of color, image focus, and human faces, stimuli that are very different and may depend on very different levels of processing, yet which share the property that for each there is a well-defined and perceptually salient norm. In each case the adaptation aftereffects were consistent with an underlying sensitivity basis for the perceptual norm. Specifically, response norms were similar to and thus covaried with the perceptual norm, and under common adaptation, differences between subjectively defined norms were reduced. These results are consistent with models of norm-based codes and suggest that these codes underlie an important link between visual coding and visual experience.