Image inpainting attempts to fill the missing areas of an image with plausible content that is visually coherent with the image context. Semantic image inpainting has remained a challenging task even with the emergence of deep learning-based approaches. We propose a deep semantic inpainting model built upon a generative adversarial network and a dense U-Net network. Such a design helps achieve feature reuse while avoiding feature explosion along the upsampling path of the U-Net. The model also uses a composite loss function for the generator network to enforce a joint global and local content consistency constraint. More specifically, our new loss function combines the global reconstruction loss characterizing the semantic similarity between the missing and known image regions with the local total variation loss characterizing the natural transitions among adjacent regions. Experimental results on CelebA-HQ and Paris StreetView datasets have demonstrated encouraging performance when compared with other state-of-the-art methods in terms of both quantitative and qualitative metrics. For the CelebA-HQ dataset, the proposed method can more faithfully infer the semantics of human faces; for the StreetView dataset, our method achieves improved inpainting results in terms of more natural texture transitions, better structural consistency, and enriched textural details.
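As a rough illustration of the composite generator loss described above, the sketch below pairs a global reconstruction term with a local total-variation (TV) term. The choice of an L1 reconstruction loss and the weight tv_weight are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal PyTorch sketch of a composite loss: global reconstruction + local TV.
# The L1 reconstruction choice and tv_weight are assumptions for illustration.
import torch
import torch.nn.functional as F

def total_variation_loss(x):
    """Anisotropic TV: penalizes abrupt transitions between adjacent pixels."""
    dh = torch.abs(x[:, :, 1:, :] - x[:, :, :-1, :]).mean()
    dw = torch.abs(x[:, :, :, 1:] - x[:, :, :, :-1]).mean()
    return dh + dw

def composite_loss(generated, target, tv_weight=0.1):
    recon = F.l1_loss(generated, target)   # global content/semantic consistency
    tv = total_variation_loss(generated)   # local, natural transitions
    return recon + tv_weight * tv
```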
Matching facial images acquired in different electromagnetic spectral bands remains a challenge. An example of this type of comparison is matching active or passive infrared (IR) images against a gallery of visible-light face images. When combined with cross-distance acquisition, the problem becomes even more challenging because of the deteriorated quality of the IR data. As an example, we consider a scenario where visible-light images are acquired at a short standoff distance while the IR images are long-range data. To address the difference in image quality caused by atmospheric and camera effects, the typical degrading factors observed in long-range data, we propose two approaches for bringing the quality of the visible and IR face images into parity. The first approach applies Gaussian-based smoothing functions to the images acquired at a short distance (the visible-light images in the case we analyze). The second approach applies denoising and enhancement to the low-quality IR face images. A quality measure called the Adaptive Sharpness Measure, an improvement of the well-known Tenengrad method, guides the quality parity process. For recognition, a composite operator combining Gabor filters, Local Binary Patterns (LBP), generalized LBP, and the Weber Local Descriptor (WLD) is used. The composite operator encodes both the magnitude and phase responses of the Gabor filters, and the combination of LBP and WLD exploits both the orientation and intensity information of edges. Different IR bands, short-wave infrared (SWIR) and near-infrared (NIR), and different long standoff distances are considered. The experimental results show that in all cases the proposed image quality parity technique (both approaches) benefits the final recognition performance.
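A minimal sketch of the first quality-parity approach is given below: sharpness is scored with the classical Tenengrad measure (mean squared Sobel gradient magnitude), and the short-range visible image is blurred until its score drops to the level measured on the long-range IR image. The paper's Adaptive Sharpness Measure refines Tenengrad; the blur schedule (sigma_step, max_sigma) here is a hypothetical choice.

```python
# Sketch of Gaussian-based quality parity guided by a Tenengrad-type score.
import numpy as np
from scipy import ndimage

def tenengrad(img):
    """Classical Tenengrad sharpness: mean squared Sobel gradient magnitude."""
    gx = ndimage.sobel(img.astype(float), axis=1)
    gy = ndimage.sobel(img.astype(float), axis=0)
    return np.mean(gx**2 + gy**2)

def match_sharpness(sharp_img, target_score, sigma_step=0.25, max_sigma=10.0):
    """Blur the higher-quality image until its sharpness matches target_score."""
    sigma, blurred = 0.0, sharp_img.astype(float)
    while tenengrad(blurred) > target_score and sigma < max_sigma:
        sigma += sigma_step
        blurred = ndimage.gaussian_filter(sharp_img.astype(float), sigma)
    return blurred, sigma
```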
KEYWORDS: Code division multiplexing, Color difference, Statistical analysis, Error analysis, Visualization, Digital filtering, Colorimetry, Color reproduction, Optical filters, Algorithm development
Single-sensor digital color cameras capture only one of the three primary colors at each pixel, and a process called color demosaicking (CDM) is used to reconstruct the full color images. Most CDM algorithms assume the existence of high local spectral redundancy in estimating the missing color samples. However, for images with sharp color transitions and high color saturation, such an assumption may be invalid and visually unpleasant CDM errors will occur. In this paper, we exploit the image's nonlocal redundancy to improve the local color reproduction result. First, multiple local directional estimates of a missing color sample are computed and fused according to local gradients. Then, nonlocal pixels similar to the estimated pixel are searched for and, rather than applying the commonly used nonlocal means filtering, an adaptive thresholding method is proposed to refine the local estimate. This allows the final reconstruction to be performed at the structural level as opposed to the pixel level. Experimental results demonstrate that the proposed local directional interpolation and nonlocal adaptive thresholding method outperforms many state-of-the-art CDM methods in reconstructing the edges and reducing color interpolation artifacts, leading to higher visual quality of reproduced color images.
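A minimal sketch of the "multiple local directional estimates fused by local gradients" step is shown below for a missing green sample at an interior red/blue location of a Bayer mosaic. The Laplacian correction terms and exact fusion weights of the paper are not given in the abstract; simple inverse-gradient weights are assumed here for illustration.

```python
# Sketch: fuse horizontal/vertical green estimates using local gradient weights.
import numpy as np

def fuse_green_estimate(cfa, i, j):
    """Estimate green at (i, j), an R or B site of a 2-D Bayer CFA array
    (at least two pixels away from the image border)."""
    # Directional estimates from the four green neighbors.
    g_h = 0.5 * (cfa[i, j - 1] + cfa[i, j + 1])
    g_v = 0.5 * (cfa[i - 1, j] + cfa[i + 1, j])
    # Local gradients decide which direction is more reliable.
    d_h = abs(cfa[i, j - 1] - cfa[i, j + 1]) + abs(cfa[i, j - 2] - cfa[i, j + 2])
    d_v = abs(cfa[i - 1, j] - cfa[i + 1, j]) + abs(cfa[i - 2, j] - cfa[i + 2, j])
    # Fuse: weight each estimate inversely to the gradient along its direction.
    w_h, w_v = 1.0 / (1e-6 + d_h), 1.0 / (1e-6 + d_v)
    return (w_h * g_h + w_v * g_v) / (w_h + w_v)
```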
Conventional wisdom in signal processing relies heavily on the concept of an inner product defined in a Hilbert space. Despite the popularity of the Hilbert-space formulation, we argue that it is overly structured to account for the complexity of signals arising from the real world. Inspired by work on fractal image decoding and nonlocal image processing, we propose in this paper to view an image as the fixed point of a nonexpansive mapping in a metric space. Recently proposed BM3D-based denoising and nonlocal TV filtering can be viewed as special cases of nonexpansive mappings that differ in their choice of clustering technique. The physical interpretation of clustering-based nonexpansive mappings is that they convey organizational principles of the dynamical system underlying the signals of interest; there is an interesting analogy between phases of matter in statistical physics and types of structures in image processing. From this perspective, image reconstruction can be solved by a deterministic-annealing-based global optimization approach that collectively exploits the a priori information about the unknown image. The potential of this new paradigm, which we call "collective sensing", is demonstrated on a lossy compression application, where significant gain over the current state-of-the-art (SPIHT) coding scheme has been achieved.
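The fixed-point view above can be made concrete with a simple iteration: alternate a clustering/smoothing operator with a data-consistency projection and average the iterates (a Krasnosel'skii-Mann style update). In the sketch below, a Gaussian filter stands in for the BM3D/nonlocal-TV operators mentioned in the text, and alpha and num_iters are hypothetical choices.

```python
# Sketch: reconstruction as fixed-point iteration of a nonexpansive mapping.
import numpy as np
from scipy import ndimage

def fixed_point_reconstruct(observed, known_mask, num_iters=50, alpha=0.5):
    x = observed.astype(float).copy()
    for _ in range(num_iters):
        smoothed = ndimage.gaussian_filter(x, sigma=1.0)      # stand-in for BM3D / nonlocal TV
        projected = np.where(known_mask, observed, smoothed)  # enforce the observed samples
        x = (1.0 - alpha) * x + alpha * projected             # averaged (nonexpansive) update
    return x
```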
KEYWORDS: Data modeling, Digital filtering, Optical filters, Atrial fibrillation, Image fusion, Digital cameras, Cameras, Image processing, Linear filtering, Sensors
Image demosaicing is the problem of interpolating full-resolution color images from so-called color-filter-array (CFA) samples. Among various CFA patterns, the Bayer pattern has been the most popular choice, and demosaicing of the Bayer pattern has attracted renewed interest in recent years, partially due to the increased availability of source codes/executables in response to the principle of "reproducible research". In this article, we provide a systematic survey of over seventy published works in this field since 1999 (complementary to previous reviews [22, 67]). Our review attempts to address important issues in demosaicing and to identify fundamental differences among competing approaches. Our findings suggest that most existing works belong to the class of sequential demosaicing, i.e., the luminance channel is interpolated first and the chrominance channels are then reconstructed from the recovered luminance information. We report the results of our comparative study of a collection of eleven competing algorithms whose source codes or executables are provided by the authors. Our comparison is performed on two data sets: Kodak PhotoCD (the popular choice) and IMAX high-quality images (more challenging). While most existing demosaicing algorithms achieve good performance on the Kodak data set, their performance on the IMAX set (images with varying-hue and high-saturation edges) degrades significantly. Such an observation suggests the importance of properly addressing the mismatch between the assumed model and the observed data in demosaicing, which calls for further investigation of issues such as model validation, test data selection, and performance evaluation.
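For concreteness, the "sequential demosaicing" structure identified above can be sketched as: interpolate the luminance (green) channel first, then reconstruct chrominance as smoothly interpolated color differences (R-G, B-G). The bilinear/normalized-convolution interpolation below is purely a placeholder for the more advanced methods compared in the survey.

```python
# Sketch of the sequential demosaicing pipeline (green first, then color differences).
import numpy as np
from scipy import ndimage

def bilinear_fill(sparse, mask):
    """Interpolate values defined where mask is True via normalized convolution."""
    kernel = np.array([[0.25, 0.5, 0.25], [0.5, 1.0, 0.5], [0.25, 0.5, 0.25]])
    num = ndimage.convolve(sparse * mask, kernel, mode='mirror')
    den = ndimage.convolve(mask.astype(float), kernel, mode='mirror')
    filled = num / np.maximum(den, 1e-6)
    return np.where(mask.astype(bool), sparse, filled)   # keep known samples exact

def sequential_demosaic(cfa, r_mask, g_mask, b_mask):
    """cfa: 2-D mosaic; masks: boolean arrays marking where each color is sampled."""
    g_full = bilinear_fill(cfa, g_mask)               # step 1: luminance (green)
    r_diff = bilinear_fill(cfa - g_full, r_mask)      # step 2: chrominance as
    b_diff = bilinear_fill(cfa - g_full, b_mask)      #         color differences
    return np.dstack([g_full + r_diff, g_full, g_full + b_diff])
```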
Source classification has been widely studied in conventional coding of image and video signals. This paper explores the idea of exploiting the so-called classification gain in Wyner-Ziv (WZ) video coding. We first provide a theoretical analysis of how source classification can lead to an improved rate-distortion (R-D) tradeoff in WZ coding and quantify the classification gain by the ratio of the weighted arithmetic mean to the weighted geometric mean over the subsources. We then present a practical WZ video coding algorithm based on the source classification principle. The statistics of both spatial and temporal correlation are taken into account in our classification strategy. Specifically, the subsource with the steepest R-D slope is identified as the class of significant wavelet coefficients of blocks that are poorly motion-compensated in WZ frames. In such a classification-based approach, rate control is performed at the decoder, which can be viewed as the dual of conventional video coding, where R-D optimization resides at the encoder. By combining powerful LDPC codes (for generating coded information) with advanced temporal interpolation (for generating side information), we have observed that the new Wyner-Ziv coder achieves highly encouraging performance for the test sequences used in our experiments. For example, the gap between H.264 JM11.0 (I-B-I-B...) and the proposed WZ video coder is dramatically reduced for the Foreman and Hall QCIF sequences when compared with the best reported results in the literature.
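One plausible rendering of the classification gain described above, by analogy with the classical transform coding gain, is the ratio of the weighted arithmetic mean to the weighted geometric mean of the subsource variances. The symbols w_i and sigma_i^2 below are assumed notation, not taken from the paper:

```latex
G_{\mathrm{cl}} \;=\; \frac{\sum_{i=1}^{N} w_i\,\sigma_i^{2}}
                           {\prod_{i=1}^{N} \bigl(\sigma_i^{2}\bigr)^{w_i}},
\qquad \sum_{i=1}^{N} w_i = 1,
```

where w_i is the fraction of samples assigned to subsource i and sigma_i^2 its variance. By the arithmetic-geometric mean inequality, G_cl >= 1, with equality only when all subsources share the same variance, i.e., when classification brings no gain.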
Texts represent an important class of information in our daily lives. This paper studies the problem of super-resolution (SR) of text, namely reconstructing high-resolution text from low-resolution video captured by handheld cameras. Such video is called nonideal because of uncontrolled imaging conditions, an unknown point spread function, and the inevitable distortion introduced by compression algorithms. Motivated by the different considerations in SR and mosaicing, we investigate error accumulation in homography-based registration of multi-view images. We advocate the nonuniform-interpolation approach to SR, which achieves resolution scalability at a low computational cost, and study the issues of phase consistency and uncertainty that are difficult to address under the conventional framework of treating SR as an inverse problem. We also present a nonlinear-diffusion-aided blind deconvolution technique for simultaneous suppression of compression artifacts and enhancement of textual information. The performance of the proposed SR-of-text technique is demonstrated by extensive experiments with challenging real-world sequences.
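A minimal sketch of the nonuniform-interpolation route to SR is given below: each low-resolution frame is mapped onto a reference coordinate frame with a homography, all observed samples are scattered onto a finer grid, and the scattered data are interpolated. The homographies are assumed given (e.g., estimated from matched keypoints with cv2.findHomography); the scale factor and the use of linear scattered-data interpolation are placeholders, not the paper's choices.

```python
# Sketch: nonuniform-interpolation super-resolution from homography-registered frames.
import numpy as np
import cv2
from scipy.interpolate import griddata

def super_resolve(frames, homographies, scale=2):
    """frames: list of 2-D grayscale arrays; homographies[k] is a 3x3 array
    mapping frame k onto the reference frame (identity for the reference)."""
    h, w = frames[0].shape
    pts, vals = [], []
    for img, H in zip(frames, homographies):
        ys, xs = np.mgrid[0:h, 0:w]
        coords = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(np.float64)
        mapped = cv2.perspectiveTransform(coords[None], H.astype(np.float64))[0]
        pts.append(mapped * scale)                 # positions on the HR grid
        vals.append(img.ravel().astype(float))
    pts, vals = np.vstack(pts), np.concatenate(vals)
    gy, gx = np.mgrid[0:h * scale, 0:w * scale]
    return griddata(pts, vals, (gx, gy), method='linear')   # nonuniform interpolation
```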
KEYWORDS: Video, Motion models, Error analysis, Motion estimation, Signal processing, Cameras, Video coding, 3D image processing, Video processing, Digital filtering
Motion plays a fundamental role in the coding and processing of video signals. Existing approaches to modeling video sources are mostly based on explicitly estimating motion information from intensity values. Despite its conceptual simplicity, motion estimation (ME) is a long-standing open problem in its own right, and accordingly the performance of a system operating on inaccurate motion information is unlikely to be optimal. In this paper, we present a novel approach to modeling video signals without explicit ME. Instead, motivated by a duality between the edge contours of images and the motion trajectories of video, we demonstrate that the spatio-temporal redundancy in video can be exploited by least-squares (LS) based adaptive filtering techniques. We consider the application of such implicit motion models to the problem of error concealment, more generally known as video inpainting. Our experimental results show the excellent performance of the proposed LS-based error concealment techniques under a variety of information loss conditions.
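A minimal sketch of LS-based adaptive filtering for concealment is shown below: filter coefficients are fitted by least squares on the correctly received region around a lost block and then used to predict the missing pixels. Only a spatial causal support is used here, whereas the models described above also exploit temporal (previous-frame) samples; the support shape and window sizes are hypothetical.

```python
# Sketch: least-squares adaptive prediction filter for block error concealment.
import numpy as np

NEIGH = [(-1, -1), (-1, 0), (-1, 1), (0, -1)]   # causal spatial support

def train_ls_filter(img, mask, i0, i1, j0, j1):
    """Fit coefficients a minimizing ||x - A a|| over received pixels (mask=True)
    whose whole support is also received, inside the window [i0:i1, j0:j1]."""
    rows, targets = [], []
    for i in range(i0 + 1, i1):
        for j in range(j0 + 1, j1 - 1):
            support = [(i + di, j + dj) for di, dj in NEIGH]
            if mask[i, j] and all(mask[p] for p in support):
                rows.append([img[p] for p in support])
                targets.append(img[i, j])
    a, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return a

def conceal_block(img, mask, block):
    """Fill a lost block (top, bottom, left, right), assumed away from borders,
    in raster order using coefficients trained on the region above it."""
    t, b, l, r = block
    a = train_ls_filter(img, mask, max(t - 8, 1), t, max(l - 8, 1), r + 8)
    out = img.astype(float).copy()
    for i in range(t, b):
        for j in range(l, r):
            out[i, j] = sum(c * out[i + di, j + dj] for c, (di, dj) in zip(a, NEIGH))
    return out
```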
Iris and face biometric systems are under intense study as a multimodal pair, due in part to the ability to acquire both with the same capture system. While several successful research efforts have considered facial images as part of an iris-face multimodal biometric system, there is little work exploring the iris recognition problem under different poses of the subjects. This is because most commercial iris recognition systems depend on the high-performance algorithm patented by Daugman, which does not take into consideration pose and illumination variations in iris acquisition. Hence there is a pressing need for sophisticated iris detection systems that localize the iris region for different poses and different facial views.
In this paper we present a non-frontal/non-ideal iris acquisition technique in which iris images are extracted from regular visible-light video sequences. The video sequence is captured at a distance of 3 feet from the subject, over a 90-degree arc from the profile view to the frontal view. We present a novel design for an iris detection filter that locates the iris, the pupil, and the sclera using a Laplacian of Gaussian ellipse detection technique. Experimental results show that the proposed approach can localize the iris in facial images over a wide range of pose variations, including semi-frontal views.
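As a rough illustration of Laplacian-of-Gaussian (LoG) based localization, the sketch below detects the dark pupil region as the strongest dark blob over a small range of scales. The scale range and the "strongest dark blob" heuristic are illustrative assumptions; the filter described above additionally fits ellipses for the iris, pupil, and sclera.

```python
# Sketch: multi-scale LoG blob detection for locating the dark pupil region.
import numpy as np
from scipy import ndimage

def detect_pupil(gray, sigmas=(6, 8, 10, 12, 14)):
    """Return (row, col, sigma) of the strongest dark circular blob response."""
    best = None
    for s in sigmas:
        # Scale-normalized LoG; dark blobs on a brighter background give
        # strong positive responses.
        response = (s ** 2) * ndimage.gaussian_laplace(gray.astype(float), sigma=s)
        i, j = np.unravel_index(np.argmax(response), response.shape)
        if best is None or response[i, j] > best[0]:
            best = (response[i, j], i, j, s)
    return best[1], best[2], best[3]
```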
Video coding remains a largely open problem after decades of research. In this paper, we present some new ideas on video coding from a geometric perspective. We focus on developing an improved model of the video source by studying its geometric constraints. It is suggested that understanding the relationship between the location-uncertainty models of motion discontinuities and image singularities is critical to the efficiency of video coding. We argue that linearity is the root of the problem in conventional motion-compensated predictive coding and propose a classification-based nonlinear coding framework in both the spatial and wavelet domains. Nonlinear processing tools are advocated for resolving location-related uncertainty during the exploitation of geometric constraints.
How can a multi-resolution representation of video signals be developed for both efficient and scalable coding? What are the fundamental advantages and limitations of a resolution-scalable video coder? How can the scalability features of a video codec be used to accommodate other requirements (e.g., delay)? These are the central issues we address in this paper. We first demonstrate the importance of resolving the phase uncertainty for the efficiency of motion-compensated prediction (MCP) in the wavelet domain. An improved understanding of the relationship between the phase associated with a wavelet transform (WT) and the motion accuracy of MCP motivates us to develop a novel multi-resolution representation for video signals. The salient feature of our new representation is that MCP can be performed both effectively and independently at different resolutions. We apply previous theoretical results on fractional-pel MCP to analyze the loss of coding efficiency due to the resolution scalability constraint. We also investigate the issue of delay in video coding and propose a framework for trading delay for spatial resolution: a video decoder can display a low-resolution frame with low delay first and then gracefully enhance the frame resolution as the delay increases. The low-delay, resolution-scalable MCP-WT coder built upon our new wavelet-based multi-resolution representation of video signals achieves significant rate-distortion improvements over previously reported scalable coders in the literature and even outperforms the non-scalable MPEG-2 coder by 1-2 dB at bit rates of 2-9 Mbps.
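The "trade delay for spatial resolution" idea can be illustrated with a minimal decoder-side sketch: the decoder first reconstructs a coarse version of a frame from the wavelet approximation band alone, then refines it once the detail subbands arrive. PyWavelets (pywt) and the 'bior4.4'/2-level choices below are illustrative assumptions, not the codec described above.

```python
# Sketch: early low-resolution display, later full-resolution refinement.
import numpy as np
import pywt

def encode(frame, wavelet='bior4.4', levels=2):
    return pywt.wavedec2(frame.astype(float), wavelet, level=levels)

def decode_low_delay(coeffs, wavelet='bior4.4'):
    """Early display: reconstruct using only the coarse approximation band
    (the detail subbands are replaced by zeros)."""
    approx_only = [coeffs[0]] + [tuple(np.zeros_like(b) for b in d) for d in coeffs[1:]]
    return pywt.waverec2(approx_only, wavelet)

def decode_full(coeffs, wavelet='bior4.4'):
    """Refined display once all detail subbands have arrived."""
    return pywt.waverec2(coeffs, wavelet)
```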
This paper introduces a new framework of sequential error concealment techniques for block-based image coding systems. Unlike previous approaches, which recover the pixels inside a missing block simultaneously, we propose to recover them in a sequential fashion. The structure of sequential recovery enhances the capability of handling complex texture patterns in the image and severe block loss during transmission. Under the framework of sequential recovery, we present a novel spatially adaptive scheme to interpolate the missing pixels along the edge orientation. We also study the problem of how to fully exploit the information from the available surrounding neighbors under the sequential constraint. Experimental results show that the novel sequential recovery techniques are superior to most existing parallel recovery techniques in terms of both subjective and objective quality of the reconstructed images.
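A minimal sketch of sequential, orientation-adaptive recovery is given below: pixels of a missing block are recovered one layer at a time from the block boundary inward, and each pixel is interpolated from the pair of already-known opposite 8-neighbors with the smallest difference across it (i.e., the pair most likely aligned with the local edge). The layer ordering, the orientation test, and the fallback averaging are illustrative simplifications of the scheme described above.

```python
# Sketch: sequential, edge-orientation-adaptive recovery of a missing block.
import numpy as np

PAIRS = [((0, -1), (0, 1)), ((-1, 0), (1, 0)),        # horizontal, vertical
         ((-1, -1), (1, 1)), ((-1, 1), (1, -1))]      # two diagonals

def sequential_conceal(img, missing):
    """img: 2-D float array; missing: boolean mask of lost pixels, assumed to
    lie in the image interior (at least one pixel away from the border)."""
    out, todo = img.astype(float).copy(), missing.copy()
    while todo.any():
        # Current layer: missing pixels that touch at least one known pixel.
        layer = [(i, j) for i, j in zip(*np.nonzero(todo))
                 if (~todo[i - 1:i + 2, j - 1:j + 2]).any()]
        for i, j in layer:
            candidates = []
            for d1, d2 in PAIRS:
                p, q = (i + d1[0], j + d1[1]), (i + d2[0], j + d2[1])
                if not todo[p] and not todo[q]:
                    # A small difference across the pair suggests it is aligned
                    # with the local edge: interpolate along that direction.
                    candidates.append((abs(out[p] - out[q]), 0.5 * (out[p] + out[q])))
            if candidates:
                out[i, j] = min(candidates)[1]
            else:
                # Fallback: average whatever known 8-neighbors exist.
                known = [out[i + di, j + dj] for di in (-1, 0, 1) for dj in (-1, 0, 1)
                         if (di, dj) != (0, 0) and not todo[i + di, j + dj]]
                out[i, j] = float(np.mean(known))
            todo[i, j] = False
    return out
```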
Recent developments in the implementation of integer-to-integer transforms provide a new basis for transform-based lossless coding. Although it shares many features with popular transform-based lossy coding, there are also several discrepancies between the two because of different coding rules. In this paper we discuss several important discrepancies, including the evaluation of decorrelating performance, the implementation of the transform, and the criteria for choosing a transform. We aim at a better understanding of applying linear transforms in the lossless coding scenario.
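To illustrate the perfect-reversibility property that makes integer-to-integer transforms usable for lossless coding, the sketch below implements the classical S-transform (an integer version of the Haar transform) with lifting steps. It is shown only as a generic example, not as the specific transform discussed in the paper.

```python
# Sketch: integer-to-integer Haar (S-) transform via lifting, with a reversibility check.
import numpy as np

def s_transform(a, b):
    """Forward integer S-transform of an integer pair."""
    d = a - b              # integer high-pass (difference)
    s = b + (d >> 1)       # integer low-pass: lifting update with a floor shift
    return s, d

def inverse_s_transform(s, d):
    """Undo the lifting steps in reverse order; exactly recovers (a, b)."""
    b = s - (d >> 1)
    a = d + b
    return a, b

# Reversibility check on random 8-bit integer pairs.
pairs = np.random.default_rng(0).integers(0, 256, size=(1000, 2)).tolist()
assert all(inverse_s_transform(*s_transform(a, b)) == (a, b) for a, b in pairs)
```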