Compared with an ordinary light field image (LFI), a multi-exposure fusion light field image (MEF-LFI) can record more visual information and scene details. However, MEF also introduces distortions while enhancing the LFI, leading to quality degradation. Therefore, it is crucial to develop effective MEF-LFI quality assessment models. This paper proposes a multi-exposure fusion light field image quality assessment method with motion region detection, based on the observation that the artifact distortion of an MEF-LFI synthesized from a dynamic scene usually occurs in motion regions. A motion region detection module is designed to detect artifact distortion in MEF-LFI. Considering that tone mapping (TM) operations can cause texture distortion in MEF-LFI, a spectral texture distortion feature extraction module and a spatial-domain gradient feature extraction module are designed based on the Curvelet transform and the Scharr operator, respectively. To capture the color-shift distortion in MEF-LFI, a color feature extraction module is constructed from the characteristics of the HSI color model. In addition, considering the angular distortion unique to MEF-LFI, an angular feature extraction module is designed with the Log-Gabor operator. Finally, the extracted feature vector is input into a support vector regression model to predict the quality of the MEF-LFI. Experimental results show that the proposed method is superior to representative quality assessment methods and has better consistency with human visual perception.
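As a minimal sketch of two of the stages described above, the snippet below extracts Scharr-based spatial gradient statistics and feeds a feature vector into support vector regression; the feature choice, loader, and training data are illustrative assumptions, not the paper's exact pipeline.

```python
# Hypothetical sketch: Scharr gradient features + SVR quality regression.
import cv2
import numpy as np
from sklearn.svm import SVR

def gradient_features(gray):
    # Scharr gradients along x and y; magnitude statistics as simple features
    gx = cv2.Scharr(gray, cv2.CV_64F, 1, 0)
    gy = cv2.Scharr(gray, cv2.CV_64F, 0, 1)
    mag = np.sqrt(gx ** 2 + gy ** 2)
    return np.array([mag.mean(), mag.std()])

# feats: N x D matrix of per-image feature vectors; mos: N subjective scores
# feats, mos = load_training_data()                  # hypothetical loader
# model = SVR(kernel='rbf').fit(feats, mos)
# score = model.predict(gradient_features(test_gray).reshape(1, -1))
```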
With the wide application of Colored Point Clouds (CPC), the amount of data is increasing, and efficient compression is required in practical applications. The most advanced compression technology for static CPC is G-PCC, proposed by MPEG. However, quantization errors from G-PCC-1 (Octree) can cause grid hole distortion, which can seriously degrade the quality of the user's visual perception. For this reason, this paper proposes a Point Cloud Projection based light-to-medium G-PCC-1 Hole Distortion Repair method (denoted as P-GHDR) for CPC. The distorted CPC is projected from 3D space onto 2D planes, and the G-PCC-1 distortion is repaired by combining multi-view color and geometric projection maps. Finally, the repaired CPC is reconstructed by reverse projection. Experiments show that the proposed method can effectively improve the geometric and visual objective metrics of G-PCC-1 coded CPC, and can significantly improve the quality of CPC reconstructed from light-to-medium G-PCC-1 coding.
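A minimal sketch of the projection step, assuming a simple orthographic projection onto the XY plane (the paper's multi-view projection is more elaborate); all names are hypothetical.

```python
# Project a colored point cloud onto a 2D plane as depth and color maps.
import numpy as np

def project_to_plane(points, colors, res=256):
    # map XY coordinates to pixel indices on a res x res grid
    xy = points[:, :2]
    ij = ((xy - xy.min(0)) / (np.ptp(xy, axis=0) + 1e-9) * (res - 1)).astype(int)
    depth = np.full((res, res), -np.inf)
    color = np.zeros((res, res, 3))
    for (i, j), z, c in zip(ij, points[:, 2], colors):
        if z > depth[i, j]:       # keep the point closest to the viewing plane
            depth[i, j], color[i, j] = z, c
    return depth, color
```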
Compared with traditional 2D imaging, omnidirectional imaging techniques can provide users with a 360°×180° immersive visual experience, which also makes the objective quality assessment of omnidirectional images more challenging. In this work, a blind omnidirectional image quality assessment (IQA) method based on spherical triangle mesh representation and a multi-channel residual graph convolution network (denoted as Multi-RES-GCN) is proposed. The proposed method includes two important stages: spherical triangle mesh generation and optimization for the omnidirectional image, and a quality predictor based on Multi-RES-GCN. In the first stage, the spherical representation of the omnidirectional image (referred to as the spherical image) is used, and a new scheme of spherical triangle mesh generation and optimization is proposed, which reasonably samples pixels on the spherical image and optimizes the sampled points to generate more accurate triangular meshes. In the second stage, the spherical image is divided into six view regions, the triangle mesh nodes are assigned to the view regions according to their positions, and the nodes are then input to the quality predictor. The quality predictor is composed of Multi-RES-GCN and an Estimator. Multi-RES-GCN models the nodes and the dependency relationships between them. The Estimator regresses the features extracted by Multi-RES-GCN to the weight and quality score of each view region, and the final quality score of the omnidirectional image is predicted by the weighted summation of these quality scores. Experimental results demonstrate that the proposed method outperforms other state-of-the-art IQA metrics on two omnidirectional IQA databases.
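For illustration, a minimal residual graph convolution layer and the weighted-summation aggregation are sketched below in PyTorch; this is a generic stand-in, not the paper's Multi-RES-GCN architecture, and `a_hat` is assumed to be a normalized adjacency matrix over the mesh nodes.

```python
# A toy residual GCN layer over mesh nodes (illustrative, not the paper's net).
import torch
import torch.nn as nn

class ResGCNLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.lin = nn.Linear(dim, dim)

    def forward(self, x, a_hat):
        # aggregate neighbor features via a_hat, then add a residual connection
        return torch.relu(a_hat @ self.lin(x)) + x

# Final score as a weighted sum over the six view regions, per the abstract:
# score = (weights * region_scores).sum() / weights.sum()
```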
Color point clouds can provide users with more realistic visual information and a better immersive experience than traditional imaging techniques. How to accurately evaluate the visual quality of color point clouds is an important issue that urgently needs to be solved. In this work, we propose a novel full reference metric, called Visual Quality Assessment of Color Point Clouds (VQA-CPC). Starting from the geometry and texture of the color point cloud, the proposed metric calculates the distances from the point cloud's points to their geometric centroid and the distances from the texture coordinates of the points to the texture centroid. Then, a distortion measurement strategy is designed and used to extract the features of the color point cloud. Finally, the extracted geometric and texture features are used to construct the feature vector and predict the quality of the distorted color point cloud. Moreover, we construct a color point cloud database, called NBU-PCD1.0, for verifying the effectiveness of the proposed metric. Experimental results show that the proposed VQA-CPC metric is better than existing point cloud metrics.
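The centroid-distance idea above is simple enough to sketch directly; the summary statistics chosen here are an assumption for illustration.

```python
# Distances to the geometric centroid and to the color (texture) centroid,
# reduced to simple statistics as candidate features.
import numpy as np

def centroid_features(points, colors):
    g = np.linalg.norm(points - points.mean(0), axis=1)   # geometric distances
    t = np.linalg.norm(colors - colors.mean(0), axis=1)   # texture distances
    return np.array([g.mean(), g.std(), t.mean(), t.std()])
```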
Object detection in blind zones is critical to ensuring the driving safety of heavy trucks. We propose a scheme to realize object detection in the blind zones of heavy trucks based on an improved you-only-look-once (YOLO) v3 network. First, according to the actual detection requirements, the targets are determined to establish a new data set of persons, cars, and fallen pedestrians, with a focus on small and medium objects. Subsequently, the network structure is optimized, and the features are enhanced by combining the shallow and deep convolution information of the Darknet platform. In this way, feature propagation can be effectively enhanced, feature reuse can be promoted, and the network performance for small object detection can be improved. Furthermore, new anchors are obtained by clustering the data set using the K-means technique to improve the accuracy of detection frame positioning. In the test stage, detection is performed using the trained model. The test results demonstrate that the proposed improved YOLO v3 network is superior to the original YOLO v3 model in terms of blind zone detection and can satisfy the accuracy and real-time requirements, with an accuracy of 94% and a runtime of 13.792 ms/frame. Moreover, the mean average precision value for the improved model is 87.82%, which is 2.79% higher than that of the original YOLO v3 network.
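A minimal sketch of the anchor-clustering step: k-means over ground-truth box (width, height) pairs. This uses plain Euclidean k-means as described in the abstract; the data loading and k value are assumptions.

```python
# Cluster box dimensions to obtain anchors for the detector.
import numpy as np
from sklearn.cluster import KMeans

def compute_anchors(box_wh, k=9):
    # box_wh: N x 2 array of ground-truth box widths/heights from the data set
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(box_wh)
    anchors = km.cluster_centers_
    return anchors[np.argsort(anchors.prod(axis=1))]   # sort by box area
```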
360° video can provide users with an immersive experience by showing an omnidirectional perspective, which is increasingly attractive to consumers. However, 360° video tends to have higher resolution, resulting in increased bandwidth requirements for transmission. The characteristics of head-mounted displays (HMD) provide a new approach to reducing the bandwidth cost of streaming 360° video, which can be encoded by taking the user's orientation into account. In this paper, we propose a novel 360° video coding method based on the characteristics of Equi-rectangular Projection (ERP) combined with user viewing behavior. Specifically, a non-uniform tiling method is designed according to the principle of ERP, which also matches the behavior of users viewing 360° video. Additionally, appropriate coding parameters are set according to the positions of different tiles to reduce the redundancy introduced by oversampling and improve the coding efficiency. Experimental results show that the proposed method can significantly reduce the bandwidth requirement of streaming 360° video while ensuring consistent visual quality.
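A minimal sketch of latitude-aware coding-parameter assignment: ERP oversamples tile rows near the poles by roughly 1/cos(latitude), so those tiles can tolerate coarser quantization. The QP-offset mapping and clamp below are illustrative assumptions, not the paper's parameters.

```python
# Assign a QP offset per tile row based on ERP oversampling at that latitude.
import math

def tile_qp(base_qp, row, n_rows):
    lat = (row + 0.5) / n_rows * math.pi - math.pi / 2   # tile-center latitude
    stretch = 1.0 / max(math.cos(lat), 1e-3)             # ERP oversampling factor
    return base_qp + min(round(3 * math.log2(stretch)), 10)
```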
KEYWORDS: Clouds, Image segmentation, Stereoscopy, Imaging systems, 3D image processing, Computer graphics, 3D vision, 3D acquisition, 3D applications, Target detection
In order to display a high dynamic range (HDR) image on a standard monitor, tone-mapping operators (TMOs) compress HDR images into low dynamic range tone-mapped (TM) images. To accurately evaluate the performance of different TMOs, this paper proposes a no-reference image quality assessment (IQA) method for TM images. Firstly, the image is divided into dark, middle, and bright areas by using a clustering algorithm. Entropy and area-ratio features are extracted from the three areas mentioned above and from the saliency area detected by the proposed method. Then, the natural scene statistics features of the luminance channel and the RGB color channels of the TM image are used to assess luminance naturalness and chrominance naturalness, respectively. Finally, a support vector regression module is utilized to yield a quality score for the TM image. The experimental results on the tone-mapped image database (TMID) show the effectiveness of the proposed algorithm. Compared with existing representative IQA methods, the proposed method has better performance.
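A minimal sketch of the first step, assuming k-means as the clustering algorithm (the abstract does not name one) and 8-bit luminance input; entropy and area ratio are computed per area as described.

```python
# Cluster luminance into dark/middle/bright areas; extract entropy + area ratio.
import numpy as np
from sklearn.cluster import KMeans

def area_features(lum):                          # lum: 8-bit luminance image
    km = KMeans(n_clusters=3, n_init=10).fit(lum.reshape(-1, 1))
    feats = []
    for k in np.argsort(km.cluster_centers_.ravel()):  # dark, middle, bright
        pix = lum.reshape(-1)[km.labels_ == k]
        hist, _ = np.histogram(pix, bins=256, range=(0, 255), density=True)
        hist = hist[hist > 0]
        feats += [-(hist * np.log2(hist)).sum(), pix.size / lum.size]
    return np.array(feats)
```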
Existing saliency detection methods are not suitable for high dynamic range (HDR) images. In this work, based on the human visual system, we propose a new method for detecting the saliency of HDR images via luminance regionalization. First, considering that HDR images have a wider luminance range, luminance information of the HDR image is extracted, and the HDR image is divided into high, medium, and low luminance regions by luminance thresholding. Then, a saliency map is computed for each luminance region: color and texture features are extracted for the high luminance region, luminance and texture features are extracted for the low luminance region, and an existing LDR image saliency detection method is used for the medium luminance region. Finally, the three saliency maps are linearly fused to obtain the final HDR image saliency map. Experimental results on two public databases (the EPFL HDR eye tracking database and the TMID database) demonstrate that the proposed method performs well against five state-of-the-art methods in detecting the salient regions of HDR images.
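The regionalization-and-fusion flow can be sketched as follows; the percentile thresholds, fusion weights, and the per-region saliency detectors (passed in as precomputed maps) are illustrative assumptions.

```python
# Split an HDR luminance map into three regions and linearly fuse the
# per-region saliency maps.
import numpy as np

def hdr_saliency(lum, sal_high, sal_mid, sal_low, w=(0.4, 0.3, 0.3)):
    t1, t2 = np.percentile(lum, [33, 66])            # hypothetical thresholds
    masks = [lum >= t2, (lum >= t1) & (lum < t2), lum < t1]
    maps = [sal_high, sal_mid, sal_low]              # per-region saliency maps
    return sum(wi * m * s for wi, m, s in zip(w, masks, maps))
```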
With the wide application of three-dimensional (3D) mesh models in digital entertainment, animation, virtual reality, and other fields, there are more and more processing techniques for 3D mesh models, including watermarking, compression, and simplification. These processing techniques inevitably introduce various distortions into 3D meshes. Thus, it is necessary to design effective tools for 3D mesh quality assessment. In this work, considering that curvature can measure the concavity and convexity of a surface well, and that human eyes are very sensitive to changes in curvature, we propose a new objective 3D mesh quality assessment method. Curvature features are used to evaluate the visual difference between the reference and distorted meshes. Firstly, the Gaussian curvature and the mean curvature at the vertices of the reference and distorted meshes are calculated, and a correlation function is then used to measure the correlation coefficients between these meshes. In this way, the degree of degradation of the distorted mesh can be well represented. Finally, a Support Vector Regression model is used to fuse the two features, and the objective quality score is obtained. The proposed method is compared with seven existing 3D mesh quality assessment methods. Experimental results on the LIRIS_EPFL_GenPurpose Database show that the PLCC and SROCC of the proposed method are increased by 13.60% and 6.23%, respectively, compared with the best results of the seven representative methods. This implies that the proposed model has stronger consistency with subjective human visual perception.
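A minimal sketch of the correlation features, assuming per-vertex curvature arrays are already available and the reference and distorted meshes share vertex correspondence (an assumption for this sketch):

```python
# Correlation of vertex curvatures between reference and distorted meshes,
# giving two features that an SVR model can fuse into a quality score.
import numpy as np
from sklearn.svm import SVR

def curvature_corr(ref_gauss, dis_gauss, ref_mean, dis_mean):
    f1 = np.corrcoef(ref_gauss, dis_gauss)[0, 1]   # Gaussian-curvature corr.
    f2 = np.corrcoef(ref_mean, dis_mean)[0, 1]     # mean-curvature corr.
    return np.array([f1, f2])

# model = SVR().fit(train_features, train_mos)     # hypothetical training data
```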
A light field carries richer scene information than traditional images, including not only spatial but also directional information. Aiming at the multiple-distortion problem of dense light fields, and combining spatial and angular domain information, a light field image quality assessment method based on dense distortion curve analysis and scene information statistics is proposed in this paper. Firstly, the mean difference between the multi-view images in the angular domain of the dense light field is extracted, and a corresponding distortion curve is drawn. Three statistical features are obtained by fitting the curve: slope, median, and peak, which respectively represent the distortion deviation, the interpolation period, and the maximum distortion. Then, the mean information entropy and mean gradient magnitude of the light field are extracted as the global and local features of the spatial domain. Finally, the extracted features are trained and tested with Support Vector Regression. Experiments are conducted on the public MPI dense light field database. Experimental results show that the PLCC of the proposed method is 0.89, better than existing methods, especially for different types of distorted content.
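A minimal sketch of the curve statistics, assuming the per-view differences are taken between adjacent views (an assumption; the abstract says "between all multi-view images"):

```python
# Build a distortion curve from mean absolute differences between adjacent
# views, then extract slope (linear fit), median, and peak.
import numpy as np

def curve_features(views):                      # views: list of gray images
    diffs = [np.abs(a.astype(float) - b).mean()
             for a, b in zip(views[:-1], views[1:])]
    slope = np.polyfit(np.arange(len(diffs)), diffs, 1)[0]
    return np.array([slope, np.median(diffs), np.max(diffs)])
```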
Research on video quality assessment (VQA) plays a crucial role in improving the efficiency of video coding and the performance of video processing. It is well acknowledged that the motion energy model generates motion energy responses in the middle temporal area by simulating the receptive fields of V1 neurons for the motion perception of the human visual system. Motivated by this biological evidence for visual motion perception, a VQA method comprising a motion perception quality index and a spatial index is proposed in this paper. More specifically, the motion energy model is applied to evaluate the temporal distortion severity of each frequency component generated by a difference-of-Gaussian filter bank, which produces the motion perception quality index, and the gradient similarity measure is used to evaluate the spatial distortion of the video sequence to obtain the spatial quality index. Experimental results on the LIVE, CSIQ, and IVP video databases demonstrate that the random forests regression technique trained on the generated quality indices corresponds highly with human visual perception and offers significant improvements over comparable well-performing methods. The proposed method has higher consistency with subjective perception and higher generalization capability.
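A minimal sketch of the difference-of-Gaussian (DoG) decomposition that precedes the motion-energy analysis; the scale set is an illustrative assumption.

```python
# Decompose a frame into DoG frequency components (fine to coarse).
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_bank(frame, sigmas=(1, 2, 4, 8)):
    f = frame.astype(float)
    blurred = [gaussian_filter(f, s) for s in sigmas]
    bands = [f - blurred[0]]
    bands += [a - b for a, b in zip(blurred[:-1], blurred[1:])]
    return bands + [blurred[-1]]       # band-pass components + residual low-pass
```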
Existing stereoscopic image quality assessment (SIQA) methods are mostly based on luminance information, and color information is not sufficiently considered. In fact, color is one of the important factors that affect human visual perception, and nonnegative matrix factorization (NMF) and manifold learning are in line with human visual perception. We propose an SIQA method based on learning binocular manifold color visual properties. More specifically, in the training phase, a feature detector is created based on NMF with manifold regularization by considering color information, which not only allows a parts-based manifold representation of an image, but also manifests localized color visual properties. In the quality estimation phase, visually important regions are selected by considering different levels of human visual attention, and feature vectors are extracted using the feature detector. Then the feature similarity index is calculated, and the parts-based manifold color feature energy (PMCFE) for each view is defined based on the color feature vectors. The final quality score is obtained by a binocular combination based on PMCFE. The experimental results on the LIVE I and LIVE II 3-D IQA databases demonstrate that the proposed method can achieve much higher consistency with subjective evaluations than state-of-the-art SIQA methods.
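For illustration, a plain-NMF feature detector is sketched below; the paper's detector adds manifold regularization, which is omitted here, and the synthetic patch matrix is a stand-in for vectorized RGB training patches.

```python
# Learn parts-based color bases with NMF and encode patches as activations.
import numpy as np
from sklearn.decomposition import NMF

patches = np.random.rand(1000, 192)     # stand-in for 8x8 RGB patches (8*8*3)
nmf = NMF(n_components=16, init='nndsvda', max_iter=500)
coeffs = nmf.fit_transform(patches)     # per-patch activations (features)
bases = nmf.components_                 # parts-based color bases
# test_coeffs = nmf.transform(test_patches)   # features for the quality stage
```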
Stereoscopic image quality assessment (IQA) plays a vital role in stereoscopic image/video processing systems. We propose a new quality assessment method for stereoscopic images that uses disparity-compensated view filtering (DCVF). First, because a stereoscopic image is composed of different frequency components, DCVF is designed to decompose it into high-pass and low-pass components. Then, the qualities of the different frequency components are acquired according to their phase congruency and coefficient distribution characteristics. Finally, support vector regression is utilized to establish a mapping model between the component qualities and subjective qualities, and stereoscopic image quality is calculated using this mapping model. Experiments on the LIVE and NBU 3-D IQA databases demonstrate that the proposed method can evaluate stereoscopic image quality accurately. Compared with several state-of-the-art quality assessment methods, the proposed method is more consistent with human perception.
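A minimal sketch of the decomposition step, with a Gaussian filter standing in for the paper's DCVF (an explicit substitution for illustration):

```python
# Split a view into low-pass and high-pass components.
from scipy.ndimage import gaussian_filter

def split_components(view, sigma=2.0):
    low = gaussian_filter(view.astype(float), sigma)
    return low, view - low          # low-pass and high-pass components
```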
KEYWORDS: Visualization, Databases, 3D modeling, Data modeling, Performance modeling, Visual process modeling, 3D image processing, Spatial frequencies, Molybdenum, 3D visualizations
Three-dimensional (3-D) visual comfort assessment (VCA) is a particularly important and challenging topic, which involves automatically predicting the degree of visual comfort in line with human subjective judgment. State-of-the-art VCA models typically focus on minimizing the distance between predicted visual comfort scores and subjective mean opinion scores (MOSs) by training a regression model. However, obtaining precise MOSs is often expensive and time-consuming, which greatly constrains the extension of existing MOS-aware VCA models. This study is inspired by the fact that humans tend to conduct a preference judgment between two stereoscopic images in terms of visual comfort. We propose to train a robust VCA model on a set of preference labels instead of MOSs. The preference label, representing the relative visual comfort of preference stereoscopic image pairs (PSIPs), is generally precise and can be obtained at much lower cost compared with MOS. More specifically, some representative stereoscopic images are first selected to generate the PSIP training set. Then, we use a support vector machine to learn a preference classification model by taking a differential feature vector and the corresponding preference label of each PSIP as input. Finally, given a testing sample, by considering a full-round paired comparison with all the selected representative stereoscopic images, the visual comfort score can be estimated via a simple linear mapping strategy. Experimental results on our newly built 3-D image database demonstrate that the proposed method can achieve a better performance compared with the models trained on MOSs.
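A minimal sketch of the preference-learning stage under stated assumptions: synthetic differential feature vectors and labels stand in for the PSIP training set, and the score range of the linear mapping is illustrative.

```python
# Learn a preference classifier over differential features, then score a test
# image by its win rate against the representative set (full-round comparison).
import numpy as np
from sklearn.svm import SVC

diff_feats = np.random.randn(200, 8)            # stand-in PSIP features
prefs = np.random.randint(0, 2, 200)            # stand-in preference labels
clf = SVC(kernel='rbf').fit(diff_feats, prefs)

def comfort_score(test_feat, rep_feats, lo=1.0, hi=5.0):
    wins = clf.predict(test_feat - rep_feats).mean()   # fraction of "wins"
    return lo + wins * (hi - lo)                       # simple linear mapping
```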
Perceptual stereoscopic image quality assessment (SIQA) aims to use computational models to measure image quality in a manner consistent with human visual perception. In this research, we simulate monocular and binocular visual perception and propose a monocular-binocular feature fidelity (MBFF) induced index for SIQA. More specifically, in the training stage, we learn monocular and binocular dictionaries from the training database, so that the latent response properties can be represented as a set of basis vectors. In the quality estimation stage, we compute monocular feature fidelity (MFF) and binocular feature fidelity (BFF) indexes based on the estimated sparse coefficient vectors, and compute a global energy response similarity (GERS) index by considering energy changes. The final quality score is obtained by incorporating them together. Experimental results on four public 3D image quality assessment databases demonstrate that, in comparison with the most related existing methods, the devised algorithm achieves high consistency with subjective assessment.
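A minimal sketch of the feature-fidelity idea using scikit-learn's sparse coding: encode reference and distorted patches over a learned dictionary and compare the coefficient vectors. This is a simplified stand-in for the MFF/BFF indexes, with synthetic training data and an illustrative similarity formula.

```python
# Dictionary learning + sparse-coefficient fidelity between ref/distorted.
import numpy as np
from sklearn.decomposition import DictionaryLearning, sparse_encode

X = np.random.randn(500, 64)                    # stand-in training patches
dico = DictionaryLearning(n_components=32, max_iter=200).fit(X)

def feature_fidelity(ref_patches, dis_patches):
    a = sparse_encode(ref_patches, dico.components_)
    b = sparse_encode(dis_patches, dico.components_)
    num = 2 * (a * b).sum() + 1e-6               # SSIM-style similarity
    return num / ((a ** 2).sum() + (b ** 2).sum() + 1e-6)
```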
In a multi-view video system, multi-view video plus depth is the main data format for 3D scene representation. Continuous virtual views can be generated by using the depth image based rendering (DIBR) technique. The DIBR process includes geometric mapping, hole filling, and merging. Unique weights, inversely proportional to the distance between the virtual and real cameras, are used to merge the virtual views. However, these weights might not be optimal in terms of virtual view quality. In this paper, a novel virtual view merging algorithm is proposed, in which a machine learning method is utilized to establish an optimal weight model. The model takes color, depth, color gradient, and sequence parameters into consideration. Firstly, we render the same virtual view from the left and right views and select training samples by using a threshold. Then, the eigenvalues of the samples are extracted, and the optimal merging weights are calculated as training labels. Finally, a support vector classifier (SVC) is adopted to establish the model, which is used to guide virtual view rendering. Experimental results show that the proposed method can improve the quality of virtual views for most sequences. In particular, it is effective in the case of a large distance between the virtual and real cameras. Compared with the original virtual view synthesis method, the proposed method can obtain more than 0.1 dB gain for some sequences.
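A minimal sketch of the learned merging step: per-sample features feed an SVC that selects a merging-weight class. The feature layout, class count, and synthetic data are illustrative assumptions.

```python
# Train an SVC to predict the optimal merging-weight class per sample.
import numpy as np
from sklearn.svm import SVC

# feats: N x D vectors of [color, depth, color gradient, sequence params];
# labels: optimal-weight classes computed from the training views
feats = np.random.randn(300, 4)
labels = np.random.randint(0, 3, 300)
weight_model = SVC().fit(feats, labels)
w_class = weight_model.predict(feats[:5])   # guides view merging at test time
```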
Since stereoscopic images provide observers with a viewing experience that is realistic but sometimes uncomfortable, it is necessary to investigate the determinants of visual discomfort. Considering that the foreground object draws most of the attention when humans observe stereoscopic images, this paper proposes a new foreground object based visual comfort assessment (VCA) metric. In the first place, a suitable segmentation method is applied to the disparity map, and the foreground object is identified as the one having the largest average disparity. In the second place, three visual features, namely the average disparity, average width, and spatial complexity of the foreground object, are computed from the perspective of visual attention. Nevertheless, an object's width and complexity do not influence the perception of visual comfort as consistently as disparity does. In accordance with this psychological phenomenon, in the third place, we divide the images into four categories on the basis of different disparities and widths, and apply four different models to predict visual comfort more precisely. Experimental results show that the proposed VCA metric outperforms other existing metrics and achieves high consistency between objective and subjective visual comfort scores: the Pearson Linear Correlation Coefficient (PLCC) and Spearman Rank Order Correlation Coefficient (SROCC) are over 0.84 and 0.82, respectively.
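A minimal sketch of the foreground step, assuming a precomputed segmentation label map; the width and complexity proxies are illustrative assumptions.

```python
# Pick the region with the largest mean disparity as the foreground object,
# then compute its average disparity, width proxy, and complexity proxy.
import numpy as np

def foreground_features(disparity, labels):
    fg = max(np.unique(labels), key=lambda r: disparity[labels == r].mean())
    mask = labels == fg
    width = mask.any(axis=0).sum()           # horizontal extent as width proxy
    complexity = np.var(disparity[mask])     # hypothetical complexity measure
    return disparity[mask].mean(), width, complexity
```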
KEYWORDS: Video, Video processing, Video coding, Computer programming, Video compression, Cameras, Optical filters, Gaussian filters, Information science, 3D vision
In a free viewpoint video system, the color video and the corresponding depth video are utilized to synthesize virtual views by the depth image based rendering (DIBR) technique. Hence, high-quality depth videos are a prerequisite for high-quality virtual views. However, depth variation, caused by scene variance and limited depth capturing technologies, may increase the encoding bitrate of depth videos and decrease the quality of virtual views. To tackle these problems, a depth preprocessing method based on smoothing the texture and abrupt changes of depth videos is proposed in this paper to increase the accuracy of depth videos. Firstly, a bilateral filter is adopted to smooth the whole depth video while protecting the depth edges. Secondly, abrupt variations are detected by a threshold calculated according to the camera parameters of each video sequence. Holes appear in virtual views where the depth values of the left view change sharply from low to high in the horizontal direction, or where the depth values of the right view change sharply from high to low. Therefore, for the left view, the depth value difference on the left side of such transitions is gradually reduced wherever it exceeds the threshold, and the right side of the right view is processed likewise. Experimental results show that the proposed method can reduce the encoding bitrate by 25% on average, while the quality of the synthesized virtual views is improved by 0.39 dB on average compared with using the original depth videos. A subjective quality improvement is also achieved.
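A minimal sketch of the two preprocessing steps on a single 8-bit depth frame; the filter parameters and threshold value are illustrative, whereas the paper derives the threshold from camera parameters.

```python
# Edge-preserving smoothing of a depth map, then detection of abrupt
# horizontal depth jumps that would open holes in synthesized views.
import cv2
import numpy as np

def preprocess_depth(depth, thresh=12):          # depth: 8-bit depth map
    smoothed = cv2.bilateralFilter(depth, 9, 25, 9)
    jumps = np.abs(np.diff(smoothed.astype(int), axis=1)) > thresh
    return smoothed, jumps       # jump mask marks hole-prone transitions
```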
KEYWORDS: Digital watermarking, Image restoration, Detection and tracking algorithms, Image quality, 3D image processing, Image processing, Image compression, Multimedia, Visualization, Signal to noise ratio
We propose a new watermarking algorithm for stereoscopic image tamper detection and self-recovery in three-dimensional multimedia services. Initially, the left and right views of the stereoscopic image are divided into nonoverlapping 2×2 blocks to improve the accuracy of tamper localization. As the left and right views of a stereoscopic image are not independent of each other but have an inter-view relationship, every block of the stereoscopic image is classified as a matching block or a nonmatching block, and the block disparities are obtained. Matching blocks in the left and right views have similar pixel values, so fewer bits are allocated for recovery watermark generation, which increases the quality of the watermarked stereoscopic images. A hierarchical tamper-detection strategy with a four-level checkup is presented to improve the accuracy of tamper localization. Additionally, two copies of the block (matching block and nonmatching block) information are embedded into the stereoscopic image, which ensures the quality of tamper recovery. For nonmatching block recovery, two copies of the partner block are embedded into their chaotic mapping blocks, which provide a second chance for tamper recovery. For matching block recovery, the inter-view relationship between the tampered regions of the left and right views provides a third chance for tamper recovery. Experimental results show that the proposed algorithm can not only detect and locate tampered regions in stereoscopic images more accurately but also recover the tampered regions better, compared with other algorithms.
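A minimal sketch of the block-classification step, assuming per-block disparities are given and using a SAD threshold (an assumption) to decide whether a block "matches" its disparity-compensated counterpart in the other view:

```python
# Split both views into 2x2 blocks and mark matching blocks via SAD.
import numpy as np

def classify_blocks(left, right, disparity, thresh=40):
    # disparity: per-block horizontal disparity map of shape (h//2, w//2)
    h, w = left.shape
    matching = np.zeros((h // 2, w // 2), dtype=bool)
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            d = int(disparity[i // 2, j // 2])
            if 0 <= j - d and j - d + 2 <= w:
                sad = np.abs(left[i:i+2, j:j+2].astype(int)
                             - right[i:i+2, j-d:j-d+2]).sum()
                matching[i // 2, j // 2] = sad < thresh
    return matching
```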
KEYWORDS: Distortion, 3D modeling, Video coding, Video, Volume rendering, Quantization, Optimization (mathematics), 3D video compression, Image compression, Video compression
In this letter, a novel optimized view synthesis distortion model is proposed for bit allocation in three-dimensional video coding. The proposed model separates the view synthesis distortion into two independent terms, each of which is modeled by a quadratic distortion model. The optimal quantization parameters for texture and depth are then determined by minimizing the view synthesis distortion under the total bitrate constraint. Experimental results show that, compared with a fixed 5:1 bit-allocation method, the proposed method achieves higher view synthesis rate-distortion performance.
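A minimal sketch of the constrained minimization, using a grid search over QP pairs. The distortion and rate model functions are placeholders supplied by the caller; the example stand-ins in the comment are illustrative, not the paper's fitted models.

```python
# Pick the texture/depth QP pair minimizing total synthesis distortion
# subject to the bitrate budget (assumes the budget is attainable).
def best_qp(Dt, Dd, Rt, Rd, budget, qps=range(20, 46)):
    feasible = [(qt, qd) for qt in qps for qd in qps
                if Rt(qt) + Rd(qd) <= budget]
    return min(feasible, key=lambda p: Dt(p[0]) + Dd(p[1]))

# Example stand-ins:
# Dt = lambda q: 0.5 * q * q;  Rt = lambda q: 9000 * 2 ** (-q / 6)
# Dd = lambda q: 0.2 * q * q;  Rd = lambda q: 3000 * 2 ** (-q / 6)
# qt, qd = best_qp(Dt, Dd, Rt, Rd, budget=4000)
```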
KEYWORDS: Video, 3D image processing, Cameras, Optical engineering, Image segmentation, 3D displays, Video coding, Computer programming, Optimization (mathematics), 3D acquisition
Three-dimensional (3-D) video technologies are becoming increasingly popular because they can provide a high-quality and immersive experience to end users. Depth image-based rendering (DIBR) is a key technology in 3-D video systems due to its low bandwidth cost as well as its arbitrary rendering viewpoint. We propose an object-based DIBR method with color-correction optimization. The proposed method first performs temporally consistent rendering to reduce the rendering complexity. Then, by segmenting the depth map into foreground and background, object-based scalable rendering is performed to improve the rendering quality and reduce the rendering complexity. Finally, the rendered virtual view is further optimized by a color-correction operation. Experimental results show that, compared with the results without the above optimization operations, the proposed method can reduce computational complexity by more than 40% while maintaining high rendering quality.
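A minimal sketch of the segmentation step, using an Otsu threshold on an 8-bit depth map as one plausible way to split foreground from background (the abstract does not specify the segmentation method):

```python
# Split a depth map into foreground/background masks for object-based rendering.
import cv2

def split_depth(depth8):                     # depth8: 8-bit depth map
    _, fg = cv2.threshold(depth8, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return fg > 0, fg == 0                   # foreground mask, background mask
```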
Three-dimensional (3-D) video systems are expected to be a next-generation visual application. Since multiview video for 3-D video systems is composed of color and associated depth information, its huge data storage and transmission requirement is an important problem. We propose a rendering-oriented multiview video coding (MVC) method based on chrominance information reconstruction that incorporates the rendering technique into the MVC process. The proposed method discards certain chrominance information to reduce bitrates, and performs reasonable bitrate allocation between the color and depth videos. At the decoder, a chrominance reconstruction algorithm is presented to achieve accurate reconstruction by warping the neighboring views and colorizing the luminance-only pixels. Experimental results show that the proposed method can save nearly 20% on bitrates compared with the results without discarding the chrominance information. Moreover, under a fixed bitrate budget, the proposed method can greatly improve the rendering quality.
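A minimal sketch of the decoder-side idea: for luminance-only pixels, chrominance is taken from the warped neighboring view where the warp is valid. The warping itself and the neutral-chroma fallback are assumptions of this sketch.

```python
# Fill chrominance of luminance-only pixels from a warped neighboring view.
import numpy as np

def reconstruct_chroma(y, warped_u, warped_v, valid):
    u = np.where(valid, warped_u, 128)      # neutral chroma where warp fails
    v = np.where(valid, warped_v, 128)
    return np.stack([y, u, v], axis=-1)     # reconstructed YUV image
```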
Research on stereoscopic image and video processing has become a new trend in recent years. Measurement of visual quality is of fundamental importance for numerous stereoscopic image and video processing applications. The goal of quality assessment is to automatically assess the quality of images or videos in agreement with human quality judgments and to optimize stereoscopic image or video systems. Unfortunately, knowledge of how humans perceive the quality of stereoscopic images is still lacking. In this paper, we present the results of an extensive subjective quality assessment experiment in which a total of 400 distorted stereoscopic images were evaluated by about twenty human subjects. The stereoscopic image quality data obtained from 8,000 individual human quality judgments are used to build a database that can be exploited for understanding the perception of stereoscopic images and for providing data for the design of objective assessment metrics. The experimental results indicate that the perceived quality of distorted stereoscopic images depends on both the content and the distortion type.
KEYWORDS: Computer programming, Video, Video coding, Video surveillance, Lawrencium, Error control coding, Telecommunications, Cameras, Video compression, Motion estimation
For a wireless multi-view video system, whose storage and computation capabilities are very limited, it is essential to have an encoder device with low power consumption and low complexity. In this paper, a DCT-domain Wyner-Ziv residual coding scheme with low encoding complexity is proposed for wireless multi-view video coding (WZRC-WMS). The scheme encodes the residual frames of each view independently, without any motion or disparity estimation at the encoder, so as to shift the large computational burden to the decoder. At the decoder, the proposed scheme performs joint decoding with side information interpolated from the current view and adjacent views. Experimental results show that the proposed WZRC-WMS scheme outperforms H.263+ interframe coding by about 1.9 dB in rate-distortion performance, while its encoding complexity is only 1/17 of that of H.264 interframe coding.
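A minimal sketch of the encoder's lightweight path: a residual frame against a reference, followed by blockwise 2-D DCT, with no motion or disparity search. The block size and reference choice are assumptions for illustration.

```python
# Residual frame + blockwise DCT, matching the "no motion estimation" design.
import numpy as np
from scipy.fft import dctn

def residual_dct(frame, reference, block=8):
    res = frame.astype(float) - reference
    h, w = res.shape
    out = np.zeros_like(res)
    for i in range(0, h - h % block, block):
        for j in range(0, w - w % block, block):
            out[i:i+block, j:j+block] = dctn(res[i:i+block, j:j+block],
                                             norm='ortho')
    return out
```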
Ray-Space representation has superiority in rendering arbitrary viewpoint images of complicated scenes in real time. Ray-Space interpolation is one of the key techniques that make Ray-Space based Free Viewpoint Television (FTV) feasible. This paper presents a directionality-based interpolation method for a Ray-Space based FTV system, in which characteristic pixels are first extracted from the sparse Ray-Space slice and their directionalities are determined by block matching, while the directionalities of the other pixels to be interpolated are obtained by interpolating the directionalities of these characteristic pixels. Experimental results show that the proposed method greatly improves both the visual quality and the PSNR of rendered intermediate viewpoint images.
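A minimal sketch of directional interpolation on a Ray-Space slice: test a few candidate directions by block matching between the known lines above and below, then interpolate along the best direction. Candidate directions and window size are illustrative, and the sketch assumes interior pixels (no border handling).

```python
# Direction-aware interpolation of one missing pixel in a Ray-Space slice.
import numpy as np

def interp_pixel(slice_, i, j, dirs=(-2, -1, 0, 1, 2), half=2):
    costs = []
    for d in dirs:   # match small windows on the lines above and below row i
        a = slice_[i - 1, j - d - half: j - d + half + 1]
        b = slice_[i + 1, j + d - half: j + d + half + 1]
        costs.append(np.abs(a.astype(int) - b).sum())
    d = dirs[int(np.argmin(costs))]
    return (int(slice_[i - 1, j - d]) + int(slice_[i + 1, j + d])) // 2
```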