Estimating human body pose and shape from a single-view image has been highly successful, but most existing methods require models with a large number of parameters that are difficult to run on low-performance devices. Lightweight networks struggle to extract sufficient information for human pose and shape estimation, making accurate prediction challenging. In this paper, we propose a lightweight model for predicting the shape and pose parameters of a parametric human body model. Our method comprises a lightweight multi-stage encoder based on Lite-HRNet and ShuffleNet, and a decoder composed of MLPs cascaded along the human kinematic tree, which achieves performance comparable to HMR while the model size is only one-ninth that of HMR. In addition, our model can run at 19.2 inferences per second on the Qualcomm Snapdragon 888+.
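As a rough illustration of a decoder cascaded along the kinematic tree, the sketch below chains one small MLP per joint, feeding each joint's MLP the image feature plus its parent joint's prediction. The SMPL-style parent table, layer widths, and 6-D rotation output are assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn

# Hypothetical SMPL-style parent table: PARENTS[j] is the parent of joint j
# (root is -1); every parent precedes its children, so one pass suffices.
PARENTS = [-1, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8,
           9, 9, 9, 12, 13, 14, 16, 17, 18, 19, 20, 21]

class KinematicMLPDecoder(nn.Module):
    """One small MLP per joint; each MLP sees the global image feature
    concatenated with its parent joint's predicted rotation."""
    def __init__(self, feat_dim=512, rot_dim=6, hidden=128):
        super().__init__()
        self.mlps = nn.ModuleList(
            nn.Sequential(
                nn.Linear(feat_dim + (0 if p < 0 else rot_dim), hidden),
                nn.ReLU(),
                nn.Linear(hidden, rot_dim),
            )
            for p in PARENTS
        )

    def forward(self, feat):                      # feat: (B, feat_dim)
        preds = []
        for mlp, p in zip(self.mlps, PARENTS):
            inp = feat if p < 0 else torch.cat([feat, preds[p]], dim=1)
            preds.append(mlp(inp))
        return torch.stack(preds, dim=1)          # (B, 24, rot_dim)

decoder = KinematicMLPDecoder()
rotations = decoder(torch.randn(2, 512))          # torch.Size([2, 24, 6])
```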
KEYWORDS: Feature extraction, Image restoration, Visual process modeling, Image compression, Object detection, Machine vision, Education and training, Semantics, Image segmentation, Human vision and color perception
Latent representation features in deep learning (DL) exhibit excellent potential for visual data applications. For example, in traffic monitoring and video surveillance, the features simultaneously support image analysis for machine vision and image reconstruction for human viewing. However, existing deep features that serve both machine and human receivers are typically combinations of separate, task-specific features. Because these features are extracted from different branches of collaboration frameworks, the inherent relations between machine and human vision are insufficiently explored. Therefore, to obtain one set of representative and generic features, we propose a dynamic groupwise splitting network based on image content to explore and extract generic features for the two different receivers. First, we analyze the characteristics of the latent features and adopt intermediate features as the base features. Then, a feature classification and transformation mechanism based on image content is proposed to enhance the base features for further image reconstruction and analysis. Consequently, an end-to-end model with multimodel cascading and multistage training realizes both machine and human vision tasks. Extensive experiments show that our human–machine vision collaboration framework has high practical value and performance.
In the past few years, deep learning-based image inpainting has made significant progress. However, many existing methods do not take into account the rationality of the structure and the fineness of the texture, which leads to scattered structures or excessive smoothness in the repaired image. To solve this problem, we propose a two-stage image inpainting model composed of a structure generation network and a texture generation network. The structure generation network focuses on the structure and color domain and uses the damaged structure map extracted from the masked image to reasonably fill the masked area and generate a complete structure map. The texture generation network then uses the repaired structure map to guide the refinement process. We train the two-stage network on the public datasets Places2, CelebA, and Paris StreetView, and the experimental results show the superiority of our method over previous methods.
Virtual reality (VR) refers to a technology that allows people to experience a virtual world in an artificial environment. As one of the most important forms of VR media content, panoramic video can provide viewers with 360-degree free viewing angles. However, the acquisition, stitching, transmission, and playback of panoramic video may damage the video quality and seriously affect the viewer's quality of experience. Therefore, how to improve the display quality and provide users with a better visual experience has become a hot topic in this field. When watching videos, people pay attention to the salient areas, especially for panoramic videos, where viewers can freely choose their regions of interest. Considering this characteristic, saliency information should be utilized when performing quality assessment. In this paper, we use two cascaded networks to calculate the quality score of panoramic video without a reference video. First, the saliency prediction network computes the saliency map of the image, and the patches with higher saliency are selected through the saliency map. In this way, we can exclude the areas of the panoramic image that have no positive effect on the quality assessment task. Then, we input the selected small salient patches into the quality assessment network for prediction and obtain the final image quality score. Experimental results show that, owing to its special network structure, the proposed method achieves more accurate quality scores for panoramic videos compared with state-of-the-art works.
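The patch selection step can be pictured as ranking patches by mean saliency and keeping the top few. The sketch below is a minimal version under assumed patch size and count; the actual saliency prediction and quality assessment networks are outside its scope.

```python
import numpy as np

def select_salient_patches(image, saliency, patch=64, top_k=16):
    """Rank non-overlapping patches by mean saliency and keep the top_k."""
    h, w = saliency.shape
    scored = []
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            scored.append((saliency[y:y+patch, x:x+patch].mean(), y, x))
    scored.sort(reverse=True)                       # most salient first
    return [image[y:y+patch, x:x+patch] for _, y, x in scored[:top_k]]

img = np.random.rand(256, 512, 3)                   # stand-in panoramic frame
sal = np.random.rand(256, 512)                      # stand-in saliency map
patches = select_salient_patches(img, sal)          # 16 patches of 64x64x3
```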
Video inpainting is a very challenging task. Directly applying image inpainting methods to repair damaged video leads to inter-frame flicker due to temporal discontinuities. In this paper, we introduce a video inpainting model guided by spatial structure and temporal edge information to repair the missing regions in high-resolution video. The model uses a convolutional neural network with residual blocks to fill in the missing intra-frame contents according to spatial structure. At the same time, the temporal edge of the reference frame is introduced in the temporal domain, which greatly helps improve the texture and reduce inter-frame flicker. We train the model with regular and irregular masks on YouTube high-resolution video datasets; the trained model is qualitatively and quantitatively evaluated on the test set, and the results show that our method is superior to previous methods.
Objective quality assessment plays a vital role in the evaluation and optimization of panoramic video. However, most current methods only consider the structural distortion caused by the projection format and do not consider the effect of clarity on quality evaluation. For this reason, we propose a new objective video quality assessment method for panoramic video. First, the source image and the distorted image are downsampled to obtain five sets of images at different scales. Second, WS-SSIM is calculated at each scale. Finally, according to the degree of influence of each scale on subjective evaluation, different coefficients are assigned to the corresponding WS-SSIM scores, and the overall score is computed. Experiments on the database established in our laboratory prove the method's effectiveness through comparison.
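The pooling described here amounts to a weighted sum of per-scale WS-SSIM scores. A minimal sketch follows, with a placeholder ws_ssim() and illustrative weights standing in for the coefficients fitted from subjective data.

```python
import numpy as np

def ws_ssim(ref, dist):
    # Placeholder for a weighted-spherical SSIM implementation.
    return 1.0 - float(np.mean(np.abs(ref - dist)))

def downsample(img):
    return img[::2, ::2]                  # simple 2x decimation

def multiscale_score(ref, dist, weights=(0.05, 0.10, 0.20, 0.30, 0.35)):
    """Weighted sum of WS-SSIM over five scales, finest to coarsest."""
    score = 0.0
    for w in weights:
        score += w * ws_ssim(ref, dist)
        ref, dist = downsample(ref), downsample(dist)
    return score
```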
Collaborative intelligence is a new strategy for deploying deep neural network models on AI-based mobile devices: part of the model runs on the mobile device to extract features, and the rest runs in the cloud. In this case, feature data rather than the raw image is transmitted to the cloud, and the uploaded features need generalization capability to complete multiple tasks. To this end, we design an encoder-decoder network to obtain intermediate deep features of the image, and we propose a method that enables the features to complete different tasks. Finally, we use a lossy compression method on the intermediate deep features to improve transmission efficiency. Experimental results show that the features extracted by our network can complete input reconstruction and object detection simultaneously. Moreover, with the deep-feature compression method proposed in our work, the quality of the reconstructed image is good both visually and in quantitative assessment, and object detection also achieves good accuracy.
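A toy sketch of the split pipeline follows: an edge-side feature extractor, uniform quantization standing in for the lossy deep-feature compression, and a cloud side serving both tasks. All function names and the quantizer are illustrative assumptions, not the paper's network.

```python
import numpy as np

def edge_part(image):
    """Stand-in on-device feature extractor (the encoder half of the model)."""
    return image.mean(axis=2)

def compress(feat, step=0.1):
    """Lossy quantization as a placeholder for deep-feature compression."""
    return np.round(feat / step).astype(np.int16), step

def cloud_part(q, step):
    """Cloud side: dequantize once, then serve both receivers."""
    feat = q.astype(np.float32) * step
    return {"reconstruction": feat,                 # human-vision task
            "detection_mask": feat > feat.mean()}   # machine-vision task

q, s = compress(edge_part(np.random.rand(64, 64, 3)))
outputs = cloud_part(q, s)   # both tasks from one transmitted feature tensor
```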
Based on the original exemplar-based Criminisi algorithm, we propose two improvements to image inpainting. First, to address the problem that the matching block found in the optimal block search is not truly optimal, this paper proposes a fusion repair strategy: the first n blocks are selected as matching blocks during the search for the optimal block, and their weighted average is used as the target block for repair. Second, considering the size of the block to be repaired, a layered repair strategy is adopted: the image to be repaired is first downsampled to obtain images at different scales, and repair then proceeds from the topmost image. The experimental results show that the proposed algorithm improves the repair quality both subjectively and objectively.
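The fusion repair strategy reduces to a cost-weighted average of the n best candidate patches. A minimal sketch under that reading, with an assumed inverse-cost weighting:

```python
import numpy as np

def fuse_best_patches(candidates, costs, n=5, eps=1e-8):
    """candidates: list of HxW patches; costs: matching cost (e.g. SSD over
    the known pixels). Returns the weighted average of the n best patches."""
    order = np.argsort(costs)[:n]                   # n best matches
    weights = 1.0 / (np.asarray(costs, dtype=float)[order] + eps)
    weights /= weights.sum()                        # normalize to sum to 1
    stacked = np.stack([candidates[i] for i in order])
    return np.tensordot(weights, stacked, axes=1)   # fused target patch

patches = [np.random.rand(9, 9) for _ in range(50)]
ssd = np.random.rand(50)
target = fuse_best_patches(patches, ssd)            # 9x9 fused patch
```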
KEYWORDS: 3D image processing, 3D modeling, Databases, Image quality, Visualization, 3D displays, Feature extraction, 3D image enhancement, Statistical analysis, Convolution
Perceptual quality assessment of a three-dimensional (3-D) image is one of the most important tasks in applications such as 3-D image coding, processing, enhancement, and monitoring systems. However, objective quality assessment of 3-D images is still challenging. In particular, blind quality assessment of 3-D images faces an arduous challenge due to the lack of prior information about the original images. To solve this problem, we propose a blind 3-D image quality evaluator that simulates the binocular rivalry and orientation responses of the human visual system. As the main technical contribution of this research, both the low- and high-level binocular rivalry responses (BRR), as well as binocular orientation-tuned (BOT) responses, are considered for blind quality assessment of 3-D images. Specifically, the self-similarity of the BRR and BOT responses is extracted from the distorted 3-D images, and it changes in the presence of distortions. Subsequently, all quality-aware features are mapped to the subjective quality scores of the distorted 3-D images using support vector regression. The performance of our algorithm is evaluated on the two popular LIVE 3-D Phase I and Phase II databases and shown to be competitive with state-of-the-art algorithms.
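The final regression stage maps feature vectors to subjective scores with support vector regression; a minimal sketch with random stand-in data and illustrative hyperparameters might look like this:

```python
import numpy as np
from sklearn.svm import SVR

# Stand-ins for the self-similarity features of the BRR/BOT responses and
# the corresponding subjective scores; dimensions are illustrative.
feats = np.random.rand(100, 32)
dmos = np.random.rand(100) * 100

model = SVR(kernel="rbf", C=100.0, epsilon=1.0).fit(feats, dmos)
predicted_quality = model.predict(feats[:5])   # quality scores for 5 images
```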
A low-resolution depth map can be upsampled through guidance from the registered high-resolution color image. Such methods are known as guided depth map upsampling. Among the existing methods based on Markov random fields (MRF), either a data-driven or a model-based prior is adopted to construct the regularization term. The data-driven prior can implicitly reveal the relation between a color-depth image pair by training on external data. The model-based prior provides an anisotropic smoothness constraint guided by the high-resolution color image. These two types of priors can complement each other to resolve the ambiguity in guided depth map upsampling. An MRF-based approach is proposed that takes both into account to regularize the depth map. Based on analysis sparse coding, the data-driven prior is defined by joint cosparsity on the vectors transformed from color-depth patches using a pair of learned operators. It is based on the assumption that the cosupports of such bimodal image structures computed by the operators are aligned. The edge inconsistency measurement is explicitly calculated and embedded into the model-based prior, which can significantly mitigate texture-copying artifacts. The experimental results on Middlebury datasets demonstrate the validity of the proposed method, which outperforms seven state-of-the-art approaches.
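Schematically, such an MRF objective combines a data term with the two priors; the form below is an assumed illustration consistent with the description, not the paper's exact notation:

```latex
% lambda_1, lambda_2, the color-guided weights w_{ij}, and the learned
% analysis operator pair (Omega_c, Omega_d) are illustrative symbols.
E(\mathbf{d}) = \underbrace{\|\mathbf{d}-\mathbf{d}_0\|_2^2}_{\text{data term}}
  + \lambda_1 \underbrace{\sum_{i}\sum_{j\in\mathcal{N}(i)} w_{ij}\,(d_i-d_j)^2}_{\text{model-based prior}}
  + \lambda_2 \underbrace{\|\Omega_d\,\mathbf{d}\|_1}_{\text{data-driven cosparsity prior}}
```

where, per the cosupport-alignment assumption, the zero pattern of $\Omega_d\,\mathbf{d}$ is encouraged to agree with that of $\Omega_c\,\mathbf{c}$ computed from the color image.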
This paper presents a temporal video error concealment method specially designed for H.265/HEVC. We propose quad-tree partitioning prediction and coherency sensitive hashing to achieve better error concealment performance for frames corrupted in the HEVC codec. First, we deduce the most probable partitioning of the missing coding tree unit (CTU) using the proposed quad-tree partitioning prediction, which generates the several coding units (CUs) that constitute the CTU. Then, a CU priority choosing method is applied to select the best of these CUs for prior concealment. Last, coherency sensitive hashing is adopted to conceal the chosen CU with better search quality. The experiments show that the recovery performance of the proposed method surpasses the compared state-of-the-art methods, since the quad-tree partitioning prediction, the priority choosing process, and the coherency sensitive hashing together improve the overall performance.
Three-dimensional (3-D) holoscopic imaging is a promising candidate 3-D technology that can overcome some drawbacks of current 3-D technologies. Due to its particular optical structure, a holoscopic image consists of an array of two-dimensional microimages (MIs) that represent different perspectives of the scene. To address the data-intensive characteristics and specific structure of holoscopic images, efficient coding schemes are of utmost importance for storage and transmission. We propose a 3-D holoscopic image-coding scheme using a sparse viewpoint image (VI) array and disparities. In the proposed scheme, a holoscopic image is fully decomposed into a VI array, which is then sampled into a sparse VI array. To reconstruct the full holoscopic image, disparities between adjoining MIs are calculated. Based on the remaining set of VIs and the disparities, a full holoscopic image is reconstructed and encoded as a reference frame for the coding of the original holoscopic image. As an outcome of this representation, we propose a multiview-plus-depth compression scheme for 3-D holoscopic image coding. Experimental results show that the proposed coding scheme can achieve an average of 51% bit-rate reduction compared with high efficiency video coding intra-coding.
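The full decomposition into a VI array is a fixed pixel re-indexing: VI (i, j) collects pixel (i, j) from every MI. A minimal sketch, with illustrative sizes:

```python
import numpy as np

def mi_to_vi(holoscopic, mi_h, mi_w):
    """Rearrange an MI-array image into a VI array: viewpoint image (i, j)
    is formed by taking pixel (i, j) from every microimage."""
    h, w = holoscopic.shape[:2]
    grid = holoscopic.reshape(h // mi_h, mi_h, w // mi_w, mi_w)
    # axes (MI row, pixel row, MI col, pixel col) -> (pixel row, pixel col, MI row, MI col)
    return grid.transpose(1, 3, 0, 2)

img = np.random.rand(128, 128)   # 16x16 microimages of 8x8 pixels each
vis = mi_to_vi(img, 8, 8)        # vis[i, j] is a 16x16 viewpoint image
```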
KEYWORDS: Video, Video coding, Video compression, Computer programming, 3D video compression, Visual system, Quantization, Copper, Visualization, Volume rendering
As an extension of High Efficiency Video Coding (HEVC), 3D-HEVC has been widely researched in recent years under the impetus of the new-generation coding standard. Compared with H.264/AVC, its compression efficiency is doubled while keeping the same video quality. However, its higher encoding complexity and longer encoding time are not negligible. To reduce the computational complexity and guarantee the subjective quality of virtual views, this paper presents a novel video coding method for 3D-HEVC based on saliency information, an important component of the Human Visual System (HVS). First, the relationship between the current coding unit and its adjacent units is used to adjust the maximum depth of each largest coding unit (LCU) and to determine the SKIP mode reasonably. Then, according to the saliency information of each frame, the texture and its corresponding depth map are divided into three regions: salient area, middle area, and non-salient area. Afterwards, different quantization parameters are assigned to the different regions to conduct low-complexity coding. Finally, the compressed video is rendered into new viewpoint videos through the renderer tool. As shown in our experiments, the proposed method saves more bit rate than other approaches and achieves up to 38% encoding time reduction without subjective quality loss in compression or rendering.
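The region-wise quantization step can be sketched as mapping block saliency to a QP offset; the thresholds and offsets below are illustrative placeholders, not the paper's tuned values:

```python
import numpy as np

def assign_qp(saliency_block, base_qp=32, t_high=0.6, t_low=0.3):
    """Map a block's mean saliency (assumed in [0, 1]) to a QP."""
    s = float(np.mean(saliency_block))
    if s >= t_high:
        return base_qp - 2        # salient area: finer quantization
    if s >= t_low:
        return base_qp            # middle area: unchanged
    return base_qp + 4            # non-salient area: coarser quantization

qp = assign_qp(np.random.rand(64, 64))
```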
KEYWORDS: 3D video compression, Video compression, Video, Laser Doppler velocimetry, 3D displays, Copper, Digital filtering, Communication engineering, Telecommunications, Data compression
Layered depth video (LDV) is a sparse representation of multiview video plus depth (MVD), which is considered a promising 3-D video format for supporting 3-D video services. This format consists of one full view and additional residual data that represent the side views. However, the amount of residual data becomes larger as the distance between the central view and the side views increases. To address this problem, a new inpainting-based residual data generation method is proposed in this paper. The inpainting-induced artifacts are then treated as new residual data, and the residual data of the two side views are merged into one buffer to further reduce the amount of data. In addition, block-wise alignment is used for higher coding efficiency, and a new compression algorithm fitted to the shape and distribution of residual data is proposed. The experiments show the high compression efficiency of the proposed method: it reduces the required bitrate by at least 30% compared to the classical LDV method, while offering similar quality of the intermediate virtual view on the terminal's display.
KEYWORDS: 3D image processing, Image compression, Statistical analysis, Video coding, 3D displays, Visualization, Error analysis, Computer programming, Prototyping, Imaging systems
Three-dimensional (3-D) holoscopic imaging, also known as integral imaging, light field imaging, or plenoptic imaging, can provide natural and fatigue-free 3-D visualization. However, a large amount of data is required to represent 3-D holoscopic content, so efficient coding schemes for this particular type of image are needed. A 3-D holoscopic image coding scheme with kernel-based minimum mean square error (MMSE) estimation is proposed. In the proposed scheme, the coding block is predicted by an MMSE estimator under statistical modeling. To obtain the signal's statistical behavior, kernel density estimation (KDE) is utilized to estimate the probability density function of the statistical model. As bandwidth estimation (BE) is a key issue in KDE, we also propose a BE method based on the kernel trick. The experimental results demonstrate that the proposed scheme achieves better rate-distortion performance and better visual rendering quality.
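For reference, the two standard ingredients the scheme builds on are the MMSE predictor (the conditional mean) and the KDE density estimate whose bandwidth h the proposed BE method selects:

```latex
% Standard definitions, in assumed notation: K is a kernel, h the bandwidth,
% and x_1, ..., x_N the observed samples.
\hat{x}_{\mathrm{MMSE}} = \mathbb{E}[x \mid y] = \int x\, p(x \mid y)\, dx,
\qquad
\hat{p}_h(x) = \frac{1}{Nh}\sum_{i=1}^{N} K\!\left(\frac{x - x_i}{h}\right)
```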
KEYWORDS: Super resolution, Edge detection, Detection and tracking algorithms, RGB color model, Autoregressive models, Distortion, Resolution enhancement technologies, Image processing, Color difference, Communication engineering
The paper presents a depth map super-resolution method whose core is a novel edge enhancement algorithm. An auto-regressive algorithm is applied to generate an initial upsampled depth map before edge enhancement. Besides the low-resolution depth map, an intensity image derived from the high-resolution color image is utilized to extract accurate depth edges, which are finally rectified by combining color, depth, and intensity information. The experimental results show that our approach is able to recover high-resolution (HR) depth maps with high quality. Moreover, in comparison with previous state-of-the-art algorithms, our approach generally achieves better results.
KEYWORDS: Copper, Scalable video coding, Video coding, Computer programming, Video, Mechanical efficiency, Communication engineering, Telecommunications, Video compression, Temporal resolution
A scalable extension design is proposed for High Efficiency Video Coding (HEVC), which can provide temporal, spatial, and quality scalability. This technique achieves high coding efficiency and error resilience, but increases the computational complexity. To reduce the complexity of quality scalable video coding, this paper proposes a fast mode selection method based on the mode distribution of coding units (CUs). Experiments show that the proposed algorithm can achieve up to a 63.70% decrease in encoding time with a negligible loss of video quality.
Hole filling of depth maps is a core technology of Kinect-based visual systems. In this paper, we propose a hole filling algorithm for Kinect depth maps based on separately repairing the foreground and background. The proposed algorithm consists of two parts. First, a fast pre-processing of the Kinect depth map holes is performed: we fill the background holes of Kinect depth maps with the deepest depth image, which is constructed by combining the spatio-temporal information of the pixels in the Kinect depth map with the corresponding color information in the Kinect color image. The second step is the enhancement of the pre-processed depth maps. We propose a depth enhancement algorithm based on the joint information of geometry and color. Since the geometry information is more robust than the color, we correct the depth by affine transform prior to utilizing the color cues. We then determine the filter parameters adaptively based on the local features of the color image, which solves the texture-copying problem and protects fine structures. Since L1-norm optimization is more robust to data outliers than L2-norm optimization, we force the filtered value to be the solution of an L1-norm optimization. Experimental results show that the proposed algorithm can protect the intact foreground depth, improve the accuracy of depth at object edges, and eliminate the flashing phenomenon of depth at object edges. In addition, the proposed algorithm can effectively fill the big depth map holes generated by optical reflection.
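The L1 step has a simple interpretation: for a scalar output, the minimizer of a weighted sum of absolute deviations is the weighted median, which is what gives the filter its robustness to outliers. A minimal sketch (the uniform weights are placeholders for the adaptive filter weights):

```python
import numpy as np

def weighted_median(values, weights):
    """Minimizer of sum_i w_i * |x - v_i| over scalar x."""
    order = np.argsort(values)
    v, w = np.asarray(values, float)[order], np.asarray(weights, float)[order]
    cdf = np.cumsum(w) / np.sum(w)
    return v[np.searchsorted(cdf, 0.5)]

# Neighborhood depths with one outlier: the L2 solution (weighted mean)
# drifts toward it, while the L1 solution (weighted median) does not.
depths = [1.02, 1.00, 0.98, 1.01, 5.00]
weights = [1.0, 1.0, 1.0, 1.0, 1.0]
print(np.average(depths, weights=weights))   # ~1.80  (L2 estimate)
print(weighted_median(depths, weights))      # 1.01   (L1 estimate)
```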
A new multiview just-noticeable-depth-difference (MJNDD) model is presented and applied to compress joint multiview video plus depth. Many video coding algorithms remove spatial, temporal, and statistical redundancies, but they are not capable of removing perceptual redundancies. Since the final receptor of video is the human eye, we can remove perceptual redundancy to gain higher compression efficiency according to the properties of the human visual system (HVS). The traditional just-noticeable-distortion (JND) model in the pixel domain contains luminance contrast and spatial-temporal masking effects, which describe the perceptual redundancy quantitatively. Because the HVS is very sensitive to depth information, a new MJNDD model is proposed by combining the traditional JND model with a just-noticeable-depth-difference (JNDD) model. The texture video is divided into background and foreground areas using depth information, and different JND threshold values are assigned to these two parts. The MJNDD model is then utilized to encode the texture video in JMVC. When encoding the depth video, the JNDD model is applied to remove block artifacts and protect the edges. We then use VSRS3.5 (View Synthesis Reference Software) to generate the intermediate views. Experimental results show that our model can endure more noise and that the compression efficiency is improved by 25.29 percent on average, and by 54.06 percent at most, compared to JMVC while maintaining the subjective quality. Hence it can achieve a high compression ratio and a low bit rate.
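The depth-guided threshold assignment can be sketched as giving foreground (near) blocks a stricter JND threshold than background (far) blocks; the split point and threshold values below are illustrative assumptions:

```python
import numpy as np

def jnd_threshold(depth_block, depth_split=128, t_fg=3.0, t_bg=6.0):
    """Larger threshold -> more distortion judged imperceptible -> fewer bits.
    Assumes 8-bit depth where larger values mean nearer (foreground)."""
    is_foreground = float(np.mean(depth_block)) > depth_split
    return t_fg if is_foreground else t_bg

t = jnd_threshold(np.random.randint(0, 256, (16, 16)))
```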
KEYWORDS: Distortion, Volume rendering, Image compression, Video coding, Quantization, Video, 3D video compression, 3D image processing, Video compression, Communication engineering
In multiview-plus-depth (MVD) 3-D video coding, texture maps and depth maps are coded jointly. The depth maps provide the scene geometry information and are used to render the virtual view at the terminal through a Depth-Image-Based-Rendering (DIBR) technique. The distortion of the coded texture maps and depth maps induces distortion in the synthesized virtual view. Besides the coding efficiency of texture maps and depth maps, the bit allocation between texture maps and depth maps also has a great effect on the virtual view quality. In this paper, the virtual view distortion is divided into texture-map-induced distortion and depth-map-induced distortion, and models of each are derived. Based on the depth-map-induced virtual view distortion model, the Rate Distortion Optimization (RDO) of depth map coding is modified, which increases the depth map coding efficiency. Meanwhile, we also propose a rate-distortion (R-D) model to solve the joint bit allocation problem. Experimental results demonstrate the high accuracy of the proposed virtual view distortion model. The R-D performance of the proposed algorithm is close to that of the full search algorithm, which gives the best R-D performance, while the coding complexity of the proposed algorithm is lower. Compared with a fixed texture-to-depth bit ratio (5:1), an average 0.3 dB gain can be achieved by the proposed algorithm. The proposed algorithm has high rate control accuracy, with an average error of less than 1%.
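The joint allocation problem can be pictured as a one-dimensional search over the texture/depth rate split under a total budget, using parametric R-D models. The model form and constants below are placeholders, not the models derived in the paper:

```python
import numpy as np

def virtual_view_distortion(r_texture, r_depth, a=2.0, b=1.0):
    """Toy synthesized-view distortion: sum of the two induced terms,
    each decreasing in its allocated rate (a, b are illustrative)."""
    return a / (r_texture + 1e-6) + b / (r_depth + 1e-6)

def allocate(total_rate, steps=100):
    """Pick the texture/depth split minimizing the modeled distortion."""
    ratios = np.linspace(0.05, 0.95, steps)
    costs = [virtual_view_distortion(total_rate * r, total_rate * (1 - r))
             for r in ratios]
    best = ratios[int(np.argmin(costs))]
    return total_rate * best, total_rate * (1 - best)

rt, rd = allocate(1000.0)   # texture bits, depth bits
```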
Three-dimensional (3-D) video brings people a strong visual perspective experience, but also introduces problems of large data volume and processing complexity. The depth estimation algorithm is especially complex and is an obstacle to real-time system implementation. Meanwhile, high-resolution depth maps are necessary to provide good image quality on autostereoscopic displays, which deliver stereo content without the need for 3-D glasses. This paper presents a hardware implementation of a full high-definition (HD) depth estimation system that is capable of processing full HD resolution images at a maximum speed of 125 fps with a disparity search range of 240 pixels. The proposed field-programmable gate array (FPGA)-based architecture implements a fusion-strategy matching algorithm for efficient design. The system performs with high efficiency and stability by using a full pipeline design, multiresolution processing, synchronizers that avoid clock-domain-crossing problems, efficient memory management, etc. The implementation can be included in video systems for live 3-D television applications and can be used as an independent hardware module in low-power integrated applications.
Variable transform sizes have been adopted by the emerging international standard of high-efficiency video coding (HEVC). The HEVC test model (HM) supports four different transform sizes in the range of 4×4 to 32×32. The transform size is chosen based on rate-distortion optimization techniques. Variable transform sizes have coding performance superior to the traditional fixed block-size transform, but this causes high computational complexity at the encoder. We propose a fast transform size decision algorithm incorporating three methods: (1) transform bypass based on SKIP mode detection, (2) content-based transform size decision, and (3) early termination of the variable-sized transform. The proposed algorithm first checks coding information from spatially and temporally neighboring coding units (CUs) or treeblocks and from the parent CU in the upper depth level. Based on the coding information from nearby CUs, only a small number of transform sizes are selected in the transform size decision procedure. Simulation results show that our algorithm can reduce the complexity of the transform process by more than 55% with a negligible loss of coding efficiency.
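The three methods compose into a simple decision flow; the sketch below is an assumed illustration with placeholder names, not the HM implementation:

```python
def candidate_transform_sizes(is_skip, neighbor_sizes, all_sizes=(4, 8, 16, 32)):
    """Prune the candidate set using SKIP detection and neighbor statistics."""
    if is_skip:
        return []                                     # (1) transform bypass
    if neighbor_sizes:
        lo, hi = min(neighbor_sizes), max(neighbor_sizes)
        return [s for s in all_sizes if lo <= s <= hi]  # (2) content-based pruning
    return list(all_sizes)

def choose_size(sizes, rd_cost):
    """Evaluate sizes small-to-large, stopping once cost stops improving."""
    best, best_cost = None, float("inf")
    for s in sizes:
        c = rd_cost(s)
        if c < best_cost:
            best, best_cost = s, c
        else:
            break                                     # (3) early termination
    return best

# Example: neighbors used 8 and 16, so only {8, 16} are evaluated.
size = choose_size(candidate_transform_sizes(False, [8, 16]), lambda s: abs(s - 8))
```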
In the emerging international standard for scalable video coding (SVC), an extension of H.264/AVC, a computationally expensive exhaustive mode decision is employed to select the best prediction mode for each macroblock (MB). Although this technique achieves the highest possible coding efficiency, it results in extremely large computational complexity, which obstructs SVC from practical application. We propose a fast mode decision algorithm for SVC comprising two techniques: early SKIP mode decision and adaptive early termination for mode decision. They make use of the coding information of spatially neighboring MBs in the same frame and of neighboring MBs from the base layer to terminate the mode decision procedure early. Experimental results show that the proposed fast mode decision algorithm achieves average computational savings of about 70% with almost no loss of rate-distortion performance in the enhancement layer.
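The early-SKIP test can be sketched as: if the spatial neighbors and the co-located base-layer MB all chose SKIP and the SKIP R-D cost is below an adaptive threshold, stop the mode search. A minimal, assumed version:

```python
def early_skip_decision(neighbor_modes, base_layer_mode, rd_cost_skip, threshold):
    """Return True to accept SKIP immediately and bypass the exhaustive
    mode search; the threshold would be adapted from neighbor statistics."""
    all_skip = (all(m == "SKIP" for m in neighbor_modes)
                and base_layer_mode == "SKIP")
    return all_skip and rd_cost_skip < threshold

# Example: both spatial neighbors and the base-layer MB chose SKIP.
stop = early_skip_decision(["SKIP", "SKIP"], "SKIP", rd_cost_skip=12.0, threshold=20.0)
```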
KEYWORDS: Motion estimation, Video, Computer programming, Video coding, 3D video compression, Motion analysis, Cameras, Video compression, Statistical analysis, 3D displays
Multiview video coding (MVC) is an ongoing standard. In the working draft, motion estimation and disparity estimation are both employed in the encoding procedure. This achieves the highest possible coding efficiency, but results in extremely long encoding time, which obstructs it from practical applications. We propose a macroblock (MB) level adaptive search range algorithm utilizing inter-view correlation for motion estimation in MVC to reduce the complexity of the coder. For multiview sequences, the motion vectors of the corresponding MBs in the previously coded view are first extracted to analyze motion homogeneity. On the basis of motion homogeneity, MBs are classified into three types (in a region with homogeneous motion, with medium homogeneous motion, or with complex motion), and the search range is adaptively determined for each type of MB. Experimental results show that our algorithm saves 75% of the average computational complexity of motion estimation, with negligible loss of coding efficiency.
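The range adaptation can be sketched as classifying each MB by the spread of the co-located motion vectors in the previously coded view; the thresholds and range divisors below are illustrative:

```python
import numpy as np

def adaptive_search_range(colocated_mvs, full_range=64):
    """colocated_mvs: (k, 2) motion vectors of corresponding MBs in the
    previously coded view; their spread proxies motion homogeneity."""
    spread = float(np.max(np.std(np.asarray(colocated_mvs, dtype=float), axis=0)))
    if spread < 1.0:
        return full_range // 4    # homogeneous motion: small range
    if spread < 4.0:
        return full_range // 2    # medium homogeneous motion
    return full_range             # complex motion: full range

sr = adaptive_search_range([[1, 2], [1, 2], [2, 2]])   # -> 16
```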
KEYWORDS: Video, Video compression, Video processing, Detection and tracking algorithms, Tolerancing, Video coding, Computer programming, Feature extraction, Roads, Target detection
Moving object retrieval in the compressed domain plays an important role in many real-time applications, e.g., vehicle detection and classification. A number of retrieval techniques that operate in the compressed domain have been reported in the literature. H.264/AVC is the up-to-date video coding standard and is likely to lead to the proliferation of retrieval techniques in the compressed domain; up to now, few works on H.264/AVC compressed video have been reported. Compared with the MPEG standards, H.264/AVC employs several new coding block types and a different entropy coding method, which make moving object retrieval in H.264/AVC compressed video a new and challenging task. In this paper, an approach to extract and retrieve moving traffic objects in H.264/AVC compressed video is proposed. Our algorithm first interpolates the sparse motion vector field of P-frames, which is composed of 4×4, 4×8, 8×4, and other block sizes. After forward-projecting each P-frame vector to the immediately adjacent I-frame and calculating the DCT coefficients of the I-frame using spatial intra-prediction information, the method extracts moving VOPs (video object planes) using an iterative 4×4 block classification process. In vehicle detection applications, a segmented VOP at 4×4 block-level accuracy is insufficient. Once we locate the target VOP, the actual edges of the VOP at 4×4 block accuracy can be extracted by applying Canny edge detection only on the moving VOP at 4×4 block accuracy. The VOP at pixel accuracy is then achieved by decompressing the DCT blocks of the VOPs, and an edge-tracking algorithm is applied to find the missing edge pixels. After the segmentation process, a retrieval algorithm based on CSS (Curvature Scale Space) is used to search for the vehicle shape of interest in the H.264/AVC compressed video sequence. Experiments show that our algorithm can extract and retrieve moving vehicles efficiently and robustly.
How to transmit video streams over the Internet and wireless networks is a hot focus of current research in video standards. One of the key methods is FGS (Fine Granularity Scalability), supported by MPEG-4, which can always adapt to varying network bandwidth at some cost in coding efficiency. An object-based video coding algorithm was first included in the MPEG-4 standard and can be applied to interactive video. However, real-time segmentation of VOPs (video object planes) is difficult, which limits the application of the MPEG-4 standard to interactive video. H.264/AVC is the up-to-date video coding standard, which enhances compression performance and provides a network-friendly video representation. In this paper, we propose a new Object-Based FGS (OBFGS) coding algorithm embedded in H.264/AVC that differs from that in MPEG-4. After optimizing the algorithms for the H.264 encoder, the FGS first finishes base-layer coding and then extracts moving VOPs using the base-layer information of motion vectors and DCT coefficients. The sparse motion vector field of P-frames, composed of 4×4, 4×8, and 8×4 blocks in the base layer, is interpolated, and the DCT coefficients of I-frames are calculated using spatial intra-prediction information. After forward-projecting each P-frame vector to the immediately adjacent I-frame, the method extracts moving VOPs using a recursive 4×4 block classification process. Only the blocks that belong to the moving VOP at 4×4 block-level accuracy are coded to produce the enhancement-layer stream. Experimental results show that the proposed system can obtain high quality for the VOPs of interest at the cost of some coding efficiency.
Context-based Adaptive Binary Arithmetic Coding (CABAC) is a new entropy coding method introduced in H.264/AVC that is highly efficient in video coding. In this method, the probability of the current symbol is estimated using a carefully designed context model, which is adaptive and can approach the true statistical characteristics. An arithmetic coding mechanism then largely reduces the inter-symbol redundancy. Compared with the UVLC method in the prior standard, CABAC is complicated but efficiently reduces the bit rate. Based on a thorough analysis of the coding and decoding methods of CABAC, this paper proposes two methods, a sub-table method and a stream-reuse method, to improve encoding efficiency, implemented in the H.264 JM code. In JM, the CABAC function produces the bits of every syntactic element one by one, and the repeated multiplication operations in the CABAC function make it inefficient; the proposed algorithm creates tables beforehand and then produces all bits of a syntactic element. Also in JM, the intra-prediction and inter-prediction mode selection algorithm, with its different criteria, is based on an RDO (rate distortion optimization) model, one parameter of which is the bit rate produced by the CABAC operator. After intra-prediction or inter-prediction mode selection, the CABAC stream is discarded and recalculated for the output stream. The proposed stream-reuse algorithm keeps in memory the stream created during mode selection and reuses it in the encoding function. Experimental results show that our proposed algorithm can achieve on average 17 to 78 MSEL higher speed for QCIF and CIF sequences, respectively, compared with the original JM algorithm, at the cost of only a little memory space. The CABAC was realized in our progressive H.264 encoder.
Video streaming over the Internet usually encounters bandwidth variations and packet losses, which badly impact the reconstructed video quality. Fine Granularity Scalability (FGS) provides good bit-rate adaptability to different bandwidth conditions over the Internet, due to its fine granularity and error resilience. Multiple Description Coding (MDC) is an effective solution to packet losses, but it introduces a great deal of redundant information. For an FGS video bit-stream, the base layer is usually very small and of high importance, so error-free transmission can be achieved through classical error resilience techniques. As a result, the overall streaming quality mostly depends on the enhancement layer. Moreover, it is worth noting that the different bit-planes are of different importance, which makes them suitable for an unequal protection (UEP) strategy. Therefore, a new joint MDC and UEP method is proposed in this paper to protect the enhancement layer. In the proposed method, the MDC encoder/decoder is embedded into the normal enhancement-layer encoder/decoder. Considering both the unequal importance of bit-planes and the redundancy of MDC, the two most significant bit-planes adopt the MDC-based strategy, while the remaining bit-planes are encoded only by the normal enhancement-layer coding system. Experimental results demonstrate the efficiency of our proposed method.
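The bit-plane split can be sketched as routing the most significant enhancement bit-planes through the MDC path and the rest through the normal FGS coder; the encoder callables are placeholders:

```python
def protect_bitplanes(bitplanes, mdc_encode, fgs_encode, n_protected=2):
    """bitplanes: ordered most- to least-significant. The first n_protected
    planes get MDC (two descriptions each), the rest normal FGS coding."""
    descriptions = [mdc_encode(bp) for bp in bitplanes[:n_protected]]
    stream = [fgs_encode(bp) for bp in bitplanes[n_protected:]]
    return descriptions, stream

# Toy encoders standing in for the real MDC and FGS coders.
descs, rest = protect_bitplanes(
    ["bp0", "bp1", "bp2", "bp3"],
    mdc_encode=lambda bp: (bp + "_d1", bp + "_d2"),
    fgs_encode=lambda bp: bp + "_fgs",
)
```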
KEYWORDS: Image segmentation, Video, Video surveillance, Image processing, Image processing algorithms and systems, Information fusion, Image fusion, Motion detection, Video compression, Video processing
An algorithm for object segmentation from stereo sequences based on the fusion of multiple cues (edge, disparity, motion, and color) is presented in this paper. First, an accurate disparity field is obtained using a two-level disparity matching method based on image edge information. Morphological operators are then applied to the disparity field to obtain coarse object segments. A "split and merge" process is applied to extract the object regions, and an "erosion and dilation" process is used to fill small inner holes in the target regions and smooth discontinuous regions. In parallel, spatial-temporal segments are obtained from the image edge structure and motion change detection. Different object boundaries can be articulated according to the disparity and spatial-temporal segments. Finally, the multiple objects are extracted by further fusing the color information. Experiments indicate that this algorithm is an effective method for segmenting multiple mutually overlapping objects from stereoscopic video, which is usually difficult to do with monocular video.
KEYWORDS: Edge detection, Genetic algorithms, Detection and tracking algorithms, Genetics, Nonlinear filtering, Linear filtering, Signal to noise ratio, Communication engineering, Roads, Image filtering
A new stereo matching scheme based on image edges and a genetic algorithm (GA) is presented in this paper to improve conventional stereo matching. To extract robust edge features for stereo matching, an infinite symmetric exponential filter (ISEF) is first applied to remove image noise, and a nonlinear Laplace operator together with the local variance of intensity is then used to detect edges. Apart from the detected edges, the polarity of edge pixels is also obtained. As an efficient search method, the genetic algorithm is applied to find the best matching pairs, and for this purpose some new ideas are developed for applying genetic algorithms to stereo matching. Experimental results show that the proposed methods are effective and obtain good results.
This paper presents a hybrid Bayesian approach based on MRF/GRF and active contour models for disparity estimation and segmentation using stereo images. A smooth and accurate disparity field is obtained by using hierarchical MRF and GRF models. In the disparity estimation procedure, hierarchical overlapped block matching and a fast search method are incorporated to improve precision and reduce computation. Pixel-wise refinement is then performed on the initial disparity field using edge information to get a smooth and consistent disparity field with sharp boundaries. Finally, the active contour model is used to extract the disparity contours by jointly exploiting the edge and disparity information. The resulting disparity field and corresponding contours are very useful in object-based stereo image coding and object segmentation. Experimental results illustrate the performance of the proposed algorithm.