Versatile Video Coding (VVC) is the most recent and most efficient video-compression standard of ITU-T and ISO/IEC. It follows the principle of a hybrid, block-based video codec and offers high flexibility in selecting a coded representation of a video. While encoders can exploit this flexibility for compression efficiency, designing algorithms for fast encoding becomes a challenging problem. This problem has recently been attacked with data-driven methods that train suitable neural networks to steer the encoder decisions. On the other hand, an optimized and fast VVC software implementation is provided by Fraunhofer’s Versatile Video Encoder VVenC. The goal of this paper is to investigate whether these two approaches can be combined. To this end, we exemplarily incorporate into VVenC a recent CNN-based approach that has shown its efficiency for intra-picture coding in the VVC reference software VTM. The CNN estimates parameters that restrict the multi-type tree (MTT) partitioning modes tested in rate-distortion optimization. To train the CNN, the approach considers the Lagrangian rate-distortion-time cost caused by the parameters. For the performance evaluation, we compare the five operational points reachable with the VVenC presets to operational points that we reach by using the CNN jointly with the presets. The results show that the combination of both approaches is efficient and that there is room for further improvements.
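The Lagrangian rate-distortion-time trade-off mentioned above can be illustrated with a small sketch, assuming a cost of the form J = D + λ·R + c_T·T. The function names, candidate set and weights below are invented for illustration and are not taken from the paper or from VVenC.

```python
def rdt_cost(distortion, rate, time, lmbda, time_weight):
    """Lagrangian cost J = D + lambda * R + c_T * T.

    distortion : SSE between original and reconstructed block
    rate       : bits needed to code the block
    time       : encoding time spent testing the partitioning options
    lmbda      : rate-distortion Lagrange multiplier (QP dependent)
    time_weight: trade-off weight for encoding time
    """
    return distortion + lmbda * rate + time_weight * time

def pick_restriction(candidates, lmbda, time_weight):
    """Select the MTT restriction with the smallest RDT cost.

    candidates: iterable of (name, distortion, rate, time) tuples,
    e.g. measured over a training data set.
    """
    return min(candidates,
               key=lambda c: rdt_cost(c[1], c[2], c[3], lmbda, time_weight))

# Made-up numbers: restricting the tested split modes saves encoding time
# at a small rate-distortion penalty.
candidates = [
    ("no_restriction",  1000.0, 120.0, 50.0),
    ("restrict_splits", 1050.0, 118.0, 20.0),
]
best = pick_restriction(candidates, lmbda=10.0, time_weight=5.0)
```

With this time weight, the restricted candidate wins because its time savings outweigh the small rate-distortion penalty; a time weight of zero reduces the cost to the usual rate-distortion Lagrangian.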
KEYWORDS: Computer programming, High dynamic range imaging, Video coding, Video, Video compression, Standards development, Signal processing, Classification systems
Versatile Video Coding (H.266/VVC) was standardized in July 2020, around seven years after its predecessor, High Efficiency Video Coding (H.265/HEVC). As is typical for a successor standard, VVC aims to offer 50% bitrate savings at similar visual quality, which was confirmed in official verification tests. While HEVC provided large compression-efficiency improvements over Advanced Video Coding (H.264/AVC), the fast development of the video technology ecosystem demanded more in terms of functionality. This resulted in various amendments being specified for HEVC, including screen-content, scalability and 3D-video extensions, which fragmented the HEVC market and left only the base specification widely supported across a wide range of devices. To mitigate this, the VVC standard was designed from the start with versatile use cases in mind and provides wide-spread support already in its first version. Shortly after the finalization of VVC, an open, optimized encoder implementation, VVenC, was published, aiming to provide the potential of VVC at shorter runtime than the VVC reference software VTM. VVenC also supports additional features like multi-threading, rate control and subjective quality optimizations. While the software is optimized for random-access high-resolution video encoding, it can be configured for alternative use cases. This paper discusses the performance of VVenC beyond its main use case, using different configurations and content types. Application-specific performance is also discussed. It is shown that VVenC can mostly match VTM performance with less computation, and that it provides attractive additional faster working points with bitrate-reduction tradeoffs.
The Intra Subpartition (ISP) mode is one of the intra-prediction tools incorporated into the new Versatile Video Coding (VVC) standard. ISP divides a luma intra-predicted block along one dimension into 2 or 4 smaller blocks, called subpartitions, that are predicted using the same intra mode. This paper describes the design of this tool and its encoder search implementation in the VVC Test Model 7.3 (VTM-7.3) software. The main challenge of the ISP encoder search is that the mode pre-selection based on the sum of absolute transformed differences, typically utilized for intra-prediction tools, is not feasible in the ISP case, given that it would require knowing beforehand the values of the reconstructed samples of the subpartitions. For this reason, VTM employs a different strategy aimed at overcoming this issue. The experimental tool-off tests carried out for the All Intra configuration show a gain of 0.52% for the 22-37 Quantization Parameter (QP) range with an associated encoder runtime of 85%. The results improve to a 1.06% gain and an 87% encoder runtime for the 32-47 QP range. Analogously, for the tool-on case, the results for the 22-37 QP range are a 1.17% gain and a 134% encoder runtime, improving in the 32-47 QP range to a 1.56% gain and a 126% encoder runtime.
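The split rule described above (2 or 4 subpartitions along one dimension) can be sketched as follows. The rule mirrors the VVC specification, where 4×8 and 8×4 luma blocks are divided into 2 subpartitions and all other eligible sizes into 4; the helper names are mine, not VTM's.

```python
def isp_num_subpartitions(width, height):
    """Number of ISP subpartitions for an intra luma block (VVC rule):
    4x8 and 8x4 blocks are split into 2, all other eligible sizes into 4."""
    if (width, height) in ((4, 8), (8, 4)):
        return 2
    return 4

def isp_subpartition_size(width, height, horizontal_split):
    """Dimensions of each subpartition for the chosen split direction."""
    n = isp_num_subpartitions(width, height)
    if horizontal_split:
        return width, height // n   # stack of thin horizontal subpartitions
    return width // n, height       # row of narrow vertical subpartitions
```

For example, a 16×16 block with a horizontal split yields four 16×4 subpartitions, each predicted with the same intra mode but reconstructed sequentially so that later subpartitions can use the reconstructed samples of earlier ones.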
The development of the emerging Versatile Video Coding (VVC) standard was motivated by the need for significant bit-rate reductions for natural video content as well as content for different applications, such as computer-generated screen content. The signal characteristics of screen-content video differ from those of natural content; they include sharp edges as well as flat areas of the same color. In block-based hybrid video coding designs, as employed in VVC and its predecessor standards, skipping the transform stage of the prediction residual for screen-content signals can be beneficial due to the different residual signal characteristics. In this paper, a modified transform coefficient level coding tailored for transform-skip residual signals is presented. This includes no signaling of the last significant position, a coded block flag for every subblock, modified context modeling and binarization, as well as a limit on the number of context-coded bins per sample. Experimental results show bit-rate savings of up to 3.45% and 9.55% for two different classes of screen-content test sequences coded in a random-access configuration.
This paper provides a technical overview of the most probable modes (MPM)-based multiple reference line (M-MRL) intra-picture prediction that was adopted into the Versatile Video Coding (VVC) standard draft at the 12th JVET meeting. M-MRL applies not only the nearest reference line but also farther reference lines to MPMs for intra-picture prediction. The highlighted aspects of the adopted M-MRL scheme include the signaling of the reference line index, discontinuous reference lines, the reference sample construction and prediction for farther reference lines, and the joint reference line and intra mode decisions at the encoder side. Experimental results are provided to evaluate the performance of M-MRL on top of the VVC test model VTM-2.0.1, together with an analysis of discontinuous reference lines. The presented M-MRL provides average bitrate savings of 0.5% for an all-intra and 0.2% for a random-access configuration.
Today’s hybrid video coding systems typically perform an intra-picture prediction whereby blocks of samples are predicted from previously decoded samples of the same picture. For example, HEVC uses a set of angular prediction patterns to exploit directional sample correlations. In this paper, we propose new intra-picture prediction modes whose construction consists of two steps: First, a set of features is extracted from the decoded samples. Second, these features are used to select a predefined image pattern as the prediction signal. Since several intra prediction modes are proposed for each block shape, a specific signalization scheme is also proposed. Our intra prediction modes lead to significant coding gains over state-of-the-art video coding technologies.
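The two-step construction described above can be illustrated with a toy sketch: a feature vector is extracted from the decoded reference samples, and the predefined pattern with the closest signature is selected as the prediction. The features, signatures and pattern names below are illustrative assumptions, not the ones proposed in the paper.

```python
def extract_features(left_column, top_row):
    """Step 1: reduce the decoded reference samples to a feature vector."""
    mean = (sum(left_column) + sum(top_row)) / (len(left_column) + len(top_row))
    top_grad = top_row[-1] - top_row[0]           # variation along the top row
    left_grad = left_column[-1] - left_column[0]  # variation down the left column
    return (mean, top_grad, left_grad)

def select_pattern(features, patterns):
    """Step 2: pick the predefined pattern whose signature is closest."""
    def dist(p):
        return sum((a - b) ** 2 for a, b in zip(features, p["signature"]))
    return min(patterns, key=dist)

# Toy pattern set: each entry pairs a name with the feature signature it
# is expected to match.
patterns = [
    {"name": "flat",       "signature": (128, 0, 0)},
    {"name": "vertical",   "signature": (128, 64, 0)},
    {"name": "horizontal", "signature": (128, 0, 64)},
]
feats = extract_features(left_column=[100, 120, 140, 160],
                         top_row=[130, 130, 130, 130])
chosen = select_pattern(feats, patterns)  # left column varies -> "horizontal"
```

A decoder performing the same feature extraction on the same decoded samples arrives at the same pattern, so only the mode choice itself needs to be signaled.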
KEYWORDS: Computer programming, Video coding, Video, Quantization, Standards development, Video compression, Bragg cells, 3D video compression, 3D video streaming, Spatial resolution
This work presents a performance evaluation of the current status of two distinct lines of development in future video coding technology: the so-called AV1 video codec of the industry-driven Alliance for Open Media (AOM), and the Joint Exploration Test Model (JEM), as developed and studied by the Joint Video Exploration Team (JVET) on Future Video Coding of ITU-T VCEG and ISO/IEC MPEG. As a reference, this study also includes the reference encoders of the respective starting points of development, as given by the first encoder release of AV1/VP9 for the AOM-driven technology and the HM reference encoder of the HEVC standard for the JVET activities. For a large variety of video sources ranging from UHD over HD to 360° content, the compression capability of the different video coding technologies has been evaluated by using a random-access setting along with the JVET common test conditions. As an outcome of this study, it was observed that the latest AV1 release achieved average bit-rate savings of ~17% relative to VP9, at the expense of a factor of ~117 in encoder run time. On the other hand, the latest JEM release provides an average bit-rate saving of ~30% relative to HM, with a factor of ~10.5 in encoder run time. When directly comparing AV1 and JEM, both with static quantization parameter settings, AV1 produces an average bit-rate overhead of more than 100% relative to JEM at the same objective reconstruction quality, in addition to a factor of ~2.7 in encoder run time. Even when operated in a two-pass rate-control mode, AV1 lags behind both the JEM and HM reference encoders with average bit-rate overheads of ~55% and ~9.5%, respectively, even though the latter were configured with one-pass static quantization parameter settings.
The H.265/MPEG-H High Efficiency Video Coding (HEVC) standard provides a significant increase in coding efficiency compared to its predecessor, the H.264/MPEG-4 Advanced Video Coding (AVC) standard, which, however, comes at the cost of a high computational burden for a compliant encoder. Motion estimation (ME), which is part of the inter-picture prediction process, typically consumes a large share of the computational resources while significantly increasing the coding efficiency. Although both the H.265/MPEG-H HEVC and H.264/MPEG-4 AVC standards allow processing motion information at a fractional sample level, motion search algorithms operating at the integer sample level remain an integral part of ME. In this paper, a flexible integer-sample ME framework is proposed, allowing a significant reduction of ME computation time to be traded off against a coding-efficiency penalty in terms of bit-rate overhead. As a result, through extensive experimentation, an integer-sample ME algorithm that provides a good trade-off is derived, incorporating a combination and optimization of known predictive, pattern-based and early-termination techniques. The proposed ME framework is implemented on the basis of the HEVC Test Model (HM) reference software and compared to the state-of-the-art fast search algorithm that is a native part of HM. It is observed that for high-resolution sequences, the integer-sample ME process can be sped up by factors varying from 3.2 to 7.6, resulting in bit-rate overheads of 1.5% and 0.6% for the Random Access (RA) and Low Delay P (LDP) configurations, respectively. In addition, a similar speed-up is observed for sequences with mainly Computer-Generated Imagery (CGI) content, while trading off a bit-rate overhead of up to 5.2%.
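In the spirit of the predictive, pattern-based search with early termination described above, a minimal integer-sample motion search might look as follows. The small-diamond pattern, SAD metric and termination threshold are illustrative defaults, not the parameters derived in the paper.

```python
def sad(block, ref, x, y):
    """Sum of absolute differences between block and ref at offset (x, y)."""
    return sum(abs(block[r][c] - ref[y + r][x + c])
               for r in range(len(block)) for c in range(len(block[0])))

def diamond_search(block, ref, start, early_stop=0):
    """Greedy small-diamond integer search around a predicted start position."""
    h, w, H, W = len(block), len(block[0]), len(ref), len(ref[0])
    x, y = start
    best = sad(block, ref, x, y)
    while best > early_stop:              # early termination on a good match
        bx, by, moved = x, y, False
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if nx < 0 or ny < 0 or nx + w > W or ny + h > H:
                continue                  # candidate leaves the reference area
            cost = sad(block, ref, nx, ny)
            if cost < best:
                best, bx, by, moved = cost, nx, ny, True
        if not moved:
            break                         # local minimum reached
        x, y = bx, by
    return (x, y), best
```

For a block that is actually present in the reference picture, the search converges to the true displacement with zero SAD; a good motion-vector predictor as the start position and a well-chosen early-stop threshold are what make such schemes fast in practice.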
The popularity of low-delay video applications has dramatically increased over the last years due to a rising demand for real-time video content (such as video conferencing or video surveillance), and also due to the increasing availability of relatively inexpensive heterogeneous devices (such as smartphones and tablets). To this end, this work presents a comparative assessment of the two latest video coding standards, H.265/MPEG-HEVC (High-Efficiency Video Coding) and H.264/MPEG-AVC (Advanced Video Coding), as well as of the VP9 proprietary video coding scheme. For evaluating H.264/MPEG-AVC, the open-source x264 encoder was selected, which has a multi-pass encoding mode similar to VP9. According to the experimental results, which were obtained by using similar low-delay configurations for all three examined representative encoders, H.265/MPEG-HEVC provides significant average bit-rate savings of 32.5% and 40.8% relative to VP9 and x264, respectively, for 1-pass encoding, and average bit-rate savings of 32.6% and 42.2% for 2-pass encoding. On the other hand, compared to the x264 encoder, typical low-delay encoding times of the VP9 encoder are about 2,000 times higher for 1-pass encoding and about 400 times higher for 2-pass encoding.
KEYWORDS: Scalable video coding, Video, Copper, Video coding, Computer programming, Electronic filtering, Video surveillance, Spatial resolution, Quantization, Semantic video
This paper describes an extension of the upcoming High Efficiency Video Coding (HEVC) standard for supporting spatial and quality scalable video coding. Besides scalable coding tools known from scalable profiles of prior video coding standards such as H.262/MPEG-2 Video and H.264/MPEG-4 AVC, the proposed scalable HEVC extension includes new coding tools that further improve the coding efficiency of the enhancement layer. In particular, new coding modes by which base and enhancement layer signals are combined for forming an improved enhancement layer prediction signal have been added. All scalable coding tools have been integrated in a way that the low-level syntax and decoding process of HEVC remain unchanged to a large extent. Simulation results for typical application scenarios demonstrate the effectiveness of the proposed design. For spatial and quality scalable coding with two layers, bit-rate savings of about 20-30% have been measured relative to simulcasting the layers, which corresponds to a bit-rate overhead of about 5-15% relative to single-layer coding of the enhancement layer.
The most recent video compression technology is High Efficiency Video Coding (HEVC). This soon-to-be-completed standard is a joint development of the Video Coding Experts Group (VCEG) of ITU-T and the Moving Picture Experts Group (MPEG) of ISO/IEC. As one of its major technical novelties, HEVC supports variable prediction and transform block sizes using the quadtree approach for block partitioning. In terms of entropy coding, the Draft International Standard (DIS) of HEVC specifies context-based adaptive binary arithmetic coding (CABAC) as the single mode of operation. In this paper, a description of the specific CABAC-based entropy coding part in HEVC is given that is related to block structures and transform coefficient levels. In addition, experimental results are presented that indicate the benefit of the transform-coefficient level coding design in HEVC in terms of improved coding performance and reduced complexity.
With the prospective High Efficiency Video Coding (HEVC) standard as jointly developed by ITU-T VCEG and ISO/IEC MPEG, a new step in video compression capability is achieved. Technically, HEVC is a hybrid video-coding approach using quadtree-based block partitioning together with motion-compensated prediction. Even though a high degree of adaptability is achieved by quadtree-based block partitioning, this approach is intrinsically tied to certain drawbacks which may result in redundant sets of motion parameters being transmitted. In order to remove those redundancies, a block-merging algorithm for HEVC is proposed. This algorithm generates a single motion-parameter set for a whole region of contiguous motion-compensated blocks. Simulation results show that the proposed merging technique works more efficiently than a conceptually similar direct mode.
KEYWORDS: Video, Video surveillance, Scalable video coding, Signal to noise ratio, Spatial resolution, Video coding, Computer programming, Signal processing, Quantization, Temporal resolution
The extension of H.264/AVC hybrid video coding towards scalable video coding (SVC) using motion-compensated temporal filtering (MCTF) is presented. Utilizing the lifting approach to implement MCTF, the motion-compensation features of H.264/AVC can be re-used for the MCTF prediction step and extended in a straightforward way for the MCTF update step. The MCTF extension of H.264/AVC is also incorporated into a video codec that provides SNR, spatial and (similar to hybrid video coding) temporal scalability. The paper provides a description of these techniques and presents experimental results that validate their efficiency. In addition, applications of SVC to video transmission and video surveillance are described.
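The lifting implementation of MCTF mentioned above splits each temporal decomposition into a prediction step and an update step, which guarantees perfect reconstruction by construction. The sketch below uses plain Haar lifting with the motion compensation replaced by the identity, purely for illustration; in the actual scheme, the prediction and update operators are motion-compensated.

```python
def mctf_lift(frames):
    """One temporal decomposition level over a list of equal-length frames."""
    low, high = [], []
    for k in range(0, len(frames) - 1, 2):
        even, odd = frames[k], frames[k + 1]
        h = [o - e for o, e in zip(odd, even)]     # prediction step: high-pass
        l = [e + x / 2 for e, x in zip(even, h)]   # update step: low-pass
        high.append(h)
        low.append(l)
    return low, high

def mctf_inverse(low, high):
    """Perfect reconstruction by reversing the lifting steps."""
    frames = []
    for l, h in zip(low, high):
        even = [a - x / 2 for a, x in zip(l, h)]   # undo update
        odd = [x + e for x, e in zip(h, even)]     # undo prediction
        frames.extend([even, odd])
    return frames
```

Because each lifting step is inverted exactly by subtracting what was added, any prediction and update operators (including motion-compensated ones) yield perfect reconstruction, which is what allows the H.264/AVC motion model to be reused here.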
Recently, two new international image and video coding standards have been released: the wavelet-based JPEG2000 standard designed basically for compressing still images, and H.264/AVC, the newest generic standard for video coding. As part of the JPEG2000 suite, Motion-JPEG2000 extends JPEG2000 to a range of applications originally associated with a pure video coding standard like H.264/AVC. However, currently little is known about the relative performance of Motion-JPEG2000 and H.264/AVC in terms of coding efficiency on their overlapping domain of target applications requiring the random access of individual pictures. In this paper, we report on a comparative study of the rate-distortion performance of Motion-JPEG2000 and H.264/AVC using a representative set of video material. Our experimental coding results indicate that H.264/AVC performs surprisingly well on individually coded pictures in comparison to the highly sophisticated still image compression technology of JPEG2000. In addition to the rate-distortion analysis, we also provide a brief comparison of the evaluated coding algorithms in terms of complexity and functionality.
There is a considerable amount of literature about image denoising using wavelet-based methods. Some new ideas were also reported using fractal methods. In this paper we propose a hybrid wavelet-fractal denoising method. Using a non-subsampled overcomplete wavelet transform, we present the image as a collection of translation-invariant copies in different frequency subbands. Within this multiple representation we perform a fractal coding that tries to approximate a noise-free image. The inverse wavelet transform of the fractal collage leads to the denoised image. Our results are comparable to some of the most efficient known denoising methods.
In this paper, we propose a spatially adaptive wavelet thresholding method using a context model that has been inspired by our prior work on image coding. The proposed context model relies on an estimation of the weighted variance in a local window of scale and space. Appropriately chosen weights are used to model the predominant correlations for a reliable statistical estimation. By iterating the context-based thresholding operation, a more accurate reconstruction can be achieved. Experimental results show that our proposed method yields significantly improved visual quality as well as lower mean squared error compared to the best recently published results in the denoising literature.
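The spatially adaptive thresholding idea can be sketched as follows: a variance estimate in a local window drives a per-coefficient soft threshold. The unweighted window, the BayesShrink-style threshold t = σ_n²/σ_x and the function names are illustrative assumptions of this sketch; the paper's context model uses appropriately chosen weights for the variance estimation instead.

```python
def soft_threshold(x, t):
    """Soft shrinkage: shrink the magnitude by t, preserve the sign."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0

def local_variance(coeffs, i, j, radius=1):
    """Plain (unweighted) variance estimate in a (2*radius+1)^2 window."""
    vals = [coeffs[r][c]
            for r in range(max(0, i - radius), min(len(coeffs), i + radius + 1))
            for c in range(max(0, j - radius), min(len(coeffs[0]), j + radius + 1))]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)

def adaptive_threshold(coeffs, noise_var):
    """Per-coefficient threshold t = noise_var / sigma_x (BayesShrink form)."""
    out = []
    for i, row in enumerate(coeffs):
        new_row = []
        for j, x in enumerate(row):
            sig_var = max(local_variance(coeffs, i, j) - noise_var, 1e-9)
            t = noise_var / sig_var ** 0.5
            new_row.append(soft_threshold(x, t))
        out.append(new_row)
    return out
```

Smooth regions (small signal variance) receive large thresholds and are shrunk aggressively, while edges and textures (large signal variance) receive small thresholds and are mostly preserved; iterating this operation refines the variance estimates.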
In this paper, we present a novel design of a wavelet-based video coding algorithm within a conventional hybrid framework of temporal motion-compensated prediction and transform coding. Our proposed algorithm involves the incorporation of multi-frame motion compensation as an effective means of improving the quality of the temporal prediction. In addition, we follow the rate-distortion optimizing strategy of using a Lagrangian cost function to discriminate between different decisions in the video encoding process. Finally, we demonstrate that context-based adaptive arithmetic coding is a key element for fast adaptation and high coding efficiency. The combination of overlapped block motion compensation and frame-based transform coding enables blocking-artifact-free and hence subjectively more pleasing video. In comparison with a highly optimized MPEG-4 Advanced Simple Profile coder, our proposed scheme provides significant performance gains in objective quality of 2.0-3.5 dB PSNR.
This paper describes a video coding algorithm that combines new ideas in motion estimation, wavelet filter design, and wavelet-based coding techniques. A motion compensation technique using image warping and overlapped block motion compensation is employed to reduce temporal redundancies in a given image sequence. This combined motion model has the advantage of representing more complex motion than simple block matching schemes. Spatial decorrelation of the motion-compensated residual images is performed using a one-parametric family of biorthogonal IIR wavelet filters coupled with a highly efficient pre-coding scheme. Experimental results demonstrate substantial improvements in objective quality of 1.0-2.2 dB PSNR compared to the H.263+ standard. Especially at very low bit-rates, where the reconstruction quality of block-based coders suffers from visually annoying blocking artifacts, the proposed coding scheme produces a superior subjective quality.