Diffusion models have been widely applied across artificial intelligence because of their flexible and diverse generative performance. However, little research has applied diffusion models to depth map restoration. This is primarily because diverse generation alone does not meet the requirements of depth map completion, which demands a completion that is reasonable for the current scene. To adapt diffusion models to depth map completion, this paper proposes the Multi Condition Diffusion Model (MCDM), which adds conditional information to constrain the model toward a reasonable completion of ill-regions, that is, the regions that need restoration. Its MultiConditionLN module injects multiple conditions into the depth map completion task: it uses the completion region mask and the input image as conditions to constrain the generation process. This enables the model to complete the regions that need restoration according to the scene in the input image. The proposed model achieves promising results on depth map datasets.
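The abstract does not specify how MultiConditionLN is built, so the following is only a minimal sketch of one common way to condition a normalization layer: normalize the features, then scale and shift them with values computed from each condition. The function names, the additive combination of conditions, and the weight shapes are all assumptions made for illustration, and plain Python lists stand in for the tensors a real model would use.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a feature vector to zero mean and (approximately) unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def linear(weights, cond):
    """Matrix-vector product: one output value per weight row."""
    return [sum(w * c for w, c in zip(row, cond)) for row in weights]


def multi_condition_ln(x, conditions, scale_weights, shift_weights):
    """Hypothetical condition-modulated layer norm (not the paper's definition).

    Each condition embedding (for example one for the completion mask and one
    for the input image) adds a per-feature offset to the scale and the shift.
    """
    normed = layer_norm(x)
    scale = [1.0] * len(x)  # identity scale when no condition contributes
    shift = [0.0] * len(x)
    for cond, w_scale, w_shift in zip(conditions, scale_weights, shift_weights):
        scale = [s + d for s, d in zip(scale, linear(w_scale, cond))]
        shift = [b + d for b, d in zip(shift, linear(w_shift, cond))]
    return [s * n + b for s, n, b in zip(scale, normed, shift)]


if __name__ == "__main__":
    features = [1.0, 2.0, 3.0, 4.0]
    mask_embedding = [1.0, 0.0]    # assumed 2-d embedding of the region mask
    image_embedding = [0.5, -0.5]  # assumed 2-d embedding of the input image
    w = [[0.1, 0.0] for _ in features]  # toy weights, one row per feature
    print(multi_condition_ln(features, [mask_embedding, image_embedding], [w, w], [w, w]))
```

With all weights set to zero the layer reduces to an ordinary layer norm, which makes the conditioning path easy to check in isolation.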
We present a stereo matching approach, referred to as HLocalExp-CM, that exploits hierarchical local contextual information and a confidence map based on a new grid structure. The approach preserves fine depth edges and extracts accurate disparities in weakly textured, textureless, and repeated-texture regions. It adopts a two-stage optimization strategy. In the first stage, a multiresolution cost aggregation is minimized to reduce the search space of the disparity plane of each pixel. The second stage iteratively optimizes the confidence map and a global energy function to progressively improve the disparity accuracy of each pixel. The confidence map is estimated by classifying pixels as distinctive or ambiguous according to the decreasing rate of the multiresolution cost aggregation; spatial propagation and plane refinement then update the disparity of each pixel, eliminating the ambiguity of nondistinctive pixels. The global energy function, based on a pairwise Markov random field, uses cross-scale cost aggregation to take advantage of the context information of objects in different scenarios on local grid regions, in contrast to deep learning techniques, which extract context information with convolution layers. The approach is evaluated on the Middlebury benchmark V3, where it ranks first on the “bad 2.0 all” metric, a widely used criterion for stereo evaluation, and eighth on the “bad 2.0 nonocc” metric (recorded on July 24, 2021).
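The “bad 2.0” criteria cited above can be illustrated with a small sketch. It assumes the usual reading of the metric: the percentage of evaluated pixels whose absolute disparity error exceeds 2.0 pixels, with “all” counting every pixel that has ground truth and “nonocc” counting only non-occluded pixels. The function name, the nested-list image layout, and the toy values are illustrative assumptions, not part of the benchmark's tooling.

```python
def bad_pixel_rate(pred, truth, valid, threshold=2.0):
    """Percentage of evaluated pixels with absolute disparity error above threshold.

    pred, truth: 2-d lists of disparities; valid: 2-d list of booleans that
    selects which pixels are evaluated (all pixels, or non-occluded ones only).
    """
    total = 0
    bad = 0
    for p_row, t_row, v_row in zip(pred, truth, valid):
        for p, t, v in zip(p_row, t_row, v_row):
            if not v:
                continue  # pixel excluded from this evaluation mask
            total += 1
            if abs(p - t) > threshold:
                bad += 1
    return 100.0 * bad / total if total else 0.0


if __name__ == "__main__":
    pred = [[10.0, 12.5], [20.0, 33.0]]
    truth = [[10.5, 10.0], [20.0, 30.0]]
    all_mask = [[True, True], [True, True]]
    nonocc_mask = [[True, True], [True, False]]  # bottom-right pixel treated as occluded
    print(bad_pixel_rate(pred, truth, all_mask))     # 2 of 4 pixels exceed 2.0
    print(bad_pixel_rate(pred, truth, nonocc_mask))  # 1 of 3 pixels exceeds 2.0
```

Because occluded pixels tend to be the hardest to match, a method can rank differently under the two masks, as the first and eighth places reported above show.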