Sparse-view computed tomography (CT) has been attracting attention for its reduced radiation dose and scanning time. However, analytical image reconstruction methods such as filtered back-projection (FBP) suffer from streak artifacts due to sparse-view sampling. Because the streak artifacts are deterministic errors, we argue that the same artifacts can be reasonably estimated using a prior image (i.e., a smooth image of the same patient) and known imaging system parameters. Based on this idea, we reconstruct an FBP image from sparse-view projection data, regenerate the streak artifacts by forward- and back-projecting a prior image with the same sparse views, and then subtract them from the original FBP image. For this approach to succeed, the prior image needs to be patient specific and easily obtained from the given sparse-view projection data. Therefore, we introduce a new concept of implicit neural representations for modeling attenuation coefficients. In an implicit neural representation, a neural network outputs a patient-specific attenuation coefficient value for an input pixel coordinate. In this way, the network's parameters serve as an implicit representation of a CT image. Unlike conventional deep learning approaches that utilize a large, labeled dataset, an implicit neural representation is optimized using only the sparse-view projection data of a single patient. This avoids bias toward a group of patients in the dataset and helps to properly capture the unique characteristics of the individual. We validated the proposed method using fan-beam CT simulation data of an extended cardiac-torso phantom and compared the results with total variation-based iterative reconstruction and an image-based convolutional neural network.
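As a minimal illustration of the artifact-subtraction step, the sketch below uses scikit-image's parallel-beam radon/iradon as a stand-in for the fan-beam forward projection and FBP used in the paper, and assumes a square prior image (e.g., one rendered from the fitted implicit neural representation) is already available; the function name and the parallel-beam geometry are assumptions.

```python
# Sketch of the streak-artifact subtraction idea. Parallel-beam radon/iradon
# from scikit-image stand in for the paper's fan-beam projector and FBP.
import numpy as np
from skimage.transform import radon, iradon

def subtract_streaks(sparse_sinogram, sparse_angles_deg, prior_image):
    # 1) FBP reconstruction from the sparse-view measurements (contains streaks).
    fbp_sparse = iradon(sparse_sinogram, theta=sparse_angles_deg, filter_name='ramp')
    # 2) Forward-project the smooth prior image with the same sparse views, then FBP.
    prior_sino = radon(prior_image, theta=sparse_angles_deg)
    prior_fbp = iradon(prior_sino, theta=sparse_angles_deg, filter_name='ramp')
    # 3) The deterministic sparse-view error of the prior approximates the streaks.
    streak_estimate = prior_fbp - prior_image
    # 4) Remove the estimated streaks from the measured FBP image.
    return fbp_sparse - streak_estimate
```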
In this work, we propose a non-linear observer model based on a convolutional neural network and compare its performance with that of the LG-CHO for a four-alternative forced-choice detection task using simulated breast CT images. In our network, each convolutional layer contains 3×3 filters and a leaky-ReLU activation function; unlike typical convolutional neural networks, no pooling layers are used and no zero padding is applied to the output of each convolutional layer. Network training was conducted using the Adam optimizer with two design parameters (i.e., network depth and width). The optimal values of the design parameters were found by brute-force search, spanning up to 30 layers in depth and 128 channels in width. To generate the training and validation datasets, we generated anatomical noise images using a power-law spectrum of breast anatomy. A 50% volume glandular fraction was assumed, and a 1 mm diameter signal was used for the detection task. The generated images were reconstructed using filtered back-projection with a fan-beam CT geometry, and ramp and Hanning apodization filters were used to generate different noise structures. To train our network, 125,000 signal-present images and 375,000 signal-absent images were reconstructed for each apodization filter. To measure detectability, we used percent correct computed on 4,000 images generated independently of the training and validation datasets. Our results show that the proposed network, composed of 30 layers and 64 channels, provides higher detectability than the LG-CHO. We believe that the improved detectability is achieved by the presence of the non-linear module (i.e., leaky-ReLU) in the network.
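A minimal PyTorch sketch of such an observer network (not the paper's exact configuration) is shown below: a stack of 3×3 convolutions with leaky-ReLU, no pooling and no zero padding; the 1×1-convolution plus global-average read-out producing a scalar decision variable is an illustrative assumption.

```python
# Hedged sketch of the observer network: depth and width are the two design
# parameters; no pooling and no zero padding, as described above.
import torch
import torch.nn as nn

class CNNObserver(nn.Module):
    def __init__(self, depth=30, width=64):
        super().__init__()
        layers, in_ch = [], 1
        for _ in range(depth):
            layers += [nn.Conv2d(in_ch, width, kernel_size=3, padding=0),  # no zero padding
                       nn.LeakyReLU(0.01)]
            in_ch = width
        self.features = nn.Sequential(*layers)
        self.readout = nn.Conv2d(width, 1, kernel_size=1)  # hypothetical read-out layer

    def forward(self, x):
        f = self.features(x)                      # spatial size shrinks by 2 per layer
        return self.readout(f).mean(dim=(2, 3))   # scalar decision variable per image

# Training roughly follows the description above, e.g.:
# model = CNNObserver(depth=30, width=64)
# optimizer = torch.optim.Adam(model.parameters())
```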
Convolutional neural networks (CNNs) are currently the most promising denoising method for low-dose computed tomography (CT) images. The goal of denoising is to restore original details as well as to reduce noise, and the performance is largely determined by the loss function of the CNN. In this work, we investigate the denoising performance of CNNs trained with three different loss functions on low-dose CT images: mean squared error (MSE), perceptual loss using the pretrained VGG network (VGG loss), and a weighted sum of the MSE and VGG losses (VGGMSE loss). The CNNs are trained to map quarter-dose CT images to normal-dose CT images in a supervised fashion. The image quality of the denoised images is evaluated by the normalized root mean squared error (NRMSE), structural similarity index (SSIM), mean and standard deviation (SD) of HU values, and the task SNR (tSNR) of the non-prewhitening eye filter (NPWE) observer model. Our results show that the CNN trained with the MSE loss achieves the best performance in NRMSE and SSIM despite significant image blur. On the other hand, the CNN trained with the VGG loss gives the best score in SD with well-preserved details but has the worst accuracy in the mean HU value. The CNN trained with the VGGMSE loss shows the best performance in terms of tSNR and the mean HU value and consistently high performance in the other metrics. In conclusion, the VGGMSE loss mitigates the drawbacks of the MSE and VGG losses and is therefore much more effective for CT denoising tasks.
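A hedged sketch of the combined loss is given below, using torchvision's pretrained VGG-16; the chosen feature layer (features[:16], i.e., up to relu3_3) and the weight alpha are illustrative assumptions rather than the paper's exact settings.

```python
# VGGMSE loss sketch: pixel-wise MSE plus a weighted VGG feature-space MSE.
import torch.nn as nn
from torchvision.models import vgg16, VGG16_Weights

class VGGMSELoss(nn.Module):
    def __init__(self, alpha=0.1):
        super().__init__()
        self.vgg = vgg16(weights=VGG16_Weights.DEFAULT).features[:16].eval()
        for p in self.vgg.parameters():
            p.requires_grad_(False)   # the perceptual network stays fixed
        self.alpha = alpha
        self.mse = nn.MSELoss()

    def forward(self, denoised, target):
        # VGG expects 3-channel inputs; CT slices are single channel.
        d3, t3 = denoised.repeat(1, 3, 1, 1), target.repeat(1, 3, 1, 1)
        perceptual = self.mse(self.vgg(d3), self.vgg(t3))
        return self.mse(denoised, target) + self.alpha * perceptual
```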
In recent years, the convolutional neural network (CNN) has been gaining attention as a powerful denoising tool since the pioneering work [7], which developed a 3-layer CNN. However, a 3-layer CNN may lose details or contrast after denoising due to its shallow depth. In this study, we propose a deeper, 7-layer CNN for denoising low-dose CT images. We introduce dimension shrinkage and expansion steps to control the growth of the number of parameters, and apply batch normalization to alleviate the difficulty of optimization. The network was trained and tested with Shepp-Logan phantom images reconstructed by the FBP algorithm from projection data generated in a fan-beam geometry. For the training and test sets, independently generated uniform noise at different noise levels was added to the projection data. The image quality improvement was evaluated both qualitatively and quantitatively, and the results show that the proposed CNN effectively reduces the noise without resolution loss compared to BM3D and the 3-layer CNN.
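The sketch below illustrates one way such a 7-layer network with 1×1 shrinkage/expansion layers and batch normalization could look in PyTorch; the channel counts and kernel sizes are assumptions, not the paper's configuration.

```python
# Illustrative 7-layer denoising CNN with dimension shrinkage/expansion
# (1x1 convolutions) and batch normalization.
import torch.nn as nn

def conv_bn_relu(in_ch, out_ch, k):
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, k, padding=k // 2),
                         nn.BatchNorm2d(out_ch),
                         nn.ReLU(inplace=True))

denoiser = nn.Sequential(
    conv_bn_relu(1, 64, 3),           # feature extraction
    conv_bn_relu(64, 32, 1),          # dimension shrinkage
    conv_bn_relu(32, 32, 3),
    conv_bn_relu(32, 32, 3),
    conv_bn_relu(32, 32, 3),
    conv_bn_relu(32, 64, 1),          # dimension expansion
    nn.Conv2d(64, 1, 3, padding=1),   # reconstruction of the denoised image
)
```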
We conducted a feasibility study to generate mammography images using a deep convolutional generative adversarial network (DCGAN), which directly produces realistic images without a 3-D model or any complex rendering algorithm such as ray tracing. We trained the DCGAN with 2D breast mammography images generated from anatomical noise. The generated X-ray mammography images were successful in that they preserve reasonable quality and retain visual patterns similar to the training images. In particular, the generated images share the distinctive structure of the training images. For quantitative evaluation, we used the mean and variance of the beta values of the generated images and observed that they are very similar to those of the training images. Although the overall distribution of the generated images matches that of the training images well, the DCGAN has several limitations. First, checkerboard-like artifacts are found in the generated images, a well-known issue of the deconvolution operation. Moreover, GAN training is often unstable and requires manual fine-tuning. To overcome these limitations, we plan to extend our idea to a conditional GAN approach to improve training stability and to employ an auto-encoder to handle the artifacts. To validate our idea on real data, we will train the network with clinical images. We believe that our framework can easily be extended to generate other medical images.
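For reference, a generic DCGAN-style generator (transposed convolutions with batch normalization, ReLU, and a tanh output) is sketched below; the latent dimension and layer widths are assumptions, and the transposed convolutions are the "deconvolution" operation associated with the checkerboard artifacts mentioned above.

```python
# Generic DCGAN generator sketch (not the paper's exact architecture).
import torch.nn as nn

def up_block(in_ch, out_ch):
    # Transposed convolution ("deconvolution") doubles the spatial resolution.
    return nn.Sequential(
        nn.ConvTranspose2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True))

generator = nn.Sequential(
    nn.ConvTranspose2d(100, 512, kernel_size=4, stride=1, padding=0),  # 100-d latent -> 4x4
    nn.BatchNorm2d(512), nn.ReLU(inplace=True),
    up_block(512, 256),   # 8x8
    up_block(256, 128),   # 16x16
    up_block(128, 64),    # 32x32
    nn.ConvTranspose2d(64, 1, kernel_size=4, stride=2, padding=1),     # 64x64, 1 channel
    nn.Tanh())
```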
A depth camera is widely used in various applications because it provides a depth image of the scene in real time. However, due to its limited power consumption, a depth camera exhibits severe noise and cannot provide high-quality 3D data. Although a smoothness prior is often employed to suppress the depth noise, it discards geometric details, degrading the distance resolution and hindering realism in 3D content.
In this paper, we propose a perception-based depth image enhancement technique that automatically recovers the depth details of various textures, using a statistical framework inspired by the human mechanism of perceiving surface details through texture priors. We construct a database composed of high-quality surface normals. Based on recent studies in human visual perception (HVP), we select pattern density as the primary feature for classifying textures. Based on the classification results, we match the noisy input normals to high-quality normals in the database and substitute them. As a result, our method provides a high-quality depth image that preserves surface details. We expect our work to be effective for enhancing the details of depth images from 3D sensors and for providing a high-fidelity virtual reality experience.
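A heavily simplified sketch of the matching-and-substitution step is shown below; the gradient-based stand-in for the pattern-density feature and the nearest-neighbour lookup are illustrative assumptions, not the paper's perceptual classifier.

```python
# Toy sketch: classify each noisy patch by a pattern-density-like feature and
# substitute its normals with the closest high-quality normals in the database.
import numpy as np

def pattern_density(patch):
    """Crude stand-in for the perceptual pattern-density feature."""
    gy, gx = np.gradient(patch.astype(np.float64))
    return float(np.mean(np.hypot(gx, gy)))

def enhance_normals(noisy_patches, db_features, db_normals):
    """Replace each noisy patch's normals with the best-matching database normals."""
    enhanced = []
    for patch in noisy_patches:
        f = pattern_density(patch)
        idx = int(np.argmin(np.abs(np.asarray(db_features) - f)))  # nearest neighbour
        enhanced.append(db_normals[idx])                           # substitute normals
    return enhanced
```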
This paper presents a novel approach to estimating the lighting from a pair of color and depth images of non-homogeneous objects. Existing methods can be classified into two groups depending on the lighting model: the basis model or the point light model. In general, the basis model is effective for low-frequency lighting, while the point light model is suitable for high-frequency lighting. A wavelet-based method later combined the advantages of both the basis model and the point light model. Because wavelets represent all-frequency lighting efficiently, we use them to reconstruct the lighting. However, none of the previous methods can reconstruct lighting from non-homogeneous objects. Our main contribution is to handle a non-homogeneous object by dividing it into multiple homogeneous segments. From these segments, we first initialize the material parameters and extract the lighting coefficients accordingly. We then optimize the material parameters with the estimated lighting. This iteration is repeated until the estimated lighting converges. To demonstrate the effectiveness of our method, we conduct six different experiments corresponding to different numbers, sizes, and positions of light sources. Based on this experimental study, we confirm that our algorithm is effective for identifying the light map.
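The alternating procedure can be illustrated with a toy, self-contained example in which the shading is linear in the lighting coefficients (image ≈ albedo × (T·l), with T mapping wavelet lighting coefficients to per-pixel shading) and per-segment albedos stand in for the material parameters; the model and the least-squares updates are assumptions for illustration, not the paper's solver.

```python
# Alternate between solving for lighting (materials fixed) and updating one
# albedo per homogeneous segment (lighting fixed) until the lighting converges.
import numpy as np

def alternate_lighting_materials(I, T, segments, n_iters=50, tol=1e-6):
    """I: (N,) intensities, T: (N, K) shading basis, segments: (N,) segment labels."""
    albedo = np.ones_like(I, dtype=np.float64)
    lighting = np.zeros(T.shape[1])
    for _ in range(n_iters):
        # Lighting step: least-squares fit of the lighting coefficients.
        new_lighting, *_ = np.linalg.lstsq(albedo[:, None] * T, I, rcond=None)
        shading = T @ new_lighting
        # Material step: closed-form albedo update within each homogeneous segment.
        for s in np.unique(segments):
            m = segments == s
            albedo[m] = I[m] @ shading[m] / (shading[m] @ shading[m] + 1e-12)
        if np.linalg.norm(new_lighting - lighting) < tol:  # stop when lighting converges
            break
        lighting = new_lighting
    return lighting, albedo
```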
Computed tomography (CT) is a medical imaging technology that uses computer-processed X-ray projections to acquire tomographic images, or slices, of specific organs of the body. Motion artifacts caused by patient motion are a common problem in CT systems and may introduce undesirable distortions in CT images. This paper analyzes the critical problems underlying motion artifacts and proposes a new CT system for motion artifact compensation. We employ depth cameras to capture the patient motion and account for it in the CT image reconstruction. In this way, we achieve a significant improvement in motion artifact compensation that is not possible with previous techniques.
Time-of-flight (ToF) and structured light depth cameras capture dense three-dimensional (3-D) geometry that is of great benefit for many computer vision problems. In the past couple of years, depth image based gesture recognition, 3-D reconstruction, and robot localization have received explosive interest in the literature. However, depth measurements exhibit unique systematic errors, especially when objects are specular or translucent. We present a quantitative evaluation and analysis of depth errors using both ToF and structured light depth cameras. Our evaluation framework includes a dataset of carefully captured depth images with radiometric and geometric variations of real-world objects, together with their ground-truth depth. Our analysis and experiments reveal the different characteristics of the two sensor types and indicate that obtaining high-quality depth images of real-world scenes remains a challenging, unsolved problem.
A light probe is commonly used for measuring the illumination of a real scene. Instead of using a man-made light probe such as a mirror ball, we propose to use a face in the image as a natural light probe. To that end, we construct a statistical reflectance model for faces and use this model to extract the lighting and the reflectance field of an input face. With an iterative procedure, we can obtain the lighting condition from an unknown face image. As a byproduct of this procedure, we also estimate the reflectance field of the same face. By identifying the lighting condition of the scene, we can provide an effective solution for various practical applications. First, we can insert a virtual object seamlessly into a real scene by illuminating the virtual object under the lighting present in the real scene. Second, we can relight a face under an arbitrary lighting condition using the estimated reflectance field. Third, we can swap two unknown faces using the estimates of both lighting and reflectance fields. Based on various experiments, we show that the proposed algorithm is an effective tool for many practical applications: inserting a virtual object into a real scene, face relighting, and face swapping.
In this paper, we propose a new face relighting algorithm powered by a large database of face images captured under various known lighting conditions (the Multi-PIE database). The key insight of our algorithm is that a face can be represented by an assembly of patches from many other faces. The algorithm finds the most similar face patches in the database in terms of lighting and appearance. By assembling the matched patches, we can visualize the input face under various lighting conditions. Unlike existing face relighting algorithms, we neither use any kind of face model nor make physical assumptions. Instead, our algorithm is a data-driven approach that synthesizes the appearance of an image patch using the appearance of an example patch. With this data-driven approach, we can account for various intrinsic facial features, including non-Lambertian skin properties as well as hair. Also, our algorithm is insensitive to face misalignment. We demonstrate the performance of our algorithm through face relighting and face recognition experiments. In particular, the synthesized results show that the proposed algorithm can successfully handle various intrinsic features of an input face. From the face recognition experiment, we also show that our method is comparable to the most recent face relighting work.
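A simplified, self-contained sketch of the patch-matching idea is given below; the brute-force nearest-neighbour search and the array layout (faces × lightings × patch locations × pixels) are assumptions for illustration.

```python
# For each input patch, find the most similar example patch under the input
# lighting, then substitute the corresponding patch of the same example face
# under the target lighting.
import numpy as np

def relight_patches(input_patches, db_patches, input_light, target_light):
    """
    input_patches: (P, D) flattened patches of the input face.
    db_patches:    (F, L, P, D) example faces under L known lighting conditions.
    Returns (P, D) patches approximating the input face under target_light.
    """
    relit = np.empty_like(input_patches)
    for p in range(input_patches.shape[0]):
        candidates = db_patches[:, input_light, p, :]            # same patch location
        dists = np.linalg.norm(candidates - input_patches[p], axis=1)
        best_face = int(np.argmin(dists))                        # most similar appearance
        relit[p] = db_patches[best_face, target_light, p, :]     # same face, new lighting
    return relit
```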