Compared with traditional RGB images, light field images satisfy the demand for high-dimensional information in salient object detection. In this paper, we propose a salient object detection method based on light field depth estimation. In particular, we adopt a supervised learning approach to design the light field depth estimation algorithm. Sub-aperture images along each of the four viewing directions are assembled into epipolar plane images (EPIs) that serve as inputs to a multi-stream network. The multi-stream network consists of four branches, each containing a specific number of convolutional modules that extract depth information from the EPIs of the corresponding direction. The features extracted by the four branches are fed into a merging network that computes the correlation between the different EPIs. The resulting disparity maps and the RGB images are then input together into a two-stream convolutional neural network for training. The trained model achieves highly generalized and robust light field salient object detection. Experiments on real-world light field images demonstrate the superior performance of our method in clearly delineating the boundaries of salient objects.
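To make the multi-stream design concrete, the following is a minimal PyTorch sketch of a four-branch EPI network merged into a disparity head. The branch depth, channel widths, and merge strategy are illustrative assumptions, not the authors' exact architecture.

```python
# Minimal sketch of the four-branch multi-stream idea described above.
# Branch depth, channel counts, and the merge strategy are assumptions.
import torch
import torch.nn as nn

class EPIBranch(nn.Module):
    """Extracts depth cues from the EPIs of one viewing direction."""
    def __init__(self, in_ch=3, width=32, num_blocks=3):
        super().__init__()
        layers, ch = [], in_ch
        for _ in range(num_blocks):
            layers += [nn.Conv2d(ch, width, 3, padding=1), nn.ReLU(inplace=True)]
            ch = width
        self.body = nn.Sequential(*layers)

    def forward(self, x):
        return self.body(x)

class MultiStreamEPINet(nn.Module):
    """Four EPI branches (0, 45, 90, 135 degrees) merged into a disparity map."""
    def __init__(self):
        super().__init__()
        self.branches = nn.ModuleList(EPIBranch() for _ in range(4))
        self.merge = nn.Sequential(
            nn.Conv2d(4 * 32, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, 3, padding=1),  # single-channel disparity
        )

    def forward(self, epis):
        # epis: list of four (B, 3, H, W) EPI stacks, one per direction
        feats = [b(e) for b, e in zip(self.branches, epis)]
        return self.merge(torch.cat(feats, dim=1))

# Smoke test with dummy EPIs
net = MultiStreamEPINet()
dummy = [torch.randn(1, 3, 64, 64) for _ in range(4)]
print(net(dummy).shape)  # torch.Size([1, 1, 64, 64])
```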
Light field salient object detection (SOD) is an essential research topic in computer vision, but robust saliency detection in complex scenes remains very challenging. We propose a new method for accurate and robust light field SOD via convolutional neural networks containing feature enhancement modules. First, the light field dataset is extended by geometric transformations such as stretching, cropping, flipping, and rotating. Next, two feature enhancement modules are designed to extract features from RGB images and depth maps, respectively. The resulting feature maps are fed into a two-stream network to train the light field SOD model. In this process we propose a mutual attention approach that extracts and fuses features from the RGB images and depth maps. After training, our network can therefore generate an accurate saliency map from the input light field images. The obtained saliency map can provide reliable a priori information for tasks such as semantic segmentation, target recognition, and visual tracking. Experimental results show that the proposed method achieves excellent detection performance on public benchmark datasets and outperforms state-of-the-art methods. We also verify the generalization and stability of the method in real-world experiments.
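The abstract does not specify the exact form of the mutual attention; the sketch below assumes a simple variant in which each modality gates the other with a learned spatial attention map, which is one common way to realize cross-modal fusion.

```python
# Hedged sketch of mutual attention between the RGB and depth streams.
# The exact attention form is an assumption: each modality re-weights
# the other with a 1x1-conv spatial attention map.
import torch
import torch.nn as nn

class MutualAttention(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.att_rgb = nn.Sequential(nn.Conv2d(ch, 1, 1), nn.Sigmoid())
        self.att_dep = nn.Sequential(nn.Conv2d(ch, 1, 1), nn.Sigmoid())

    def forward(self, f_rgb, f_dep):
        # Each stream is re-weighted by attention computed from the other.
        f_rgb_out = f_rgb * self.att_dep(f_dep)
        f_dep_out = f_dep * self.att_rgb(f_rgb)
        return f_rgb_out, f_dep_out

ma = MutualAttention(64)
r, d = torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32)
fr, fd = ma(r, d)
print(fr.shape, fd.shape)  # both torch.Size([1, 64, 32, 32])
```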
The reconstruction accuracy of polarization images is limited by the performance of modulators such as polarizers and retarders, and reconstruction usually requires calibration. Mathematical models are typically used to find the optimal modulation conditions and involve trade-offs between measurement time and accuracy. In this paper, we propose a full-Stokes image reconstruction network (FIRNet) to reconstruct polarization images from randomly modulated images. The network is built on a cycled pixel-to-pixel conditional adversarial network, which has clear advantages in learning the mapping from input images to output images. The network is trained with losses that exploit the physical relationships among the Stokes vector components, so our method reconstructs complete polarization information without polarimetric calibration. Simulations and experiments demonstrate the network's wavelength independence and modulation independence, proving the effectiveness and robustness of FIRNet.
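The abstract only says the losses use physical features of the Stokes vectors; one such feature is the admissibility constraint S0^2 >= S1^2 + S2^2 + S3^2 (degree of polarization at most 1). The sketch below penalizes violations of that constraint and is an assumption about one plausible physics-motivated loss term, not the paper's exact loss.

```python
# Sketch of a physics-motivated loss on reconstructed Stokes images,
# assuming the network outputs a 4-channel (S0, S1, S2, S3) map.
# Penalizing S1^2 + S2^2 + S3^2 > S0^2 enforces DoP <= 1.
import torch

def stokes_physics_loss(pred):
    # pred: (B, 4, H, W) predicted Stokes components
    s0, s1, s2, s3 = pred[:, 0], pred[:, 1], pred[:, 2], pred[:, 3]
    dop_sq = s1**2 + s2**2 + s3**2           # polarized intensity squared
    violation = torch.relu(dop_sq - s0**2)   # positive only where DoP > 1
    return violation.mean()

print(stokes_physics_loss(torch.randn(2, 4, 16, 16)))
```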
In this paper, a new method is proposed for light field SOD using convolutional neural networks. First, the light field dataset is extended by geometric transformations such as stretching, cropping, flipping, and rotating. The augmented data are then weighted against the natural data to train the light field SOD network (see the sketch below). In this process we propose a mutual attention approach that extracts and fuses features from RGB images as well as depth maps. After training, our network can therefore generate an accurate saliency map from the input light field images. The obtained saliency map can provide reliable a priori information for tasks such as semantic segmentation, target recognition, and visual tracking.
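One way to realize the weighting of augmented against natural data is weighted sampling during training. The sketch below assumes a PyTorch dataset in which the first `n_nat` items are natural images; the 2:1 weighting in favor of natural data is an illustrative assumption, not the authors' ratio.

```python
# Sketch of weighting augmented samples against natural ones during training.
# The dataset layout and the 2:1 weight ratio are assumptions.
import torch
from torch.utils.data import TensorDataset, DataLoader, WeightedRandomSampler

n_nat, n_aug = 100, 300
data = TensorDataset(torch.randn(n_nat + n_aug, 3, 64, 64))
weights = torch.cat([torch.full((n_nat,), 2.0),   # natural samples
                     torch.full((n_aug,), 1.0)])  # augmented samples
sampler = WeightedRandomSampler(weights, num_samples=len(weights))
loader = DataLoader(data, batch_size=8, sampler=sampler)
print(next(iter(loader))[0].shape)  # torch.Size([8, 3, 64, 64])
```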
Specular highlights in images are detrimental to accuracy in object recognition tasks. Prior model-based methods for single image highlight removal (SIHR) struggle with images containing large highlight regions or achromatic regions, and recent learning-based methods also perform poorly owing to the lack of suitable training datasets. We propose a network for SIHR that is trained with losses exploiting intrinsic image features and can reconstruct a smooth, natural specular-free image from a single input highlight image. The dichromatic reflection model is used to compute a pseudo specular-free image that provides complementary information to the network. A real-world dataset of highlight images with corresponding ground-truth specular-free images is collected for network training and quantitative evaluation. The proposed network is validated by comprehensive quantitative experiments and outperforms state-of-the-art highlight removal approaches in structural similarity and peak signal-to-noise ratio. Experimental results also show that the network can improve recognition performance in computer vision applications. Our source code is available at https://github.com/coach-wang/SIHRNet.
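Under the dichromatic reflection model with a white illuminant, a pseudo specular-free image is commonly obtained by subtracting each pixel's minimum color channel, which removes the achromatic specular component. Whether the paper uses exactly this min-channel variant is an assumption; the sketch shows the general construction.

```python
# Sketch of a pseudo specular-free image under the dichromatic reflection
# model: subtracting the per-pixel minimum channel removes the achromatic
# specular term (white-illuminant assumption).
import numpy as np

def pseudo_specular_free(img):
    # img: (H, W, 3) float array in [0, 1]
    min_ch = img.min(axis=2, keepdims=True)   # per-pixel minimum channel
    sf = img - min_ch                         # remove specular component
    return sf + min_ch.mean()                 # restore overall brightness

img = np.random.rand(32, 32, 3).astype(np.float32)
print(pseudo_specular_free(img).shape)  # (32, 32, 3)
```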
Owing to their low cost and easy deployment, monocular cameras have long attracted researchers' attention for depth estimation. Given the strong performance of deep learning in this task, more and more training models have emerged for depth estimation. Most existing works that achieve very promising results are supervised learning methods, but they require corresponding ground-truth depth data for training, which complicates the training process. To overcome this limitation, an unsupervised learning framework is used for monocular depth estimation from videos, consisting of a depth network and a pose network. In this paper, better results are achieved by optimizing the training models and improving the training loss. Training and evaluation use the standard KITTI dataset (Karlsruhe Institute of Technology and Toyota Technological Institute). Finally, results are presented by comparing the different training models used in this paper.
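The core unsupervised signal in such frameworks is a photometric loss between a target frame and a source frame warped into the target view using the predicted depth and pose. The sketch below omits the warping itself (grid construction from depth, pose, and camera intrinsics) and uses the common 0.85 SSIM/L1 mix, which is an assumption about this paper's exact loss.

```python
# Sketch of a standard photometric reprojection loss (SSIM + L1 mix).
# `warped` is assumed to be the source frame re-synthesized into the
# target view via predicted depth and pose.
import torch
import torch.nn.functional as F

def ssim(x, y, c1=0.01**2, c2=0.03**2):
    # Simplified SSIM with 3x3 average pooling as the local window.
    mu_x = F.avg_pool2d(x, 3, 1, 1)
    mu_y = F.avg_pool2d(y, 3, 1, 1)
    sigma_x = F.avg_pool2d(x * x, 3, 1, 1) - mu_x**2
    sigma_y = F.avg_pool2d(y * y, 3, 1, 1) - mu_y**2
    sigma_xy = F.avg_pool2d(x * y, 3, 1, 1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)
    den = (mu_x**2 + mu_y**2 + c1) * (sigma_x + sigma_y + c2)
    return (num / den).clamp(0, 1)

def photometric_loss(target, warped, alpha=0.85):
    l1 = (target - warped).abs().mean()
    ssim_term = ((1 - ssim(target, warped)) / 2).mean()
    return alpha * ssim_term + (1 - alpha) * l1

t, w = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
print(photometric_loss(t, w))
```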
Spectral confocal technology is an important three-dimensional measurement technology that is highly accurate and non-contact; however, a traditional spectral confocal system usually consists of prisms and several lenses, making it bulky and heavy. Moreover, because of the chromatic aberration of ordinary optical lenses, it is difficult to focus light perfectly over a wide bandwidth. Metasurfaces are expected to miniaturize conventional optical elements thanks to their superb ability to control the phase and amplitude of the incident wavefront at the subwavelength scale. In this paper, an efficient spectral confocal meta-lens (ESCM) working in the near-infrared spectrum (1300–2000 nm) is proposed and numerically demonstrated. The ESCM focuses incident light at focal lengths from 16.7 to 24.5 μm along a perpendicular off-axis focal plane, with the numerical aperture (NA) varying from 0.385 to 0.530. The meta-lens consists of a group of Si nanofins providing a polarization conversion efficiency larger than 50%, and the phase required for focusing is accurately rebuilt from the resonant phase, which is proportional to frequency, and the wavelength-independent geometric (Pancharatnam–Berry, PB) phase. Such dispersive components can also be used in instruments requiring dispersive devices, such as spectrometers.
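For reference, the standard on-axis focusing phase profile a metalens must impose is shown below, together with the resonant-plus-geometric decomposition stated in the abstract. The on-axis form is a simplification (the paper's design focuses off-axis), and the delay coefficient τ(r) is a hypothetical symbol for the frequency-linear part.

```latex
% Standard on-axis focusing phase at radius r = \sqrt{x^2 + y^2},
% focal length f (a simplification of the paper's off-axis geometry):
\varphi(r,\lambda) = \frac{2\pi}{\lambda}\Bigl(f - \sqrt{r^2 + f^2}\Bigr)
% Decomposition stated in the abstract: a resonant term linear in the
% angular frequency \omega = 2\pi c/\lambda plus a wavelength-independent
% geometric (PB) term; \tau(r) is a hypothetical delay coefficient:
\varphi(r,\omega) = \omega\,\tau(r) + \varphi_{\mathrm{PB}}(r)
```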
A novel method is proposed in this paper for light field depth estimation using a convolutional neural network. Many approaches to light field depth estimation have been proposed, but most of them trade accuracy against runtime. To resolve this conflict, we propose a method that obtains more accurate light field depth estimates at a faster speed. First, the light field data are augmented by the proposed method, which takes the light field geometry into account (see the sketch below). Because of the large volume of light field data, the number of images must be reduced appropriately to improve speed while maintaining the confidence of the estimation. Next, the augmented light field images are input into our network. The features extracted in this process are used to calculate disparity values. Finally, after training, our network generates an accurate depth map from the input light field image. Using this accurate depth map, the 3D structure of the real world can be faithfully reconstructed. Our method is verified on the HCI 4D Light Field Benchmark and on real-world light field images captured with a Lytro light field camera.
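Geometry-aware augmentation of a 4D light field must transform the angular and spatial axes consistently: a 90-degree spatial rotation, for example, must also rotate the angular (u, v) plane so that EPI slopes remain valid. The sketch below shows that one transform; taking this specific transform set as "considering the light field geometry" is an assumption.

```python
# Sketch of geometry-aware light field augmentation: a 90-degree rotation
# of L(u, v, x, y) must rotate spatial and angular axes together so that
# disparity structure along the EPIs is preserved.
import numpy as np

def rotate_lf_90(lf):
    # lf: (U, V, X, Y, C) light field array
    lf = np.rot90(lf, k=1, axes=(2, 3))   # rotate the spatial x-y plane
    lf = np.rot90(lf, k=1, axes=(0, 1))   # rotate the angular u-v plane to match
    return lf

lf = np.random.rand(5, 5, 32, 32, 3)
print(rotate_lf_90(lf).shape)  # (5, 5, 32, 32, 3)
```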