Multi-modal pedestrian detection, which integrates visible and thermal sensors, has been developed to overcome limitations of visible-only pedestrian detection such as poor illumination, cluttered backgrounds, and occlusion. By combining multiple modalities, pedestrians can be detected reliably even under poor visibility. Nevertheless, multi-modal pedestrian detection critically assumes that the multi-modal images are perfectly aligned, and this assumption often fails in real-world situations: the viewpoints of the different modal sensors usually differ, so the positions of pedestrians in the two modal images exhibit disparities. We propose a multi-modal Faster R-CNN specifically designed to handle misalignment between the two modalities. Faster R-CNN consists of a region proposal network (RPN) and a detector; we introduce position regressors for both modalities in both the RPN and the detector. Intersection over union (IoU) is a standard metric for object detection but is defined only for a single-modal image, so we extend it to a multi-modal IoU that evaluates localization preciseness in both modalities. Experimental results with the proposed evaluation metric demonstrate that the proposed method performs comparably to state-of-the-art methods and outperforms them on data with significant misalignment.
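The abstract does not give the exact definition of the multi-modal IoU, but one plausible formulation, sketched below, averages the per-modality IoUs so that a detection scores highly only if it localizes the pedestrian well in both the visible and the thermal image. The function names and the averaging choice are assumptions for illustration, not the paper's definition.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def multimodal_iou(pred_vis, gt_vis, pred_thr, gt_thr):
    """Assumed extension: average of the per-modality IoUs, so a
    detection is precise only if it is well localized in BOTH
    the visible and the thermal modality."""
    return 0.5 * (iou(pred_vis, gt_vis) + iou(pred_thr, gt_thr))
```

Under misalignment, the thermal box of a correct detection is shifted relative to the visible one, so a single-modality IoU can look perfect while the other modality's localization is poor; the averaged score exposes exactly that case.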
We present a gradient-domain image reconstruction framework with a chroma-preserving pixel-wise intensity-range constraint and a base-structure constraint. Existing methods for manipulating base structures and detailed textures fall into two major streams: gradient-domain methods and layer-decomposition methods. To generate detail-preserving, artifact-free output images, the proposed framework combines the benefits of both approaches by introducing the chroma-preserving intensity-range constraint and the base-structure constraint. To preserve the details of the input image, the proposed method reconstructs the output image in the gradient domain, while the intensity-range constraint guarantees that the output intensity lies within the specified range. The base-structure constraint keeps the reconstructed image close to the base structure, which is effective for restraining artifacts. Owing to the chroma-preserving pixel-wise luminance constraint, the proposed algorithm requires no post-processing such as intensity clipping or rescaling: the framework directly generates an output luminance that guarantees the output RGB intensities lie within the target intensity range while preserving the chromatic component. Experiments demonstrate that (1) the proposed framework is effective for various applications such as tone mapping, seamless image cloning, detail enhancement, and image restoration, and (2) it preserves chroma components better than existing methods.
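To make the interplay of the three ingredients concrete, here is a minimal 1-D sketch, not the paper's solver: the reconstruction minimizes a gradient-fidelity term plus a base-structure term, while a projection step enforces the intensity range after every update. The objective weights, step size, and iteration count are illustrative assumptions.

```python
def reconstruct(grad, base, lo, hi, lam=0.1, step=0.2, iters=2000):
    """1-D sketch of constrained gradient-domain reconstruction:
    minimize  sum_i ((u[i+1]-u[i]) - grad[i])^2      (gradient fidelity)
            + lam * sum_i (u[i] - base[i])^2         (base-structure term)
    with every u[i] projected into [lo, hi] after each gradient step,
    so no post-hoc clipping or rescaling is needed."""
    n = len(base)
    u = list(base)  # start from the base structure
    for _ in range(iters):
        d = [0.0] * n
        # derivative of the gradient-fidelity term
        for i in range(n - 1):
            r = (u[i + 1] - u[i]) - grad[i]
            d[i] -= 2 * r
            d[i + 1] += 2 * r
        # derivative of the base-structure term
        for i in range(n):
            d[i] += 2 * lam * (u[i] - base[i])
        # gradient step followed by projection onto the intensity range
        u = [min(hi, max(lo, u[i] - step * d[i])) for i in range(n)]
    return u
```

When the target gradients are consistent with the base structure, the solver leaves the base untouched; when they demand more range than [lo, hi] allows, the projection caps the output without a separate clipping pass.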
A far-infrared (FIR) image contains important invisible information for applications such as night vision and fire detection, while a visible image captures the colors and textures of a scene. We present a coaxial visible and FIR camera system designed to obtain the complementary information of both images simultaneously. The proposed camera system is composed of three parts: a visible camera, an FIR camera, and a beam-splitter made of silicon. FIR radiation from the scene is reflected at the beam-splitter, while visible radiation is transmitted through it. Even with this coaxial camera system, the alignment between the visible and FIR images is not perfect. Therefore, we also present a joint calibration method that simultaneously estimates accurate geometric parameters of both cameras, i.e., the intrinsic parameters of each camera and the extrinsic parameters between them. The proposed calibration method uses a novel calibration target with a two-layer structure in which each layer has a different thermal emission property. With this target, we can stably and precisely obtain the corresponding points of the checker pattern from both the visible and FIR images, so that widely used calibration tools can accurately estimate both cameras' parameters. The coaxial camera system, precisely calibrated with the two-layer target, yields aligned visible and FIR images. Experimental results demonstrate that the proposed camera system is useful for various applications such as image fusion, image denoising, and image up-sampling.
Recent developments in long-wave infrared (LWIR) devices and sensor technologies enable us to obtain LWIR images with high bit depth and a high signal-to-noise ratio. To exploit these developments, we propose a novel temperature visualization method that simultaneously represents the global distribution and the local details of the input temperature. The global temperature distribution is represented by pseudo color, while the local temperature details are manipulated by generating the output luminance through gradient-domain image reconstruction. Experimental results on real LWIR images show the effectiveness of the proposed method.
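As a minimal sketch of the pseudo-color half of this pipeline, the snippet below normalizes a temperature into [0, 1] and linearly interpolates between a few color stops. The blue-to-red palette is an assumption for illustration; the paper's actual colormap, and its gradient-domain luminance step, are not reproduced here.

```python
# Assumed 5-stop blue -> cyan -> green -> yellow -> red palette.
_STOPS = [(0.0, 0.0, 1.0), (0.0, 1.0, 1.0), (0.0, 1.0, 0.0),
          (1.0, 1.0, 0.0), (1.0, 0.0, 0.0)]

def pseudo_color(temp, t_min, t_max):
    """Map a temperature to an RGB pseudo color by normalizing to [0, 1]
    and linearly interpolating between the palette stops. This encodes
    only the GLOBAL temperature distribution; local detail would come
    from the gradient-domain luminance in the proposed method."""
    t = (temp - t_min) / (t_max - t_min)
    t = min(1.0, max(0.0, t))            # clamp out-of-range temperatures
    x = t * (len(_STOPS) - 1)
    i = min(int(x), len(_STOPS) - 2)     # index of the lower stop
    f = x - i                            # interpolation fraction
    return tuple((1 - f) * a + f * b for a, b in zip(_STOPS[i], _STOPS[i + 1]))
```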
We present an image fusion algorithm for a visible image and a near-infrared image. The proposed algorithm synthesizes a fused image that retains the high-visibility information of both images while reducing artifacts caused by geometric and illumination inconsistencies. In the proposed fusion, a high-visibility label is assigned to each pixel by global optimization based on local visibility and local inconsistency. The local visibility is evaluated using local contrast, and the inconsistency is estimated locally with a learning-based approach. The fused luminance is then constructed by Poisson image reconstruction, which preserves the gradients of the selected high-visibility areas. The proposed fusion framework supports various applications, including denoising, haze removal, and image enhancement. Experimental results show that the proposed method performs comparably to, or even better than, existing methods designed for specific applications.
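The labeling step can be sketched in simplified form: measure each pixel's local contrast in both images and pick the modality with the stronger contrast. This is a per-pixel winner-take-all stand-in; the paper additionally smooths the labels by global optimization and penalizes inconsistent regions, which this sketch omits, and the 3x3 standard-deviation contrast measure is an assumption.

```python
def local_contrast(img, i, j):
    """Local contrast as the standard deviation over a 3x3 window
    (one simple choice; the paper's exact measure may differ)."""
    h, w = len(img), len(img[0])
    vals = [img[y][x]
            for y in range(max(0, i - 1), min(h, i + 2))
            for x in range(max(0, j - 1), min(w, j + 2))]
    m = sum(vals) / len(vals)
    return (sum((v - m) ** 2 for v in vals) / len(vals)) ** 0.5

def visibility_labels(vis, nir):
    """Pixel-wise label: 0 = take the visible image, 1 = take the NIR
    image, chosen by the larger local contrast. The paper's method
    further regularizes this label map by global optimization."""
    h, w = len(vis), len(vis[0])
    return [[0 if local_contrast(vis, i, j) >= local_contrast(nir, i, j) else 1
             for j in range(w)] for i in range(h)]
```

On a flat visible patch next to a NIR edge, the labels switch to NIR exactly where the NIR image carries structure, which is the behavior the visibility term is meant to capture.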
This paper presents a novel image fusion algorithm for a visible image and a near-infrared (NIR) image. In the proposed fusion, the image is selected pixel by pixel based on local saliency, which is measured by local contrast. The gradient information of the two images is then fused, and the output image is constructed by Poisson image editing, which preserves the gradient information of both images. The effectiveness of the proposed fusion algorithm is demonstrated in various applications including denoising, dehazing, and image enhancement.
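The gradient-fusion-then-reconstruction step above can be illustrated with a 1-D sketch, not the paper's implementation: per position, keep the gradient of larger magnitude (the more salient edge), then integrate from a boundary value. In 2-D the integration step becomes solving a Poisson equation; the selection rule here is an assumed simplification of the saliency-based selection.

```python
def fuse_gradients_1d(a, b):
    """Per position, keep the gradient of larger magnitude, i.e. the
    more salient edge (a 1-D stand-in for saliency-based selection)."""
    ga = [a[i + 1] - a[i] for i in range(len(a) - 1)]
    gb = [b[i + 1] - b[i] for i in range(len(b) - 1)]
    return [x if abs(x) >= abs(y) else y for x, y in zip(ga, gb)]

def integrate(grad, start):
    """1-D 'Poisson reconstruction': cumulative sum from a boundary
    value. In 2-D this step is a Poisson equation solve, since fused
    gradients are generally not an exact gradient field."""
    out = [start]
    for g in grad:
        out.append(out[-1] + g)
    return out
```

For a visible signal with a strong step edge and a NIR signal with weak texture, the fused gradients keep the step from one input and the texture from the other, and the integrated output contains both.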