SPIE is working with SAE International to develop lidar measurement standards for active safety systems. This multi-year effort aims to develop standard tests to measure the performance of low-cost lidar sensors developed for autonomous vehicles or advanced driver assistance systems, commonly referred to as automotive lidars. SPIE is sponsoring three years of testing to support this goal. We discuss the second-year test results. In year two, we tested nine models of automotive-grade lidars, using child-size targets at short ranges and larger targets at longer ranges. We also tested the effect of high-reflectivity signs near the targets, laser safety, and atmospheric effects. We observed that point density and noise vary strongly across types of automotive lidars, depending on their scanning patterns and fields of view. In addition to measuring point density at a given range, we have begun to evaluate point density in the presence of measurement impediments, such as atmospheric absorption or scattering and highly reflective corner cubes. We saw dynamic range effects in which bright objects, such as road signs with corner cubes embedded in the paint, made it difficult to detect nearby low-reflectivity targets. Furthermore, preliminary testing showed that atmospheric extinction in a water-glycol fog chamber is comparable to natural fog conditions at ranges that are meaningful for automotive lidar, although additional characterization is required before determining general applicability. This testing also showed that laser propagation through water-glycol fog produces appreciable backscatter, which is often ignored in automotive lidar modeling. In year two, we began to measure the effect of impediments on 3D point cloud density; these measurements will be expanded in year three to include interference from other lidars.
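A minimal sketch, assuming simple Beer-Lambert extinction and illustrative (not measured) extinction coefficients, of how fog reduces the two-way transmission of a lidar pulse and hence the achievable return signal and point density at range:

```python
import numpy as np

# Hypothetical extinction coefficients (1/m); actual values depend on fog density.
sigma_clear = 0.05e-3   # ~0.05 km^-1, clear air
sigma_fog   = 20e-3     # ~20 km^-1, dense fog

def two_way_transmission(range_m, sigma):
    """Beer-Lambert two-way atmospheric transmission for a lidar pulse."""
    return np.exp(-2.0 * sigma * range_m)

for r in (10.0, 50.0, 100.0, 200.0):   # meters
    t_clear = two_way_transmission(r, sigma_clear)
    t_fog = two_way_transmission(r, sigma_fog)
    print(f"{r:6.0f} m  clear: {t_clear:.3f}  fog: {t_fog:.3f}")
```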
Wildfires are a key aspect of many ecosystems, but climate change has created conditions more conducive to devastating wildfires. It is therefore imperative that relevant agencies learn expeditiously where small fires occur. Remote sensing is a key tool for active fire detection (AFD), and satellite imagery is particularly useful because of its wide-area coverage. Semantic segmentation architectures such as U-Net have been applied to AFD and have proven very effective. In this paper, we apply a unique variant of U-Net called ResWnet to AFD, using a large global dataset. ResWnet achieved a precision of 95% and an F-score of 94.2%, outperforming a U-Net trained on the same dataset.
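For reference, the standard F-measure relates the two reported numbers; assuming the F-score quoted is the balanced F1, a precision of 0.95 and F1 of 0.942 imply a recall of roughly 0.93:

```python
def f_score(precision, recall, beta=1.0):
    """F-beta score; with beta = 1 this is the harmonic mean of precision and recall."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

print(f_score(0.95, 0.934))  # ~0.942, consistent with the reported figures
```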
KEYWORDS: Ice, Image segmentation, Education and training, Databases, Convolution, Performance modeling, Data modeling, Tunable filters, Solar radiation models, Solar radiation
With increasing global temperatures due to anthropogenic climate change, seasonal sea ice in the Arctic has experienced rapid retreat, with an increasing areal extent of meltponds forming on the surface of the retreating ice. Because meltponds have a much lower albedo than sea ice or snow, more solar radiation is absorbed by the underlying water, further accelerating the melting of sea ice. However, the dynamic nature of meltponds, which exhibit complex shapes and boundaries, makes manual analysis of their effects on underlying light and water temperatures tedious and taxing. Several classical image processing approaches have been used extensively for the detection of meltpond regions in the Arctic. We propose a Convolutional Neural Network (CNN) based multiclass segmentation model, termed NABLA-N (∇N), for automated detection and segmentation of meltponds. The architectural framework of NABLA-N consists of an encoding unit and multiple decoding units that decode from several latent spaces. The fusion of multiple feature spaces in the decoding units enables better feature representation by combining low- and high-level feature maps. The proposed model is evaluated on high-resolution aerial photographs of Arctic sea ice obtained during the Healy-Oden Trans Arctic Expedition (HOTRAX) in 2005 and on NASA’s Operation IceBridge DMS L1B Geolocated and Orthorectified image data from 2016. These images are classified into three classes: meltpond, open water, and sea ice. We determined that NABLA-N demonstrates superior performance on segmentation of meltpond data compared to other state-of-the-art networks such as UNet and Recurrent Residual UNet (R2UNet).
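A toy PyTorch sketch (not the authors' exact architecture) of the core idea of decoding from several latent spaces and fusing low- and high-level feature maps before per-pixel classification:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMultiDecoderSeg(nn.Module):
    """Toy encoder with two decoding paths fused before the classifier,
    illustrating the combination of low- and high-level feature maps."""
    def __init__(self, n_classes=3):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
        self.enc3 = nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.dec_from_e2 = nn.Conv2d(32, 16, 3, padding=1)   # decodes a shallow latent space
        self.dec_from_e3 = nn.Conv2d(64, 16, 3, padding=1)   # decodes a deep latent space
        self.head = nn.Conv2d(16 + 16 + 16, n_classes, 1)    # fuse and classify per pixel

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(e1)
        e3 = self.enc3(e2)
        d2 = F.interpolate(self.dec_from_e2(e2), size=x.shape[-2:], mode="bilinear", align_corners=False)
        d3 = F.interpolate(self.dec_from_e3(e3), size=x.shape[-2:], mode="bilinear", align_corners=False)
        fused = torch.cat([e1, d2, d3], dim=1)
        return self.head(fused)

logits = TinyMultiDecoderSeg()(torch.randn(1, 3, 64, 64))  # -> (1, 3, 64, 64)
```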
KEYWORDS: Data modeling, Diffusion, Image processing, Ice, Education and training, Colorimetry, RGB color model, Model based design, Statistical modeling, Image segmentation
As global warming drives climate change, extreme weather has become more common, posing a significant threat to life on Earth. One important indicator of climate change is the formation of melt ponds in the Arctic region. The scarcity of large amounts of annotated Arctic sea ice data is a major challenge in training a deep learning model to predict melt pond dynamics. In this work, we use a diffusion model, a class of generative models, to generate synthetic Arctic sea ice data for further analysis of meltponds. Based on the training data, diffusion models can generate new, realistic samples that are not present in the original dataset by learning the data distribution. In the forward process, the complex data distribution is gradually transformed into a simple distribution, such as a Gaussian, by adding noise through a series of steps. Once trained, the model generates new samples by starting from the simple distribution and reversing the diffusion toward the complex data distribution, capturing the underlying features of the data. During inference, conditioning information is provided as input alongside the starting noise vector, guiding the diffusion process to produce samples that adhere to the specified conditions. We used high-resolution aerial photographs of the Arctic region obtained during the Healy-Oden Trans Arctic Expedition (HOTRAX) in 2005 and NASA’s Operation IceBridge DMS L1B Geolocated and Orthorectified data acquired in 2016 for the initial training of the generative model. The original and synthetic images are assessed based on their chromatic similarity, using an evaluation metric known as the Chromatic Similarity Index (CSI).
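A minimal sketch of the standard forward diffusion (noising) step described above, using an assumed linear noise schedule; the denoising network trained to invert this process is omitted:

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)           # assumed linear noise schedule
alpha_bar = torch.cumprod(1.0 - betas, dim=0)   # cumulative product of (1 - beta_t)

def q_sample(x0, t):
    """Forward diffusion: corrupt a clean image x0 to timestep t with Gaussian noise."""
    noise = torch.randn_like(x0)
    a = alpha_bar[t].sqrt()
    s = (1.0 - alpha_bar[t]).sqrt()
    return a * x0 + s * noise, noise

x0 = torch.rand(3, 64, 64)      # stand-in for a normalized sea-ice image patch
xt, eps = q_sample(x0, t=500)   # a denoising network is trained to predict eps from xt
```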
Rapid warming in the Arctic region has amplified the albedo feedback: as ice and snow melt, the darker exposed surfaces absorb a greater amount of solar energy. This continued regional warming results in further melting of glaciers and loss of sea ice. Arctic melt ponds are important indicators of Arctic climate change. High-resolution aerial photographs are invaluable for identifying different sea ice features and are a great source for validating, tuning, and improving climate models. Due to the complex shapes and unpredictable boundaries of melt ponds, manually analyzing these remote sensing data is extremely tedious, taxing, and time-consuming, which leads to the need to automate the technique. Deep learning is a powerful tool for semantic segmentation, and one of the most popular deep learning architectures for feature cascading and effective pixel classification is the UNet architecture. We introduce an automatic and robust technique to predict the bounding boxes for melt ponds using a Multiclass Recurrent Residual UNet (R2UNet) with UNet as a base model. R2UNet mainly consists of two important architectural components, namely a residual connection and a recurrent block in each layer. The residual learning approach prevents vanishing gradients in deep networks by introducing shortcut connections, and the recurrent block, which provides a feedback connection in a loop, allows the outputs of a layer to be influenced by subsequent inputs to the same layer. The algorithm is evaluated on the Healy-Oden Trans Arctic Expedition (HO-TRAX) dataset containing melt ponds obtained during helicopter photography flights between 5 August and 30 September 2005. The testing and evaluation results show that R2UNet provides improved and superior performance when compared to UNet, Residual UNet (Res-UNet), and Recurrent U-Net (R-UNet).
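A compact PyTorch sketch, with assumed layer sizes, of the two components named above: a recurrent convolution with a feedback loop, wrapped in a residual shortcut connection:

```python
import torch
import torch.nn as nn

class RecurrentConv(nn.Module):
    """Recurrent convolution: the same conv is applied repeatedly, feeding its output back in."""
    def __init__(self, channels, steps=2):
        super().__init__()
        self.conv = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1),
                                  nn.BatchNorm2d(channels), nn.ReLU())
        self.steps = steps

    def forward(self, x):
        out = self.conv(x)
        for _ in range(self.steps):
            out = self.conv(x + out)    # feedback: previous output influences the next pass
        return out

class RecurrentResidualBlock(nn.Module):
    """R2-style block: two recurrent convs wrapped in a residual (shortcut) connection."""
    def __init__(self, in_ch, out_ch, steps=2):
        super().__init__()
        self.project = nn.Conv2d(in_ch, out_ch, 1)
        self.body = nn.Sequential(RecurrentConv(out_ch, steps), RecurrentConv(out_ch, steps))

    def forward(self, x):
        x = self.project(x)
        return x + self.body(x)         # shortcut helps prevent vanishing gradients

y = RecurrentResidualBlock(3, 16)(torch.randn(1, 3, 32, 32))  # -> (1, 16, 32, 32)
```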
The use of deep learning is particularly effective for biomedical applications involving semantic segmentation. One of the most popular deep learning architectures for semantic segmentation is U-Net, which is specifically designed for feature cascading for pixel classification. Several versions of U-Net, such as Residual U-Net (ResU-Net), Recurrent U-Net (RU-Net), and Recurrent Residual U-Net (R2U-Net), have been proposed for improved performance. The recurrent connection in a layer of the neural network creates a cycle that transfers the output information of a layer back to itself as an input, so each layer's output responses can be thought of as additional input variables. The new model is based on the Residues in Succession U-Net, in which residues from successive layers extract reinforced information from the previous layers in addition to the recurrent feedback loop, exhibiting several advantages. The improved learning and accumulation of features in subsequent layers play a major part: the proposed model precisely extracts and accumulates features from each layer, reinforcing the learning, and the combination of recurrent connections and residues in successive layers ensures better feature representation for segmentation tasks. We use a benchmark expert-annotated dataset, the Structured Analysis of Retina (STARE) dataset, to measure the ability of the Residues in Succession Recurrent U-Net (RSR U-Net) to segment blood vessels in retinal images. The testing and evaluation results show that the new model provides improved performance when compared to U-Net, R2U-Net, and the Residues in Succession U-Net in the same experimental setup.
Automatic ship detection against complex backgrounds, during both day and night, in infrared images is an important task. Additionally, we want the capability to detect ships of various scales, orientations, and shapes. In this paper, we propose the use of neural network technology for this purpose. The algorithm used for this task is the Deep Neural Machine (DNM), which contains three parts (backbone, neck, and head). Combining these three parts, the algorithm extracts features, creates prediction layers using different scales of the backbone, and gives object predictions at different scales. The experimental results show that our algorithm is robust and efficient in detecting ships against complex backgrounds.
This Conference Presentation, “Learning classical image registration features using a deep learning architecture,” was recorded at SPIE Photonics West held in San Francisco, California, United States.
This Conference Presentation, “Towards a deep-learning aided point cloud labeling suite,” was recorded at SPIE Photonics West held in San Francisco, California, United States.
Multi-object tracking in wide-area motion imagery (WAMI) has generated great interest in the field of image processing and leads to numerous real-world applications. Among them, aircraft and unmanned aerial vehicles (UAVs) equipped with real-time, robust visual trackers for long-term aerial maneuvering are currently attracting attention and have remarkably broadened the scope of applications of object tracking. In this paper, we present a novel attention-based feature fusion strategy, which effectively combines the template and search-region features. Our results demonstrate the efficacy of the proposed system on the CLIF and UNICORN datasets.
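A small PyTorch sketch of attention-based fusion of template and search-region features; the module and tensor shapes are illustrative assumptions, not the paper's exact design:

```python
import torch
import torch.nn as nn

class TemplateSearchFusion(nn.Module):
    """Toy cross-attention fusion: search-region features attend to template features,
    so the fused map highlights locations that match the tracked object."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, search_feats, template_feats):
        # search_feats: (B, N_search, dim); template_feats: (B, N_template, dim)
        fused, _ = self.attn(query=search_feats, key=template_feats, value=template_feats)
        return self.norm(search_feats + fused)   # residual keeps the original search context

out = TemplateSearchFusion()(torch.randn(2, 256, 64), torch.randn(2, 64, 64))  # -> (2, 256, 64)
```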
We propose a modified U-Net architecture that incorporates the residues from successive layers into the extraction of features in subsequent layers. The new Residues in Succession U-Net model is evaluated for blood vessel segmentation in retinal images on a benchmark expert-annotated dataset, the Structured Analysis of Retina (STARE). The testing and evaluation results show improved performance when compared to U-Net and R2U-Net in the same experimental setup. A nonlinear image enhancement strategy is employed to improve the fine details in the images so that the network can capture more information in further processing.
This Conference Presentation, “Deep neural machine for multimodal information fusion,” was recorded at SPIE Photonics West held in San Francisco, California, United States.
With the emergence of advanced 2D and 3D sensors, such as high-resolution visible cameras and less expensive lidar sensors, there is a need to fuse information extracted from multiple sensor modalities for accurate object detection, recognition, and tracking. To train a system with data captured by multiple sensors, the regions of interest in the data must be accurately aligned. A necessary step in this process is a fine, pixel-level registration between modalities. We propose a robust multimodal data registration strategy for automatically registering visible and lidar data captured by sensors embedded in aerial vehicles. The coarse registration of the data is performed by utilizing metadata, such as timestamps, GPS, and IMU information, provided by the data acquisition systems. The challenge is that these modalities contain very different sets of information and cannot be aligned using classical methods. Our proposed fine registration mechanism employs deep-learning methodologies to extract features from the data in each modality. For our experiments, we use a 3D geopositioned aerial lidar dataset along with the (coarsely registered) visible data and extract SIFT-like features from both data streams. These SIFT-like features are generated by appropriately trained deep-learning algorithms.
Point cloud completion aims to infer the missing regions of an incomplete point cloud. Like image inpainting in the 2D domain, point cloud completion offers a way to recreate an entire point cloud given only a subset of the information. However, current applications study only synthetic datasets with artificial point removal, such as the Completion3D dataset. Although these datasets are valuable, they pose an artificial problem that does not transfer to real-world data. This paper draws a parallel between point cloud completion and occlusion reduction in aerial lidar scenes. We propose a crucial change to the hierarchical sampling, using self-organizing maps to propose new points that represent the scene at a reduced resolution. These new points are a weighted combination of the original set using spatial and feature information; proposing a new set of points is more powerful than simply sampling existing points. We demonstrate this sampling technique by replacing the farthest point sampling in the Skip-Attention Network with Hierarchical Folding (SA-Net) and show a significant increase in the overall results using the Chamfer distance as our metric. We also show that this sampling method can be used in the context of any technique that uses farthest point sampling.
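For contrast with the proposed SOM-based point proposal, here is a minimal NumPy implementation of the farthest point sampling baseline that the paper replaces:

```python
import numpy as np

def farthest_point_sampling(points, k):
    """Baseline FPS: iteratively pick the point farthest from the already-selected set.
    The paper replaces this with SOM-derived proposal points; FPS is shown for contrast."""
    n = points.shape[0]
    selected = [np.random.randint(n)]
    dist = np.full(n, np.inf)
    for _ in range(k - 1):
        dist = np.minimum(dist, np.linalg.norm(points - points[selected[-1]], axis=1))
        selected.append(int(np.argmax(dist)))
    return points[selected]

sampled = farthest_point_sampling(np.random.rand(2048, 3), 256)  # -> (256, 3)
```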
Aerial object detection is one of the most important applications of computer vision. We propose a deep learning strategy for the detection and classification of objects on pipeline rights of way by analyzing aerial images captured by aircraft or drones. Because sufficient aerial datasets for accurately training deep learning systems are limited, an efficient methodology for object data augmentation of the training dataset is necessary to achieve robust performance in various environmental conditions. Another limitation is the computing hardware that can be installed on the aircraft, especially when it is a drone; hence, a balance between the effectiveness and efficiency of the object detector needs to be considered. We propose an efficient weighted IOU NMS (intersection-over-union non-maximum suppression) method to speed up the post-processing time and satisfy the onboard processing requirement. Weighted IOU NMS utilizes the confidence scores of all proposed bounding boxes to regenerate a mean box in parallel. It processes the bounding box scores at the same instant without removing bounding boxes or decreasing their scores. We perform both quantitative and qualitative evaluations of our network architecture on multiple aerial datasets. The experimental results show that our proposed framework achieves better accuracy than state-of-the-art methods for aerial object detection in various environmental conditions.
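A hedged NumPy sketch of the weighted-box idea described above, in which clusters of overlapping boxes are replaced by a confidence-weighted mean box; the exact weighted IOU NMS formulation in the paper may differ:

```python
import numpy as np

def iou(a, b):
    """IoU between one box a and an array of boxes b; boxes are [x1, y1, x2, y2]."""
    x1 = np.maximum(a[0], b[:, 0]); y1 = np.maximum(a[1], b[:, 1])
    x2 = np.minimum(a[2], b[:, 2]); y2 = np.minimum(a[3], b[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a + area_b - inter + 1e-9)

def weighted_box_fusion(boxes, scores, iou_thr=0.5):
    """Replace each cluster of overlapping boxes with a confidence-weighted mean box."""
    order = np.argsort(scores)[::-1]
    boxes, scores = boxes[order], scores[order]
    used = np.zeros(len(boxes), dtype=bool)
    fused = []
    for i in range(len(boxes)):
        if used[i]:
            continue
        cluster = (iou(boxes[i], boxes) >= iou_thr) & ~used
        used |= cluster
        w = scores[cluster][:, None]
        fused.append((np.sum(w * boxes[cluster], axis=0) / np.sum(w), scores[cluster].max()))
    return fused

boxes = np.array([[10, 10, 50, 50], [12, 11, 52, 49], [200, 200, 240, 240]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(weighted_box_fusion(boxes, scores))   # two fused boxes
```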
In the last few years, deep learning (DL) has shown superior performance in different modalities of biomedical image analysis. Several DL architectures have been proposed for classification, segmentation, and detection tasks in medical imaging and computational pathology. In this paper, we propose a new DL architecture, the NABLA-N network (∇N-Net), with better feature fusion techniques in the decoding units for dermoscopic image segmentation tasks. The ∇N-Net has several advantages for segmentation tasks. First, this model ensures better feature representation for semantic segmentation with a combination of low- to high-level feature maps. Second, this network shows better quantitative and qualitative results with the same or fewer network parameters compared to other methods. In addition, the Inception Recurrent Residual Convolutional Neural Network (IRRCNN) model is used for skin cancer classification. The proposed ∇N-Net and IRRCNN models are evaluated for skin cancer segmentation and classification on the benchmark datasets from the International Skin Imaging Collaboration 2018 (ISIC-2018). The experimental results show superior performance on segmentation tasks compared to the Recurrent Residual U-Net (R2U-Net). The classification model shows around 87% testing accuracy for dermoscopic skin cancer classification on the ISIC-2018 dataset.
In the last few years, deep learning approaches have been applied successfully to different modalities of medical imaging problems and have achieved state-of-the-art accuracy. Due to the huge volume and variety of imaging modalities, this remains a large open research area. In this paper, we apply the Inception Recurrent Residual Convolutional Neural Network (IRRCNN) model to histopathological image classification using a new publicly available dataset named KIMIA Path960. This database contains 960 histopathological images in 20 different classes (different types of tissue collected from 400 whole slide images). In this implementation, we evaluate the model on non-overlapping patches of size 64×64 pixels, and variant samples are generated from each patch with different data augmentation techniques, including rotation, shear, zooming, and horizontal and vertical flipping. The experimental results are compared against Local Binary Patterns (LBP), bag-of-visual-words (BoVW), and deep learning methods with AlexNet and VGG16 networks. The IRRCNN model shows around 98.79% testing accuracy for augmented patch-level evaluation, which is around 2.29% and 4% better than a Support Vector Machine with histogram intersection kernel (IKSVM) using BoVW and the VGG16 method, respectively. This evaluation also demonstrates that deep feature representation-based methods outperform traditional feature-based methods, including LBP and BoVW, for the histopathological image classification problem.
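A minimal sketch of patch-level augmentation with rotations and flips; shear and zoom, also used in the paper, are left to a dedicated augmentation library and omitted here:

```python
import numpy as np

def augment_patch(patch):
    """Simple augmented variants of an image patch: rotations and flips.
    (Shear and zoom would typically come from a library such as albumentations;
    they are omitted to keep this sketch minimal.)"""
    variants = [patch]
    for k in (1, 2, 3):                  # 90, 180, and 270 degree rotations
        variants.append(np.rot90(patch, k))
    variants.append(np.fliplr(patch))    # horizontal flip
    variants.append(np.flipud(patch))    # vertical flip
    return variants

variants = augment_patch(np.random.rand(64, 64, 3))   # 6 variants per 64x64 patch
```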
Semantic segmentation deep learning architectures provide impressive results in the segmentation and classification of various scenes. These convolution-based networks create deep representations for classification and have extended connected weight sets to improve the boundary characteristics of segmentation. We propose a multi-task architecture for these deep learning networks to further improve their boundary characteristics. Basic edge detection architectures are able to develop good boundaries but are unable to fully characterize the necessary boundary information from the imagery. We supplement these deep neural network architectures with specific boundary information to remove boundary features that are not indicative of the boundaries of the classified regions. We utilize standard semantic segmentation datasets, such as Cityscapes and the MIT Scene Parsing Benchmark, to test and evaluate the network architectures. When compared to the original architectures, we observe an increase in segmentation accuracy and boundary recreation using this approach. The incorporation of multi-task learning helps improve the semantic segmentation results of the deep learning architectures.
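A short sketch of a multi-task objective of the kind described above, assuming a segmentation head and an auxiliary boundary (edge) head; the weighting and loss choices are illustrative, not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def multitask_loss(seg_logits, seg_labels, edge_logits, edge_labels, edge_weight=0.5):
    """Joint objective: per-pixel classification loss plus an auxiliary boundary loss.
    The boundary head is supervised with binary edge maps derived from label boundaries."""
    seg_loss = F.cross_entropy(seg_logits, seg_labels)
    edge_loss = F.binary_cross_entropy_with_logits(edge_logits, edge_labels)
    return seg_loss + edge_weight * edge_loss

# Shapes: seg_logits (B, C, H, W), seg_labels (B, H, W) long,
#         edge_logits (B, 1, H, W), edge_labels (B, 1, H, W) float in {0, 1}.
loss = multitask_loss(torch.randn(2, 19, 32, 32), torch.randint(0, 19, (2, 32, 32)),
                      torch.randn(2, 1, 32, 32), torch.randint(0, 2, (2, 1, 32, 32)).float())
```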
Much research has been done on implementing deep learning architectures for detection and recognition tasks. Current work in auto-encoders and generative adversarial networks suggests the ability to recreate scenes based on previously trained data, and it can be assumed that the ability to recreate information implies the ability to differentiate information. We propose a convolutional auto-encoder both for recreating information from the scene and for detecting vehicles within the scene. In essence, the auto-encoder creates a low-dimensional representation of the data projected in a latent space, which can also be used for classification. The convolutional neural network is based on the concept of receptive fields created by the network, which are part of the detection process. The proposed architecture includes a discriminator network connected in the latent space, which is trained for the detection of vehicles. From work in multi-task learning, it is advantageous to learn multiple representations of the data from different tasks to help improve task performance. To test and evaluate the network, we use standard aerial vehicle datasets, such as Vehicle Detection in Aerial Imagery (VEDAI) and Columbus Large Image Format (CLIF). We observe that the neural network is able to create features representative of the data and is able to classify the imagery into vehicle and non-vehicle regions.
Many human detection algorithms are able to detect humans in various environmental conditions with high accuracy, but they rely strongly on color information, which is not robust to lighting changes and varying colors. This problem is further amplified with infrared imagery, which contains only grayscale information. The proposed algorithm for human detection uses intensity distribution, gradient, and texture features for effective detection of humans in infrared imagery. For the intensity distribution, histogram information is obtained from the grayscale channel. For extracting gradients, we utilize Histograms of Oriented Gradients, which provide better information across various lighting scenarios. For extracting texture information, the center-symmetric local binary pattern provides rotational invariance as well as lighting invariance for robust features under these conditions. Various binning strategies help keep the inherent structure embedded in the features, which provides enough information for robust detection of the humans in the scene. The features are then classified using an AdaBoost classifier to provide a tree-like structure for detection at multiple scales. The algorithm has been trained and tested on IR imagery and has been found to be fairly robust to viewpoint and lighting changes in dynamic backgrounds and visual scenes.
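A NumPy sketch of the center-symmetric local binary pattern feature mentioned above, with an assumed threshold and bin count:

```python
import numpy as np

def cs_lbp_histogram(gray, threshold=3.0, n_bins=16):
    """Center-symmetric LBP: compare the four center-symmetric pixel pairs of each
    3x3 neighborhood to form a 4-bit code, then histogram the codes (16 bins)."""
    g = gray.astype(np.float32)
    pairs = [(g[:-2, :-2],  g[2:, 2:]),     # NW vs SE
             (g[:-2, 1:-1], g[2:, 1:-1]),   # N  vs S
             (g[:-2, 2:],   g[2:, :-2]),    # NE vs SW
             (g[1:-1, 2:],  g[1:-1, :-2])]  # E  vs W
    codes = np.zeros((g.shape[0] - 2, g.shape[1] - 2), dtype=np.uint8)
    for bit, (p, q) in enumerate(pairs):
        codes |= ((p - q) > threshold).astype(np.uint8) << bit
    hist, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins))
    return hist / max(hist.sum(), 1)

feature = cs_lbp_histogram(np.random.randint(0, 256, (64, 32)))  # 16-bin texture descriptor
```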
KEYWORDS: Neural networks, Unmanned aerial vehicles, RGB color model, Video, Detection and tracking algorithms, Optical tracking, Electronic filtering, Video surveillance, Image filtering, Complex systems, Tracking and scene analysis, Real time video processing, Real time image processing
Object trackers for full-motion video (FMV) need to handle object occlusions (partial and short-term full), rotation, scaling, illumination changes, complex background variations, and perspective variations. Unlike traditional deep learning trackers that require extensive training time, the proposed Progressively Expanded Neural Network (PENNet) tracker methodology utilizes a modified variant of the extreme learning machine, which encompasses polynomial expansion and state-preserving methodologies. This significantly reduces the training time for online training of the object. The proposed algorithm is evaluated on the DARPA Video Verification of Identity (VIVID) dataset, wherein the selected high-value targets (HVTs) are vehicles.
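A minimal sketch of a basic extreme learning machine (random hidden projection plus a closed-form least-squares readout), which is the starting point of the modified variant described above; PENNet's polynomial expansion and state preservation are not reproduced here:

```python
import numpy as np

def train_elm(X, y, hidden=256, seed=0):
    """Minimal ELM: fixed random hidden layer followed by a least-squares readout."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], hidden))
    b = rng.standard_normal(hidden)
    H = np.tanh(X @ W + b)                        # random nonlinear projection
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)  # closed-form readout weights, no backprop
    return W, b, beta

def predict_elm(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

W, b, beta = train_elm(np.random.rand(100, 32), np.random.rand(100, 2))
preds = predict_elm(np.random.rand(5, 32), W, b, beta)   # -> (5, 2)
```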
Current object tracking implementations utilize different feature extraction techniques to obtain salient features for tracking objects of interest, and these features change across imaging modalities and environmental conditions. Challenges in infrared imagery for object tracking include object deformation, occlusion, background variations, and smearing, which demand high-performance algorithms. We propose the Directional Ringlet Intensity Feature Transform (DRIFT) to encompass significant levels of detail while being able to track low-resolution targets. The algorithm utilizes a weighted, circularly partitioned histogram distribution method, which outperforms regular histogram distribution matching by localizing information and exploiting the rotation invariance of the circular rings. The algorithm also utilizes directional edge information created by a Frei-Chen edge detector to improve performance in different lighting conditions. We match features using a weighted Earth Mover's Distance (EMD), which yields the specific location of the target object. The algorithm is fused with image registration, motion detection from background subtraction, and motion estimation from Kalman filtering to create robustness to camera jitter and occlusions. We find that the DRIFT algorithm performs very well under different operating conditions in IR imagery and yields better results than other state-of-the-art feature-based object trackers. The testing is done on two IR databases: a collected database of vehicle and pedestrian sequences and the Visual Object Tracking (VOT) IR database.
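A NumPy sketch of the circular-ring (ringlet) histogram idea, with assumed ring and bin counts; the weighting, directional edges, and EMD matching of the full DRIFT algorithm are omitted:

```python
import numpy as np

def ringlet_histograms(patch, n_rings=4, n_bins=16):
    """Split a square grayscale patch into concentric rings and return one intensity
    histogram per ring; the ring structure provides rotation invariance."""
    h, w = patch.shape
    yy, xx = np.mgrid[0:h, 0:w]
    r = np.hypot(yy - (h - 1) / 2.0, xx - (w - 1) / 2.0)
    edges = np.linspace(0, r.max() + 1e-6, n_rings + 1)
    feats = []
    for i in range(n_rings):
        mask = (r >= edges[i]) & (r < edges[i + 1])
        hist, _ = np.histogram(patch[mask], bins=n_bins, range=(0, 256))
        feats.append(hist / max(hist.sum(), 1))
    return np.concatenate(feats)

signature = ringlet_histograms(np.random.randint(0, 256, (32, 32)))  # 4 rings x 16 bins
```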
Object tracking in wide-area motion imagery is a complex problem that consists of object detection and target tracking over time. This challenge can be solved by human analysts, who naturally have the ability to keep track of an object in a scene. A computer vision solution for object tracking has the potential to be much faster and more efficient; however, it faces certain challenges that do not affect a human analyst. To overcome these challenges, a tracking process is proposed that is inspired by the known advantages of a human analyst. First, the focus of a human analyst is emulated by processing only the local object search area. Second, an intensity enhancement process is applied to the local area to allow features to be detected in poor lighting conditions, simulating the ability of the human eye to discern objects in complex lighting. Third, the spatial resolution of the local search area is increased to extract better features and provide more accurate feature matching. A quantitative evaluation is performed to show the tracking improvement using the proposed method. The three databases used for these evaluations, each consisting of grayscale sequences obtained from aircraft, are the Columbus Large Image Format database, the Large Area Image Recorder database, and the Sussex database.
The human brain has the capability to quickly process large quantities of data for detection and recognition tasks. These tasks are made simpler by the understanding of data, which intentionally removes redundancies found in higher-dimensional data and maps the data onto a lower-dimensional space. The brain then encodes manifolds created in these spaces, which reveal a specific state of the system. We propose to use a recurrent neural network, the nonlinear line attractor (NLA) network, to encode these manifolds as specific states, drawing untrained data towards one of the states that the NLA network has encoded. We propose a Gaussian-weighted modular architecture to reduce the computational complexity of the conventional NLA network. The proposed architecture uses a neighborhood approach for establishing the interconnectivity of neurons to obtain the manifolds. The modified NLA network has been implemented and tested on the Electro-Optic Synthetic Vehicle Model Database created by the Air Force Research Laboratory (AFRL), which contains a vast array of high-resolution imagery with several different lighting conditions and camera views. We observe that the NLA network, through its new learning strategy, is capable of representing high-dimensional data for the recognition of objects of interest. A nonlinear dimensionality reduction scheme based on singular value decomposition has been found to be very effective in providing a low-dimensional representation of the dataset. Applying the modified NLA algorithm in this reduced-dimensional space would provide faster and more accurate recognition performance for real-time applications.
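A minimal sketch of SVD-based dimensionality reduction of the kind mentioned above, projecting mean-centered data onto its top-k singular directions; the nonlinear extension used in the paper is not reproduced:

```python
import numpy as np

def svd_reduce(X, k):
    """Project mean-centered data onto its top-k singular directions,
    giving a low-dimensional representation suitable for manifold encoding."""
    mu = X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return (X - mu) @ Vt[:k].T, Vt[:k], mu        # scores, basis, mean

Z, basis, mu = svd_reduce(np.random.rand(200, 4096), k=20)   # 4096-dim images -> 20-dim
```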
A new rotation-invariant pattern recognition technique, based on the spectral fringe-adjusted joint transform correlator (SFJTC) and histogram representation, is proposed. Synthetic discriminant function (SDF) based joint transform correlation (JTC) techniques have shown attractive performance in rotation-invariant pattern recognition applications. However, when targets are present in a complex scene, SDF-based JTC techniques may produce false detections due to inaccurate estimation of the rotation angle of the object. We therefore propose an efficient rotation-invariant JTC scheme that does not require a priori rotation training of the reference image. In the proposed technique, a Vectorized Gaussian Ringlet Intensity Distribution (VGRID) descriptor is also proposed to obtain rotation-invariant features from the reference image. In this step, we divide the reference image into multiple Gaussian ringlets, extract the histogram distribution of each ringlet, and concatenate them into a vector as a target signature. Similarly, the unknown input scene is also represented by the VGRID, which produces a multidimensional input image. Finally, the concept of the SFJTC is incorporated and utilized for target detection in the input scene. The classical SFJTC was proposed for detecting very small objects involving only a few pixels in hyperspectral imagery. In our proposed algorithm, however, the SFJTC is applied to a two-dimensional image without limitation on the size of objects and, most importantly, achieves rotation-invariant target discriminability. Simulation results verify that the proposed scheme performs satisfactorily in detecting targets in the input scene irrespective of the rotation of the object.
In this paper, we evaluate the feature extraction technique of Recoursing Energy Efficiency (REE) on electroencephalograph (EEG) data for human emotion recognition. A protocol has been established to elicit five distinct emotions (joy, sadness, disgust, fear, and surprise) as well as a neutral state. EEG signals are collected using a 256-channel system, preprocessed using band-pass filters and a Laplacian montage, and decomposed into five frequency bands using the Discrete Wavelet Transform. The REE is calculated and fed to a Multi-Layer Perceptron network for classification. We compare the performance of REE features with conventional energy-based features.
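A minimal sketch, assuming the PyWavelets package and an illustrative wavelet and decomposition level, of splitting one preprocessed EEG channel into sub-bands and computing band energies, from which REE-style features could subsequently be derived:

```python
import numpy as np
import pywt  # assumes the PyWavelets package is available

def band_energies(eeg_channel, wavelet="db4", level=5):
    """Decompose one EEG channel with the discrete wavelet transform and return the
    energy in each sub-band. With a suitable sampling rate the detail levels roughly
    correspond to the gamma/beta/alpha/theta bands and the final approximation to delta."""
    coeffs = pywt.wavedec(eeg_channel, wavelet, level=level)
    return np.array([np.sum(c ** 2) for c in coeffs])

energies = band_energies(np.random.randn(2048))   # stand-in for one preprocessed channel
relative = energies / energies.sum()              # normalized band energies
```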