This PDF file contains the front matter associated with SPIE Proceedings Volume 13206, including the Title Page, Copyright information, Table of Contents, and Conference Committee information.
AI-based Localization, Fusion, and Robot Surveillance
Utilizing surveillance cameras for passive visual positioning is at the forefront of indoor security research. Various methods have significantly improved the accuracy of passive visual positioning for pedestrians. Nevertheless, the long-distance error of passive vision has recently garnered attention as a critical issue. To address this problem, we propose a novel method named BIPSG, comprising forward-propagation and backward-propagation processes. In the forward-propagation process, we propose a method for constructing a dynamic constraint region model. This method fuses passive visual positioning results with pedestrian detection box information to obtain the model required for geomagnetic positioning in backward propagation, completing the forward propagation of information from passive vision to geomagnetism. In the backward-propagation process, we first propose a connected-features-amendment-based geomagnetic positioning method that uses the constructed dynamic constraint region model as a constraint for geomagnetic matching and connected-features extraction, and subsequently amends the matching results using connected features to reduce the probability of geomagnetic mismatching. The amended geomagnetic positioning results are then fused with passive vision to reduce the long-distance error. The performance of the method was evaluated on the most common scene captured by an indoor surveillance camera. The experimental results show that our BIPSG method reduces the average positioning error by 41.67% and the root-mean-square error by 31.87% compared with the state-of-the-art method proposed in our previous research. The proposed method effectively reduces the long-distance error of passive vision, achieving outstanding positioning accuracy. Additionally, pedestrian trajectories demonstrate the stability and continuity of the positioning.
In response to the broad range of threats experienced across the battlespace, modern defense systems have trended toward high levels of interconnectedness, on the assumption that information from systems spanning numerous domains will be fused at the speed of relevance. One regime emblematic of these challenges is modern air defense, in which threats are increasing in both sophistication and number. To ensure the success of next-generation defense systems, we need solutions in which legacy and next-generation sensors coexist and cohesively integrate information across domains and sources. Neural network (NN)-based approaches have demonstrated significant capabilities in complex data processing and fusion systems; however, in the context of safety-critical defense systems, various limitations hinder their deployment. In particular, the lack of explainable outputs, the need for large amounts of data, which is typically lacking or severely limited in defense settings, and their high computational costs make NN-based solutions unsuitable. Often overlooked are more traditional and intuitive machine learning techniques such as Bayesian networks.
The attributes of Bayesian networks, such as flexibility, ease of use, lightweight computational needs, and innate explainability and reasoning capabilities, have already led to their successful application in air defense for target tracking, identification, and intent classification. These same attributes also make Bayesian networks suitable for use in high-level multi-sensor fusion. In this work we showcase the feasibility of using Bayesian networks as an extensible and dynamic multi-sensor fusion system that reasons over any number of disparate black-box approaches, as well as their utility in producing more reliable, trustworthy, and interpretable results than any individual sensor system operating independently. We also demonstrate the ability of Bayesian networks to produce compelling results without incurring the computational overhead and interpretability difficulties associated with large neural network-based approaches.
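As a minimal illustration of the kind of probabilistic fusion described in this abstract (not the authors' implementation), the following sketch combines discrete classification reports from two hypothetical black-box sensors with a hand-rolled naive-Bayes update; the class names, prior, and confusion matrices are assumed purely for illustration.

```python
import numpy as np

# Hypothetical target classes and prior belief (assumed for illustration).
classes = ["friendly", "hostile", "unknown"]
prior = np.array([0.5, 0.2, 0.3])

# Per-sensor confusion matrices P(report | true class); rows = true class.
radar_cm = np.array([[0.8, 0.1, 0.1],
                     [0.2, 0.7, 0.1],
                     [0.3, 0.3, 0.4]])
eo_cm    = np.array([[0.7, 0.1, 0.2],
                     [0.1, 0.8, 0.1],
                     [0.2, 0.2, 0.6]])

def fuse(prior, reports):
    """Naive-Bayes fusion: multiply the prior by each sensor's likelihood
    column for its reported class, then renormalize."""
    posterior = prior.copy()
    for cm, reported_idx in reports:
        posterior *= cm[:, reported_idx]
    return posterior / posterior.sum()

# Radar reports "hostile" (index 1), the EO sensor reports "unknown" (index 2).
posterior = fuse(prior, [(radar_cm, 1), (eo_cm, 2)])
for c, p in zip(classes, posterior):
    print(f"P({c} | reports) = {p:.3f}")
```

The same update can be repeated as further reports arrive, which is what makes this style of fusion extensible to an arbitrary number of sources.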
The recent advances in mobile robot surveillance have significantly enhanced supervision tasks. Unlike traditional CCTV systems, these robots improve patrolling capabilities, ensuring higher area coverage for situational awareness. Leveraging AI and Convolutional Neural Networks (CNNs), these robots demonstrate enhanced visual perception for situational awareness. However, deploying multiple task-specific CNNs poses significant computational challenges, especially due to the robots' limited hardware capacity. Additionally, defining an optimal supervision strategy is challenging. While the existing literature addresses these challenges through anomaly and scene change detection (SCD), this paper proposes a novel visual perception methodology for execution on edge hardware. The approach defines an optimal scheduling method for detecting unauthorized human and vehicle access, analyzing potential physical access threats, and performing SCD. Experimental results validate the methodology's effectiveness in improving preventive security in restricted zones.
This work presents a depth image refinement technique designed to enhance the usability of a commercial camera in underwater environments. Stereo vision-based depth cameras offer dense data that is well-suited for accurate environmental understanding. However, light attenuation in water introduces challenges such as missing regions, outliers, and noise in the captured depth images, which can degrade performance in computer vision tasks. Using the Intel RealSense D455 camera, we captured data in a controlled water tank and proposed a refinement technique leveraging the state-of-the-art Depth-Anything model. Our approach involves first capturing a depth image with the Intel RealSense camera and generating a relative depth image using the Depth-Anything model based on the recorded color image. We then apply a mapping between the Depth-Anything generated relative depth data and the RealSense depth image to produce a visually appealing and accurate depth image. Our results demonstrate that this technique enables precise depth measurement at distances of up to 1.2 meters underwater.
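A common way to realize the mapping described in this abstract is a per-frame least-squares fit of a scale and shift between the relative (Depth-Anything) depth and the valid pixels of the metric (RealSense) depth. The sketch below illustrates that idea on assumed array inputs; it is not the authors' exact procedure.

```python
import numpy as np

def align_relative_to_metric(rel_depth, metric_depth):
    """Fit metric ~= s * rel + t on pixels where the sensor depth is valid,
    then apply the fit everywhere to fill holes and suppress outliers."""
    valid = np.isfinite(metric_depth) & (metric_depth > 0)
    x = rel_depth[valid].ravel()
    y = metric_depth[valid].ravel()
    A = np.stack([x, np.ones_like(x)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, y, rcond=None)
    refined = s * rel_depth + t
    # Keep trusted sensor measurements, use the fitted depth elsewhere.
    refined[valid] = metric_depth[valid]
    return refined

# Example with synthetic stand-ins for Depth-Anything / RealSense output.
rel = np.random.rand(480, 640).astype(np.float32)
metric = 0.8 * rel + 0.3
metric[::7, ::5] = 0.0          # simulate missing depth regions
print(align_relative_to_metric(rel, metric).shape)
```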
Complex behaviours can make it difficult for human observers to maintain a coherent understanding of a high-dimensional system’s state due to the large number of degrees of freedom that have to be monitored and reasoned about. This problem can lead to cognitive overload in operators who are monitoring these systems. An example of this is the problem of observing drone swarms to determine their behaviour and infer possible goals. Generative artificial intelligence techniques, such as variational autoencoders (VAEs), can be used to assist operators in understanding these complex behaviours by reducing the dimensionality of the observations.
This paper presents a modified boid simulation that produces data that is representative of a swarm of coordinated drones. A sensor model is employed to simulate observation noise. A VAE architecture is proposed that can encode data from observations of homogeneous swarms and produce visualisations detailing the potential states of the swarm, the current state of the swarm, and the goals that these states relate to. One of the challenges addressed in this paper is the permutation variance problem of working with large datasets of points which represent interchangeable, unlabelled objects. This is addressed by the proposed VAE architecture through the use of a PointNet-inspired layer that implements a symmetric function approximation, together with a chamfer distance loss function. An ablation study for the proposed permutation invariance modifications and a sensitivity analysis focused on the algorithm’s behaviour with respect to sensor noise are presented. The use of the decoder to create goal boundaries on the visualisation, the use of the visualisation for swarm trajectories, and the explainability of the visualisation are discussed.
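The permutation-invariance ingredients named in this abstract can be sketched in a few lines of PyTorch: a shared per-point MLP followed by a symmetric max-pool, and a chamfer distance between point sets. Layer sizes and names are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn

class PointEncoder(nn.Module):
    """PointNet-style encoder: per-point MLP + symmetric max-pool."""
    def __init__(self, in_dim=2, feat_dim=64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                 nn.Linear(64, feat_dim))

    def forward(self, pts):                 # pts: (B, N, in_dim)
        per_point = self.mlp(pts)           # (B, N, feat_dim)
        return per_point.max(dim=1).values  # order-invariant (B, feat_dim)

def chamfer_distance(a, b):
    """Symmetric chamfer distance between point sets a: (B,N,D), b: (B,M,D)."""
    d = torch.cdist(a, b)                   # (B, N, M) pairwise distances
    return d.min(dim=2).values.mean(dim=1) + d.min(dim=1).values.mean(dim=1)

enc = PointEncoder()
swarm = torch.rand(4, 50, 2)                # 4 observations of 50 drones in 2D
print(enc(swarm).shape)                     # torch.Size([4, 64])
print(chamfer_distance(swarm, swarm + 0.01 * torch.randn_like(swarm)))
```

Because the max-pool and the chamfer distance both ignore point order, shuffling the drones within an observation leaves the encoding and the loss unchanged.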
Maritime surveillance is crucial for ensuring compliance with regulations and protecting critical maritime infrastructure. Conventional tracking systems, such as AIS or LRIT, are susceptible to manipulation as they can be switched off or altered. To address this vulnerability, there is a growing need for a visual monitoring system facilitated by unmanned systems such as unmanned aerial vehicles (UAVs) and unmanned surface vehicles (USVs). Equipped with sensors and cameras, these unmanned vehicles collect vast amounts of data that often demand time-consuming manual processing. This study presents a robust method for automatic target vessel re-identification from RGB imagery captured by unmanned vehicles. Our approach uniquely combines visual appearance and textual data recognized from the acquired images to enhance the accuracy of target vessel identification and authentication against a known vessel database. We achieve this through utilizing Convolutional Neural Network (CNN) embeddings and Optical Character Recognition (OCR) data, extracted from the vessel’s images. This multi-modal approach surpasses the limitations of methods relying solely on visual or textual information. The proposed prototype was evaluated on two distinct datasets. The first dataset contains small vessels without textual data and serves to test the performance of the fine-tuned CNN model in identifying target vessels, trained with a triplet loss function. The second dataset encompasses medium and large-sized vessels amidst challenging conditions, highlighting the advantage of fusing OCR data with CNN embeddings. The results demonstrate the feasibility of a computer vision model that combines OCR data with CNN embeddings for target vessel identification, resulting in significantly enhanced robustness and classification accuracy. The proposed methodology holds promise for advancing the capabilities of autonomous visual monitoring systems deployed by unmanned vehicles, offering a resilient and effective solution for maritime surveillance.
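The multi-modal matching described in this abstract can be illustrated with a small, hedged sketch: cosine similarity between appearance embeddings is fused with a fuzzy match between OCR output and the registered vessel name. The weights, field names, and database layout below are assumptions for illustration only.

```python
import difflib
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def ocr_similarity(read_text, registered_name):
    """Fuzzy string match between OCR output and the name in the database."""
    return difflib.SequenceMatcher(None, read_text.upper(),
                                   registered_name.upper()).ratio()

def vessel_score(query_emb, query_text, db_entry, w_visual=0.6, w_text=0.4):
    """Weighted fusion of appearance and textual evidence; weights assumed."""
    s_vis = cosine(query_emb, db_entry["embedding"])
    s_txt = ocr_similarity(query_text, db_entry["name"]) if query_text else 0.0
    return w_visual * s_vis + w_text * s_txt

# Hypothetical database entry and query observation.
db_entry = {"name": "NORDIC STAR", "embedding": np.random.rand(128)}
query = {"embedding": np.random.rand(128), "ocr": "N0RDIC STAR"}
print(vessel_score(query["embedding"], query["ocr"], db_entry))
```

When no text is visible, the score degrades gracefully to the appearance term alone, which mirrors the small-vessel dataset case in the abstract.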
In marine safety and security, the ability to rapidly, autonomously, and accurately detect and identify ships is the highest priority. This study presents a novel approach using deep learning to accurately identify ships based on their International Maritime Organisation (IMO) numbers. The performance of various sophisticated deep learning models, such as YOLOv8, RetinaNet, Faster R-CNN, EfficientDet, and DETR, was assessed in accurately identifying IMO numbers from images. The RetinaNet and Faster R-CNN models achieved the highest mAP50-95 scores of 70.0% and 64.1%, respectively, with relatively low inference times. YOLOv8, with an mAP50-95 of 65.1% slightly above that of Faster R-CNN, showed an exceptional balance between accuracy and speed (9.20 ms), making it well suited for real-time applications. In contrast, models such as EfficientDet and DETR struggled, achieving lower mAP50-95 values of 33.65% and 48.7%, respectively, especially when analysing low-resolution images. Following detection, the Enhanced Super-Resolution Generative Adversarial Network (ESRGAN) was used to improve the clarity of the extracted IMO digits, followed by Easy Optical Character Recognition (EasyOCR) for accurate extraction. Despite these enhancements, minor identification errors persisted, suggesting a need for further refinement. These findings reveal the capacity of deep learning to significantly augment maritime security by enhancing the efficiency and precision of ship identification.
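A minimal sketch of the post-detection stage of such a pipeline is shown below, assuming a detector has already produced a bounding box around the IMO marking. Bicubic upscaling stands in for the ESRGAN super-resolution step, and the box format and file names are hypothetical.

```python
import cv2
import easyocr

# OCR reader initialised once; EasyOCR downloads its models on first use.
reader = easyocr.Reader(["en"], gpu=False)

def read_imo_number(frame_bgr, box):
    """Crop a detected IMO-number region, upscale it, and run OCR.
    `box` is (x, y, w, h) from any detector; bicubic upscaling is used here
    as a stand-in for the ESRGAN super-resolution step described above."""
    x, y, w, h = box
    crop = frame_bgr[y:y + h, x:x + w]
    crop = cv2.resize(crop, None, fx=4, fy=4, interpolation=cv2.INTER_CUBIC)
    gray = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY)
    results = reader.readtext(gray)                 # [(bbox, text, conf), ...]
    digits = "".join(ch for _, text, _ in results for ch in text if ch.isdigit())
    return digits[:7]                               # IMO numbers have 7 digits

# Usage (hypothetical frame and detector output):
# frame = cv2.imread("vessel.jpg")
# print(read_imo_number(frame, (120, 80, 200, 60)))
```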
Maritime surveillance relies on advanced technologies to ensure the safety and security of national and international waters, particularly in monitoring vessel activities. Distributed Acoustic Sensing (DAS) has emerged as a powerful technology for detecting and analyzing underwater acoustic signatures along fiber-optic cables. However, the lack of annotated DAS datasets in maritime contexts, combined with the high dimensionality and unstructured nature of recorded data streams, hinders the deployment of automated solutions that rely on labeled data for vessel detection. This work introduces DASBoot, a novel annotation toolkit designed to enhance maritime surveillance by aligning vessel signatures from DAS data with Automatic Identification System (AIS) messages. Our approach integrates data processing, fusion, and visualization within a cohesive workflow that significantly reduces the cognitive load on analysts while improving the accuracy of vessel identification. The experimental results demonstrate the effectiveness of our method for dataset annotation and pave the way for future advancements in DAS-based automated maritime surveillance.
In the area of Joint ISR (Joint Intelligence, Surveillance and Reconnaissance), it is important to have robust (semi-)automatic support for the identification and processing of text-based information products such as formal reports. The representation of reports as text unifies contributions from heterogeneous information sources (e.g. delivered by various intelligence disciplines). Such text-based information products also often encapsulate dense, high-quality information. Therefore, the capability for machine processing to adequately integrate various pieces of information from different sources and display them to the user in a coherent and comprehensible manner is essential for maximizing the utility and accessibility of intelligence data and report information. Current AI models and methods from the field of Natural Language Processing (NLP) can make valuable contributions to the processing of text-based information in general, e.g. text summarization or the extraction of named entities and other important information elements, and they are widely used for social media applications. However, to adopt this capability for the military domain, they have to be adapted to the specific vocabulary of the Joint ISR domain and its grammatical structures. Especially challenging is the limited grammatical variance found within these text products, which further exacerbates the scarcity of sample data suitable for training purposes. This publication examines the variations in training data for NLP methodologies that emerge when dealing with the Joint ISR domain and its reporting procedures. An approach is presented to capture entities within formalized texts using Named Entity Recognition (NER) and to illustrate how this approach can support the processing of textual information, especially formal reports, in the field of Joint ISR. The value of formal reporting is also emphasized for achieving syntactic and semantic interoperability within Joint ISR networks.
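As a rough illustration of the NER step discussed in this abstract, the snippet below runs a general-purpose spaCy pipeline over a fabricated report sentence; a model fine-tuned on Joint ISR vocabulary and report structures, as argued above, would replace the off-the-shelf pipeline.

```python
import spacy

# A general-purpose English pipeline stands in for a domain-adapted model
# (requires: python -m spacy download en_core_web_sm).
nlp = spacy.load("en_core_web_sm")

report = ("At 0630Z two armoured vehicles were observed moving north of "
          "Bridge 14 towards the river crossing near Alpha Company's sector.")

doc = nlp(report)
for ent in doc.ents:
    print(f"{ent.text:<25} {ent.label_}")

# A domain-adapted NER model would add labels such as unit designators,
# equipment types, and report-specific fields on top of the generic entities.
```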
The application of sensor data obtained from patrol ships, drones, and specific coastal locations may contribute to the development of effective and scalable monitoring systems for enhancing coastal security and maritime domain awareness. Typically, daytime surveillance relies on high-resolution images captured by visible sensors, whereas infrared imaging can be employed under low-visibility conditions. In this study, we focus on a critical aspect of maritime surveillance: deep learning-based person detection. The collected datasets included visible and infrared images of passengers on ships, offshore wind turbine decks, and people in water. In addition, vessel classification was considered. To exploit both spectral domains, we applied a preprocessing strategy to the thermal data, transforming the infrared images to resemble the visible ones. We fine-tuned the detector using this data. Our findings show that the deep learning model can effectively distinguish between human and vessel signatures, despite challenges such as low pixel resolution, cluttered backgrounds, and varying postures of individuals. Moreover, our results suggest that the extracted features from the infrared data significantly improve the detector’s performance in the visible domain by using appropriate preprocessing techniques. However, we observed a limited transferability of models that have been pre-trained on visible images to the infrared spectral domain.
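The preprocessing idea mentioned in this abstract, transforming infrared frames so that they resemble visible-band imagery, can be sketched as below. The specific steps (min-max normalisation, CLAHE, channel replication) are assumptions for illustration; the paper's exact transformation may differ.

```python
import cv2
import numpy as np

def ir_to_pseudo_rgb(ir_frame):
    """Make a thermal frame resemble a visible-band image so that a detector
    trained on RGB data can be applied: normalise the dynamic range, apply
    CLAHE contrast enhancement, and replicate the result over three channels."""
    ir = cv2.normalize(ir_frame, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    enhanced = clahe.apply(ir)
    return cv2.merge([enhanced, enhanced, enhanced])   # H x W x 3 pseudo-RGB

# Example with a synthetic 16-bit thermal frame.
thermal = (np.random.rand(512, 640) * 65535).astype(np.uint16)
print(ir_to_pseudo_rgb(thermal).shape)   # (512, 640, 3)
```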
AI-based Identification, Authentication and Privacy, and Vision-based AI
Anonymizing personal data in multimedia content (image, audio and text) has become crucial for secure data sharing while adhering to the rigorous data compliance requirements of the European Union (EU) General Data Protection Regulation (GDPR). Given the substantial volume of data involved, manual verification of anonymization accuracy is not feasible due to the high potential for human error and the impracticality of scaling such efforts. Consequently, automated or semi-automated processes are indispensable. However, these methodologies cannot guarantee absolute anonymization, potentially leading to inadvertent disclosure of personal information and the associated legal and privacy implications. Therefore, when dealing with extensive multimedia datasets, it is strongly advised to conduct a comprehensive anonymization risk assessment. In response to this challenge, we introduce a novel methodology to quantitatively evaluate the effectiveness and reliability of anonymization techniques by generating metrics and risk indicators that support a comprehensive anonymization risk assessment. The methodology builds on de-identification techniques that protect personal data while preserving data integrity. Our approach leverages a novel algorithmic framework that helps humans inspect the anonymized dataset, ensuring higher data privacy and security. The methodology automatically detects non-anonymized personal data within an extensive dataset. This is achieved by extracting characteristics related to personal data during the anonymization process and correlating attributes from the surrounding data using AI-driven analysis. A rule-based algorithm is then applied to the characteristics extracted from both processes to identify and qualitatively assess the anonymization risk. We demonstrate the applicability and effectiveness of our methodology through a focused application on license-plate and face anonymization, using a dataset of non-annotated vehicle and human images. By offering a scalable solution to evaluate anonymization risk during data sharing, our methodology represents a pivotal step towards GDPR-compliant data processing practices, facilitating safer data-sharing environments across industries.
Artificial-intelligence (AI) applications need a large amount of data to train a reliable model. For document authentication - which is relevant for border management, immigration or visa applications - this data is very sensitive. To develop document authentication technology for authorities from multiple countries, it is essential to train AI models on the distributed datasets provided by each authority. Federated learning (FL) enables training on the datasets of multiple organizations while preserving privacy, by sharing only the model updates (gradients) and not the local data. This helps avoid the cross-border sharing of personal data. However, there are two main concerns related to FL: the communication costs and the possible leakage of personal data through the model updates. A solution can be found in secure sparse gradient aggregation (SSGA). In this method, we use top-k compression to speed up the communication. Additionally, a residual memory is implemented to improve performance. The aggregation is made more secure by adding pairwise noise to the gradients. In this paper, we show that SSGA can be implemented for various computer-vision tasks, such as image classification, object detection, semantic segmentation, and person re-identification, which are relevant for document authentication and other security applications.
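The three ingredients named in this abstract (top-k compression, residual memory, and pairwise noise) can be illustrated with the toy two-client sketch below. It is a simplified stand-in, not the paper's implementation: real SSGA would derive the pairwise masks from a key exchange rather than a shared integer seed.

```python
import numpy as np

rng = np.random.default_rng(0)

def sparsify_topk(grad, residual, k):
    """Top-k gradient compression with residual (error-feedback) memory."""
    corrected = grad + residual
    idx = np.argpartition(np.abs(corrected), -k)[-k:]   # largest-|g| entries
    sparse = np.zeros_like(corrected)
    sparse[idx] = corrected[idx]
    return sparse, corrected - sparse                   # new residual

def pairwise_mask(shape, seed_ij):
    """Noise shared by clients i and j; i adds it, j subtracts it, so the
    masks cancel in the server-side sum while hiding individual updates."""
    return np.random.default_rng(seed_ij).normal(0.0, 1.0, shape)

# Two hypothetical clients with local gradients and residual memories.
g1, g2 = rng.normal(size=1000), rng.normal(size=1000)
r1, r2 = np.zeros(1000), np.zeros(1000)
s1, r1 = sparsify_topk(g1, r1, k=100)
s2, r2 = sparsify_topk(g2, r2, k=100)

noise = pairwise_mask(1000, seed_ij=42)      # shared secret between the pair
masked1, masked2 = s1 + noise, s2 - noise    # individually unreadable
aggregate = masked1 + masked2                # noise cancels at the server
print(np.allclose(aggregate, s1 + s2))       # True
```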
Understanding the caveats of deploying Spiking Neural Networks (SNNs) in an embedded system is important, due to their potential to achieve high efficiency in applications using event-based data. This work investigates the effects of quantisation of SNNs from the perspective of deploying a model onto FPGAs. Three SNN models were trained using quantisation-aware training (QAT), and three different types of quantisation were applied to all three models. Furthermore, these models were trained while being represented at various custom bit-depths using Brevitas. The performance metric curves resulting from QAT, such as accuracy, training loss, and test loss, were then viewed as performance distributions, showing that the significant accuracy drop found in these curves manifests itself as a bi-modal distribution. This work then investigates whether the decrease in accuracy is consistent across different models.
Rapid reaction to a specific event is a critical feature for an embedded computer vision system to ensure reliable and secure interaction with the environment in resource-limited real-time applications. This requires high-level scene understanding with ultra-fast processing capabilities and the ability to operate at extremely low power. Existing vision systems, which rely on traditional computation techniques, including deep learning-based approaches, are limited by the compute capabilities due to large power dissipation and slow off-chip memory access. These challenges are evident in environments with constrained power, bandwidth and hardware resources, such as in the applications of drones and robot navigation in expansive areas.
A new NEuromorphic Vision System (NEVIS) is proposed to address the limitations of existing computer vision systems for many resource-limited real-time applications. NEVIS mimics the efficiency of the human visual system by encoding visual signals into spikes, which are processed by neurons with synaptic connections. The potential of NEVIS is explored through an FPGA-based accelerator implementation on a Xilinx Kria board, which achieved a 40× speed-up compared to a Raspberry Pi 4B CPU. This work informs the future potential of NEVIS in embedded computer vision system development.
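The spike-encoding front end mentioned in this abstract can be sketched with a simple rate (Bernoulli) encoder: brighter pixels spike more often across a short time window. This is a generic illustration under assumed parameters, not the NEVIS encoding scheme itself.

```python
import numpy as np

def rate_encode(image, timesteps=20, max_rate=0.8, rng=None):
    """Encode normalised pixel intensities as Bernoulli spike trains: the
    per-step spike probability is proportional to brightness."""
    rng = rng or np.random.default_rng(0)
    p = np.clip(image, 0.0, 1.0) * max_rate           # per-step spike probability
    return (rng.random((timesteps, *image.shape)) < p).astype(np.uint8)

frame = np.random.rand(64, 64)                        # stand-in for a camera frame
spikes = rate_encode(frame)
print(spikes.shape, spikes.mean())                    # (20, 64, 64), sparse activity
```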
Vision-language foundation models for image classification, such as CLIP, suffer from a poor performance when applied to images of objects dissimilar to the training data. A relevant example of such a mismatch can be observed when classifying military vehicles. In this work, we investigate techniques to extend the capabilities of CLIP for this application. Our contribution is twofold: (a) we study various techniques to extend CLIP with knowledge on military vehicles and (b) we propose a two-stage approach to classify novel vehicles based on only one example image.
Our dataset consists of 13 military vehicle classes, with 50 images per class. Various techniques to extend CLIP with knowledge on military vehicles were studied, including: context optimization (CoOp), vision-language prompting (VLP), and visual prompt tuning (VPT); of which VPT was selected. Next, we studied one-shot learning approaches to have the extended CLIP classify novel vehicle classes based on only one image. The resulting two-stage ensemble approach was used in a number of leave-one-group-out experiments to demonstrate performance.
Results show that, by default, CLIP has a zero-shot classification performance of 48% for military vehicles. This can be improved to >80% by fine-tuning with example data, at the cost of losing the ability to classify novel (previously unseen) military vehicle types. A naive one-shot approach results in a classification performance of 19%, whereas our proposed one-shot approach achieves 70% for novel military vehicle classes.
In conclusion, our proposed two-stage approach can extend CLIP for military vehicle classification. In the first stage, CLIP is provided with knowledge on military vehicles using domain adaptation with VPT. In the second stage, this knowledge can be leveraged for previously unseen military vehicle classes in a one-shot setting.
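The second, one-shot stage of this approach can be illustrated with a generic nearest-prototype classifier over image embeddings: each novel class is represented by the embedding of its single example image, and a query is assigned to the most similar prototype. The embeddings below are random placeholders for the domain-adapted CLIP features, and the class names are hypothetical.

```python
import numpy as np

def one_shot_classify(query_emb, prototype_embs, class_names):
    """Compare a query embedding against one prototype embedding per novel
    class and return the closest class by cosine similarity."""
    q = query_emb / np.linalg.norm(query_emb)
    P = prototype_embs / np.linalg.norm(prototype_embs, axis=1, keepdims=True)
    sims = P @ q
    best = int(np.argmax(sims))
    return class_names[best], float(sims[best])

class_names = ["MBT", "IFV", "APC"]                  # hypothetical novel classes
prototypes = np.random.rand(3, 512)                  # one example image per class
query = np.random.rand(512)
print(one_shot_classify(query, prototypes, class_names))
```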
Categorizing dark web image content is critical for identifying and averting potential threats. However, this remains a challenge due to the nature of the data, which includes multiple co-existing domains and intra-class variations, as well as the continuous emergence of new classes driven by the rapidly growing volume of criminal activity on the dark web. While many methods have been proposed to classify this image content, multi-label multi-class continuous learning classification remains underexplored. In this paper, we propose a novel and efficient strategy for transforming a zero-shot single-label classifier into a few-shot multi-label classifier. This approach combines a label-empowering methodology with few-shot data. We use CLIP, a contrastive learning model trained on image-text pairs, to demonstrate the effectiveness of our strategy. Furthermore, we identify the most appropriate continuous learning methodology to overcome the challenges of accessing old data and retraining for each newly added class. Finally, we compare the performance with multi-label methodologies applied to CLIP, leading multi-label methods, and continuous learning approaches.
Assessing a person’s emotional state may be relevant to security in situations where it is beneficial to assess one’s intentions or mental state. In various situations, facial expressions, which often indicate emotions, may not be communicated or may not correspond to the actual emotional state. Here we review our study, in which we classify emotional states from very short facial video signals. The emotion classification process does not rely on stereotypical facial expressions or contact-based methods. Our raw data are short facial videos obtained under several different known emotional states. A facial video includes a component of diffused light from the facial skin, affected by the cardiovascular activity that might be influenced by the emotional state. From the short facial videos, we extracted unique spatiotemporal physiology-affected features employed as input features to a deep-learning model. Results show an average emotion classification accuracy of about 47.36%, compared to a 20% chance level given 5 emotion classes, which can be considered high for cases where expressions are hardly observable.
To be able to use machine learning models in practice, it is important to know when their predictions can be trusted. Confidence estimations can help end users to calibrate their trust, avoiding under- or over-reliance, and to decide when human interference is needed. In our work, we further develop the eXplainable AI (XAI) method PERformance EXplainer (PERFEX), which was originally proposed for tabular datasets. We adapt PERFEX such that it can be used to accurately estimate the image classifier confidence. This was done by applying the method on feature-reduced activation values of the last layer of image classification models. We coin this approach PERFEX-I. We show that PERFEX-I performs on par with existing methods for confidence estimation such as Temperature Scaling and Deep Ensembles. The Expected Calibration Error (ECE) on the ImageNet dataset is reduced from 6.83 to 1.71 for ResNet50 and from 8.84 to 1.44 for Swin-B compared to using the Softmax scores. Additionally, PERFEX-I groups images that may share common reasons for errors, and visual analysis of these groups can reveal patterns of the model’s behavior.
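The Expected Calibration Error reported in this abstract is a standard metric and can be computed as below; the binning scheme and toy numbers are illustrative, not the paper's evaluation code.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=15):
    """Standard ECE: bin predictions by confidence and average the absolute
    gap between mean confidence and empirical accuracy, weighted by bin size."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap
    return ece

# Toy example: overconfident predictions yield a large ECE.
conf = np.array([0.95, 0.9, 0.92, 0.88, 0.97])
hit  = np.array([1, 0, 1, 0, 1])
print(f"ECE = {expected_calibration_error(conf, hit):.3f}")
```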
Attention-based Siamese networks have shown remarkable results for occlusion-aware single-camera Multi-Object Tracking (MOT) applied to persons as they can effectively combine motion and appearance features. However, expanding their usage for multi-camera MOT in crowded areas such as train stations and airports is challenging. In these kinds of scenarios, there is a higher visual appearance variability of people as the viewpoints from where they are observed while they move could be very diverse. This adds extra difficulty to the already high variability coming from partial occlusions and body pose differences (standing, sitting, or lying). Besides, attention-based MOT methods are computationally intensive and therefore difficult to scale to multiple cameras. To overcome these problems, in this paper, we propose a method that exploits contextual information of the scenario such as the viewpoint, occlusion, and pose-related visual appearance characteristics of persons to improve the inter and intra feature representations in attention-based Siamese networks. Our approach combines a smart context-aware training data batching and hard triplet mining strategy with an automated model complexity tuning procedure to train the optimal model for the scenario. This method improves the fusion of motion and appearance features of persons for the data association cost matrix of the MOT algorithm. Experimental results, validated on the MOT17 dataset, demonstrate the effectiveness and efficiency of our approach, showcasing promising results for real-world applications requiring robust MOT capabilities in multi-camera setups.
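The hard triplet mining mentioned in this abstract can be sketched with a standard batch-hard triplet loss over appearance embeddings; the margin and the toy batch are assumptions, and the paper's context-aware batching strategy would determine which samples end up in the same batch.

```python
import torch

def batch_hard_triplet_loss(embeddings, labels, margin=0.3):
    """Batch-hard triplet loss: for every anchor, pick its hardest positive
    (farthest same-identity sample) and hardest negative (closest different
    identity) within the batch."""
    dist = torch.cdist(embeddings, embeddings)            # (B, B) pairwise L2
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    eye = torch.eye(len(labels), dtype=torch.bool)
    hardest_pos = (dist * (same & ~eye)).max(dim=1).values
    neg_dist = dist.masked_fill(same, float("inf"))
    hardest_neg = neg_dist.min(dim=1).values
    return torch.clamp(hardest_pos - hardest_neg + margin, min=0).mean()

# Toy batch: 8 person crops from 4 identities, 32-D appearance embeddings.
emb = torch.nn.functional.normalize(torch.randn(8, 32), dim=1)
ids = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
print(batch_hard_triplet_loss(emb, ids))
```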
Protecting infrastructures, particularly buildings, requires the development of automatic detection and mapping technologies for essential components such as pipes in fire extinguishing systems. Despite advancements in machine learning for instance segmentation, acquiring extensive, well-annotated real data remains challenging. This leads to the exploration of synthetic images as an alternative solution. This study addresses the limitations of synthetic training data, which can lack realistic features present in real-world images, resulting in biased models and decreased performance. Leveraging Unreal Engine 5 (UE5), a synthetic dataset resembling real-world data from a specific scene is generated. Creating such realistic worlds is time-consuming, so varying levels of domain randomization and preprocessing using image filters are explored. With different training-set combinations, consisting of various distributions of real, synthetic, and augmented data, multiple models are trained based on Mask R-CNN and YOLO. After the training phase, an optimization procedure is applied to each model, enabling a comparative analysis of pipe instance segmentation quality for different algorithms based on the composition of the training set. The findings shed light on the efficacy and potential risks of employing synthetic data for training various instance segmentation models. This study provides valuable insights into mitigating challenges associated with data limitations in the training of state-of-the-art neural networks.
Artificial intelligence (AI) models are at the core of improving computer-assisted tasks such as object detection, target recognition, and mission planning. The development of AI models typically requires a large set of representative data, which can be difficult to acquire in the military domain. Challenges include uncertain and incomplete data, complex scenarios, and scarcity of historical or threat data. A promising alternative to real-world data is the use of simulated data for AI model training, but the gap between real and simulated data can impede effective transfer from synthetic to real-world scenarios. In this study, we provide an overview of the state-of-the-art methods for exploiting simulation data to train AI models for military applications. We identify specific simulation considerations and their effects on AI model performance, such as simulation variation and simulation fidelity. We investigate the importance of these aspects by showcasing three studies where simulated data is used to train AI models for military applications, namely vehicle detection, target classification and course of action support. In the first study, we focus on military vehicle detection in RGB images and study the effect of simulation variation and the combination of a large set of simulated data with few real samples. Subsequently, we address the topic of target classification in sonar imagery, investigating how to effectively integrate a small set of simulated objects into a large set of low-frequency synthetic aperture sonar data. We conclude with a study on mission planning, where we experiment with the fidelities of different aspects in our simulation environment, such as the level of realism in movement patterns. Our findings highlight the potential of using simulated data to train AI models, but also illustrate the need for further research on this topic in the military domain.
Space-based sensor platforms, including both current and planned future satellites, are capable of surveilling Earth-based objects and scenes from high altitudes. Overhead persistent infrared (OPIR) is a growing surveillance technique where thermal-waveband infrared sensors are deployed on orbiting satellites to look down and image the Earth. Challenges include having sufficient image resolution to detect, differentiate and identify ground-based objects while monitoring through the atmosphere. Demonstrations have shown machine learning algorithms to be capable of processing image-based scenes, detecting and recognizing targets amongst surrounding clutter. Performant algorithms must be robustly trained to successfully complete such a complex task, which typically requires a large set of training data on which statistical predictions can be based. Electro-optical infrared (EO/IR) remote sensing applications, including OPIR surveillance, necessitate a substantial image database with suitable variation for adept learning to occur. Diversity in background scenes, vehicle operational state, season, times of day and weather conditions can be included in training image sets to ensure sufficient algorithm input variety for OPIR applications. However, acquiring such a diverse overhead image set from measured sources can be a challenge, especially in thermal infrared wavebands (e.g., MWIR and LWIR) when adversarial vehicles are of interest. In this work, MuSES™ and CoTherm™ are used to generate synthetic OPIR imagery of several ground vehicles with a range of weather, times of day and background scenes. The performance of a YOLO (“you only look once”) deep learning algorithm is studied and reported, with a focus on how image resolution impacts algorithm detection/recognition performance. The image resolution of future space-based sensor platforms will surely increase, so this study seeks to understand the sensitivity of OPIR algorithm performance to overhead image resolution.
The use of simulated data for training deep learning models has proven to be a promising strategy for automated situational awareness, particularly when real data is scarce. Such simulated datasets are important in fields where access to environments or objects of interest is limited, including space, security, and defense. When simulating a dataset for training of a vehicle detector using 3D models, one ideally has access to high-fidelity models for each class of interest. In practice, 3D model quality can vary significantly across classes, often due to different data sources or limited detail available for certain objects. In this study, we investigate the impact of this 3D model variation on the performance of a fine-grained military vehicle detector, which distinguishes 15 classes and is trained on simulated data. Our research is driven by the observation that variations in polygon count among 3D models significantly influence class-specific accuracies, leading to imbalances in overall model performance. To address this, we implemented four decimation strategies aimed at standardizing the polygon count across different models. While these approaches resulted in a reduction of overall accuracy, measured in average precision (AP) and AP@50, they also contributed to a more balanced confusion matrix, reducing class prediction bias. Our findings suggest that rather than uniformly lowering the detail level of all models, future work should focus on enhancing the detail in low-polygon models to achieve a more effective and balanced detection performance.
The military is looking to adopt artificial intelligence (AI)-based computer vision for autonomous systems and decision-support. This transition requires test methods to ensure safe and effective use of such systems. Performance assessment of deep learning (DL) models, such as object detectors, typically requires extensive datasets. Simulated data offers a cost-effective alternative for generating large image datasets, without the need for access to potentially restricted operational data. However, to effectively use simulated data as a virtual proxy for real-world testing, the suitability and appropriateness of the simulation must be evaluated. This study evaluates the use of simulated data for testing DL-based object detectors, focusing on three key aspects: comparing performance on real versus simulated data, assessing the cost-effectiveness of generating simulated datasets, and evaluating the accuracy of simulations in representing reality. Using two automotive datasets, one publicly available (KITTI) and one internally developed (INDEV), we conducted experiments with both real and simulated versions. We found that although simulations can approximate real-world performance, evaluating whether a simulation accurately represents reality remains challenging. Future research should focus on developing validation approaches independent of real-world datasets to enhance the reliability of simulations in testing AI models.
This feasibility study investigates the training of DNN-based object detectors for military vehicle detection in intelligent guidance systems for airborne devices. The research addresses the challenges of scarce training images, infrared signatures, and varying flight phases and target distances. To tackle these issues, a database of sanitized military vehicle patches from multiple sources, data augmentation tools and Generative AI (Stable Diffusion XL) are employed to create synthetic training datasets. The objectives include obtaining a robust and performant system based on trustworthy AI, covering vehicle detection, recognition and identification in both infrared and color images within different contexts. In this study various object detection models are trained and evaluated for recall, precision and inference speed based on flight phase and spectral domain, while considering future embedding into airborne devices. The research is still ongoing, with initial results demonstrating the applicability of our approaches for military vehicle detection in aerial imagery.
This feasibility study explores training a DNN-based military vehicle detector for airborne guidance systems, addressing the challenges of scarce data, numerous similar vehicle classes, and varied real warfare conditions. To this end, a sanitized military vehicle image database is created from multiple 2D and 3D sources, with miniature vehicles acquired under various view angles. Complemented by data augmentation tools, including AI-generated backgrounds, we are able to export controlled, trustworthy, and class-balanced semi-synthetic datasets. As successful training on a limited number of classes has already been demonstrated, this study further explores the relevance of this approach by training multi-class detectors and testing them on real warfare footage. By leveraging the combination of data sources, data augmentation techniques, and generative AI for creating contextual backgrounds, the precision, selectivity, and adaptability of the detectors are evaluated and improved across diverse operational and current situational contexts.
Generative Artificial Intelligence (AI) is becoming increasingly prevalent due to the availability of machine learning models, such as stable diffusion, and greater computational power. While this has many advantages, it has also led to maliciously generated images, and AI-generated satellite imagery is now an emerging threat. The National Geospatial-Intelligence Agency has acknowledged that AI has been utilised to manipulate satellite images for malicious purposes; although such manipulation is not yet widespread, it is highly likely to become so given the ever-increasing prevalence of social media. This paper proposes the development of a new dataset containing satellite images that have been synthetically manipulated using generative AI models, since no such dataset is currently publicly available. We also propose a new deep-learning-based detection algorithm for such manipulation. This research supports the fight against misinformation and will help to ensure that satellite images remain an objective source of truth. The work aims to create a benchmark for detecting manipulated satellite images.
Current diffusion models could assist in creating training datasets for Deep Neural Network (DNN)-based person detectors by producing high-quality, realistic, and custom images of non-existent people and objects, avoiding privacy issues. However, these models have difficulties in generating images of people in a fully controlled way. Problems such as abnormal proportions, distortions of the body or face, extra limbs, or elements that do not match the input text prompt may occur. Moreover, biases related to factors like gender, clothing type and colors, ethnicity or age can also limit the control over the generated images. Both generative AI models and DNN-based person detectors need large sets of annotated images that reflect the diverse visual appearances expected in the application context. In this paper we explore the capabilities of state-of-the-art text-to-image diffusion models for person image generation and propose a methodology to exploit their usage for training DNN-based person detectors. For the generation of virtual persons, this includes variations in the environment, such as illumination or background, and people characteristics, such as body pose, skin tones, gender, age, clothing types and colors, as well as multiple types of partial occlusions with other objects (or people). Our method leverages explainability techniques to gain more understanding of the behaviour of the diffusion models and the relation between inputs and outputs to improve the diversity of the person detection training dataset. Experimental results using the WiderPerson benchmark of a YOLOX detection model trained with the proposed methodology show the potential use of this approach.
This work examines the extent to which training data can be artificially generated for a target domain in an unsupervised manner, in order to train an object detector in the target domain when little or no real training data is available. If the distributions of a source and a target domain differ but the same task is performed on both, this is referred to as domain adaptation. In the field of image processing, generative approaches are often used to transform the distribution of the source domain into that of the target domain. In this work, a generative method, a Denoising Diffusion Probabilistic Model (DDPM), is investigated for domain adaptation from the visible spectrum (VIS) to the thermal infrared (IR). Systematic extensions, such as the use of alternative noise schedules, were incorporated and evaluated, and the partial results of the domain adaptation are significantly improved by these extensions. In a subsequent step, a thermal infrared object detector is trained using the results of the domain adaptation. The publicly available Multi-scenario Multi-Modality Benchmark to Fuse Infrared and Visible images, together with data from the recording vehicle MODISSA, are used for evaluation.
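As background for the "alternative noise schedules" mentioned in this abstract, the sketch below contrasts the original linear DDPM beta schedule with the cosine schedule of Nichol and Dhariwal; which schedules the authors actually evaluated is not specified here.

```python
import numpy as np

def linear_beta_schedule(timesteps, beta_start=1e-4, beta_end=0.02):
    """Original DDPM linear schedule."""
    return np.linspace(beta_start, beta_end, timesteps)

def cosine_beta_schedule(timesteps, s=0.008):
    """Cosine schedule: derive betas from a cosine-shaped cumulative alpha
    curve, which preserves more signal in the early diffusion steps."""
    steps = np.arange(timesteps + 1)
    alphas_cumprod = np.cos(((steps / timesteps) + s) / (1 + s) * np.pi / 2) ** 2
    alphas_cumprod = alphas_cumprod / alphas_cumprod[0]
    betas = 1 - (alphas_cumprod[1:] / alphas_cumprod[:-1])
    return np.clip(betas, 0, 0.999)

T = 1000
print(linear_beta_schedule(T)[:3], cosine_beta_schedule(T)[:3])
```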
This paper introduces a part-aided military vehicle detection model that combines data-driven and knowledge-based approaches to tackle the intricate task of military vehicle detection and classification. Military vehicles, with their distinct and externally recognizable features, serve as an ideal subject for this methodology. Traditional detection systems often struggle with the nuances of military vehicle detection due to the similarities among different models and the complexities introduced by camouflage. Our model transcends these challenges by focusing on the detection of vehicle parts rather than the entire vehicle itself. This part-based detection paradigm not only facilitates the classification of vehicles with minimal training data but also enhances the model’s ability to perform zero-shot detection, where the system classifies vehicles it has not explicitly been trained on. The model employs open-world attribute detection to dynamically adapt to new or modified vehicles. This adaptability is crucial in contemporary conflict scenarios where vehicle modifications are prevalent. Furthermore, the detection of individual parts offers a detailed description of a vehicle’s equipment and functionality, providing a transparent explanation of the classification process. We observe significant improvements of the part-aided model over a baseline model, particularly in scenarios with sparse training data. This enhancement is attributed to the model’s ability to generalize from part detection to vehicle classification, thereby reducing overfitting risks. The transparency of the classification process is another critical advantage, as it allows users to intuitively understand and verify the classification results based on visible parts. This paper demonstrates the efficacy of the part-aided approach in military vehicle detection. By leveraging features like zero-shot detection and open-world attributes, this model paves the way for more robust, adaptable, and self-explainable AI systems in the field of vehicle detection.
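The knowledge-based half of such a part-to-vehicle mapping can be illustrated with a small lookup-and-score sketch; the part vocabulary and vehicle classes below are hypothetical and stand in for the paper's knowledge base and part detector output.

```python
# Hypothetical knowledge base mapping vehicle classes to expected parts.
VEHICLE_PARTS = {
    "main battle tank":          {"turret", "gun barrel", "tracks", "reactive armor"},
    "infantry fighting vehicle": {"turret", "autocannon", "tracks", "troop hatch"},
    "self-propelled howitzer":   {"gun barrel", "tracks", "large casemate"},
}

def classify_from_parts(detected_parts):
    """Score each vehicle class by the fraction of its expected parts that were
    detected; the matched parts double as a human-readable explanation."""
    scores = {}
    for vehicle, expected in VEHICLE_PARTS.items():
        scores[vehicle] = len(detected_parts & expected) / len(expected)
    best = max(scores, key=scores.get)
    return best, scores[best], detected_parts & VEHICLE_PARTS[best]

parts = {"turret", "gun barrel", "tracks"}            # output of a part detector
label, score, evidence = classify_from_parts(parts)
print(label, round(score, 2), evidence)
```

Because the explanation is simply the set of matched parts, an operator can verify the classification against what is actually visible, which is the transparency benefit argued for above.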
Following the current Russian invasion of Ukraine, AI-enabled technologies are the next logical step in the use of autonomous systems (such as UAVs or loitering munitions) for detecting military assets. The survivability of military assets on the battlefield is increased by Camouflage, Concealment & Deception (CCD) measures. However, current CCD measures are inadequate to prevent detection by AI-enabled technologies. To improve on CCD measures, adversarial patterns can be employed to fool AI-based object detection: assets such as soldiers, command tents, and vehicles camouflaged with adversarial patterns are either not detected or are misclassified by the AI. In an operational setting, the downside of adversarial patterns is that they are colorful and distinct from their surroundings, which makes them easily detectable by the human eye. In this manuscript, we design anti-AI camouflage that uses only colors close to those of camouflage netting, as commonly used by NATO forces. We show that these patterns are effective at (a) preventing detection, (b) reducing the confidence the AI has in its detections, or (c) making the AI detect many false objects with low confidence. This anti-AI camouflage can thus fool both human intelligence and artificial intelligence.
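The manuscript's optimization procedure is not described in the abstract. As a hedged illustration of the colour constraint it mentions, the sketch below projects an adversarial pattern onto a small palette of camouflage-net-like colours after each (placeholder) gradient step, so the optimized pattern never leaves the allowed colour set; the palette values and the dummy loss gradient are assumptions.

```python
import numpy as np

# Hypothetical palette of muted, camouflage-net-like RGB colours (0-1 range).
PALETTE = np.array([
    [0.24, 0.27, 0.18],   # dark olive
    [0.35, 0.38, 0.25],   # olive drab
    [0.45, 0.42, 0.30],   # khaki
    [0.20, 0.20, 0.20],   # charcoal
])

def project_to_palette(pattern):
    """Snap every pixel of an (H, W, 3) pattern to its nearest palette colour."""
    flat = pattern.reshape(-1, 3)
    dists = np.linalg.norm(flat[:, None, :] - PALETTE[None, :, :], axis=2)
    return PALETTE[np.argmin(dists, axis=1)].reshape(pattern.shape)

rng = np.random.default_rng(0)
pattern = rng.uniform(0.0, 1.0, size=(32, 32, 3))    # colourful initial pattern

for step in range(10):
    # Placeholder for the real attack step: the gradient of a detector loss with
    # respect to the pattern would go here. Random noise keeps the sketch runnable.
    fake_gradient = rng.normal(0.0, 0.05, size=pattern.shape)
    pattern = np.clip(pattern - fake_gradient, 0.0, 1.0)
    pattern = project_to_palette(pattern)             # enforce the colour constraint

print("unique colours in final pattern:", len(np.unique(pattern.reshape(-1, 3), axis=0)))
```

Projecting after every step is one simple way to keep the optimized pattern within colours that blend into standard camouflage netting, trading some attack strength for low visual conspicuity.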
Adversarial patches in computer vision can be used to fool deep neural networks and manipulate their decision-making process. One of the most prominent examples of adversarial patches are evasion attacks against object detectors. By covering parts of objects of interest, these patches suppress the detections and thus make the target object “invisible” to the object detector. Since these patches are usually optimized on a specific network with a specific training dataset, transferability across networks and datasets is not guaranteed. This paper addresses these issues and investigates transferability across numerous object detector architectures. Our extensive evaluation across various models on two distinct datasets indicates that patches optimized with larger models transfer better across networks than patches optimized with smaller models.
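The evaluation protocol is only summarised in the abstract. Below is a minimal sketch of how cross-model transferability could be tabulated once clean and attacked mAP values are available for each (patch source model, evaluation model) pair; all model names and numbers are placeholders, not results from the paper.

```python
# Hypothetical mAP values: outer key = model the patch was optimized on,
# inner key = model the patch is evaluated against.
CLEAN_MAP = {"det_small": 0.62, "det_medium": 0.68, "det_large": 0.71}
ATTACKED_MAP = {
    "det_small": {"det_small": 0.11, "det_medium": 0.41, "det_large": 0.47},
    "det_large": {"det_small": 0.28, "det_medium": 0.25, "det_large": 0.09},
}

def transfer_table(clean, attacked):
    """Relative mAP reduction (in %) for every source/target model pair."""
    rows = {}
    for src, per_target in attacked.items():
        rows[src] = {tgt: round(100.0 * (clean[tgt] - m) / clean[tgt], 1)
                     for tgt, m in per_target.items()}
    return rows

for src, row in transfer_table(CLEAN_MAP, ATTACKED_MAP).items():
    print(f"patch from {src}: mAP reduction per target model -> {row}")
```

Reading the table row-wise shows how much of a patch's effect survives when it is moved to a detector it was not optimized on, which is the quantity the study compares across architectures.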
Adversarial AI technologies can be used to make AI-based object detection in images malfunction. Evasion attacks apply perturbations to the input images that can be unnoticeable to the human eye and exploit weaknesses in object detectors to prevent detection. However, evasion attacks have weaknesses of their own and can be sensitive to the apparent object type, orientation, position, and scale. This work evaluates the performance of a white-box evasion attack and its robustness to these factors.
Video data from the ATR Algorithm Development Image Database is used, containing military and civilian vehicles at different ranges (1000-5000 m). A white-box evasion attack (adversarial objectness gradient) was trained to disrupt a YOLOv3 vehicle detector previously trained on this dataset. Several experiments were performed to assess whether the attack successfully prevented vehicle detection at different ranges. Results show that for an evasion attack trained on objects at only 1500 m range and applied to all other ranges, the median mAP reduction is >95%. Similarly, when trained on only two vehicles and applied to the seven remaining vehicles, the median mAP reduction is >95%.
This means that evasion attacks can succeed with limited training data across multiple ranges and vehicles. Although a (perfect-knowledge) white-box evasion attack is a worst-case scenario, in which a system is fully compromised and its inner workings are known to an adversary, this work may serve as a basis for research into robustness and the design of AI-based object detectors that are resilient to these attacks.
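The adversarial objectness gradient attack is only named in the abstract. The sketch below shows the general shape of such a white-box evasion attack, in which an image perturbation is optimized to drive the detector's objectness scores towards zero; a tiny stand-in network is used instead of the authors' YOLOv3 model, and the epsilon and step settings are assumptions.

```python
import torch
import torch.nn as nn

# Stand-in for a detector's objectness head (the real attack would target YOLOv3).
class DummyObjectnessHead(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(8, 1, 3, padding=1), nn.Sigmoid())

    def forward(self, x):               # returns a per-cell objectness map in [0, 1]
        return self.net(x)

detector = DummyObjectnessHead().eval()
image = torch.rand(1, 3, 64, 64)        # placeholder input frame
delta = torch.zeros_like(image, requires_grad=True)
epsilon, step_size = 8.0 / 255.0, 1.0 / 255.0   # assumed L-infinity budget and step

for _ in range(20):
    objectness = detector(torch.clamp(image + delta, 0.0, 1.0))
    loss = objectness.sum()             # suppress all objectness responses
    loss.backward()
    with torch.no_grad():
        delta -= step_size * delta.grad.sign()      # signed gradient step
        delta.clamp_(-epsilon, epsilon)             # stay inside the perturbation budget
    delta.grad.zero_()

print("mean objectness before:", float(detector(image).mean()))
print("mean objectness after: ", float(detector(torch.clamp(image + delta, 0, 1)).mean()))
```

The same loop applied to a real detector's objectness outputs, over frames at a single range or for a subset of vehicles, is the kind of limited-data attack whose cross-range and cross-vehicle behaviour the paper measures.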
AI-based Image Enhancement and Novel AI Directions
The distortion caused by atmospheric turbulence during long-range imaging can result in low-quality images and videos. This, in turn, greatly increases the difficulty of any post-acquisition tasks such as tracking or classification. Mitigating such distortions is therefore important, allowing subsequent processing steps to be performed successfully. We make use of the EDVR network, initially designed for video restoration and super-resolution, to mitigate the effects of turbulence. This paper presents two modifications to the training and architecture of EDVR that improve its applicability to turbulence mitigation: the replacement of the deformable convolution layers present in the original EDVR architecture, and the addition of a perceptual loss. This paper also presents an analysis of common metrics used for image quality assessment and evaluates their suitability for comparing turbulence mitigation approaches. In this context, traditional metrics such as Peak Signal-to-Noise Ratio can be misleading, as they can reward undesirable attributes, such as increased contrast, instead of high-frequency detail. We argue that the applications for which turbulence-mitigated imagery is used should be the real markers of quality for any turbulence mitigation technique. To aid in this, we also present a new turbulence classification dataset that can be used to measure classification performance before and after turbulence mitigation.
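The paper's exact perceptual-loss configuration is not given in the abstract. Below is a minimal sketch of a common VGG-based perceptual loss that could serve the same role, comparing feature maps of the restored and reference frames rather than raw pixels; the choice of VGG-19, the layer cut-off, and the loss weighting are assumptions.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg19, VGG19_Weights

class PerceptualLoss(nn.Module):
    """MSE between VGG-19 feature maps of a restored frame and its ground truth."""
    def __init__(self, layers=16):                       # first 16 layers ~ relu3_3
        super().__init__()
        self.features = vgg19(weights=VGG19_Weights.DEFAULT).features[:layers].eval()
        for p in self.features.parameters():
            p.requires_grad_(False)                       # keep VGG frozen
        self.mse = nn.MSELoss()

    def forward(self, restored, target):
        # (ImageNet normalisation of the inputs is omitted here for brevity.)
        return self.mse(self.features(restored), self.features(target))

# Typical use: combine with a pixel loss when training the restoration network.
loss_fn = PerceptualLoss()
restored = torch.rand(2, 3, 128, 128, requires_grad=True)
target = torch.rand(2, 3, 128, 128)
total_loss = nn.functional.l1_loss(restored, target) + 0.1 * loss_fn(restored, target)
total_loss.backward()
print("combined pixel + perceptual loss:", float(total_loss))
```

Penalising feature-space differences tends to favour plausible high-frequency detail over raw pixel agreement, which is exactly the behaviour that pixel-only metrics such as PSNR fail to capture, as the abstract argues.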
Image quality degradation caused by atmospheric turbulence reduces the performance of automated tasks such as optical character recognition. This issue is addressed by fine-tuning text recognition models on turbulence-degraded images. As obtaining a realistic training dataset of turbulence-degraded recordings is challenging, two synthetic datasets were created: one using a physics-inspired deep learning turbulence simulator and one using a heat chamber. The fine-tuned text recognition model shows improved performance on a validation dataset of turbulence-distorted recordings. A number of architectural modifications to the text recognition model are proposed that allow a sequence of frames to be used instead of a single frame while still reusing the pre-trained weights. These modifications are shown to yield a further performance improvement.
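The abstract does not detail the architectural modifications. One simple way to accept a frame sequence while reusing single-frame pretrained weights is to run the pretrained feature extractor on each frame and fuse the per-frame features (here by averaging) before the recognition head; the sketch below illustrates only that idea, with a dummy backbone and head standing in for the real text recognition model.

```python
import torch
import torch.nn as nn

class MultiFrameWrapper(nn.Module):
    """Wraps a pretrained single-frame recognizer so it can consume N frames."""
    def __init__(self, backbone, head):
        super().__init__()
        self.backbone = backbone      # pretrained feature extractor, unchanged
        self.head = head              # pretrained recognition head, unchanged

    def forward(self, frames):        # frames: (batch, n_frames, C, H, W)
        b, n, c, h, w = frames.shape
        feats = self.backbone(frames.reshape(b * n, c, h, w))
        feats = feats.reshape(b, n, -1).mean(dim=1)     # temporal fusion by averaging
        return self.head(feats)

# Dummy stand-ins for the pretrained components (shapes chosen arbitrarily).
backbone = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten())
head = nn.Linear(16, 37)              # e.g. 36 characters + blank

model = MultiFrameWrapper(backbone, head)
burst = torch.rand(4, 5, 1, 32, 128)  # 4 text crops, 5 turbulence-distorted frames each
print(model(burst).shape)             # -> torch.Size([4, 37])
```

Averaging is only the simplest possible fusion; the key point is that the pretrained weights are untouched, so the wrapper can be fine-tuned from the single-frame checkpoint rather than trained from scratch.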
Deep learning methodologies are extensively applied to two-dimensional (2D) and three-dimensional (3D) computer vision challenges, encompassing tasks such as object detection, super-resolution (SR), and classification. Radar imagery, however, contends with lower resolution than its optical counterparts, posing a formidable obstacle to developing accurate computer vision models, particularly classifiers. This limitation stems from the absence of high-frequency detail within radar imagery, which complicates precise predictions by classifier models. Common strategies to mitigate this issue involve training on expansive datasets or employing more complex models that are potentially susceptible to overfitting; however, generating sizeable datasets, especially of radar imagery, is challenging. As a solution, this study integrates a Convolutional Neural Network (CNN)-driven SR model with a classifier framework to enhance radar classification accuracy. The SR model is trained to upscale low-resolution millimetre-wave (mmW) images to high-resolution (HR) counterparts. These enhanced images serve as inputs to the classifier, which distinguishes between threat and non-threat entities. Training data for the two CNN stages is generated utilising a numerical model simulating a near-field coded-aperture computational imaging (CI) system. Evaluation of the resulting dual-CNN model with simulated data yields a classification accuracy of 95%, accompanied by a rapid inference time (0.193 seconds), rendering it suitable for real-time threat classification applications. Further validation with experimentally generated reconstruction data attests to the model’s robustness, achieving a classification accuracy of 94%. This integrated approach presents a promising solution for enhancing the accuracy of radar imagery analysis, with substantial implications for real-world threat detection scenarios.
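The exact SR and classifier architectures are not given in the abstract. The sketch below shows only the overall structure of such a dual-CNN pipeline, in which a small upscaling network feeds a classifier that outputs a threat / non-threat decision; the layer sizes and the 2x upscaling factor are assumptions.

```python
import torch
import torch.nn as nn

class SuperResolver(nn.Module):
    """Upscales a low-resolution mmW image by a factor of 2 (SRCNN-like sketch)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bicubic", align_corners=False),
            nn.Conv2d(1, 32, 9, padding=4), nn.ReLU(),
            nn.Conv2d(32, 16, 5, padding=2), nn.ReLU(),
            nn.Conv2d(16, 1, 5, padding=2))

    def forward(self, x):
        return self.net(x)

class ThreatClassifier(nn.Module):
    """Binary threat / non-threat classifier applied to the super-resolved image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 2))

    def forward(self, x):
        return self.net(x)

sr, clf = SuperResolver(), ThreatClassifier()
low_res = torch.rand(1, 1, 32, 32)               # simulated low-resolution mmW image
logits = clf(sr(low_res))                        # the SR output feeds the classifier
print("threat probability:", float(torch.softmax(logits, dim=1)[0, 1]))
```

In practice the two stages would be trained on the simulated coded-aperture reconstructions described in the abstract, either separately or end to end; the sketch only fixes the data flow between them.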
Vector Symbolic Architecture (VSA), a.k.a. Hyperdimensional Computing (HDC), has transformative potential for advancing cognitive processing capabilities at the network edge. This paper examines how this paradigm offers robust solutions for AI and autonomy within a future command, control, communications, computers, cyber, intelligence, surveillance and reconnaissance (C5ISR) enterprise by effectively modelling the cognitive processes required to perform Observe, Orient, Decide and Act (OODA) loop processing. The paper summarises the theoretical underpinnings, operational efficiencies, and synergy between VSA and current AI methodologies, such as neural-symbolic integration and learning. It also addresses major research challenges and opportunities for future exploration, underscoring the potential for VSA to facilitate intelligent decision-making processes and maintain information superiority in complex environments. The paper is intended to serve as a cornerstone for researchers and practitioners harnessing the power of VSA to create next-generation AI applications, especially in scenarios that demand rapid, adaptive, and autonomous responses.
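The paper itself is a survey and position piece. To make the paradigm concrete, the sketch below shows the three core VSA operations on random bipolar hypervectors: binding (element-wise multiplication), bundling (element-wise majority) and similarity (cosine). The 10,000-dimensional vectors and the role/filler names are illustrative only.

```python
import numpy as np

D = 10_000
rng = np.random.default_rng(0)

def hv():                      # random bipolar hypervector in {-1, +1}^D
    return rng.choice([-1, 1], size=D)

def bind(a, b):                # binding: element-wise multiplication (self-inverse)
    return a * b

def bundle(*vs):               # bundling: element-wise majority vote
    return np.sign(np.sum(vs, axis=0))

def sim(a, b):                 # cosine similarity
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Encode a tiny "track" record, {sensor: radar, class: aircraft}, as a single vector.
SENSOR, CLASS, RADAR, AIRCRAFT = hv(), hv(), hv(), hv()
record = bundle(bind(SENSOR, RADAR), bind(CLASS, AIRCRAFT))

# Query: unbind the CLASS role and compare against known fillers (cleanup memory).
query = bind(record, CLASS)
print("similarity to AIRCRAFT:", round(sim(query, AIRCRAFT), 3))   # high (≈ 0.7)
print("similarity to RADAR:   ", round(sim(query, RADAR), 3))      # near zero
```

The appeal for edge deployment is that every operation above is element-wise and fixed-width, so reasoning over structured records costs roughly the same regardless of how much is bundled into a vector.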
Vector Symbolic Architecture (VSA), a.k.a. Hyperdimensional Computing, has transformative potential for advancing cognitive processing capabilities at the network edge. This paper presents a technology integration experiment demonstrating how the VSA paradigm offers robust solutions for generation-after-next AI deployment at the network edge. Specifically, we show how VSA effectively models and integrates the cognitive processes required to perform intelligence, surveillance, and reconnaissance (ISR). The experiment integrates functions across the observe, orientate, decide and act (OODA) loop, including the processing of sensed data via both a neuromorphic event-based camera and a standard CMOS frame-rate camera; declarative knowledge-based reasoning in a semantic vector space; action planning using VSA cognitive maps; access to procedural knowledge via large language models (LLMs); and efficient communication between agents via highly compact binary vector representations. In contrast to previous ‘point solutions’ showing the effectiveness of VSA for individual OODA tasks, this work takes a ‘whole system’ approach, demonstrating the power of VSA as a uniform integration technology.
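The integration experiment covers many components. As one small, self-contained illustration of the compact binary vector communication mentioned above, the sketch below thresholds a bundled bipolar vector to a packed bit string, transmits it, and shows that similarity to the original concepts survives the round trip; the dimensionality and concept names are illustrative assumptions.

```python
import numpy as np

D = 8_192
rng = np.random.default_rng(1)
concepts = {name: rng.choice([-1, 1], size=D) for name in
            ["vehicle", "person", "stationary", "moving"]}

# Agent A fuses two percepts into one situation vector and binarizes it for transmission.
situation = np.sign(concepts["vehicle"] + concepts["moving"])
situation[situation == 0] = 1                     # break ties deterministically
packed = np.packbits(situation > 0)               # D/8 = 1024 bytes on the wire
print("message size:", packed.nbytes, "bytes")

# Agent B unpacks the bits and checks which known concepts the message contains.
received = np.unpackbits(packed)[:D].astype(np.int8) * 2 - 1
for name, vec in concepts.items():
    cos = float(received @ vec) / D
    print(f"{name:10s} similarity: {cos:+.3f}")
```

The bundled percepts remain clearly recoverable by the receiving agent while the whole situation report fits in about a kilobyte, which is the kind of bandwidth efficiency that motivates VSA-based inter-agent messaging at the edge.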
Mine detection is dangerous and time-consuming: obtaining maritime situational awareness relies on manned surface platforms in or near the minefield. Navies worldwide are investing in autonomous underwater vehicles (AUVs) and unmanned surface vessels (USVs) with sonar capabilities to aid in this task. The next step is to have these AUVs or USVs detect MIne-Like COntacts (MILCOs) autonomously, using deep neural networks (DNNs). Teaching DNNs to detect objects requires large amounts of good-quality data. For operational naval mines, this data is lacking for four main reasons: (1) too few mines are encountered to build large datasets, (2) usually only one mine is found per encounter, (3) sonar capabilities have improved over the last few years, making older sonar data less useful for training, and (4) information on current sonar capabilities and mines is classified.
We leverage a synthetic dataset of several types of mines in realistic environments to train an open-source DNN to detect MILCOs. The synthetic dataset provides many images of mines, often multiple mines per image, and an image quality similar to current sonar systems’ capabilities. We test our deep neural network on a recently published dataset of real naval mines.
Occlusion commonly occurs in images relevant to counterterrorism applications that involve firearm classification. To address this challenge, a Compositional Convolutional Neural Network architecture was selected. Given appropriate training, these networks demonstrate promising results in image classification even in the presence of occlusion. To adequately train the network despite a shortage of available images depicting firearms under occlusion, a set of tools was developed to artificially introduce occlusion and noise, enabling the creation of an augmented dataset to complement the training data.
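The augmentation tools are not described in detail in the abstract. A minimal sketch of the kind of synthetic occlusion they might apply is shown below, pasting random opaque rectangles onto a training image and adding Gaussian noise; the patch counts, sizes and noise levels are assumptions.

```python
import numpy as np

def occlude(image, rng, max_patches=3, max_frac=0.35, noise_sigma=0.05):
    """Return a copy of an (H, W, 3) float image with random occluders and noise added."""
    img = image.copy()
    h, w = img.shape[:2]
    for _ in range(rng.integers(1, max_patches + 1)):
        ph = rng.integers(int(0.1 * h), int(max_frac * h))
        pw = rng.integers(int(0.1 * w), int(max_frac * w))
        y = rng.integers(0, h - ph)
        x = rng.integers(0, w - pw)
        img[y:y + ph, x:x + pw] = rng.uniform(0.0, 1.0, size=3)   # flat-coloured occluder
    img += rng.normal(0.0, noise_sigma, size=img.shape)           # sensor-like noise
    return np.clip(img, 0.0, 1.0)

rng = np.random.default_rng(42)
clean = np.ones((224, 224, 3), dtype=np.float32) * 0.5            # placeholder image
augmented = occlude(clean, rng)
print("fraction of pixels changed:", float((np.abs(augmented - clean) > 0.1).mean()))
```

Applying such transforms on the fly to unoccluded firearm images yields an arbitrarily large occlusion-augmented training set without any additional data collection.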
This paper presents an innovative approach to path-planning for Autonomous Underwater Vehicles (AUVs) in complex underwater environments, leveraging single-beam sonar data. Recognizing the limitations of traditional sonar systems in providing detailed environmental data, we introduce a method to effectively utilize Ping360 sonar scans for obstacle detection and avoidance. Our research addresses the challenges posed by dynamic underwater currents and obstacle unpredictability, incorporating environmental factors such as water temperature, depth, and salinity to adapt the sonar’s range detection capabilities. We propose a novel algorithm that extends beyond the capabilities of the A* algorithm, considering the underwater currents’ impact on AUV navigation. Our method demonstrates significant improvements in navigational efficiency and safety, offering a robust solution for AUVs operating in uncertain and changing underwater conditions. The paper outlines our experimental setup, algorithmic innovations, and the results of comprehensive simulations conducted in a controlled tank environment, showcasing the potential of our approach in enhancing AUV operational capabilities for defense and security applications.
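The proposed algorithm itself is not reproduced in the abstract. The sketch below only illustrates the underlying idea of biasing A* edge costs with a current field, so that moving against a strong current is more expensive than moving with it; the grid, the current field and the cost weighting are entirely hypothetical.

```python
import heapq
import math

CURRENT = (0.6, 0.0)          # hypothetical constant current pushing in +x (grid units)

def edge_cost(move, current=CURRENT, penalty=0.8):
    """Base step length plus a penalty for the motion component against the current."""
    length = math.hypot(*move)
    mx, my = move[0] / length, move[1] / length
    against = max(0.0, -(mx * current[0] + my * current[1]))   # 0 if moving with current
    return length * (1.0 + penalty * against)

def astar(start, goal, blocked, size=20):
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, 1), (1, -1), (-1, 1), (-1, -1)]
    heap = [(0.0, 0.0, start, [start])]
    seen = set()
    while heap:
        f, g, node, path = heapq.heappop(heap)
        if node == goal:
            return g, path
        if node in seen:
            continue
        seen.add(node)
        for dx, dy in moves:
            nxt = (node[0] + dx, node[1] + dy)
            if not (0 <= nxt[0] < size and 0 <= nxt[1] < size) or nxt in blocked:
                continue
            ng = g + edge_cost((dx, dy))
            h = math.hypot(goal[0] - nxt[0], goal[1] - nxt[1])   # admissible heuristic
            heapq.heappush(heap, (ng + h, ng, nxt, path + [nxt]))
    return math.inf, []

obstacles = {(10, y) for y in range(3, 17)}                      # a wall detected by sonar
cost, path = astar((2, 10), (18, 10), obstacles)
print(f"current-aware path cost {cost:.2f}, {len(path)} waypoints")
```

Because the penalty only ever increases an edge's cost above its geometric length, the straight-line heuristic stays admissible; a richer model would make the current (and the obstacle map derived from the Ping360 scans) vary per cell.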
In autonomous driving, using deep learning models to support decision-making has become a popular theme, particularly in computer vision. These models are heavily geared towards making decisions based on the environment in which they are trained, yet very few datasets currently exist for off-road autonomy. For autonomous vehicles to traverse off-road or unstructured settings, they must have an understanding of these environments. This paper seeks to lay the groundwork for a new, off-road, multi-modality dataset, with initial data collected at Mississippi State University’s Center for Advanced Vehicular Systems (MSU CAVS). The dataset will include co-aligned and co-registered LiDAR, thermal, and visual sensor data, as well as semantically segmented visual data and object detection and classification labels for all three modalities. However, performing semantic segmentation on each image individually by hand would be laborious and time-consuming given the large number of images. This paper therefore explores the utility of transfer learning for auto-labeling. Specifically, it considers how transfer learning performs for a unique and under-represented (data-wise) domain in reducing the burden of hand-annotating datasets for deep learning applications.
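The auto-labeling pipeline is not specified in the abstract. A minimal sketch of transfer-learning-based pseudo-labeling is given below, running a segmentation network pretrained on another domain over unlabeled frames and keeping only high-confidence pixels for later human review; the use of DeepLabV3 and the 0.9 confidence threshold are assumptions, not choices made by the authors.

```python
import torch
from torchvision.models.segmentation import deeplabv3_resnet50, DeepLabV3_ResNet50_Weights

weights = DeepLabV3_ResNet50_Weights.DEFAULT
model = deeplabv3_resnet50(weights=weights).eval()
preprocess = weights.transforms()

def pseudo_label(image, conf_threshold=0.9):
    """Return per-pixel class ids, with low-confidence pixels marked 255 ('to review')."""
    with torch.no_grad():
        logits = model(preprocess(image).unsqueeze(0))["out"][0]
    probs = torch.softmax(logits, dim=0)
    conf, labels = probs.max(dim=0)
    labels[conf < conf_threshold] = 255          # leave uncertain pixels for a human
    return labels

# Placeholder frame standing in for an unlabeled off-road image.
frame = torch.rand(3, 384, 512)
labels = pseudo_label(frame)
auto = float((labels != 255).float().mean())
print(f"auto-labeled pixels: {auto:.1%}; remaining {1 - auto:.1%} flagged for annotation")
```

The interesting question the paper raises is how far such pretrained models, built on structured road scenes, carry over to an under-represented off-road domain, and how much manual correction of the pseudo-labels is still required.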
Security holograms, diffractive optical elements known for their intricate 3D visuals, are a cornerstone of brand protection. However, their effectiveness hinges on robust authentication methods. This work explores the critical role of illumination in hologram visualization for enhanced authentication. We investigate the impact of illumination angle and capture stage on the clarity of the hologram's visual features.
An experimental setup utilizing an LED light source and a smartphone camera allows for systematic analysis of illumination angles. Laser light is also explored as a potential authentication tool. By combining observations, detailed documentation, and the potential of Artificial Intelligence (AI), this research aims to identify the illumination conditions that maximize hologram visualization, facilitating efficient and reliable authentication workflows. This approach paves the way for improved security in document and product authentication, ultimately strengthening consumer trust and brand reputation.
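The documentation and AI components are only outlined in the abstract above. As a small illustration of how capture clarity could be quantified across illumination angles, the sketch below scores each captured frame with the variance of its Laplacian, a standard sharpness proxy, and reports the best angle; the angle list and file naming are hypothetical, and OpenCV is assumed to be available.

```python
import cv2
import numpy as np

def clarity_score(gray_image):
    """Variance of the Laplacian: higher values indicate sharper hologram features."""
    return float(cv2.Laplacian(gray_image, cv2.CV_64F).var())

# Hypothetical capture sweep: one synthetic frame per illumination angle.
rng = np.random.default_rng(3)
angles = [15, 30, 45, 60, 75]
frames = {a: (rng.random((480, 640)) * 255).astype(np.uint8) for a in angles}
# In practice each frame would be loaded from the smartphone capture, e.g.:
# frames[a] = cv2.imread(f"hologram_{a}deg.png", cv2.IMREAD_GRAYSCALE)

scores = {a: clarity_score(img) for a, img in frames.items()}
best = max(scores, key=scores.get)
print("clarity by angle:", {a: round(s, 1) for a, s in scores.items()})
print("best illumination angle:", best, "degrees")
```

Such a per-angle score gives the systematic, repeatable measurement that a learned model or a documented authentication workflow could then build on.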