Recently, deep learning-based methods for small object detection have been improved by leveraging temporal information. The capability of detecting objects as small as five pixels provides new opportunities for automated surveillance with high-resolution, wide-field-of-view cameras. However, integration on unmanned vehicles generally comes with strict demands on size, weight, and power. This poses a challenge for processing high-framerate, high-resolution data, especially when multiple camera streams must be analyzed in parallel for 360-degree situational awareness. This paper presents results of the Penta Mantis-Vision project, in which we investigated the parallel processing of four 4K camera video streams with commercially available edge computing hardware, specifically the Nvidia Jetson AGX Orin. As the computational power of the GPU on an embedded platform is a critical bottleneck, we explore widely available techniques to accelerate inference or reduce power consumption. Specifically, we analyze the effect of INT8 quantization and replacement of the activation function on small object detection. Furthermore, we propose a prioritized tiling strategy that processes camera frames in such a way that new objects can be detected anywhere in the camera view while previously detected objects can still be tracked robustly. We implemented a video processing pipeline for different temporal YOLOv8 models and evaluated these with respect to object detection accuracy and throughput. Our results demonstrate that recently developed deep learning models can be deployed on embedded devices for real-time multi-camera detection and tracking of small objects without compromising object detection accuracy.
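The prioritized tiling idea can be sketched as follows. This is an illustrative sketch only: the tile size, the overlap test, and the round-robin scan of "cold" tiles are assumptions for exposition, not the authors' implementation.

```python
# Illustrative sketch of a prioritized tiling scheduler (tile size and
# priority scheme are assumptions, not the paper's implementation).
from itertools import product

TILE = 1280  # assumed square tile size fed to the detector per inference


def make_tiles(width, height, tile=TILE):
    """Split a frame into a grid of (x, y, w, h) tiles."""
    return [(x, y, min(tile, width - x), min(tile, height - y))
            for x, y in product(range(0, width, tile),
                                range(0, height, tile))]


def prioritize(tiles, tracked_boxes, frame_idx):
    """Process tiles overlapping tracked objects every frame; visit the
    remaining tiles round-robin so new objects anywhere in the view are
    eventually scanned."""
    def overlaps(tile, box):
        tx, ty, tw, th = tile
        bx, by, bw, bh = box
        return bx < tx + tw and tx < bx + bw and by < ty + th and ty < by + bh

    hot = [t for t in tiles if any(overlaps(t, b) for b in tracked_boxes)]
    cold = [t for t in tiles if t not in hot]
    scan = [cold[frame_idx % len(cold)]] if cold else []
    return hot + scan


tiles = make_tiles(3840, 2160)  # one 4K frame -> 3x2 grid of tiles
batch = prioritize(tiles, tracked_boxes=[(100, 100, 50, 50)], frame_idx=0)
```

Under this scheme the per-frame detector load stays bounded by the number of tracked objects plus one scan tile, rather than the full tile grid.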
Adversarial AI techniques can make AI-based object detection in images malfunction. Evasion attacks apply perturbations to input images that can be imperceptible to the human eye and exploit weaknesses in object detectors to prevent detection. However, evasion attacks have weaknesses themselves and can be sensitive to factors such as object type, orientation, position, and scale. This work evaluates the performance of a white-box evasion attack and its robustness to these factors.
Video data from the ATR Algorithm Development Image Database is used, containing military and civilian vehicles at ranges from 1000 to 5000 m. A white-box evasion attack (adversarial objectness gradient) was trained to disrupt a YOLOv3 vehicle detector previously trained on this dataset. Several experiments were performed to assess whether the attack successfully prevented vehicle detection at different ranges. Results show that for an evasion attack trained on objects at only the 1500 m range and applied to all other ranges, the median mAP reduction is >95%. Similarly, when trained on only two vehicles and applied to the seven remaining vehicles, the median mAP reduction is >95%.
This means that evasion attacks can succeed with limited training data across multiple ranges and vehicles. Although a (perfect-knowledge) white-box evasion attack is a worst-case scenario, in which a system is fully compromised and its inner workings are known to an adversary, this work may serve as a basis for research into robustness and for designing AI-based object detectors resilient to such attacks.
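The core mechanism of a white-box gradient-based evasion attack can be sketched with a toy example: perturb the input in the direction that lowers the detector's objectness score. The linear "detector" below is a stand-in assumption; the actual attack backpropagates through the full network (here, YOLOv3).

```python
import numpy as np

# Toy sketch of a gradient-based evasion attack. A sigmoid over a fixed
# weight vector stands in for the detector's objectness head (assumption);
# real attacks differentiate through the whole detector.
rng = np.random.default_rng(0)
w = rng.normal(size=64)      # stand-in detector weights
x = rng.normal(size=64)      # flattened "image" patch


def objectness(x):
    """Detector confidence that an object is present."""
    return 1.0 / (1.0 + np.exp(-w @ x))


def grad_objectness(x):
    """Gradient of sigmoid(w.x) with respect to the input x."""
    s = objectness(x)
    return s * (1.0 - s) * w


eps = 0.01                   # per-step perturbation budget
x_adv = x.copy()
for _ in range(100):
    # iterative FGSM-style step against the objectness gradient
    x_adv -= eps * np.sign(grad_objectness(x_adv))
```

After the loop, `objectness(x_adv)` is driven toward zero while each pixel moved by at most 1.0 in total, illustrating how small perturbations can suppress detections.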
Image quality degradation caused by atmospheric turbulence reduces the performance of automated tasks such as optical character recognition. This issue is addressed by fine-tuning text recognition models on turbulence-degraded images. As obtaining a realistic training dataset of turbulence-degraded recordings is challenging, two synthetic datasets were created: one using a physics-inspired deep learning turbulence simulator and one using a heat chamber. The fine-tuned text recognition model improves performance on a validation dataset of turbulence-distorted recordings. A number of architectural modifications to the text recognition model are proposed that allow a sequence of frames to be used instead of a single frame, while still reusing the pre-trained weights. These modifications are shown to yield a further performance improvement.
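One common way to reuse single-frame pre-trained weights for a frame sequence is to run the frame encoder on each frame and aggregate the features before the recognition head. The sketch below assumes mean-pooling over time with stand-in shapes; it illustrates the general pattern, not the paper's exact modifications.

```python
import numpy as np

# Sketch: reuse a (frozen) single-frame encoder across a frame sequence
# and mean-pool features over time before the recognition head. Shapes
# and the pooling choice are assumptions for illustration.
rng = np.random.default_rng(1)
W_enc = rng.normal(size=(128, 32))   # stand-in pre-trained frame encoder


def encode_frame(frame):
    """Per-frame feature vector from the pre-trained encoder."""
    return np.tanh(frame @ W_enc)


def encode_sequence(frames):
    """Temporal aggregation: average per-frame features (assumed)."""
    feats = np.stack([encode_frame(f) for f in frames])
    return feats.mean(axis=0)


frames = rng.normal(size=(5, 128))   # e.g. 5 turbulence-degraded frames
single = encode_frame(frames[0])
multi = encode_sequence(frames)
```

Because the aggregated feature has the same shape as a single-frame feature, the downstream recognition head and its pre-trained weights can be kept unchanged.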
Early threat assessment of vessels is an important surveillance task during naval operations. Whether a vessel is a threat depends on a number of aspects, among them the vessel class, the closest point of approach (CPA), the speed and direction of the vessel, and the presence of possibly threatening items on board, such as weapons. Currently, most of these aspects are observed by operators viewing the camera imagery, and whether a vessel is a potential threat depends on the operator's final assessment. Automated analysis of electro-optical (EO) imagery for aspects of potential threats during surveillance can support the operator during observation. This can release the operator from continuous guard and provide the tools for a better overview of possible threats in the surroundings during a surveillance task. In this work, we apply different processing algorithms, including detection, tracking, and classification, to recorded multi-band EO imagery in a harbor environment with many small vessels. With the results, we aim to automatically determine a vessel's CPA, the number of people on board, and the presence of possibly threatening items on board. We thereby show that our algorithms can support the operator in assessing whether a vessel poses a threat.
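Given a track estimate, the CPA follows from standard kinematics: with own ship at the origin, a vessel at relative position r moving with constant relative velocity v is closest at the time minimizing |r + t v|. The sketch below shows that computation; the function name and the constant-velocity assumption are illustrative, not the paper's tracker.

```python
import math

# Closest point of approach (CPA) under a constant-velocity assumption.
# Own ship is at the origin; (rx, ry) is the vessel's relative position
# in meters and (vx, vy) its relative velocity in m/s.


def cpa(rx, ry, vx, vy):
    """Return (time_to_cpa_s, cpa_distance_m)."""
    v2 = vx * vx + vy * vy
    if v2 == 0.0:                  # no relative motion: distance is constant
        return 0.0, math.hypot(rx, ry)
    # time minimizing |r + t v|; clamp to 0 if the vessel is moving away
    t = max(0.0, -(rx * vx + ry * vy) / v2)
    return t, math.hypot(rx + t * vx, ry + t * vy)


# Vessel 1000 m due east, heading straight at us at 5 m/s:
t, d = cpa(1000.0, 0.0, -5.0, 0.0)
```

A small CPA distance combined with a short time to CPA is the kind of quantity an automated pipeline can surface to the operator as a threat indicator.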