Proceedings Volume 13679

Artificial Intelligence for Security and Defence Applications III



Volume Details

Date Published: 7 November 2025
Contents: 10 Sessions, 47 Papers, 37 Presentations
Conference: Security + Defence 2025
Volume Number: 13679

Table of Contents

  • Front Matter: Volume 13679
  • Object Detection and Tracking I
  • Object Detection and Tracking II
  • Object Detection and Tracking III
  • Detection, Recognition, and Identification
  • Adversarial AI and Counter AI
  • Synthetic and Simulated Data, Generative AI
  • Human-AI Collaboration and Advances in AI
  • Classification, Segmentation, and Scene Understanding
  • Poster Session
Front Matter: Volume 13679
Front Matter: Volume 13679
This PDF file contains the front matter associated with SPIE Proceedings Volume 13679, including the Title Page, Copyright information, Table of Contents, and Conference Committee information.
Object Detection and Tracking I
Towards precise subpixel infrared target detection based on AI: an empirical study on SNR limits
Héctor Corrales, Álvaro Hernández, Javier Lorenzo, et al.
Infrared Small Target Detection (IRSTD) is a critical technology in military and maritime applications, where subpixel detection remains a significant challenge. These targets, with an apparent size smaller than a single pixel, are difficult to distinguish due to low signal-to-noise ratio (SNR), complex background clutter, and diffraction effects. Traditional methods lack flexibility, and, while artificial intelligence (AI)-based approaches improve detection through temporal context and deep learning, they require high computational resources. These modern methods have advanced multi-pixel target detection; however, they have not been explicitly evaluated for the subpixel case, a gap exacerbated by the absence of suitable datasets. To address this, a simulator has been developed to generate synthetic subpixel targets with adjustable SNR and diffraction effects. A spatio-temporal semantic segmentation network, inspired by DTUM, has been implemented and tested on synthetic data. Results quantify the model’s effectiveness and establish its operational limits under challenging SNR conditions, providing clear performance thresholds.
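The simulator's core idea, injecting a target whose energy is smeared across neighbouring pixels at a chosen SNR, can be sketched in a few lines. All names and the Gaussian point-spread model below are illustrative assumptions, not the authors' code:

```python
import math

def inject_subpixel_target(frame, x, y, snr, noise_sigma=1.0, psf_sigma=0.5):
    """Add a target at fractional position (x, y) to a 2D frame.

    The target energy is spread over neighbouring pixels with a Gaussian
    point-spread function, a crude stand-in for diffraction; the peak
    amplitude is chosen so that amplitude / noise_sigma == snr.
    """
    amplitude = snr * noise_sigma
    h, w = len(frame), len(frame[0])
    for r in range(max(0, int(y) - 2), min(h, int(y) + 3)):
        for c in range(max(0, int(x) - 2), min(w, int(x) + 3)):
            d2 = (c - x) ** 2 + (r - y) ** 2
            frame[r][c] += amplitude * math.exp(-d2 / (2 * psf_sigma ** 2))
    return frame

# Zero background for clarity; clutter and sensor noise would be added on top.
frame = inject_subpixel_target([[0.0] * 16 for _ in range(16)],
                               x=7.4, y=8.6, snr=6.0)
```

Because the target centre falls between grid points, no single pixel receives the full amplitude, which is exactly what makes the subpixel regime hard at low SNR.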
Multimodal ship detection using YOLOv11: comparative analysis of SWIR, LWIR, and visible sensors under diverse maritime and environmental conditions
Simon Rennotte, Cornelia Nita, Pieter-Jan Demeyer, et al.
Reliable ship detection is a critical capability for maritime security and military operations, particularly in scenarios where vessels do not broadcast Automatic Identification System (AIS) signals. In this paper, we investigate the use of computer vision techniques for automatic ship detection using a multimodal sensor suite comprising Short-Wave Infrared (SWIR), Long-Wave Infrared (LWIR), and visible spectrum cameras. We first conduct a qualitative analysis of the strengths and weaknesses of each sensor across varying environmental conditions using a custom dataset acquired in diverse atmospheric scenarios and against a range of backgrounds and ship types. Subsequently, we perform a quantitative evaluation by training and testing YOLOv5 and YOLOv11 models on data from each sensor type and comparing detection performance across video sequences captured in representative maritime conditions.
Analysing operational performance of few-shot learning using synthetic data
Niclas Hansson, Erik Persson, Sidney Rydström, et al.
Traditional machine learning techniques for vehicle detection and classification require large amounts of annotated images. For applications in defence and public security, obtaining such data is usually not possible, e.g., because it is difficult, or even impossible, to get access to the relevant environments and vehicles. Few-shot learning is a research area that aims to design models that perform well with only a few training examples and can therefore be useful for these types of applications. In this work, we evaluate how few-shot learning can be used to address the problem of limited data in a UAV-based vehicle detection scenario. Two few-shot learning methods, Meta-DETR and CD-ViTO, are evaluated with respect to their performance given different numbers of training examples. Their performance is compared to a traditional baseline, Faster R-CNN, that is instead trained on large amounts of data. We use a synthetically generated dataset and describe how this dataset is designed. We show that Meta-DETR has solid performance on our dataset given the small amounts of data, but does not reach the performance of the traditional baseline method Faster R-CNN. In contrast, CD-ViTO performed very poorly on our dataset, and our analysis shows that this is likely because the DINOv2 features used for prototypes are not expressive enough to distinguish between the different vehicle classes.
The impact of burn-in on semi-supervised single-stage object detection on small military datasets
Michel van Lier, Frank Ruis, Thijs Eker, et al.
Semi-Supervised Object Detection (SSOD) aims to improve object detection performance by leveraging both labeled and (large amounts of) unlabeled data. By reducing the reliance on labeled data, SSOD enables faster model development and quicker deployment to the field, allowing systems to rapidly adapt to new environments or mission-specific scenarios with minimal annotation effort. While SSOD has demonstrated success in enhancing the performance of object detection models, ranging from two-stage detectors like R-CNN and DETR to more recent one-stage detectors such as YOLO, existing research has primarily focused on commonly studied benchmark datasets like PASCAL VOC and MS COCO. To the best of our knowledge, its effectiveness on domain-specific and smaller military-relevant datasets has not been thoroughly evaluated. In this work, we investigate the performance of SSOD using YOLOv5 on four (military-relevant) use-case datasets: the air-to-ground dataset VisDrone2019-DET, the ground-to-ground Automatic Target Recognition (ATR) Algorithm Development Image Database, our own in-house recorded and annotated proprietary military air-to-ground dataset, and our own internet crowd-sourced Russian-Ukrainian War dataset containing ground-to-ground and air-to-ground imagery. We artificially limit the labeled training data from 66% down to 1%, while treating the remainder as unlabeled. Our results show that SSOD consistently improves object detection performance across all datasets and label proportions compared to training without unlabeled data. Additionally, we find that the choice of burn-in epochs (the point at which labeled pretraining transitions to semi-supervised training) significantly impacts final performance. The optimal burn-in epoch is not necessarily the best-performing validation epoch, highlighting the need for a careful burn-in approach.
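The burn-in mechanism described above can be sketched as a simple loss schedule plus the exponential-moving-average (EMA) teacher update commonly used in teacher-student SSOD. This is an illustrative sketch of the general technique, not the paper's implementation:

```python
def active_losses(epoch, burn_in_epochs):
    """During burn-in the student trains on labeled data only; afterwards
    an EMA teacher pseudo-labels the unlabeled pool as well."""
    return {
        "supervised": True,
        "unsupervised": epoch >= burn_in_epochs,
    }

def ema_update(teacher, student, decay=0.999):
    """Exponential-moving-average teacher update, applied per parameter."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]
```

The schedule makes the paper's finding concrete: the epoch at which the `unsupervised` loss switches on is a hyperparameter in its own right, not simply the best-scoring validation epoch of the supervised phase.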
Multi-human motion tracking with transformer network augmentation
Philip Birch, Xudong Han, Nobuyuki Oishi, et al.
Tracking people within crowds using only two-dimensional camera images becomes challenging once the crowd reaches a certain density. Many human trackers assume that the motion of people can be modelled with a linear predictor, but this can poorly model reality in dense crowds, where more complex interactions between people occur. To overcome this limitation, this paper shows that deep learning temporal prediction models can successfully model these complex motions. Traditional augmentation is compared against physically realistic and generated models to improve performance.
Air-to-ground real-time temporal small object detection from a flying platform
Real-time small object detection in air-to-ground scenarios is important for military applications, enabling unmanned aerial vehicles (UAVs) and other airborne platforms to monitor large areas. The detection of small objects on the ground remains a significant challenge because of their lack of distinctive features and the low signal-to-noise ratio. Temporal-YOLO has demonstrated that exploiting temporal information in video data improves small object detection across diverse environments. However, the current implementation of Temporal-YOLO is not suitable for real-time deployment on a moving aerial platform. In this study, we extend Temporal-YOLO by addressing two key limitations. First, because Temporal-YOLO assumes a static camera over frames, it requires compensation for the platform motion. We implement and evaluate various (global) motion compensation techniques to assess their impact on detection performance. Second, the original Temporal-YOLO model introduces latency by incorporating a future frame. To enable real-time inference, we propose a modification that relies solely on past frames. Through evaluation on an aerial dataset featuring small military-relevant objects, we demonstrate that global motion compensation significantly enhances detection accuracy and that the low-latency historical approach achieves comparable performance.
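Global motion compensation of the kind evaluated here can be illustrated with a brute-force translation search between consecutive frames. A real system would estimate a homography or dense optical flow; the function below is a deliberately simplified stand-in with illustrative names:

```python
import random

def estimate_global_shift(prev, cur, max_shift=2):
    """Find the integer (dx, dy) that best aligns prev to cur by
    minimizing mean absolute difference over the overlap region."""
    h, w = len(prev), len(prev[0])
    best, best_err = (0, 0), float("inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            err, n = 0.0, 0
            for r in range(h):
                for c in range(w):
                    r2, c2 = r + dy, c + dx
                    if 0 <= r2 < h and 0 <= c2 < w:
                        err += abs(prev[r][c] - cur[r2][c2])
                        n += 1
            if err / n < best_err:
                best_err, best = err / n, (dx, dy)
    return best

rng = random.Random(1)
prev = [[rng.random() for _ in range(8)] for _ in range(8)]
# Simulate platform motion: scene content shifts one pixel to the right.
cur = [[prev[r][c - 1] if c > 0 else 0.0 for c in range(8)] for r in range(8)]
```

Once the shift is known, detections from past frames can be warped into the current frame's coordinates, which is what lets a temporal detector treat a moving camera as if it were static.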
Object Detection and Tracking II
Real-time and robust deep learning object detection based on feature domain adversarial
Shuyuan Wen, Yang Gao, Bingrui Hu, et al.
Object detection based on deep learning has been widely used in the field of security and shows excellent performance in normal scenarios. However, due to the limited computing power of embedded devices and complex illumination and weather conditions, the speed and detection accuracy of existing methods decrease significantly, which poses a threat to security systems. To overcome these challenges, we propose a real-time and robust deep learning object detection method based on feature-domain adversarial learning that improves detection accuracy and inference speed in complex scenarios. Firstly, we adopt structured re-parametrized convolution to reduce computational cost and improve inference speed. Secondly, we introduce a feature-domain discriminator and an adversarial loss function to learn consistent features and improve generalization ability. The experimental results demonstrate the effectiveness of the proposed method in low-light, rainy, and foggy conditions, where it is superior to existing methods.
Anti-AI camouflage for naval vessels
Alexander M. van Oers, Elise C. van Swol, Jesse Kassai
AI-enabled object detection and classification is being developed and deployed by militaries worldwide. In the maritime domain, autonomous vehicles such as unmanned aerial vehicles (UAVs) and unmanned surface vehicles (USVs) offer new possibilities for obtaining maritime situational awareness. However, these technologies also introduce new threats for our vessels: they can be deployed to detect and classify naval vessels. To counter this threat, we developed anti-AI camouflage for naval vessels. Using open-source publications from the Dutch Ministry of Defence, we compiled a dataset of Dutch Royal Navy vessels, consisting of five classes. The images, captured from the viewpoint of USVs and UAVs, were used to design camouflage patterns - adversarial patches - intended to fool AI-enabled object detection. These patterns reduce the likelihood of AI detecting naval vessels or cause the AI to misclassify the type of vessel. By picking the colors of the pattern, we achieve dual-attribute camouflage that can fool both the human and artificial brain.
Milliseconds matter: pushing YOLO to the limit for real-time object detection
Çağlayan Can Çakırgöz, Sefa Burak Okcu, Cevahir Çığla
This study focuses on optimizing object detection networks for real-time performance on edge devices, specifically targeting the YOLO family (YOLOv5, YOLOv8, and YOLOv10). Our experiments were structured into three key stages: pre-processing, model execution, and post-processing. Notable improvements include manipulations of the resize and activation functions, along with transferring max-class score and index evaluations from the CPU to the NPU. These optimizations significantly accelerate the object detection process with minimal accuracy trade-offs, enhancing real-time performance and making the system more suitable for low-latency applications, such as security and intelligent transportation systems.
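The post-processing step moved onto the NPU, taking the per-box maximum class score and its index, amounts to the reduction below. The data layout and threshold are illustrative assumptions, not taken from the paper:

```python
def best_class_per_box(raw, conf_thresh=0.25):
    """For each candidate (objectness, class_scores), keep the argmax
    class and combined score, dropping boxes below the threshold."""
    keep = []
    for i, (objectness, cls_scores) in enumerate(raw):
        best = max(range(len(cls_scores)), key=cls_scores.__getitem__)
        score = objectness * cls_scores[best]
        if score >= conf_thresh:
            keep.append((i, best, score))
    return keep

detections = best_class_per_box([(0.9, [0.1, 0.8, 0.1]),
                                 (0.2, [0.5, 0.5, 0.0])])
```

Performing this reduction on the NPU means only the few surviving (index, class, score) triples cross to the CPU, rather than the full score tensor, which is where the milliseconds go.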
KOIOS: AI techniques for secure, robust, frugal, resilient, and explainable solutions in defence applications
Iago Docamiño, Francisco Andrés
The integration of Artificial Intelligence (AI) into defense systems holds significant potential for enhancing security protocols, refining decision-making mechanisms, and elevating overall operational efficacy. Most countries have identified the integration of these techniques as a key innovation for their armed forces. One of the main challenges facing the defense sector is the availability of quality data that allows for the correct development of AI models. KOIOS, an interdisciplinary R&D project, aims to advance AI techniques for military applications through frugal learning methods. The project focuses on developing AI solutions for scenarios with limited data, rapidly adapting to new, untrained situations. The project pushes the boundaries of AI research in areas such as few-shot learning, zero-shot learning, synthetic data generation, semi-supervised learning, and domain adaptation. KOIOS enhances AI for military applications through simulation, use-case development, metrics, and real-world experiments. The work presented focuses on three image-related use cases: detection, classification, and segmentation of military ships; detection of rare events or threats on the battlefield from images; and advanced autonomous capabilities for UAVs. However, the project develops other remarkable use cases related to threat detection, prediction, and operational adaptation. The project’s outputs include the integration of high-performance computing with AI models for improved frugality and robustness, contributions to standardizing benchmarking and evaluation methodologies for AI systems in defense, and the creation of training materials to support non-specialist end-users in adapting AI methods quickly.
Object Detection and Tracking III
Visual breeder and travel document authenticator
Henri Bouma, Jorge Melo, Johan-Martijn ten Hove, et al.
The use of AI technologies improves document authentication, supporting border guards and immigration services in fighting document fraud, identity theft, illegal border crossing and illegal migration. This paper presents a novel application for the authentication of travel and breeder documents that preserves privacy during the process. The new capabilities include robust processing of images from mobile phones, federated-learning-based training, data-driven discovery of new rules and knowledge-based tactical anomaly detection. The processing allows the tactical analysis of many data elements, such as consistency checks, multi-language support and validity of data elements.
Mobile vision systems for security: automating surveillance with scene change detection
Alejandro Rituerto, Peter Leškovský, Jorge García
The increasing demand for intelligent monitoring systems in security has accelerated the development of automated surveillance solutions. This paper presents a comprehensive study on the use of Computer Vision techniques combined with mobile camera platforms to automate surveillance rounds in border control and critical infrastructure protection. We explore the integration of Scene Change Detection (SCD) methods into embedded Computer Vision systems as a core strategy to enhance autonomous monitoring capabilities. The paper introduces a taxonomy of relevant surveillance use cases, identifying the types of operational changes that automated systems must detect. We then evaluate and compare the effectiveness of SCD techniques versus direct detection methods (such as classification, object detection or action recognition) to solve these use cases. Our analysis highlights the strengths of SCD in detecting unexpected or subtle environmental changes without requiring prior knowledge of specific targets. In addition, we provide a state-of-the-art survey of current SCD methods, discussing algorithmic approaches and benchmark results to establish realistic expectations for deployment. Finally, we examine the technical and operational requirements for developing and deploying SCD-enabled systems in real-world scenarios, particularly on mobile platforms such as autonomous agents.
Federated object detection for defense and security applications using realistic unbalanced heterogeneous data distributions
Muriel van der Spek, Arthur van Rooijen, Lotte Nijskens, et al.
Data in defence and security applications is often sensitive, making it difficult for organizations to share. This limits the training of artificial intelligence techniques, which typically require large, diverse datasets. Federated learning offers a solution by enabling organizations to collaboratively train models without sharing private data. However, existing research on federated learning often focuses on simple computer vision tasks, such as classification on balanced datasets, and rarely addresses more complex tasks involving realistic, heterogeneous data distributions, also known as non-IID (non-independent and identically distributed) data. In this work, we demonstrate a federated learning framework applied to various object detection tasks relevant to defence and security. These tasks are evaluated under different types of non-IID conditions, including quantity skew, label skew, and feature skew. The object detection tasks include number and symbol detection on UNO card corners, single-frame person and vehicle detection from an air-to-ground perspective using the VisDrone dataset, and small moving object detection in challenging environments. Experimental results show that federated models consistently outperform separately trained models in both IID and non-IID settings. In experiments involving the three types of skew, federated performance decreases as the data becomes more non-IID. However, our results still demonstrate the added benefit of federated training compared to separately trained models. These findings highlight the viability of federated object detection in real-world defence and security scenarios involving heterogeneous data.
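The aggregation at the heart of such a framework is typically FedAvg: a size-weighted average of client model parameters, sketched below. The paper's actual framework and its handling of skew are more involved; this shows only the core idea that parameters, never raw data, are exchanged:

```python
def fed_avg(client_params, client_sizes):
    """Size-weighted average of per-client parameter vectors (FedAvg).
    Only parameters cross organizational boundaries, not the data."""
    total = sum(client_sizes)
    return [
        sum(p[i] * s for p, s in zip(client_params, client_sizes)) / total
        for i in range(len(client_params[0]))
    ]
```

Quantity skew shows up directly in the weights: a client holding three times the data pulls the average three times as hard, which is one reason non-IID settings degrade the aggregated model.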
Underwater object detection in forward-looking sonar using a transformer-guided enhancement
Murat Aydoğmuş, Işın Erer
Underwater imaging plays a vital role in various applications and domains, including underwater critical infrastructure protection, unknown object detection, mine detection and general surveys. One of the primary challenges in underwater object detection is the presence of significant noise and the limitations imposed by the harsh underwater environment. This study explores the effectiveness of a joint deep learning framework that combines image enhancement, reinforced with transformer-based modules, and object detection to improve the detection performance of underwater objects captured by forward-looking sonar (FLS). The aim is to achieve better underwater object detection performance for FLS images by using a joint framework based on enhancement with transformer structures and a YOLOX detection module. The framework employs a shared backbone for both the enhancement and detection components, enabling improved detection accuracy through joint optimization. Enhancement using transformer modules facilitates the extraction of clearer image features, and joint optimization with shared weights ensures efficiency across the network. This approach is particularly suitable for defense and maritime security applications, where reliable detection of underwater threats is essential. Experimental findings show that the proposed method is superior to the detection-only model and to a cascaded approach in which enhancement and detection are handled separately.
Removing power lines from digital surface models using OSM data
Dominik Stütz, Gisela Häufel, Dimitri Bulatov, et al.
A digital surface model (DSM) is crucial for various applications, such as visibility analysis, and is often freely available. Depending on the sampling method used to derive a DSM from 3D point clouds, power lines may be projected into the two-dimensional domain. The resulting artifacts represent a strong obstacle to the automatic computation of visibility. To clean freely available DSM data of these artifacts, we propose an effective method that relies on available OpenStreetMap (OSM) data and is largely unsupervised. After fetching relevant OSM data and creating buffers around power line features, we apply image processing techniques, including edge detection and the Hough transform, to remove the artifacts left behind by power lines in DSMs. Care is taken to preserve vegetation under power lines. The proposed method is robust, easily adaptable, and fast, making it suitable for large-scale data and contributing to the democratization of access to high-quality geodata processing.
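The buffer-and-refill idea can be sketched as follows: DSM cells within a fixed distance of an OSM power-line segment are masked and refilled from unmasked neighbours. This is a toy version under simplified assumptions; the actual method adds edge detection and a Hough transform, and takes care to preserve vegetation under the lines:

```python
def _seg_dist(px, py, ax, ay, bx, by):
    """Distance from point (px, py) to segment (ax, ay)-(bx, by)."""
    vx, vy = bx - ax, by - ay
    t = max(0.0, min(1.0, ((px - ax) * vx + (py - ay) * vy) / (vx * vx + vy * vy)))
    cx, cy = ax + t * vx, ay + t * vy
    return ((px - cx) ** 2 + (py - cy) ** 2) ** 0.5

def remove_power_line(dsm, seg, radius=1.0):
    """Mask cells within `radius` of a power-line segment (in cell units)
    and refill each from the median of unmasked cells in a 5x5 window."""
    h, w = len(dsm), len(dsm[0])
    mask = [[_seg_dist(c, r, *seg) <= radius for c in range(w)] for r in range(h)]
    out = [row[:] for row in dsm]
    for r in range(h):
        for c in range(w):
            if mask[r][c]:
                vals = [dsm[r2][c2]
                        for r2 in range(max(0, r - 2), min(h, r + 3))
                        for c2 in range(max(0, c - 2), min(w, c + 3))
                        if not mask[r2][c2]]
                out[r][c] = sorted(vals)[len(vals) // 2] if vals else dsm[r][c]
    return out

# Flat 10 m terrain with a spurious 30 m "wire" ridge along row 4.
dsm = [[10.0] * 9 for _ in range(9)]
for c in range(9):
    dsm[4][c] = 30.0
cleaned = remove_power_line(dsm, (0, 4, 8, 4), radius=1.0)
```

Taking the refill value from outside the buffer is what lets genuine elevated structure (e.g. vegetation near the line) survive, since only the buffered corridor is ever rewritten.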
Detection, Recognition, and Identification
Face recognition in the wild: augmentation-based robustness against real-world degradations
Gökhan Sari, Sefa Burak Okcu, Direnç Demir, et al.
Face recognition systems often struggle with real-world imperfections such as blur, shift, lighting variations, camera imperfections, and low contrast, which significantly affect their performance in security-critical scenarios. These degradations vary depending on factors like image sensor characteristics, camera placement, environmental lighting, and seasonal changes. Additionally, mixed system solutions involving edge-based detection and cloud-based recognition introduce further artifacts during edge processing, which degrade robust face recognition performance. In this study, we analyze the effects of real-world degradations on face recognition and propose a data-augmentation-based technique to mitigate their impact. We employ realistic data augmentation techniques during face recognition model training to simulate real-world imperfections. To ensure that accuracy on clean data is not degraded, we apply these augmentation techniques randomly. Experiments evaluate the impact of these augmentations on face recognition accuracy, comparing performance on both pristine and degraded datasets. The results demonstrate that augmented training significantly improves robustness against real-world distortions while preserving recognition accuracy on standard benchmarks. This study provides insights into effective augmentation strategies for enhancing face recognition models in uncontrolled environments, offering a robust solution for deployments in real-world scenarios under both non-ideal and ideal conditions.
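Applying degradations only with some probability, so that part of every batch stays pristine, can be sketched as below. The two transforms here are placeholder stand-ins; the paper's augmentations model blur, shift, lighting and sensor artifacts:

```python
import random

def maybe_degrade(pixels, p, rng):
    """With probability p, apply one randomly chosen degradation;
    otherwise return the clean sample unchanged, so accuracy on
    pristine data is preserved."""
    if rng.random() >= p:
        return pixels  # keep the clean sample
    if rng.choice(["low_contrast", "darken"]) == "low_contrast":
        mean = sum(pixels) / len(pixels)
        return [mean + 0.5 * (v - mean) for v in pixels]
    return [0.7 * v for v in pixels]
```

The probability `p` becomes the knob that trades robustness to degraded inputs against benchmark accuracy on clean ones.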
Insights into deep learning-based vessel detection and characterization using SAR and AIS data
Miguel A. Belenguer-Plomer, Michele Lazzarini, Omar Barrilero, et al.
As part of the Embed2Scale project funded by the Horizon Europe programme, this study presents a deep learning-based approach to vessel detection and characterisation in support of Maritime Domain Awareness (MDA). To address the limitations of the Automatic Identification System (AIS), which is not mandatory for all vessels and may be intentionally deactivated, Sentinel-1 Synthetic Aperture Radar (SAR) imagery was combined with AIS records to generate a large-scale dataset. The dataset comprises 2,008 SAR images collected from twelve major United States ports between 2020 and 2024, yielding 23,121 vessels and over 22,770 image batches of 500 × 500 pixels. The You Only Look Once version 8 (YOLOv8) architecture was employed for vessel detection and classification. Several model variants, from nano to extra-large, were trained and compared using pre-trained weights, and their performance was evaluated on a validation set representing 20% of the batches. The main evaluation metric was the mean Average Precision at IoU = 0.5 (mAP@50), with the medium variant offering the best trade-off between accuracy and computational cost. Larger models provided only marginal gains and required more training time. Nevertheless, challenges remain in detecting classes of vessels underrepresented in the dataset, which highlights the need for strategies to address data imbalance.
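mAP@50 counts a detection as correct when its intersection-over-union (IoU) with a ground-truth box reaches 0.5; the underlying overlap measure is worth making explicit (boxes given as (x1, y1, x2, y2); a sketch, not the project's evaluation code):

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def is_true_positive(pred, gt, thresh=0.5):
    """The matching criterion behind the '@50' in mAP@50."""
    return iou(pred, gt) >= thresh
```

Average Precision then integrates precision over recall using this matching rule, and mAP@50 averages that over classes, which is why classes with few examples drag the mean down.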
Combining global and local vision foundation models for explainable tattoo matching
Sabina B. van Rooij, Michalis Lazaridis, Stefanos Demertzis, et al.
Tattoo matching is important for criminal investigations. Recently, vision foundation models have shown increased performance for tasks like image classification and image retrieval. Global vision foundation models (e.g., CLIP) or local approaches (e.g., OmniGlue) show increased performance for image retrieval. In this paper, we show the added value of combining global and local approaches for explainable tattoo matching. We also investigate the use of a sketchification approach, facilitating the matching process for more abstract tattoos. We finally highlight the potential of local OmniGlue as a more explainable alternative to global image-based matching methods like CLIP.
A deep-learning framework for concealed and unconcealed face analysis in submillimeter wave imaging
This work presents a deep learning framework for sub-millimeter-wave (Sub-MMW) facial analysis that operates on active 340 GHz intensity images acquired from 20 identities in both concealed and unconcealed conditions over five head postures, including front-facing and yawed views, with approximately 35 images per condition per identity (~1,400 total images). The framework addresses concealed-to-unconcealed face verification, head posture classification, and pixel-wise reconstruction of unconcealed faces from concealed observations. It shows that task-tailored networks achieve strong performance, mainly in verification and posture recognition, while reproducing and extending a structural similarity (SSIM) sliding-window baseline for concealed-unconcealed verification reported for the same instrument class.
Audio foundation models for gunshot characterization: a self-supervised learning approach
Lucia Cristina Martin Perez, Carlos Roberto del-Blanco, Jesus Angel Alierta Nicodemus, et al.
Recent years have seen increasing focus on gunshot characterization in the cyber-physical Measurements And Signature INTelligence (MASINT) domain, fueled by IoT and AI advancements. This involves detecting, identifying, and classifying firearm discharge sounds and vibrations, benefiting forensic science, law enforcement, and military applications. A pressing challenge for gunshot characterization is the low availability of high-quality and robust datasets containing multiple types of firearms with labelled data, which are necessary to train the deep learning models involved in gunshot detection and recognition systems. The recent emergence of deep learning foundation models, large neural networks trained on vast amounts of raw data from different domains, scenarios and contexts, has alleviated the requirement of large volumes of labelled data for training, since the target model can leverage the general knowledge of the foundation model. However, no works have yet focused on how to apply and adapt foundation models to the field of gunshot detection and recognition. This work aims to leverage Transformer-based audio foundation models to characterize gunshot sound signals by using self-supervised learning schemes via different audio-related tasks. The recognition system will consider diverse types of perturbations, such as varying pitch or timbre, environmental conditions, the distance of the gun to the capturing source, or the way it is being fired. The proposed system will be evaluated on a ballistic dataset that consists of audio files containing gunshots fired by four firearms with different firing modes: single shot, multiple shot, and rapid fire.
Adversarial AI and Counter AI
Can invisible physical events easily fool spiking neural networks?
Adir Hazan, Ido Avrahami, Adrian Stern
Event-based dynamic vision sensors, which generate sparse spike-based outputs, are ideal for low-power applications. Spiking Neural Networks are designed to process this data efficiently on asynchronous neuromorphic hardware. As event-based vision advances, understanding the vulnerability of Spiking Neural Networks to physical adversarial attacks becomes crucial. This work introduces a novel light-based adversarial attack on neuromorphic vision. We exploit undetectable optical events, specifically designed light pulses, to disrupt the temporal dynamics of event-based sensors. Our method demonstrates how these physical attacks can be tailored to the event-based data’s discrete and sparse nature while achieving high success rates.
Vision transformers: the threat of realistic adversarial patches
Kasper Cools, Clara Maathuis, Alexander M. van Oers, et al.
The increasing reliance on machine learning systems has made their security a critical concern. Evasion attacks enable adversaries to manipulate the decision-making processes of AI systems, potentially causing security breaches or misclassification of targets. Vision Transformers (ViTs) have gained significant traction in modern machine learning due to 1) increased performance compared to Convolutional Neural Networks (CNNs) and 2) robustness against adversarial perturbations. However, ViTs remain vulnerable to evasion attacks, particularly to adversarial patches: unique patterns designed to manipulate AI classification systems. These vulnerabilities are investigated by designing realistic adversarial patches to cause misclassification in person vs. non-person classification tasks using the Creases Transformation (CT) technique, which adds subtle geometric distortions similar to those occurring naturally when clothing is worn. This study investigates the transferability of adversarial attack techniques developed for CNNs when applied to ViT classification models. Experimental evaluation across four fine-tuned ViT models on a binary person classification task reveals significant vulnerability variations: attack success rates ranged from 40.04% (google/vit-base-patch16-224-in21k) to 99.97% (facebook/dino-vitb16), with google/vit-base-patch16-224 achieving 66.40% and facebook/dinov3-vitb16 reaching 65.17%. These results confirm the cross-architectural transferability of adversarial patches from CNNs to ViTs, with pre-training dataset scale and methodology strongly influencing model resilience to adversarial attacks.
AutoDetect: designing an autoencoder-based detection method for poisoning attacks on object detection applications in the military domain
Alma M. Liezenga, Stefan Wijnja, Puck de Haan, et al.
Poisoning attacks pose an increasing threat to the security and robustness of Artificial Intelligence systems in the military domain. The widespread use of open-source datasets and pretrained models exacerbates this risk. Despite the severity of this threat, there is limited research on the application and detection of poisoning attacks on object detection systems. This is especially problematic in the military domain, where attacks can have grave consequences. In this work, we investigate both the effect of poisoning attacks on military object detectors in practice and the best approach to detect these attacks. To support this research, we create a small, custom dataset featuring military vehicles: MilCivVeh. We explore the vulnerability of military object detectors to poisoning attacks by implementing a modified version of the BadDet attack: a patch-based poisoning attack. We then assess its impact, finding that while a positive attack success rate is achievable, it requires a substantial portion of the data to be poisoned, raising questions about its practical applicability. To address the detection challenge, we test both specialized poisoning detection methods and anomaly detection methods from the visual industrial inspection domain. Since our research shows that both classes of methods are lacking, we introduce our own patch detection method: AutoDetect, a simple, fast, and lightweight autoencoder-based method. Our method shows promising results in separating clean from poisoned samples using the reconstruction error of image slices, outperforming existing methods while being less time- and memory-intensive. We stress that the availability of large, representative datasets in the military domain is a prerequisite for further evaluating the risks of poisoning attacks and the opportunities for patch detection.
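The reconstruction-error test at the core of such a detector can be sketched as follows. Here `reconstruct` stands in for a trained autoencoder, which reproduces familiar clean content well but fails on an unfamiliar trigger patch; the function names and threshold are illustrative, not the authors' implementation:

```python
def flag_poisoned(slices, reconstruct, threshold):
    """Flag a sample if any image slice's mean squared reconstruction
    error exceeds the threshold."""
    errors = [
        sum((a - b) ** 2 for a, b in zip(s, reconstruct(s))) / len(s)
        for s in slices
    ]
    return max(errors) >= threshold, errors

# Stand-in "autoencoder": reproduces smooth slices, flattens busy ones.
smooth_ae = lambda s: [sum(s) / len(s)] * len(s)
```

Working on slices rather than whole images is what makes a small, localized trigger patch stand out: its error is not diluted by the well-reconstructed rest of the image.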
Synthetic and Simulated Data, Generative AI
Synthesizing low-frequency active sonar data for automatic target recognition using machine learning
Vincent Dieduksman, Wessel Boukema, Stijn Hendriks, et al.
This study explores the synthesis of low-frequency active sonar (LFAS) data to address the challenge of limited labelled datasets for machine learning (ML) in automatic target recognition (ATR). We present a framework that simulates four key sonar components separately: reverberation, ambient noise, endfire noise, and target echoes. These components are then combined into a realistic synthetic sonar display. To accurately account for multiple reflectors of a target, we follow a digital twin approach in which a frequency-modulated pulse is simulated reflecting off 3D mesh models. After creating a large labelled synthetic dataset, we conducted a multiclass classification experiment aimed at training ML models to distinguish two distinct submarine classes from false alarms and from two additional object types: a surface vessel and a shipwreck. The two ML architectures employed yielded high accuracy under simulated conditions, indicating not only the value of synthetic data, but also the potential of ML for the design and development of robust, data-driven ATR applications using LFAS.
Improving object detection by modifying synthetic data with explainable AI
Nitish Mital, Simon Malzard, Richard Walters, et al.
Limited real-world data severely impacts model performance in many computer vision domains. Synthetically generated images are a promising solution, but 1) it remains unclear how to design synthetic training data to optimally improve model performance (e.g., whether and where to introduce more realism or more abstraction), and 2) the domain expertise, time, and effort required from human operators represent a major practical challenge. Here, we use robust Explainable AI (XAI) techniques to guide a human-in-the-loop process of modifying the 3D mesh models used to generate synthetic images. Importantly, this framework allows both modifications that increase and modifications that decrease realism in synthetic data, either of which can improve model performance. We illustrate this for object detection on the ATR DSIAC infrared dataset, with synthetic images generated from 3D mesh models in the Unity game engine, by fine-tuning YOLO-series models on different XAI-guided modifications of the synthetic data to improve model performance.
Predicting pedestrian trajectories in outdoor environments using deep learning methods
Marilena Sinni, Dimitris M. Kyriazanos
Pedestrian trajectory prediction is a critical feature in applications such as urban planning, emergency response, and crowd management. While existing methods have been effective in controlled indoor environments, predicting pedestrian movement in outdoor spaces presents additional challenges due to dynamic obstacles, varying terrains, and fluctuating crowd densities. This paper explores deep learning approaches for next-location prediction by leveraging recurrent neural network architectures, including Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU). We evaluate these methods using a synthetic dataset that mimics real-world pedestrian behaviour in an urban environment. Our findings demonstrate that deep learning models effectively capture long-term dependencies and movement patterns in pedestrian trajectory forecasting. The results contribute to improved pedestrian flow optimization, enhanced public safety strategies, and more accurate mobility forecasting in complex outdoor settings.
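The next-location task described above reduces to supervised sequence learning: a window of past positions predicts the following position. A minimal sketch of the data preparation that would feed an LSTM or GRU (the window length and coordinate format are illustrative assumptions):

```python
import numpy as np

def make_windows(track, w=4):
    """Convert a trajectory of (x, y) points into (past-window, next-point)
    training pairs for a recurrent next-location model."""
    X, y = [], []
    for t in range(len(track) - w):
        X.append(track[t:t + w])   # w consecutive past positions
        y.append(track[t + w])     # the position to predict
    return np.array(X), np.array(y)
```

Each `X[i]` has shape `(w, 2)` and becomes one input sequence; the recurrent network regresses the corresponding `y[i]`.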
Data augmentation for vehicle detection with diffusion-based object inpainting
Sebastiaan P. Snel, Thijs A. Eker, Ella P. Fokkinga, et al.
Automated vehicle detection in video footage captured by Unmanned Aerial Vehicles (UAVs) is a critical capability in security and defense domains, especially for environments where communication is jammed. Development of deep learning-based object detectors for this purpose typically requires large-scale datasets, which can be hard to obtain due to limited access to relevant environments. To address this challenge, synthetic data has been proposed as a supplementary source of training data, introducing additional variations in the appearance and positioning of objects. One promising strategy for generating synthetic data is inpainting, where objects of interest are seamlessly integrated into various backgrounds. However, traditional inpainting techniques lack spatial and contextual awareness, limiting their effectiveness for data augmentation. Recent advancements in generative AI, specifically diffusion models, have demonstrated improvements in object harmonization and spatial control for object inpainting, enabling realistic foreground-background matching with a high level of diversity. In this work, we explore the value of diffusion-based inpainting as a data augmentation technique. We use the inpainting model AnyDoor to enrich a small subset (1,000 frames) of the VisDrone training set with inpainted versions of minority-class objects (buses, vans, trucks). We train YOLOX detectors on datasets with increasing amounts of synthetic vehicles (1x, 5x, 10x, and 20x) and analyze the impact on detection performance. Results show that zero-shot inpainting can substantially improve detection for buses up to an augmentation factor of 10x, with no improvements at 20x. Effects for vans and trucks are mixed and sometimes negative. Fine-tuning AnyDoor provided limited additional benefit under the tested conditions. Overall, diffusion-based inpainting shows potential as a data augmentation strategy in low-resource UAV scenarios.
Future work should explore strategies to increase contextual diversity, such as adding multiple synthetic objects per image or incorporating automated quality control for synthetic samples.
Generative AI for novel contexts and controllable dataset creation applied to vehicle detector model training
Nicolas Hueber, Alexander Pichler
This study highlights the potential of generative AI and Low-Rank Adaptation to generate novel contexts for training vehicle detector models, with an emphasis on controlled dataset generation and high-level knowledge preservation. The focus is on the generation of new contexts, including aerial views and military vehicles, which are less represented in open-access training sets. It is demonstrated that, even with a small set of viewpoint images, relevant datasets may be created with a wide variety of new situations while controlling vehicle signatures over important operational parameters, such as view angles, object features, and environmental conditions. The benefits of this training-set generation are evaluated on a vehicle detection application using aerial top-down views of several vehicle types. The preliminary results demonstrate that such generative pipelines not only bridge critical dataset gaps but also substantially enhance detection performance, particularly for underrepresented operational scenarios.
Prescriptive maintenance for military vehicles using frugal learning
Antonio López-Almodóvar, Sergio Jorge Serra, Daniel Xu, et al.
In the realm of military operations, the reliability and operational readiness of vehicles are paramount. Unlike predictive maintenance, which forecasts potential failures, prescriptive maintenance extends the analysis by recommending specific actions to optimize maintenance schedules and enhance operational effectiveness. This paper presents a novel approach to prescriptive maintenance aimed at estimating the remaining useful life of military vehicle components. Three key challenges for achieving precise maintenance planning for operational awareness are: (i) limited availability of training data, (ii) the diversity of vehicle platforms and components, and (iii) the need for rapid adaptation of pre-trained models to new platforms. These challenges are addressed through the integration of frugal learning, meta-learning, and transfer learning. To mitigate data scarcity, a digital twin is developed to simulate mission-related data across various heavy-duty vehicle platforms and mission profiles. While real-world data are used to construct and calibrate the digital twin, only the synthetic data it generates are utilized for model training. To tackle the second and third challenges, this paper evaluates and compares several frugal learning strategies based on transfer learning and meta-learning. The proposed learning strategies leverage the synthetic data to predict the remaining useful life of components, with fuel consumption used as a representative case study, while remaining applicable to other critical components. Experimental results demonstrate that meta-learning offers a favorable trade-off between adaptation time and performance across platforms, whereas transfer learning achieves higher overall prediction accuracy at the cost of increased fine-tuning time. The digital twin, which is constructed using OpenStreetMap-based trajectories, enables rapid adaptation to novel environments, thereby enhancing mission planning and vehicle readiness.
This work contributes to a scalable, adaptable, and cost-effective AI-driven maintenance solution for defense applications, capable of supporting diverse operational scenarios with minimal data requirements.
Human-AI Collaboration and Advances in AI
Human-AI teaming co-learning in military operations
Clara Maathuis, Kasper Cools
In a time of rapidly evolving military threats and increasingly complex operational environments, the integration of AI into military operations offers significant advantages. At the same time, it brings various challenges and risks in building and deploying human-AI teaming systems in an effective and ethical manner. Currently, these challenges are often tackled from an external perspective that treats the human-AI teaming system as a collective agent. Nevertheless, zooming into the dynamics inside the system allows a broader palette of relevant multidimensional responsibility, safety, and robustness aspects to be addressed. To this end, this research proposes the design of a trustworthy co-learning model for human-AI teaming in military operations that encompasses a continuous and bidirectional exchange of insights between the human and AI agents as they jointly adapt to evolving battlefield conditions. It does so by integrating four dimensions. First, adjustable autonomy, for dynamically calibrating the autonomy levels of agents depending on aspects like mission state, system confidence, and environmental uncertainty. Second, multi-layered control, which provides continuous oversight, monitoring of activities, and accountability. Third, bidirectional feedback, with explicit and implicit feedback loops between the agents to ensure proper communication of each agent's reasoning, uncertainties, and learned adaptations. And fourth, collaborative decision-making, which implies the generation, evaluation, and proposal of decisions together with their confidence levels and underlying rationale. The proposed model is accompanied by concrete examples and recommendations that contribute to further developing responsible and trustworthy human-AI teaming systems in military operations.
A workflow-oriented framework for asynchronous human-AI collaboration in hybrid and compute-intensive HPC environments
Sergio Mendoza, Cedric Bhihe, Natalia Zamora, et al.
Human involvement is critical in training and deploying AI systems in high-stakes defence and security contexts. However, real-time interaction is impractical in HPC environments due to compute intensity and resource constraints. We present a workflow framework that enables asynchronous human-AI collaboration across hybrid infrastructures, including HPC clusters, local machines, and cloud platforms. Workflows can pause at defined checkpoints for human input without halting underlying compute jobs, preventing idle resources and enabling non-blocking supervision. The framework supports interaction with SLURM-based scheduling, containerized and native tasks, and is customized for scenarios requiring human judgment and adaptability. We demonstrate its application in model training on systems like MareNostrum 5, highlighting benefits in portability, efficiency, and oversight in operational AI workflows.
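One way to realize the pause-at-checkpoint behaviour is a file-based handshake between the running job and a human-facing front end; the snippet below is a hypothetical sketch (the file names `request.json`/`response.json` and the polling protocol are assumptions for illustration, not the framework's actual API):

```python
import json
import os
import time

def request_human_input(checkpoint_dir, payload, poll_s=30.0, timeout_s=3600.0):
    """Write a checkpoint request, then poll for a human response file.
    The compute job itself could be checkpointed and re-queued while this
    side channel waits, so no SLURM allocation sits idle."""
    request_path = os.path.join(checkpoint_dir, "request.json")
    response_path = os.path.join(checkpoint_dir, "response.json")
    with open(request_path, "w") as f:
        json.dump(payload, f)
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        if os.path.exists(response_path):
            with open(response_path) as f:
                return json.load(f)
        time.sleep(poll_s)
    raise TimeoutError("no human response before the deadline")
```

Because the handshake lives on a shared filesystem rather than in a live process, the human reviewer can respond from a local machine or cloud portal long after the HPC job step that raised the request has been suspended.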
Semi-automatic continuous adjustment of person detectors for public transport applications
Francisco Javier Iriarte, Luis Unzueta, Aitor Sarria
Deep Neural Network (DNN)-based person detection algorithms are widely used for safety and security applications in different scenarios due to their high accuracy and robustness. These DNNs are trained with thousands or even millions of images of person and background instances with high variability in appearance, in order to generalize as well as possible across scenarios. However, there is always a visual gap between the training data and the operational data that reduces the detector’s on-site performance. Adjusting these DNNs by incorporating operational data into fine-tuning sessions is challenging and time-consuming, as it requires carefully selecting and annotating the appropriate data samples to make the system work better over time, not worse. This is especially challenging in public transport scenarios, such as trains, buses, and their stations, where the appearance of people and the environment can be highly variable and dynamic. In this paper, we present a semi-automatic model adjustment framework that combines Untrained AutoEncoders (UAE), statistical analysis, pre-trained multi-class object detection, and annotation quality assessment methods to overcome these challenges. Using UAEs, we reduce the dimension of the input data as an out-of-the-box preprocessing method, allowing us to identify data drift efficiently, while the pre-trained model and the assessment method accelerate the selection and labeling processes. Experiments on an in-bus dataset show that this approach can effectively adjust a person detection model in public transport. The framework reduces manual monitoring and contributes toward building more maintainable and reliable AI-powered surveillance systems, particularly in dynamic environments with low resources such as public transportation.
Efficient representation of 3D spatial data for defense-related applications
Benjamin Kahl, Marcus Hebel, Michael Arens
Geospatial sensor data is essential for modern defense and security, offering indispensable 3D information for situational awareness. This data, gathered from sources like lidar sensors and optical cameras, allows for the creation of detailed models of operational environments. In this paper, we provide a comparative analysis of traditional representation methods, such as point clouds, voxel grids, and triangle meshes, alongside modern neural and implicit techniques like Neural Radiance Fields (NeRFs) and 3D Gaussian Splatting (3DGS). Our evaluation reveals a fundamental trade-off: traditional models offer robust geometric accuracy ideal for functional tasks like line-of-sight analysis and physics simulations, while modern methods excel at producing high-fidelity, photorealistic visuals but often lack geometric reliability. Based on these findings, we conclude that a hybrid approach is the most promising path forward. We propose a system architecture that combines a traditional mesh scaffold for geometric integrity with a neural representation like 3DGS for visual detail, managed within a hierarchical scene structure to ensure scalability and performance.
Classification, Segmentation, and Scene Understanding
Improving classification accuracy in post-turbulence-mitigation tasks via perceptual loss optimization
David Vint, Gaetano Di Caterina, Robert A. Lamb
Image distortions caused by atmospheric turbulence have long posed a significant challenge for accurate imaging and observation. The ability to mitigate such distortions without altering the imaging setup is therefore extremely desirable. In recent years, deep learning-based techniques for turbulence mitigation have achieved, and set, state-of-the-art performance. However, we propose that the definition of ‘state of the art’ with regard to turbulence mitigation may not be based on the most appropriate metrics. Unlike applications such as super-resolution or denoising, the goal of turbulence mitigation is not necessarily to provide a high-quality, aesthetically pleasing image. The goal, rather, is more likely to be that of improving subsequent (i.e. post-mitigation) processing tasks. For example, when performing target detection or classification, it should be the improvement in performance of such post-mitigation tasks that measures the quality of a turbulence mitigation model. The literature commonly uses image quality metrics for the turbulence mitigation task; however, whilst these metrics may be informative, their use may promote ‘sub-optimal’ mitigation algorithms that generate turbulence-mitigated images achieving high metric scores but lacking appropriate quality for downstream applications. Previous work showed that by training a turbulence mitigation model with classification in mind, the performance of the model could be improved. This paper proposes an improvement on the state-of-the-art model, DATUM, by incorporating perceptual loss into its training. With this modified loss function, the model is better tuned towards turbulence mitigation, facilitating better post-mitigation performance in a classification task. Furthermore, this paper also presents an investigation into the computational requirements of such models, with the motivation of understanding applicability in the real world.
This paper aims to steer turbulence mitigation research in the right direction, by highlighting the importance of downstream tasks as well as real-world deployability.
Occlusion robustness of CLIP for military vehicle classification
Jan Erik van Woerden, Gertjan Burghouts, Lotte Nijskens, et al.
Vision-language models (VLMs) like CLIP enable zero-shot classification by aligning images and text in a shared embedding space, offering advantages for defense applications where labeled data is scarce. However, CLIP is primarily trained on high-quality internet imagery. Its robustness in challenging military operational environments, characterized by factors like partial occlusion and degraded signal-to-noise ratio (SNR) due to obscurants or weather, remains underexplored. We investigate the robustness of several CLIP variants to occlusion, using a custom dataset of 18 military vehicle classes. We simulate both contiguous occlusions (slide blackout, bar occlusion) and dispersed occlusions (random rain, snow, grid dropout) to reflect real-world environmental challenges. Robustness is evaluated using Normalized Area Under the Curve (NAUC) across occlusion percentages. Four key insights emerge: (1) fine-grained, dispersed occlusions (e.g., snow, rain) degrade performance more than larger contiguous occlusions (NAUC of 61.3% for dispersed vs. 78.9% for contiguous occlusions with PE-Core-ViT-L/14-336); (2) transformer-based CLIP models consistently outperform CNN-based CLIP models, with ViT-B/16 achieving up to 22 percentage points higher NAUC than ResNet50; (3) pre-training methodology significantly affects robustness: PE-Core models consistently outperform CLIPA counterparts at similar scales (e.g., +6.7pp NAUC at ~320M parameters), showing that improved pre-training augments robustness beyond scaling alone; (4) fine-tuning introduces a trade-off: linear probing boosts clean-image accuracy (55.6%→88.0%) but reduces robustness under dispersed occlusions (snow NAUC 54.0%→36.0%), while full fine-tuning mitigates this effect (snow NAUC 44.5%) yet still falls short of zero-shot consistency.
These results underscore the importance of occlusion-specific augmentations during training and the need for further exploration into patch-level sensitivity and architectural resilience for real-world deployment of CLIP.
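The NAUC metric used in these comparisons can be read as the area under the accuracy-versus-occlusion curve, normalized so that a perfectly robust model scores 1.0; a minimal sketch of that computation (the paper's exact normalization is an assumption here):

```python
import numpy as np

def nauc(occlusion_pct, accuracy):
    """Trapezoidal area under the accuracy vs. occlusion curve, normalized so
    that a model keeping full accuracy at every occlusion level scores 1.0."""
    x = np.asarray(occlusion_pct, dtype=float) / 100.0
    y = np.asarray(accuracy, dtype=float)
    area = float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)
    return area / float(x[-1] - x[0])
```

Summarizing the whole degradation curve in one number lets models be ranked across occlusion types without fixing a single occlusion level in advance.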
From bounding boxes to semantic segmentation: leveraging SAM for weak supervision in remote sensing
Giuseppe Martino, Niccolò Camarlinghi, Antonio Di Tommaso, et al.
Semantic segmentation typically requires extensive pixel-level annotations, which are costly and time-consuming to obtain. This paper investigates the effectiveness of using the Segment Anything Model (SAM) for weakly supervised semantic segmentation of aerial and satellite imagery, utilizing only bounding box annotations. We present an approach that leverages SAM to generate pseudo ground truth annotations from bounding box prompts, which are then used to train the SegNeXT semantic segmentation model on the i-SAID dataset. Our method achieves results comparable to fully supervised training, with only a 4.2% decrease in mean Intersection over Union (mIoU). These findings demonstrate the potential of foundation models to reduce annotation costs while maintaining high performance in aerial image segmentation tasks.
Intelligent model mining in sensor data for military mission environment representation
L. Biersteker, E. Demeur, R. van der Meer, et al.
Military operations are in constant need of up-to-date information about the mission environment. For mission planning, rehearsal, and continuous decision support, detailed information is required on the terrain surface as well as all natural and man-made features. Our research aims to provide the military with a detailed 3D representation of these elements. In this paper, we present how electro-optical remote sensing data acquired by airborne platforms can be exploited to automatically model the mission environment. Our focus is on a model mining approach that aligns with the principles of generative neuro-symbolic artificial intelligence by integrating deep learning-based data analysis with a priori knowledge-based symbolic reasoning: a combination of artificial and real intelligence. We illustrate the approach by automatically extracting a power line infrastructure model from point cloud data.
Poster Session
Feature selection for enhanced IIoT security: a machine learning approach to intrusion detection
The Industrial Internet of Things (IIoT) has revolutionized smart manufacturing, automation, and critical infrastructure by enabling seamless device interconnectivity. However, this connectivity exposes IIoT systems to a wide spectrum of cyber threats, necessitating intelligent and adaptive intrusion detection mechanisms. In this study, we propose SMoFeLSTM—Sequential Modeling and Feature-Optimized Long Short-Term Memory—a deep learning-based intrusion detection framework tailored for multiclass attack classification in IIoT environments. The model incorporates meticulous preprocessing, including outlier elimination, label encoding, and temporal feature engineering, followed by MRMR-based feature selection and min–max normalization. SMoFeLSTM captures sequential dependencies using stacked LSTM layers trained with categorical cross-entropy loss and optimized via Adam. Experimental evaluations on the X-IIoTID dataset demonstrate superior performance, with an average accuracy of 96.97%, macro F1-score of 86.15%, and AUC of 98.42%, outperforming traditional and deep learning baselines. Class-wise results indicate robust detection of both frequent and rare attack categories. The proposed framework holds promise for scalable, explainable, and real-time IIoT security applications.
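The MRMR step above selects features that are highly relevant to the label while penalizing redundancy with already-selected features. Below is a greedy sketch that uses absolute correlation as a stand-in for the mutual-information criterion (the paper's exact scoring function is not reproduced here):

```python
import numpy as np

def mrmr_rank(X, y, k):
    """Greedy max-relevance min-redundancy feature ranking.
    Relevance: |corr(feature, label)|; redundancy: mean |corr| with the
    features already selected. Score = relevance - redundancy."""
    n_feat = X.shape[1]
    rel = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_feat)])
    selected = [int(np.argmax(rel))]           # start with the most relevant
    while len(selected) < k:
        best, best_score = None, -np.inf
        for j in range(n_feat):
            if j in selected:
                continue
            red = np.mean([abs(np.corrcoef(X[:, j], X[:, s])[0, 1])
                           for s in selected])
            score = rel[j] - red
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected
```

A duplicated feature scores high on relevance but is fully penalized by redundancy, so the ranking prefers an independent feature even when its individual relevance is lower.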
Enhancing financial security with machine learning-based credit card fraud detection
The increasing rate of credit card fraud is a major concern to the financial systems across the globe, and hence there is a need to come up with efficient and sustainable solutions for detection. This paper proposes a machine learning approach for credit card fraud detection using feature selection and deep learning to enhance the detection rate and speed. The proposed system employs Neighborhood Component Analysis (NCA) for feature selection to select the most important features that can improve the classification process and minimize computational complexity. The feature set is then passed through a Gated Recurrent Unit (GRU) network, a deep learning model that is effective in identifying temporal patterns in transaction data for fraud detection. The proposed framework is compared with other machine learning techniques on a standard credit card fraud detection dataset, and it outperforms the other techniques in terms of accuracy, precision, recall, and F1 score. The use of NCA for feature selection enhances the detection time and scalability of the system and therefore makes it suitable for real-time applications in high-traffic financial environments. This study shows that the integration of the proposed feature selection and deep learning can enhance financial security and help in the fight against credit card fraud.
VehicLens: an integrated AI system for vehicle classification and occupant analysis in border security applications
Georgios Stavropoulos, Alexandros Kalpazidis, Konstantinos Votis
As border security challenges continue to evolve, the need for effective inspection tools becomes increasingly critical. This paper introduces a novel system, named VehicLens, that integrates a custom-developed vehicle classification module with an integrated facial detection and gender categorization component. Additionally, a dataset created explicitly for training and testing the neural network responsible for the vehicle classification part of VehicLens is presented. Given that VehicLens is intended to be part of a Border Security Application, significant emphasis was placed on ensuring that the dataset contains realistic data to achieve robust performance under actual operational conditions. Therefore, the dataset comprises images of both new and primarily used vehicles, encompassing various conditions such as damages and modifications, rather than focusing solely on pristine new vehicles captured in ideal scenarios. Upon successful classification of the vehicle model by the system, the captured image undergoes further analysis via the facial detection and gender categorization parts of the tool. The advanced capabilities of these modules allow the system to predict the number of occupants inside the recognized vehicle, as well as their gender. The multi-modal analysis enables security personnel to identify potential security issues either through prior knowledge about incoming vehicles at border crossings or by identifying discrepancies between system predictions and the lists of pre-registered travelers provided by applications such as the one developed within the scope of the EINSTEIN project (GA.101121280). By working in conjunction with such pre-registration systems, VehicLens allows border guards to access critical information about a vehicle and its occupants prior to its arrival at the checkpoint. 
This early availability of data grants them additional time for inspection, ultimately enhancing the level of security while also reducing waiting times for travelers.
Textual inversion for efficient adaptation of open-vocabulary object detectors without forgetting
Recent progress in large pre-trained vision language models (VLMs) has reached state-of-the-art performance on several object detection benchmarks and boasts strong zero-shot capabilities, but for optimal performance on specific targets some form of finetuning is still necessary. While the initial VLM weights allow for great few-shot transfer learning, this usually involves the loss of the original natural language querying and zero-shot capabilities. Inspired by the success of Textual Inversion (TI) in personalizing text-to-image diffusion models, we propose a similar formulation for open-vocabulary object detection. TI allows extending the VLM vocabulary by learning new or improving existing tokens to accurately detect novel or fine-grained objects from as little as three examples. The learned tokens are completely compatible with the original VLM weights while keeping them frozen, retaining the original model’s benchmark performance, and leveraging its existing capabilities such as zero-shot domain transfer (e.g., detecting a sketch of an object after training only on real photos). The storage and gradient calculations are limited to the token embedding dimension, requiring significantly less compute than full-model fine-tuning. We evaluated whether the method matches or outperforms the baseline methods that suffer from forgetting in a wide variety of quantitative and qualitative experiments.
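Conceptually, Textual Inversion optimizes only a new token embedding while every model weight stays frozen. The toy sketch below does this in plain NumPy, learning one vector by gradient ascent on its mean cosine similarity to a few example image embeddings; it is a conceptual stand-in, not the detection objective the paper actually optimizes:

```python
import numpy as np

def learn_token(image_embs, steps=200, lr=0.1):
    """Learn a single new token embedding aligned with a few image embeddings;
    nothing else is updated, so the base model's behaviour is preserved."""
    image_embs = np.asarray(image_embs, dtype=float)
    token = image_embs.mean(axis=0).copy()  # warm start at the centroid
    for _ in range(steps):
        t_hat = token / np.linalg.norm(token)
        grads = []
        for v in image_embs:
            v_hat = v / np.linalg.norm(v)
            # gradient of cos(token, v) with respect to token
            grads.append((v_hat - t_hat * (v_hat @ t_hat)) / np.linalg.norm(token))
        token += lr * np.mean(grads, axis=0)
    return token
```

Because only this one vector carries gradients, storage and compute scale with the embedding dimension rather than the model size, which is the efficiency argument made in the abstract.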
An AI-powered framework for streamlined image and audio annotations
Stefanos Vlachos, Vasileios Theiou, Christos Sgouropoulos, et al.
With AI applications scaling more rapidly and broadly than ever, there is a growing demand for large sets of high-quality labeled data. However, data annotation has traditionally been a time-consuming and labor-intensive task, both for annotators, who elaborately submit annotations, and task supervisors, who manage and review the submitted work. This challenge has shifted research interest towards integrating AI methods into the annotation process, with the aim of reducing human effort. In this direction, we propose an end-to-end annotation pipeline designed to support AI-assisted image and audio annotations. Role-based access to the app grants supervisors full control over the management of annotation tasks and the distribution of annotation workload across multiple annotators. Its AI-centric architecture supports seamless integration of pre-trained AI models, which can assist annotators during the annotation process or even perform annotations fully autonomously, greatly reducing effort. Regarding annotation quality, a built-in mechanism provides analytics based on submitted annotations, measuring annotator agreement and producing visual summaries of progress. Evaluated on complex annotation tasks, the proposed framework proves highly effective in reducing annotation time and effort. As a future direction, the proposed framework provides a strong foundation for integrating active learning techniques, further minimizing annotation costs.
AI-enabled detection of vessels in distributed acoustic sensing (DAS) data using submarine fiber-optic cables
Alexander M. van Oers, Roos C. H. M. Dees
Europe's Critical Underwater Infrastructure (CUI) is both economically vital and vulnerable to sabotage. Recent incidents in the Baltic Sea suggest a deliberate strategy to exploit these vulnerabilities, by dragging anchors across seabed cables. The vast spatial extent of CUI makes such attacks difficult to detect and deter. Distributed Acoustic Sensing (DAS) enables continuous monitoring by detecting vibrations along submarine fiber-optic cables, offering maritime situational awareness. These vibrations are caused by surface vessels, submarines, or unmanned underwater vehicles. To enable real-time monitoring, we combined publicly available DAS and AIS datasets to label ship signals in the DAS data. Using this dataset, we trained a convolutional neural network (CNN) to detect vessels in large volumes of DAS data. We evaluated the CNN using two datasets, two data processing techniques, and augmentation strategies. Our CNN effectively distinguishes weak vessel signals from substantial background noise, demonstrating its potential for real-time maritime threat detection. Based on our findings, we offer recommendations for applying DAS-based monitoring to protect CUI in the North Sea.
Adaptive e-learning for image analysts: an xAPI-driven performance framework
Alexander Streicher, Daniela Altun
This paper explores the integration of Artificial Intelligence (AI) into adaptive assistance systems, focusing on e-learning applications for military image interpretation within the German Armed Forces. We propose a framework leveraging the Experience API (xAPI) to derive meaningful performance metrics by distinguishing between result performance and progress performance. The study addresses the implementation of Dynamic Difficulty Adjustment (DDA) in the serious game Spot-X, with the aim of enhancing engagement and training effectiveness. By introducing a multidimensional performance metric and modeling activity events as graphs, we provide a pathway for operationalizing adaptivity in military learning environments.
Frequency-domain transformation of audio signals for image-based encoding schemes
Yosef Golovachev, Eitan Katz, Neria Haimov, et al.
Steganography, the art of concealing information within seemingly innocuous cover media, has emerged as a critical technique for secure communication and data protection. This paper presents a new approach to improving the LSB steganography method, focusing on concealing an encryption key within a digital image using the Fourier transform and randomizing the data’s location. We apply this concept to increase the security of the LSB technique by hiding the key for the data in the Fourier transform of an image, using random non-data pixels. This method enables the use of encryption without prior knowledge of the key, since the key itself is embedded in the image with our proposed frequency-spectrum steganographic method. The scheme disrupts the regular pattern of LSB embedding, making data extraction more difficult and thereby increasing resistance to steganalysis and cryptanalysis, which enhances overall security. The unique qualities of this method can be useful for specific applications and may inspire further research. We implemented the proposal by embedding a short audio file in an RGB image using MATLAB.
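The randomized-location aspect of the scheme can be sketched briefly: a key seeds a pseudo-random generator that selects which pixels carry payload bits in their least significant bit, so extraction is impossible without the same key. This is a minimal illustration of that one component only; the paper's Fourier-domain hiding of the key itself, and the RGB/MATLAB specifics, are omitted, and the function names are assumptions.

```python
import numpy as np

def lsb_embed(img, bits, key):
    """Embed a bit sequence in the LSBs of key-selected pixels.
    The pixel order is derived from `key`; in the paper's scheme the
    key itself is hidden in the image's Fourier spectrum (not shown)."""
    flat = img.flatten()  # flatten() returns a copy, so `img` is untouched
    rng = np.random.default_rng(key)
    idx = rng.choice(flat.size, size=len(bits), replace=False)
    flat[idx] = (flat[idx] & 0xFE) | np.asarray(bits, dtype=flat.dtype)
    return flat.reshape(img.shape)

def lsb_extract(stego, n_bits, key):
    """Recover the embedded bits using the same key-seeded pixel order."""
    flat = stego.flatten()
    rng = np.random.default_rng(key)
    idx = rng.choice(flat.size, size=n_bits, replace=False)
    return (flat[idx] & 1).tolist()
```

Because the carrier pixels are scattered rather than sequential, a steganalyst scanning LSB planes in raster order sees no contiguous embedded run, which is the regularity-disruption property the abstract refers to.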
Feature fusion for unified adversarial attack detection for object detection
Michel van Lier, Ian van de Poll, Hugo J. Kuijf
Integration of AI-based object detection has accelerated in a variety of defense applications, such as intelligence, surveillance, and reconnaissance (ISR) and autonomous systems. However, the AI-based methods that enable these capabilities are inherently vulnerable to adversarial attacks, which are physical or digital input perturbations designed to mislead AI systems and induce incorrect or unpredictable behaviour. For object detection, such attacks can suppress or alter detections of critical targets, thereby undermining situational awareness, compromising mission integrity, or even disabling automated defense responses. Detecting these attacks is essential to the security of AI-driven military systems. In this work, we propose a unified adversarial attack detection approach, capable of identifying two distinct types of adversarial attacks on object detection that are currently underexplored in the literature. We build on an existing method that leverages both local and global spatial image features to detect localized patch-based attacks. We extend this approach by introducing a fusion mechanism between these features to enable detection of white-box global and black-box local image perturbation attacks. Both attacks target the state-of-the-art YOLOv10m object detector, trained on both the MS COCO and the military Automatic Target Recognition (ATR) object detection datasets. We show that our feature fusion approach achieves an attack detection accuracy of 99% for both attacks, comparable to detectors specialized for a single attack type.
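The fusion of local and global spatial features can be sketched as a simple combination of pooled per-patch statistics with a whole-image descriptor, feeding a single attack/no-attack classifier. Concatenation of mean- and max-pooled local features is one plausible mechanism under my own assumptions; the paper's actual fusion design may differ.

```python
import numpy as np

def fuse_features(local_feats, global_feat):
    """Fuse per-patch (local) and whole-image (global) features into one
    vector for a downstream attack/no-attack classifier.

    local_feats : (P, D) array, one D-dim feature per image patch
    global_feat : (G,) whole-image descriptor
    Returns a (2*D + G,) fused vector.
    """
    local_mean = local_feats.mean(axis=0)   # average patch response
    local_max = local_feats.max(axis=0)     # strongest patch response,
                                            # sensitive to localized patches
    return np.concatenate([local_mean, local_max, global_feat])
```

Intuitively, the max-pooled branch reacts to localized patch attacks while the global descriptor captures image-wide perturbations, which is why fusing both can cover the two attack types with a single detector.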
AI-enhanced imaging and multimodal detection of rare-earth fluorescence-based security features for document authentication and border control
Mehrnaz Taheri, Carlos Francisco Moreno-García, Craig Stewart, et al.
Advanced document forgeries using color-matched inks and counterfeit ultraviolet pigments pose significant challenges for border security, evading conventional optical scanners. This study introduces an AI-enhanced imaging framework for authenticating documents via rare-earth-based fluorescence security features, such as invisible waveguides, which are robust against replication. Our approach leverages deep learning to eliminate the need for specialized optics, enabling reliable detection with standard cameras. We developed a dataset of co-registered white-light and fluorescence image pairs, trained a Conditional Generative Adversarial Network (CGAN) to generate synthetic fluorescence from white-light inputs, and implemented a YOLO-v8-based detector for real-time identification of embedded security features. This pipeline achieves a signal-to-noise ratio gain of +8.5 dB and 97% detection accuracy, offering a scalable, cost-effective solution for border checkpoints. By integrating the physical resilience of rare-earth luminescence with AI adaptability, our framework enhances document authentication, strengthens anti-counterfeiting measures, and facilitates efficient passenger processing.
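The two-stage pipeline described above can be expressed as a short control-flow sketch: a trained CGAN generator synthesizes the fluorescence channel from a standard white-light capture, and a YOLO-style detector then searches the synthetic fluorescence image for embedded security features. The `generator` and `detector` callables, the detection-dict shape, and the confidence threshold below are placeholders standing in for the trained models, not the paper's implementation.

```python
def authenticate(white_light_img, generator, detector, min_conf=0.5):
    """Two-stage authentication sketch.

    generator : maps a white-light image to a synthetic fluorescence image
    detector  : returns a list of detections, each a dict with at least
                a "conf" confidence score
    A document passes if any security feature is detected above threshold.
    """
    fluorescence = generator(white_light_img)
    detections = detector(fluorescence)
    hits = [d for d in detections if d["conf"] >= min_conf]
    return {"authentic": len(hits) > 0, "features": hits}
```

The design point is that only the generator needs to compensate for the missing specialized optics; the detector then operates on a domain it was trained for, which is what enables standard-camera deployment at checkpoints.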