Paper 14037-60
Frequency-aware open-vocabulary object detection for thermal imaging
Abstract
Open-vocabulary object detection (OVD) is a promising approach for scaling thermal imaging applications to real-world settings such as HVAC monitoring and industrial inspection. However, applying visible-light OVD foundation models to thermal imagery is non-trivial because of the modality gap, which stems from distinct radiometric distributions and characteristic spatial-frequency content. Prompt-based adaptation can preserve the open-vocabulary capability of frozen detectors, but existing visual prompting approaches typically do not explicitly account for the frequency-specific properties of thermal images. In this work, we propose frequency-aware prompt tuning for thermal-domain OVD. Our method decomposes a thermal image into low- and high-frequency components and uses a compact frequency-aware U-Net to generate an additive input-space prompt, improving structural perception while maintaining radiometric consistency. Experiments on the FLIR-IR dataset with YOLO-World show consistent gains over zero-shot inference and a strong modality-prompting baseline, particularly for small and low-contrast objects.
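The abstract does not state how the low-/high-frequency split is computed or how the U-Net is built, so the following is only a minimal sketch under stated assumptions. It assumes an ideal circular low-pass mask in the Fourier domain with a hypothetical `cutoff` (in cycles per pixel), defines the high-frequency part as the residual, and replaces the learned U-Net with a fixed, hypothetical `toy_prompt_generator`. The point is to show the data flow: decompose, generate a bounded additive prompt, and add it to the input that a frozen detector would consume.

```python
import numpy as np


def split_frequencies(img: np.ndarray, cutoff: float = 0.1):
    """Split a 2-D image into low- and high-frequency parts.

    Assumption: an ideal circular low-pass mask in the (shifted) Fourier
    domain; `cutoff` is a hypothetical radius in cycles per pixel.
    The high-frequency part is the residual, so low + high == img.
    """
    h, w = img.shape
    spectrum = np.fft.fftshift(np.fft.fft2(img))
    fy = np.fft.fftshift(np.fft.fftfreq(h))
    fx = np.fft.fftshift(np.fft.fftfreq(w))
    radius = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    mask = radius <= cutoff  # keeps the DC term and nearby frequencies
    low = np.real(np.fft.ifft2(np.fft.ifftshift(spectrum * mask)))
    high = img - low
    return low, high


def toy_prompt_generator(low: np.ndarray, high: np.ndarray, strength: float = 0.05):
    """Hypothetical stand-in for the learned frequency-aware U-Net.

    Emphasizes normalized high-frequency structure; tanh bounds the
    prompt to (-strength, strength). `low` is unused in this toy version
    and is accepted only to mirror a two-branch interface.
    """
    return strength * np.tanh(high / (np.std(high) + 1e-8))


def apply_additive_prompt(img: np.ndarray, cutoff: float = 0.1, strength: float = 0.05):
    """Decompose the image, generate a prompt, and add it in input space."""
    low, high = split_frequencies(img, cutoff)
    prompt = toy_prompt_generator(low, high, strength)
    return img + prompt, prompt
```

In actual prompt tuning, only the prompt generator's parameters would be optimized while the detector (YOLO-World in the abstract) stays frozen; keeping the prompt additive and bounded is one simple way to limit how far the prompted input drifts from the original radiometric values.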
Presenter
Jia Qu
Mitsubishi Electric Corp (Japan)
Head researcher at Mitsubishi Electric Corporation, Japan, focusing on computer vision and thermal imaging applications.