Paper 14085-31
ViTCapsNets: hybrid vision transformers and capsule networks architecture for automated building damage assessment using multitemporal satellite imagery
15 April 2026 • 16:20 - 16:40 CEST | Luxembourg/Salon 2 (Niveau/Level 0)
Abstract
Rapid and accurate building damage assessment following natural disasters is critical for effective emergency response. Existing deep learning approaches predominantly rely on binary classification or standard convolutional architectures that discard pose and spatial relationship information through max-pooling. This paper presents ViTCapsNets, a family of hybrid architectures integrating Vision Transformer variants with Capsule Networks for four-class ordinal building damage assessment from bi-temporal satellite imagery. Within this family, we evaluate standard Vision Transformer (ViT) and Swin Transformer backbones (the latter a hierarchical extension of ViT replacing global self-attention with shifted-window attention), paired with a Capsule Network head employing dynamic routing to preserve equivariant spatial part-whole relationships. Applied to the xBD dataset across six disaster types, the SwinCapsNets variant achieves a Cohen's Kappa of 0.675 and a Matthews Correlation Coefficient of 0.676 (metrics introduced in this work to complement the dataset's standard F1-score evaluation) with 42M parameters, outperforming all ViTCapsNets configurations. A multiscale preprocessing strategy across three input resolutions (32×32, 64×64, 128×128 pixels) mitigates the Modifiable Areal Unit Problem inherent to buildings of variable footprint size. Beyond the scientific contribution, we present ViDa Caps, a minimum viable product integrating the best-performing model into a QGIS plugin, as an example of responsible science that closes the loop between research and operational societal impact.
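For readers unfamiliar with the two agreement metrics named in the abstract, the following is a minimal sketch of how Cohen's Kappa and the multiclass Matthews Correlation Coefficient can be computed from a confusion matrix. It is not the authors' evaluation code: the function names and the 4×4 counts in `EXAMPLE_CM` are hypothetical and serve only to illustrate the formulas.

```python
from math import sqrt


def _totals(cm):
    """Return (n, row totals, column totals, diagonal sum) of a square matrix."""
    k = len(cm)
    rows = [sum(cm[i]) for i in range(k)]
    cols = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    diag = sum(cm[i][i] for i in range(k))
    return sum(rows), rows, cols, diag


def cohen_kappa(cm):
    """Cohen's kappa; rows are reference labels, columns are predictions."""
    n, rows, cols, diag = _totals(cm)
    p_obs = diag / n  # observed agreement
    p_exp = sum(r * c for r, c in zip(rows, cols)) / (n * n)  # chance agreement
    if p_exp == 1.0:  # degenerate case: only one class present
        return 0.0
    return (p_obs - p_exp) / (1.0 - p_exp)


def multiclass_mcc(cm):
    """Multiclass Matthews Correlation Coefficient from a confusion matrix."""
    n, rows, cols, diag = _totals(cm)
    numerator = diag * n - sum(r * c for r, c in zip(rows, cols))
    denominator = sqrt(
        (n * n - sum(c * c for c in cols)) * (n * n - sum(r * r for r in rows))
    )
    if denominator == 0.0:  # degenerate case: undefined, reported as 0
        return 0.0
    return numerator / denominator


# Hypothetical counts for four ordinal damage classes (illustration only).
EXAMPLE_CM = [
    [50, 5, 2, 1],
    [6, 30, 7, 2],
    [2, 8, 25, 5],
    [1, 2, 4, 40],
]

print(f"kappa = {cohen_kappa(EXAMPLE_CM):.3f}")
print(f"MCC   = {multiclass_mcc(EXAMPLE_CM):.3f}")
```

Both metrics correct for agreement expected by chance, which is why they can be more informative than accuracy or F1 alone when class frequencies are imbalanced.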
Presenter
Victor Hernández-Díaz
Posgrado en Ciencia e Ingeniería de la Computación, Universidad Nacional Autónoma de México (Mexico)
Víctor Hernández Díaz is a geospatial data engineer and analyst specializing in the application of artificial intelligence (AI), computer vision (CV), and data science to address challenges in territorial and environmental planning.
He holds a Master's degree in Computer Science and Engineering from the Institute of Applied Mathematics and Systems Research (IIMAS) at the National Autonomous University of Mexico (UNAM), as well as a B.Sc. in Geomatics Engineering from UNAM. His current research focuses on the development of innovative hybrid models for assessing structural damage in buildings caused by natural disasters.
With over seven years of experience in geospatial analysis and environmental modeling, Hernández Díaz has been a key collaborator on high-impact projects at the National Laboratory for Sustainability Sciences (LANCIS–UNAM) and has coordinated the geospatial data science team for territorial and marine planning programs.