Paper 14040-6
UniMD3D: a multi-dataset training framework for point cloud semantic scene understanding
29 April 2026 • 10:20 AM - 10:40 AM EDT | Chesapeake 5
Abstract
Point cloud semantic scene understanding has emerged as a key research area due to the widespread use of point clouds as a natural representation for data from LiDAR, depth cameras, and other 3D sensors, as well as their ability to preserve precise geometric structure and enable rich spatial reasoning. Advances in 3D semantic scene understanding therefore have broad potential to improve perception across robotics, autonomous driving, and spatial computing applications. Despite significant progress, most deep learning models for point cloud classification, segmentation, and detection are trained on a single dataset representing a homogeneous environment captured with a single sensor. This training paradigm limits generalization, weakens robustness in diverse operational settings, and often leads models to overfit to sensor-specific characteristics, resulting in brittle performance when they are applied to new environments. In this work, we present Unified Multi-Dataset 3D (UniMD3D), an open-source framework for synergistic training on multiple point cloud datasets and tasks. The framework addresses the challenges of heterogeneous feature inputs and label discrepancies across datasets, and its modular design enables quick evaluation of backbone models and simple substitution of new datasets. Additionally, the framework provides task heads for classification, segmentation, and detection, allowing simultaneous training on a wide selection of datasets. By pretraining on multiple datasets spanning diverse environments and sensors, our framework fosters the development of more generalized and transferable 3D perception models with larger label vocabularies and improved performance in data-limited scenarios. We demonstrate the effectiveness of our approach through experiments in single-dataset, multi-dataset, and cross-task training settings, with quantitative comparisons to existing methods. Our code is available under an MIT license at https://github.com/andrewyarovoi/UniMD3D.
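To make the multi-dataset training pattern described in the abstract concrete, the sketch below shows one common way to reconcile heterogeneous feature inputs and label vocabularies: per-dataset input adapters projecting sensor-specific features to a common width, a shared backbone, and per-dataset segmentation heads, trained round-robin across datasets. This is a minimal, hypothetical PyTorch illustration under stated assumptions; all names (MultiDatasetModel, adapters, seg_heads, the toy dataset configs) are invented for exposition and do not reflect the actual UniMD3D API.

```python
# Hypothetical sketch of multi-dataset training: per-dataset input adapters,
# a shared backbone, and per-dataset heads. Not the UniMD3D implementation.
import torch
import torch.nn as nn

FEAT_DIM = 64  # common feature width after the input adapters

class Backbone(nn.Module):
    """Toy per-point MLP standing in for a real 3D encoder
    (e.g., a sparse-convolution or point-transformer network)."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(FEAT_DIM, FEAT_DIM), nn.ReLU(),
            nn.Linear(FEAT_DIM, FEAT_DIM), nn.ReLU(),
        )

    def forward(self, x):  # x: (B, N, FEAT_DIM) per-point features
        return self.mlp(x)

class MultiDatasetModel(nn.Module):
    def __init__(self, input_dims: dict, num_classes: dict):
        super().__init__()
        # Per-dataset adapters reconcile heterogeneous inputs
        # (e.g., XYZ+RGB vs. XYZ+intensity) into a common width.
        self.adapters = nn.ModuleDict(
            {name: nn.Linear(d, FEAT_DIM) for name, d in input_dims.items()}
        )
        self.backbone = Backbone()  # shared across all datasets
        # Per-dataset heads sidestep label-vocabulary discrepancies:
        # each head predicts only its own dataset's classes.
        self.seg_heads = nn.ModuleDict(
            {name: nn.Linear(FEAT_DIM, c) for name, c in num_classes.items()}
        )

    def forward(self, points, dataset: str):
        feats = self.backbone(self.adapters[dataset](points))
        return self.seg_heads[dataset](feats)  # (B, N, num_classes[dataset])

# Usage: alternate batches from two synthetic "datasets" each step.
input_dims = {"indoor_rgbd": 6, "outdoor_lidar": 4}   # XYZ+RGB, XYZ+intensity
num_classes = {"indoor_rgbd": 20, "outdoor_lidar": 19}
model = MultiDatasetModel(input_dims, num_classes)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(4):
    for name in input_dims:  # round-robin over datasets
        pts = torch.randn(2, 128, input_dims[name])              # (B, N, in_dim)
        labels = torch.randint(0, num_classes[name], (2, 128))   # per-point labels
        logits = model(pts, name)
        loss = loss_fn(logits.reshape(-1, num_classes[name]), labels.reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()
        print(f"step {step} {name}: loss {loss.item():.3f}")
```

One design note on this sketch: keeping one head per dataset avoids merging conflicting label taxonomies up front; an alternative is to map all datasets into a single unified vocabulary, which trades per-dataset flexibility for one shared classifier.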
Presenter
Andrew Yarovoi
Georgia Institute of Technology (United States)
Andrew Yarovoi is a fifth-year Robotics Ph.D. student at the Georgia Institute of Technology. He earned his M.S. in Computer Science in 2024 and B.S. in Mechanical Engineering in 2021, both from Georgia Tech. He is currently a Graduate Research Assistant in the Electro-Optical Systems Laboratory (EOSL), where his work focuses on 3D perception, LiDAR SLAM, and point cloud processing.