Leveraging Scene Graphs and Deep Learning Semantics for Enhanced GraphSLAM

Robotics
Additional Remarks

Students should possess strong programming skills in Python and C++; familiarity with Linux and Docker is beneficial, though not required. The multi-sensor backpack system is equipped with 2 x VLP16 lidars, a Ladybug5+ HD camera, a Trimble BD990 GNSS receiver, and an XSens MTi 630R IMU.

Topic description

Simultaneous Localization and Mapping (SLAM) is foundational for autonomous systems, with GraphSLAM being a prominent variant that formulates mapping as a graph-based optimization problem. However, while GraphSLAM is adept at geometric mapping, it provides little semantic information about the environment.
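The graph-based formulation can be illustrated with a toy one-dimensional pose graph, where slightly inconsistent odometry and loop-closure edges are reconciled by least squares. This is a minimal numpy sketch of the idea, not how a real backend such as g2o is invoked:

```python
import numpy as np

# Toy 1D pose graph: three poses linked by odometry edges plus one
# loop-closure edge. GraphSLAM finds the poses that minimize the sum of
# squared edge errors; in 1D this is a linear least-squares problem.
# Each edge is (i, j, measured displacement x_j - x_i).
edges = [(0, 1, 1.0), (1, 2, 1.1), (0, 2, 2.0)]  # odometry sums to 2.1, loop closure says 2.0
n = 3

# Build the linear system A x = b, with one extra row anchoring pose 0.
A = np.zeros((len(edges) + 1, n))
b = np.zeros(len(edges) + 1)
for row, (i, j, z) in enumerate(edges):
    A[row, i], A[row, j], b[row] = -1.0, 1.0, z
A[-1, 0] = 1.0  # anchor: x_0 = 0

x, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x)  # optimized poses; the 0.1 m inconsistency is spread across the edges
```

In a real system the poses are SE(3) transforms and the problem is nonlinear, but the principle is the same: every measurement becomes an edge, and optimization distributes the accumulated error over the whole graph.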

Deep learning has shown substantial capability in semantic extraction from sensor data. Scene graphs, representing structured relational information about a scene, can serve as an intermediary between the raw semantics extracted by deep learning and the spatial representations in GraphSLAM. By integrating these concepts, we aim to produce a more holistic mapping solution.
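As an illustration, a scene graph can be as simple as labeled nodes with spatial attributes and typed relations between them. The sketch below is purely illustrative; Hydra's actual representation is a layered 3D scene graph spanning meshes, places, rooms, and buildings:

```python
from dataclasses import dataclass, field

@dataclass
class SceneNode:
    node_id: int
    label: str       # semantic class, e.g. from a segmentation model
    centroid: tuple  # (x, y, z) in the map frame

@dataclass
class SceneGraph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)  # (src_id, dst_id, relation)

    def add_node(self, node):
        self.nodes[node.node_id] = node

    def relate(self, src, dst, relation):
        self.edges.append((src, dst, relation))

g = SceneGraph()
g.add_node(SceneNode(0, "table", (1.0, 0.5, 0.4)))
g.add_node(SceneNode(1, "chair", (1.2, 0.9, 0.3)))
g.relate(1, 0, "next_to")
print(len(g.nodes), len(g.edges))  # → 2 1
```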

Topic objectives and methodology

The research seeks to combine deep learning-driven semantic extraction, scene graphs for structured environmental understanding, and GraphSLAM's spatial mapping. The goal is a semantically rich map offering both detailed geometry and comprehensive environmental context. Lidar point clouds are colored with semantically labelled images. The following methods and related papers are relevant:

  - Segment Anything (SAM): https://github.com/facebookresearch/segment-anything
  - g2o (graph optimization, SLAM): https://openslam-org.github.io/g2o.html
  - Hydra scene graphs: https://arxiv.org/abs/2201.13360
  - FAST-LIO: https://github.com/hku-mars/FAST_LIO
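The point-cloud coloring mentioned above amounts to projecting lidar points into the labelled camera image and reading off the class at each pixel. A minimal sketch, with made-up intrinsics and a synthetic label image standing in for calibrated hardware and real segmentation output:

```python
import numpy as np

# Hypothetical pinhole intrinsics; real values come from calibrating the
# backpack's camera/lidar pair.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
labels_img = np.zeros((480, 640), dtype=np.int32)
labels_img[:, 320:] = 7  # pretend the right half was segmented as class 7

# Points already expressed in the camera frame (extrinsics applied).
points_cam = np.array([[0.5, 0.0, 2.0],    # projects right of center
                       [-0.5, 0.0, 2.0]])  # projects left of center

def label_points(points, K, labels_img):
    uv = (K @ points.T).T
    uv = uv[:, :2] / uv[:, 2:3]  # perspective divide
    u = np.clip(uv[:, 0].astype(int), 0, labels_img.shape[1] - 1)
    v = np.clip(uv[:, 1].astype(int), 0, labels_img.shape[0] - 1)
    return labels_img[v, u]

print(label_points(points_cam, K, labels_img))  # → [7 0]
```

A real implementation must also handle points behind the camera, occlusion, and the lidar-camera time offset.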

  1. Data Collection: Acquire lidar, camera, GNSS, and IMU data with the multi-sensor backpack system.
  2. Semantic Segmentation with Deep Learning: Employ SAM to segment and categorize environmental features from sensor data.
  3. Scene Graph Generation: Convert the semantic labels and relationships into a structured scene graph, capturing entities and their inter-relationships within the environment.
  4. GraphSLAM Integration with Scene Graphs: Integrate the scene graphs into the GraphSLAM optimization process, enabling semantically-aware nodes or edges within the mapping graph.
  5. Map Fusion: Combine the spatial data from GraphSLAM with the semantic relationships from the scene graphs to produce an integrated map.
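Steps 3 and 4 can be sketched as a single graph structure in which semantic objects sit alongside robot poses; in practice the edges would be handed to an optimizer such as g2o. All names and measurements below are illustrative:

```python
from dataclasses import dataclass, field

# Sketch of a semantically-aware mapping graph: robot poses and scene-graph
# objects are both nodes, connected by odometry and observation edges.
@dataclass
class SemanticPoseGraph:
    poses: list = field(default_factory=list)    # robot poses (x, y, z)
    objects: dict = field(default_factory=dict)  # object id -> semantic label
    edges: list = field(default_factory=list)    # (kind, a, b, measurement)

    def add_pose(self, xyz):
        self.poses.append(xyz)
        return len(self.poses) - 1

    def add_object(self, oid, label):
        self.objects[oid] = label

    def add_edge(self, kind, a, b, z):
        self.edges.append((kind, a, b, z))

g = SemanticPoseGraph()
p0 = g.add_pose((0.0, 0.0, 0.0))
p1 = g.add_pose((1.0, 0.0, 0.0))
g.add_object("chair_1", "chair")                       # entity from the scene graph
g.add_edge("odom", p0, p1, (1.0, 0.0, 0.0))            # geometric constraint
g.add_edge("observes", p1, "chair_1", (0.4, 0.2, 0.0)) # semantic landmark constraint
```

Treating objects as landmarks is one way such constraints can aid loop closure: re-observing "chair_1" ties two distant poses together.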

Evaluation: Validate the combined system in diverse environments, comparing its performance to traditional GraphSLAM in terms of accuracy, semantic richness, and computational overhead.
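For the accuracy comparison, a standard metric is absolute trajectory error (ATE): the RMSE of position differences between the estimated and ground-truth trajectories. A minimal sketch (trajectory alignment is omitted here for brevity, i.e. both trajectories are assumed to share a frame):

```python
import numpy as np

def ate_rmse(est, gt):
    """Root-mean-square position error between two aligned trajectories."""
    err = np.linalg.norm(est - gt, axis=1)
    return float(np.sqrt(np.mean(err ** 2)))

# Toy 2D trajectories: estimate drifts slightly from ground truth.
est = np.array([[0.0, 0.0], [1.1, 0.0], [2.0, 0.1]])
gt = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
print(round(ate_rmse(est, gt), 3))  # → 0.082
```

The GNSS receiver on the backpack can supply the ground-truth trajectory outdoors; semantic richness and computational overhead require separate measures (e.g. label accuracy and per-frame runtime).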