Deep learning techniques to optimise and improve 3D scene graphs
Students should have strong programming skills in Python and C++. Experience with a deep learning framework (PyTorch or TensorFlow) is highly preferred.
3D spatial perception is the problem of building and maintaining, in real time, a comprehensive and actionable representation of the environment from sensor data and prior knowledge. Despite advances in robot perception, existing methods focus mainly on purely geometric maps (as in traditional Simultaneous Localization and Mapping, SLAM) or on "flat" metric-semantic maps, which struggle to handle large environments or extensive semantic labelling. Hierarchical representations organise the scene as a layered graph, which is more efficient to store and enables inference procedures with provable computational efficiency.
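To make the idea of a layered graph concrete, the following is a minimal sketch of such a hierarchy. It is not the Hydra or mono-hydra data structure: the class names, the method names and the four layer names are assumptions chosen for illustration only. Each node belongs to one layer and may have a parent in the layer directly above, so queries such as "which room contains this object" only need to walk up a few levels.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical layer names, ordered from fine to coarse (an assumption,
# not taken from the Hydra code base).
LAYERS = ("objects", "places", "rooms", "building")


@dataclass
class Node:
    node_id: str
    layer: str
    position: Tuple[float, float, float]
    parent: Optional[str] = None
    children: List[str] = field(default_factory=list)


class LayeredSceneGraph:
    """Toy hierarchical graph: every edge links adjacent layers."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Node] = {}

    def add_node(self, node_id, layer, position, parent=None) -> Node:
        if layer not in LAYERS:
            raise ValueError(f"unknown layer: {layer}")
        if node_id in self.nodes:
            raise ValueError(f"duplicate node id: {node_id}")
        node = Node(node_id, layer, position)
        if parent is not None:
            parent_node = self.nodes[parent]
            # A parent must sit exactly one layer above its child.
            if LAYERS.index(parent_node.layer) != LAYERS.index(layer) + 1:
                raise ValueError("parent must be in the next coarser layer")
            node.parent = parent
            parent_node.children.append(node_id)
        self.nodes[node_id] = node
        return node

    def ancestors(self, node_id) -> List[str]:
        """Walk up the hierarchy; cost is bounded by the number of layers."""
        chain = []
        current = self.nodes[node_id].parent
        while current is not None:
            chain.append(current)
            current = self.nodes[current].parent
        return chain

    def descendants_in_layer(self, node_id, layer) -> List[str]:
        """Collect all nodes of the given layer below node_id."""
        found = []
        stack = list(self.nodes[node_id].children)
        while stack:
            node = self.nodes[stack.pop()]
            if node.layer == layer:
                found.append(node.node_id)
            else:
                stack.extend(node.children)
        return found


def build_example() -> LayeredSceneGraph:
    """Small made-up scene: one building, two rooms, three places, two objects."""
    g = LayeredSceneGraph()
    g.add_node("B0", "building", (0.0, 0.0, 0.0))
    g.add_node("R0", "rooms", (1.0, 0.0, 0.0), parent="B0")
    g.add_node("R1", "rooms", (5.0, 0.0, 0.0), parent="B0")
    g.add_node("P0", "places", (1.0, 1.0, 0.0), parent="R0")
    g.add_node("P1", "places", (2.0, 1.0, 0.0), parent="R0")
    g.add_node("P2", "places", (5.0, 1.0, 0.0), parent="R1")
    g.add_node("O0", "objects", (1.1, 1.2, 0.5), parent="P0")
    g.add_node("O1", "objects", (5.2, 0.9, 0.7), parent="P2")
    return g
```

In this sketch, `build_example().ancestors("O0")` walks from an object up through its place and room to the building, touching at most one node per layer regardless of how many nodes the scene contains. That is the kind of property a flat map does not provide.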
We designed mono-hydra, an algorithm based on the Hydra framework that takes monocular image inputs and builds a hierarchical 3D representation called a "3D scene graph". Because the algorithm runs in real time, the quality of the generated mesh is compromised. This thesis proposal aims to improve that representation using novel machine-learning approaches.
- Rosinol, A., Gupta, A. et al. (2020) 3D dynamic scene graphs: Actionable spatial perception with places, objects, and humans, arXiv.org. Available at: https://doi.org/10.48550/arXiv.2002.06289
- Wu, C.-Y. et al. (2023) Multiview compressive coding for 3D reconstruction, arXiv.org. Available at: https://doi.org/10.48550/arXiv.2301.08247
- Pavllo, D. et al. (2021) Learning generative models of textured 3D meshes from real-world images, arXiv.org. Available at: https://arxiv.org/abs/2103.15627