Regularizing neural semantic fields for 3D semantic understanding of scenes with sparse input views

M-GEO
Robotics
ACQUAL
Staff Involved
Additional Remarks

Programming skills are mandatory (preferably Python); experience with deep learning frameworks (PyTorch, Keras, TensorFlow, or JAX) is highly preferred.

Topic description

Over the past few years, Neural Radiance Fields (NeRF) [1] and methods derived from it have become a popular choice for tasks such as novel view synthesis, 3D reconstruction, generative modeling, and semantic segmentation. Both 3D reconstruction and semantic segmentation are key tasks for geospatial applications, e.g. creating maps using unmanned aerial vehicles (UAVs) and photogrammetry. The main difference between such a geospatial use case and a typical NeRF application is the density/sparsity of the input views: most mapping applications cover large geographical extents, resulting in scenes or subscenes with only sparse input views. This MSc topic will design and implement a NeRF-based method for joint 3D reconstruction and semantic segmentation from sparse input views.
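To make the setting concrete, the sketch below shows the kind of neural field the topic builds on: a NeRF-style MLP that, in addition to density and colour, predicts per-point semantic logits (in the spirit of [3]). All layer sizes, names, and the positional-encoding dimension are illustrative assumptions, not the method the student is expected to develop.

```python
# Minimal, illustrative PyTorch sketch of a NeRF-style field with an extra
# semantic head. Layer sizes and names are assumptions for illustration only.
import torch
import torch.nn as nn

class SemanticRadianceField(nn.Module):
    def __init__(self, pos_dim=63, hidden=256, num_classes=10):
        # pos_dim=63 corresponds to a 3D position with the standard NeRF
        # positional encoding (L=10 frequencies): 3 + 3*2*10 = 63.
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(pos_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.sigma_head = nn.Linear(hidden, 1)            # volume density
        self.rgb_head = nn.Sequential(                    # view-independent colour, for brevity
            nn.Linear(hidden, hidden // 2), nn.ReLU(),
            nn.Linear(hidden // 2, 3), nn.Sigmoid(),
        )
        self.sem_head = nn.Linear(hidden, num_classes)    # per-point semantic logits

    def forward(self, x_encoded):
        h = self.trunk(x_encoded)
        return self.sigma_head(h), self.rgb_head(h), self.sem_head(h)
```

Densities, colours, and semantic logits predicted at sampled points are then composited along each ray with the usual volume-rendering weights to obtain pixel colours and class scores.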

Topic objectives and methodology

The student will first review the existing literature on NeRF for 3D reconstruction and semantic segmentation. The focus will be on generalizing and extending methods that address the input-view sparsity problem [2] and methods that produce semantic information [3]. Core experiments will be done on a stereo image pair capturing a small subscene of a Dutch urban area; a further challenge will be to scale and integrate the method across all subscenes of the entire dataset.
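As an example of the kind of sparse-view regularization studied in [2], the sketch below implements a simple patch-based depth-smoothness penalty on depths rendered from unobserved viewpoints. The patch size, weighting, and combination with the photometric loss are assumptions for illustration; the actual thesis may regularize the field differently.

```python
# Illustrative depth-smoothness regulariser in the spirit of RegNeRF [2].
# Patch shapes and the loss weighting below are assumptions, not prescriptions.
import torch

def depth_smoothness_loss(depth_patch: torch.Tensor) -> torch.Tensor:
    """depth_patch: (B, S, S) expected depths rendered for SxS patches
    sampled from unobserved (virtual) viewpoints."""
    d_dx = depth_patch[:, :, 1:] - depth_patch[:, :, :-1]   # horizontal neighbour differences
    d_dy = depth_patch[:, 1:, :] - depth_patch[:, :-1, :]   # vertical neighbour differences
    return (d_dx ** 2).mean() + (d_dy ** 2).mean()

# Typical use: total_loss = photometric_loss + lambda_ds * depth_smoothness_loss(depth_patches)
```

Such a term penalises noisy depth in under-constrained regions, which is one of the main failure modes when only a few input views are available.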

References for further reading

[1] B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, and R. Ng, "NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis," in ECCV, pp. 405–421, Springer, 2020.

[2] M. Niemeyer, J. T. Barron, B. Mildenhall, M. S. M. Sajjadi, A. Geiger, and N. Radwan, "RegNeRF: Regularizing Neural Radiance Fields for View Synthesis from Sparse Inputs," in CVPR, 2022.

[3] S. Vora et al., "NeSF: Neural Semantic Fields for Generalizable Semantic Segmentation of 3D Scenes," Transactions on Machine Learning Research, 2022.