Semantic Segmentation of Multi-line Raw LiDAR Scans
Given recent advances in segmentation, we are also considering more modern, transformer-based techniques such as Mask4Former, which produces segmentation masks directly. Its reported strengths in handling occlusions and boundary details could make it suitable both for the complex indoor environments of our XTREME EU project and for the forest data from the LANS project.
Furthermore, 4D-PLS, which focuses on spatiotemporal segmentation, has been identified as a strong candidate for incorporating temporal information into the segmentation process. This method is particularly useful where motion distortions are pronounced, such as in dynamic environments or when objects move through the LiDAR field of view. However, the need for handling dynamic objects in our datasets may turn out to be limited.
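To make the role of motion distortion concrete, the sketch below shows a common way to deskew a single sweep when per-point timestamps and the sensor poses at the start and end of the sweep are available; the function and variable names are illustrative and are not taken from 4D-PLS or any of the methods above.

```python
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def deskew_sweep(points, timestamps, pose_start, pose_end):
    """Undo motion distortion within one LiDAR sweep.

    points     : (N, 3) points in the (moving) sensor frame
    timestamps : (N,) per-point times normalised to [0, 1] over the sweep
    pose_start : (R0, t0) sensor pose in the world frame at sweep start
    pose_end   : (R1, t1) sensor pose in the world frame at sweep end
    Returns the points expressed in the sweep-end sensor frame.
    """
    R0, t0 = pose_start
    R1, t1 = pose_end

    # Interpolate each point's acquisition pose: slerp for the rotation,
    # linear interpolation for the translation.
    slerp = Slerp([0.0, 1.0], Rotation.from_matrix(np.stack([R0, R1])))
    R_t = slerp(timestamps).as_matrix()                        # (N, 3, 3)
    t_t = (1 - timestamps)[:, None] * t0 + timestamps[:, None] * t1

    # Each point -> world frame using its own acquisition pose ...
    world = np.einsum('nij,nj->ni', R_t, points) + t_t
    # ... then world -> the common sweep-end frame: R1^T (p - t1).
    return (world - t1) @ R1
```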
For natural environments, such as forests, methods proposed by Kaijaluoto et al. (2022) are being considered due to their focus on deep learning applied directly to raw laser scanner measurements. This approach has shown promise in environments with complex vegetation structures, making it a potential choice for forest datasets.
The choice of methods will depend on further discussions with the student, where we will evaluate the performance of these techniques on specific datasets and experiment with combinations to determine which best suits the unique challenges posed by multi-line LiDAR data.
Image from: Lehtola, Ville V., et al. "Preregistration classification of mobile LIDAR data using spatial correlations." IEEE Transactions on Geoscience and Remote Sensing 57.9 (2019): 6900-6915.
The segmentation of multi-line raw LiDAR scans is a challenging and crucial problem in 3D scene understanding. This research focuses on the semantic segmentation of raw LiDAR data, specifically treating the data as cylindrical range images. In this representation, the cylindrical projection of the point clouds facilitates the use of image-based semantic segmentation techniques, but the data is affected by motion distortions inherent to the LiDAR acquisition process.
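As an illustration of this representation, a minimal projection sketch is given below; the vertical field-of-view values are typical for an automotive spinning LiDAR and would need to be adapted to the actual sensor used in the project.

```python
import numpy as np

def cylindrical_range_image(points, H=64, W=2048,
                            fov_up_deg=15.0, fov_down_deg=-25.0):
    """Project an (N, 3) point cloud onto an H x W cylindrical range image.

    Rows index elevation (scan line), columns index azimuth; each pixel
    keeps the range of the nearest point that falls into it.
    Returns the image plus each point's (row, column) pixel coordinates.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)                   # range per point

    yaw = np.arctan2(y, x)                               # azimuth in [-pi, pi]
    pitch = np.arcsin(np.clip(z / np.maximum(r, 1e-8), -1.0, 1.0))

    fov_up, fov_down = np.radians(fov_up_deg), np.radians(fov_down_deg)
    fov = fov_up - fov_down

    # Map angles to pixel coordinates.
    u = np.clip(np.floor(0.5 * (1.0 - yaw / np.pi) * W), 0, W - 1).astype(int)
    v = np.clip(np.floor((1.0 - (pitch - fov_down) / fov) * H), 0, H - 1).astype(int)

    # Write far points first so the nearest return wins per pixel.
    order = np.argsort(-r)
    image = np.full((H, W), -1.0, dtype=np.float32)
    image[v[order], u[order]] = r[order]
    return image, v, u
```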
The key challenge lies in dealing with these motion distortions, which affect both the local and the global structure of the range image. Standard segmentation techniques struggle with such distortions, which makes the design of invariant local descriptors essential. To capture both local features and long-range correlations within the distorted data, the project proposes a deep network that can learn these correlations effectively. Whether transformers are the right tool for this will be investigated as part of the project.
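As a rough illustration of this design space, the toy PyTorch sketch below combines convolutions for local descriptors with a single self-attention layer for long-range correlations; it is not the architecture that will be developed in the thesis, and all layer sizes are placeholders.

```python
import torch
import torch.nn as nn

class RangeSegNet(nn.Module):
    """Toy range-image segmentation net: convolutions capture local
    structure, one self-attention layer models long-range correlations
    across the (possibly distorted) image."""

    def __init__(self, in_channels=1, num_classes=20, dim=64):
        super().__init__()
        self.local = nn.Sequential(                       # local descriptors
            nn.Conv2d(in_channels, dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.attn = nn.TransformerEncoderLayer(           # long-range context
            d_model=dim, nhead=4, dim_feedforward=2 * dim, batch_first=True)
        self.head = nn.Sequential(                        # per-pixel logits
            nn.Upsample(scale_factor=4, mode='bilinear', align_corners=False),
            nn.Conv2d(dim, num_classes, 1),
        )

    def forward(self, x):                                 # x: (B, C, H, W)
        f = self.local(x)                                 # (B, D, H/4, W/4)
        b, d, h, w = f.shape
        tokens = self.attn(f.flatten(2).transpose(1, 2))  # (B, h*w, D)
        f = tokens.transpose(1, 2).reshape(b, d, h, w)
        return self.head(f)                               # (B, K, H, W)

# Example: one range-image tile of 64 x 512 pixels with a single range channel.
logits = RangeSegNet()(torch.randn(1, 1, 64, 512))        # -> (1, 20, 64, 512)
```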
Relevant Literature and Methods:
In tackling the semantic segmentation of multi-line LiDAR data, several methods are being considered based on their strengths and applicability to the challenges at hand. For segmentation of car-mounted LiDAR scans, RangeNet++ (Milioto et al., 2019) has been identified as a promising approach because it performs fast and accurate LiDAR segmentation on cylindrical range-image projections. This aligns well with the thesis's approach of treating LiDAR data as a range image, making it a natural candidate for testing on the ITC building data and other urban scenes.
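Because RangeNet++ predicts labels on the range image and then transfers them back to the full point cloud, a simplified back-projection step is sketched below; the plain k-NN majority vote is only a CPU stand-in for the GPU k-NN post-processing described in the RangeNet++ paper, and the pixel indices v, u are those returned by the projection sketch above.

```python
import numpy as np
from scipy.spatial import cKDTree

def labels_to_points(pred_labels, points, v, u, k=5):
    """Transfer per-pixel predictions back to the raw cloud, then smooth.

    pred_labels : (H, W) integer class map from the 2D network
    points      : (N, 3) original point cloud
    v, u        : (N,) row/column coordinates of each point in the range image
    Points sharing a pixel first inherit that pixel's label; a small k-NN
    majority vote in 3D then reduces label bleeding across depth edges.
    """
    labels = pred_labels[v, u]                      # direct back-projection

    tree = cKDTree(points)
    _, nn_idx = tree.query(points, k=k)             # (N, k) neighbour indices
    neigh = labels[nn_idx]                          # (N, k) neighbour labels
    # Majority vote per point.
    return np.array([np.bincount(row).argmax() for row in neigh])
```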
Additionally, the student has explored Patchwork++, a fast and robust ground-segmentation method developed for urban, vehicle-mounted LiDAR. Its reliable separation of ground and non-ground points under noisy conditions could be beneficial when working with multi-line LiDAR data from densely populated areas.
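Patchwork++ itself relies on region-wise (concentric-zone) plane fitting with several refinement steps; purely to illustrate the idea of ground segmentation, and not the actual Patchwork++ pipeline, a single-plane RANSAC baseline is sketched below.

```python
import numpy as np

def ransac_ground(points, n_iters=200, dist_thresh=0.2, seed=None):
    """Very simplified ground extraction: fit one plane with RANSAC and
    label everything within `dist_thresh` of it as ground. Patchwork++
    instead fits planes per concentric-zone patch, which handles slopes
    and curbs far better; this is only a conceptual stand-in."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(points), dtype=bool)

    for _ in range(n_iters):
        sample = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-8:                       # degenerate (collinear) sample
            continue
        normal /= norm
        d = -normal @ sample[0]
        dist = np.abs(points @ normal + d)    # point-to-plane distances
        inliers = dist < dist_thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers

    return best_inliers                       # boolean mask: True = ground
```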