Michael Yang, Francesco Nex
Scene Understanding with UAVs
The student should be able to program in a suitable language and framework, e.g. Python with PyTorch.
In Photogrammetry and Computer Vision, the generation of dense point clouds from stereo pairs has been an active research topic for the last three decades. State-of-the-art approaches have shown very accurate and reliable results in most of the tests performed. Algorithms such as Semi-Global Matching (SGM) have been successfully implemented in commercial software and provide good-quality results. However, 3D reconstruction is still an open research problem: untextured regions, repetitive patterns and poor illumination remain major obstacles for these algorithms. The recent introduction of Convolutional Neural Networks (CNNs) has brought large improvements in many applications, such as image segmentation, classification and object tracking. Some contributions have also investigated the use of deep learning for 3D reconstruction from stereo images, with very promising results. The aim of this MSc proposal is to investigate the use of CNNs for detailed 3D reconstruction, considering both UAV and terrestrial image sets.
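Whichever matcher is used (SGM or a learned network), the link between a dense disparity map and a point cloud is the standard rectified-stereo relation Z = f·B/d, where f is the focal length in pixels, B the baseline and d the disparity in pixels; each pixel is then back-projected through the pinhole model. The following is a minimal NumPy sketch, assuming a rectified pair, square pixels and a known principal point (cx, cy). The function names and parameters are hypothetical and only illustrate the geometry.

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m, min_disp=1e-6):
    """Convert a disparity map (pixels) of a rectified pair into depth (metres).

    Uses Z = f * B / d. Pixels with disparity <= min_disp (no match or
    point at infinity) are marked invalid with NaN.
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.full(disparity.shape, np.nan)
    valid = disparity > min_disp
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth

def depth_to_points(depth, focal_px, cx, cy):
    """Back-project a depth map into camera-frame 3D points (pinhole, square pixels).

    Returns an (N, 3) array; invalid (NaN) pixels are dropped.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel column / row indices
    x = (u - cx) * depth / focal_px
    y = (v - cy) * depth / focal_px
    points = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    return points[np.isfinite(points).all(axis=1)]

# Toy example: f = 1000 px, B = 0.5 m, one pixel without a valid match.
disp = np.array([[10.0, 20.0], [0.0, 40.0]])
depth = disparity_to_depth(disp, focal_px=1000.0, baseline_m=0.5)
pts = depth_to_points(depth, focal_px=1000.0, cx=0.5, cy=0.5)
```

Here depth evaluates to 50 m, 25 m, NaN and 12.5 m, and pts holds the three valid points. Because depth is inversely proportional to disparity, a fixed one-pixel disparity error costs more depth accuracy for distant points, which is one reason sub-pixel matching quality matters for detailed reconstruction.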
The student will first review the existing literature on image matching algorithms that use CNNs as a backbone. Existing code and available datasets for training these algorithms will be the starting point of the work: the student will initially run existing algorithms in order to become familiar with them. Then, datasets of airborne UAV images will be used for the same purpose; these images, together with the corresponding training point clouds, will be provided by the supervisors. Next, the student will develop a new deep learning algorithm, e.g. a novel CNN or Transformer-based network architecture. The main challenges will be to design, implement and train the algorithm. The expected output is a trained CNN/Transformer able to produce a depth map from a set of overlapping images.
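Many learned stereo networks share a common matching core: per-pixel features from a CNN (or Transformer) backbone are compared across candidate disparities to form a cost volume, and a differentiable soft-argmin regresses the disparity, as popularised by GC-Net. The following is a minimal NumPy sketch of that core only, assuming rectified views and feature maps that have already been computed. Random unit-norm vectors stand in for learned features, and the function names, the temperature parameter and the correlation (dot-product) score are illustrative choices, not the architecture to be developed in this work.

```python
import numpy as np

def correlation_cost_volume(feat_left, feat_right, max_disp):
    """Build a (D, H, W) matching-score volume from two (C, H, W) feature maps.

    score[d, y, x] = sum_c feat_left[c, y, x] * feat_right[c, y, x - d]
    Entries where x - d falls outside the right image are set to -inf.
    """
    _, h, w = feat_left.shape
    volume = np.full((max_disp, h, w), -np.inf)
    volume[0] = (feat_left * feat_right).sum(axis=0)
    for d in range(1, max_disp):
        volume[d, :, d:] = (feat_left[:, :, d:] * feat_right[:, :, :-d]).sum(axis=0)
    return volume

def soft_argmax_disparity(volume, temperature=1.0):
    """Differentiable disparity: softmax over scores, then the expected disparity.

    Applying softmax to similarity scores is equivalent to a soft-argmin on
    negated costs; a lower temperature makes the estimate sharper.
    """
    logits = volume / temperature
    logits = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    weights = np.exp(logits)                              # -inf entries become 0
    weights /= weights.sum(axis=0, keepdims=True)
    disparities = np.arange(volume.shape[0], dtype=np.float64).reshape(-1, 1, 1)
    return (weights * disparities).sum(axis=0)

# Synthetic check: the left feature map is the right one shifted by 3 pixels.
rng = np.random.default_rng(0)
feat_right = rng.standard_normal((32, 4, 24))
feat_right /= np.linalg.norm(feat_right, axis=0, keepdims=True)  # unit-norm features
feat_left = np.roll(feat_right, 3, axis=2)  # feat_left[..., x] = feat_right[..., x - 3]

vol = correlation_cost_volume(feat_left, feat_right, max_disp=8)
wta = vol.argmax(axis=0)                              # winner-takes-all disparity
soft = soft_argmax_disparity(vol, temperature=0.05)   # sub-pixel, differentiable
```

The hard argmax is not differentiable, so a network trained end to end against ground-truth depth (such as the supervisor-provided point clouds) needs a soft estimate like the one above. In a real PyTorch model the same operations would run on tensors so that gradients flow back into the feature extractor.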