Efficient sampling design for deep learning-based land use and land cover mapping

M-GEO
ACQUAL
Additional Remarks

Required skills: 1) Programming skills in Python. 2) Willingness to learn deep learning frameworks (TensorFlow, PyTorch). 3) Willingness to learn the manipulation of geographical data in Python (GDAL, geopandas, shapely, rasterio, etc.).

Topic description

Understanding the types of vegetation that compose the land cover of non-anthropic areas, and the land use of anthropic areas, is essential to understanding ecosystem dynamics and the impacts of human activities on the environment. This understanding allows us to qualify the biodiversity being lost through deforestation, and also to estimate the amount of biomass being lost and its impact on the carbon cycle. Brazil produces large amounts of land use and land cover data for its biomes. Among these data, we can cite the TerraClass and MapBiomas collections, which have mapped several years of land use and land cover dynamics in the country. Such a large amount of data has considerable potential for training efficient deep learning models for mapping tasks.

However, training a neural network with this amount of data, besides being impractical in most computational environments, may introduce uncertainty into the classification due to noise in the label data. Active learning, meta-learning, and related methods can be very useful for selecting the least uncertain samples, reducing the number of samples needed for classification while keeping, or even improving, classification accuracy. In this sense, this MSc topic focuses on developing efficient automated sampling design methods for deep learning models to map land use and land cover at the lowest possible computational cost. For this topic, the student must have good programming skills.
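To make the sampling idea concrete, the selection of the least uncertain samples can be sketched as ranking candidates by the predictive entropy of a model's class probabilities and keeping the most confident ones. This is a minimal illustrative sketch, not part of TerraClass, MapBiomas, or any specific pipeline; the function name and the toy probabilities are hypothetical.

```python
import numpy as np

def select_least_uncertain(probs, n_samples):
    """Rank candidates by predictive entropy and keep the n_samples
    with the LOWEST entropy, i.e. the least uncertain ones.

    probs: (N, C) array of per-class softmax probabilities.
    Hypothetical helper for illustration only.
    """
    eps = 1e-12  # guard against log(0)
    entropy = -np.sum(probs * np.log(probs + eps), axis=1)  # shape (N,)
    order = np.argsort(entropy)  # ascending: most confident first
    return order[:n_samples]

# Toy example: 4 candidate samples over 3 classes
probs = np.array([
    [0.98, 0.01, 0.01],   # very confident
    [0.34, 0.33, 0.33],   # highly uncertain
    [0.80, 0.15, 0.05],
    [0.50, 0.25, 0.25],
])
selected = select_least_uncertain(probs, 2)
print(selected)  # indices of the two most confident candidates
```

In a real setting, `probs` would come from a partially trained segmentation model evaluated on unlabelled patches, and the selected subset would be used for the next training round.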

Topic objectives and methodology

The student will first review the state of the art in semantic segmentation models, active learning, meta-learning, and sampling design, aiming to identify the most efficient and accurate models. The next step will be to compare these methods and optimize them by reducing the number of components, thereby achieving a better balance between the size of the sample set, the computational cost, and the accuracy.
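The trade-off between sample-set size and accuracy described above is typically studied through a learning curve: train on increasingly large sample sets and record test accuracy until it plateaus. The sketch below illustrates this on synthetic two-class data with a deliberately simple nearest-centroid classifier standing in for a deep model; all names and data here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Synthetic stand-in for labelled samples: two Gaussian classes."""
    x0 = rng.normal(0.0, 1.0, size=(n, 2))
    x1 = rng.normal(2.5, 1.0, size=(n, 2))
    return np.vstack([x0, x1]), np.array([0] * n + [1] * n)

def nearest_centroid_accuracy(X_tr, y_tr, X_te, y_te):
    # Minimal classifier: assign each test point to the nearest class centroid.
    c0 = X_tr[y_tr == 0].mean(axis=0)
    c1 = X_tr[y_tr == 1].mean(axis=0)
    pred = (np.linalg.norm(X_te - c1, axis=1)
            < np.linalg.norm(X_te - c0, axis=1)).astype(int)
    return float((pred == y_te).mean())

X_test, y_test = make_data(500)

# Learning curve: accuracy as a function of sample-set size.
curve = {}
for n in (5, 20, 100):
    X_tr, y_tr = make_data(n)
    curve[n] = nearest_centroid_accuracy(X_tr, y_tr, X_test, y_test)
print(curve)
```

The point of the comparison step is to find where this curve flattens: beyond that sample-set size, additional (possibly noisy) labels add computational cost without improving accuracy.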