Integrating Earth Observation (EO) with Large Language Models (LLMs): Towards A Multimodal EO-Language Model

GIMA
M-GEO
M-SE
STAMP
Staff Involved
M-SE Core knowledge areas
Spatial Information Science (SIS)
Technical Engineering (TE)
Additional Remarks

Proficiency in the Python programming language is a must.

You must be willing to explore and learn new topics related to AI and deep learning. This is a challenging topic, so only choose it if you are prepared for that challenge.

This research will align with activities performed in a project funded by NWO.

Supervisor(s): Mahdi Farnaghi

Topic description

In the realm of artificial intelligence (AI), the advancement of large language models (LLMs) such as GPT-4, the base model of ChatGPT, has revolutionized natural language processing. Simultaneously, Earth Observation (EO) models have significantly impacted how we understand environmental phenomena. This proposal aims to explore the integration of these two powerful domains, creating a multimodal EO-language model that synergizes the linguistic capabilities of LLMs with the spatial analysis strengths of EO models.

 

Keywords: Large Language Models (LLMs), Deep Learning, Multimodality, Earth Observation

Topic objectives and methodology

Objectives

  1. To investigate the potential of combining LLMs with EO models for enhanced environmental data interpretation and understanding.
  2. To develop a prototype demonstrating the practical application of this integration in real-world scenarios.
  3. To lay the groundwork for further research and development in multimodal AI systems.

Methodology

The research will be explorative in nature, focusing on:

  • Literature Review and Theoretical Exploration:
    • Extensive literature review to understand the current state of LLMs and EO models, identifying potential synergies and integration challenges.
    • Extensive literature review to understand the current state of Multimodal LLMs (MLLMs) and the strategies to integrate different modalities.
  • Prototype Development: Designing and implementing a prototype that demonstrates the feasibility of combining LLMs with EO models. This will involve using available EO datasets, utilizing open-source LLMs, developing an integration mechanism, and performing multimodal training.
  • Testing and Analysis: Evaluating the prototype in a selected scenario, analyzing its performance, and identifying areas for improvement.
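
One integration mechanism commonly described in the MLLM literature (see reference [1] below) is to project embeddings from a frozen EO/vision encoder into the LLM's token-embedding space, so that image patches become "visual tokens" the LLM can attend over alongside text. The following is a minimal sketch of that idea only; all dimensions, variable names, and the random stand-ins for real encoder outputs are illustrative assumptions, not part of any specific model:

```python
import numpy as np

# Hypothetical dimensions, chosen purely for illustration.
EO_DIM = 512      # output dimension of a (frozen) EO image encoder
LLM_DIM = 768     # token-embedding dimension of an open-source LLM
N_PATCHES = 16    # number of EO image patches passed to the LLM as visual tokens
N_TEXT = 10       # number of embedded prompt tokens

rng = np.random.default_rng(0)

def project_eo_features(eo_features: np.ndarray,
                        W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Map EO encoder features into the LLM embedding space
    via a (trainable) linear projection."""
    return eo_features @ W + b

# Stand-ins for real model outputs (random here, for the sketch only).
eo_features = rng.standard_normal((N_PATCHES, EO_DIM))  # from the EO encoder
W = rng.standard_normal((EO_DIM, LLM_DIM)) * 0.02       # projection weights
b = np.zeros(LLM_DIM)                                   # projection bias

visual_tokens = project_eo_features(eo_features, W, b)  # (N_PATCHES, LLM_DIM)

# Prepend the projected visual tokens to the text-token embeddings, so the
# LLM processes both modalities as a single input sequence.
text_tokens = rng.standard_normal((N_TEXT, LLM_DIM))
multimodal_input = np.concatenate([visual_tokens, text_tokens], axis=0)
print(multimodal_input.shape)  # (26, 768)
```

In practice the projection layer would be trained (multimodal training) while the EO encoder and, often, the LLM remain frozen; the sketch only shows how the two embedding spaces are bridged.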

 

Supervisory team

References for further reading

[1] Yin, S., Fu, C., Zhao, S., Li, K., Sun, X., Xu, T., and Chen, E. (2023). A Survey on Multimodal Large Language Models. arXiv preprint arXiv:2306.13549

[2] Zhang, Y., Wei, C., Wu, S., He, Z., and Yu, W. (2023). GeoGPT: Understanding and Processing Geospatial Tasks through An Autonomous GPT. arXiv preprint arXiv:2307.07930

[3] Mai, G., Huang, W., Sun, J., Song, S., Mishra, D., Liu, N., Gao, S., Liu, T., Cong, G., and Hu, Y. (2023). On the opportunities and challenges of foundation models for geospatial artificial intelligence. arXiv preprint arXiv:2304.06798

[4] Liu, F., Chen, D., Guan, Z., Zhou, X., Zhu, J., and Zhou, J. (2023). RemoteCLIP: A Vision Language Foundation Model for Remote Sensing. arXiv preprint arXiv:2306.11029