SkyEyeGPT: Unifying Remote Sensing Vision-Language Tasks via Instruction Tuning with Large Language Model

Authors: Yang Zhan, Zhitong Xiong, Yuan Yuan

School of Artificial Intelligence, OPtics, and ElectroNics (iOPEN), Northwestern Polytechnical University

This is the official repository for the paper "SkyEyeGPT: Unifying Remote Sensing Vision-Language Tasks via Instruction Tuning with Large Language Model". [paper] [SkyEye-968k]

Please give this project a STAR ⭐ if it helps you.

You can follow work on remote sensing multimodal large language models (Vision-Language) here

📢 Latest Updates

This is an ongoing project. We will be working on improving it.

📦 Chatbot, codebase, datasets, and models coming soon! 🚀
Jun-12-2024: The RS instruction dataset SkyEye-968k is released. [huggingface] 🔥🔥
Jan-18-2024: The paper is released. 🔥🔥
Jan-17-2024: A curated list of remote sensing multimodal large language models (Vision-Language) is created. 🔥🔥

💬 SkyEyeGPT: Remote Sensing Multi-modal Chatbot

The online demo will be released.

SkyEyeGPT: Architecture

The model and checkpoint are coming soon! 🚀

🌋 SkyEye-968k: Unified RS Vision-Language Instruction

The download link of the unified remote sensing vision-language instruction dataset is here! 🚀

Download link: https://huggingface.co/datasets/ZhanYang-nwpu/SkyEye-968k
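
As a minimal sketch, the dataset repository linked above could be fetched with the `huggingface_hub` package. This assumes the package is installed and that the repository is publicly accessible; the helper names `repo_id_from_url` and `download_skyeye` and the default target directory are illustrative and not part of this project.

```python
from urllib.parse import urlparse

# Dataset page given in this README.
DATASET_URL = "https://huggingface.co/datasets/ZhanYang-nwpu/SkyEye-968k"


def repo_id_from_url(url: str) -> str:
    """Extract the '<owner>/<name>' repo id from a Hugging Face dataset URL."""
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) != 3 or parts[0] != "datasets":
        raise ValueError(f"not a Hugging Face dataset URL: {url}")
    return f"{parts[1]}/{parts[2]}"


def download_skyeye(local_dir: str = "SkyEye-968k") -> str:
    """Download every file in the dataset repository into local_dir.

    Hypothetical helper: the directory name is an assumption, and the
    file layout inside the repository is not described in this README.
    """
    # Imported lazily so repo_id_from_url works without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id_from_url(DATASET_URL),
        repo_type="dataset",
        local_dir=local_dir,
    )
```

Calling `download_skyeye()` returns the local path of the downloaded snapshot. It needs network access and `pip install huggingface_hub`.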

📦 Performance

👁️ Visualization

1. Detailed description

2. Some testing samples of captioning, grounding, and VQA

👁️ Qualitative results

1. Remote Sensing Visual Grounding

2. Remote Sensing Phrase Grounding

3. Remote Sensing Image Captioning

4. UAV Aerial Video Captioning

5. Remote Sensing Visual Question Answering

6. Remote Sensing Referring Expression Generation

7. Remote Sensing Scene Classification

🔍 Quantitative results

1. Remote Sensing Image Captioning

2. UAV Aerial Video Captioning

3. Remote Sensing Visual Grounding

4. Remote Sensing Visual Question Answering

📜 Citation

@misc{zhan2024skyeyegpt,
      title={SkyEyeGPT: Unifying Remote Sensing Vision-Language Tasks via Instruction Tuning with Large Language Model}, 
      author={Yang Zhan and Zhitong Xiong and Yuan Yuan},
      year={2024},
      eprint={2401.09712},
      archivePrefix={arXiv}
}

🙏 Acknowledgement

Our code is based on MiniGPT-4, Shikra, and MiniGPT-v2. We sincerely thank their authors for their contributions and for releasing the source code. We are thankful to EVA and LLaMA2 for releasing their models as open-source contributions. I would like to thank Zhitong Xiong and Yuan Yuan for their help with the manuscript. I also thank the School of Artificial Intelligence, OPtics, and ElectroNics (iOPEN), Northwestern Polytechnical University for supporting this work.

🤖 Contact

If you have any questions about this project, please feel free to contact [email protected].
