I am currently pursuing the Ph.D. degree with the School of Artificial Intelligence, OPtics and ElectroNics (iOPEN), Northwestern Polytechnical University, Xi’an, China.
My research interests include Vision and Language, Large Language Models, Multimodal Machine Learning, AI for Remote Sensing, and Data Mining.
- 🚀 ……
- 🚀 SkyEyeGPT (天眼GPT, SkyEye-968k) [Paper][Code][Dataset]
- 🚀 Mono3DVG (Mono3DRefer) AAAI'24 [AAAI Paper][ArXiv Paper][Code][Dataset][AAAI Video/Poster][Baidu Poster][Baidu PPT]
- 🚀 PE-RSITR (MRS-Adapter) T-GRS'23 [Paper][Code][Dataset]
- 🚀 RSVG (DIOR-RSVG) T-GRS'23 [Paper][Code][Dataset]
- 🚀 STMGCN T-ITS'22 [Paper]
- 🚀 MVFFNet PRLetters'21 [Paper]
🔥 [……]:
🔥 [2024]: The remote sensing multimodal large language model is an ongoing project, and we will keep improving it.
🔥 [2024/1]: SkyEyeGPT is now available on arXiv.
- This work explores remote sensing multimodal large language models (vision-language). We meticulously curate SkyEye-968k, a high-quality remote sensing multimodal instruction-tuning dataset that includes single-task and multi-task conversation instructions. We develop SkyEyeGPT, which unifies remote sensing vision-language tasks and breaks new ground in the unified modeling of remote sensing vision and LLMs. Experiments on 8 datasets for remote sensing vision-language tasks demonstrate SkyEyeGPT’s superiority in image-level and region-level tasks. In particular, it shows encouraging results in some tests compared with GPT-4V.
🔥 [2024/1]: Created a curated list of Remote Sensing Multimodal Large Language Models (Vision-Language).
🔥 [2023/12]: Propose the Mono3DVG task and construct the Mono3DRefer dataset (accepted by AAAI 2024)!
- For intelligent systems and robots, understanding objects based on language expressions in real 3D scenes is an important capability for human-machine interaction. However, existing 2D visual grounding cannot capture the true 3D extent of the referred objects. 3D visual grounding requires LiDAR or RGB-D sensors, whose high cost and device limitations greatly restrict its application scenarios. Monocular 3D object detection is low-cost and widely applicable, but it cannot localize specific objects. We introduce a novel task of 3D visual grounding in monocular RGB images using language descriptions with appearance and geometry information. We create Mono3DRefer, which is the first dataset that leverages ChatGPT to generate descriptions. We believe Mono3DVG can be widely applied since it does not require strict conditions such as RGB-D sensors, LiDARs, or industrial cameras. Potential application scenarios include drones, surveillance systems, intelligent vehicles, robots, and other devices equipped with cameras.
🔥 [2023/08]: Propose a novel PE-RSITR task and provide empirical studies (accepted by T-GRS)!
- This work explores parameter-efficient transfer learning for remote sensing image-text retrieval. Our proposed MRS-Adapter reduces the number of fine-tuned parameters by 98.9%, and its performance exceeds traditional methods by 7% to 13%.
🔥 [2023/02]: Propose the RSVG task and construct the DIOR-RSVG dataset (accepted by T-GRS)!
- This work explores visual grounding in the remote sensing domain. DIOR-RSVG takes the DIOR dataset as its data source and is built using an automatic generation algorithm with manual verification. A novel transformer-based MGVLF model is devised to address the cluttered backgrounds and scale variation of RS images.
🔥 [2022/08]: Propose an STMGCN for vessel traffic flow prediction (accepted by T-ITS)!
- This work explores a multi-graph convolutional network for vessel traffic flow prediction. Because water traffic differs from land traffic, we propose a big data-driven maritime traffic network extraction algorithm to construct a "road network". We then design an STMGCN to make full use of maritime graphs and multi-graph learning (including a distance graph, an interaction graph, and a correlation graph).
🔥 [2021/08]: Propose an MVFFNet for imbalanced ship classification (accepted by PRLetters)!
- Journal Reviewer:
- IEEE Transactions on Geoscience and Remote Sensing (T-GRS)
- Neural Networks (NEUNET)
- IEEE Geoscience and Remote Sensing Letters (IEEE GRSL)
- Pattern Recognition Letters (PRLETTERS)
- Journal of Supercomputing (J SUPERCOMPUT)
- Computers and Electrical Engineering (COMPELECENG)
- IET Intelligent Transport Systems (IET ITS)
Email: [email protected]