Arxiv 36

SkyEyeGPT: Unifying Remote Sensing Vision-Language Tasks via Instruction Tuning with Large Language Model May 19, 2024
On the Foundations of Earth and Climate Foundation Models May 11, 2024
MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning Apr 28, 2024
Change-Agent: Towards Interactive Comprehensive Remote Sensing Change Interpretation and Analysis Apr 22, 2024
LHRS-Bot: Empowering Remote Sensing with VGI-Enhanced Large Multimodal Language Model Apr 15, 2024
Charting New Territories: Exploring the Geographic and Geospatial Capabilities of Multimodal LLMs Apr 13, 2024
One for All: Toward Unified Foundation Models for Earth Vision Apr 7, 2024
EarthGPT: A Universal Multi-modal Large Language Model for Multi-sensor Image Comprehension in Remote Sensing Domain Apr 1, 2024
Large Language Models for Captioning and Retrieving Remote Sensing Images Mar 12, 2024
SARATR-X: A Foundation Model for Synthetic Aperture Radar Images Target Recognition Feb 20, 2024
SwiMDiff: Scene-wide Matching Contrastive Learning with Diffusion Constraint for Remote Sensing Image Feb 18, 2024
H2RSVLM: Towards Helpful and Honest Remote Sensing Large Vision Language Model Feb 17, 2024
Popeye: A Unified Visual-Language Model for Multi-Source Ship Detection from Remote Sensing Imagery Jan 21, 2024
MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining Jan 10, 2024
Neural Plasticity-Inspired Foundation Model for Observing the Earth Crossing Modalities Jan 5, 2024
DINO-MC: Self-supervised Contrastive Learning for Remote Sensing Imagery with Multi-sized Local Crops Dec 23, 2023
FoMo-Bench: a multi-modal, multi-scale and multi-task Forest Monitoring Benchmark for remote sensing foundation models Nov 26, 2023
On the Promises and Challenges of Multimodal Foundation Models for Geographical, Environmental, Agricultural, and Urban Planning Applications Oct 24, 2023
CtxMIM: Context-Enhanced Masked Image Modeling for Remote Sensing Image Understanding Sep 23, 2023
USat: A Unified Self-Supervised Encoder for Multi-Sensor Satellite Imagery Jul 21, 2023
Foundation Models for Generalist Geospatial Artificial Intelligence Jul 13, 2023
A billion-scale foundation model for remote sensing images Jun 28, 2023
RSGPT: A Remote Sensing Vision Language Model and Benchmark Jun 24, 2023
Predicting Gradient is Better: Exploring Self-Supervised Learning for SAR ATR with a Joint-Embedding Predictive Architecture Jun 20, 2023
SatCLIP: Global, General-Purpose Location Embeddings with Satellite Imagery May 22, 2023
Changes to Captions: An Attentive Network for Remote Sensing Change Captioning Apr 28, 2023
Bootstrapping Interactive Image-Text Alignment for Remote Sensing Image Captioning Apr 22, 2023
RingMo-lite: A Remote Sensing Multi-task Lightweight Network with CNN-Transformer Hybrid Framework Apr 13, 2023
Feature Guided Masked Autoencoder for Self-supervised Learning in Remote Sensing Mar 25, 2023
Tree-GPT: Modular Large Language Model Expert System for Forest Remote Sensing Image Understanding and Interactive Analysis Mar 22, 2023
DeCUR: decoupling common & unique representations for multimodal self-supervision Feb 19, 2023
RSPrompter: Learning to prompt for remote sensing instance segmentation based on visual foundation model Feb 11, 2023
Good at captioning, bad at counting: Benchmarking GPT-4V on Earth observation data Jan 20, 2023
Lightweight, Pre-trained Transformers for Remote Sensing Timeseries Jan 18, 2023
RS5M and GeoRSCLIP: A Large Scale Vision-Language Dataset and A Large Vision-Language Model for Remote Sensing Jan 3, 2023
Self-supervised vision transformers for joint SAR-optical representation learning Oct 13, 2022

Trending Tags

dataset image paper Pretrain image, text Other VLM MLLM video Agent