Papers (95)
- GeoChat Grounded Large Vision-Language Model for Remote Sensing
- Remote Sensing Vision-Language Foundation Models without Annotations via Ground Remote Alignment
- SkyEyeGPT Unifying Remote Sensing Vision-Language Tasks via Instruction Tuning with Large Language Model
- Generic Knowledge Boosted Pre-training for Remote Sensing Images
- Self-Supervised Spatio-Temporal Representation Learning of Satellite Image Time Series
- On the Foundations of Earth and Climate Foundation Models
- GeoLLM Extracting Geospatial Knowledge from Large Language Models
- MMEarth Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning
- Change-Agent Towards Interactive Comprehensive Remote Sensing Change Interpretation and Analysis
- S2MAE A Spatial-Spectral Pretraining Foundation Model for Spectral Remote Sensing Data
- RingMo A Remote Sensing Foundation Model With Masked Image Modeling
- LHRS-Bot Empowering Remote Sensing with VGI-Enhanced Large Multimodal Language Model
- Charting New Territories Exploring the Geographic and Geospatial Capabilities of Multimodal LLMs
- Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery
- RemoteCLIP A Vision Language Foundation Model for Remote Sensing
- One for All Toward Unified Foundation Models for Earth Vision
- LeMeViT Efficient Vision Transformer with Learnable Meta Tokens for Remote Sensing Image Interpretation
- EarthGPT A Universal Multi-modal Large Language Model for Multi-sensor Image Comprehension in Remote Sensing Domain
- RS-LLaVA Large Vision Language Model for Joint Captioning and Question Answering in Remote Sensing Imagery
- Large Language Models for Captioning and Retrieving Remote Sensing Images
- GeoLLM-Engine A Realistic Environment for Building Geospatial Copilots
- SkyScript A Large and Semantically Diverse Vision-Language Dataset for Remote Sensing
- SpectralGPT Spectral Remote Sensing Foundation Model
- SARATR-X A Foundation Model for Synthetic Aperture Radar Images Target Recognition
- Generative ConvNet Foundation Model With Sparse Modeling and Low-Frequency Reconstruction for Remote Sensing Image Interpretation
- SwiMDiff Scene-wide Matching Contrastive Learning with Diffusion Constraint for Remote Sensing Image
- H2RSVLM Towards Helpful and Honest Remote Sensing Large Vision Language Model
- Bridging Remote Sensors with Multisensor Geospatial Foundation Models
- Evaluating Tool-Augmented Agents in Remote Sensing Platforms
- Popeye A Unified Visual-Language Model for Multi-Source Ship Detection from Remote Sensing Imagery
- MTP Advancing Remote Sensing Foundation Model via Multi-Task Pretraining
- SkySense A Multi-Modal Remote Sensing Foundation Model Towards Universal Interpretation for Earth Observation Imagery
- Neural Plasticity-Inspired Foundation Model for Observing the Earth Crossing Modalities
- Remote Sensing ChatGPT Solving Remote Sensing Tasks with ChatGPT and Visual Models
- DINO-MC Self-supervised Contrastive Learning for Remote Sensing Imagery with Multi-sized Local Crops
- Scale-MAE A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning
- A Survey of the Development of Remote Sensing Foundation Models and Future Prospects
- Multi-source interactive stair attention for remote sensing image captioning
- Multi-Modal Multi-Objective Contrastive Learning for Sentinel-1/2 Imagery
- FoMo-Bench a multi-modal, multi-scale and multi-task Forest Monitoring Benchmark for remote sensing foundation models
- Language-aware domain generalization network for cross-scene hyperspectral image classification
- EarthPT a foundation model for Earth Observation
- CSP Self-Supervised Contrastive Spatial Pre-Training for Geospatial-Visual Representations
- On the Promises and Challenges of Multimodal Foundation Models for Geographical, Environmental, Agricultural, and Urban Planning Applications
- An Empirical Study of Remote Sensing Pretraining
- The Potential of Visual ChatGPT for Remote Sensing
- CtxMIM Context-Enhanced Masked Image Modeling for Remote Sensing Image Understanding
- Change-Aware Sampling and Contrastive Learning for Satellite Images
- CMID A Unified Self-Supervised Learning Framework for Remote Sensing Image Understanding
- USat A Unified Self-Supervised Encoder for Multi-Sensor Satellite Imagery
- TOV The Original Vision Model for Optical Remote Sensing Image Understanding via Self-Supervised Learning
- Foundation Models for Generalist Geospatial Artificial Intelligence
- Towards Geospatial Foundation Models via Continual Pretraining
- A billion-scale foundation model for remote sensing images
- RSGPT A Remote Sensing Vision Language Model and Benchmark
- A Self-Supervised Cross-Modal Remote Sensing Foundation Model with Multi-Domain Representation and Cross-Domain Fusion
- Predicting Gradient is Better Exploring Self-Supervised Learning for SAR ATR with a Joint-Embedding Predictive Architecture
- Cross-Scale MAE A Tale of Multiscale Exploitation in Remote Sensing
- VLCA Vision-Language Aligning Model with Cross-Modal Attention for Bilingual Remote Sensing Image Captioning
- CROMA Remote Sensing Representations with Contrastive Radar-Optical Masked Autoencoders
- SatCLIP Global, General-Purpose Location Embeddings with Satellite Imagery
- SatlasPretrain A Large-Scale Dataset for Remote Sensing Image Understanding
- Changes to Captions An Attentive Network for Remote Sensing Change Captioning
- Bootstrapping Interactive Image-Text Alignment for Remote Sensing Image Captioning
- RingMo-lite A Remote Sensing Multi-task Lightweight Network with CNN-Transformer Hybrid Framework
- Feature Guided Masked Autoencoder for Self-supervised Learning in Remote Sensing
- Tree-GPT Modular Large Language Model Expert System for Forest Remote Sensing Image Understanding and Interactive Analysis
- DeCUR decoupling common & unique representations for multimodal self-supervision
- Parameter-Efficient Transfer Learning for Remote Sensing Image-Text Retrieval
- RSPrompter Learning to Prompt for Remote Sensing Instance Segmentation Based on Visual Foundation Model
- Good at captioning, bad at counting Benchmarking GPT-4V on Earth observation data
- Lightweight, Pre-trained Transformers for Remote Sensing Timeseries
- S-CLIP Semi-supervised Vision-Language Learning using Few Specialist Captions
- RS5M and GeoRSCLIP A Large Scale Vision-Language Dataset and A Large Vision-Language Model for Remote Sensing
- GeoCLIP Clip-Inspired Alignment between Locations and Images for Effective Worldwide Geo-localization
- Semantic segmentation of remote sensing images with self-supervised semantic-aware inpainting
- Self-Supervised Learning for Invariant Representations from Multi-Spectral and SAR Images
- Consecutive Pre-Training A Knowledge Transfer Learning Strategy with Relevant Unlabeled Data for Remote Sensing Domain
- Self-Supervised Vision Transformers for Joint SAR-Optical Representation Learning
- Geographical Knowledge-Driven Representation Learning for Remote Sensing Images
- SatMAE Pre-training Transformers for Temporal and Multi-Spectral Satellite Imagery
- Transforming remote sensing images to textual descriptions
- Self-Supervised Material and Texture Representation Learning for Remote Sensing Tasks
- Global and Local Contrastive Self-Supervised Learning for Semantic Segmentation of HR Remote Sensing Images
- Advancing plain vision transformer toward remote sensing foundation model
- Multi-source remote sensing pretraining based on contrastive self-supervised learning
- Geography-aware self-supervised learning
- On Creating Benchmark Dataset for Aerial Image Interpretation Reviews, Guidances, and Million-AID
- Seasonal Contrast Unsupervised Pre-Training from Uncurated Remote Sensing Data
- Self-Supervised Learning of Remote Sensing Scene Representations Using Contrastive Multiview Coding
- Remote Sensing Image Scene Classification with Self-Supervised Paradigm under Limited Labeled Samples
- Tile2Vec Unsupervised representation learning for spatially distributed data
- BigEarthNet A Large-Scale Benchmark Archive for Remote Sensing Image Understanding
- Functional Map of the World