GeoAI Datasets

GeoAI datasets combine remote sensing, environmental, and multimodal sources tailored for training and evaluating geospatial machine learning models (e.g., segmentation, classification, regression) across land cover, wildfire, disaster, and climate applications.

To access the datasets on RCAC clusters:

$ module avail
$ module load datasets
$ module avail geoai
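
The commands above assume that the module command is defined in the current shell, which may not be the case off-cluster or in some non-interactive shells. The following is a minimal sketch of a guard for use in scripts. The function name load_geoai_listing is hypothetical and is not part of the RCAC tooling.

```shell
# Sketch only: wrap the access commands in a guard so a script fails
# with a clear message when no `module` command is available.
# `load_geoai_listing` is a hypothetical helper name.
load_geoai_listing() {
    if ! command -v module >/dev/null 2>&1; then
        echo "module command not found; run this on a cluster node" >&2
        return 1
    fi
    # Same steps as the interactive commands shown above.
    module load datasets && module avail geoai
}

# Usage (on a cluster node):
#   load_geoai_listing
```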

Tips:

  • Use echo $ENV_NAME to print the value of an environment variable, where ENV_NAME is the variable's name.
  • To see all environment variables related to a dataset, load its module, then run: env | grep <DATASET_NAME>
  • To unload the module and remove its environment settings, run: module unload <DATASET_NAME>
  • Each dataset module sets environment variables (e.g., $<DATASET_NAME>_ROOTDIR, $<DATASET_NAME>_HOME, $RCAC_<DATASET_NAME>_ROOT, and $RCAC_<DATASET_NAME>_VERSION) that simplify dataset access and version management within your jobs and workflows.
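
The tips above can be combined into a small pre-flight check for job scripts. This is a sketch under stated assumptions rather than an RCAC-provided utility. The helper names show_dataset_vars and require_dataset_dir are hypothetical, and RCAC_TERRAMESH_ROOT is only an illustrative guess that follows the $RCAC_<DATASET_NAME>_ROOT pattern. Confirm the real variable names with env | grep after loading the module.

```shell
# Sketch only: helper names are hypothetical, not part of any RCAC module.

# List every exported variable whose name or value mentions the given
# text, ignoring case (the `env | grep` tip, wrapped in a function).
show_dataset_vars() {
    env | grep -i -- "$1" | sort
}

# Check that the exported variable named in $1 points at an existing
# directory, and print that directory on success.
#   returns 1 if the variable is not set (module probably not loaded)
#   returns 2 if the variable is set but is not a directory
require_dataset_dir() {
    dir=$(printenv "$1") || {
        echo "$1 is not set; load the dataset module first" >&2
        return 1
    }
    [ -d "$dir" ] || {
        echo "$1=$dir is not a directory" >&2
        return 2
    }
    printf '%s\n' "$dir"
}

# Usage inside a job script (variable name is an assumption):
#   show_dataset_vars terramesh
#   data_root=$(require_dataset_dir RCAC_TERRAMESH_ROOT) || exit 1
```

Failing early with a distinct exit status makes it easier to tell a missing module load from a broken path when reading job logs.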

GeoAI Datasets

Dataset                             Description
BioMassters                         Above-ground biomass estimation dataset using multimodal Sentinel-1 SAR and Sentinel-2 MSI satellite data
AerialImageDataset                  The Inria Aerial Image Labeling Dataset, which addresses automatic pixelwise labeling of aerial imagery
burn_intensity                      Burn scar intensity data and Harmonized Landsat and Sentinel-2 (HLS) images for burn scar analysis
geo-bench                           GEO-Bench: Toward Foundation Models for Earth Monitoring
gravity-wave-parameterization       Data format description for the nonlocal gravity wave parameterization dataset
hls_burn_scars                      Harmonized Landsat and Sentinel-2 imagery of burn scars and the associated masks
hls_merra2_gppFlux                  Harmonized Landsat and Sentinel-2 multispectral reflectance imagery and MERRA-2 observations
hurricane                           Surface and pressure data from the MERRA-2 dataset, used to evaluate the performance of Prithvi WxC on hurricanes
Landslide4sense                     Multispectral and elevation data for landslide detection
multi-temporal-crop-classification  Temporal Harmonized Landsat-Sentinel (HLS) imagery of diverse land cover and crop type classes
TerraMesh                           A planetary-scale, multimodal, analysis-ready dataset for Earth Observation foundation models