hls_merra2_gppFlux¶
| Field | Value |
|---|---|
| Description | Dataset Summary: This dataset consists of Harmonized Landsat and Sentinel-2 multispectral reflectance imagery and MERRA-2 observations centered around eddy covariance flux towers and the corresponding Gross Primary Productivity (GPP) data at the towers. Its purpose is to serve as a finetuning dataset for geospatial foundation models for the task of regressing GPP flux observations from HLS and MERRA-2 data. Dataset Structure: The dataset consists of: (1) HLS 6-band Tiff files of dimension 50x50x6, with the center of the chip colocated with flux tower locations, (2) 10-dimensional vector of MERRA-2 variables for each chip (1x1x10) recording temperature, soil moisture, heat flux, radiation, precipitation at the flux towers, (3) Daily GPP data derived from the eddy covariance measurements using the night-time partitioning approach at 37 flux tower sites distributed globally spanning 2018 to 2021. There are a total of 975 instances. MERRA-2 data and GPP flux observations are recorded as csv files, with a row corresponding to each HLS chip. HLS Band Order: 1, Blue, B02 2, Green, B03 3, Red, B04 4, NIR, B8A 5, SW 1, B11 6, SW 2, B12 MERRA-2 observations: [M2T1NXSLV] T2MIN, [M2T1NXSLV] T2MAX, [M2T1NXSLV] T2MEAN, [M2T1NXSLV] TSMDEWMEAN, [M2T1NXLND] GWETROOT, [M2T1NXLND] LHLAND, [M2T1NXLND] SHLAND, [M2T1NXLND] SWLAND, [M2T1NXLND] PARDFLAND, [M2T1NXLND] PRECTOTLAND Data Splits: The dataset consists of 975 chips and we split the dataset based on years to create train test splits. Given the relatively small size of the dataset, we use a leave-one-year-out-cross-validation approach to train and evaluate. The number of observations vary across years. In this repo, we have used three years for training and one year as test. |
| Folder | /datasets/geoai/ibm-nasa-geospatial/hls_merra2_gppFlux |
| Discipline | GeoAI / Remote Sensing / Climate Science |
| DOI | |
| Link | Access Data |
| Public | True |
| Publication Date | 2024-10-25 |
| Downloaded | 2025-09-10 |
| Data Type | GeoTIFF |
| Dataset Size | 982M |
| Number of Files | 1988 |
| Usage | $ module avail |
| Usage Policy Link | https://choosealicense.com/licenses/cc-by-4.0/ |
| Usage Policy | This dataset is released under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. The CC BY 4.0 license permits sharing, adaptation, and use of the dataset for both research and commercial purposes, provided that appropriate credit is given to the original authors. Users must include attribution to the IBM–NASA Geospatial team and cite the dataset’s Hugging Face URL when using it in publications, derived models, or applications. |
| Citation | IBM NASA Geospatial. (2024). HLS MERRA-2 GPP Flux Dataset (v1.0) [Dataset]. Hugging Face. https://huggingface.co/datasets/ibm-nasa-geospatial/hls_merra2_gppFlux |
| BibTeX | 📜 View BibTeX citation@dataset{ibm_nasa_hls_merra2_gppflux_2024, |