RCAC Datasets
Browse datasets available on Purdue RCAC clusters. This page links to the main dataset categories and shows how to discover datasets on the system.
Getting Started
To see which datasets are available on the system, run:
After loading the module, you can list the datasets in a specific category, such as `ai`, `hydrological`, or `meteorological`:
Dataset locations
Public datasets are accessible at the following paths:
- Anvil: `/anvil/datasets`
- Community clusters (Gilbreth, Negishi, Bell, Gautschi, and others): `/depot/datasets`
These paths are accessible to all cluster users with read-only permissions.
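A script that must run on both Anvil and the community clusters can probe the documented locations instead of hard-coding one. The Python sketch below is illustrative only: `find_dataset_root` and `KNOWN_DATASET_ROOTS` are hypothetical names, and it assumes that whichever documented path exists on the current machine is the right one for that cluster.

```python
from pathlib import Path
from typing import Iterable

# Documented public dataset roots: Anvil first, then the community clusters.
# (Hypothetical constant name; the paths themselves come from this page.)
KNOWN_DATASET_ROOTS = (Path("/anvil/datasets"), Path("/depot/datasets"))


def find_dataset_root(candidates: Iterable[Path] = KNOWN_DATASET_ROOTS) -> Path:
    """Return the first candidate directory that exists on this machine.

    Assumes only one of the documented roots is present on any given cluster.
    """
    for candidate in candidates:
        if candidate.is_dir():
            return candidate
    raise FileNotFoundError("none of the documented dataset paths exist here")
```

On a community cluster where only `/depot/datasets` exists, `find_dataset_root()` would return `Path("/depot/datasets")`. Because the paths are read-only, treat the returned directory as an input location and write outputs elsewhere.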
Use the `DATASETS_DIR` environment variable
After you run `module load datasets`, the base dataset path is stored in the `DATASETS_DIR` environment variable, so your shell sessions, job scripts, and workflows can reference it instead of a hard-coded path.
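Because `DATASETS_DIR` holds the base path, a job can locate data without knowing which cluster it runs on. The Python sketch below lists the entries under that path, or under one category. `list_datasets` is a hypothetical helper, and treating each category (such as `ai`) as a subdirectory of `DATASETS_DIR` is an assumption about the directory layout, not a documented guarantee.

```python
import os
from pathlib import Path
from typing import List, Optional


def list_datasets(category: Optional[str] = None) -> List[str]:
    """Return sorted entry names under $DATASETS_DIR, or under one category.

    Assumes `module load datasets` has already set DATASETS_DIR and that
    categories such as "ai" are subdirectories of it (hypothetical layout).
    """
    base = os.environ.get("DATASETS_DIR")
    if not base:
        raise RuntimeError("DATASETS_DIR is not set; run `module load datasets` first")
    root = Path(base)
    if category is not None:
        root = root / category  # e.g. $DATASETS_DIR/ai under the assumed layout
    # Immediate children only; the directory tree is not walked recursively.
    return sorted(entry.name for entry in root.iterdir())
```

In a batch job, run `module load datasets` before starting Python so the process inherits the variable, then call, for example, `list_datasets("ai")`.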
Browse Dataset Categories
- AI Datasets
- Climate Model Datasets
- Covariates Datasets
- GeoAI Datasets
- Geospatial Datasets
- Hydrological Datasets
- Meteorological Datasets
Complete Dataset Catalog
For a filterable table of all datasets, see the Complete Dataset Catalog.
Requesting New Datasets
Can't find the dataset you need?
If you need a dataset that is not currently hosted, submit a request through the appropriate support channel below. We welcome suggestions for new datasets that would benefit the RCAC research community.
- For Gautschi, Gilbreth, Negishi, Bell, and other Purdue clusters
- For Anvil's ACCESS allocations
- For Anvil's NAIRR Pilot allocations
What to include in your request
Providing complete information helps us evaluate and process your request efficiently:
| Information | What to provide |
|---|---|
| Justification | Explain why you need this dataset, how it supports your research, and why it is relevant to the broader community. Our goal is to host datasets that serve multiple users or groups, so requests limited to a single use case may be deprioritized. |
| Public availability | Is the dataset publicly accessible? Are there license restrictions? |
| Research description | A brief summary of your research project and goals. |
| Dataset link | A URL or reference for accessing or downloading the dataset. |
| Publication reference | Citations or publications describing the dataset, if available. |
| Dataset size | Total size (GB or TB), used for storage capacity planning. |