RCAC Datasets

Browse datasets available on Purdue RCAC clusters. This page links to the main dataset categories and shows how to discover datasets on the system.

Getting Started

To list the modules available on the system and load the datasets module, run:

$ module avail
$ module load datasets

After loading the module, you can list the datasets in a specific category by replacing <category> with a category name such as ai, hydrological, or meteorological:

$ module avail <category>

Dataset locations

Public datasets are accessible at the following paths:

  • Anvil: /anvil/datasets
  • Community clusters (Gilbreth, Negishi, Bell, Gautschi, and others): /depot/datasets

All cluster users have read-only access to these paths.
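
As a quick way to browse one of these paths, the sketch below prints the names of the top-level directories under a dataset root. The function name list_dataset_dirs is hypothetical and not part of any RCAC tooling. This page does not describe the directory layout beneath each path, so treat the output only as a starting point for exploration.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of RCAC tooling): print the names of the
# top-level directories found directly under a dataset root.
list_dataset_dirs() {
    local root="$1"
    if [ ! -d "$root" ]; then
        echo "Not a directory: $root" >&2
        return 1
    fi
    # Iterate over immediate children; the trailing slash limits the glob
    # to directories, and the -d test covers the case of no matches.
    local entry
    for entry in "$root"/*/; do
        [ -d "$entry" ] || continue
        basename "$entry"
    done
}

# Example: list_dataset_dirs /depot/datasets   (use /anvil/datasets on Anvil)
```

Because the function only reads directory names, it works with the read-only permissions described above.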

Use the DATASETS_DIR environment variable

Running module load datasets stores the base dataset path in the DATASETS_DIR environment variable, so you can reference it in your shell, jobs, and workflows without hard-coding a cluster-specific path.
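
One way to use the variable in a script is sketched below. The helper name dataset_path and the relative path ai/example are hypothetical placeholders, not real RCAC names. The sketch assumes module load datasets has already run in the same shell or job, and it fails with a clear message if DATASETS_DIR is unset.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a path under DATASETS_DIR, failing clearly
# if the variable is unset (that is, `module load datasets` has not run).
dataset_path() {
    if [ -z "${DATASETS_DIR:-}" ]; then
        echo "DATASETS_DIR is not set; run 'module load datasets' first." >&2
        return 1
    fi
    # Join the base path and the relative path given as the first argument,
    # dropping one trailing or leading slash to avoid a doubled separator.
    printf '%s/%s\n' "${DATASETS_DIR%/}" "${1#/}"
}

# Example usage inside a job script, after `module load datasets`.
# "ai/example" is a placeholder, not a real dataset path:
#   input_dir=$(dataset_path ai/example) || exit 1
#   ls "$input_dir"
```

Building paths from DATASETS_DIR rather than a literal /anvil/datasets or /depot/datasets keeps the same script usable on both Anvil and the community clusters.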

Browse Dataset Categories

Complete Dataset Catalog

For a filterable table of all datasets, see the Complete Dataset Catalog.

Requesting New Datasets

Can't find the dataset you need?

If you need a dataset that is not currently hosted, submit a request through the appropriate support channel below. We welcome suggestions for new datasets that would benefit the RCAC research community.

For Gautschi, Gilbreth, Negishi, Bell, and other Purdue clusters

Submit a ticket to RCAC

For Anvil's ACCESS allocations

Submit a ticket to ACCESS support

For Anvil's NAIRR Pilot allocations

Submit a ticket to NAIRR support

What to include in your request

Providing complete information helps us evaluate and process your request efficiently:

| Information | Why it's needed |
| --- | --- |
| Justification | Explain why you need this dataset, how it supports your research, and its relevance to the broader community. Our goal is to host datasets that serve multiple users or groups, so requests limited to a single use case may be deprioritized. |
| Public availability | Is the dataset publicly accessible? Are there license restrictions? |
| Research description | Brief summary of your research project and goals |
| Dataset link | URL or reference to access or download the dataset |
| Publication reference | Citations or publications describing the dataset (if available) |
| Dataset size | Total size (GB/TB) for storage capacity planning |