fast.ai

Description: fast.ai datasets are a curated collection of commonly used machine learning datasets that are:
- Pre-hosted (mostly on AWS Open Data)
- Standardized in format
- Integrated directly into the fastai library

The goal of the collection is to remove the friction of finding, downloading, and preprocessing data so you can focus on modeling.

Datasets included:

- image_classification_datasets: CALTECH_101, CIFAR_100, CUB_200_2011, FOOD, MNIST, FLOWERS, PETS, CARS
- image_localization_datasets: LSUN_BEDROOMS, BIWI_HEAD_POSE, CAMVID, CAMVID_TINY, PASCAL_2007, PASCAL_2012
- kaggle_competitions_download_dogs_vs_cats: DOGS
- main_datasets: ADULT_SAMPLE, BIWI_SAMPLE, CIFAR, COCO_SAMPLE, COCO_TINY, HUMAN_NUMBERS, IMAGENETTE, IMAGENETTE_160, IMAGENETTE_320, IMAGEWANG, IMAGEWANG_160, IMAGEWANG_320, IMAGEWOOF, IMAGEWOOF_160, IMAGEWOOF_320, IMDB, IMDB_SAMPLE, MNIST_SAMPLE, MNIST_TINY, MNIST_VAR_SIZE_TINY, ML_SAMPLE, PLANET_SAMPLE, PLANET_TINY
- nlp_datasets: AG_NEWS, AMAZON_REVIEWS, AMAZON_REVIEWS_POLARITY, DBPEDIA, MT_ENG_FRA, SOGOU_NEWS, WIKITEXT, WIKITEXT_TINY, YAHOO_ANSWERS, YELP_REVIEWS, YELP_REVIEWS_POLARITY
- pretrained_models: OPENAI_TRANSFORMER, WT103_BWD, WT103_FWD
- skin_lesion_datasets: SIIM_SMALL, TCGA_SMALL
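The names in the list above match constants on fastai's `URLs` class, and each one corresponds to an archive in the Number of Files listing (for example, `PETS` is `oxford-iiit-pet.tgz`). The sketch below resolves such a name to a local archive path. It is a hypothetical helper, not part of fastai: the `ARCHIVES` table covers only a few entries, and because the layout beneath `/datasets/ai/fast.ai` is not stated here (the archives may sit in per-category subfolders), it searches the folder recursively instead of assuming a fixed path.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical lookup table: fastai constant name -> archive filename.
# Only a few entries from the dataset list are shown; extend as needed.
ARCHIVES = {
    "CALTECH_101": "caltech_101.tgz",
    "CIFAR_100": "cifar100.tgz",
    "MNIST": "mnist_png.tgz",
    "PETS": "oxford-iiit-pet.tgz",
    "CAMVID_TINY": "camvid_tiny.tgz",
    "IMAGENETTE_160": "imagenette2-160.tgz",
    "IMDB_SAMPLE": "imdb_sample.tgz",
    "ML_SAMPLE": "movie_lens_sample.tgz",
    "WT103_FWD": "wt103-fwd.tgz",
}

# Shared dataset folder; the layout beneath it is an assumption, hence rglob.
DATASET_ROOT = Path("/datasets/ai/fast.ai")


def find_archive(name: str, root: str | Path = DATASET_ROOT) -> Path:
    """Return the path of the archive for a fastai dataset name such as "PETS"."""
    key = name.upper()
    if key not in ARCHIVES:
        raise KeyError(f"no archive mapping for {name!r}")
    filename = ARCHIVES[key]
    # Search every subfolder, since archives may be grouped by category.
    matches = sorted(Path(root).rglob(filename))
    if not matches:
        raise FileNotFoundError(f"{filename} not found under {root}")
    return matches[0]


# Example (on the cluster, after loading the module): find_archive("PETS")
```

Searching recursively trades a little startup time for not hard-coding a directory structure that this page does not document.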
Folder: /datasets/ai/fast.ai
Discipline: AI / Machine Learning
DOI: 10.3390/info11020108 (fastai library paper)
Link: Access Data
Public: True
Publication Date: 2020-02-11
Downloaded: 2026-03-07
Data Type: compressed tar archive (tgz)
Dataset Size: 71G (compressed)
Number of Files:
- caltech_101.tgz: 9248
- cifar100.tgz: 60243
- CUB_200_2011.tgz: 12005
- food-101.tgz: 101121
- mnist_png.tgz: 70023
- oxford-102-flowers.tgz: 8194
- oxford-iiit-pet.tgz: 25869
- stanford-cars.tgz: 16189
- bedroom.tgz: 307494
- biwi_head_pose.tgz: 31455
- camvid.tgz: 1408
- camvid_tiny.tgz: 204
- pascal_2007.tgz: 10398
- pascal_2012.tgz: 25451
- dogscats.tgz: 37541
- adult_sample.tgz: 5
- biwi_sample.tgz: 203
- cifar10.tgz: 60024
- coco_sample.tgz: 21841
- coco_tiny.tgz: 203
- human_numbers.tgz: 3
- imagenette2-160.tgz: 13420
- imagenette2-320.tgz: 13418
- imagenette2.tgz: 13418
- imagewang-160.tgz: 26382
- imagewang-320.tgz: 26382
- imagewang.tgz: 26382
- imagewoof2-160.tgz: 12978
- imagewoof2-320.tgz: 12978
- imagewoof2.tgz: 12978
- imdb_sample.tgz: 2
- imdb.tgz: 100027
- mnist_sample.tgz: 14442
- mnist_tiny.tgz: 1439
- mnist_var_size_tiny.tgz: 1440
- movie_lens_sample.tgz: 2
- planet_sample.tgz: 1003
- planet_tiny.tgz: 203
- ag_news_csv.tgz: 5
- amazon_review_full_csv.tgz: 4
- amazon_review_polarity_csv.tgz: 4
- dbpedia_csv.tgz: 5
- giga-fren.tgz: 3
- sogou_news_csv.tgz: 5
- wikitext-103.tgz: 3
- wikitext-2.tgz: 3
- yahoo_answers_csv.tgz: 5
- yelp_review_full_csv.tgz: 4
- yelp_review_polarity_csv.tgz: 4
- transformer.tgz: 3
- wt103-bwd.tgz: 3
- wt103-fwd.tgz: 3
- siim_small.tgz: 255
- tcga_small.tgz: 120
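The counts above appear to include directory entries as well as regular files: `mnist_png.tgz`, for instance, holds the 70,000 MNIST images plus what would be 23 directories (a top-level folder, `training` and `testing`, and ten digit folders under each). A listing like this can be reproduced without extracting anything. The sketch below uses Python's standard `tarfile` module and reports the total alongside separate file and directory counts; the helper name `count_entries` is illustrative.

```python
from __future__ import annotations

import tarfile
from pathlib import Path


def count_entries(archive: str | Path) -> dict[str, int]:
    """Count the members of a gzip-compressed tar archive without extracting it."""
    counts = {"total": 0, "files": 0, "dirs": 0}
    # "r:gz" opens a gzip-compressed tar archive for reading.
    with tarfile.open(archive, "r:gz") as tar:
        # Iterating reads member headers in order; nothing is written to disk.
        for member in tar:
            counts["total"] += 1
            if member.isfile():
                counts["files"] += 1
            elif member.isdir():
                counts["dirs"] += 1
    return counts


# Example (the folder layout is an assumption):
# count_entries("/datasets/ai/fast.ai/mnist_sample.tgz")
```

Comparing `total` with `files` for one archive is a quick way to check whether a listed count includes directories.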
Usage:
$ module avail
$ module load datasets
$ module load ai/fast.ai/2020-02-11
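How the module exposes the collection (for example, through an environment variable) is not described here. One workable pattern, sketched below as an assumption rather than documented cluster practice, is to extract an archive into a directory you can write to, such as a scratch or job-local directory, and point your code at the result. The `extract_archive` helper is hypothetical; it uses the standard `tarfile` module and applies the `"data"` extraction filter when the running Python supports it.

```python
from __future__ import annotations

import tarfile
from pathlib import Path


def extract_archive(archive: str | Path, dest: str | Path) -> Path:
    """Extract a .tgz archive into dest (created if needed) and return dest."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Python 3.12+ and security backports to earlier releases:
            # refuse absolute paths, ".." components, and other unsafe members.
            tar.extractall(dest, filter="data")
        else:
            # Older Python without extraction filters; only use trusted archives.
            tar.extractall(dest)
    return dest
```

fastai's data-loading code can then be pointed at the extracted folder directly, rather than calling `untar_data`, which would otherwise download the archive again.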
Usage Policy Link: Usage Policy
Citation: When using fast.ai datasets, cite the original dataset creators (e.g., ImageNet, Oxford-IIIT Pet) and acknowledge the fast.ai/AWS collection. See https://docs.fast.ai/data.external.html.
BibTeX:
@article{howard2020fastai,
title={fastai: A Layered API for Deep Learning},
author={Howard, Jeremy and Gugger, Sylvain},
journal={Information},
volume={11},
number={2},
pages={108},
year={2020},
publisher={MDPI},
doi={10.3390/info11020108}
}