The Pile: An 800GB Dataset of Diverse Text for Language Modeling
EleutherAI

Info hash: 0d366035664fdf51cfbe9f733953ba325776e667
Last mirror activity: 1035d, 13:21:43 ago
Size: 772.89GB (772,891,257,239 bytes)
Added: 2021-03-01 01:37:09
Views: 1053
Hits: 1112
ID: 4618
Type: multi
Downloaded: 465 time(s)
Uploaded by: joecohen
Folder: EleutherAI_ThePile_v1
Num files: 51
File list:
README.txt 0.10kB
pile/SHA256SUMS.txt 2.78kB
pile/test.jsonl.zst 460.25MB
pile/train/00.jsonl.zst 15.24GB
pile/train/01.jsonl.zst 15.21GB
pile/train/02.jsonl.zst 15.21GB
pile/train/03.jsonl.zst 15.19GB
pile/train/04.jsonl.zst 15.19GB
pile/train/05.jsonl.zst 15.21GB
pile/train/06.jsonl.zst 15.26GB
pile/train/07.jsonl.zst 15.31GB
pile/train/08.jsonl.zst 15.23GB
pile/train/09.jsonl.zst 15.22GB
pile/train/10.jsonl.zst 15.23GB
pile/train/11.jsonl.zst 15.22GB
pile/train/12.jsonl.zst 15.26GB
pile/train/13.jsonl.zst 15.21GB
pile/train/14.jsonl.zst 15.22GB
pile/train/15.jsonl.zst 15.28GB
pile/train/16.jsonl.zst 15.27GB
pile/train/17.jsonl.zst 15.31GB
pile/train/18.jsonl.zst 15.31GB
pile/train/19.jsonl.zst 15.28GB
pile/train/20.jsonl.zst 15.21GB
pile/train/21.jsonl.zst 15.31GB
pile/train/22.jsonl.zst 15.30GB
pile/train/23.jsonl.zst 15.29GB
pile/train/24.jsonl.zst 15.19GB
pile/train/25.jsonl.zst 15.20GB
pile/train/26.jsonl.zst 15.20GB
pile/train/27.jsonl.zst 15.22GB
pile/train/28.jsonl.zst 15.22GB
pile/train/29.jsonl.zst 15.22GB
pile/val.jsonl.zst 470.91MB
pile_preliminary_components/2020-09-08-arxiv-extracts-nofallback-until-2007-068.tar.gz 17.48GB
pile_preliminary_components/EuroParliamentProceedings_1996_2011.jsonl.zst 1.48GB
pile_preliminary_components/FreeLaw_Opinions.jsonl.zst 17.01GB
pile_preliminary_components/Literotica.jsonl.zst 4.43GB
pile_preliminary_components/NIH_ExPORTER_awarded_grant_text.jsonl.zst 630.78MB
pile_preliminary_components/PMC_extracts.tar.gz 28.28GB
pile_preliminary_components/PUBMED_title_abstracts_2019_baseline.jsonl.zst 6.90GB
pile_preliminary_components/PhilArchive.jsonl.zst 797.71MB
pile_preliminary_components/books1.tar.gz 2.40GB
pile_preliminary_components/books3.tar.gz 39.52GB
pile_preliminary_components/github.tar 113.35GB
pile_preliminary_components/hn.tar.gz 706.52MB
pile_preliminary_components/openwebtext2.jsonl.zst.tar 29.34GB
pile_preliminary_components/pile_uspto.tar 11.79GB
pile_preliminary_components/stackexchange_dataset.tar 36.80GB
pile_preliminary_components/ubuntu_irc_until_2020_9_1.jsonl.zst 2.04GB
pile_preliminary_components/yt_subs.jsonl.zst 1.78GB
Mirrors: 0 complete, 0 downloading = 0 mirror(s) total
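
The file list includes pile/SHA256SUMS.txt, which suggests the shards can be checked after download. The following is a minimal sketch of such a check, not a documented procedure for this dataset. It assumes the sums file uses the common `sha256sum` layout (one `<hex digest>  <relative path>` pair per line) and that paths are relative to the directory passed as `root`; neither is confirmed by the listing. The function names `sha256_of_file`, `parse_sums`, and `verify` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so multi-GB shards are never fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_sums(text):
    """Parse '<hex digest>  <relative path>' lines (ASSUMED sha256sum layout)."""
    expected = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        digest, name = line.split(maxsplit=1)
        # sha256sum prefixes the name with '*' when it hashed in binary mode.
        expected[name.lstrip("*")] = digest.lower()
    return expected


def verify(root, sums_file):
    """Return {relative path: 'ok' | 'mismatch' | 'missing'} for every listed file."""
    root = Path(root)
    results = {}
    for name, digest in parse_sums(Path(sums_file).read_text()).items():
        target = root / name  # ASSUMPTION: listed paths are relative to root
        if not target.is_file():
            results[name] = "missing"
        elif sha256_of_file(target) == digest:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results
```

Under those assumptions, a call might look like `verify("EleutherAI_ThePile_v1/pile", "EleutherAI_ThePile_v1/pile/SHA256SUMS.txt")`. Reading in fixed-size chunks matters here because each train shard is roughly 15GB.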
