The Pile: An 800GB Dataset of Diverse Text for Language Modeling
EleutherAI

Info hash: 0d366035664fdf51cfbe9f733953ba325776e667
Last mirror activity: 449d, 07:57:06 ago
Size: 772.89 GB (772,891,257,239 bytes)
Added: 2021-03-01 01:37:09
Views: 499
Hits: 1112
ID: 4618
Type: multi
Downloaded: 465 time(s)
Uploaded by: joecohen
Folder: EleutherAI_ThePile_v1
Num files: 51
File list
Path  Size
README.txt  0.10 kB
pile/SHA256SUMS.txt  2.78 kB
pile/test.jsonl.zst  460.25 MB
pile/train/00.jsonl.zst  15.24 GB
pile/train/01.jsonl.zst  15.21 GB
pile/train/02.jsonl.zst  15.21 GB
pile/train/03.jsonl.zst  15.19 GB
pile/train/04.jsonl.zst  15.19 GB
pile/train/05.jsonl.zst  15.21 GB
pile/train/06.jsonl.zst  15.26 GB
pile/train/07.jsonl.zst  15.31 GB
pile/train/08.jsonl.zst  15.23 GB
pile/train/09.jsonl.zst  15.22 GB
pile/train/10.jsonl.zst  15.23 GB
pile/train/11.jsonl.zst  15.22 GB
pile/train/12.jsonl.zst  15.26 GB
pile/train/13.jsonl.zst  15.21 GB
pile/train/14.jsonl.zst  15.22 GB
pile/train/15.jsonl.zst  15.28 GB
pile/train/16.jsonl.zst  15.27 GB
pile/train/17.jsonl.zst  15.31 GB
pile/train/18.jsonl.zst  15.31 GB
pile/train/19.jsonl.zst  15.28 GB
pile/train/20.jsonl.zst  15.21 GB
pile/train/21.jsonl.zst  15.31 GB
pile/train/22.jsonl.zst  15.30 GB
pile/train/23.jsonl.zst  15.29 GB
pile/train/24.jsonl.zst  15.19 GB
pile/train/25.jsonl.zst  15.20 GB
pile/train/26.jsonl.zst  15.20 GB
pile/train/27.jsonl.zst  15.22 GB
pile/train/28.jsonl.zst  15.22 GB
pile/train/29.jsonl.zst  15.22 GB
pile/val.jsonl.zst  470.91 MB
pile_preliminary_components/2020-09-08-arxiv-extracts-nofallback-until-2007-068.tar.gz  17.48 GB
pile_preliminary_components/EuroParliamentProceedings_1996_2011.jsonl.zst  1.48 GB
pile_preliminary_components/FreeLaw_Opinions.jsonl.zst  17.01 GB
pile_preliminary_components/Literotica.jsonl.zst  4.43 GB
pile_preliminary_components/NIH_ExPORTER_awarded_grant_text.jsonl.zst  630.78 MB
pile_preliminary_components/PMC_extracts.tar.gz  28.28 GB
pile_preliminary_components/PUBMED_title_abstracts_2019_baseline.jsonl.zst  6.90 GB
pile_preliminary_components/PhilArchive.jsonl.zst  797.71 MB
pile_preliminary_components/books1.tar.gz  2.40 GB
pile_preliminary_components/books3.tar.gz  39.52 GB
pile_preliminary_components/github.tar  113.35 GB
pile_preliminary_components/hn.tar.gz  706.52 MB
pile_preliminary_components/openwebtext2.jsonl.zst.tar  29.34 GB
pile_preliminary_components/pile_uspto.tar  11.79 GB
pile_preliminary_components/stackexchange_dataset.tar  36.80 GB
pile_preliminary_components/ubuntu_irc_until_2020_9_1.jsonl.zst  2.04 GB
pile_preliminary_components/yt_subs.jsonl.zst  1.78 GB
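
The listing includes pile/SHA256SUMS.txt alongside the shards, so a download can in principle be checked against it. The following is a minimal sketch of such a check, not the uploader's own tooling. It assumes the file uses the common sha256sum layout (one "<hex digest>  <relative path>" pair per line) and that the listed paths are relative to a directory you pass in as root; both are assumptions, since the listing does not show the file's contents. The function names sha256_of, parse_sums, and verify are hypothetical helpers introduced for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so multi-GB shards never sit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_sums(text: str) -> dict[str, str]:
    """Parse lines of the ASSUMED form '<hex digest>  <relative path>'."""
    expected = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        digest, name = line.split(maxsplit=1)
        # sha256sum marks binary-mode entries with a leading '*'; drop it.
        expected[name.lstrip("*")] = digest.lower()
    return expected


def verify(sums_file: Path, root: Path) -> dict[str, bool]:
    """Map each listed path to True if the file exists and its digest matches."""
    expected = parse_sums(sums_file.read_text())
    results = {}
    for name, digest in expected.items():
        target = root / name
        results[name] = target.is_file() and sha256_of(target) == digest
    return results
```

A call such as verify(Path("pile/SHA256SUMS.txt"), Path("pile")) would report, per listed file, whether it is present and intact. The choice of "pile" as root is a guess; if the entries in the checksum file are written relative to a different directory, root needs to be adjusted accordingly.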
Mirrors: 0 complete, 0 downloading = 0 mirror(s) total

