OpenWebText (Gokaslan's distribution, 2019), GPT-2 Tokenized
eukaryote31 and Joshua Peterson and Aaron Gokaslan and Vanya Cohen

folder data-owt (395 files)
file owt262.npz 40.76MB
file owt1.npz 40.44MB
file owt2.npz 40.58MB
file owt3.npz 40.61MB
file owt4.npz 40.61MB
file owt5.npz 40.61MB
file owt6.npz 40.62MB
file owt7.npz 40.40MB
file owt8.npz 40.56MB
file owt9.npz 40.58MB
file owt10.npz 40.57MB
file owt11.npz 40.62MB
file owt12.npz 40.58MB
file owt13.npz 40.49MB
file owt14.npz 40.56MB
file owt15.npz 40.56MB
file owt16.npz 40.53MB
file owt17.npz 40.58MB
file owt18.npz 40.57MB
file owt19.npz 40.56MB
file owt20.npz 40.55MB
file owt21.npz 40.50MB
file owt22.npz 40.64MB
file owt23.npz 40.53MB
file owt24.npz 40.59MB
file owt25.npz 40.55MB
file owt26.npz 40.66MB
file owt27.npz 40.54MB
file owt28.npz 40.54MB
file owt29.npz 40.51MB
file owt30.npz 40.57MB
file owt31.npz 40.60MB
file owt32.npz 40.54MB
file owt33.npz 40.42MB
file owt34.npz 40.70MB
file owt35.npz 40.65MB
file owt36.npz 40.67MB
file owt37.npz 40.41MB
file owt38.npz 40.55MB
file owt39.npz 40.56MB
file owt40.npz 40.56MB
file owt41.npz 40.58MB
file owt42.npz 40.60MB
file owt43.npz 40.51MB
file owt44.npz 40.51MB
file owt45.npz 40.28MB
file owt46.npz 40.60MB
file owt47.npz 40.52MB
file owt48.npz 40.50MB
Type: Dataset

Bibtex:
@article{,
title= {OpenWebText (Gokaslan's distribution, 2019), GPT-2 Tokenized},
journal= {},
author= {eukaryote31 and Joshua Peterson and Aaron Gokaslan and Vanya Cohen},
year= {},
url= {},
abstract= {Code by eukaryote31 and Joshua Peterson: https://github.com/jcpeterson/openwebtext and https://github.com/eukaryote31/openwebtext

Scraped by Aaron Gokaslan and Vanya Cohen: https://skylion007.github.io/OpenWebTextCorpus/

Tokenized by eukaryote31},
keywords= {},
terms= {},
license= {},
superseded= {}
}
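
Since the shards are distributed as .npz archives, a quick way to see what one contains is to open it with NumPy and list the arrays it stores. The sketch below is only an illustration: the array names, shapes, and dtypes inside the real shards are not documented on this page, so the helper `summarize_shard` (a hypothetical name) reports whatever it finds rather than assuming a layout. Judging by the title, the values are presumably GPT-2 token IDs, but that should be confirmed against the data.

```python
import numpy as np


def summarize_shard(path):
    """Return {array_name: (shape, dtype)} for every array in an .npz file.

    Hypothetical helper: the internal layout of the owt*.npz shards is not
    stated on this page, so nothing about array names is assumed here.
    """
    summary = {}
    # np.load on an .npz returns a lazy NpzFile; arrays are read on access.
    with np.load(path) as shard:
        for name in shard.files:
            arr = shard[name]
            summary[name] = (arr.shape, str(arr.dtype))
    return summary
```

For example, `summarize_shard("data-owt/owt1.npz")` (path assumed from the listing above) would return one entry per stored array, which is enough to decide how to batch or decode the tokens.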

