Wikipedia Training Data for Megatron-LM

wikipedia_bin (2 files)
  wiki_text_sentence.bin  6.29GB
  wiki_text_sentence.idx  1.55GB
Type: Dataset
Tags: BERT; NLP;
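
The two files form a pair that shares the name wiki_text_sentence and differs only in extension. Megatron-LM's --data-path option is normally given that shared prefix rather than either file name. The following is a minimal sketch, not part of the dataset's own instructions, that checks both files are present and returns the prefix. The function name indexed_dataset_prefix and the download folder used in the example are assumptions made for illustration.

```python
from pathlib import Path


def indexed_dataset_prefix(folder: str, name: str = "wiki_text_sentence") -> str:
    """Return the path prefix shared by <name>.bin and <name>.idx.

    Raises FileNotFoundError if either file of the pair is missing.
    """
    prefix = Path(folder) / name
    for suffix in (".bin", ".idx"):
        # Build the full file name by appending the extension to the prefix.
        candidate = prefix.parent / (prefix.name + suffix)
        if not candidate.is_file():
            raise FileNotFoundError(f"missing {candidate}")
    return str(prefix)


if __name__ == "__main__":
    # Hypothetical download location; adjust to wherever the torrent was saved.
    try:
        print(indexed_dataset_prefix("wikipedia_bin"))
    except FileNotFoundError as err:
        print(err)
```

The returned string is what would be passed as the data path in a training script. The exact training arguments are described in the instructions linked in the abstract.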

Metadata:
@article{,
title= {Wikipedia Training Data for Megatron-LM},
journal= {},
author= {},
year= {},
url= {},
abstract= {A preprocessed Wikipedia dataset for training with Megatron-LM (https://github.com/NVIDIA/Megatron-LM). See https://github.com/Lyken17/ML-Datasets for instructions on how to use it.

Note: the author does not own the copyright to any of the data. },
keywords= {BERT; NLP;},
terms= {},
license= {},
superseded= {}
}

Citation:
Wikipedia Training Data for Megatron-LM. (2021). [Data set]. Academic Torrents. https://academictorrents.com/details/b6215a898a2a08b6061d23f2e4e1094121fb7082
