Wikipedia Training Data for Megatron-LM

Folder: wikipedia_bin (2 files)
File: wiki_text_sentence.bin (6.29 GB)
File: wiki_text_sentence.idx (1.55 GB)
Type: Dataset
Tags: BERT; NLP;

Bibtex:
@article{,
title= {Wikipedia Training Data for Megatron-LM},
journal= {},
author= {},
year= {},
url= {},
abstract= {A preprocessed Wikipedia dataset for training with Megatron-LM (https://github.com/NVIDIA/Megatron-LM). See https://github.com/Lyken17/ML-Datasets for instructions on how to use it.

Note: the author does not own the copyright to the data. },
keywords= {BERT; NLP;},
terms= {},
license= {},
superseded= {}
}

