Wikipedia Training Data for Megatron-LM

Folder: wikipedia_bin (2 files)
File: wiki_text_sentence.bin (6.29 GB)
File: wiki_text_sentence.idx (1.55 GB)
Type: Dataset
Tags: BERT; NLP;
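
The two files share the stem wiki_text_sentence and are meant to be used as a pair. Below is a minimal sketch of a pre-flight check that both files are present before training starts. The helper name megatron_data_prefix and the example folder path are hypothetical. The idea that the training script is pointed at the shared prefix (the path without the .bin or .idx extension) is an assumption here, so confirm it against the linked instructions.

```python
from pathlib import Path


def megatron_data_prefix(folder, stem="wiki_text_sentence"):
    """Return the shared path prefix if both the .bin and .idx files exist.

    Hypothetical helper: it only checks that the file pair is present and
    does not read or validate the file contents.
    """
    prefix = Path(folder) / stem
    # Collect any extension whose file is missing from the folder.
    missing = [ext for ext in (".bin", ".idx")
               if not Path(f"{prefix}{ext}").is_file()]
    if missing:
        raise FileNotFoundError(
            f"missing {', '.join(missing)} for prefix {prefix}"
        )
    return str(prefix)


if __name__ == "__main__":
    # Assumed download location; adjust to wherever the folder was saved.
    try:
        print(megatron_data_prefix("wikipedia_bin"))
    except FileNotFoundError as err:
        print(f"dataset not ready: {err}")
```

Failing early on a missing .idx file is cheaper than discovering it after a multi-GB download has been staged and a training job has been launched.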

Bibtex:
@article{,
title= {Wikipedia Training Data for Megatron-LM},
journal= {},
author= {},
year= {},
url= {},
abstract= {A preprocessed dataset for training with https://github.com/NVIDIA/Megatron-LM. See the instructions at https://github.com/Lyken17/ML-Datasets for how to use it.

Note: the author does not own the copyright to the data.},
keywords= {BERT; NLP;},
terms= {},
license= {},
superseded= {}
}

