Wikipedia Training Data for Megatron-LM

Folder: wikipedia_bin (2 files)
File: wiki_text_sentence.bin (6.29 GB)
File: wiki_text_sentence.idx (1.55 GB)
Type: Dataset
Tags: BERT; NLP;

Bibtex:
@article{,
title= {Wikipedia Training Data for Megatron-LM},
journal= {},
author= {},
year= {},
url= {},
abstract= {A preprocessed Wikipedia dataset for training with Megatron-LM (https://github.com/NVIDIA/Megatron-LM). See https://github.com/Lyken17/ML-Datasets for usage instructions.

Note: the author does not own the copyright to any of the data.},
keywords= {BERT; NLP;},
terms= {},
license= {},
superseded= {}
}
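
The two files above form a .bin/.idx pair that share one name. Megatron-LM's README describes passing such a pair to training as a single path without the file extension, so a quick sanity check after downloading is to confirm both halves are present and derive that shared prefix. The sketch below is only an illustration under that assumption: the function name find_indexed_prefix is hypothetical, and the linked instructions remain the authoritative guide to usage.

```python
from pathlib import Path


def find_indexed_prefix(folder):
    """Return the shared path prefix of the single .bin/.idx pair in `folder`.

    Hypothetical helper, not part of Megatron-LM. It only checks that the
    two files exist side by side; it does not validate their contents.
    """
    folder = Path(folder)
    # Strip the extension so "x.bin" and "x.idx" map to the same prefix.
    bin_prefixes = {p.with_suffix("") for p in folder.glob("*.bin")}
    idx_prefixes = {p.with_suffix("") for p in folder.glob("*.idx")}
    pairs = sorted(bin_prefixes & idx_prefixes)
    if len(pairs) != 1:
        raise FileNotFoundError(
            f"expected exactly one .bin/.idx pair in {folder}, found {len(pairs)}"
        )
    return str(pairs[0])
```

For the folder listed above, find_indexed_prefix("wikipedia_bin") would return the path ending in wiki_text_sentence, provided both files finished downloading.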

