Wikipedia Training Data for Megatron-LM

folder wikipedia_bin (2 files)
file wiki_text_sentence.bin 6.29 GB
file wiki_text_sentence.idx 1.55 GB
Type: Dataset
Tags: BERT; NLP;

Metadata:
@article{,
title= {Wikipedia Training Data for Megatron-LM},
journal= {},
author= {},
year= {},
url= {},
abstract= {A preprocessed dataset for training with Megatron-LM (https://github.com/NVIDIA/Megatron-LM). Please see the instructions at https://github.com/Lyken17/ML-Datasets for how to use it.

Note: the author does not own any copyright in the data. },
keywords= {BERT; NLP;},
terms= {},
license= {},
superseded= {}
}
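The listing does not document the layout of the .bin/.idx pair. Assuming the files were written by Megatron-LM's tools/preprocess_data.py in its "mmap" indexed-dataset format (an assumption; the uploader does not say which format was used), the .idx file should begin with a 34-byte header: a 9-byte magic string, then a little-endian uint64 version, a uint8 token dtype code, a uint64 sequence count and a uint64 document count. The Python sketch below reads only that header; the names parse_mmap_idx_header, read_header and IdxHeader are hypothetical helpers, not part of Megatron-LM.

```python
import struct
from typing import NamedTuple

# Assumed magic string of a Megatron-LM "mmap" .idx file; if it does not match,
# the file was probably written with a different indexed-dataset format.
MMAP_MAGIC = b"MMIDIDX\x00\x00"
HEADER_SIZE = 34  # 9 magic bytes + 8 (version) + 1 (dtype) + 8 + 8 (counts)

# Only the integer token dtype codes are mapped here; other codes are reported as unknown.
DTYPE_CODES = {1: "uint8", 2: "int8", 3: "int16", 4: "int32", 5: "int64", 8: "uint16"}


class IdxHeader(NamedTuple):
    version: int
    dtype: str
    num_sequences: int  # one entry per stored sequence (here: per sentence)
    num_docs: int       # length of the document index array


def parse_mmap_idx_header(buf: bytes) -> IdxHeader:
    """Parse the fixed-size header of an assumed mmap-format .idx file."""
    if len(buf) < HEADER_SIZE or buf[:9] != MMAP_MAGIC:
        raise ValueError("not an mmap-style .idx header (bad size or magic)")
    # '<' selects little-endian with no padding: uint64, uint8, uint64, uint64.
    version, code, n_seq, n_docs = struct.unpack_from("<QBQQ", buf, 9)
    return IdxHeader(version, DTYPE_CODES.get(code, f"unknown({code})"), n_seq, n_docs)


def read_header(path: str) -> IdxHeader:
    """Read only the first HEADER_SIZE bytes of an .idx file and parse them."""
    with open(path, "rb") as f:
        return parse_mmap_idx_header(f.read(HEADER_SIZE))
```

For example, read_header("wikipedia_bin/wiki_text_sentence.idx") would report the token dtype and the number of stored sentences if the assumption holds; a ValueError indicates the files use some other layout, in which case the ML-Datasets instructions linked above are the authoritative guide.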

Citation:
Wikipedia Training Data for Megatron-LM. (2021). [Data set]. Academic Torrents. https://academictorrents.com/details/b6215a898a2a08b6061d23f2e4e1094121fb7082
