OpenWebText-urls-26M-filtered.xz
eukaryote and jcpeterson

OpenWebText-urls-26M-filtered.xz 480.28 MB
Type: Dataset

Bibtex:
@article{,
title= {OpenWebText-urls-26M-filtered.xz},
journal= {},
author= {eukaryote and jcpeterson},
year= {},
url= {https://github.com/eukaryote31/openwebtext},
abstract= {Every outbound Reddit link posted before 31 Dec 2018 with at least 3 karma. The list is filtered to remove image sites, sites that are not scraper-friendly, and media files.},
keywords= {WebText, Reddit, gpt2},
terms= {},
license= {},
superseded= {}
}
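
As a rough illustration of how a URL list like this might be consumed, the sketch below streams an xz-compressed file and applies a domain and file-extension filter similar in spirit to the one the abstract describes. It assumes the archive decompresses to plain text with one URL per line, which this listing does not confirm. The names `iter_urls`, `keep_url`, `BLOCKED_DOMAINS`, and `MEDIA_EXTENSIONS` are hypothetical, and the blocklist entries are placeholders rather than the filter the dataset authors used.

```python
import lzma
from urllib.parse import urlparse

# Hypothetical blocklist and extension list: placeholders for illustration,
# not the actual filter applied by the dataset authors.
BLOCKED_DOMAINS = {"imgur.com", "i.redd.it"}
MEDIA_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".mp4", ".pdf")


def iter_urls(path):
    """Yield non-empty lines from an xz-compressed text file.

    Assumes one URL per line (an assumption about the file layout).
    """
    with lzma.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            url = line.strip()
            if url:
                yield url


def keep_url(url):
    """Return True if the URL is neither on a blocked host nor a media file."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if host in BLOCKED_DOMAINS:
        return False
    return not parsed.path.lower().endswith(MEDIA_EXTENSIONS)
```

Streaming line by line keeps memory use flat, which matters for a list of this size. For example, `sum(1 for u in iter_urls("OpenWebText-urls-26M-filtered.xz") if keep_url(u))` would count the URLs that pass the illustrative filter.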
