Common Crawl corpus - training-parallel-commoncrawl.tgz (CS-EN, DE-EN, ES-EN, FR-EN, RU-EN)

Name DL Added Torrents Total Size
Text
32 233.75GB 263 0

