Common Crawl corpus - training-parallel-commoncrawl.tgz (CS-EN, DE-EN, ES-EN, FR-EN, RU-EN)

Name DL Torrents Total Size
Text
32 233.75GB 193 0