Common Crawl corpus - training-parallel-commoncrawl.tgz (CS-EN, DE-EN, ES-EN, FR-EN, RU-EN)

Name DL Torrents Total Size
Text
32 233.75GB 206 0