OPUS Russian Open Speech To Text Dataset v1.01
Anna Slizhikova and Alexander Veysov and Dilyara Nurtdinova and Dmitry Voronin

Folder: ru_open_stt_opus (38 files)
manifests/tts_russian_addresses_rhvoice_4voices.csv 220.26MB
manifests/radio_v4_manifest.csv 515.81MB
manifests/radio_v4_add_manifest.csv 7.03MB
manifests/radio_pspeech_sample_manifest.csv 32.76MB
manifests/radio_2.csv 43.04MB
manifests/public_youtube700_val.csv 679.15kB
manifests/public_youtube700.csv 74.60MB
manifests/public_youtube1120_hq.csv 39.34MB
manifests/public_youtube1120.csv 141.83MB
manifests/public_speech_manifest.csv 132.35MB
manifests/public_series_1.csv 1.92MB
manifests/public_lecture_1.csv 660.11kB
manifests/private_buriy_audiobooks_2.csv 119.40MB
manifests/buriy_audiobooks_2_val.csv 744.95kB
manifests/asr_public_stories_2.csv 7.19MB
manifests/asr_public_stories_1.csv 4.84MB
manifests/asr_public_phone_calls_2.csv 60.34MB
manifests/asr_public_phone_calls_1.csv 26.39MB
manifests/asr_calls_2_val.csv 1.05MB
archives/tts_russian_addresses_rhvoice_4voices.tar.gz 13.86GB
archives/radio_v4_manifest.tar.gz 189.01GB
archives/radio_v4_add_manifest.tar.gz 3.04GB
archives/radio_pspeech_sample_manifest.tar.gz 12.27GB
archives/radio_2.tar.gz 26.45GB
archives/public_youtube700_val.tar.gz 469.33MB
archives/public_youtube700.tar.gz 13.09GB
archives/public_youtube1120_hq.tar.gz 5.31GB
archives/public_youtube1120.tar.gz 20.43GB
archives/public_speech_manifest.tar.gz 50.94GB
archives/public_series_1.tar.gz 319.23MB
archives/public_lecture_1.tar.gz 122.51MB
archives/private_buriy_audiobooks_2.tar.gz 27.74GB
archives/buriy_audiobooks_2_val.tar.gz 496.48MB
archives/asr_public_stories_2.tar.gz 1.50GB
archives/asr_public_stories_1.tar.gz 719.09MB
archives/asr_public_phone_calls_2.tar.gz 10.12GB
archives/asr_public_phone_calls_1.tar.gz 3.41GB
archives/asr_calls_2_val.tar.gz 805.25MB
Type: Dataset

BibTeX:
@article{,
title= {OPUS Russian Open Speech To Text Dataset v1.01},
journal= {},
author= {Anna Slizhikova and Alexander Veysov and Dilyara Nurtdinova and Dmitry Voronin},
year= {},
url= {https://github.com/snakers4/open_stt/},
abstract= {v1.0-beta 

Arguably the largest public Russian STT dataset to date:
15 million utterances;
20,000 hours;
2.3 TB (in mono int16 .wav format).

For more information, please visit https://github.com/snakers4/open_stt/},
keywords= {Dataset, russian, asr, stt, TTS},
terms= {https://github.com/snakers4/open_stt/#license},
license= {CC-NC-BY},
superseded= {}
}
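
The abstract's figures (20,000 hours, 2.3 TB, mono int16 .wav) can be cross-checked with simple arithmetic. The sketch below is only a sanity check, not something taken from the dataset's documentation: the 16 kHz sample rate is an assumption (this page does not state it), and `wav_payload_bytes` is a hypothetical helper name introduced for illustration.

```python
# Back-of-the-envelope check of the uncompressed size quoted in the abstract.
HOURS = 20_000            # from the abstract
BYTES_PER_SAMPLE = 2      # int16, from the abstract
CHANNELS = 1              # mono, from the abstract
SAMPLE_RATE_HZ = 16_000   # ASSUMPTION: not stated on this page


def wav_payload_bytes(hours, sample_rate_hz, bytes_per_sample=2, channels=1):
    """Raw PCM payload size in bytes, ignoring the small per-file WAV headers."""
    seconds = hours * 3600
    return seconds * sample_rate_hz * bytes_per_sample * channels


total_bytes = wav_payload_bytes(HOURS, SAMPLE_RATE_HZ, BYTES_PER_SAMPLE, CHANNELS)
total_tb = total_bytes / 1e12  # decimal terabytes
print(f"{total_tb:.2f} TB")
```

Under that assumed sample rate the result lands close to the quoted 2.3 TB, which suggests the figure describes the decoded .wav audio rather than the much smaller .tar.gz archives listed above.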
