Russian Educational Text Collection
nyuuzyou

main (3 files)
documents.parquet 136.15MB
presentations.parquet 168.06MB
README.md 1.91kB
Type: Dataset

Bibtex:
@article{,
title= {Russian Educational Text Collection},
journal= {},
author= {nyuuzyou},
year= {},
url= {https://huggingface.co/datasets/nyuuzyou/edutexts},
abstract= {# Dataset Card for Russian Educational Text Collection

### Dataset Summary
This dataset contains approximately 1.38M educational texts primarily in Russian with some content in Ukrainian and English. The content is extracted from presentations and documents, including educational presentations, essays, and various academic documents covering diverse topics from natural sciences to literature.

### Languages
- Russian (ru) - primary language
- Ukrainian (uk) - secondary language
- English (en) - secondary language

Russian is the predominant language in the dataset; Ukrainian and English content appears less frequently.

## Dataset Structure

### Data Fields
The dataset is split into two Parquet files, each with its own fields:
- presentations (1,335,171 entries):
  - `title`: Title of the presentation (string)
  - `slide_text`: Array of slide contents (list of strings)

- documents (47,474 entries):
  - `title`: Title of the document (string)
  - `document_text`: Full text content of the document (string)
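
As a minimal sketch of how the two files might be read, assuming they have been downloaded locally and that `pandas` with a Parquet engine such as `pyarrow` is installed: the helper names `load_split` and `presentation_to_text` are hypothetical and not part of the dataset; only the column names `title` and `slide_text` come from the field list above.

```python
from pathlib import Path


def load_split(path):
    """Read one Parquet file into a pandas DataFrame.

    Assumes pandas and a Parquet engine (e.g. pyarrow) are installed.
    """
    import pandas as pd  # imported lazily so the helper below works without it

    return pd.read_parquet(Path(path))


def presentation_to_text(title, slide_text, sep="\n\n"):
    """Join a presentation's title and slides into a single string.

    Missing, empty, or whitespace-only parts are skipped.
    """
    parts = [title, *slide_text]
    cleaned = [p.strip() for p in parts if isinstance(p, str) and p.strip()]
    return sep.join(cleaned)
```

For example, `load_split("presentations.parquet")` should return a frame with `title` and `slide_text` columns, and passing one row's `title` and `slide_text` to `presentation_to_text` yields a flat string comparable to `document_text` in the documents file. The file path here is an assumption; adjust it to wherever the files were saved.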

## Additional Information

### License
This dataset is dedicated to the public domain under the Creative Commons Zero (CC0) license. This means you can:
* Use it for any purpose, including commercial projects
* Modify it however you like
* Distribute it without asking permission
No attribution is required, but it's always appreciated!},
keywords= {},
terms= {},
license= {Creative Commons Zero (CC0)},
superseded= {}
}

