Russian Educational Text Collection
nyuuzyou

Folder: main (3 files)
  documents.parquet  136.15 MB
  presentations.parquet  168.06 MB
  README.md  1.91 kB
Type: Dataset
Tags:

Metadata:
@article{,
title= {Russian Educational Text Collection},
journal= {},
author= {nyuuzyou},
year= {},
url= {https://huggingface.co/datasets/nyuuzyou/edutexts},
abstract= {# Dataset Card for Russian Educational Text Collection

### Dataset Summary
This dataset contains approximately 1.38M educational texts primarily in Russian with some content in Ukrainian and English. The content is extracted from presentations and documents, including educational presentations, essays, and various academic documents covering diverse topics from natural sciences to literature.

### Languages
- Russian (ru) - primary language
- Ukrainian (uk) - secondary language
- English (en) - secondary language

Russian is the predominant language in the dataset; Ukrainian and English content appears less frequently.

## Dataset Structure

### Data Fields
The dataset is split into two parquet files:
- presentations (1,335,171 entries):
  - `title`: Title of the presentation (string)
  - `slide_text`: Array of slide contents (list of strings)

- documents (47,474 entries):
  - `title`: Title of the document (string)
  - `document_text`: Full text content of the document (string)
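
Both files share a `title` column but store the body differently: presentations hold a list of slide strings, while documents hold a single string. The following is a minimal sketch of reading both into one table with pandas. It assumes the files were downloaded locally as `presentations.parquet` and `documents.parquet`, that a Parquet engine such as pyarrow is installed, and that joining slides with blank lines is acceptable. The helper names (`presentations_to_text`, `documents_to_text`, `combine`, `load_corpus`) and the output columns `text` and `source` are hypothetical, not part of the dataset.

```python
import pandas as pd

# Hypothetical local paths: adjust to wherever the torrent was downloaded.
PRESENTATIONS_PATH = "presentations.parquet"
DOCUMENTS_PATH = "documents.parquet"


def presentations_to_text(df: pd.DataFrame, sep: str = "\n\n") -> pd.DataFrame:
    """Flatten each presentation's slide_text list into a single string."""

    def join_slides(slides) -> str:
        # Null lists become empty text; empty or null slide strings are skipped.
        if slides is None:
            return ""
        return sep.join(s for s in slides if s)

    return pd.DataFrame(
        dict(title=df["title"], text=df["slide_text"].apply(join_slides), source="presentation")
    )


def documents_to_text(df: pd.DataFrame) -> pd.DataFrame:
    """Keep each document's full text; null texts become empty strings."""
    return pd.DataFrame(
        dict(title=df["title"], text=df["document_text"].fillna(""), source="document")
    )


def combine(presentations: pd.DataFrame, documents: pd.DataFrame) -> pd.DataFrame:
    """Stack both sources into one title/text/source table."""
    return pd.concat(
        [presentations_to_text(presentations), documents_to_text(documents)],
        ignore_index=True,
    )


def load_corpus(presentations_path: str = PRESENTATIONS_PATH,
                documents_path: str = DOCUMENTS_PATH) -> pd.DataFrame:
    # Read only the documented columns from each file.
    presentations = pd.read_parquet(presentations_path, columns=["title", "slide_text"])
    documents = pd.read_parquet(documents_path, columns=["title", "document_text"])
    return combine(presentations, documents)
```

With the files in place, `corpus = load_corpus()` followed by `corpus["source"].value_counts()` gives a quick check against the per-file entry counts listed above. Loading every presentation row at once may need considerable memory; if that is a problem, pyarrow's `ParquetFile.iter_batches` can stream the files in chunks instead.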

## Additional Information

### License
This dataset is dedicated to the public domain under the Creative Commons Zero (CC0) license. This means you can:
* Use it for any purpose, including commercial projects
* Modify it however you like
* Distribute it without asking permission

No attribution is required, but it's always appreciated!},
keywords= {},
terms= {},
license= {Creative Commons Zero (CC0)},
superseded= {}
}

Citation:
nyuuzyou. (2026). Russian Educational Text Collection [Data set]. Academic Torrents. https://academictorrents.com/details/1f6b373346a0fa34de6b4d916984d698e0a623b3
