Russian Educational Text Collection
nyuuzyou

main (3 files)
documents.parquet 136.15MB
presentations.parquet 168.06MB
README.md 1.91kB
Type: Dataset

Bibtex:
@article{,
title= {Russian Educational Text Collection},
journal= {},
author= {nyuuzyou},
year= {},
url= {https://huggingface.co/datasets/nyuuzyou/edutexts},
abstract= {# Dataset Card for Russian Educational Text Collection

### Dataset Summary
This dataset contains approximately 1.38M educational texts, primarily in Russian, with some content in Ukrainian and English. The texts are extracted from presentations and documents, including educational presentations, essays, and various academic documents, and cover diverse topics from natural sciences to literature.

### Languages
- Russian (ru) - primary language
- Ukrainian (uk) - secondary language
- English (en) - secondary language

Russian is the predominant language in the dataset; Ukrainian and English content appears less frequently.

## Dataset Structure

### Data Fields
The dataset is split into two parquet files:
- presentations (1,335,171 entries):
  - `title`: Title of the presentation (string)
  - `slide_text`: Array of slide contents (list of strings)

- documents (47,474 entries):
  - `title`: Title of the document (string)
  - `document_text`: Full text content of the document (string)
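
The two layouts above can be brought into one common shape. The following is a minimal sketch, not part of the original dataset tooling: it assumes the files are read with pandas and a parquet engine such as pyarrow, and the helper names `load_split`, `slides_to_text`, and `unify`, as well as the output keys `text` and `source`, are hypothetical.

```python
from typing import Iterable, List


def load_split(path: str):
    """Read one parquet file into a pandas DataFrame.

    Assumption: pandas and a parquet engine (e.g. pyarrow) are installed.
    The import is lazy so the helpers below also work without pandas.
    """
    import pandas as pd

    return pd.read_parquet(path)


def slides_to_text(slide_text: Iterable[str], sep: str = "\n\n") -> str:
    """Join the non-empty slides of one presentation into a single string."""
    return sep.join(s.strip() for s in slide_text if s and s.strip())


def unify(presentations: Iterable[dict], documents: Iterable[dict]) -> List[dict]:
    """Map rows of both files onto one record layout: title, text, source."""
    records = []
    for row in presentations:
        records.append(dict(
            title=row["title"],
            text=slides_to_text(row["slide_text"]),
            source="presentation",
        ))
    for row in documents:
        records.append(dict(
            title=row["title"],
            text=row["document_text"],
            source="document",
        ))
    return records
```

With the files downloaded locally, the rows could be obtained with something like `load_split("presentations.parquet").to_dict("records")`. Given the size of the presentations file, processing it in batches may be preferable to loading it whole.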

## Additional Information

### License
This dataset is dedicated to the public domain under the Creative Commons Zero (CC0) license. This means you can:
* Use it for any purpose, including commercial projects
* Modify it however you like
* Distribute it without asking permission
No attribution is required, but it's always appreciated!},
keywords= {},
terms= {},
license= {Creative Commons Zero (CC0)},
superseded= {}
}

