Neyshekar

Neyshekar is an open, community-driven Persian speech dataset collected via a web-based crowdsourcing platform at https://ney.shekar.io. It is designed to support research and development in text-to-speech (TTS), automatic speech recognition (ASR), speech representation learning, and other downstream Persian speech applications.

The recordings are provided by a combination of volunteer contributors and paid voice actors, all of whom are native Persian speakers. Each release represents a stable snapshot of the dataset, enabling reproducible research and consistent benchmarking.

Dataset Releases

Neyshekar is released incrementally. The statistics below describe each release at the time of its publication.

v2 — 2026-01-15 (download)

Total samples: 20,020
Total duration: 29.08 hours
Average clip duration: 5.23 seconds
Total tokens: 208,472
Vocabulary size: 20,853

v1 — 2025-12-29

Total samples: 10,044
Total duration: 14.42 hours
Average clip duration: 5.17 seconds
Total tokens: 103,757
Vocabulary size: 15,224
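
The reported average clip durations can be cross-checked from the total duration and sample count of each release. The snippet below is a minimal sketch of that arithmetic; the `RELEASES` dictionary and the `average_clip_seconds` helper are illustrative names, not part of the repository's code.

```python
# Release statistics copied from the listings above.
# The dictionary layout is an assumption made for this example only.
RELEASES = {
    "v1": {"samples": 10_044, "hours": 14.42},
    "v2": {"samples": 20_020, "hours": 29.08},
}


def average_clip_seconds(samples: int, hours: float) -> float:
    """Mean clip length in seconds, derived from total hours and clip count."""
    return hours * 3600 / samples


if __name__ == "__main__":
    for name, stats in RELEASES.items():
        avg = average_clip_seconds(stats["samples"], stats["hours"])
        print(f"{name}: {avg:.2f} seconds per clip")
```

Rounded to two decimals, this gives 5.17 seconds for v1 and 5.23 seconds for v2, matching the figures listed for each release.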

License

This dataset is released under the CC0 1.0 Universal license.
It may be used, modified, and redistributed for any purpose without restriction.

