amirivojdan/neyshekar

Neyshekar

Neyshekar is an open, community-driven Persian speech dataset collected via a web-based crowdsourcing platform at https://ney.shekar.io. It is designed to support research and development in text-to-speech (TTS), automatic speech recognition (ASR), speech representation learning, and other downstream Persian speech applications.

The recordings are provided by a combination of volunteer contributors and paid voice actors, all of whom are native Persian speakers. Each release represents a stable snapshot of the dataset, enabling reproducible research and consistent benchmarking.

Dataset Releases

Neyshekar is released incrementally. Releases are listed below, newest first.

v2 — 2026-01-15 (download)

  • Total samples: 20,020
  • Total duration: 29.08 hours
  • Average clip duration: 5.23 seconds
  • Total tokens: 208,472
  • Vocabulary size: 20,853

v1 — 2025-12-29

  • Total samples: 10,044
  • Total duration: 14.42 hours
  • Average clip duration: 5.17 seconds
  • Total tokens: 103,757
  • Vocabulary size: 15,224
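
As a quick sanity check, the reported average clip durations can be re-derived from the total duration and sample count of each release. The sketch below is illustrative only: the `RELEASES` dictionary and the helper names are hypothetical and not part of any Neyshekar tooling; only the figures are taken from the release notes above.

```python
# Figures copied from the release notes above. The dictionary layout and
# function names are illustrative assumptions, not dataset tooling.
RELEASES = {
    "v2": {"samples": 20_020, "hours": 29.08, "tokens": 208_472},
    "v1": {"samples": 10_044, "hours": 14.42, "tokens": 103_757},
}


def average_clip_seconds(hours: float, samples: int) -> float:
    """Mean clip length in seconds, from total hours and clip count."""
    return hours * 3600 / samples


def tokens_per_sample(tokens: int, samples: int) -> float:
    """Mean number of tokens per recorded sample."""
    return tokens / samples


for name, stats in RELEASES.items():
    avg = average_clip_seconds(stats["hours"], stats["samples"])
    tps = tokens_per_sample(stats["tokens"], stats["samples"])
    print(f"{name}: {avg:.2f} s per clip, {tps:.1f} tokens per sample")

# v2: 5.23 s per clip, 10.4 tokens per sample
# v1: 5.17 s per clip, 10.3 tokens per sample
```

Both derived averages match the published values, so the duration and sample counts are internally consistent for each release.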

License

This dataset is released under the CC0 1.0 Universal license.
It may be used, modified, and redistributed for any purpose without restriction.