A simple set of tools to automatically generate HSK worksheets as CSVs, Mochi flashcards and PDF files.
This application uses the excellent AllSet Learning Chinese Vocabulary Wiki as its default crawler data source. Other crawlers can be implemented as long as they generate the following fields for each scraped item (see the sketch after the table below):
| Name | Type | Description |
|---|---|---|
| id | int | Sequential ID of each item in a given category |
| category | str | Name of the item category |
| chinese | str | Chinese word |
| pinyin | str | Pinyin representation |
| english | str | English translation |
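Below is a minimal, hypothetical sketch of what an alternative Scrapy spider could look like; the spider name, start URL, and CSS selectors are illustrative assumptions, not part of this project:

```python
import scrapy


class ExampleVocabSpider(scrapy.Spider):
    """Hypothetical crawler showing the fields every scraped item must provide."""

    name = "ExampleVocabSource"
    # Placeholder URL; point this at your actual data source.
    start_urls = ["https://example.com/hsk-vocabulary"]

    def parse(self, response):
        # The table layout and CSS selectors below are illustrative only;
        # adapt them to the structure of the page you are scraping.
        for index, row in enumerate(response.css("table.vocab tr"), start=1):
            yield {
                "id": index,                                   # sequential ID within the category
                "category": "HSK 1",                           # name of the item category
                "chinese": row.css("td.chinese::text").get(),  # Chinese word
                "pinyin": row.css("td.pinyin::text").get(),    # pinyin representation
                "english": row.css("td.english::text").get(),  # English translation
            }
```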
Install all Python dependencies with:

```bash
poetry install
```

You will also need Typst installed and available on PATH to generate PDFs.
All commands listed below should be issued from this project's root folder unless otherwise stated.
To extract HSK 3 vocabulary to a CSV file at ./output/hsk_3.csv, you should run:
```bash
scrapy crawl AllSetLearning -a hsk=3 -O ./output/hsk_3.csv
```
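If you want to post-process the exported vocabulary in Python, the CSV can be read back with the standard library. This assumes Scrapy's default CSV export, which writes a header row containing the field names from the table above:

```python
import csv

# Read the exported vocabulary back into dictionaries keyed by the
# field names listed above (id, category, chinese, pinyin, english).
with open("./output/hsk_3.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        print(row["chinese"], row["pinyin"], row["english"])
```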
To extract HSK 2 vocabulary to Mochi flashcards at ./output/hsk_2.mochi, you should run:

```bash
scrapy crawl AllSetLearning -a hsk=2 -O ./output/hsk_2.mochi
```

To generate a PDF HSK 1 worksheet at ./output/hsk_1.pdf from a given CSV vocabulary file located at ./output/hsk_1.csv, you should run:
```bash
typst compile template/main.typ output/hsk_1.pdf \
  --root . \
  --font-path font \
  --input hsk="1" \
  --input csv_file_path="../output/hsk_1.csv"
```

The resulting file should look like this:
