Extraction and normalization of an (initially) Attic Prose tagged corpus for the Greek Learner Texts Project.
- tagged-texts/ contains the extracted files (with minor manual corrections).
- scripts/gather.py did the initial extraction.
- counts.tsv gives current token counts.
- scripts/stats.py produced those counts.
- base-texts/ contains the chunked base texts.
- scripts/extract_base.py produced those chunked base texts.
- tokenized-texts/ contains tokenized base texts.
- scripts/tokens.py produced those tokenized base texts.
- aligned-tagging/ contains initial alignment of different taggings of each text.
- scripts/align.py produced those alignments.
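
To make the alignment step more concrete, here is a minimal sketch of how two tokenizations of the same passage could be aligned. It is not the code in scripts/align.py, whose approach is not described here. The function name `align_taggings`, the NFC normalization step, and the sample tokens are all assumptions made for illustration; only the standard-library `difflib` and `unicodedata` modules are used.

```python
import unicodedata
from difflib import SequenceMatcher


def align_taggings(tokens_a, tokens_b):
    """Align two token lists and return (operation, slice_a, slice_b) triples.

    Hypothetical helper for illustration only. Tokens are NFC-normalized
    first (an assumption) so that differently encoded accents do not show
    up as spurious mismatches.
    """
    a = [unicodedata.normalize("NFC", t) for t in tokens_a]
    b = [unicodedata.normalize("NFC", t) for t in tokens_b]
    # autojunk=False keeps frequent tokens (articles, particles) in play.
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (op, a[i1:i2], b[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
    ]


# Illustrative tokens only; the two lists differ in the fourth token.
tagging_a = ["Δαρείου", "καὶ", "Παρυσάτιδος", "γίγνονται", "παῖδες", "δύο"]
tagging_b = ["Δαρείου", "καὶ", "Παρυσάτιδος", "γίνονται", "παῖδες", "δύο"]

result = align_taggings(tagging_a, tagging_b)
for op, left, right in result:
    print(op, left, right)
```

Spans marked `equal` agree between the two sources; `replace`, `insert`, and `delete` spans mark places where the taggings diverge and would need review.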

Texts included (author ID in parentheses, followed by work IDs):

- Thucydides (0003) 001 (Books 1–3)
- Isocrates (0010) 007 008 009 011 019 021
- Demosthenes (0014) 001 004 005 006 018 020 021
- Xenophon (0032) Anabasis (006)
- Plato (0059) Euthyphro (001) Apology (002) Crito (003) Symposium (011) Republic (030)
- Lysias (0540) 001 002 003 004 005 006 007 008 009 010 012 013 014 015 016 017 018 019 020 022 023 025 026 032 033