Version 1.0 | December 2025
Github: @profdilley | Created by Prof LC Dilley, PhD with assistance from Claude Opus 4.5 [v. Desktop, Thinking Mode] and Claude Code, & Perplexity β 2025-12-02
THOTH is a powerful, privacy-first translation tool designed for professionals who need to translate CSV data containing multiple languagesβcompletely offline, with no data ever leaving your machine.
Named in honour of the Egyptian god of writing, language, and knowledge, THOTH brings enterprise-grade translation capabilities and dual state-of-the-art translation engines to your environment for local compute.
| Feature | Description |
|---|---|
| 100% Offline | All translation happens locally. No cloud services, no API calls, no data transmission. |
| 38 Languages | Full support for Russian, Ukrainian, Baltic, Balkan, European, Nordic, East Asian, and Middle Eastern languages. |
| Dual Translation Engines | Two industry-leading engines for maximum quality and coverage. |
| Smart Language Detection | Automatic per-cell language detectionβno manual configuration required. |
| Mixed-Language Support | A single column can contain text in multiple languages; THOTH handles each cell independently. |
| Adjacent Column Output | Translated columns appear immediately next to their source columns for easy comparison. |
Use THOTH with a Graphical User Interface (GUI) or Command Line Interface (CLI).
| Mode | Command | Best For |
|---|---|---|
| GUI | python thoth.py --gui |
Visual column selection, previewing translations |
| CLI | python thoth.py input.csv |
Scripting, automation, batch processing |
Both modes offer the same translation quality β choose based on your workflow.
THOTH provides two translation engines, giving you flexibility and redundancy:
- Model: Meta's No Language Left Behind (NLLB-200-distilled-600M)
- Coverage: 200 languages
- Strengths: State-of-the-art neural translation, excellent for low-resource languages (Baltic, Balkan), superior handling of Cyrillic scripts
- Size: ~2.5 GB
- Model: Open-source neural machine translation
- Coverage: 38 language pairs to English
- Strengths: Lightweight, fast inference, strong Western European performance
- Size: ~1.5 GB (all language packs)
Switch between engines anytime:
python thoth.py input.csv --engine nllb # Default
python thoth.py input.csv --engine argos # Alternative| Engine | Best For | Benchmark Result | Speed |
|---|---|---|---|
| NLLB (default) | All languages, especially Ukrainian, Japanese, Korean, Arabic, Spanish, German, and all Baltic/Balkan | chrF 61.15 (mean) | Baseline |
| Argos | Russian, French, Polish, Czech, Greek, Chinese when speed matters | chrF 56.77 (mean) | 14Γ faster |
Critical: Argos does not support Lithuanian, Latvian, Estonian, Serbian, Bulgarian, or Norwegian. Use NLLB for these languages.
Russian, Ukrainian, Belarusian, Polish, Czech, Slovak, Bulgarian, Serbian, Croatian, Bosnian, Slovenian, Macedonian
Lithuanian, Latvian, Estonian
Swedish, Norwegian, Danish, Finnish, Icelandic
German, French, Spanish, Portuguese, Italian, Dutch, Greek, Romanian, Hungarian
Mandarin Chinese, Cantonese, Japanese, Korean
Arabic, Hebrew, Turkish, Persian (Farsi)
Albanian, and more...
| Language | NLLB Code | Argos Code |
|---|---|---|
| Arabic | ara_Arab | ar |
| Bulgarian | bul_Cyrl | bg |
| Chinese (Simplified) | zho_Hans | zh |
| Croatian | hrv_Latn | hr |
| Czech | ces_Latn | cs |
| Danish | dan_Latn | da |
| Dutch | nld_Latn | nl |
| English | eng_Latn | en |
| Estonian | est_Latn | et |
| Finnish | fin_Latn | fi |
| French | fra_Latn | fr |
| German | deu_Latn | de |
| Greek | ell_Grek | el |
| Hebrew | heb_Hebr | he |
| Hungarian | hun_Latn | hu |
| Icelandic | isl_Latn | β |
| Italian | ita_Latn | it |
| Japanese | jpn_Jpan | ja |
| Korean | kor_Hang | ko |
| Latvian | lvs_Latn | lv |
| Lithuanian | lit_Latn | lt |
| Norwegian | nob_Latn | nb |
| Persian | pes_Arab | fa |
| Polish | pol_Latn | pl |
| Portuguese | por_Latn | pt |
| Romanian | ron_Latn | ro |
| Russian | rus_Cyrl | ru |
| Serbian | srp_Cyrl | sr |
| Slovak | slk_Latn | sk |
| Slovenian | slv_Latn | sl |
| Spanish | spa_Latn | es |
| Swedish | swe_Latn | sv |
| Turkish | tur_Latn | tr |
| Ukrainian | ukr_Cyrl | uk |
| ... |
cd thoth-translator
python3 -m venv venv
source venv/bin/activatepip install -r requirements.txtpython -m translator.setup --download-modelsWhat gets downloaded:
| Component | Size | Purpose |
|---|---|---|
| fastText LID218 | ~130 MB | Language detection model |
| NLLB-200-distilled-600M | ~2.5 GB | Primary translation engine |
| Argos language packs | ~1.5 GB | Secondary translation engine |
Total download: approximately 4 GB. Requires internet connection for this step only.
python thoth.py your_file.csv --columns "column1,column2,column3"Output: your_file_translated.csv with _en columns adjacent to originals.
////////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////////
python thoth.py data.csvpython thoth.py data.csv --columns "description,notes,comments"python thoth.py data.csv --engine argospython thoth.py data.csv --force-lang rus_Cyrlpython thoth.py data.csv --output translated_data.csvTranslate multiple files at once:
python thoth.py --batch --columns "description,notes"python thoth.py --batch-dir /path/to/data --columns "description,notes"python thoth.py --batch-recursive --columns "description,notes"python thoth.py --batch-dir ./data --columns "description" --target-lang fra_LatnNotes:
- Supports
.csv,.xlsx, and.xlsfiles - Files with
_translatedin the name are automatically skipped - Each output file is saved alongside its source with
_translatedsuffix
Unlike traditional translation tools that require one language per column, THOTH analyzes each cell independently.
Example input:
| id | comment |
|---|---|
| 1 | ΠΡΠ»ΠΈΡΠ½ΡΠΉ ΠΏΡΠΎΠ΄ΡΠΊΡ! |
| 2 | Π§ΡΠ΄ΠΎΠ²ΠΈΠΉ ΡΠ΅ΡΠ²ΡΡ |
| 3 | Εwietna obsΕuga |
| 4 | η΄ ζ΄γγγεθ³ͺ |
THOTH output:
| id | comment | comment_en |
|---|---|---|
| 1 | ΠΡΠ»ΠΈΡΠ½ΡΠΉ ΠΏΡΠΎΠ΄ΡΠΊΡ! | Excellent product! |
| 2 | Π§ΡΠ΄ΠΎΠ²ΠΈΠΉ ΡΠ΅ΡΠ²ΡΡ | Excellent service |
| 3 | Εwietna obsΕuga | Great service |
| 4 | η΄ ζ΄γγγεθ³ͺ | Excellent quality |
Russian, Ukrainian, Polish, and Japaneseβall handled automatically in a single column.
Translated columns are inserted immediately after their source columns:
β
THOTH output:
id | description | description_en | notes | notes_en | country
β Other tools:
id | description | notes | country | description_en | notes_en
| Format | Extension | Notes |
|---|---|---|
| CSV | .csv |
UTF-8, UTF-8-BOM, Latin-1, CP1252 auto-detected |
| Excel | .xlsx |
Full support |
| Excel (Legacy) | .xls |
Full support |
For advanced users, THOTH supports YAML configuration:
# config.yaml
translation:
default_engine: nllb
target_language: eng_Latn
detection:
confidence_threshold: 0.7
fallback_language: eng_Latn
column_defaults:
skip_numeric: true
skip_dates: true
skip_english: true
skip_empty: true
auto_select_foreign_text: trueRun the test suite to confirm everything is working:
python thoth.py --testExpected output: 21 passed
- Zero network transmission: After model download, THOTH never connects to the internet
- No telemetry: No usage data, analytics, or logging to external services
- Local processing: All translation happens on your CPU/GPU
- Open source: Full source code included for audit
| Requirement | Minimum | Recommended |
|---|---|---|
| Python | 3.10+ | 3.11+ |
| RAM | 8 GB | 16 GB |
| Disk Space | 6 GB | 10 GB |
| OS | macOS, Linux, Windows | Apple Silicon optimized |
The GUI requires tkinter.
Option #1: Install it:
- macOS:
brew install python-tk@3.13(match your Python version) - Ubuntu/Debian:
sudo apt-get install python3-tk - Windows: Reinstall Python and check "tcl/tk" option
Option #2: Use CLI mode instead:
python thoth.py input.csv --columns "col1,col2"Cancel (Ctrl+C) and restart:
python -m translator.setup --download-modelsCompleted downloads are cached and won't re-download.
pip install "numpy<2.0"Reduce batch size in config.yaml:
performance:
batch_size: 8 # Default is 16Tested on Apple M3, 16 GB RAM
| Dataset Size | Columns | Time (NLLB) | Time (Argos) |
|---|---|---|---|
| 1,000 rows | 2 columns | ~30 sec | ~20 sec |
| 10,000 rows | 2 columns | ~5 min | ~3 min |
| 10,000 rows | 10 columns | ~25 min | ~15 min |
THOTH's translation engines were rigorously evaluated against the FLORES+ benchmark datasetβthe gold standard for multilingual translation evaluation used by Meta to benchmark NLLB-200.
| Metric | NLLB-200 | Argos Translate |
|---|---|---|
| Languages Evaluated | 18 | 12 |
| Mean chrF Score | 61.15 | 56.77 |
| Mean BLEU Score | 34.95 | 28.81 |
| Translation Speed | Baseline | 14Γ faster |
chrF (character n-gram F-score) is the primary metricβhigher is better. Tested on 200 sentences per language from FLORES+ devtest split.
| Language | NLLB chrF | Argos chrF | Recommended Engine |
|---|---|---|---|
| Russian | 60.3 | 62.5 | Either (Argos slightly better) |
| Ukrainian | 62.1 | 50.7 | NLLB (+11.4) |
| Polish | 56.6 | 56.0 | Either (similar quality) |
| Czech | 63.9 | 62.9 | Either (similar quality) |
| Bulgarian | 66.1 | N/A | NLLB (Argos unavailable) |
| Serbian | 65.6 | N/A | NLLB (Argos unavailable) |
| Language | NLLB chrF | Argos chrF | Recommended Engine |
|---|---|---|---|
| Lithuanian | 57.6 | N/A | NLLB (Argos unavailable) |
| Latvian | 58.7 | N/A | NLLB (Argos unavailable) |
| Estonian | 61.9 | N/A | NLLB (Argos unavailable) |
| Language | NLLB chrF | Argos chrF | Recommended Engine |
|---|---|---|---|
| Chinese (Simplified) | 56.1 | 54.6 | Either (similar quality) |
| Japanese | 53.0 | 46.4 | NLLB (+6.6) |
| Korean | 55.4 | 45.6 | NLLB (+9.8) |
| Language | NLLB chrF | Argos chrF | Recommended Engine |
|---|---|---|---|
| French | 69.4 | 69.2 | Either (similar quality) |
| German | 67.0 | 64.2 | NLLB (+2.8) |
| Spanish | 59.9 | 53.9 | NLLB (+6.0) |
| Arabic | 63.8 | 57.2 | NLLB (+6.6) |
| Greek | 59.5 | 58.1 | Either (similar quality) |
| Norwegian | 63.9 | N/A | NLLB (Argos unavailable) |
- NLLB is the recommended default engine β Higher quality across 17 of 18 languages tested
- Argos coverage gaps β Does not support Baltic (LT, LV, ET), most Balkan (SR, BG), or Norwegian
- Argos wins only for Russian β Slight advantage (+2.2 chrF)
- Speed vs Quality trade-off β Argos is 14Γ faster; acceptable for FR, PL, CZ, EL, ZH when speed matters
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β WHICH ENGINE SHOULD I USE? β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β Is your source language Baltic, Balkan, or Norwegian? β
β YES β Use NLLB (Argos doesn't support these) β
β NO β β
β β
β Is your source language Japanese, Korean, Arabic, or Spanish? β
β YES β Use NLLB (significantly better quality) β
β NO β β
β β
β Is your source language Ukrainian or German? β
β YES β Use NLLB (better quality) β
β NO β β
β β
β Is speed critical and source is FR/PL/CZ/EL/ZH/RU? β
β YES β Argos acceptable (14Γ faster, similar quality) β
β NO β Use NLLB (default, best overall) β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
---
## β¨ π License
Open source for private use.
---
## β¨ π Acknowledgments
β¨β¨ π§ **THOTH Language Translator** β¨ is built on the shoulders of giants:
- **Meta AI** β NLLB-200 translation model
- **Argos Open Tech** β Argos Translate
- **Facebook Research** β fastText language identification
- **Hugging Face** β Transformers library
- **Perplexity.AI** β Tech SOTA Consulting
- **Prof George Lakoff** β Language Cognitive Neuroscientist & Influencer on [@profdilley](https://github.com/profdilley)
- **Prof Claude Shannon** β Father of Information Theory & Influencer on [@profdilley](https://github.com/profdilley)
- **Claude by Anthropic** β AI Collaborator with [@profdilley](https://github.com/profdilley) on β¨β¨ π§ **THOTH Language Translator** β¨
---
*Built with care* by *and* for β¨β¨***professionals***β¨β¨ *who value* β¨***privacy***β¨ and β¨***precision***.β¨
*Contact* Author ****Prof LC Dilley, PhD**** *on Github*: [@profdilley](https://github.com/profdilley)
β¨β¨ π§ ****THOTH Language Translator****β¨ | β¨"*Your words, your machine, your control*."β¨