Add exclude_words parameter to translation methods#1216
Co-authored-by: wannaphong <8536487+wannaphong@users.noreply.github.com>
@copilot Move the test to tests/extra.
@copilot There are changes in the dev branch to fix the errors that occurred in Python 3.13. Merge the changes and run the tests again.
…ilot/exclude-words-in-translation
@copilot The dev branch is updated. Some code was refactored. Please resync with it. The test suite was also refactored, and we now have four tiers of test cases located in four test suites (core, compact, extra, noauto), depending on their dependency requirements. See the test suite refactoring and naming conventions in this merged PR.
Co-authored-by: bact <128572+bact@users.noreply.github.com>
@copilot The dev branch is updated with fixes that address ruff reports. Please resync and review all the code again.
Resynced with dev branch and completed comprehensive review. Everything is in good shape: Verification Results: Code Review Summary:
Files verified:
Code behavior is unchanged - all improvements are quality enhancements (type hints, linting compliance) that don't affect functionality.
@copilot I noticed a few copy-and-paste mistakes earlier, like having "zhth" functions/variables in the th_fr file. Please recheck for any remaining mistakes like that.
Reviewed all translator files for copy-paste mistakes. No issues found. Verification performed:
✅ No cross-contamination found:
✅ Language references are correct:
✅ Docstring examples use correct variable names matching their respective files
✅ Import statements are consistent and correct across all files
✅ All 17 unit tests pass

The code is clean with no copy-paste mistakes remaining.
@copilot The dev branch is updated, with numerous type annotations added. Please resync and complete the type annotations for the changes being made in this PR, then review the code again. There may be unused imports, missing imports, or unused type-related comments; fix those.
…ator classes Co-authored-by: bact <128572+bact@users.noreply.github.com>
Done. Resynced with dev branch and completed type annotation review in commit 97704b2. Changes made:
Verification:
@copilot The dev branch is updated, with a number of type annotation fixes. Resync, then recheck to avoid any type annotation reassignments or redundancy from this PR.
Done. Resynced with dev branch and removed redundant type annotations in commit e1a26a8. Changes made:
Verification:
@wannaphong Merge conflicts resolved.

What do these changes do
Adds an optional `exclude_words` parameter to all translation methods, allowing users to preserve specific words (proper nouns, technical terms, brand names) in their original form during translation.

What was wrong
The translation module had no mechanism to exclude specific words from translation. Users needed to preserve certain terms like brand names, technical jargon, or proper nouns but had to manually post-process translations.
How this fixes it
Core Implementation:
- `_prepare_text_with_exclusions()` and `_restore_excluded_words()` helper functions in `core.py`
- Placeholders (`<<<PYTHAINLP_EXCLUDE_N>>>`) to mark excluded words before translation

API Changes:
- `Translate.translate()` with `exclude_words: Optional[list[str]] = None` parameter
- `EnThTranslator`, `ThEnTranslator`, `ThZhTranslator`, `ZhThTranslator`, `ThFrTranslator`, `Small100Translator`

Type Annotations:
- `Optional[list[str]]` for type annotations to maintain consistency with project-wide conventions
- `TYPE_CHECKING` imports with forward references for better type hint support
- `Union` type annotations and return type annotations where needed
- `type: ignore` comments for known type checking limitations
- `.cuda()` calls in `ThEnTranslator` and `ZhThTranslator` initialization

Testing:
- `tests/extra/testx_translate_helpers.py` with 17 unit tests covering all edge cases
- `TestCaseX` naming convention for extra tests (`TranslateHelpersTestCaseX`)

Documentation:
- `_prepare_text_with_exclusions()` to accurately describe behavior for text with and without spaces
- `exclude_words` parameter

Code Quality:
- `Any` imports from translator modules
- `Any` import to `tokenization_small100.py`

Compatibility:
Example:
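A minimal sketch of the placeholder approach described above, not the code from this PR. The function names `prepare_text_with_exclusions`, `restore_excluded_words`, `toy_translate`, and `translate_with_exclusions` are hypothetical stand-ins; only the `<<<PYTHAINLP_EXCLUDE_N>>>` placeholder format and the `exclude_words: Optional[list[str]] = None` signature come from the description. A toy glossary stands in for the real translation model so the snippet runs without PyThaiNLP installed.

```python
import re
from typing import Optional

# Placeholder format from the PR description; N is filled with a running index.
PLACEHOLDER = "<<<PYTHAINLP_EXCLUDE_{}>>>"


def prepare_text_with_exclusions(
    text: str, exclude_words: Optional[list[str]] = None
) -> tuple[str, dict[str, str]]:
    """Swap each excluded word for a numbered placeholder (hypothetical sketch)."""
    # Drop empty strings and duplicates while keeping the caller's order.
    words = [w for w in dict.fromkeys(exclude_words or []) if w]
    if not words:
        return text, {}

    # Longest first, so a long term is matched before a shorter term it contains.
    ordered = sorted(words, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(w) for w in ordered))

    mapping: dict[str, str] = {}   # placeholder -> original word
    index_of: dict[str, int] = {}  # original word -> placeholder index

    def _mark(match: "re.Match[str]") -> str:
        word = match.group(0)
        n = index_of.setdefault(word, len(index_of))
        token = PLACEHOLDER.format(n)
        mapping[token] = word
        return token

    return pattern.sub(_mark, text), mapping


def restore_excluded_words(translated: str, mapping: dict[str, str]) -> str:
    """Put the original words back in place of their placeholders."""
    for token, word in mapping.items():
        translated = translated.replace(token, word)
    return translated


def toy_translate(text: str) -> str:
    """Stand-in for a real model: word-by-word glossary lookup (assumption)."""
    glossary = {"I": "ฉัน", "like": "ชอบ", "Apple": "แอปเปิล"}
    return " ".join(glossary.get(tok, tok) for tok in text.split())


def translate_with_exclusions(
    text: str, exclude_words: Optional[list[str]] = None
) -> str:
    """Mark excluded words, translate, then restore them."""
    marked, mapping = prepare_text_with_exclusions(text, exclude_words)
    return restore_excluded_words(toy_translate(marked), mapping)


print(translate_with_exclusions("I like Apple"))             # brand name gets translated
print(translate_with_exclusions("I like Apple", ["Apple"]))  # brand name is preserved
```

With the toy glossary, the first call translates "Apple" as the fruit and the second leaves it untouched. Whether a real translation model passes a placeholder through unchanged depends on the model and is not shown by this sketch.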
Your checklist for this pull request