Merged
Changes from 7 commits
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -34,6 +34,7 @@ See PR for prompt and details.
- Make package zip-safe #1212
- Ensure thread-safety for tokenizers #1213
- Add Thai-NNER integration with top-level entity filtering #1221
- Reorganize noauto test suite by dependency groups (torch, tensorflow, onnx, cython, network) #935
- Improved documentation; code cleanup; more tests

## Version 5.1.2 -> 5.2.0
41 changes: 41 additions & 0 deletions pyproject.toml
@@ -190,6 +190,47 @@ extra = [
"tltk>=1.10",
]

# Noauto test dependencies - for the tests.noauto_* modules
# These are grouped by framework to avoid dependency conflicts

# PyTorch-based dependencies - for tests.noauto_torch
noauto-torch = [
"attacut>=1.0.6",
"numpy>=1.26.0",
"sentencepiece>=0.1.91",
"thai-nner>=0.3",
"tltk>=1.10",
"torch>=1.13.1",
"transformers>=4.22.1",
"wtpsplit>=1.0.1",
]

# TensorFlow-based dependencies - for tests.noauto_tensorflow
noauto-tensorflow = [
"deepcut>=0.7.0",
"numpy>=1.26.0",
]

# ONNX Runtime-based dependencies - for tests.noauto_onnx
noauto-onnx = [
"numpy>=1.26.0",
"onnxruntime>=1.10.0",
"oskut>=1.3",
"sefr_cut>=1.1",
]

# Cython-based dependencies - for tests.noauto_cython
noauto-cython = [
"phunspell>=0.1.6",
]

# Network-dependent tests - for tests.noauto_network
# These tests require network access but minimal dependencies
noauto-network = [
"huggingface-hub>=0.16.0",
]


# Full dependencies - pinned where available
full = [
"attacut==1.0.6",
88 changes: 80 additions & 8 deletions tests/README.md
@@ -74,16 +74,88 @@ The CI/CD test workflow is at

## Noauto tests (testn_*.py)

- These dependencies might include huge libraries like `tensorflow`.
- Due to dependency complexities, these functionalities may not be tested
in the CI/CD pipeline.
- In the future, we might create a separate
step or workflow to run this test suite.
It will be triggered manually.
We may also need to group test cases by
a non-conflicting set of dependencies.
The noauto (non-automated) test suite covers functionality whose heavy
dependencies make it impractical to run in automated CI/CD pipelines.
The tests are organized into specialized suites according to their
dependency requirements.

### Why separate noauto test suites?

Different ML/AI frameworks often have conflicting version requirements for
their dependencies, and they are costly to install. For example:
- PyTorch and TensorFlow may require different versions of numpy or protobuf
- Large frameworks are slow to download and install (roughly 1-3 GB each)
- Some packages require Cython compilation or system libraries

By separating tests by dependency group, we can:
- Test each framework independently without conflicts
- Optimize CI/CD resources by running only relevant test groups
- Make it easier for developers to test specific functionality
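
As an illustration of running only the relevant groups, the sketch below checks which dependency groups appear to be installed before choosing suites. The mapping from suite name to a probe module is an assumption made for this example and is not part of the project; `find_spec` only confirms that a top-level module can be found, not that its version is compatible.

```python
from importlib.util import find_spec

# Hypothetical mapping (not from the project): suite name -> one top-level
# module whose presence suggests the dependency group is installed.
PROBE_MODULES: dict[str, str] = {
    "tests.noauto_torch": "torch",
    "tests.noauto_tensorflow": "tensorflow",
    "tests.noauto_onnx": "onnxruntime",
    "tests.noauto_cython": "phunspell",
    "tests.noauto_network": "huggingface_hub",
}


def runnable_suites(probes: dict[str, str] = PROBE_MODULES) -> list[str]:
    """Return the suite names whose probe module can be located."""
    return [
        suite
        for suite, module in probes.items()
        if find_spec(module) is not None
    ]


if __name__ == "__main__":
    # Print one unittest command per suite that looks runnable here.
    for name in runnable_suites():
        print(f"python -m unittest {name}")
```

A check like this is only a convenience for local runs; installing the matching extra remains the reliable way to prepare an environment for a given suite.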

### Noauto test suites

#### Umbrella suite: tests.noauto

- Run `unittest tests.noauto`
- Includes all noauto test suites (legacy and new modular suites)
- Use this for comprehensive testing when all dependencies are available
- Test case class suffix: `TestCaseN`
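
The umbrella idea can be sketched with the standard `unittest` loader. This is a minimal illustration, not the project's actual `load_tests` implementation; the helper name `build_umbrella_suite` is hypothetical.

```python
import unittest


def build_umbrella_suite(names: list[str]) -> unittest.TestSuite:
    """Combine several test modules or packages into a single suite."""
    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    for name in names:
        # loadTestsFromName imports the dotted name and collects its tests,
        # honoring a load_tests function if the module defines one.
        suite.addTests(loader.loadTestsFromName(name))
    return suite
```

Calling it with the suite names listed in this section, for example `build_umbrella_suite(["tests.noauto_torch", "tests.noauto_onnx"])`, imports each package, so every listed dependency group must be installed for the import to succeed.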

#### Modular suites by dependency:

**PyTorch-based: tests.noauto_torch**

- Run `unittest tests.noauto_torch`
- Need dependencies from `pip install "pythainlp[noauto-torch]"`
- Tests requiring PyTorch and its ecosystem:
  - torch, transformers (PyTorch backend), sentence-transformers
  - attacut, thai-nner, wtpsplit, tltk
- Tests: spell correction (wangchanberta), NER/POS tagging (transformers-based),
  tokenization (attacut), subword tokenization (phayathai, wangchanberta),
  sentence tokenization (wtp)
- Dependencies: ~2-3 GB
- Test case class suffix: `TestCaseN`

**TensorFlow-based: tests.noauto_tensorflow**

- Run `unittest tests.noauto_tensorflow`
- Need dependencies from `pip install "pythainlp[noauto-tensorflow]"`
- Tests requiring TensorFlow:
  - deepcut tokenizer
- Dependencies: ~1-2 GB
- Note: May conflict with PyTorch dependencies
- Test case class suffix: `TestCaseN`

**ONNX Runtime-based: tests.noauto_onnx**

- Run `unittest tests.noauto_onnx`
- Need dependencies from `pip install "pythainlp[noauto-onnx]"`
- Tests requiring ONNX Runtime:
  - oskut, sefr_cut tokenizers
- Dependencies: ~200-500 MB
- Test case class suffix: `TestCaseN`

**Cython-compiled: tests.noauto_cython**

- Run `unittest tests.noauto_cython`
- Need dependencies from `pip install "pythainlp[noauto-cython]"`
- Tests requiring Cython-compiled packages:
  - phunspell spell checker
- Requires: Cython, C compiler, system libraries (hunspell)
- Build requirements are platform-specific
- Test case class suffix: `TestCaseN`

**Network-dependent: tests.noauto_network**

- Run `unittest tests.noauto_network`
- Need dependencies from `pip install "pythainlp[noauto-network]"`
- Tests requiring network access:
  - HuggingFace Hub model downloads
  - External API calls
- Requires: Internet connection; may involve large downloads
- Test case class suffix: `TestCaseN`
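
A network-dependent test case can skip itself when no connection is available. The sketch below is an assumption about how such a guard might look, not code from the project: the helper `has_network`, the default host, and the class `ExampleNetworkTestCaseN` are all hypothetical.

```python
import socket
import unittest


def has_network(
    host: str = "huggingface.co", port: int = 443, timeout: float = 3.0
) -> bool:
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


class ExampleNetworkTestCaseN(unittest.TestCase):
    """Hypothetical test case that skips when the network is unreachable."""

    def setUp(self) -> None:
        if not has_network():
            self.skipTest("no network access")

    def test_placeholder(self) -> None:
        # A real test would download a model or call an external API here.
        self.assertTrue(True)
```

Checking in `setUp` rather than at import time keeps test discovery fast and reports the tests as skipped instead of failed on offline machines.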


## Robustness tests (test_robustness.py)

A comprehensive test suite within core tests that tests edge cases important
26 changes: 23 additions & 3 deletions tests/noauto/__init__.py
@@ -11,16 +11,28 @@

These tests are NOT run in automated CI workflows but are kept for
manual testing and future re-enabling when dependencies improve.

This test suite serves as an umbrella that includes all specialized
noauto test suites:
- noauto_torch: PyTorch and transformers-based tests
- noauto_tensorflow: TensorFlow-based tests
- noauto_onnx: ONNX Runtime-based tests
- noauto_cython: Cython-compiled package tests
- noauto_network: Network-dependent tests

For targeted testing, use the specific test suites instead of this umbrella.
"""

from unittest import TestLoader, TestSuite

# Names of modules to be tested
# Note: These tests are NOT included in automated CI runs
test_packages: list[str] = [
    "tests.noauto.testn_spell",
    "tests.noauto.testn_tag",
    "tests.noauto.testn_tokenize",
    "tests.noauto_torch",
    "tests.noauto_tensorflow",
    "tests.noauto_onnx",
    "tests.noauto_cython",
    "tests.noauto_network",
]


@@ -29,6 +41,14 @@ def load_tests(
) -> TestSuite:
    """Load test protocol
    See: https://docs.python.org/3/library/unittest.html#id1

    This loads all modular test suites.
    For targeted testing, use specific test suites directly:
    - unittest tests.noauto_torch
    - unittest tests.noauto_tensorflow
    - unittest tests.noauto_onnx
    - unittest tests.noauto_cython
    - unittest tests.noauto_network
    """
    suite = TestSuite()
    for test_package in test_packages:
184 changes: 0 additions & 184 deletions tests/noauto/testn_tokenize.py

This file was deleted.
