Commit 610c219

Merge pull request #1242 from bact/docs-dep
Add "docs" dependencies
2 parents 9ad4ca6 + 5945a46 commit 610c219

15 files changed: +142 -144 lines changed

.github/workflows/codemeta2cff.yml

Lines changed: 3 additions & 1 deletion
@@ -2,7 +2,9 @@
 # SPDX-License-Identifier: Apache-2.0
 # SPDX-FileType: SOURCE

-name: Generate CITATION.cff from codemeta.json
+# Generate CITATION.cff from codemeta.json
+
+name: Generate CITATION.cff
 run-name: Generate CITATION.cff after ${{github.event_name}} by ${{github.actor}}

 on:

.github/workflows/deploy-docs.yml

Lines changed: 10 additions & 5 deletions
@@ -1,4 +1,10 @@
-name: Deploy development documentation
+# SPDX-FileCopyrightText: 2026 PyThaiNLP Project
+# SPDX-License-Identifier: Apache-2.0
+# SPDX-FileType: SOURCE
+
+# Deploy documentation to https://pythainlp.org/docs/
+
+name: Deploy dev docs
 on:
 push:
 branches:
@@ -14,18 +20,17 @@ on:
 jobs:
 deploy-docs:
 name: Build and deploy documentation
-runs-on: ubuntu-24.04
+runs-on: ubuntu-latest
 steps:
 - name: Checkout
 uses: actions/checkout@v6
 - name: Set up Python
 uses: actions/setup-python@v6
 with:
-python-version: "3.10"
+python-version: "3.12"
 - name: Install build tools and doc build tools
 run: |
 pip install --upgrade "pip<24.1" "setuptools>=69.0.0,<=73.0.1"
-pip install boto smart_open sphinx sphinx-rtd-theme
 # pip<24.1 because https://github.com/omry/omegaconf/pull/1195
 # setuptools>=65.0.2 because https://github.com/pypa/setuptools/commit/d03da04e024ad4289342077eef6de40013630a44#diff-9ea6e1e3dde6d4a7e08c7c88eceed69ca745d0d2c779f8f85219b22266efff7fR1
 # setuptools<=73.0.1 because https://github.com/pypa/setuptools/issues/4620
@@ -35,7 +40,7 @@ jobs:
 # run: |
 # if [ -f docker_requirements.txt ]; then pip install -r docker_requirements.txt; fi
 - name: Install PyThaiNLP
-run: pip install .
+run: pip install ".[docs]"
 - name: Build sphinx documentation
 run: |
 cd docs && make html

README.md

Lines changed: 4 additions & 2 deletions
@@ -14,9 +14,11 @@
 [![Google Colab Badge](https://badgen.net/badge/Launch%20Quick%20Start%20Guide/on%20Google%20Colab/blue?icon=terminal)](https://colab.research.google.com/github/PyThaiNLP/tutorials/blob/master/source/notebooks/pythainlp_get_started.ipynb)
 [![Chat on Matrix](https://matrix.to/img/matrix-badge.svg)](https://matrix.to/#/#thainlp:matrix.org)

-PyThaiNLP is a Python package for text processing and linguistic analysis, similar to [NLTK](https://www.nltk.org/) with a focus on Thai language.
+PyThaiNLP is a Python package for text processing and linguistic analysis,
+similar to [NLTK](https://www.nltk.org/) with a focus on Thai language.

-PyThaiNLP เป็นไลบารีภาษาไพทอนสำหรับประมวลผลภาษาธรรมชาติ คล้ายกับ NLTK โดยเน้นภาษาไทย [ดูรายละเอียดภาษาไทยได้ที่ README_TH.MD](https://github.com/PyThaiNLP/pythainlp/blob/dev/README_TH.md)
+PyThaiNLP เป็นไลบารีภาษาไพทอนสำหรับประมวลผลภาษาธรรมชาติ คล้ายกับ NLTK โดยเน้นภาษาไทย
+[ดูรายละเอียดภาษาไทยได้ที่ README_TH.MD](https://github.com/PyThaiNLP/pythainlp/blob/dev/README_TH.md)

 ## Quick install

docs/api/generate.rst

Lines changed: 15 additions & 42 deletions
@@ -2,71 +2,44 @@

 pythainlp.generate
 ==================
-The :class:`pythainlp.generate` module is a powerful tool for generating Thai text using PyThaiNLP. It includes several classes and functions that enable users to create text based on various language models and n-gram models.

-Modules
--------
-
-Unigram
-~~~~~~~
-.. autoclass:: Unigram
-:members:
+The :mod:`pythainlp.generate` module provides classes and functions for generating Thai text using n-gram and neural language models.

-The :class:`Unigram` class provides functionality for generating text based on unigram language models. Unigrams are single words or tokens, and this class allows you to create text by selecting words probabilistically based on their frequencies in the training data.
+N-gram generators
+-----------------

-Bigram
-~~~~~~
-.. autoclass:: Bigram
+.. autoclass:: pythainlp.generate.Unigram
 :members:

-The :class:`Bigram` class is designed for generating text using bigram language models. Bigrams are sequences of two words, and this class enables you to generate text by predicting the next word based on the previous word's probability.
+.. autoclass:: pythainlp.generate.Bigram
+:members:

-Trigram
-~~~~~~~
-.. autoclass:: Trigram
+.. autoclass:: pythainlp.generate.Trigram
 :members:

-The :class:`Trigram` class extends text generation to trigram language models. Trigrams consist of three consecutive words, and this class facilitates the creation of text by predicting the next word based on the two preceding words' probabilities.
+Thai2fit helper
+---------------

-pythainlp.generate.thai2fit.gen_sentence
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 .. autofunction:: pythainlp.generate.thai2fit.gen_sentence
 :noindex:

-The function :func:`pythainlp.generate.thai2fit.gen_sentence` offers a convenient way to generate sentences using the Thai2Vec language model. It takes a seed text as input and generates a coherent sentence based on the provided context.
+WangChanLM
+----------

-pythainlp.generate.wangchanglm.WangChanGLM
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 .. autoclass:: pythainlp.generate.wangchanglm.WangChanGLM
 :members:

-The :class:`WangChanGLM` class is a part of the `pythainlp.generate.wangchanglm` module, offering text generation capabilities. It includes methods for creating text using the WangChanGLM language model.
-
 Usage
-~~~~~
-
-To use the text generation capabilities provided by the `pythainlp.generate` module, follow these steps:
+-----

-1. Select the appropriate class or function based on the type of language model you want to use (Unigram, Bigram, Trigram, Thai2Vec, or WangChanGLM).
-
-2. Initialize the selected class or use the function with the necessary parameters.
-
-3. Call the appropriate methods to generate text based on the chosen model.
-
-4. Utilize the generated text for various applications, such as chatbots, content generation, and more.
+Choose the generator class or function for the model you want, initialize it with appropriate parameters, and call its generation methods. Generated text can be used for chatbots, content generation, or data augmentation.

 Example
-~~~~~~~
-
-Here's a simple example of how to generate text using the `Unigram` class:
+-------

 ::
 from pythainlp.generate import Unigram
-
-# Initialize the Unigram model
+
 unigram = Unigram()
-
-# Generate a sentence
 sentence = unigram.gen_sentence("สวัสดีครับ")
-
 print(sentence)
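
Note on the hunk above: the removed description said the unigram generator creates text "by selecting words probabilistically based on their frequencies in the training data". A minimal plain-Python sketch of that idea follows. It is an illustration only, not PyThaiNLP's implementation: `train_unigram`, `gen_unigram_tokens`, and the toy corpus are hypothetical names and data introduced here.

```python
import random
from collections import Counter


def train_unigram(tokens):
    """Count token frequencies; the counts serve as a toy unigram model."""
    return Counter(tokens)


def gen_unigram_tokens(counts, length=4, seed=None):
    """Sample `length` tokens independently, weighted by their frequency."""
    rng = random.Random(seed)  # a local RNG keeps the sketch reproducible
    words = list(counts.keys())
    weights = list(counts.values())
    return rng.choices(words, weights=weights, k=length)


# Hypothetical pre-tokenized toy corpus (not taken from PyThaiNLP data).
corpus = ["ฉัน", "รัก", "ภาษา", "ไทย", "ฉัน", "เรียน", "ภาษา", "ไทย"]
model = train_unigram(corpus)
tokens = gen_unigram_tokens(model, length=4, seed=0)
print("".join(tokens))  # Thai text is written without spaces between words
```

Each token is drawn independently, so the output follows the corpus word frequencies but ignores word order. Bigram and trigram generators differ by conditioning each draw on the preceding one or two words.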

docs/api/summarize.rst

Lines changed: 8 additions & 8 deletions
@@ -2,20 +2,20 @@

 pythainlp.summarize
 ===================
-The :class:`summarize` is Thai text summarizer.
+The :mod:`pythainlp.summarize` module provides functions for Thai text summarization and keyword extraction.

-Modules
--------
+Functions
+---------

-.. autofunction:: summarize
-.. autofunction:: extract_keywords
+.. autofunction:: pythainlp.summarize.summarize
+.. autofunction:: pythainlp.summarize.extract_keywords

-Keyword Extraction Engines
+Keyword extraction engines
 --------------------------

 KeyBERT
-+++++++
+-------

 .. automodule:: pythainlp.summarize.keybert
-.. autoclass:: pythainlp.summarize.keybert.KeyBERT
+.. autoclass:: pythainlp.summarize.keybert.KeyBERT
 :members:

docs/conf.py

Lines changed: 2 additions & 1 deletion
@@ -97,6 +97,7 @@
 "sphinx.ext.mathjax",
 "sphinx.ext.ifconfig",
 "sphinx.ext.viewcode",
+"sphinx_copybutton",
 ]

 # Add any paths that contain templates here, relative to this directory.
@@ -116,7 +117,7 @@
 #
 # This is also used if you do content translation via gettext catalogs.
 # Usually you set "language" from the command line for these cases.
-language = None
+language = "en"

 # List of patterns, relative to source directory, that match files and
 # directories to ignore when looking for source files.

docs/index.rst

Lines changed: 3 additions & 3 deletions
@@ -10,7 +10,7 @@ PyThaiNLP documentation

 PyThaiNLP is a Python library for Thai natural language processing (NLP).

-Website: `PyThaiNLP.github.io <https://pythainlp.org/>`_
+Website: `pythainlp.org <https://pythainlp.org/>`_


 .. toctree::
@@ -38,10 +38,10 @@ Indices and tables

 Citations
 =========
-If you use PyThaiNLP in your project or publication, please cite the library as follows
+If you use PyThaiNLP in your project or publication, please cite the library as follows:

 Wannaphong Phatthiyaphaibun, Korakot Chaovavanich, Charin Polpanumas, Arthit Suriyawongkul, Lalita Lowphansirikul, Pattarawat Chormai, Peerat Limkonchotiwat, Thanathip Suntorntip, and Can Udomcharoenchaikit. 2023. PyThaiNLP: Thai Natural Language Processing in Python. In Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023), pages 25–36, Singapore, Singapore. Empirical Methods in Natural Language Processing.

-Apache Software License 2.0
+License: Apache License 2.0

 Maintained by the PyThaiNLP team.

docs/notes/FAQ.rst

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
 FAQ
 ===

-*Frequently Asked Questions about PyThaiNLP*
+*Frequently asked questions about PyThaiNLP*

-You can read the FAQ at `FAQ | PyThaiNLP GitHub <https://pythainlp.org/FAQ>`_
+Read the FAQ at `PyThaiNLP FAQ <https://pythainlp.org/FAQ>`_.

docs/notes/command_line.rst

Lines changed: 3 additions & 3 deletions
@@ -1,7 +1,7 @@
-Command Line
+Command line
 ============

-You can use some thainlp functions directly from command line.
+You can use some `thainlp` functions directly from the command line.

 **Tokenization**::

@@ -24,7 +24,7 @@ You can use some thainlp functions directly from command line.
 $ thainlp tokenize sent "หลายปีที่ผ่านมา ชาวชุมชนโคกยาวหลายคนได้พากันย้ายออก บ้างก็เสียชีวิต บางคนถูกจำคุกในข้อบุกรุกป่าหรือแม้กระทั่งสูญหาย"
 หลายปีที่ผ่านมา @@ชาวชุมชนโคกยาวหลายคนได้พากันย้ายออก @@บ้างก็เสียชีวิต @@บางคนถูกจำคุกในข้อบุกรุกป่าหรือแม้กระทั่งสูญหาย@@

-**Part-Of-Speech tagging**::
+**Part-of-speech tagging**::

 pythainlp tagg pos [-s SEPARATOR] TEXT

docs/notes/getting_started.rst

Lines changed: 8 additions & 8 deletions
@@ -1,23 +1,23 @@
-Getting Started
+Getting started
 ===============

-PyThaiNLP is a Python library for natural language processing (NLP) of Thai language. With this package, you can perform NLP tasks such as text classification and text tokenization.
+PyThaiNLP is a Python library for Thai natural language processing (NLP). With this package you can perform common NLP tasks such as text classification and tokenization.

-**Tokenization Example**::
+**Tokenization example**::

 from pythainlp.tokenize import word_tokenize

 text = "โอเคบ่เรารักภาษาถิ่น"
 word_tokenize(text, engine="newmm") # ['โอเค', 'บ่', 'เรา', 'รัก', 'ภาษาถิ่น']
-word_tokenize(text, engine="icu") # ['โอ', 'เค', 'บ่', 'เรา', 'รัก', 'ภาษา', 'ถิ่น']
+word_tokenize(text, engine="icu") # ['โอ', 'เค', 'บ่', 'เรา', 'รัก', 'ภาษา', 'ถิ่น']

-Thai has historically faced a lot of NLP challenges. A quick list of them include as follows:
+Thai NLP faces several challenges. A brief list includes:

-#. **Start-end of sentence marking** - This is arguably the biggest problem for the field of Thai NLP. The lack of end of sentence marking (EOS) makes it hard for researchers to create training sets, the basis of most research in this field. The root of the problem is two-pronged. In terms of writing system, Thai uses space to indicate both commas and periods. No letter indicates an end of a sentence. In terms of language use, Thais have a habit of starting their sentences with connector terms such as 'because', 'but', 'following', etc, making it often hard even for natives to decide where the end of sentence should be when translating.
+#. **Sentence boundary detection** This is one of the biggest challenges in Thai NLP. The lack of explicit end-of-sentence markers makes it difficult to create training sets for many tasks. The issue is twofold: in the writing system, Thai punctuation and spacing do not always indicate sentence endings; in language use, sentences often begin with conjunctions such as 'because' or 'but', which can make sentence boundaries ambiguous even for native speakers.

-#. **Word segmentation** - Thai does not use space and word segmentation is not easy. It boils down to understanding the context and ruling out words that do not make sense. This is a similar issue that other Asian languages such as Japanese and Chinese face in different degrees. For languages with space, a similar but less extreme problem would be multi-word expressions, like the French word for potato — 'pomme de terre'. In Thai, the best known example is "ตา-กลม" and "ตาก-ลม". As of recent, new techniques that capture words, subwords, and letters in vectors seem poised to overcome to issue.
+#. **Word segmentation** Thai does not use spaces to separate words, so segmentation is challenging. Solving it often requires understanding context to rule out unlikely word breaks. This is similar to issues in other Asian languages such as Japanese and Chinese. Recently, techniques that represent words, subwords, and characters as vectors (embeddings) have improved performance and help address this problem.

-Tutorial Notebooks
+Tutorial notebooks
 ==================
 - `PyThaiNLP Get Started <https://pythainlp.org/tutorials/notebooks/pythainlp-get-started.html>`_
 - `Other tutorials <https://pythainlp.org/tutorials/>`_
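
Note on the word segmentation item above: the previous wording cited "ตา-กลม" versus "ตาก-ลม" as the best-known ambiguous case. The sketch below, in plain Python, shows why a dictionary alone cannot settle it: both splits consist entirely of valid words. The four-word vocabulary and the `segmentations` helper are assumptions made for illustration; this is not how PyThaiNLP's tokenizers are implemented.

```python
def segmentations(text, vocab):
    """Return every way to split `text` into words that all appear in `vocab`."""
    if not text:
        return [[]]  # one valid segmentation of the empty string: no words
    results = []
    for end in range(1, len(text) + 1):
        prefix = text[:end]
        if prefix in vocab:
            # Keep this prefix and segment the remainder recursively.
            for rest in segmentations(text[end:], vocab):
                results.append([prefix] + rest)
    return results


# Hypothetical toy dictionary holding only the words from the example.
vocab = {"ตา", "กลม", "ตาก", "ลม"}
candidates = segmentations("ตากลม", vocab)
for words in candidates:
    print("|".join(words))
# ตา|กลม
# ตาก|ลม
```

Both candidates are dictionary-valid, so choosing between them requires context, which is the gap that statistical and embedding-based segmenters try to close.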
