support DefineTokens with AloneTokenOption and IgnoreCaseTokenOption #38
einsitang wants to merge 4 commits into bzick:master
refactor: rename ignoreCaseAlphabet to upperCaseAlphabet for clarity
It doesn't work with Unicode, but Unicode is one of the main features.
You mean IgnoreCase does not work with Unicode? I checked the encoding table. Both Unicode and ASCII have lowercase letters, and the difference between the two cases is 32.
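The offset-of-32 claim above can be checked against Go's standard `unicode` package. The following is a minimal sketch, not code from this PR; `caseOffset` is a hypothetical helper introduced only for illustration. It prints the distance between a lowercase rune and its uppercase mapping for a few sample letters, which shows whether the offset is the same across scripts.

```go
package main

import (
	"fmt"
	"unicode"
)

// caseOffset is a hypothetical helper (not part of bzick/tokenizer).
// It returns the code point distance between a lowercase rune and the
// uppercase rune that the standard unicode package maps it to.
func caseOffset(lower rune) rune {
	return lower - unicode.ToUpper(lower)
}

func main() {
	// Sample letters from ASCII, Latin-1, Cyrillic and Latin Extended-A.
	for _, r := range []rune{'a', 'ä', 'я', 'ё', 'ā'} {
		fmt.Printf("%c -> %c: offset %d\n", r, unicode.ToUpper(r), caseOffset(r))
	}
}
```

For the ASCII, Latin-1 and basic Cyrillic samples the offset is 32, but `ё` (U+0451) maps to `Ё` (U+0401) and `ā` (U+0101) maps to `Ā` (U+0100), so a fixed offset does not cover every cased letter.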
How is the Unicode feature progressing at the moment?
I mean:
The problem you mentioned does indeed exist. However, when the
There is a new implementation method.
tokenizer.DefineTokens(HelloKey, []string{"hello", "哈喽"}, IgnoreCaseTokenOption) // panics, because "哈喽" is not alphabetic
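For comparison, here is a minimal sketch of case-insensitive matching that does not need to reject caseless text. It is an assumption-laden illustration, not the PR's implementation: `matchToken` is a hypothetical helper, and it swaps in `strings.EqualFold` from the Go standard library, which compares strings under Unicode simple case folding.

```go
package main

import (
	"fmt"
	"strings"
)

// matchToken is a hypothetical helper (not part of bzick/tokenizer or this PR).
// It reports whether input equals the defined token value. With ignoreCase set,
// the comparison uses Unicode simple case folding; runes that have no case,
// such as those in "哈喽", are simply compared as they are.
func matchToken(defined, input string, ignoreCase bool) bool {
	if ignoreCase {
		return strings.EqualFold(defined, input)
	}
	return defined == input
}

func main() {
	fmt.Println(matchToken("hello", "HeLLo", true))  // cased letters, folded
	fmt.Println(matchToken("哈喽", "哈喽", true))        // caseless text, no panic
	fmt.Println(matchToken("hello", "HeLLo", false)) // exact comparison
}
```

Whether this fits the tokenizer's internal matching tables is a separate question; the sketch only shows that folding and caseless input can coexist.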
This is a breaking API change, though (see: https://go.dev/blog/module-compatibility#adding-to-a-function). You should add a new function like
AloneTokenOption
After a token is defined, if its value is followed by consecutive letters, the value is split off as a separate token.
After applying AloneTokenOption, only a standalone match is supported.
IgnoreCaseTokenOption
Makes the token value match case-insensitively (#12).
Usage example: