A TypeScript word counting library. Count the number of characters, words, sentences, paragraphs, and lines in your text instantly with tally-ts.
Note
We use the terms graphemes and characters interchangeably in this README, although technically we are counting Unicode grapheme clusters rather than Unicode characters.
tally-ts is a TypeScript library that uses modern APIs like Intl.Segmenter to count the number of characters,
words, sentences, paragraphs, and lines in the input. It can also show breakdowns for different types of characters like letters,
digits, spaces, punctuation, and symbols/special characters.
- 🧮 View text metrics: Count the number of characters, words, sentences, paragraphs, and lines in your text.
- 📊 View character composition: View the number of spaces, digits, letters, punctuation, and symbols/special characters in the input.
- 🌍 Multilingual support: Uses Intl.Segmenter for accurate word and character segmentation across many languages and scripts.
- 👨🏻‍💻 Open-source: Know how to code? Help make tally-ts better by contributing to the project on GitHub, or copy it and make your own version!
- 📚 Students & Educators: Check essay lengths and assignment limits quickly and accurately.
- ✍️ Writers & Bloggers: Track writing progress and optimize structure for readability.
- 📄 Legal & Business Professionals: Ensure documents meet required character or word counts.
- 📱 Social Media Managers: Stay within platform limits for tweets, posts, and bios.
- 🧪 Developers & Testers: Analyze input strings and view line counts for code and data.
- 🌐 SEO Specialists: Optimize content length for meta descriptions, headings, and body text.
Tip
JSR has some advantages if you're using TypeScript or Deno:
- It ships typed, modern ESM code by default
- No need for separate type declarations
- Faster, leaner installs without extraneous files
You can use JSR with your favorite package manager.
This package is available on both JSR and npm. Install it using your preferred package manager:
🦕 Deno
deno add jsr:@twocaretcat/tally-ts # JSR (recommended)
deno add npm:@twocaretcat/tally-ts # npm
🥖 Bun
bunx jsr add @twocaretcat/tally-ts # JSR
bun add @twocaretcat/tally-ts # npm
🟢 npm
npx jsr add @twocaretcat/tally-ts # JSR
npm install @twocaretcat/tally-ts # npm
🟧 pnpm
pnpm i jsr:@twocaretcat/tally-ts # JSR
pnpm add @twocaretcat/tally-ts # npm
🧶 yarn
yarn add jsr:@twocaretcat/tally-ts # JSR
yarn add @twocaretcat/tally-ts # npm
🖇 vlt
vlt install jsr:@twocaretcat/tally-ts # JSR
vlt install @twocaretcat/tally-ts # npm
Warning
Some Caveats:
- This library relies on the Intl.Segmenter API (or a compatible replacement) to split the input into graphemes, words, and sentences. Thus, the exact behavior and reproducibility of output counts depend on the JavaScript runtime used. Results may vary between browsers, Node versions, or polyfills.
- There may be slight variations between the counts generated by tally-ts and other libraries due to differences in how they are implemented.
- Languages like Chinese that do not have clearly defined words may have inaccurate word counts due to the segmentation algorithm used. If you need consistent or linguistically precise segmentation for these languages, use a dedicated tool instead. For Chinese, see Jieba, Stanford Segmenter, or pkuseg.
To get started, import the Tally class and create a new instance of it. I recommend setting the locale like so:
import { Tally } from '@twocaretcat/tally-ts';
const tally = new Tally({ locales: 'en' });
Use individual methods to get counts for sentences and words:
tally.countWords('How are you?');
// → { total: 3 }
tally.countSentences('¿Como estas?');
// → { total: 1 }
You can get the number of graphemes (characters) the same way:
tally.countGraphemes('Hello world!');
// → {
// total: 12,
// by: {
// spaces: { total: 1 },
// letters: { total: 10 },
// digits: { total: 0 },
// punctuation: { total: 1 },
// symbols: { total: 0 },
// },
// related: {
// paragraphs: { total: 1 },
// lines: { total: 1 },
// }
// }
This method has some extra features. You can access breakdown counts of the graphemes by type:
const result = tally.countGraphemes('Hi there!');
console.debug(result.by);
// → {
// spaces: { total: 1 },
// letters: { total: 7 },
// digits: { total: 0 },
// punctuation: { total: 1 },
// symbols: { total: 0 }
// }
You can also access related counts that were computed at the same time:
console.debug(result.related);
// → {
// paragraphs: { total: 1 },
// lines: { total: 1 }
// }
To get all counts at once, use the countAll() method:
const all = tally.countAll(`Hello world!\n\nThis is a test.`);
console.debug(all);
/* →
{
graphemes: {
total: 27,
by: {
spaces: { total: 4 },
letters: { total: 21 },
digits: { total: 0 },
punctuation: { total: 2 },
symbols: { total: 0 },
},
related: {
paragraphs: { total: 2 },
lines: { total: 3 },
}
},
words: { total: 6 },
sentences: { total: 2 },
paragraphs: { total: 2 },
lines: { total: 3 }
}
*/
You can pass a locale (or an array of locales) via the locales option. This value is forwarded directly to
Intl.Segmenter and determines how the input string is split into graphemes, words, and sentences:
// Single locale
new Tally({ locales: 'en' });
// Multiple locales (preference order)
new Tally({ locales: ['fr-CA', 'fr'] });
If locales is not provided, Intl.Segmenter will resolve the runtime's best locale automatically.
Note
Even if you provide a locale, the resolved locale may be different if Intl.Segmenter doesn't support the one you've
provided. In this case, another locale may be picked automatically.
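To see how locale resolution behaves outside tally-ts, you can query Intl.Segmenter directly. The sketch below uses only the standard API. The helper names `resolveSegmenterLocale` and `supportedSegmenterLocales` are made up for this example and are not part of tally-ts:

```typescript
// Plain Intl.Segmenter, no tally-ts involved.

// Returns the locale the runtime actually picked for the requested locale(s).
function resolveSegmenterLocale(locales?: string | string[]): string {
  return new Intl.Segmenter(locales, { granularity: "word" }).resolvedOptions().locale;
}

// Returns the subset of the requested locales that this runtime supports
// for segmentation, in the same order.
function supportedSegmenterLocales(locales: string[]): string[] {
  return Intl.Segmenter.supportedLocalesOf(locales);
}

console.log(resolveSegmenterLocale("en"));
console.log(resolveSegmenterLocale(["fr-CA", "fr"]));
console.log(resolveSegmenterLocale()); // runtime default, varies by environment
console.log(supportedSegmenterLocales(["en", "fr-CA"]));
```

The printed values depend on the ICU data that ships with your runtime, which is why the resolved locale can differ from the one you asked for.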
If you didn't provide a locale, you might want to know which locale was actually used by Intl.Segmenter. You can get
it like so:
const tally = new Tally();
console.debug(tally.getResolvedLocale());
// → "en-US"
If your environment doesn't support Intl.Segmenter (or the exact locale you want to use), you can provide a custom
implementation or polyfill instead:
new Tally({ Segmenter: SomeSegmenter });
This is also useful if you want to get consistent results across different runtimes. If you don't provide a segmenter,
we will try to use the native Intl.Segmenter implementation.
Internally, we will call the constructor of Segmenter to create segmenters of different granularities.
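For illustration, here is a rough sketch of what a replacement class with the same constructor shape as Intl.Segmenter could look like. `FallbackSegmenter` is hypothetical and deliberately crude (regex-based, no real Unicode segmentation rules). This README doesn't spell out exactly which parts of the Intl.Segmenter interface tally-ts relies on, so treat the method set below as an assumption; in practice you would more likely pass an existing polyfill:

```typescript
type Granularity = "grapheme" | "word" | "sentence";

interface SegmentData {
  segment: string;
  index: number;
  input: string;
  isWordLike?: boolean;
}

// Hypothetical stand-in, NOT part of tally-ts and not a faithful polyfill.
class FallbackSegmenter {
  private readonly locale: string;
  private readonly granularity: Granularity;

  // Same (locales, options) constructor shape as Intl.Segmenter.
  constructor(locales?: string | string[], options: { granularity?: Granularity } = {}) {
    const first = Array.isArray(locales) ? locales[0] : locales;
    this.locale = first ?? "und";
    this.granularity = options.granularity ?? "grapheme";
  }

  resolvedOptions(): { locale: string; granularity: Granularity } {
    return { locale: this.locale, granularity: this.granularity };
  }

  *segment(input: string): IterableIterator<SegmentData> {
    const patterns: Record<Granularity, RegExp> = {
      grapheme: /[\s\S]/gu, // one code point per segment (no cluster handling)
      word: /\s+|\S+/gu, // alternating runs of whitespace and non-whitespace
      sentence: /[^.!?]+[.!?]*\s*|[.!?]+\s*/gu, // text up to and including end marks
    };
    for (const match of input.matchAll(patterns[this.granularity])) {
      const data: SegmentData = { segment: match[0], index: match.index ?? 0, input };
      if (this.granularity === "word") {
        // Treat any run containing a letter or number as word-like.
        data.isWordLike = /[\p{L}\p{N}]/u.test(match[0]);
      }
      yield data;
    }
  }
}

const wordSegmenter = new FallbackSegmenter("en", { granularity: "word" });
const wordLike = [...wordSegmenter.segment("How are you?")].filter((s) => s.isWordLike);
console.log(wordLike.length); // 3
```

Because the same class produces the same segments on every runtime, counts built on top of it stay reproducible, at the cost of much weaker language support than the native implementation.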
Warning
Deprecated: The legacy implementation is no longer maintained and it has limited support for languages other than
English. Use the class-based Tally API instead if possible.
The legacy implementation exposes a single function, getCounts(), that can be used to get the number of characters,
words, sentences, paragraphs, lines, spaces, letters, digits, and symbols at once:
import { getCounts } from '@twocaretcat/tally-ts/legacy';
const counts = await getCounts(`Hello world!\n\nThis is a test.`);
console.debug(counts);
/* →
{
characters: 27,
words: 6,
sentences: 2,
paragraphs: 2,
lines: 3,
spaces: 4,
letters: 21,
digits: 0,
symbols: 2
}
*/
You can provide an optional locale to improve segmentation accuracy for non-English text:
const counts = await getCounts(`Hello world!\n\nThis is a test.`, 'de-DE');
Note that this only affects the segmentation of characters. If your language doesn't use spaces to separate words or
uses letters outside of the ASCII range, for example, you will still not get accurate results. For multilingual
counting, use the class-based Tally API instead.
Note
In this section, we refer to words, graphemes, spaces, lines, etc. as tokens for simplicity.
Here are some more details about how tally-ts does its magic.
The class-based implementation uses Intl.Segmenter for locale-aware text segmentation at three granularities:
- grapheme with countGraphemes()
- word with countWords()
- sentence with countSentences()
Each segmenter operates independently, and the results are combined when using countAll().
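The three granularities can be tried out with plain Intl.Segmenter. This is a minimal sketch of the general idea, not the library's actual code, and `countSegments` is a name invented for this example:

```typescript
type Granularity = "grapheme" | "word" | "sentence";

// Counts segments of the given granularity using the native Intl.Segmenter.
function countSegments(text: string, granularity: Granularity, locales?: string | string[]): number {
  const segmenter = new Intl.Segmenter(locales, { granularity });
  let total = 0;
  for (const part of segmenter.segment(text)) {
    // Word segmentation also yields spaces and punctuation,
    // so only word-like segments are counted as words.
    if (granularity === "word" && !part.isWordLike) continue;
    total += 1;
  }
  return total;
}

console.log(countSegments("How are you?", "word", "en")); // 3
console.log(countSegments("Hello world!", "grapheme", "en")); // 12
console.log(countSegments("¿Como estas?", "sentence", "es")); // 1
```

Each call builds its own segmenter, which mirrors the point above that the granularities operate independently of each other.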
The counting functions are implemented as single-pass parsers for performance reasons. Each grapheme in the input string
is classified using Unicode General Categories (e.g., \p{L}, \p{Nd}, \p{Zs}), providing accurate results for all
languages and scripts supported by the platform’s ICU data.
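As a rough illustration of the classification step, the sketch below walks the graphemes once and buckets each one with Unicode property escapes. It assumes a grapheme is classified by its first code point and adds a hypothetical `other` bucket for graphemes such as newlines; tally-ts may handle both cases differently. `classifyGrapheme` and `countByType` are names invented for this example:

```typescript
type GraphemeType = "spaces" | "letters" | "digits" | "punctuation" | "symbols" | "other";

// Ordered list of category checks. Each pattern tests the first code point only.
const CATEGORY_TESTS: ReadonlyArray<[GraphemeType, RegExp]> = [
  ["spaces", /^\p{Zs}/u],
  ["letters", /^\p{L}/u],
  ["digits", /^\p{Nd}/u],
  ["punctuation", /^\p{P}/u],
  ["symbols", /^\p{S}/u],
];

function classifyGrapheme(grapheme: string): GraphemeType {
  for (const [type, pattern] of CATEGORY_TESTS) {
    if (pattern.test(grapheme)) return type;
  }
  return "other"; // e.g. newlines, tabs, control characters
}

// Single pass over the graphemes, incrementing one bucket per grapheme.
function countByType(text: string, locales?: string | string[]): Record<GraphemeType, number> {
  const counts: Record<GraphemeType, number> = {
    spaces: 0, letters: 0, digits: 0, punctuation: 0, symbols: 0, other: 0,
  };
  const segmenter = new Intl.Segmenter(locales, { granularity: "grapheme" });
  for (const { segment } of segmenter.segment(text)) {
    counts[classifyGrapheme(segment)] += 1;
  }
  return counts;
}

console.log(countByType("Hello world!"));
// letters: 10, spaces: 1, punctuation: 1, everything else 0
```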
Here’s how counts are determined for each token type:
| Count Type | Description |
|---|---|
| grapheme | A user-perceived character as defined by Intl.Segmenter with granularity: "grapheme". Multi-codepoint characters (e.g., emojis, accented letters, combined scripts) are counted as one. Examples: a, é, 😊, 👩‍🚀, 貓. |
| word | Counted using Intl.Segmenter with granularity: "word". Each segment where isWordLike is true increments the word count. This is locale-aware and works for non-Latin scripts (e.g., Chinese, Arabic). Examples: "Hello world" → 2, "你好世界" → 1. |
| sentence | Counted using Intl.Segmenter with granularity: "sentence". Each non-empty segment increments the sentence count. Works for punctuation and locale rules (e.g., handling ¿ and !). |
| space | A grapheme that matches the Unicode Space Separator category (\p{Zs}). Includes ordinary spaces and non-breaking spaces. Examples: ' ', \u00A0. |
| letter | A grapheme in the Unicode Letter category (\p{L}). Includes characters from all alphabets. Examples: A, ß, д, あ, م. |
| digit | A grapheme in the Unicode Decimal Digit category (\p{Nd}). Works across scripts (e.g., Arabic-Indic, Devanagari). Examples: 0, ९, ٢. |
| punctuation | A grapheme in the Unicode Punctuation category (\p{P}). Examples: ., ,, !, ¿, “”. |
| symbol | A grapheme in the Unicode Symbol category (\p{S}). Includes math, currency, emoji, and miscellaneous symbols. Examples: +, $, ©, 🔥, ™. |
| line | Determined by newline graphemes ('\n'). Each newline increments the line count. A final line is counted even if the text doesn’t end with a newline, unless the input is empty, in which case the line count is 0. |
| paragraph | A non-empty, non-newline string, separated from other paragraphs by one or more newline characters. A trailing paragraph is counted even if the text doesn’t end with a newline, unless the input is empty, in which case the paragraph count is 0. Example: "Hello\n\nWorld" → 2 paragraphs. |
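
The line and paragraph rules in the table can be sketched as a single pass over the text. This is one reading of those rules rather than the library's code: it assumes a trailing newline does not start an extra line, and `countLinesAndParagraphs` is a name invented for this example:

```typescript
function countLinesAndParagraphs(text: string): { lines: number; paragraphs: number } {
  let lines = 0;
  let paragraphs = 0;
  let lineHasContent = false; // true once the current line has a non-newline character

  for (const ch of text) {
    if (ch === "\n") {
      lines += 1; // every newline closes a line
      if (lineHasContent) paragraphs += 1; // a non-empty line closes a paragraph
      lineHasContent = false;
    } else {
      lineHasContent = true;
    }
  }

  // Count unterminated trailing content as a final line and paragraph.
  if (lineHasContent) {
    lines += 1;
    paragraphs += 1;
  }
  return { lines, paragraphs };
}

console.log(countLinesAndParagraphs("Hello\n\nWorld")); // { lines: 3, paragraphs: 2 }
console.log(countLinesAndParagraphs("")); // { lines: 0, paragraphs: 0 }
```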
The legacy implementation exposes a single function, getCounts(), that can be used to get the number of characters,
words, sentences, paragraphs, lines, spaces, letters, digits, and symbols at once.
The counting function is implemented as a single-pass parser for performance reasons. State transitions (sentence terminator → letter, letter → space, etc.) are used to determine when to increment the counts for each token type.
The following characters are used to separate tokens:
- Space: ' '
- Newline: \n
- End Mark: ., !, ?
End of Input can also be considered a separator because words, sentences, paragraphs, and lines at the end of the
input are counted even if not specifically terminated. For example, Something is counted as a word, sentence,
paragraph, and line.
Here is an overview of how we determine the counts for each token type:
| Count Type | Description |
|---|---|
| character | A Unicode grapheme cluster (user-perceived character), as determined by Intl.Segmenter. Using this method, Emojis and other multi-codepoint characters are counted as a single character. Examples: a, 2, !, 🔥, 貓 |
| word | A contiguous sequence of one or more letters or digits followed by a space, end mark, or newline. Symbols by themselves are not considered words. Examples: space, Whoa!, newline\n, 42. |
| sentence | A contiguous sequence of one or more words followed by an end mark. Examples: Hello, world!, 20 93.. |
| paragraph | A contiguous sequence of one or more sentences followed by a newline. Examples: The quick brown cat jumps over the lazy dog\n, Hello world! Bye world!\n, 42\n. |
| space | A literal space character (' '). Other whitespace (e.g., tabs, newlines) is not included. |
| letter | A character in the ASCII ranges A–Z or a–z. Examples: A, j, z. |
| digit | A character in the ASCII range 0-9. Examples: 0, 5, 9. |
| symbol | A non-letter, non-digit, non-space, non-newline character. This includes emojis, symbols, punctuation, and most whitespace. Examples: ,, %, #, 😊, 貓, \t. |
| line | A literal newline character (\n). |
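
To make the state-transition idea more concrete, here is a simplified sketch that counts only words and sentences using the separators listed above. It is not the legacy implementation itself: it assumes characters that are neither letters, digits, nor separators (commas, for example) leave the current state untouched, and `countWordsAndSentences` is a name invented for this example:

```typescript
function countWordsAndSentences(text: string): { words: number; sentences: number } {
  let words = 0;
  let sentences = 0;
  let inWord = false; // currently inside a run of ASCII letters/digits
  let wordsInSentence = 0; // words seen since the last end mark

  const closeWord = (): void => {
    if (inWord) {
      words += 1;
      wordsInSentence += 1;
      inWord = false;
    }
  };
  const closeSentence = (): void => {
    if (wordsInSentence > 0) {
      sentences += 1;
      wordsInSentence = 0;
    }
  };

  for (const ch of text) {
    if (/[A-Za-z0-9]/.test(ch)) {
      inWord = true; // letter or digit: enter or stay in a word
    } else if (ch === " " || ch === "\n") {
      closeWord(); // space or newline ends a word
    } else if (ch === "." || ch === "!" || ch === "?") {
      closeWord(); // an end mark ends a word...
      closeSentence(); // ...and a sentence, if it contains at least one word
    }
    // Any other character leaves the state unchanged in this sketch.
  }

  // End of input acts as a separator for unterminated words and sentences.
  closeWord();
  closeSentence();
  return { words, sentences };
}

console.log(countWordsAndSentences("Something")); // { words: 1, sentences: 1 }
console.log(countWordsAndSentences("20 93.")); // { words: 2, sentences: 1 }
```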
Need help? See the support resources for information on how to:
- request features
- report bugs
- ask questions
- report security vulnerabilities
Want to help out? Pull requests are welcome for:
- feature implementations
- bug fixes
- translations
- documentation
- tests
See the contribution guide for more details.
Copyright © 2025 John Goodliff (@twocaretcat).
This project is licensed under the MIT license. See the license for more details.
Other projects you might like:
- 👤 Tally Chrome Extension: A Chrome extension to easily count the number of words, characters, and paragraphs on any site
Notable projects that depend on this one:
- 👤 Tally: A free online tool to count the number of characters, words, paragraphs, and lines in your text. Tally uses this library to compute counts
Similar projects you might want to use instead:
- 🌐 Alfaaz: An alternative multilingual word counting library with fewer features, but faster execution
Find this project useful? Sponsoring me will help me cover costs and commit more time to open-source.
If you can't donate but still want to contribute, don't worry. There are many other ways to help out, like:
- 📢 reporting (submitting feature requests & bug reports)
- 👨‍💻 coding (implementing features & fixing bugs)
- 📝 writing (documenting & translating)
- 💬 spreading the word
- ⭐ starring the project
I appreciate the support!