perf(tokenizer): use String.fromCharCode for BMP characters by TrevorBurnham · Pull Request #1680 · inikulin/parse5

TrevorBurnham · 2026-01-24T21:13:53Z

This PR optimizes the tokenizer's character emission by using String.fromCharCode() instead of String.fromCodePoint() for BMP characters (code points < 0x10000).

Motivation

String.fromCharCode() is faster than String.fromCodePoint() because it does less work: it maps each argument directly to a single UTF-16 code unit and never has to validate the code point or build a surrogate pair. Since the vast majority of characters in HTML content fall within the BMP (Basic Multilingual Plane), this optimization targets the common case while maintaining correctness for characters outside the BMP.

Changes

Modified _emitCodePoint() in the tokenizer to use String.fromCharCode() for code points < 0x10000
Falls back to String.fromCodePoint() for characters outside the BMP (rare in HTML)
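
The branch described above can be sketched as a standalone function. This is an illustration only, not the PR's diff: `codePointToString` and `BMP_LIMIT` are hypothetical names, and the real `_emitCodePoint()` is a tokenizer method that does more than build a string.

```typescript
// Hypothetical helper mirroring the branch described above; not parse5's actual code.
const BMP_LIMIT = 0x10000;

function codePointToString(cp: number): string {
    // BMP code points fit in a single UTF-16 code unit, so fromCharCode is enough.
    // Code points at or above 0x10000 need a surrogate pair, which fromCodePoint builds.
    return cp < BMP_LIMIT ? String.fromCharCode(cp) : String.fromCodePoint(cp);
}

// One-code-unit results for BMP input, two code units for astral input.
const samples = [0x41, 0xe9, 0x4e2d, 0xffff, 0x10000, 0x1f600];
const lengths = samples.map((cp) => codePointToString(cp).length);
```

The fallback matters because String.fromCharCode() truncates its argument to 16 bits: passing 0x1F600 directly would yield the code unit 0xF600 rather than the intended character.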

Benchmark Results

Results from running npm run bench-perf (multiple runs averaged):

Benchmark	Result
MICRO	~1-5% faster
PAGES	~1-3% faster
HUGE	Within noise (~1%)
STREAM	Variable (high variance)

Testing

All 19,325 existing tests pass
No changes to public API or behavior
Correctness maintained for all Unicode code points
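
The correctness claim can be checked independently of the test suite with a brute-force comparison over the whole code point range. This is a sketch under the assumption that the change reduces to the branch below; `emitViaBranch` and `countMismatches` are hypothetical names, not part of parse5.

```typescript
// Hypothetical stand-in for the branch added in the PR; not parse5's actual code.
function emitViaBranch(cp: number): string {
    return cp < 0x10000 ? String.fromCharCode(cp) : String.fromCodePoint(cp);
}

// Compare against String.fromCodePoint for every code point from 0 to 0x10FFFF.
// Lone surrogates (0xD800-0xDFFF) are included: both functions return the same
// single code unit for them.
function countMismatches(): number {
    let mismatches = 0;
    for (let cp = 0; cp <= 0x10ffff; cp++) {
        if (emitViaBranch(cp) !== String.fromCodePoint(cp)) {
            mismatches++;
        }
    }
    return mismatches;
}

const mismatches = countMismatches();
```

A result of zero means the branch is observably equivalent to calling String.fromCodePoint() unconditionally.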

Use String.fromCharCode() instead of String.fromCodePoint() for BMP characters (code points < 0x10000) in _emitCodePoint(). This provides a small performance improvement since fromCharCode is simpler and doesn't need to handle surrogate pairs. Characters outside the BMP are rare in HTML content, so the fallback to fromCodePoint is seldom needed.

43081j

makes sense to me 👍

@fb55 can you also give this one a review? I'll run the benchmarks myself when I'm on my other machine too, just to confirm they agree

43081j approved these changes Jan 25, 2026

TrevorBurnham wants to merge 1 commit into inikulin:master from TrevorBurnham:perf/use-fromcharcode-for-bmp-characters