perf(tokenizer): use String.fromCharCode for BMP characters #1680

Open
TrevorBurnham wants to merge 1 commit into inikulin:master from TrevorBurnham:perf/use-fromcharcode-for-bmp-characters

TrevorBurnham commented Jan 24, 2026

This PR optimizes the tokenizer's character emission by using String.fromCharCode() instead of String.fromCodePoint() for BMP characters (code points < 0x10000).

Motivation

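String.fromCharCode() is typically faster than String.fromCodePoint() because it does less work: it writes each argument as a single UTF-16 code unit, while fromCodePoint() must range-check its argument and encode code points above 0xFFFF as surrogate pairs. For code points below 0x10000 the two return identical strings. Since the vast majority of characters in HTML content fall within the BMP (Basic Multilingual Plane), this optimization targets the common case while maintaining correctness for characters outside the BMP.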

Changes

  • Modified _emitCodePoint() in the tokenizer to use String.fromCharCode() for code points < 0x10000
  • Falls back to String.fromCodePoint() for characters outside the BMP (rare in HTML)
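
The branch described in the list above can be sketched roughly as follows. This is a minimal illustration, not the tokenizer's actual source: `codePointToString` and `BMP_LIMIT` are hypothetical names standing in for the conversion done inside `_emitCodePoint()`.

```typescript
// Hypothetical stand-in for the conversion inside _emitCodePoint();
// the names here are assumptions, not parse5 identifiers.
const BMP_LIMIT = 0x10000;

function codePointToString(cp: number): string {
    // A BMP code point fits in one UTF-16 code unit, so fromCharCode suffices.
    // A supplementary code point needs a surrogate pair, which fromCodePoint builds.
    return cp < BMP_LIMIT ? String.fromCharCode(cp) : String.fromCodePoint(cp);
}

const letter = codePointToString(0x41);    // fast path: "A"
const emoji = codePointToString(0x1f600);  // fallback: U+1F600, two code units
```

The check is a single integer comparison, so the fast path adds very little overhead for the common case.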

Benchmark Results

Results from running npm run bench-perf (multiple runs averaged):

Benchmark   Result
MICRO       ~1-5% faster
PAGES       ~1-3% faster
HUGE        Within noise (~1%)
STREAM      Variable (high variance)

Testing

  • All 19,325 existing tests pass
  • No changes to public API or behavior
  • Correctness maintained for all Unicode code points
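
One way to sanity-check the last point independently of the test suite is an exhaustive comparison over every code point. The sketch below is an assumption about how such a check could look, not part of the PR; `emitFast` and `countMismatches` are hypothetical names.

```typescript
// Hypothetical re-statement of the optimized conversion (not parse5 source).
function emitFast(cp: number): string {
    return cp < 0x10000 ? String.fromCharCode(cp) : String.fromCodePoint(cp);
}

// Compare against String.fromCodePoint for every code point U+0000..U+10FFFF.
function countMismatches(): number {
    let mismatches = 0;
    for (let cp = 0; cp <= 0x10ffff; cp++) {
        if (emitFast(cp) !== String.fromCodePoint(cp)) mismatches++;
    }
    return mismatches;
}

const mismatches = countMismatches();

// Without the range guard, fromCharCode keeps only the low 16 bits,
// so U+1F600 would silently become U+F600.
const truncated = String.fromCharCode(0x1f600);
```

The second half shows why the `< 0x10000` guard is needed: calling fromCharCode() directly on a supplementary code point yields a different character instead of throwing.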

Use String.fromCharCode() instead of String.fromCodePoint() for BMP
characters (code points < 0x10000) in _emitCodePoint(). This provides
a small performance improvement since fromCharCode is simpler and
doesn't need to handle surrogate pairs.

Characters outside the BMP are rare in HTML content, so the fallback
to fromCodePoint is seldom needed.

43081j (Collaborator) left a comment

makes sense to me 👍

@fb55 can you also give this one a review? I'll run the benchmarks myself when I'm on my other machine too, just to confirm they agree
