feat(js-api): add spanInBytesToSpanInCodeUnits helper function by ash1day · Pull Request #8944 · biomejs/biome

ash1day · 2026-02-02T06:40:33Z

This PR was developed with assistance from Claude Code and Codex.

Summary

Biome diagnostics return spans as UTF-8 byte offsets, but JavaScript strings use UTF-16 code units. This causes string.slice() to extract incorrect text when content contains non-ASCII characters.
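To make the mismatch concrete, here is a minimal sketch. The sample content and variable names are made up for illustration, and the byte offsets are computed with the standard `TextEncoder` rather than taken from a real Biome diagnostic.

```ts
// Hypothetical sample: "ç" takes 2 bytes in UTF-8 but only 1 UTF-16 code unit.
const content = "const ç = 1; debugger;";
const target = "debugger";

const encoder = new TextEncoder();

// Position of the target in UTF-16 code units (what string.slice() expects).
const unitStart = content.indexOf(target);

// Position of the same text in UTF-8 bytes (what a byte-based span would carry).
const byteStart = encoder.encode(content.slice(0, unitStart)).length;
const byteEnd = byteStart + encoder.encode(target).length;

// Feeding byte offsets straight into slice() shifts the window by one code unit.
const wrong = content.slice(byteStart, byteEnd);
const right = content.slice(unitStart, unitStart + target.length);

console.log({ unitStart, byteStart, byteEnd, wrong, right });
```

Every multi-byte character before the span widens the gap between the two offsets, so the error grows with the amount of non-ASCII text that precedes the diagnostic.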

This PR adds a spanInBytesToSpanInCodeUnits helper function (ported from the playground, as suggested by @siketyan) to correctly convert byte spans to code unit spans.

Test Plan

Added 5 tests in tests/spanConversion.test.ts:

Non-ASCII characters (ç)
ASCII-only content
Emoji (surrogate pairs)
Mixed multi-byte characters
Unpaired surrogates
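The categories above differ in how many UTF-8 bytes and UTF-16 code units each character occupies. The sketch below prints both widths for one hypothetical sample per category; it is not taken from tests/spanConversion.test.ts. Note that `TextEncoder` replaces an unpaired surrogate with U+FFFD, and whether Biome's own byte offsets treat that case the same way is an assumption here.

```ts
const encoder = new TextEncoder();

// Hypothetical samples, loosely mirroring the categories in the test plan.
const samples: Array<[string, string]> = [
  ["ASCII", "a"],
  ["Non-ASCII (ç)", "ç"],
  ["CJK", "日"],
  ["Emoji (surrogate pair)", "😀"],
  ["Unpaired surrogate", "\uD83D"],
];

// Width of a string in UTF-8 bytes and in UTF-16 code units.
function widths(text: string): { bytes: number; codeUnits: number } {
  return { bytes: encoder.encode(text).length, codeUnits: text.length };
}

for (const [label, text] of samples) {
  const { bytes, codeUnits } = widths(text);
  console.log(`${label}: ${bytes} byte(s), ${codeUnits} code unit(s)`);
}
```

Only the ASCII case has equal widths, which is why ASCII-only content hides the bug.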

Docs

n/a

Add a utility function to convert byte-based spans from Biome diagnostics to UTF-16 code unit spans for correct use with JavaScript string.slice().

Biome internally uses UTF-8 byte offsets for spans, but JavaScript strings use UTF-16 code units. This causes incorrect text extraction when the content contains non-ASCII characters.

The new function allows users to correctly extract text from diagnostics:

```ts
const [start, end] = spanInBytesToSpanInCodeUnits(
  diagnostic.location.span,
  content
);
const text = content.slice(start, end); // Correct!
```

changeset-bot · 2026-02-02T06:40:38Z

🦋 Changeset detected

Latest commit: c53e4e2

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package

Name	Type
@biomejs/js-api	Minor


coderabbitai · 2026-02-03T12:46:28Z

Walkthrough

A new helper function spanInBytesToSpanInCodeUnits has been added to the @biomejs/js-api package. This function converts byte-based span positions (utilised internally by Biome diagnostics) to UTF-16 code unit spans compatible with JavaScript string operations. The implementation includes utilities for detecting surrogate pairs and calculating UTF-8 byte lengths. Supporting test coverage validates the conversion across various content types including non-ASCII characters, emoji, and mixed multi-byte sequences.
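The walkthrough mentions surrogate-pair detection and UTF-8 byte-length calculation. One way such a conversion could be put together is sketched below. This is not the code from the PR: every function name here is hypothetical, and the choices to count an unpaired surrogate as 3 bytes and to round an offset that lands mid-character up to the next boundary are assumptions.

```ts
type Span = [number, number];

function isHighSurrogate(unit: number): boolean {
  return unit >= 0xd800 && unit <= 0xdbff;
}

function isLowSurrogate(unit: number): boolean {
  return unit >= 0xdc00 && unit <= 0xdfff;
}

// UTF-8 byte length of a single code unit in the Basic Multilingual Plane.
// Unpaired surrogates land in the 3-byte range (assumption, see lead-in).
function utf8ByteLength(unit: number): number {
  if (unit <= 0x7f) return 1;
  if (unit <= 0x7ff) return 2;
  return 3;
}

// Walk the string, accumulating UTF-8 bytes until the byte offset is reached.
function byteOffsetToCodeUnitIndex(content: string, byteOffset: number): number {
  let bytes = 0;
  let i = 0;
  while (i < content.length && bytes < byteOffset) {
    const unit = content.charCodeAt(i);
    const next = i + 1 < content.length ? content.charCodeAt(i + 1) : 0;
    if (isHighSurrogate(unit) && isLowSurrogate(next)) {
      // A valid surrogate pair is one code point: 4 bytes, 2 code units.
      bytes += 4;
      i += 2;
    } else {
      bytes += utf8ByteLength(unit);
      i += 1;
    }
  }
  return i;
}

// Hypothetical stand-in for the helper added in this PR.
function spanInBytesToSpanInCodeUnitsSketch(span: Span, content: string): Span {
  return [
    byteOffsetToCodeUnitIndex(content, span[0]),
    byteOffsetToCodeUnitIndex(content, span[1]),
  ];
}

const sample = "a😀b";
const [start, end] = spanInBytesToSpanInCodeUnitsSketch([5, 6], sample);
console.log(sample.slice(start, end));
```

Converting the two offsets independently walks the string twice; a single pass that records both indices would avoid that at the cost of slightly more bookkeeping.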

🚥 Pre-merge checks | ✅ 4

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and concisely describes the main change: adding the spanInBytesToSpanInCodeUnits helper function to the js-api package.
Description check	✅ Passed	The description is directly related to the changeset, explaining the UTF-8 to UTF-16 conversion issue, the solution implemented, and test coverage added.
Linked Issues check	✅ Passed	The PR successfully addresses issue `#4035` by implementing a conversion function that transforms UTF-8 byte spans to UTF-16 code unit spans, enabling correct string.slice() usage for non-ASCII content.
Out of Scope Changes check	✅ Passed	All changes are directly scoped to implementing the spanInBytesToSpanInCodeUnits helper function and its tests, with no unrelated modifications.


ash1day added 2 commits February 2, 2026 03:12

fix: address review feedback for unpaired surrogates and test stability

c53e4e2

ash1day marked this pull request as ready for review February 3, 2026 12:42

ash1day wants to merge 2 commits into biomejs:main from ash1day:fix/4035-js-api-utf16-span
