feat(js-api): add spanInBytesToSpanInCodeUnits helper function #8944

Open
ash1day wants to merge 2 commits into biomejs:main from ash1day:fix/4035-js-api-utf16-span

ash1day commented Feb 2, 2026

This PR was developed with assistance from Claude Code and Codex.

Summary

Fixes #4035

Biome diagnostics return spans as UTF-8 byte offsets, but JavaScript strings use UTF-16 code units. This causes string.slice() to extract incorrect text when content contains non-ASCII characters.
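To make the mismatch concrete, here is a small sketch using only the standard `TextEncoder` and `TextDecoder` globals. The input string and the byte span are illustrative assumptions, not output captured from Biome.

```ts
// Illustrative input: "ç" takes 2 bytes in UTF-8 but 1 UTF-16 code unit.
const sourceText = "ç; debugger";
const utf8Bytes = new TextEncoder().encode(sourceText);

// Assumed byte span of "debugger", as a UTF-8 based tool would report it.
const byteSpan: [number, number] = [4, 12];

// Slicing the UTF-8 bytes with the span yields the intended text.
const fromBytes = new TextDecoder().decode(
    utf8Bytes.slice(byteSpan[0], byteSpan[1]),
);

// Slicing the JavaScript string with the same numbers starts one code unit late.
const fromString = sourceText.slice(byteSpan[0], byteSpan[1]);

console.log(utf8Bytes.length, sourceText.length); // 12 11
console.log(fromBytes); // "debugger"
console.log(fromString); // "ebugger"
```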

This PR adds a spanInBytesToSpanInCodeUnits helper function (ported from the playground, as suggested by @siketyan) to correctly convert byte spans to code unit spans.
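For readers unfamiliar with the technique, the following is a minimal sketch of one way such a conversion can work. It is not the code from this PR or from the playground: `byteSpanToCodeUnitSpan`, `isHighSurrogate`, `isLowSurrogate`, and `utf8ByteLength` are hypothetical names, and the sketch assumes that an unpaired surrogate was encoded as U+FFFD (3 bytes), which is what `TextEncoder` does.

```ts
// Hypothetical helpers for illustration; not the implementation in this PR.
function isHighSurrogate(unit: number): boolean {
    return unit >= 0xd800 && unit <= 0xdbff;
}

function isLowSurrogate(unit: number): boolean {
    return unit >= 0xdc00 && unit <= 0xdfff;
}

// UTF-8 length of a single BMP code unit. Lone surrogates fall in the
// 3-byte range, matching the U+FFFD replacement assumed above.
function utf8ByteLength(unit: number): number {
    if (unit <= 0x7f) return 1;
    if (unit <= 0x7ff) return 2;
    return 3;
}

function byteSpanToCodeUnitSpan(
    span: readonly [number, number],
    text: string,
): [number, number] {
    const [startByte, endByte] = span;
    // Offsets past the end of the text clamp to text.length.
    let start = text.length;
    let end = text.length;
    let startFound = false;
    let bytes = 0;

    for (let i = 0; i < text.length; ) {
        if (!startFound && bytes >= startByte) {
            start = i;
            startFound = true;
        }
        if (bytes >= endByte) {
            end = i;
            break;
        }
        const unit = text.charCodeAt(i);
        const paired =
            isHighSurrogate(unit) &&
            i + 1 < text.length &&
            isLowSurrogate(text.charCodeAt(i + 1));
        // A valid surrogate pair is one code point: 4 bytes, 2 code units.
        bytes += paired ? 4 : utf8ByteLength(unit);
        i += paired ? 2 : 1;
    }
    return [start, end];
}

const sample = "a😀b";
const [from, to] = byteSpanToCodeUnitSpan([1, 5], sample);
console.log(sample.slice(from, to)); // "😀"
```

The single pass keeps the cost linear in the length of the text. A caller converting many spans for the same content might prefer to precompute a byte-to-code-unit table instead.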

Test Plan

Added 5 tests in tests/spanConversion.test.ts:

  • Non-ASCII characters (ç)
  • ASCII-only content
  • Emoji (surrogate pairs)
  • Mixed multi-byte characters
  • Unpaired surrogates
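
As a rough guide to why these categories matter, the sketch below compares UTF-16 code unit counts with UTF-8 byte counts for one sample string per category. The sample strings are assumptions chosen for illustration and are not taken from tests/spanConversion.test.ts.

```ts
// Hypothetical sample inputs, one per category in the list above.
const samples: Array<[string, string]> = [
    ["non-ASCII", "ç"],
    ["ASCII-only", "a"],
    ["emoji", "😀"],
    ["mixed", "aç😀"],
    ["unpaired surrogate", "\uD83D"],
];

const encoder = new TextEncoder();

// TextEncoder replaces an unpaired surrogate with U+FFFD, which is 3 bytes.
const sizes = samples.map(([label, text]) => ({
    label,
    codeUnits: text.length,
    utf8Bytes: encoder.encode(text).length,
}));

for (const { label, codeUnits, utf8Bytes } of sizes) {
    console.log(`${label}: ${codeUnits} code unit(s), ${utf8Bytes} byte(s)`);
}
// non-ASCII: 1 code unit(s), 2 byte(s)
// ASCII-only: 1 code unit(s), 1 byte(s)
// emoji: 2 code unit(s), 4 byte(s)
// mixed: 4 code unit(s), 7 byte(s)
// unpaired surrogate: 1 code unit(s), 3 byte(s)
```

Only the ASCII-only case has equal counts, which is why byte offsets happen to work there and fail everywhere else.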

Docs

n/a

Add a utility function to convert byte-based spans from Biome diagnostics
to UTF-16 code unit spans for correct use with JavaScript string.slice().

Biome internally uses UTF-8 byte offsets for spans, but JavaScript strings
use UTF-16 code units. This causes incorrect text extraction when the
content contains non-ASCII characters.

The new function allows users to correctly extract text from diagnostics:

```ts
const [start, end] = spanInBytesToSpanInCodeUnits(
    diagnostic.location.span,
    content
);
const text = content.slice(start, end); // Correct!
```

changeset-bot bot commented Feb 2, 2026

🦋 Changeset detected

Latest commit: c53e4e2

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package

| Name | Type |
| --- | --- |
| @biomejs/js-api | Minor |

ash1day marked this pull request as ready for review February 3, 2026 12:42

coderabbitai bot commented Feb 3, 2026

Walkthrough

A new helper function spanInBytesToSpanInCodeUnits has been added to the @biomejs/js-api package. This function converts byte-based span positions (utilised internally by Biome diagnostics) to UTF-16 code unit spans compatible with JavaScript string operations. The implementation includes utilities for detecting surrogate pairs and calculating UTF-8 byte lengths. Supporting test coverage validates the conversion across various content types including non-ASCII characters, emoji, and mixed multi-byte sequences.

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and concisely describes the main change: adding the spanInBytesToSpanInCodeUnits helper function to the js-api package. |
| Description check | ✅ Passed | The description is directly related to the changeset, explaining the UTF-8 to UTF-16 conversion issue, the solution implemented, and test coverage added. |
| Linked Issues check | ✅ Passed | The PR successfully addresses issue #4035 by implementing a conversion function that transforms UTF-8 byte spans to UTF-16 code unit spans, enabling correct string.slice() usage for non-ASCII content. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to implementing the spanInBytesToSpanInCodeUnits helper function and its tests, with no unrelated modifications. |

Development

Successfully merging this pull request may close these issues.

🐛 @biome/js-api diagnostic generate incorrect span range when content has non-ASCII characters
