feat(js-api): add spanInBytesToSpanInCodeUnits helper function #8944
ash1day wants to merge 2 commits into biomejs:main
Add a utility function to convert byte-based spans from Biome diagnostics
to UTF-16 code unit spans for correct use with JavaScript string.slice().
Biome internally uses UTF-8 byte offsets for spans, but JavaScript strings
use UTF-16 code units. This causes incorrect text extraction when the
content contains non-ASCII characters.
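To make the mismatch concrete, here is a small sketch. The sample string and variable names are made up for illustration, and it assumes a runtime that provides the standard `TextEncoder`. It computes the UTF-8 byte span of a word and then shows what happens when that span is passed straight to `slice()`.

```ts
// Illustrative only: `sample` and the variable names are hypothetical.
const encoder = new TextEncoder();
const sample = "héllo 😀 world";

// Where "world" starts, measured in UTF-16 code units.
const codeUnitStart = sample.indexOf("world");

// The same position measured in UTF-8 bytes: encode the prefix and count bytes.
// "é" takes 2 bytes and "😀" takes 4, so the byte offset runs ahead.
const byteStart = encoder.encode(sample.slice(0, codeUnitStart)).length;
const byteEnd = byteStart + encoder.encode("world").length;

// Using the byte span directly as slice() indices cuts the wrong text.
const misread = sample.slice(byteStart, byteEnd);

console.log({ codeUnitStart, byteStart, byteEnd, misread });
```

The further the span sits behind non-ASCII characters, the larger the drift between the two offsets becomes.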
The new function allows users to correctly extract text from diagnostics:
```ts
const [start, end] = spanInBytesToSpanInCodeUnits(
  diagnostic.location.span,
  content,
);
const text = content.slice(start, end); // Correct!
```
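For readers curious how such a conversion can work, the following is a minimal sketch, not the implementation in this PR. The name `byteSpanToCodeUnitSpan` and the `Span` type are hypothetical. It walks the string once, counting how many UTF-8 bytes each UTF-16 code unit (or surrogate pair) would occupy, and records the code unit index at which each byte offset is reached.

```ts
// Hypothetical sketch; names and behavior are assumptions, not Biome's API.
type Span = [number, number];

function byteSpanToCodeUnitSpan(span: Span, text: string): Span {
  const byteStart = span[0];
  const byteEnd = span[1];
  let start = -1;
  let end = -1;
  let bytes = 0; // UTF-8 bytes consumed so far
  let i = 0; // UTF-16 code units consumed so far

  while (i <= text.length) {
    if (start < 0 && bytes >= byteStart) start = i;
    if (end < 0 && bytes >= byteEnd) {
      end = i;
      break;
    }
    if (i === text.length) break;

    const unit = text.charCodeAt(i);
    const next = i + 1 < text.length ? text.charCodeAt(i + 1) : 0;
    if (unit < 0x80) {
      bytes += 1; // ASCII
      i += 1;
    } else if (unit < 0x800) {
      bytes += 2; // e.g. Latin letters with diacritics
      i += 1;
    } else if (unit >= 0xd800 && unit <= 0xdbff && next >= 0xdc00 && next <= 0xdfff) {
      bytes += 4; // surrogate pair: one code point, two code units
      i += 2;
    } else {
      bytes += 3; // rest of the BMP; lone surrogates are also counted as 3 bytes
      i += 1;
    }
  }

  // Offsets past the end of the text are clamped to the text length.
  if (start < 0) start = text.length;
  if (end < 0) end = text.length;
  return [start, end];
}

const text = "héllo 😀 world";
const [from, to] = byteSpanToCodeUnitSpan([12, 17], text);
console.log(text.slice(from, to));
```

A single pass avoids encoding the whole string for every span. If a byte offset falls in the middle of a multi-byte character, this sketch rounds it up to the next character boundary, which is a design choice of the sketch rather than documented behavior of the helper.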
🦋 Changeset detected. Latest commit: c53e4e2. The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package.
Summary
Fixes #4035
Biome diagnostics return spans as UTF-8 byte offsets, but JavaScript strings use UTF-16 code units. This causes `string.slice()` to extract incorrect text when content contains non-ASCII characters.
This PR adds a `spanInBytesToSpanInCodeUnits` helper function (ported from the playground, as suggested by @siketyan) to correctly convert byte spans to code unit spans.
Test Plan
Added 5 tests in `tests/spanConversion.test.ts`.
Docs
n/a