Skip to content

feat(NODE-7315): Use BSON ByteUtils instead of Nodejs Buffer#4840

Open
PavelSafronov wants to merge 31 commits intomongodb:mainfrom
PavelSafronov:NODE-7315
Open

feat(NODE-7315): Use BSON ByteUtils instead of Nodejs Buffer#4840
PavelSafronov wants to merge 31 commits intomongodb:mainfrom
PavelSafronov:NODE-7315

Conversation

@PavelSafronov
Copy link
Contributor

@PavelSafronov PavelSafronov commented Jan 14, 2026

Description

Summary of Changes

Use BSON ByteUtils for Buffer.* operations.
This change also removes the Buffer type from multiple internal types.

What is the motivation for this change?

Eliminate the driver's hard dependency on the Node.js Buffer global, making the codebase compatible with standard Web APIs (Uint8Array) and portable to environments like Cloudflare Workers, Deno, and the Browser.

This is the next step to implementing NODE-7300 Replace Buffer with Uint8Array,

Double check the following

  • Lint is passing (npm run check:lint)
  • Self-review completed using the steps outlined here
  • PR title follows the correct format: type(NODE-xxxx)[!]: description
    • Example: feat(NODE-1234)!: rewriting everything in coffeescript
  • Changes are covered by tests
  • New TODOs have a related JIRA ticket

1. using Uint8Array instead of Buffer in all internal code
2. enabling eslint rule to block Buffer use in src
3. using BSON's ByteUtils in places where we depended on Buffer operations
4. using ByteUtils.isUint8Array instead of Buffer.isBuffer
5. introduced writeInt32LE wrapper that uses the same order of variables, to avoid embarassing mistakes
@PavelSafronov PavelSafronov changed the title wip: using bson for buffer calls feat(NODE-7315): Use BSON ByteUtils instead of Nodejs Buffer Jan 16, 2026
@PavelSafronov PavelSafronov marked this pull request as ready for review January 21, 2026 19:10
@PavelSafronov PavelSafronov requested a review from a team as a code owner January 21, 2026 19:10
@baileympearson baileympearson self-assigned this Jan 22, 2026
@baileympearson baileympearson added the Primary Review In Review with primary reviewer, not yet ready for team's eyes label Jan 22, 2026
import * as process from 'process';

import { BSON, type Document, Int32, NumberUtils } from '../../bson';
import { BSON, type Document, fromUTF8, Int32, toUTF8 } from '../../bson';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we have two separate ways of importing ByteUtils:

  • importing the BSON object and calling BSON.ByteUtils.<method>
  • importing the wrapper function from the driver's bson.ts file

Can we consolidate on one or the other? One other option is to export ByteUtils and NumberUtils from bson.ts, and then use those directly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@PavelSafronov was this addressed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing Driver's ByteUtils, since it's just a proxy to BSON's ByteUtils.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@baileympearson , yup, addressed in latest version: Driver's ByteUtils are removed, since they were just a proxy for the BSON ByteUtils.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the ambiguity. What I actually meant is that we don't have consistency in how we import methods from bson.ts. Ex, from commands.ts:

import {
  allocateBuffer,
  allocateUnsafeBuffer,
  BSON,
  type BSONSerializeOptions,
  ByteUtils,
  concatBuffers,
  type Document,
  type Long,
  readInt32LE,
  writeInt32LE
} from '../bson';

We import ByteUtils and some free-functions as well (readInt32LE for example). I was suggesting we choose one or the other and use it consistently

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We import ByteUtils and some free-functions as well (readInt32LE for example). I was suggesting we choose one or the other and use it consistently

The specific reason for readInt32LE is that we want the behavior of NumberUtils.getInt32LE PLUS we want to validate the inputs, similarly to what Node's Buffer.readInt32LE is doing.
I think we need this validation logic, because that's the current behavior, and we check that behavior using an empty buffer: https://github.com/mongodb/node-mongodb-native/blob/main/test/unit/cmap/commands.test.ts#L124-L127

That being said, we can get rid of a number of forward-only exports in src/bson.ts like fromUTF8, and our code will just use ByteUtils or NumberUtils. Making that change seems like a good idea, the current approach of re-exporting these functions is unnecessary.

src/utils.ts Outdated
import {
allocateBuffer,
allocateUnsafeBuffer,
ByteUtils as BSONByteUtils,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still necessary, or could we just use ByteUtils now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing outdate ByteUtils and using only BSON's ByteUtils.

src/utils.ts Outdated
@@ -46,9 +55,7 @@ export type AnyOptions = Document;

export const ByteUtils = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove this now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be best, but it's public, wouldn't we be breaking users if we removed this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only items exported from index.ts and tagged as @public are a part of our public API.

src/bson.ts Outdated
export const getFloat64LE = BSON.NumberUtils.getFloat64LE;
export const getBigInt64LE = BSON.NumberUtils.getBigInt64LE;
export const toUTF8 = BSON.ByteUtils.toUTF8;
export const fromUTF8 = BSON.ByteUtils.fromUTF8;
Copy link
Contributor

@addaleax addaleax Jan 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it's too late to change but this PR made me realize that the names toUTF8 and fromUTF8 are reversed semantically – fromUTF8 encodes a string into UTF-8 and toUTF8 decodes a string from UTF-8 😢

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, great spotted! I feel like we can only change that in future major version, I will create ticket for this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ByteUtils is experimental - we can make changes outside of semver if we want to fix it now

Copy link
Member

@tadjik1 tadjik1 Jan 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the driver already rely on toUTF8 method (this one is old, fromUTF8 has been added just recently) , so if we change its' behaviour I believe it will break existing driver functionality, npm install will download newer version of bson.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nodejs nealjs return type
Buffer.from('...', 'utf8') ByteUtils.fromUTF8('...') Buffer
buffer.toString('utf8') ByteUtils.toUTF8(buffer) string

Not saying it's right or semantically makes sense, but this was the inspo for the naming. There are only two hard things in Computer Science: cache invalidation and naming things.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the Buffer.from() and buffer.toString() APIs are a bit weird in that they can either do conversion from/to character encodings (like UTF-8 or UTF-16) or from/to binary-to-text encodings (like Hex or Base64), since those "happen" to have the same signatures. The ByteUtils names make sense for the latter, not the former

Co-authored-by: Anna Henningsen <github@addaleax.net>
src/utils.ts Outdated
@@ -46,9 +55,7 @@ export type AnyOptions = Document;

export const ByteUtils = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only items exported from index.ts and tagged as @public are a part of our public API.

import * as process from 'process';

import { BSON, type Document, Int32, NumberUtils } from '../../bson';
import { BSON, type Document, fromUTF8, Int32, toUTF8 } from '../../bson';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@PavelSafronov was this addressed?

it('sets the length of the document sequence', function () {
// Bytes starting at index 1 is a 4 byte length.
expect(buffers[3].readInt32LE(1)).to.equal(25);
expect(readInt32LE(buffers[3], 1)).to.equal(25);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are deferring test changes to a future PR, why are these changes necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverting test changes, these are not necessary.

import * as process from 'process';

import { BSON, type Document, Int32, NumberUtils } from '../../bson';
import { BSON, type Document, fromUTF8, Int32, toUTF8 } from '../../bson';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the ambiguity. What I actually meant is that we don't have consistency in how we import methods from bson.ts. Ex, from commands.ts:

import {
  allocateBuffer,
  allocateUnsafeBuffer,
  BSON,
  type BSONSerializeOptions,
  ByteUtils,
  concatBuffers,
  type Document,
  type Long,
  readInt32LE,
  writeInt32LE
} from '../bson';

We import ByteUtils and some free-functions as well (readInt32LE for example). I was suggesting we choose one or the other and use it consistently

writeInt32LE(header, this.requestId, 4); // requestID
writeInt32LE(header, 0, 8); // responseTo
writeInt32LE(header, OP_MSG, 12); // opCode
writeInt32LE(header, flags, 16); // flags
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you meant to change Uint32LE -> Int32LE, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updating this code right now to call NumberUtils.setInt32LE directly, but setInt32LE is able to handle both signed and unsigned, so we can use the same method here for these calls.

In this particular spot, flags can be only a handful of values, so I verified that both our setInt32LE and Buffer.writeUInt32LE behave the same way when we pass in any of the possible flags.

import assert from 'node:assert';
import { Buffer } from 'node:buffer';

const setInt32LE = (destination: Uint8Array, offset: number, value: number): 4 => {
    destination[offset] = value;
    value >>>= 8;
    destination[offset + 1] = value;
    value >>>= 8;
    destination[offset + 2] = value;
    value >>>= 8;
    destination[offset + 3] = value;
    return 4;
}

const startingLength = 20;

const val1 = 1;
const val2 = 2;
const val3 = 1 << 16;
const val4 = val1 | val2 | val3;

const buff1 = Buffer.alloc(startingLength);
buff1.writeUint32LE(val1, 0);
buff1.writeUint32LE(val2, 4);
buff1.writeUint32LE(val3, 8);
buff1.writeUint32LE(val4, 12);
const str1 = buff1.toString('hex');
console.log(str1);

const buff2 = Buffer.alloc(startingLength);
setInt32LE(buff2, 0, val1);
setInt32LE(buff2, 4, val2);
setInt32LE(buff2, 8, val3);
setInt32LE(buff2, 12, val4);
const str2 = buff2.toString('hex');
console.log(str2);

assert(str1 === str2, 'Buffers do not match!');

const payloadType = this.data[this.index++];
if (payloadType === 0) {
const bsonSize = this.data.readUInt32LE(this.index);
const bsonSize = readInt32LE(this.data, this.index);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another Uint32LE -> Int32LE - let's do careful audit of this PR just to confirm we're reading the correct integer values.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is updated: readUint32LE has been added to src/bson.ts and is now being called in this spot.

I'm auditing the PR now, will post results.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went through the PR line by line, looks good.

Also spoke with Bailey about his Uint32/Int32 concerns. We're going to add comments to the updated code in this file, to link to the relevant specs and explain why we are using the types we're using.

}

return uuidV4().toString('hex');
return BSON.ByteUtils.toHex(uuidV4());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return BSON.ByteUtils.toHex(uuidV4());
return ByteUtils.toHex(uuidV4());

just for consistency

if (appName.length > 0) {
const name =
Buffer.byteLength(appName, 'utf8') <= 128
BSON.ByteUtils.utf8ByteLength(appName) <= 128
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
BSON.ByteUtils.utf8ByteLength(appName) <= 128
ByteUtils.utf8ByteLength(appName) <= 128

- replace `BSON.x.y` code with `x.y`
- added readUint32LE wrapper around NumberUtils.getUint32LE
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Primary Review In Review with primary reviewer, not yet ready for team's eyes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants