feat(NODE-7315): Use BSON ByteUtils instead of Nodejs Buffer#4840
feat(NODE-7315): Use BSON ByteUtils instead of Nodejs Buffer#4840PavelSafronov wants to merge 31 commits intomongodb:mainfrom
Conversation
1. using Uint8Array instead of Buffer in all internal code 2. enabling eslint rule to block Buffer use in src 3. using BSON's ByteUtils in places where we depended on Buffer operations 4. using ByteUtils.isUint8Array instead of Buffer.isBuffer 5. introduced writeInt32LE wrapper that uses the same order of variables, to avoid embarassing mistakes
| import * as process from 'process'; | ||
|
|
||
| import { BSON, type Document, Int32, NumberUtils } from '../../bson'; | ||
| import { BSON, type Document, fromUTF8, Int32, toUTF8 } from '../../bson'; |
There was a problem hiding this comment.
It looks like we have two separate ways of importing ByteUtils:
- importing the
BSONobject and callingBSON.ByteUtils.<method> - importing the wrapper function from the driver's
bson.tsfile
Can we consolidate on one or the other? One other option is to export ByteUtils and NumberUtils from bson.ts, and then use those directly.
There was a problem hiding this comment.
Removing Driver's ByteUtils, since it's just a proxy to BSON's ByteUtils.
There was a problem hiding this comment.
@baileympearson , yup, addressed in latest version: Driver's ByteUtils are removed, since they were just a proxy for the BSON ByteUtils.
There was a problem hiding this comment.
Sorry for the ambiguity. What I actually meant is that we don't have consistency in how we import methods from bson.ts. Ex, from commands.ts:
import {
allocateBuffer,
allocateUnsafeBuffer,
BSON,
type BSONSerializeOptions,
ByteUtils,
concatBuffers,
type Document,
type Long,
readInt32LE,
writeInt32LE
} from '../bson';We import ByteUtils and some free-functions as well (readInt32LE for example). I was suggesting we choose one or the other and use it consistently
There was a problem hiding this comment.
We import ByteUtils and some free-functions as well (readInt32LE for example). I was suggesting we choose one or the other and use it consistently
The specific reason for readInt32LE is that we want the behavior of NumberUtils.getInt32LE PLUS we want to validate the inputs, similarly to what Node's Buffer.readInt32LE is doing.
I think we need this validation logic, because that's the current behavior, and we check that behavior using an empty buffer: https://github.com/mongodb/node-mongodb-native/blob/main/test/unit/cmap/commands.test.ts#L124-L127
That being said, we can get rid of a number of forward-only exports in src/bson.ts like fromUTF8, and our code will just use ByteUtils or NumberUtils. Making that change seems like a good idea, the current approach of re-exporting these functions is unnecessary.
src/utils.ts
Outdated
| import { | ||
| allocateBuffer, | ||
| allocateUnsafeBuffer, | ||
| ByteUtils as BSONByteUtils, |
There was a problem hiding this comment.
Still necessary, or could we just use ByteUtils now?
There was a problem hiding this comment.
Removing outdate ByteUtils and using only BSON's ByteUtils.
src/utils.ts
Outdated
| @@ -46,9 +55,7 @@ export type AnyOptions = Document; | |||
|
|
|||
| export const ByteUtils = { | |||
There was a problem hiding this comment.
Can we remove this now?
There was a problem hiding this comment.
That would be best, but it's public, wouldn't we be breaking users if we removed this?
There was a problem hiding this comment.
Only items exported from index.ts and tagged as @public are a part of our public API.
src/bson.ts
Outdated
| export const getFloat64LE = BSON.NumberUtils.getFloat64LE; | ||
| export const getBigInt64LE = BSON.NumberUtils.getBigInt64LE; | ||
| export const toUTF8 = BSON.ByteUtils.toUTF8; | ||
| export const fromUTF8 = BSON.ByteUtils.fromUTF8; |
There was a problem hiding this comment.
I guess it's too late to change but this PR made me realize that the names toUTF8 and fromUTF8 are reversed semantically – fromUTF8 encodes a string into UTF-8 and toUTF8 decodes a string from UTF-8 😢
There was a problem hiding this comment.
oh, great spotted! I feel like we can only change that in future major version, I will create ticket for this
There was a problem hiding this comment.
ByteUtils is experimental - we can make changes outside of semver if we want to fix it now
There was a problem hiding this comment.
As the driver already rely on toUTF8 method (this one is old, fromUTF8 has been added just recently) , so if we change its' behaviour I believe it will break existing driver functionality, npm install will download newer version of bson.
There was a problem hiding this comment.
| nodejs | nealjs | return type |
|---|---|---|
Buffer.from('...', 'utf8') |
ByteUtils.fromUTF8('...') |
Buffer |
buffer.toString('utf8') |
ByteUtils.toUTF8(buffer) |
string |
Not saying it's right or semantically makes sense, but this was the inspo for the naming. There are only two hard things in Computer Science: cache invalidation and naming things.
There was a problem hiding this comment.
Yeah, the Buffer.from() and buffer.toString() APIs are a bit weird in that they can either do conversion from/to character encodings (like UTF-8 or UTF-16) or from/to binary-to-text encodings (like Hex or Base64), since those "happen" to have the same signatures. The ByteUtils names make sense for the latter, not the former
Co-authored-by: Anna Henningsen <github@addaleax.net>
src/utils.ts
Outdated
| @@ -46,9 +55,7 @@ export type AnyOptions = Document; | |||
|
|
|||
| export const ByteUtils = { | |||
There was a problem hiding this comment.
Only items exported from index.ts and tagged as @public are a part of our public API.
| import * as process from 'process'; | ||
|
|
||
| import { BSON, type Document, Int32, NumberUtils } from '../../bson'; | ||
| import { BSON, type Document, fromUTF8, Int32, toUTF8 } from '../../bson'; |
test/unit/cmap/commands.test.ts
Outdated
| it('sets the length of the document sequence', function () { | ||
| // Bytes starting at index 1 is a 4 byte length. | ||
| expect(buffers[3].readInt32LE(1)).to.equal(25); | ||
| expect(readInt32LE(buffers[3], 1)).to.equal(25); |
There was a problem hiding this comment.
If we are deferring test changes to a future PR, why are these changes necessary?
There was a problem hiding this comment.
Reverting test changes, these are not necessary.
| import * as process from 'process'; | ||
|
|
||
| import { BSON, type Document, Int32, NumberUtils } from '../../bson'; | ||
| import { BSON, type Document, fromUTF8, Int32, toUTF8 } from '../../bson'; |
There was a problem hiding this comment.
Sorry for the ambiguity. What I actually meant is that we don't have consistency in how we import methods from bson.ts. Ex, from commands.ts:
import {
allocateBuffer,
allocateUnsafeBuffer,
BSON,
type BSONSerializeOptions,
ByteUtils,
concatBuffers,
type Document,
type Long,
readInt32LE,
writeInt32LE
} from '../bson';We import ByteUtils and some free-functions as well (readInt32LE for example). I was suggesting we choose one or the other and use it consistently
src/cmap/commands.ts
Outdated
| writeInt32LE(header, this.requestId, 4); // requestID | ||
| writeInt32LE(header, 0, 8); // responseTo | ||
| writeInt32LE(header, OP_MSG, 12); // opCode | ||
| writeInt32LE(header, flags, 16); // flags |
There was a problem hiding this comment.
I don't think you meant to change Uint32LE -> Int32LE, right?
There was a problem hiding this comment.
Updating this code right now to call NumberUtils.setInt32LE directly, but setInt32LE is able to handle both signed and unsigned, so we can use the same method here for these calls.
In this particular spot, flags can be only a handful of values, so I verified that both our setInt32LE and Buffer.writeUInt32LE behave the same way when we pass in any of the possible flags.
import assert from 'node:assert';
import { Buffer } from 'node:buffer';
const setInt32LE = (destination: Uint8Array, offset: number, value: number): 4 => {
destination[offset] = value;
value >>>= 8;
destination[offset + 1] = value;
value >>>= 8;
destination[offset + 2] = value;
value >>>= 8;
destination[offset + 3] = value;
return 4;
}
const startingLength = 20;
const val1 = 1;
const val2 = 2;
const val3 = 1 << 16;
const val4 = val1 | val2 | val3;
const buff1 = Buffer.alloc(startingLength);
buff1.writeUint32LE(val1, 0);
buff1.writeUint32LE(val2, 4);
buff1.writeUint32LE(val3, 8);
buff1.writeUint32LE(val4, 12);
const str1 = buff1.toString('hex');
console.log(str1);
const buff2 = Buffer.alloc(startingLength);
setInt32LE(buff2, 0, val1);
setInt32LE(buff2, 4, val2);
setInt32LE(buff2, 8, val3);
setInt32LE(buff2, 12, val4);
const str2 = buff2.toString('hex');
console.log(str2);
assert(str1 === str2, 'Buffers do not match!');
| const payloadType = this.data[this.index++]; | ||
| if (payloadType === 0) { | ||
| const bsonSize = this.data.readUInt32LE(this.index); | ||
| const bsonSize = readInt32LE(this.data, this.index); |
There was a problem hiding this comment.
Another Uint32LE -> Int32LE - let's do careful audit of this PR just to confirm we're reading the correct integer values.
There was a problem hiding this comment.
This is updated: readUint32LE has been added to src/bson.ts and is now being called in this spot.
I'm auditing the PR now, will post results.
There was a problem hiding this comment.
Went through the PR line by line, looks good.
Also spoke with Bailey about his Uint32/Int32 concerns. We're going to add comments to the updated code in this file, to link to the relevant specs and explain why we are using the types we're using.
src/cmap/connection.ts
Outdated
| } | ||
|
|
||
| return uuidV4().toString('hex'); | ||
| return BSON.ByteUtils.toHex(uuidV4()); |
There was a problem hiding this comment.
| return BSON.ByteUtils.toHex(uuidV4()); | |
| return ByteUtils.toHex(uuidV4()); |
just for consistency
| if (appName.length > 0) { | ||
| const name = | ||
| Buffer.byteLength(appName, 'utf8') <= 128 | ||
| BSON.ByteUtils.utf8ByteLength(appName) <= 128 |
There was a problem hiding this comment.
| BSON.ByteUtils.utf8ByteLength(appName) <= 128 | |
| ByteUtils.utf8ByteLength(appName) <= 128 |
- replace `BSON.x.y` code with `x.y` - added readUint32LE wrapper around NumberUtils.getUint32LE
Description
Summary of Changes
Use BSON ByteUtils for Buffer.* operations.
This change also removes the Buffer type from multiple internal types.
What is the motivation for this change?
Eliminate the driver's hard dependency on the Node.js Buffer global, making the codebase compatible with standard Web APIs (Uint8Array) and portable to environments like Cloudflare Workers, Deno, and the Browser.
This is the next step to implementing NODE-7300 Replace Buffer with Uint8Array,
Double check the following
npm run check:lint)type(NODE-xxxx)[!]: descriptionfeat(NODE-1234)!: rewriting everything in coffeescript