Conversation
Imho this should be fixed in node-pty to populate the pixel dimensions where possible (I think there was an issue with conpty not providing it, but for other platforms it can be populated). Another workaround for console apps is to use WinOps sequences like CSI 14t (see here for an example: https://github.com/jerch/sixel-puzzle/blob/56129538af70fa4ec9441d1d3553398dc8e66f1f/termlib.py#L95). But of course this is beyond xterm.js and has to be implemented by the console app itself.
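For illustration, the terminal replies to CSI 14t with `ESC [ 4 ; height ; width t` (the text area size in pixels). A minimal sketch of parsing that reply - the function name is made up, not taken from the linked example:

```typescript
// Hypothetical helper: parses the terminal's reply to CSI 14t,
// which arrives as ESC [ 4 ; <height> ; <width> t
function parseTextAreaSizeReply(reply: string): { width: number; height: number } | undefined {
  const m = /^\x1b\[4;(\d+);(\d+)t$/.exec(reply);
  if (!m) {
    return undefined; // not a valid CSI 14t reply
  }
  return { height: parseInt(m[1], 10), width: parseInt(m[2], 10) };
}
```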
```typescript
const binaryString = atob(base64Data);
const bytes = new Uint8Array(binaryString.length);
for (let i = 0; i < binaryString.length; i++) {
  bytes[i] = binaryString.charCodeAt(i);
}
```
This has very bad runtime and creates serious memory pressure with huge images.
You might want to check the wasm based base64 decoder I used for the IIP protocol. It works on chunks, thus should be much friendlier to chunked data ingestion.
For basic usage see https://github.com/jerch/xterm-wasm-parts, for chunked usage addons/addon-image/src/IIPHandler.ts holds the needed bits.
Thanks for this. Am I going in the right direction with bae314f?
Following this link https://sw.kovidgoyal.net/kitty/graphics-protocol/#the-graphics-escape-code the base form of all kitty sequences is
```
<ESC>_G<control data>;<payload><ESC>\
```
so the delimiter between control data and payload is a semicolon. Next you should clarify whether the semicolon is an allowed char in control data itself. If not, then the separation of control data vs. payload is unique and you can just scan for the first semicolon in the byte stream. If a semicolon is allowed in control data, then the separation gets more ugly - probably the = can be used as a sentinel, as it will not occur in base64 except at the very end for padding.
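To make the first-semicolon scan concrete, here is a minimal sketch (the helper name is made up), assuming a semicolon cannot occur inside control data:

```typescript
// Hypothetical sketch: split "<control data>;<payload>" at the first semicolon.
// Assumes ';' is not allowed inside control data itself.
function splitControlAndPayload(seq: string): { control: string; payload: string } {
  const idx = seq.indexOf(';');
  if (idx === -1) {
    // sequence without any payload - the delimiter is absent as well
    return { control: seq, payload: '' };
  }
  return { control: seq.slice(0, idx), payload: seq.slice(idx + 1) };
}
```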
About the put method - it gets called for every single chunk of sequence payload (here <control data>;<payload>). It is important to know that the data can be split arbitrarily (worst case - one byte per put invocation), so you have to collect the data.
Now to solve the separation of control data vs. base64 payload - I would assign a static Uint32Array to copy chunks over until the semicolon was found, schematically:
```typescript
private _inControlCode = true;                 // must be reset in start()
private _controlData = new Uint32Array(4096);  // or another size that exceeds expectations
private _controlLength = 0;                    // must be reset in start()

public put(data: Uint32Array, start: number, end: number): void {
  if (this._aborted) return;
  // perf note: in C/C++ the following then-else branches should be flipped,
  // as the payload branch is more likely (no clue if it makes a difference in JS)
  if (this._inControlCode) {
    let controlEnd = end;
    // scan for payload delimiter
    for (let i = start; i < end; ++i) {
      if (data[i] === 0x3b) {
        // ; was found, so here we have all control code data
        this._inControlCode = false;
        controlEnd = i;
        break;
      }
    }
    // copy control data over
    this._controlData.set(data.subarray(start, controlEnd), this._controlLength);
    this._controlLength += controlEnd - start;
    if (!this._inControlCode) {
      // this._controlData now contains all control data, as we finished it above
      this._parseControlData();
      // init the base64 decoder
      this._b64Decoder.init(786432);
      // we still may have base64 payload in the chunk
      // +1 is needed to skip the ; itself
      if (controlEnd + 1 < end) {
        if (this._b64Decoder.put(data, controlEnd + 1, end)) {
          this._aborted = true;
        }
      }
    }
  } else {
    // base64 payload
    if (this._b64Decoder.put(data, start, end)) {
      this._aborted = true;
    }
  }
}
```

Now this 1MB sharding makes the usage of the base64 decoder a bit more cumbersome. I suggest wrapping the b64Decoder.put calls to do the tracking transparently, schematically:
```typescript
private _shardSize = 0;

private _b64put(data: Uint32Array, start: number, end: number): boolean {
  // is there still room in the base64 shard?
  const free = BASE64_SHARD_SIZE - this._shardSize;
  if (free >= end - start) {
    this._shardSize += end - start;
    return this._b64Decoder.put(data, start, end);
  }
  // fill the shard
  if (this._b64Decoder.put(data, start, start + free)) {
    return true;
  }
  // grab the decoded data
  this._b64Decoder.end();
  const chunk = this._b64Decoder.data8;  // borrowed!
  // --> store the chunk somehow...
  // handle leftover data:
  // - restart decoder for next shard decoding
  // - feed data to decoder
  this._b64Decoder.init(786432);
  this._shardSize = end - start - free;  // leftover bytes occupy the new shard
  return this._b64Decoder.put(data, start + free, end);
}
```

This last snippet also shows what to do in the end method:
```typescript
public end(success: boolean): boolean | Promise<boolean> {
  if (this._aborted) return true;
  // finalize last base64 shard
  this._b64Decoder.end();
  const chunk = this._b64Decoder.data8;  // borrowed!
  // --> store the chunk somehow...
  if (success) {
    // process image data chunks further here...
  }
  // cleanup chunks here...
  return true;
}
```
Just saw that the IIP handler is still on an old version of the base64 decoder. It refers to the very first commit in this repo, 9a0692b. Please don't change this yet, as there were changes in the API. Seems I need to clean this up...
Just saw that there are also sequences where the payload is omitted or not base64 encoded. For non-base64 payload you would basically need to evaluate the control data before spinning up the base64 stuff.
For sequences without any payload the semicolon as delimiter is also absent. To properly grab those sequences you can do it in the end method, schematically:
```typescript
public end(success: boolean): boolean | Promise<boolean> {
  if (this._aborted) return true;
  // sequence without any payload
  if (this._inControlCode) {
    if (!success) return true;
    this._parseControlData();
    // exec the control commands if legit
    ...
    return true;
  }
  // payload handling goes here
  ...
}
```
Sorry, overlooked this comment yesterday - well, the full shard decoding introduces another copy, which will cause lag spikes when the decoder gets the shard for decoding.
Would it help to introduce a property on the decoder that gives you the amount of loaded base64 bytes? Something like this:

```typescript
interface Decoder {
  ...
  // max allowed base64 bytes
  maxBytes: number;
  // loaded base64 bytes
  loadedBytes: number;
  // free base64 bytes (maxBytes - loadedBytes)
  freeBytes: number;
}
```
@anthonykim1 I am starting to wonder whether the kitty graphics handler should be integrated into the image addon. I don't know the requirements of this protocol for the render layering, but maybe things could be reused and unified? It would make the lifecycling easier and would not introduce another image layer. But as I said - I don't know the kitty protocol requirements, so fusing it might be a futile attempt. Thoughts?
```typescript
// Load test images as base64
const BLACK_1X1_BASE64 = readFileSync('./addons/addon-kitty-graphics/fixture/black-1x1.png').toString('base64');
const RGB_3X1_BASE64 = readFileSync('./addons/addon-kitty-graphics/fixture/rgb-3x1.png').toString('base64');
```
You also want tests for f=24 and f=32.
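An f=24 fixture does not need a PNG file at all - the raw RGB bytes can be built inline and base64 encoded (a sketch assuming Node's Buffer; the variable names are made up):

```typescript
// Sketch: inline f=24 (raw RGB) test fixture - 3x1 pixels: red, green, blue.
const rgb3x1 = new Uint8Array([
  255, 0, 0,   // red
  0, 255, 0,   // green
  0, 0, 255    // blue
]);
const RGB_RAW_3X1_BASE64 = Buffer.from(rgb3x1).toString('base64');
```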
```typescript
for (let i = 0; i < pixelCount; i++) {
  data[dstOffset    ] = bytes[srcOffset    ];  // R
  data[dstOffset + 1] = bytes[srcOffset + 1];  // G
  data[dstOffset + 2] = bytes[srcOffset + 2];  // B
  data[dstOffset + 3] = isRgba ? bytes[srcOffset + 3] : ALPHA_OPAQUE;
  srcOffset += bytesPerPixel;
  dstOffset += BYTES_PER_PIXEL_RGBA;
}
```
This is still subpar - for RGBA data you can simply use the bytes directly, for RGB interleaving with the 4th byte in uint32 blocks might be faster, schematically:

```typescript
if (format === KittyFormat.RGBA) {
  imgData = new ImageData(bytes, width, height);
} else {
  const bytes32 = new Uint32Array(BYTES.buffer);
  const data32 = new Uint32Array(DATA.buffer);
  let dstOffset = 0;
  let srcOffset = 0;
  // assuming little endian, PIXELS must be 4-aligned
  // note: unsigned shifts (>>>) are needed, >> would sign-extend
  for (let i = 0; i < PIXELS; i += 4) {
    const bloc1 = bytes32[srcOffset++];
    const bloc2 = bytes32[srcOffset++];
    const bloc3 = bytes32[srcOffset++];
    data32[dstOffset++] = bloc1 | 0xFF000000;
    data32[dstOffset++] = (bloc1 >>> 24) | (bloc2 << 8) | 0xFF000000;
    data32[dstOffset++] = (bloc2 >>> 16) | (bloc3 << 16) | 0xFF000000;
    data32[dstOffset++] = (bloc3 >>> 8) | 0xFF000000;
  }
  // handle leftover here...
}
```

This is 28 vs. 7 reads/writes per 4 pixels (memory access is a lot heavier than the bit manipulations).
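To check the block trick against the naive loop, here is a self-contained sketch (helper names are made up; assumes a little-endian host, and copies the source so the Uint32Array view is legal for any input):

```typescript
// Naive reference: one byte at a time, alpha forced opaque.
function rgbToRgbaNaive(src: Uint8Array, pixels: number): Uint8Array {
  const dst = new Uint8Array(pixels * 4);
  for (let i = 0; i < pixels; i++) {
    dst[i * 4]     = src[i * 3];
    dst[i * 4 + 1] = src[i * 3 + 1];
    dst[i * 4 + 2] = src[i * 3 + 2];
    dst[i * 4 + 3] = 255;
  }
  return dst;
}

// Block version: 3 uint32 reads / 4 uint32 writes per 4 pixels, little endian.
// Copies the block portion into its own buffer so the Uint32Array view is
// valid (a real impl would allocate the source suitably upfront instead).
function rgbToRgbaBlocks(src: Uint8Array, pixels: number): Uint8Array {
  const dst = new Uint8Array(pixels * 4);
  const dst32 = new Uint32Array(dst.buffer);
  const blockPixels = pixels & ~3;          // 4-aligned pixel count
  const srcCopy = new Uint8Array(blockPixels * 3);  // 12 bytes per 4 pixels, 4-divisible
  srcCopy.set(src.subarray(0, blockPixels * 3));
  const src32 = new Uint32Array(srcCopy.buffer);
  let s = 0;
  let d = 0;
  for (let i = 0; i < blockPixels; i += 4) {
    const b1 = src32[s++];
    const b2 = src32[s++];
    const b3 = src32[s++];
    dst32[d++] = b1 | 0xFF000000;
    dst32[d++] = (b1 >>> 24) | (b2 << 8) | 0xFF000000;
    dst32[d++] = (b2 >>> 16) | (b3 << 16) | 0xFF000000;
    dst32[d++] = (b3 >>> 8) | 0xFF000000;
  }
  // scalar leftover for the last 1-3 pixels
  for (let i = blockPixels; i < pixels; i++) {
    dst[i * 4]     = src[i * 3];
    dst[i * 4 + 1] = src[i * 3 + 1];
    dst[i * 4 + 2] = src[i * 3 + 2];
    dst[i * 4 + 3] = 255;
  }
  return dst;
}
```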
On a sidenote - this byte interleaving is a typical task where wasm-simd would outperform by far (can do it with 3 instructions - load, swizzle, store).
Edit: Fixing the bit shifts. Timing is 60 ms vs. 10 ms for 4M pixels (speedup 5-6x, or 0.2 GB/s vs. 1.3 GB/s, wasm-simd would reach here >4GB/s).
Here is a gist of different versions: https://gist.github.com/jerch/e2f7695b887228d9e703c53af23c0737

Results:

```
$> node lib/test.wasm.js
original             50.4 ms,  250 MB/s
slightly optimized   23.9 ms,  526 MB/s
4 byte blocks         8.1 ms, 1553 MB/s
wasm simd            2.65 ms, 4748 MB/s
```
This is crazy.. 50 -> 2.6 🤯🤩🔥
@jerch Thank you for the feedback :)
Sidenote on base64 decoding - ES2026 brings native Uint8Array.fromBase64 / toBase64. Need to re-eval things for the base64 decoder + encoder; on a first glance they are reasonably fast enough (tested with node 25). There is an issue with the wasm impls in node 25, node 20 is much faster. Edit: There is indeed a serious performance issue in node 24. All wasm execution suffers from this, but with SIMD it gets really bad (benchmark screenshots omitted).
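A sketch of feature-detecting the new API with a Node Buffer fallback (the helper name is made up; Uint8Array.fromBase64 only exists on newer engines, hence the runtime check):

```typescript
// Sketch: prefer the ES2026 Uint8Array.fromBase64 when available,
// fall back to Node's Buffer otherwise.
function decodeBase64(s: string): Uint8Array {
  const u8: any = Uint8Array;
  if (typeof u8.fromBase64 === 'function') {
    return u8.fromBase64(s);
  }
  return new Uint8Array(Buffer.from(s, 'base64'));
}
```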
jerch left a comment:

Nice, have some comments left below.
```typescript
this._controlLength += copyLength;

if (!this._inControlData) {
  // Found semicolon - stream remaining as payload
```
I suggest parsing the control data here already; this way you can cancel invalid sequences early, e.g. a sequence that does not expect payload can be finished/aborted here already (depends on what the spec says about malformed sequences).
Good point! Added d44900f. The spec only seems to call out i and I explicitly, so currently all other invalid values are ignored or fall back to the base case atm.
Maybe we could be more opinionated and handle them explicitly, like i and I?
Hmm I hate sloppy specs 👿
I basically had the same issue with IIP - in an older version the byteSize was mandatory and excess data would invalidate the sequence (this was a good and concise spec), but it got watered down to optional in a newer version of iTerm2. Currently IIP is still on that older, stricter spec, but we might need to change that too, as people use the newer sequences more often now.
If you ask me - I am more on the strict side of things regarding this. I would abort on any malformed sequence (e.g. invalid keys, payload in a sequence that doesn't expect payload), as it greatly lowers the chance that malicious data enters the terminal. Ofc additional limitations to the base spec need to be clearly stated in the docs.
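To illustrate the strict stance - a sketch of upfront key validation (the helper name and key set are assumptions for illustration; the kitty spec defines the real key set):

```typescript
// Sketch: reject control data containing unknown keys outright.
// KNOWN_KEYS is illustrative only - consult the kitty spec for the real set.
const KNOWN_KEYS = new Set(['a', 'f', 't', 'q', 'i', 'I', 's', 'v', 'm', 'p']);

function isValidControlData(control: string): boolean {
  if (!control) return false;
  for (const pair of control.split(',')) {
    const eq = pair.indexOf('=');
    if (eq <= 0) return false;                             // malformed key=value pair
    if (!KNOWN_KEYS.has(pair.slice(0, eq))) return false;  // unknown key --> abort
  }
  return true;
}
```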
```typescript
const chunk = new Uint8Array(this._decoder.data8);
this._decodedChunks.push(chunk);
```
Hmm, maybe use the Blob interface here? That would save precious memory in the JS context. But it has the downside of being async on retrieval (thus it only works when the final payload is used in end with a promise as return value).
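A sketch of that offloading idea (variable names made up): decoded chunks get appended into a Blob, so the bytes can leave the JS heap; getting them back goes through the async blob.arrayBuffer().

```typescript
// Sketch: collect decoded chunks, then offload them into a Blob.
// The chunk arrays can be dropped from the JS heap afterwards;
// retrieval is async, so this only fits an async end().
const decodedChunks: Uint8Array[] = [];
decodedChunks.push(new Uint8Array([1, 2, 3]));
decodedChunks.push(new Uint8Array([4, 5]));
const payloadBlob = new Blob(decodedChunks as BlobPart[]);
decodedChunks.length = 0; // chunks no longer needed in the heap
```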
```typescript
    return this._handleTransmit(cmd, bytes, decodeError);
  case KittyAction.TRANSMIT_DISPLAY:
    return this._handleTransmitDisplay(cmd, bytes, decodeError);
  case KittyAction.QUERY:
    return this._handleQuery(cmd, bytes, decodeError);
```
If any of these follow-up actions store the bytes for later usage, then you must make a copy of them. Currently the image data gets passed on borrowed from the base64 decoder in end, which is optimal for actions that finalize the image processing before end returns (things like new ImageData or createImageBitmap or new Blob already copy the data itself).
Specifically _handleTransmit looks fishy to me, as it stores the data in your this._images map. Easy fix - just replace it with:

```typescript
...
return this._handleTransmit(cmd, new Uint8Array(bytes), decodeError);
...
```

Haven't checked the other 2 code paths _handleTransmitDisplay and _handleQuery.
Wow.. Didn't even realize the underlying memory could be reused later by the decoder for a different image - thank you, this is very helpful!
Done here --> 596ab84
Yeah, that is the idea of keepSize - to reuse the same memory later on to lower GC pressure. But with this borrow pattern things need to be tracked carefully. It works well if the final action copies or transforms the data anyway in sync (e.g. new ImageData(bytes)).
I see that you also did it for _handleTransmitDisplay - is that really needed? Does the kitty protocol demand you to hold the original bytes?
@anthonykim1 Question about your image map: this seems to store kitty image data outside of the … Suggestion - check if we can reuse & extend …
@jerch Yeah, this is a challenging one. I don't think we can discard the raw data all the time, since we won't be able to re-display at a different size, a=p (display previously transmitted), etc. I'll play around with the blob offloading. Flow would be like: … And then the delete command …
This is already done by the image addon - see xterm.js/addons/addon-image/src/Types.ts, lines 100 to 108 in b287397. The resized one (…)

You are reinventing what is already there. Explicit later resizing by the protocol introduces several issues, esp. around cell alignment and how to evict the previously rendered tiles. This probably needs a clever separate method on …

Edit: Eww, that is not possible at all currently, as the … So yeah, after you sorted out the storage needs, the render side of things will get even more funny due to technical limitations.
@anthonykim1 I think we should create a discussion about the kitty protocol listing all its features, and discuss first what can be implemented and how, and what not. It is so loaded with features that working towards a certain direction might negate other aspects if not considered from the beginning.
I'll try to list out the components as sub-items/lists in the discussion, their status, etc. tomorrow. I also think it's easier for us to talk about if I can number the chart/list and reference items individually (this PR is getting big and there's a lot to KGP!). I'd hate to miss out on any valuable feedback!
Part of #5592
WIP
Plan:
(Jan 22, 4:00 PM PST) There's a problem currently where kitty +kitten icat won't return an image, but using send-image file will.
Copilot's research on why normal kitty +kitten icat doesn't work on the xterm demo rn:
(Jan 22, 6:30 PM PST)
Things like:
```
kitty +kitten icat --use-window-size=80,24,1280,768 --transfer-mode=stream ~/Desktop/xterm.js/demo/logo.png
```

work atm.

TODO since image goes on top of text rn 🥶:
TODOs: