Conversation
Imho this should be fixed in node-pty to populate the pixel dimensions where possible (I think there was an issue with conpty not providing it, but for other platforms it can be populated). Another workaround for console apps is to use WinOps sequences like CSI 14t (see here for an example: https://github.com/jerch/sixel-puzzle/blob/56129538af70fa4ec9441d1d3553398dc8e66f1f/termlib.py#L95). But of course this is beyond xterm.js and has to be implemented by the console app itself.
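For illustration, the terminal replies to CSI 14t with `ESC [ 4 ; height ; width t` (the text area size in pixels). A minimal sketch of parsing that reply - the function name is made up, not taken from the linked example:

```typescript
// Hypothetical helper: parses the terminal's reply to CSI 14t,
// which arrives as ESC [ 4 ; <height> ; <width> t
function parseTextAreaSizeReply(reply: string): { width: number; height: number } | undefined {
  const m = /^\x1b\[4;(\d+);(\d+)t$/.exec(reply);
  if (!m) {
    return undefined; // not a valid CSI 14t reply
  }
  return { height: parseInt(m[1], 10), width: parseInt(m[2], 10) };
}
```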
```typescript
const binaryString = atob(base64Data);
const bytes = new Uint8Array(binaryString.length);
for (let i = 0; i < binaryString.length; i++) {
  bytes[i] = binaryString.charCodeAt(i);
}
```
This has very bad runtime and creates serious memory pressure with huge images.
You might want to check the wasm based base64 decoder I used for the IIP protocol. It works on chunks, thus should be much friendlier to chunked data ingestion.
For basic usage see https://github.com/jerch/xterm-wasm-parts, for chunked usage addons/addon-image/src/IIPHandler.ts holds the needed bits.
Thanks for this. Am I going in the right direction with bae314f?
Following this link https://sw.kovidgoyal.net/kitty/graphics-protocol/#the-graphics-escape-code the base form of all kitty sequences is
```
<ESC>_G<control data>;<payload><ESC>\
```
so the delimiter between control data and payload is a semicolon. Next you should clarify whether the semicolon is an allowed char in control data itself. If not, then the separation of control data vs. payload is unique and you can just scan for the first semicolon in the byte stream. If a semicolon is allowed in control data, then the separation gets more ugly - probably the = can be used as a sentinel, as it will not occur in base64 except at the very end for padding.
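To make the first-semicolon scan concrete, here is a minimal sketch (the helper name is made up), assuming a semicolon cannot occur inside control data:

```typescript
// Hypothetical sketch: split "<control data>;<payload>" at the first semicolon.
// Assumes ';' is not allowed inside control data itself.
function splitControlAndPayload(seq: string): { control: string; payload: string } {
  const idx = seq.indexOf(';');
  if (idx === -1) {
    // sequence without any payload - the delimiter is absent as well
    return { control: seq, payload: '' };
  }
  return { control: seq.slice(0, idx), payload: seq.slice(idx + 1) };
}
```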
About the put method - it gets called for every single chunk of sequence payload (here <control data>;<payload>). It is important to know that the data can be split arbitrarily (worst case - one byte per put invocation), so you have to collect the data.
Now to solve the separation of control data vs. base64 payload - I would assign a static Uint32Array to copy chunks over until the semicolon was found, schematically:
```typescript
private _inControlCode = true;                 // must be reset in start()
private _controlData = new Uint32Array(4096);  // or another size that exceeds expectations
private _controlLength = 0;                    // must be reset in start()

public put(data: Uint32Array, start: number, end: number): void {
  if (this._aborted) return;
  // perf note: in C/C++ the following then-else branches should be flipped,
  // as the payload branch is more likely (no clue if it makes a difference in JS)
  if (this._inControlCode) {
    let controlEnd = end;
    // scan for payload delimiter
    for (let i = start; i < end; ++i) {
      if (data[i] === 0x3b) {
        // ; was found, so here we have all control code data
        this._inControlCode = false;
        controlEnd = i;
        break;
      }
    }
    // copy control data over
    this._controlData.set(data.subarray(start, controlEnd), this._controlLength);
    this._controlLength += controlEnd - start;
    if (!this._inControlCode) {
      // this._controlData now contains all control data, as we finished it above
      this._parseControlData();
      // init the base64 decoder
      this._b64Decoder.init(786432);
      // we still may have base64 payload in the chunk
      // +1 is needed to skip the ; itself
      if (controlEnd + 1 < end) {
        if (this._b64Decoder.put(data, controlEnd + 1, end)) {
          this._aborted = true;
        }
      }
    }
  } else {
    // base64 payload
    if (this._b64Decoder.put(data, start, end)) {
      this._aborted = true;
    }
  }
}
```

Now this 1MB sharding makes the usage of the base64 decoder a bit more cumbersome. I suggest wrapping the b64Decoder.put calls to do the tracking transparently, schematically:
```typescript
private _shardSize = 0;

private _b64put(data: Uint32Array, start: number, end: number): boolean {
  // is there still room in the base64 shard?
  const free = BASE64_SHARD_SIZE - this._shardSize;
  if (free >= end - start) {
    this._shardSize += end - start;
    return this._b64Decoder.put(data, start, end);
  }
  // fill the shard
  if (this._b64Decoder.put(data, start, start + free)) {
    return true;
  }
  // grab the decoded data
  this._b64Decoder.end();
  const chunk = this._b64Decoder.data8;  // borrowed!
  // --> store the chunk somehow...
  // handle leftover data:
  // - restart decoder for next shard decoding
  // - feed data to decoder
  this._b64Decoder.init(786432);
  this._shardSize = end - start - free;  // leftover bytes occupy the new shard
  return this._b64Decoder.put(data, start + free, end);
}
```

This last snippet also shows what to do in the end method:
```typescript
public end(success: boolean): boolean | Promise<boolean> {
  if (this._aborted) return true;
  // finalize last base64 shard
  this._b64Decoder.end();
  const chunk = this._b64Decoder.data8;  // borrowed!
  // --> store the chunk somehow...
  if (success) {
    // process image data chunks further here...
  }
  // cleanup chunks here...
  return true;
}
```
Just saw that the IIP handler is still on an old version of the base64 decoder. It refers to the very first commit in this repo, 9a0692b. Please don't change this yet, as there were changes in the API. Seems I need to clean this up...
Just saw that there are also sequences where the payload is omitted or not base64 encoded. For non-base64 payload you would basically need to evaluate the control data before spinning up the base64 stuff.
For sequences without any payload the semicolon as delimiter is also absent. To properly grab those sequences you can do it in the end method, schematically:
```typescript
public end(success: boolean): boolean | Promise<boolean> {
  if (this._aborted) return true;
  // sequence without any payload
  if (this._inControlCode) {
    if (!success) return true;
    this._parseControlData();
    // exec the control commands if legit
    ...
    return true;
  }
  // payload handling goes here
  ...
}
```
Sorry, overlooked this comment yesterday - well, the full shard decoding introduces another copy, which will cause lag spikes when the decoder gets the shard for decoding.
Would it help to introduce a property on the decoder that gives you the amount of loaded base64 bytes? Something like this:

```typescript
interface Decoder {
  ...
  // max allowed base64 bytes
  maxBytes: number;
  // loaded base64 bytes
  loadedBytes: number;
  // free base64 bytes (maxBytes - loadedBytes)
  freeBytes: number;
}
```
@anthonykim1 I am starting to wonder whether the kitty graphics handler should be integrated into the image addon. I don't know the requirements of this protocol for the render layering, but maybe things could be reused and unified? It would make the lifecycling easier and would not introduce another image layer. But as I said - I don't know the kitty protocol requirements, so fusing it might be a futile attempt. Thoughts?
```typescript
// Load test images as base64
const BLACK_1X1_BASE64 = readFileSync('./addons/addon-kitty-graphics/fixture/black-1x1.png').toString('base64');
const RGB_3X1_BASE64 = readFileSync('./addons/addon-kitty-graphics/fixture/rgb-3x1.png').toString('base64');
```
You also want tests for f=24 and f=32.
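An f=24 fixture does not need a PNG file at all - the raw RGB bytes can be built inline and base64 encoded (a sketch assuming Node's Buffer; the variable names are made up):

```typescript
// Sketch: inline f=24 (raw RGB) test fixture - 3x1 pixels: red, green, blue.
const rgb3x1 = new Uint8Array([
  255, 0, 0,   // red
  0, 255, 0,   // green
  0, 0, 255    // blue
]);
const RGB_RAW_3X1_BASE64 = Buffer.from(rgb3x1).toString('base64');
```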
```typescript
for (let i = 0; i < pixelCount; i++) {
  data[dstOffset    ] = bytes[srcOffset    ];  // R
  data[dstOffset + 1] = bytes[srcOffset + 1];  // G
  data[dstOffset + 2] = bytes[srcOffset + 2];  // B
  data[dstOffset + 3] = isRgba ? bytes[srcOffset + 3] : ALPHA_OPAQUE;
  srcOffset += bytesPerPixel;
  dstOffset += BYTES_PER_PIXEL_RGBA;
}
```
This is still subpar - for RGBA data you can simply use the bytes directly, for RGB interleaving with the 4th byte in uint32 blocks might be faster, schematically:

```typescript
if (format === KittyFormat.RGBA) {
  imgData = new ImageData(bytes, width, height);
} else {
  const bytes32 = new Uint32Array(BYTES.buffer);
  const data32 = new Uint32Array(DATA.buffer);
  let dstOffset = 0;
  let srcOffset = 0;
  // assuming little endian, PIXELS must be 4-aligned
  // note: unsigned shifts (>>>) are needed, >> would sign-extend
  for (let i = 0; i < PIXELS; i += 4) {
    const bloc1 = bytes32[srcOffset++];
    const bloc2 = bytes32[srcOffset++];
    const bloc3 = bytes32[srcOffset++];
    data32[dstOffset++] = bloc1 | 0xFF000000;
    data32[dstOffset++] = (bloc1 >>> 24) | (bloc2 << 8) | 0xFF000000;
    data32[dstOffset++] = (bloc2 >>> 16) | (bloc3 << 16) | 0xFF000000;
    data32[dstOffset++] = (bloc3 >>> 8) | 0xFF000000;
  }
  // handle leftover here...
}
```

This is 28 vs. 7 reads/writes per 4 pixels (memory access is a lot heavier than the bit manipulations).
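To check the block trick against the naive loop, here is a self-contained sketch (helper names are made up; assumes a little-endian host, and copies the source so the Uint32Array view is legal for any input):

```typescript
// Naive reference: one byte at a time, alpha forced opaque.
function rgbToRgbaNaive(src: Uint8Array, pixels: number): Uint8Array {
  const dst = new Uint8Array(pixels * 4);
  for (let i = 0; i < pixels; i++) {
    dst[i * 4]     = src[i * 3];
    dst[i * 4 + 1] = src[i * 3 + 1];
    dst[i * 4 + 2] = src[i * 3 + 2];
    dst[i * 4 + 3] = 255;
  }
  return dst;
}

// Block version: 3 uint32 reads / 4 uint32 writes per 4 pixels, little endian.
// Copies the block portion into its own buffer so the Uint32Array view is
// valid (a real impl would allocate the source suitably upfront instead).
function rgbToRgbaBlocks(src: Uint8Array, pixels: number): Uint8Array {
  const dst = new Uint8Array(pixels * 4);
  const dst32 = new Uint32Array(dst.buffer);
  const blockPixels = pixels & ~3;          // 4-aligned pixel count
  const srcCopy = new Uint8Array(blockPixels * 3);  // 12 bytes per 4 pixels, 4-divisible
  srcCopy.set(src.subarray(0, blockPixels * 3));
  const src32 = new Uint32Array(srcCopy.buffer);
  let s = 0;
  let d = 0;
  for (let i = 0; i < blockPixels; i += 4) {
    const b1 = src32[s++];
    const b2 = src32[s++];
    const b3 = src32[s++];
    dst32[d++] = b1 | 0xFF000000;
    dst32[d++] = (b1 >>> 24) | (b2 << 8) | 0xFF000000;
    dst32[d++] = (b2 >>> 16) | (b3 << 16) | 0xFF000000;
    dst32[d++] = (b3 >>> 8) | 0xFF000000;
  }
  // scalar leftover for the last 1-3 pixels
  for (let i = blockPixels; i < pixels; i++) {
    dst[i * 4]     = src[i * 3];
    dst[i * 4 + 1] = src[i * 3 + 1];
    dst[i * 4 + 2] = src[i * 3 + 2];
    dst[i * 4 + 3] = 255;
  }
  return dst;
}
```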
On a sidenote - this byte interleaving is a typical task where wasm-simd would outperform by far (can do it with 3 instructions - load, swizzle, store).
Edit: Fixing the bit shifts. Timing is 60 ms vs. 10 ms for 4M pixels (speedup 5-6x, or 0.2 GB/s vs. 1.3 GB/s, wasm-simd would reach here >4GB/s).
Here is a gist of different versions: https://gist.github.com/jerch/e2f7695b887228d9e703c53af23c0737

Results:

```
$> node lib/test.wasm.js
original             50.4 ms,  250 MB/s
slightly optimized   23.9 ms,  526 MB/s
4 byte blocks         8.1 ms, 1553 MB/s
wasm simd            2.65 ms, 4748 MB/s
```
This is crazy.. 50 -> 2.6 🤯🤩🔥
@jerch Thank you for the feedback :)
Sidenote on base64 decoding - ES2026 brings native Uint8Array.fromBase64 / toBase64. Need to re-eval things for the base64 decoder + encoder; on a first glance they are reasonably fast enough (tested with node 25). There is an issue with the wasm impls in node 25, node 20 is much faster. Edit: There is indeed a serious performance issue in node 24. All wasm execution suffers from this, but with SIMD it gets really bad (benchmark screenshots omitted).
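A sketch of feature-detecting the new API with a Node Buffer fallback (the helper name is made up; Uint8Array.fromBase64 only exists on newer engines, hence the runtime check):

```typescript
// Sketch: prefer the ES2026 Uint8Array.fromBase64 when available,
// fall back to Node's Buffer otherwise.
function decodeBase64(s: string): Uint8Array {
  const u8: any = Uint8Array;
  if (typeof u8.fromBase64 === 'function') {
    return u8.fromBase64(s);
  }
  return new Uint8Array(Buffer.from(s, 'base64'));
}
```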
jerch left a comment:

Nice, have some comments left below.
```typescript
this._controlLength += copyLength;

if (!this._inControlData) {
  // Found semicolon - stream remaining as payload
```
I suggest parsing the control data here already; this way you can cancel invalid sequences early, e.g. a sequence that does not expect payload can be finished/aborted here already (depends on what the spec says about malformed sequences).
Good point! Added d44900f. The spec only seems to call out i and I explicitly, so currently all other invalid values are ignored or fall back to the base case atm.
Maybe we could be more opinionated and handle them explicitly, like i and I?
Hmm I hate sloppy specs 👿
I basically had the same issue with IIP - in an older version the byteSize was mandatory and excess data would invalidate the sequence (this was a good and concise spec), but it got watered down to optional in a newer version of iTerm2. Currently IIP is still on that older, stricter spec, but we might need to change that too, as people use the newer sequences more often now.
If you ask me - I am more on the strict side of things regarding this. I would abort on any malformed sequence (e.g. invalid keys, payload in a sequence that doesn't expect payload), as it greatly lowers the chance that malicious data enters the terminal. Ofc additional limitations to the base spec need to be clearly stated in the docs.
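To illustrate the strict stance - a sketch of upfront key validation (the helper name and key set are assumptions for illustration; the kitty spec defines the real key set):

```typescript
// Sketch: reject control data containing unknown keys outright.
// KNOWN_KEYS is illustrative only - consult the kitty spec for the real set.
const KNOWN_KEYS = new Set(['a', 'f', 't', 'q', 'i', 'I', 's', 'v', 'm', 'p']);

function isValidControlData(control: string): boolean {
  if (!control) return false;
  for (const pair of control.split(',')) {
    const eq = pair.indexOf('=');
    if (eq <= 0) return false;                             // malformed key=value pair
    if (!KNOWN_KEYS.has(pair.slice(0, eq))) return false;  // unknown key --> abort
  }
  return true;
}
```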
```typescript
const chunk = new Uint8Array(this._decoder.data8);
this._decodedChunks.push(chunk);
```
Hmm, maybe use the Blob interface here? That would save precious memory in the JS context. But it has the downside of being async on retrieval (thus it only works when the final payload is used in end with a promise as return value).
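A sketch of that offloading idea (variable names made up): decoded chunks get appended into a Blob, so the bytes can leave the JS heap; getting them back goes through the async blob.arrayBuffer().

```typescript
// Sketch: collect decoded chunks, then offload them into a Blob.
// The chunk arrays can be dropped from the JS heap afterwards;
// retrieval is async, so this only fits an async end().
const decodedChunks: Uint8Array[] = [];
decodedChunks.push(new Uint8Array([1, 2, 3]));
decodedChunks.push(new Uint8Array([4, 5]));
const payloadBlob = new Blob(decodedChunks as BlobPart[]);
decodedChunks.length = 0; // chunks no longer needed in the heap
```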
```typescript
    return this._handleTransmit(cmd, bytes, decodeError);
  case KittyAction.TRANSMIT_DISPLAY:
    return this._handleTransmitDisplay(cmd, bytes, decodeError);
  case KittyAction.QUERY:
    return this._handleQuery(cmd, bytes, decodeError);
```
If any of these follow-up actions store the bytes for later usage, then you must make a copy of them. Currently the image data gets passed on borrowed from the base64 decoder in end, which is optimal for actions that finalize the image processing before end returns (things like new ImageData or createImageBitmap or new Blob already copy the data itself).
Specifically _handleTransmit looks fishy to me, as it stores the data in your this._images map. Easy fix - just replace it with:

```typescript
...
return this._handleTransmit(cmd, new Uint8Array(bytes), decodeError);
...
```

Haven't checked the other 2 code paths _handleTransmitDisplay and _handleQuery.
Wow.. Didn't even realize the underlying memory could be reused later by the decoder for a different image - thank you, this is very helpful!
Done here --> 596ab84
Yeah, that is the idea of keepSize - to reuse the same memory later on to lower GC pressure. But with this borrow pattern things need to be tracked carefully. It works well if the final action copies or transforms the data anyway in sync (e.g. new ImageData(bytes)).
I see that you also did it for _handleTransmitDisplay - is that really needed? Does the kitty protocol demand you to hold the original bytes?
@anthonykim1 Question about your image map: this seems to store kitty image data outside of the … Suggestion - check if we can reuse & extend …
@jerch Yeah, this is a challenging one. I don't think we can discard the raw data all the time, since we won't be able to re-display at a different size, a=p (display previously transmitted), etc. I'll play around with the blob offloading. Flow would be like: … And then the delete command …
This is already done by the image addon - see xterm.js/addons/addon-image/src/Types.ts, lines 100 to 108 in b287397. The resized one (…)

You are reinventing what is already there. Explicit later resizing by the protocol introduces several issues, esp. around cell alignment and how to evict the previously rendered tiles. This probably needs a clever separate method on …

Edit: Eww, that is not possible at all currently, as the … So yeah, after you sorted out the storage needs, the render side of things will get even more funny due to technical limitations.
@anthonykim1 I think we should create a discussion about the kitty protocol listing all its features, and discuss first what can be implemented and how, and what not. It is so loaded with features that working towards a certain direction might negate other aspects if not considered from the beginning.
I'll try to list out the components as sub-items/lists in the discussion, their status, etc. tomorrow. I also think it's easier for us to talk about if I can number the chart/list and reference items individually (this PR is getting big and there's a lot to KGP!). I'd hate to miss out on any valuable feedback!
Part of #5592
WIP
Plan:
(Jan 22, 4:00 PM PST) There's a problem currently where kitty +kitten icat won't return an image, but using send-image file will.
Copilot's research on why normal kitty +kitten icat doesn't work on the xterm demo rn:
(Jan 22, 6:30 PM PST)
Things like:
```
kitty +kitten icat --use-window-size=80,24,1280,768 --transfer-mode=stream ~/Desktop/xterm.js/demo/logo.png
```

work atm.

TODO since image goes on top of text rn 🥶:
TODOs: