Migrate KHI File Format to Protocol Buffers

## Problem

### 1. Scalability Limit (JSON Size)

The current KHI file format consists of a JSON header followed by a gzip-compressed text body. Browsers and JavaScript engines often have a hard limit on the size of a single string or JSON object they can parse (typically around 500MB).
When an inspection report's metadata (JSON header) exceeds this limit, the browser fails to load the file, crashing the application or showing an empty state. This effectively puts a hard ceiling on the size/complexity of clusters KHI can inspect.

### 2. Type Synchronization Overhead

Currently, data models are defined separately in:

- **Backend (Go):** validation and serialization logic.
- **Frontend (TypeScript):** display and interaction logic.

This duplication requires manual maintenance to keep them in sync. Adding a new field or changing a type requires modification in both places, leading to potential bugs and increased development effort.

## Proposed Solution

Migrate the KHI file format and internal data models to **Protocol Buffers (protobuf)**.

### 1. Unified Schema Definition

Define the KHI data model (inspection data, timelines, logs, etc.) in `.proto` files.

- These files will serve as the single source of truth for the data structure.

### 2. Code Generation

Use `protoc` to generate:

- **Go structs** for the backend (parsing, analysis, and API response).
- **TypeScript interfaces/classes** for the frontend (view logic).

This ensures that the frontend and backend always share the exact same type definitions.

### 3. Binary File Format: Concatenated Containers

To overcome Protobuf size limits (~50MB) and browser string limits, we will use a **Concatenated Container** format. The file will be a sequence of binary blocks.

**Structure:**

1.  **Magic Bytes (3 bytes):** `KHI`
2.  **Metadata Size (4 bytes):** UInt32 size of the following metadata protobuf.
3.  **Container Metadata (N bytes):** A Protobuf message describing the file structure.
    - Contains a list of `Container` descriptors.
    - Each descriptor specifies:
      - `Size`: Byte size of the container.
      - `Type`: e.g., "TIMELINE_DATA", "LOG_DATA".
      - `Compression`: e.g., "GZIP", "NONE".
      - `Content`: What logic data this container holds (e.g., "Timelines for cluster X").
4.  **Container 1 (M bytes):** Binary data (e.g., a serialized Protobuf message or raw GZIP stream).
5.  **Container 2...**
6.  **Container N...**

**Key Features:**

- **Lazy Loading:** The frontend only decodes the "Container Metadata" first. It then seeks to and decompresses specific containers only when needed (e.g., decoding only the "Timelines" container for the initial view, and "Logs" on demand).
- **Segmentation:** Large datasets are split into multiple containers, avoiding the 50MB protobuf limit and 500MB string limit.
- **Efficient Decompression:** `GZIP` can be applied per-container, allowing the browser to decompress only relevant sections.

## Migration Strategy

1.  **Define `.proto` schema:** Map the existing Go structs for the file header and internal models to Protobuf messages.
2.  **Setup Logic:** Configure `protoc` generation in the `Makefile` for both Go and TS targets.
3.  **Backend Refactor:** Update the backend serializer to write the new binary format.
4.  **Frontend Refactor:** Update the frontend data loader to parse the binary format instead of JSON.
    - _Note:_ Backward compatibility for legacy JSON-based files should be maintained if possible, or a conversion tool provided.

## Definition of Done

- [ ] `.proto` files created for KHI data models.
- [ ] Automated code generation setup for Go and TypeScript.
- [ ] Backend produces valid protobuf-based files.
- [ ] Frontend successfully loads and renders protobuf-based files.
- [ ] 500MB+ metadata files can be loaded in the browser without crashing.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Migrate KHI File Format to Protocol Buffers #425

Problem

1. Scalability Limit (JSON Size)

2. Type Synchronization Overhead

Proposed Solution

1. Unified Schema Definition

2. Code Generation

3. Binary File Format: Concatenated Containers

Migration Strategy

Definition of Done

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Migrate KHI File Format to Protocol Buffers #425

Description

Problem

1. Scalability Limit (JSON Size)

2. Type Synchronization Overhead

Proposed Solution

1. Unified Schema Definition

2. Code Generation

3. Binary File Format: Concatenated Containers

Migration Strategy

Definition of Done

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions