Description
Problem
1. Scalability Limit (JSON Size)
The current KHI file format consists of a JSON header followed by a gzip-compressed text body. JavaScript engines impose a hard limit on the length of a single string, and therefore on the size of a JSON document that can be parsed in one call (typically around 500MB; V8, for example, caps strings at roughly 2^29 characters).
When an inspection report's metadata (JSON header) exceeds this limit, the browser fails to load the file, crashing the application or showing an empty state. This effectively puts a hard ceiling on the size/complexity of clusters KHI can inspect.
2. Type Synchronization Overhead
Currently, data models are defined separately in:
- Backend (Go): validation and serialization logic.
- Frontend (TypeScript): display and interaction logic.
This duplication requires manual maintenance to keep them in sync. Adding a new field or changing a type requires modification in both places, leading to potential bugs and increased development effort.
Proposed Solution
Migrate the KHI file format and internal data models to Protocol Buffers (protobuf).
1. Unified Schema Definition
Define the KHI data model (inspection data, timelines, logs, etc.) in .proto files.
- These files will serve as the single source of truth for the data structure.
2. Code Generation
Use protoc to generate:
- Go structs for the backend (parsing, analysis, and API response).
- TypeScript interfaces/classes for the frontend (view logic).
This ensures that the frontend and backend always share the exact same type definitions.
3. Binary File Format: Concatenated Containers
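To avoid very large single Protobuf messages (Protobuf is not designed for them; this proposal treats ~50MB as the practical per-message ceiling) and browser string limits, we will use a Concatenated Container format. The file will be a sequence of binary blocks.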
Structure:
- Magic Bytes (3 bytes): KHI
- Metadata Size (4 bytes): UInt32 size of the following metadata protobuf.
- Container Metadata (N bytes): A Protobuf message describing the file structure.
  - Contains a list of Container descriptors.
  - Each descriptor specifies:
    - Size: Byte size of the container.
    - Type: e.g., "TIMELINE_DATA", "LOG_DATA".
    - Compression: e.g., "GZIP", "NONE".
    - Content: What logical data this container holds (e.g., "Timelines for cluster X").
- Container 1 (M bytes): Binary data (e.g., a serialized Protobuf message or raw GZIP stream).
- Container 2...
- Container N...
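To make the layout above concrete, here is a minimal sketch of reading the fixed-size prefix and locating the start of the first container. The proposal does not specify the byte order of the UInt32, so little-endian is an assumption here, and `readKhiHeader` and `buildKhiFile` are hypothetical helper names. The metadata is returned as raw bytes; decoding it would be done by the protoc-generated code.

```typescript
// Sketch of reading the prefix of the proposed format.
// ASSUMPTIONS (not specified in the proposal): the UInt32 is little-endian,
// and readKhiHeader / buildKhiFile are hypothetical helper names.

const KHI_MAGIC = [0x4b, 0x48, 0x49]; // ASCII "KHI"
const PREFIX_LENGTH = KHI_MAGIC.length + 4; // magic bytes + UInt32 metadata size

interface KhiHeader {
  /** Raw bytes of the Container Metadata protobuf message (not decoded here). */
  metadata: Uint8Array;
  /** Byte offset at which Container 1 begins. */
  containersStart: number;
}

function readKhiHeader(file: Uint8Array): KhiHeader {
  if (file.length < PREFIX_LENGTH) {
    throw new Error("file is too short to contain a KHI header");
  }
  for (let i = 0; i < KHI_MAGIC.length; i++) {
    if (file[i] !== KHI_MAGIC[i]) {
      throw new Error("missing KHI magic bytes");
    }
  }
  const view = new DataView(file.buffer, file.byteOffset, file.byteLength);
  const metadataSize = view.getUint32(KHI_MAGIC.length, true); // assumed little-endian
  const containersStart = PREFIX_LENGTH + metadataSize;
  if (containersStart > file.length) {
    throw new Error("metadata size exceeds file length");
  }
  return {
    metadata: file.subarray(PREFIX_LENGTH, containersStart),
    containersStart,
  };
}

// Builds a file in the same layout; used here only to exercise the reader.
function buildKhiFile(metadata: Uint8Array, containers: Uint8Array[]): Uint8Array {
  const total = containers.reduce((sum, c) => sum + c.length, PREFIX_LENGTH + metadata.length);
  const out = new Uint8Array(total);
  out.set(KHI_MAGIC, 0);
  new DataView(out.buffer).setUint32(KHI_MAGIC.length, metadata.length, true);
  out.set(metadata, PREFIX_LENGTH);
  let offset = PREFIX_LENGTH + metadata.length;
  for (const c of containers) {
    out.set(c, offset);
    offset += c.length;
  }
  return out;
}

// 3 bytes of placeholder metadata followed by one 2-byte container.
const sampleFile = buildKhiFile(new Uint8Array([1, 2, 3]), [new Uint8Array([9, 9])]);
const sampleHeader = readKhiHeader(sampleFile);
console.log(sampleHeader.metadata.length, sampleHeader.containersStart); // 3 10
```

Because the reader only touches the first 7 + N bytes, the cost of opening a file depends on the metadata size rather than on the total file size.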
Key Features:
- Lazy Loading: The frontend only decodes the "Container Metadata" first. It then seeks to and decompresses specific containers only when needed (e.g., decoding only the "Timelines" container for the initial view, and "Logs" on demand).
- Segmentation: Large datasets are split into multiple containers, avoiding the 50MB protobuf limit and 500MB string limit.
- Efficient Decompression: GZIP can be applied per-container, allowing the browser to decompress only relevant sections.
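The lazy-loading idea can be sketched as follows: once the descriptors are decoded, each container's byte range follows from the running sum of the preceding sizes, so the loader can slice out only the containers of the type it needs. The interface and function names are hypothetical, the field names simply mirror the descriptor fields listed above, and containers are assumed to be stored back to back in descriptor order.

```typescript
// Sketch of locating containers from their descriptors so that only the
// needed byte ranges are sliced and decoded.
// ASSUMPTIONS: names are hypothetical; containers are stored contiguously
// in the same order as their descriptors.

interface ContainerDescriptor {
  size: number;        // byte size of the container
  type: string;        // e.g. "TIMELINE_DATA", "LOG_DATA"
  compression: string; // e.g. "GZIP", "NONE"
  content: string;     // description of the data held by the container
}

interface ContainerRange {
  descriptor: ContainerDescriptor;
  start: number; // inclusive byte offset
  end: number;   // exclusive byte offset
}

// Derives each container's byte range from the sizes of those before it.
function computeRanges(
  descriptors: ContainerDescriptor[],
  containersStart: number,
): ContainerRange[] {
  const ranges: ContainerRange[] = [];
  let offset = containersStart;
  for (const descriptor of descriptors) {
    ranges.push({ descriptor, start: offset, end: offset + descriptor.size });
    offset += descriptor.size;
  }
  return ranges;
}

// Returns views over the matching containers without copying or decoding
// any other part of the file.
function sliceContainersOfType(
  file: Uint8Array,
  ranges: ContainerRange[],
  wantedType: string,
): Uint8Array[] {
  return ranges
    .filter((r) => r.descriptor.type === wantedType)
    .map((r) => file.subarray(r.start, r.end));
}

// Example: a 20-byte file whose containers start at offset 10.
const demoDescriptors: ContainerDescriptor[] = [
  { size: 4, type: "TIMELINE_DATA", compression: "NONE", content: "Timelines for cluster X" },
  { size: 6, type: "LOG_DATA", compression: "GZIP", content: "Logs for cluster X" },
];
const demoFile = Uint8Array.from({ length: 20 }, (_, i) => i);
const demoRanges = computeRanges(demoDescriptors, 10);
const demoLogs = sliceContainersOfType(demoFile, demoRanges, "LOG_DATA");
console.log(demoRanges.map((r) => [r.start, r.end]), demoLogs[0].length);
```

Decompression would then be applied only to the returned slices, according to each descriptor's compression field. When the input is a File or Blob rather than an in-memory array, the same ranges could be passed to `Blob.slice` so that the remaining bytes are never read.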
Migration Strategy
- Define .proto schema: Map the existing Go structs for the file header and internal models to Protobuf messages.
- Setup Logic: Configure protoc generation in the Makefile for both Go and TS targets.
- Backend Refactor: Update the backend serializer to write the new binary format.
- Frontend Refactor: Update the frontend data loader to parse the binary format instead of JSON.
- Note: Backward compatibility for legacy JSON-based files should be maintained if possible, or a conversion tool provided.
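One way the frontend loader could keep accepting legacy files is to inspect the first bytes before choosing a parser. The sketch below assumes that a legacy file begins directly with the `{` of its JSON header, which the description of the current format suggests but does not guarantee; `detectKhiFormat` and the format labels are hypothetical names.

```typescript
// Sketch of choosing a parser from the first bytes of a file.
// ASSUMPTION: legacy files start with "{" (the opening of the JSON header).
// detectKhiFormat and the format labels are hypothetical names.

type KhiFileFormat = "protobuf-container" | "legacy-json" | "unknown";

function detectKhiFormat(head: Uint8Array): KhiFileFormat {
  // New format: the 3 magic bytes "KHI".
  if (head.length >= 3 && head[0] === 0x4b && head[1] === 0x48 && head[2] === 0x49) {
    return "protobuf-container";
  }
  // Legacy format: assumed to begin with "{" (0x7b).
  if (head.length >= 1 && head[0] === 0x7b) {
    return "legacy-json";
  }
  return "unknown";
}

const newFormatHead = new Uint8Array([0x4b, 0x48, 0x49, 0x10, 0x00, 0x00, 0x00]);
const legacyHead = new Uint8Array([0x7b, 0x22, 0x76, 0x22]); // {"v"
console.log(detectKhiFormat(newFormatHead), detectKhiFormat(legacyHead));
```

Since valid JSON cannot start with the letter K, the two checks cannot both match the same file.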
Definition of Done
- .proto files created for KHI data models.
- Automated code generation setup for Go and TypeScript.
- Backend produces valid protobuf-based files.
- Frontend successfully loads and renders protobuf-based files.
- 500MB+ metadata files can be loaded in the browser without crashing.