Real-time, low-overhead visualization of LLM internals during training.
Synapse streams live tensor activation statistics from a PyTorch training loop into a browser-based 3D dashboard. The design goal is "debuggability without slowing training": keep the hot-path minimal, do aggregation/sparsification off-thread, and ship compact binary packets over WebSockets.
- Python 3.8+ for the backend extension
- Node.js 18+ and npm for the frontend
- Rust stable and wasm-pack for the WebAssembly parser
- CMake 3.14+ for the backend build
- A C++17-compatible compiler
```sh
# Install all dependencies
make install

# Build all components
make build

# Run tests
make test

# Start development servers (backend in one terminal, frontend in another)
make dev
```

Backend extension:

```sh
cd backend_extension
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -e . --no-build-isolation
```

Wasm parser:

```sh
cd wasm_parser
rustup target add wasm32-unknown-unknown
wasm-pack build --target web --out-dir ../frontend_dashboard/src/lib/wasm_pkg
```

Frontend dashboard:

```sh
cd frontend_dashboard
npm install
```

| Command | Description |
|---|---|
| `make` | Show help |
| `make build` | Build all components (wasm + backend) |
| `make wasm` | Build Wasm parser only |
| `make backend` | Build C++ extension only |
| `make frontend` | Build frontend only |
| `make test` | Run all tests |
| `make clean` | Clean all build artifacts |
| Command | Description |
|---|---|
| `make test-backend` | Run C++ unit tests |
| `make test-frontend` | Run frontend unit tests |
| `make test-wasm` | Run Rust unit tests |
| Command | Description |
|---|---|
| `make lint` | Run all linters |
| `make format` | Format all code |
| `make check` | Run type checks |
Terminal 1:

```sh
cd backend_extension
python python/simulate_llama8b.py --per-block attn+mlp+residual --threshold 0.6 --fps 5
```

Available options:

- `--host HOST` - WebSocket host (default: localhost)
- `--port PORT` - WebSocket port (default: 9000)
- `--fps N` - Packets per second (default: 10)
- `--threshold N` - Activation threshold (default: 0.5)
- `--layers N` - Number of layers (default: 32)
- `--per-block MODE` - Emissions per block (attn+mlp, attn+mlp+residual, residual)

Terminal 2:

```sh
cd frontend_dashboard
npm run dev
```

Open a browser to http://localhost:5173.
```
synapse/
├── backend_extension/       # Python C++ extension (neural_probe)
│   ├── src/                 # C++ source
│   ├── include/             # C++ headers
│   ├── python/              # Python scripts and simulator
│   └── tests/               # C++ unit tests
├── wasm_parser/             # WebAssembly packet parser (Rust)
│   ├── src/                 # Rust source
│   └── tests/               # Rust unit tests
├── frontend_dashboard/      # SvelteKit dashboard (TypeScript/Svelte)
│   ├── src/                 # Source code
│   │   ├── lib/             # Utilities and stores
│   │   └── routes/          # Pages and components
│   └── static/              # Static assets
└── project_documentation/   # Development documentation
```
- Python Training Loop → calls `neural_probe.log_activation(layer_id, tensor)`
- C++ Extension → processes the tensor, applies the threshold, creates binary packets
- Ring Buffer → thread-safe queue between the Python and WebSocket threads
- uWebSockets → broadcasts packets to connected clients
- Frontend WebSocket → receives binary data
- Wasm Parser → decodes binary packets off the main thread
- Svelte Stores → update reactive state
- Three.js → renders the 3D visualization
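The per-layer reduction the C++ extension performs can be sketched in pure Python. This is a simplified illustration only, not the actual `backend_extension` implementation; the function name and return shape are invented for this sketch:

```python
def summarize_layer(values, threshold):
    """Reduce one layer's activations to a macro summary plus a sparse view.

    Mirrors (in spirit) what the extension does before packetizing:
    mean/max feed the macro layer view, and only (index, value) pairs
    above the threshold survive into the micro sparse view.
    """
    if not values:
        return {"mean": 0.0, "max": 0.0, "sparse": []}
    mean = sum(values) / len(values)
    peak = max(values)
    sparse = [(i, v) for i, v in enumerate(values) if v > threshold]
    return {"mean": mean, "max": peak, "sparse": sparse}

summary = summarize_layer([0.1, 0.9, 0.4, 0.7], threshold=0.5)
# macro view: mean/max of the whole layer; micro view: indices 1 and 3 only
```

Keeping only above-threshold pairs is what makes the micro-view packets small enough to stream at interactive frame rates.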
Binary protocol with 32-byte header + typed payload:
- Layer Summary Batch: Macro view (mean/max per layer)
- Sparse Activations: Micro view (indices + values above threshold)
- Control Messages: Frontend→backend config changes
- Model Meta: Topology metadata for deterministic layout
See `backend_extension/include/protocol.h` for the full specification.
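The pack/parse pattern for a fixed-size binary header can be sketched as follows. The field names, order, and offsets below are assumptions for illustration only; the authoritative 32-byte layout lives in `backend_extension/include/protocol.h`:

```python
import struct

# Hypothetical 32-byte little-endian header: magic, version, msg_type,
# layer_id, payload_len, timestamp_ns, then padding to 32 bytes.
# (Illustrative only -- not the real protocol.h layout.)
HEADER_FMT = "<IHHIIQ8x"
HEADER_SIZE = struct.calcsize(HEADER_FMT)  # 32

def pack_header(magic, version, msg_type, layer_id, payload_len, timestamp_ns):
    """Serialize the hypothetical header fields into 32 bytes."""
    return struct.pack(HEADER_FMT, magic, version, msg_type,
                       layer_id, payload_len, timestamp_ns)

def parse_header(buf):
    """Decode the first 32 bytes of a packet back into named fields."""
    magic, version, msg_type, layer_id, payload_len, ts = \
        struct.unpack(HEADER_FMT, buf[:HEADER_SIZE])
    return {"magic": magic, "version": version, "msg_type": msg_type,
            "layer_id": layer_id, "payload_len": payload_len,
            "timestamp_ns": ts}
```

A fixed-size header like this is what lets the Wasm parser read `payload_len` first and then decode the typed payload without scanning.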
Formatting:

- Frontend: ESLint + Prettier (auto-format on save)
- Backend: Clang-Format (manual via `make format-cpp`)
- Wasm: Rustfmt (automatic via `make format-rust`)

Type safety:

- Frontend: Strict TypeScript, no `any` types
- Backend: C++17 with modern practices
- Wasm: Full type safety with serde

Testing:

- Unit tests for all components
- Run `make test` to verify builds
- See `project_documentation/dev/TEST_PLAN.md` for coverage details
```sh
cd backend_extension
pip install -e . --no-build-isolation
```

```sh
cd wasm_parser
wasm-pack build --target web --out-dir ../frontend_dashboard/src/lib/wasm_pkg
```

- Verify the backend simulator is running on port 9000
- Check firewall settings
- The browser console shows connection status

- Verify the simulator runs in multi-layer mode
- Rotate the camera angle to see stacked planes
Enable verbose logging:

- Frontend: the browser console shows all WebSocket events
- Backend: C++ logs to stdout with timestamps and levels
- Wasm: parse errors are thrown to the browser console
- `CMAKE_BUILD_TYPE` - Debug or Release (default: Debug)
- `PYTHON` - Python executable (default: python3)

All standard Makefile variables are supported:

- `make CMAKE_BUILD_TYPE=Release build` - Build an optimized version
- `make PYTHON=python3.9` - Use a specific Python version
See CONTRIBUTING.md for guidelines.
This project is licensed under the CDLI Non-Commercial Open Source License, Version 2.0, 2025. See the LICENSE file for details.
- `backend_extension/` — Python C++ extension (`neural_probe`) + uWebSockets broadcaster
- `wasm_parser/` — WebAssembly packet parser (Rust)
- `frontend_dashboard/` — SvelteKit + Three.js dashboard (macro layers + micro sparse neurons + control panel)
- Binary protocol v1 with a fixed header and typed payloads:
  - Layer macro summaries (`NF_MSG_LAYER_SUMMARY_BATCH`)
  - Sparse neuron activations (`NF_MSG_SPARSE_ACTIVATIONS`)
  - Control messages (`NF_MSG_CONTROL`) for live threshold changes
  - Topology metadata (`NF_MSG_MODEL_META`) to make layout deterministic
- Frontend:
  - Deterministic "layer grid" placement (no random scatter)
  - Auto-framing camera + orbit controls
  - Control panel (threshold + theme + bloom)
  - More robust parsing/handling when packets arrive out of order
- Tooling:
  - Llama-8B-like multi-layer activation simulator: `backend_extension/python/simulate_llama8b.py`
Sparse neuron grids are useful, but transformers are often best understood via:
- Attention patterns (token→token edges per head, per layer)
- Residual stream norms and per-layer contribution
- Top-k routed activations (MoE) and expert utilization
The next "real" step is to define an attention packet type that can be rendered as token graphs without pretending we have full neuron connectivity.
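One possible shape for such an attention packet is sketched below. Every field choice here (IDs, widths, edge encoding) is an assumption for discussion, not part of protocol v1:

```python
import struct

# Hypothetical attention-edge payload: (layer_id, head_id, edge_count)
# followed by flat (src_token, dst_token, weight) triples. Purely a
# proposal sketch -- nothing in protocol.h defines this yet.
PREFIX_FMT = "<HHI"   # layer_id, head_id, edge_count
EDGE_FMT = "<HHf"     # src_token, dst_token, attention weight

def pack_attention_edges(layer_id, head_id, edges):
    """edges: list of (src_token, dst_token, weight) tuples."""
    out = struct.pack(PREFIX_FMT, layer_id, head_id, len(edges))
    for src, dst, w in edges:
        out += struct.pack(EDGE_FMT, src, dst, w)
    return out

def unpack_attention_edges(buf):
    layer_id, head_id, n = struct.unpack_from(PREFIX_FMT, buf, 0)
    edges, off = [], struct.calcsize(PREFIX_FMT)
    for _ in range(n):
        src, dst, w = struct.unpack_from(EDGE_FMT, buf, off)
        edges.append((src, dst, w))
        off += struct.calcsize(EDGE_FMT)
    return layer_id, head_id, edges
```

Keeping edges as explicit (src, dst, weight) triples, rather than a dense per-head matrix, matches the project's sparsification stance: only edges above some attention threshold would be emitted.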
If you run into issues with the simulator, some common ones are collected here; a more extensive FAQ will be added later.
If the frontend receives summaries before the full topology is known, it can bootstrap from the first summary, but it must rebuild as new layer summaries arrive. The dashboard now rebuilds the macro layer stack whenever it detects new layer IDs.
If you still see a single plane:
- Verify the simulator is running multi-layer mode (see "Run").
- Rotate the camera: stacked planes can overlap face-on.
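The bootstrap-then-rebuild rule can be sketched as follows. The real logic lives in the TypeScript dashboard; this Python rendering, with invented names, only illustrates the decision:

```python
def update_layer_stack(known_layers, summary_layer_ids):
    """Merge incoming layer IDs into the known set.

    Returns (layers, rebuilt): rebuilt is True when a summary mentions a
    layer ID not seen before, i.e. the macro layer stack must be rebuilt
    so an early partial bootstrap converges to the full topology.
    """
    new_ids = [i for i in summary_layer_ids if i not in known_layers]
    if not new_ids:
        return known_layers, False
    return sorted(set(known_layers) | set(summary_layer_ids)), True
```

With this rule a dashboard that bootstrapped from layers `[0, 1]` grows to `[0, 1, 2, 3]` the moment a summary mentions layers 2 and 3, and stays stable otherwise.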
- Start the backend simulator (requires `neural_probe` built/installed):

  ```sh
  ./backend_extension/python/run_simulate_llama8b.sh --per-block attn+mlp+residual --threshold 0.6 --fps 5
  ```

- Start the dashboard:

  ```sh
  npm -C frontend_dashboard install
  npm -C frontend_dashboard run dev
  ```

- Open http://localhost:5173
The backend extension historically used CMake FetchContent to `git clone` dependencies. In restricted/offline environments this fails. The project now supports vendoring dependencies under `backend_extension/third_party/` and turning FetchContent off:

- See `backend_extension/third_party/README.md`
