Releases: tzervas/tritter
v0.2.0 - Multimodal Architecture & Training Optimization
🚀 Major Release: Multimodal Transformer with Training Optimization
Core Features
- Multimodal Architecture: Unified text, vision, and audio embedding space
- BitNet 1.58-bit Quantization: Ternary weights {-1, 0, +1} with STE training
- Training Optimization: VSA compression, ternary math, gradient prediction
- LoRA/QLoRA Fine-tuning: Train 40B models on a 16GB GPU
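The ternary quantization listed above can be sketched in a few lines. This is a minimal illustration of BitNet b1.58-style absmean rounding plus a straight-through estimator (STE), not tritter's actual implementation; the function names are hypothetical.

```python
import numpy as np

def quantize_ternary(w: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Round weights to {-1, 0, +1} using a per-tensor absmean scale."""
    scale = np.abs(w).mean() + eps
    return np.clip(np.round(w / scale), -1.0, 1.0)

def ste_apply(w: np.ndarray) -> np.ndarray:
    """Straight-through estimator: the forward pass sees ternary values,
    while the backward pass treats quantization as identity so gradients
    flow to the full-precision shadow weights. In an autodiff framework,
    the (q - w) term would be detached from the graph."""
    q = quantize_ternary(w)
    return w + (q - w)

w = np.array([0.9, -0.05, -1.2, 0.3])
print(quantize_ternary(w))  # every value lands in {-1, 0, +1}
```

At inference time only the ternary values (plus one scale per tensor) need to be stored, which is what makes the packed-weight sizes below so small.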
New Components
- Tokenization: BPE (tiktoken) + AST-aware code tokenization (tree-sitter)
- Vision: SigLIP encoder + VQ-VAE image tokenizer
- Audio: EnCodec-style audio tokenization
- Curation: Dataset gates enforcing security and quality standards
- Embedding: KNN/VQ rounding for embedding-prediction paradigm
- Optimization: Phase-based training (FULL → PREDICT → CORRECT cycles)
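The phase-based training in the last bullet can be pictured as a repeating cycle: FULL steps compute true gradients, PREDICT steps run on cheap predicted gradients, and CORRECT steps reconcile the two. A minimal scheduling sketch, with illustrative phase lengths (the real schedule and any adaptation logic are tritter internals not shown here):

```python
from itertools import islice

def phase_schedule(full: int = 1, predict: int = 3, correct: int = 1):
    """Endlessly yield phase labels for a FULL -> PREDICT -> CORRECT cycle.
    Phase lengths here are illustrative, not tritter's defaults."""
    while True:
        yield from ["FULL"] * full        # compute real gradients
        yield from ["PREDICT"] * predict  # step on predicted gradients
        yield from ["CORRECT"] * correct  # reconcile prediction error

print(list(islice(phase_schedule(), 6)))
```

The training loop would consult the schedule each step to decide whether to run a full backward pass or a cheaper predicted update.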
Infrastructure
- RTX 5080 (Blackwell) GPU support
- Python 3.13 compatibility
- HuggingFace Hub integration
- Complete training pipeline with data preparation
Model Sizes
| Size | Params | Packed Weights | Recommended VRAM |
|---|---|---|---|
| test | ~10M | ~2 MB | Any |
| 125M | 125M | ~29 MB | 8GB |
| 350M | 350M | ~82 MB | 8GB |
| 1B | 1.1B | 261 MB | 8GB |
| 7B | 6.2B | 1.45 GB | 16GB |
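The packed sizes in the table are consistent with 2 bits per ternary weight (four weights per byte), reported in binary units. A quick sanity check under that assumption; small deviations from the table are expected from per-tensor scale metadata and parameter-count rounding:

```python
def packed_mib(params: float, bits_per_weight: int = 2) -> float:
    """Approximate packed weight size in MiB, assuming fixed-width 2-bit codes."""
    return params * bits_per_weight / 8 / 2**20

print(round(packed_mib(1.1e9)))            # 1B row: close to the table's 261 MB
print(round(packed_mib(6.2e9) / 1024, 2))  # 7B row in GiB: close to 1.45 GB
```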
Standalone Modules
- vsa-training-optimizer - Training optimization toolkit
Installation
pip install tritter
Test Results
- 600+ tests passing
- Verified on RTX 5080 16GB