A non-Transformer hierarchical recurrent network with differentiable Gumbel-Softmax routing and bounded memory slots. Runs 7B+ parameter models layer-by-layer on low-budget GPUs.
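The Gumbel-Softmax routing mentioned in the description refers to the standard reparameterization trick for sampling a differentiable, near-one-hot routing decision. A minimal NumPy sketch of that trick (an illustration of the general technique, not the repository's actual code):

```python
import numpy as np

def gumbel_softmax(logits, temperature=1.0, rng=None):
    """Sample a differentiable, near-one-hot routing vector.

    Adds Gumbel(0, 1) noise to the logits and applies a
    temperature-scaled softmax; as temperature -> 0 the output
    approaches a one-hot sample drawn from softmax(logits).
    """
    rng = rng or np.random.default_rng()
    # Gumbel(0, 1) noise via inverse-CDF sampling of uniforms.
    gumbel_noise = -np.log(-np.log(rng.uniform(size=logits.shape)))
    y = (logits + gumbel_noise) / temperature
    y = y - y.max()              # subtract max for numerical stability
    exp_y = np.exp(y)
    return exp_y / exp_y.sum()

# Example: route a token among 4 hypothetical memory slots.
logits = np.array([1.0, 2.0, 0.5, -1.0])
weights = gumbel_softmax(logits, temperature=0.5)
print(weights)                   # non-negative, sums to 1
```

Because the noise is injected additively and the softmax is smooth, gradients flow through the routing weights during training, while low temperatures make the routing effectively discrete at inference.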
Updated Jan 15, 2026 - Python
🌟 Build efficient models with hierarchical recurrent layers for powerful text processing and enhanced performance in natural-language tasks.