Releases · tenstorrent/tt-metal

22 Feb 03:51

github-actions

Immutable

v0.68.0-dev20260222

ee7b53f

v0.68.0-dev20260222 Pre-release

Note

If you are installing from a release, please refer to the README, INSTALLATION instructions, and any other documentation packaged with that release, not the versions on the main branch. There may be differences between the latest main and the previous release.

The changelog will now follow, showing the changes from last release.

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/22267362568

📦 Uncategorized

Fuse sdpa_reduce_to_all with post_sdpa
- PR: #38159
Decouple FDKernels from MetalContext
- PR: #38103
D2D Socket based Python Op
- PR: #38194
DeepSeek teacher forced accuracy : defer xfail until after accuracy metrics are computed
- PR: #38255
Fix uninitialized KernelHandle in LayerNormForwardKernels
- PR: #38195
Fix uninitialized KernelHandle in SDPA backward program factories
- PR: #37944
Fix dead store in matmul DRAM sharded program factory
- PR: #37946
Fix watcher assert in blitz broadcast
- PR: #38277
Fix core.NullDereference in rotate operation program factories
- PR: #38193
#36020: Add MOE/MLP weights infra and update tests
- PR: #38288
Add position tracking to mla
- PR: #38292


21 Feb 10:16

github-actions

Immutable

v0.68.0-dev20260221

aa7c4d6

v0.68.0-dev20260221 Pre-release

The changelog will now follow, showing the changes from last release.

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/22246649996

📦 Uncategorized

Create view MeshBuffer in tensor::view and store root MeshBuffer in DeviceStorage
- PR: #38101
Move cb ids to CTAs for blitz RMSNorm
- PR: #38184
Adding trace and 2cq functionality to pi0 model
- PR: #38174
Remove static variables from dispatch topology init code
- PR: #37588
A balanced traffic pattern for AG minimal.
- PR: #37878
#37464: Update unary LLK Tile API's
- PR: #37703
fix deepseek quad CI
- PR: #37995
[TTTv2] Add LMHead1D module for 1D topologies
- PR: #37542
Fix TTTv2 Galaxy CI: Set default HF_MODEL and simplify MLP2D test memory config
- PR: #38060
Optimise LCM for BH.
- PR: #38119
#37985: Overlap post-SDPA CB memory regions
- PR: #38096
Increase stable diffusion demo test timeout to 600s
- PR: #38197
SDPA decode bug fixes
- PR: #38004
Add MLA SDPA test to tests/didt
- PR: #38034
chore: update LLK submodule to e9428f4
- PR: #38171
Add new didt tests for the SDPA OP
- PR: #38014
Disable SD 1.4 on BH P150 Model perf pipeline
- PR: #38213
Revert "A balanced traffic pattern for AG minimal. (#37878)"
- PR: #38202
Add sub_core_grids to ttnn.pad
- PR: #37753
Disabling n300 and enabling more verbose output in triage tests
- PR: #38217
Non-causal SDPA data movement improvements
- PR: #38025
Fix remaining broken imports after SDPA test migration (#37713)
- PR: #38111
[QSR] Adding missing pieces to run compute kernels
- PR: #38081
Add socket pipeline rate tests
- PR: #37809
[DM] Fix hang in DM Test suite
- PR: #38222
[skip ci] temp skip sentence bert due to #38178
- PR: #38183
Add multicast write noc util to perf report CSV
- PR: #38155
Dumping all debug bus signals for block if any risc inside is broken
- PR: #38151
Update tt-triage instructions in kernel debugging tips
- PR: #38219
[skip ci] Add AI tool restrictions for bug bounty program
- PR: #37932
Revert "Dumping all debug bus signals for block if any risc inside is broken "
- PR: #38253
increase timeouts for longer deepseek tests
- PR: #38238
Add tiered model CI pipelines for multi-SKU unit and e2e testing
- PR: #37949
PDL perf bump
- PR: #38216
SDXL Relax unet PCC thresholds
- PR: #38204
Adding extra-tag to allow override
- PR: #38185
Cleanup sigmoid implementation
- PR: #37186
DeepSeek MOE/MLP fusion with reduce_to_one
- PR: #38244
rebase/update fabric ubench golden results
- PR: #38245
Fix multi-process safety issue with jit_link_additional_processor
- PR: #38187
Set umd-admins as owners of UMD submodule
- PR: #38242
Use consistent hash for JIT build cache paths
- PR: #38164
Switch broadcast noc usage
- PR: #38176
Revert "Use_VC propagation fix version 2 (#36529)"
- PR: #38240
TTNN Tensor Cleanup in preparation of Metal Tensor Split
- PR: #37690
MoE: Selective reduce combine
- PR: #37432


20 Feb 19:59

github-actions

Immutable

v0.67.0-rc1

4835593

v0.67.0-rc1 Pre-release

The changelog will now follow, showing the changes from last release.

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/22220491796

📦 Uncategorized

Improve custom_mm to performantly cover more shapes and enable transpose
- PR: #37121
changed reshape tensor layout to TILE for deepseek moe_gate
- PR: #37415
MLA Optimizations
- PR: #37279
Add precompiled headers to tt-train for faster compilation
- PR: #37694
Adding uneven output shard support to untilize
- PR: #37343
Fix minor typos in unary max/min comments.
- PR: #37636
Move prefetcher pytest option to avoid breaking CI tests
- PR: #37613
[Gemma3] Fix for gemma3 failing unit tests
- PR: #37644
[GPT-OSS] Add fused op unit tests for MoE
- PR: #35660
Disable stable_diffusion model perf test on blackhole (#37617)
- PR: #37619
Add program configs for Matmul ops in Embedding block to run across 40 cores in the SDXL Refiner
- PR: #37264
[tt-train] Add training log comparison plotting script
- PR: #37531
[skip ci] Enable watcher apc nightly debug
- PR: #37624
Adding test harness to check cache on device compatibility for Deepseek 671B
- PR: #37649
[Watcher] tt-train-cpp-unit tests have new watcher enabled fails due to recent changes
- PR: #37388
chore: update LLK submodule to 346a830
- PR: #37709
removes meta lib dependencies
- PR: #37046
[WATCHER] Following issues are detected when watcher is enabled on BH post commit
- PR: #37744
[skip ci] Add P300-viommu to BHPC multi card fast tests
- PR: #37758
SGLang generator
- PR: #35980
[tt-train] Complete nanoGPT Python impl
- PR: #36688
Add new CI pipeline for Deepseek to test long seq lens and refactor tests
- PR: #36690
Topology Mapper Integration with Topology Solver API
- PR: #35778
Make TP All reduce optional in Post SDPA
- PR: #37700
Fix misleading comment in dataflow_api for multicasts
- PR: #37760
[skip ci] Update llama demo upstream test id's
- PR: #37658
Enable multi-host neighbor-pad and RingAttentionAllGather CCLs
- PR: #37114
LLK API support for 8x32 tilize
- PR: #37481
Upgrade Pillow -> 12.1.1 to fix CVE-2026-25990
- PR: #37691
Fix moreh kernel runtime arg bounds issues (#37193, #37040)
- PR: #37400
Convert Sparse Multicast Static Asserts to Runtime Asserts
- PR: #37581
Do not use internal bh name in builtins
- PR: #37759
Quasar compute API bringup V1.0
- PR: #35206
[Deepseek Blitz] Split q a proj mm on inner dim
- PR: #37687
Reduce to one generic op and fusing it with moe routed expert
- PR: #37411
[TTTv2] Add attention_1d module with comprehensive unit tests
- PR: #36792
Matmul - Add Support for 2D DRAM interleaved in0 + batched height sharded in1
- PR: #37681
Changes for quad module tests CI
- PR: #37601
Subtract grid offset when computing 0-based indices in sharded LN factory
- PR: #37768
Decouple Cluster initialization from HAL
- PR: #37695
Switch llama 8b to DP=4 in vllm nightly
- PR: #37786
A balanced traffic pattern for AG minimal.
- PR: #36607
[skip ci] Remove t3k select pipeline extra-tag inputs
- PR: #37801
#36982: create_q_heads tilizes to 8x32 tiles
- PR: #37574
Enable (very) basic compute kernels
- PR: #37328
Migrate conv operations to free function style
- PR: #36382
Migrate fast dispatch frequent tests to CIv2 runners
- PR: #37803
reduction: migrate to free function binding + generic cleanup
- PR: #37584
Use gh_run_number for Superset dashboard links in Slack notifications
- PR: #37793
Fix race condition in parallel multi-source jit build
- PR: #37805
chore: update LLK submodule to f7cf929
- PR: #37798
Move SDPA and MLA tests from tt_eager/misc to ttnn/operations/sdpa
- PR: #37713
Revert "A balanced traffic pattern for AG minimal. (#36607)"
- PR: #37832
[skip ci] Fix galaxy perf tests yaml (bad merge)
- PR: #37836
[DM] Update data movement multi_interleaved tests
- PR: #37626
SDXL clip encoder perf targets updated
- PR: #37837
Fix timeouts in vllm nightly
- PR: #37842
DeepSeek Blitz moe fusion
- PR: #37757
Generate Welford reciprocals in Python and pass into distributed layernorm ops
- PR: #37080
Fix TTTv2 MLP 1d from model args mismatch + BH Stress test pytest id
- PR: #37589
[skip ci] Fix Package and release workflow
- PR: #37844
Update compute kernel API to reflect new changes to fast tilize
- PR: #37736
Fix timeouts for qwen in vllm nightly
- PR: #37854
[skip ci] Add back missing schedule to BH demos
- PR: #37862
Pool2D Alignment Fixes for Watcher
- PR: #37599
Add LLK_ASSERTs for verifying tile index in dest accumulator
- PR: #37780
Make mm respect first core from subdevice
- PR: #37511
Add TTTv2 rmsnorm module unit tests to T3K e2e pipeline
- PR: #37800
Unify kernel and firmware JIT build deduplication into JitBuildCache
- PR: #37452
fix(sweep): correct lead-models Slack notifier's run context, counts, and alerting
- PR: #37864
Propagating new unpack LLK for reduce ops
- PR: #37220
#37471: Output dtype parameter - fix for fp32 dst mode conflict
- PR: #37612
Add indexes to TTNN report db
- PR: #36629
DeepSeek Blitz MLP fusion
- PR: #37860
[skip ci] Move conv test to run last in upstream didt suite
- PR: #37875
Delete Event as it is unused code
- PR: #37766
Kwerblinski tt/37656 blitz lm head
- PR: #37761
fix processor names in watcher tests
- PR: #37892
Migrate experimental operations to use bind_function template and free functions
- PR: #37815
Reorder device params to fix deepseek tests cache paths
- PR: #37894
Split initialization of various components into their own classes
- PR: #37453
Add CQ_PREFETCH_CMD_RELAY_LINEAR_PACKED_H command
- PR: #37598
H<->D Ops for Blitz + Changes to support Async Slow Dispatch
- PR: #37705
Migrate pool and adaptive pool operations to free function style
- PR: #37810
Halo Check Output Grid Matches Input Grid
- PR: #37667
Expose tile dim reconfig template flag in metal
- PR: #37568
TT-triage device and core hardening
- PR: #37684
Improve venv relocatability for distributed and tt-run env inherit
- PR: #36282
#37896: Fix silu_init for BH
- PR: #37897
Fix broken import in test_deepseek_mla_ops.py after SDPA test migration
- PR: #37919
Add tt_symbiote: PyTorch-to-TTNN transparent acceleration framework
- PR: #35699
[Blitz Decode] Integrate Embedding with H2D
- PR: #37913
#0: Fix noc_async_write_multicast to pass noc when using one packet version
- PR: #37918
Full flash mla for blitz
- PR: #37867
Implement FMOD as LLK op
- PR: #37050
[gpt-oss] batched prefill and prefill tracing
- PR: #37848
[WATCHER]: Fix reader runtime args for idle cores in SDPA decode
- PR: #37698
Fix deepseek test_moe device_params ordering for cache paths
- PR: #37926
[UMD Bump] Automated UMD Bump 09.02.2026
- PR: #37377
Reduce DeepSeek long-seq decoder override to 12288
- PR: #37834
DeepSeekV3 teacher forcing: KV cache + improved refpt generation
- PR: #37538
fix galaxy quick tests
- PR: #37935
latency packet index ack move to back
- PR: #37806
Add fused minimal matmul addcmul operation
- PR: #36502
Update micro op kernels to not use full inits, and use reconfigures + short inits
- PR: #37937
Updates trace region size for Qwen3-32B on Galaxy to avoid running out of memory
- PR: #37933
Add Fabric multi-host test on ExaBox BH Quad
- PR: #37764
Improve model tracer infra
- PR: #37390
[gpt-oss] fix b=1 demo
- PR: #37950
Fix yaml path reading in nano_gpt and mesh shape in autograd
- PR: #37838
ci(sweeps): restrict lead-model slack notifications to scheduled main runs
- PR: #37948
jit: remove redundant unpack bfp format conversion
- PR: #37606
Fix buffer not sharded error in ring matmul 1d unit tests
- PR: #37945
Deepseek: Optimized OP for MoE Gate
- PR: #37446
SDPA reduce to all positional logic
- PR: #37888
Fused rmsnorm allow fp32 stat and rope inputs
- PR: #37859
Fabric Fused Scatter Write + Atomic Increment Messaging
- PR: #37751
Add deepseek decode layer test into galaxy-quick
- PR: #37788
Add DeepSeek V3 B1 demo host interface integration tests
- PR: #37952
[tt_dit] Reduce module cache data size
- PR: #37777
Pipe compute config to reduce scatter
- PR: #37748
Adding ND Sharding Support for the Untilize With Unpadding Op
- PR: #37156
[skip ci] Run test_host_io.py on viommu runners only
- PR: #37967
Add argmax based k=1 sampling micro-op to be used in the fused LM head + sampling layer
- PR: #37889
added demo profiling script and device perf utils
- PR: #37413
[skip ci] Rename workflow and update repository references
- PR: #37283
Increased core count for paged SDPA for Qwen
- PR: #37872
Add GitHub merge queue data workflow
- PR: #37357
Updating slice_write tests to use the ttnn.experimental module
- PR: #37971
use nested skus for deepseek perf test [skip ci]
- PR: #37980
Simplified the way to select a program factory
- PR: #37661
Consolidate JIT-generated descriptor headers
- PR: #37818
add wraparound for neighbor exchange
- PR: #37648
[QSR] Enable all Neos
- PR: #37987
Remove hostname suffix for TT_METAL_CACHE in ttrun.py
- PR: #37989
Revert "[skip ci] Remove parallelism as we suspect a race condition somewhere"
- PR: #37965
Added support for per-batch sampling params for Whisper
- PR: #3...


20 Feb 14:55

github-actions

Immutable

v0.67.0-dev20260220

d5bfb8d

v0.67.0-dev20260220 Pre-release

The changelog will now follow, showing the changes from last release.

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/22206237336

📦 Uncategorized

Add #ifdef guards to chlkc_descriptors.h
- PR: #38077
Use named CBs in matmul factories and kernels
- PR: #38074
Use regex in watcher assert test to avoid hard-coded line number
- PR: #38083
[QSR]: Fix missing comma in ncrisc_noc_fast_read template
- PR: #38095
[tt_dit] Fix reciprocal tensor reuse in DistributedLayerNorm
- PR: #37979
#37982: Overlap pre-SDPA memory regions
- PR: #37956
[apc break] Revert "Add check for proper configuration of unpacker and packer during init and block (#37265)"
- PR: #38100
[tt-train] softmax_backward kernel implementation
- PR: #31580
[skip ci] Remove dead/unused code from the repo
- PR: #38113
ci: update condition for AI assistant job execution
- PR: #38109
Fixing triage tests on p150b
- PR: #38051
[skip ci] DeepSeek prefill directory
- PR: #38118
Remove unnecessary flushed barrier between data and semaphore multicast in conv ops
- PR: #37858
Revert "[tt-train] softmax_backward kernel implementation (#31580)"
- PR: #38128
[tt-triage] Increase console width for better output formatting
- PR: #38132
Fix forward_prefill calls in Galaxy MLP prefill tests
- PR: #38059
Fix tools test: update watcher assert string to match uppercase BRISC
- PR: #38043
Added tensor dimensional stability to moe prefill gating on Mixtral
- PR: #38115
SDPA Decode Optimization: Tree Reduce
- PR: #37004
Re-enable SD 1.4 on Model perf BH pipeline
- PR: #38123
[umd] Use semver_t::from_wormhole_eth_firmware_tag
- PR: #37507
[build]: Fix #37904 — build_metal.sh fails on Fedora/RHEL
- PR: #37905
[TT-Train] Fix GCC build: qualify self-referential using declarations (#37922)
- PR: #37924
Removed previously used llk_unpack_AB_reduce_init
- PR: #37958
Optimize DeepSeekV3 weight dequantization
- PR: #38127
[skip ci] #0: add two nightly subdirectories to CODEOWNERS
- PR: #38141
Align Wan pipeline to reference
- PR: #37968
Multi-mesh Topology Mapping Utility
- PR: #36174
[Quasar DFB] Update to support running on Tensix and update tile counter assignment to respect remapper rules
- PR: #37861
#38022: add ttnn reduction tests to l2 nightly
- PR: #38135
Add a function to completely tear down metal
- PR: #34643
Allow writing to sharded memory from pinned memory
- PR: #37769
Add DeepSeekV3 B1 demo CLI script
- PR: #38023
Add watcher stack usage support for Quasar
- PR: #37966
Allow usage of freed row/col with slow dispatch
- PR: #36600
Cleanup of untilize and untilize_with_padding nd-sharded reader kernel and factories
- PR: #38161
VADv2 bug fix (ttnn.repeat crashing)
- PR: #38162
Fuse reduce to one with D2H
- PR: #38088
#36020: Add infra and tests for overlapping blitz decode weights
- PR: #38106


20 Feb 00:59

github-actions

Immutable

v0.67.0-dev20260219

2198847

v0.67.0-dev20260219 Pre-release

The changelog will now follow, showing the changes from last release.

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/22163635166

📦 Uncategorized

add wraparound for neighbor exchange
- PR: #37648
[QSR] Enable all Neos
- PR: #37987
Remove hostname suffix for TT_METAL_CACHE in ttrun.py
- PR: #37989
Revert "[skip ci] Remove parallelism as we suspect a race condition somewhere"
- PR: #37965
Added support for per-batch sampling params for Whisper
- PR: #37699
[skip ci] Add exit logic to analyze_validation_results.py to support automation
- PR: #37683
Fixing triage tests on blackhole
- PR: #37772
Fix race in Blitz Flash MLA
- PR: #37998
#37414: Prefill optimised MLA op.
- PR: #37721
disable padding[0] check for conv3d, add test config
- PR: #31720
#37716: Fix block-sharded conv2d producing wrong results with dilation > 1
- PR: #38002
Enabling blackhole triage CI
- PR: #38005
Revert "Enabling blackhole triage CI (#38005)"
- PR: #38013
Improve precision, range, and performance of sin/cos/tan.
- PR: #37714
Clean-up topk and topk sweep tests
- PR: #37951
[GPT-OSS] Experts matmul changes
- PR: #37930
Fix SDPA TT_METAL_WATCHER issues
- PR: #37928
[tt-triage] Add aggregated callstacks script
- PR: #37501
[Quasar] Fix Quasar build: Add return statements
- PR: #38007
Internalize DeepSeek MOE/MLP op looping
- PR: #37974
[skip ci] Add BH WH differential tags to the workflows
- PR: #37986
Use_VC propagation fix version 2
- PR: #36529
Add bfloat8 kv cache update
- PR: #37685
#29206 certain model comparison mode failed for bcast op golden function
- PR: #37890
Bump ttsim version to v1.3.5
- PR: #38046
Fix TensixTestL1ToPCIeAt16BAlignedAddress race condition
- PR: #38045
Fix CB wrapping blocking writer test hang (wrong TRISC core)
- PR: #38055
Add Deepseek 16x32 fast tilize test
- PR: #37863
Fix docker image ubuntu python versions
- PR: #38017
Add check for proper configuration of unpacker and packer during init and block
- PR: #37265
Move decode warmup from vLLM to metal side
- PR: #37447
Fix dynamic noc mode support for blitz mcast
- PR: #38032
38015: move/fix/remove some eager tests
- PR: #38020
Created matmul lab 3 for universities
- PR: #37895


19 Feb 15:54

github-actions

Immutable

v0.66.0-rc16

077ec5a

v0.66.0-rc16 Pre-release

The changelog will now follow, showing the changes from last release.

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/22163672659

no changes


18 Feb 08:19

github-actions

Immutable

v0.67.0-dev20260218

e25b1e7

v0.67.0-dev20260218 Pre-release

The changelog will now follow, showing the changes from last release.

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/22121609614

📦 Uncategorized

Updates trace region size for Qwen3-32B on Galaxy to avoid running out of memory
- PR: #37933
Add Fabric multi-host test on ExaBox BH Quad
- PR: #37764
Improve model tracer infra
- PR: #37390
[gpt-oss] fix b=1 demo
- PR: #37950
Fix yaml path reading in nano_gpt and mesh shape in autograd
- PR: #37838
ci(sweeps): restrict lead-model slack notifications to scheduled main runs
- PR: #37948
jit: remove redundant unpack bfp format conversion
- PR: #37606
Fix buffer not sharded error in ring matmul 1d unit tests
- PR: #37945
Deepseek: Optimized OP for MoE Gate
- PR: #37446
SDPA reduce to all positional logic
- PR: #37888
Fused rmsnorm allow fp32 stat and rope inputs
- PR: #37859
Fabric Fused Scatter Write + Atomic Increment Messaging
- PR: #37751
Add deepseek decode layer test into galaxy-quick
- PR: #37788
Add DeepSeek V3 B1 demo host interface integration tests
- PR: #37952
[tt_dit] Reduce module cache data size
- PR: #37777
Pipe compute config to reduce scatter
- PR: #37748
Adding ND Sharding Support for the Untilize With Unpadding Op
- PR: #37156
[skip ci] Run test_host_io.py on viommu runners only
- PR: #37967
Add argmax based k=1 sampling micro-op to be used in the fused LM head + sampling layer
- PR: #37889
added demo profiling script and device perf utils
- PR: #37413
[skip ci] Rename workflow and update repository references
- PR: #37283
Increased core count for paged SDPA for Qwen
- PR: #37872
Add GitHub merge queue data workflow
- PR: #37357
Updating slice_write tests to use the ttnn.experimental module
- PR: #37971
use nested skus for deepseek perf test [skip ci]
- PR: #37980
Simplified the way to select a program factory
- PR: #37661
Consolidate JIT-generated descriptor headers
- PR: #37818


17 Feb 03:40

github-actions

Immutable

v0.67.0-dev20260217

3d4d450

v0.67.0-dev20260217 Pre-release

The changelog will now follow, showing the changes from last release.

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/22081829437

📦 Uncategorized

[Blitz Decode] Integrate Embedding with H2D
- PR: #37913
#0: Fix noc_async_write_multicast to pass noc when using one packet version
- PR: #37918
Full flash mla for blitz
- PR: #37867
Implement FMOD as LLK op
- PR: #37050
[gpt-oss] batched prefill and prefill tracing
- PR: #37848
[WATCHER]: Fix reader runtime args for idle cores in SDPA decode
- PR: #37698
Fix deepseek test_moe device_params ordering for cache paths
- PR: #37926
[UMD Bump] Automated UMD Bump 09.02.2026
- PR: #37377
Reduce DeepSeek long-seq decoder override to 12288
- PR: #37834
DeepSeekV3 teacher forcing: KV cache + improved refpt generation
- PR: #37538
fix galaxy quick tests
- PR: #37935
latency packet index ack move to back
- PR: #37806
Add fused minimal matmul addcmul operation
- PR: #36502
Update micro op kernels to not use full inits, and use reconfigures + short inits
- PR: #37937


16 Feb 03:28

github-actions

Immutable

v0.67.0-dev20260216

ecc3ca4

v0.67.0-dev20260216 Pre-release

The changelog will now follow, showing the changes from last release.

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/22046210687

📦 Uncategorized

Fix broken import in test_deepseek_mla_ops.py after SDPA test migration
- PR: #37919
Add tt_symbiote: PyTorch-to-TTNN transparent acceleration framework
- PR: #35699


15 Feb 03:26

github-actions

Immutable

v0.67.0-dev20260215

53f7c88

v0.67.0-dev20260215 Pre-release

The changelog will now follow, showing the changes from last release.

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/22026945186

📦 Uncategorized

fix(sweep): correct lead-models Slack notifier's run context, counts, and alerting
- PR: #37864
Propagating new unpack LLK for reduce ops
- PR: #37220
#37471: Output dtype parameter - fix for fp32 dst mode conflict
- PR: #37612
Add indexes to TTNN report db
- PR: #36629
DeepSeek Blitz MLP fusion
- PR: #37860
[skip ci] Move conv test to run last in upstream didt suite
- PR: #37875
Delete Event as it is unused code
- PR: #37766
Kwerblinski tt/37656 blitz lm head
- PR: #37761
fix processor names in watcher tests
- PR: #37892
Migrate experimental operations to use bind_function template and free functions
- PR: #37815
Reorder device params to fix deepseek tests cache paths
- PR: #37894
Split initialization of various components into their own classes
- PR: #37453
Add CQ_PREFETCH_CMD_RELAY_LINEAR_PACKED_H command
- PR: #37598
H<->D Ops for Blitz + Changes to support Async Slow Dispatch
- PR: #37705
Migrate pool and adaptive pool operations to free function style
- PR: #37810
Halo Check Output Grid Matches Input Grid
- PR: #37667
Expose tile dim reconfig template flag in metal
- PR: #37568
TT-triage device and core hardening
- PR: #37684
Improve venv relocatability for distributed and tt-run env inherit
- PR: #36282
#37896: Fix silu_init for BH
- PR: #37897

