Release v1.0.0 · alibaba/TorchEasyRec

Major Features and Improvements

Train/Eval/Predict/Export

Support training with dynamic batch size by sample cost in #343
Support logging train metrics in #310
Support predicting checkpoint in #320 #322 #324
[EXPERIMENTAL] Support exporting with AOTInductor in #239 #274
Support exporting with TensorRT in #318
Support exporting the best model in #294
Support exporting to RTP in #298 #307 #329 #332 #339
Support AdamW optimizer and label smoothing in #297
Support setting an optimizer for a subset of parameters in #297
Support PanguDFS in #311 #348 #349 #350
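
For readers unfamiliar with the label smoothing mentioned in #297: it replaces a hard one-hot target with a slightly softened one. The following is a minimal pure-Python sketch of the standard formulation, not TorchEasyRec's implementation; the names `smooth_labels` and `cross_entropy` and the `epsilon` default are assumptions chosen for illustration.

```python
import math


def smooth_labels(one_hot, epsilon=0.1):
    """Mix a one-hot target with the uniform distribution over its classes.

    Each entry becomes (1 - epsilon) * y + epsilon / num_classes, so the
    smoothed target still sums to 1.
    """
    num_classes = len(one_hot)
    return [(1.0 - epsilon) * y + epsilon / num_classes for y in one_hot]


def cross_entropy(probs, target):
    """Cross entropy between predicted probabilities and a (soft) target."""
    return -sum(t * math.log(p) for p, t in zip(probs, target) if t > 0)


if __name__ == "__main__":
    hard = [0.0, 1.0, 0.0]
    soft = smooth_labels(hard, epsilon=0.1)
    preds = [0.05, 0.90, 0.05]
    print("smoothed target:", soft)
    print("loss (hard):", cross_entropy(preds, hard))
    print("loss (soft):", cross_entropy(preds, soft))
```

With smoothing, a prediction is penalized a little for putting almost no mass on the non-target classes, which tends to discourage overconfident outputs.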

Embedding

Support dynamic embedding in #279 #281 #283 #286 #289 #316
Support initializing dynamic embeddings from tables in #282 #288
MLPEmbedding supports features with value_dim > 1 in #331

Model

Optimize and refactor DlrmHSTU preprocessor to support MTGR style preprocessing in #290 #296 #300 #314
Decouple contextual feature dimension from sequence id embedding dimension in DlrmHSTU in #302
DlrmHSTU supports sharing embeddings between uih and contextual features in #337
DlrmHSTU supports a global average loss option in #334
Add TMA support for hstu attn in #336
Optimize gpu memory usage of GAUC metric in #312
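
As background for the GAUC item in #312: GAUC (group AUC) is commonly defined as the average of per-group AUC values, weighted by group size, with single-class groups skipped. The sketch below is a plain-Python illustration under that common definition; it is not the TorchEasyRec metric code, and the function names and the choice of sample count as the weight are assumptions.

```python
from collections import defaultdict


def auc(labels, scores):
    """Probability that a random positive outranks a random negative.

    Ties count as 0.5. Returns None when a class is missing.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gauc(group_ids, labels, scores):
    """Sample-count-weighted mean of per-group AUC (assumed weighting)."""
    groups = defaultdict(lambda: ([], []))
    for g, y, s in zip(group_ids, labels, scores):
        groups[g][0].append(y)
        groups[g][1].append(s)
    total, weight = 0.0, 0
    for ys, ss in groups.values():
        group_auc = auc(ys, ss)
        if group_auc is None:  # skip groups with only one class
            continue
        total += group_auc * len(ys)
        weight += len(ys)
    return total / weight if weight else None


if __name__ == "__main__":
    users = ["u1", "u1", "u2", "u2", "u2", "u3"]
    labels = [1, 0, 1, 0, 0, 1]
    scores = [0.9, 0.1, 0.2, 0.8, 0.1, 0.5]
    print(gauc(users, labels, scores))
```

The pairwise AUC here is quadratic in group size and is meant only to make the definition concrete; production metrics typically use a sort-based computation.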

Feature

Support kv dot product feature in #276
Support bool mask feature in #285
Support farm hash in #295

Upgrade

Upgrade pytorch to v2.9 and torchrec to v1.4.0 in #345

Note

For TorchEasyRec 1.0.x, you should use Docker image version 1.0.

For the GPU version (CUDA 12.6):
- mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easyrec/tzrec-devel:1.0-cu126
- PyTorch: v2.9 CUDA: v12.6 FBGEMM: v1.4.0 TorchRec: v1.4.0 Python: v3.11
- Support for the 470 GPU driver version has been dropped. To keep using a 470 driver, set LD_LIBRARY_PATH=/usr/local/cuda-12.6/compat

For the CPU version:
- mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easyrec/tzrec-devel:1.0-cpu
- PyTorch: v2.9 FBGEMM: v1.4.0 TorchRec: v1.4.0 Python: v3.11
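
To confirm that an environment matches the versions listed above, a small check like the following may help. It is a sketch, not part of TorchEasyRec: the distribution names `torch`, `torchrec`, and `fbgemm-gpu` are assumptions about how the packages are installed (the CPU image may use a differently named FBGEMM distribution), and `matches` and `check_environment` are hypothetical helpers.

```python
from importlib import metadata

# Versions listed in this release note for the 1.0 images.
# The keys are assumed distribution names; adjust them to your install.
EXPECTED = {"torch": "2.9", "torchrec": "1.4.0", "fbgemm-gpu": "1.4.0"}


def matches(installed, expected):
    """True when `installed` equals `expected` or refines it.

    A local build tag such as "+cu126" is ignored, so "2.9.0+cu126"
    matches "2.9" while "2.90.0" does not.
    """
    base = installed.split("+", 1)[0]
    return base == expected or base.startswith(expected + ".")


def check_environment(expected=EXPECTED):
    """Map each package to (installed_version_or_None, is_match)."""
    report = {}
    for name, want in expected.items():
        try:
            have = metadata.version(name)
        except metadata.PackageNotFoundError:
            have = None
        report[name] = (have, have is not None and matches(have, want))
    return report


if __name__ == "__main__":
    for name, (have, ok) in check_environment().items():
        print(f"{name}: installed={have} expected={EXPECTED[name]} ok={ok}")
```

Running the script inside the container prints one line per package; a package that is not installed is reported with `installed=None`.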

Bug Fixes and Other Changes

[feat] make bash as default shell by @tiankongdeguiji in #273
[feat] add benchmark odps quota and skip trt test when trt not available by @tiankongdeguiji in #278
[feat] add rdma addons into dockerfile by @tiankongdeguiji in #280
[feat] clean up fg_encoded in docs by @tiankongdeguiji in #287
support create tzrec config based on pyfg json by @chengaofei in #284
[bugfix] fix finetune checkpoint path runtime error print when path not exist by @tiankongdeguiji in #291
[feat] refactor export model by @tiankongdeguiji in #293
fix sequence raw feature pyfg sub_type not effective by @chengaofei in #292
[feat] optimize hstu triton op warning by @tiankongdeguiji in #301
[bugfix] improve create init ckpt for dynamic embedding when certain id_feature in the config lack embedding_dim by @tiankongdeguiji in #303
[feat] bump up tzrec version to 0.9.7 by @tiankongdeguiji in #305
[bugfix] fix dlrm hstu gauc and l2 loss support by @tiankongdeguiji in #306
[bugfix] fix content encoder with additional_content_features and target_enrich_features by @tiankongdeguiji in #304
[bugfix] fix create dynamic embedding ckpt when raw feature in config by @tiankongdeguiji in #308
Add evaluation metrics documentation by @yanzhen1233 in #309
[bugfix] fix fsspec ci test by @tiankongdeguiji in #317
add train_metric docs by @chengaofei in #315
Update custom development model documentation by @yanzhen1233 in #313
[feat] Adapt integration test config to local cuda device count by @eric-gecheng in #319
[bugfix] fix dlrm hstu preprocessor doc by @tiankongdeguiji in #321
[feat] add dlrm hstu demo data by @tiankongdeguiji in #326
[bugfix] avoid jit convert error when using large number by @eric-gecheng in #328
[feat] improve prune_unused_param_and_buffer when export model by @tiankongdeguiji in #327
[feat] refactor ops directory to fix import triton error by @tiankongdeguiji in #335
[feat] add assert to avoid using ckpt predict for two tower models by @eric-gecheng in #333
[feat] upgrade 2025 dingtalk qrcode by @tiankongdeguiji in #340
[bugfix] always lazy init predict checkpoint writer by @tiankongdeguiji in #341
Feature/dynamic routing support zero init by @eric-gecheng in #342
[bugfix] fix parse batch empty MapArray error in NegativeSampler by @tiankongdeguiji in #344
[feat] add doc for dynamic batch by @tiankongdeguiji in #346
[bugfix] fix array type in feature doc by @tiankongdeguiji in #347

Full Changelog: v0.9.0...v1.0.0
