Pull requests: pytorch/FBGEMM
Implement cached member_id upper bound search
cla signed
module: rocm
#5365 opened Feb 3, 2026 by avbokovoy
Fix embedding lookup bug for embedding_dim > 1024
cla signed
#5364 opened Feb 2, 2026 by Jitterx69
Implement asynchronous LDS loads for MI350
cla signed
#5348 opened Jan 26, 2026 by avbokovoy
Port fbgemm CPU warnings to GPU targets and fix warnings (#5303)
cla signed
fb-exported
meta-exported
module: rocm
#5342 opened Jan 23, 2026 by q10
Fix unused exception parameter compiler errors
cla signed
fb-exported
meta-exported
#5337 opened Jan 22, 2026 by fandan-nyc
Enable specs and feature map on TBEParamsReporter frontend
cla signed
fb-exported
meta-exported
#5335 opened Jan 22, 2026 by spcyppt
Fix missing intraining_embedding_pruning_cpu in Cmake
cla signed
fb-exported
meta-exported
#5330 opened Jan 21, 2026 by spcyppt
Fix missing CPU source in OSS CMake build
cla signed
fb-exported
meta-exported
#5327 opened Jan 20, 2026 by gchalump
Fix for T250930299 ("Your diff, D90268850, broke some tests")
cla signed
fb-exported
meta-exported
#5306 opened Jan 12, 2026 by q10
[fbgemm_gpu] Fix triton imports for CPU-only mode
cla signed
fb-exported
meta-exported
#5295 opened Jan 7, 2026 by q10
handle inf value scaling
cla signed
fb-exported
meta-exported
#5285 opened Jan 5, 2026 by DatHuynh
Adding support for bias addition + rescaling with token weights to grouped_gemm
cla signed
fb-exported
meta-exported
#5280 opened Dec 29, 2025 by metastableB
Refactor cumem_utils CPU library to cut GPU dependencies
cla signed
fb-exported
meta-exported
#5279 opened Dec 29, 2025 by crypt3lx2k
Add tidy fixes (#5268)
cla signed
fb-exported
meta-exported
#5277 opened Dec 24, 2025 by q10
Tune max segment length per cta in triton table batched embeddings, and expose the param via cli
cla signed
fb-exported
meta-exported
#5270 opened Dec 22, 2025 by OmarPavel
Refactor TBE benchmark reporter to use structured data config
cla signed
fb-exported
meta-exported
#5260 opened Dec 18, 2025 by gchalump
Fix blackwell CUTLASS attention meta registration + actually test compile
cla signed
fb-exported
meta-exported
#5259 opened Dec 18, 2025 by jbschlosser
Optimize benchmark index generation with std::sample()
cla signed
fb-exported
meta-exported
#5254 opened Dec 17, 2025 by terdogan
Remove unused dedup_map and associated includes from benchmarks
cla signed
fb-exported
meta-exported
#5253 opened Dec 17, 2025 by terdogan
Move the prefetched info to preallocated buffers
cla signed
fb-exported
meta-exported
#5251 opened Dec 17, 2025 by chouxi
Add aarch64 intrinsic-based dequantization to autovec routine
cla signed
fb-exported
meta-exported
#5249 opened Dec 17, 2025 by Nicoshev
Optimize group_index_select_or_add_2d_kernel on ROCm by adding a separate codepath for small embedding dimensions
cla signed
module: rocm
#5233 opened Dec 16, 2025 by aryaman-gupta