[FEA] Slow unbucketize permute operation in SequenceEmbeddingsAllToAll for row-wise sharding #296

@z52527

Description

Background

Currently, DynamicEmb has a custom input_dist implementation (RwSparseFeaturesDist in input_dist.py) but still relies on TorchRec's original output_dist implementation. This causes:

  1. Performance issue: the unbucketize_permute operation in TorchRec's output distribution is slow, especially for non-contiguous distribution patterns such as round-robin (a sketch of what this permute computes follows this list)
  2. Limited customization: Cannot optimize the output distribution without modifying TorchRec source code
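
For context, here is a minimal PyTorch sketch of the semantics involved: after row-wise bucketization, embeddings come back from the all-to-all in bucket (rank) order, and the unbucketize permute gathers them back into the original input order. The shapes and the round-robin construction below are illustrative assumptions, not DynamicEmb's actual data layout.

```python
import torch

# Illustrative only: shapes and the permute construction are assumptions.
# Embeddings arrive from SequenceEmbeddingsAllToAll in bucket (rank) order.
num_ids, dim, world_size = 6, 4, 2
emb_bucket_order = torch.randn(num_ids, dim)

# Round-robin bucketing (id i -> rank i % world_size) gives a maximally
# non-contiguous mapping from bucket order back to original order.
unbucketize_permute = (
    torch.arange(num_ids).reshape(world_size, num_ids // world_size).t().reshape(-1)
)  # [0, 3, 1, 4, 2, 5] for 6 ids on 2 ranks

# The slow step: a row gather that restores the original input order.
emb_original_order = emb_bucket_order.index_select(0, unbucketize_permute)
```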

Objective

Port TorchRec's output distribution classes into the DynamicEmb library, enabling future performance optimizations without modifying TorchRec source code.

Tasks

PR 1: Port output distribution classes to DynamicEmb

  • Create dynamicemb/output_dist.py with:
    • RwSequenceEmbeddingDist
    • RwPooledEmbeddingDist
  • Update dynamicemb/planner/rw_sharding.py to override the create_output_dist() methods (a hypothetical sketch follows this list)
  • Verify with existing tests (test_sequence_embedding_fw.py, test_pooled_embedding_fw.py)
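
A minimal sketch of the intended override, with heavy assumptions: the constructor arguments, attribute names, and class bodies below are placeholders, not the real TorchRec or DynamicEmb interfaces.

```python
import torch

# Hypothetical sketch of the PR 1 wiring; all names and signatures here are
# placeholder assumptions, not the actual TorchRec/DynamicEmb interfaces.

class RwSequenceEmbeddingDist(torch.nn.Module):
    """Ported copy of TorchRec's row-wise sequence output dist.

    Initially behavior-identical to TorchRec's; PR 2's kernel plugs in here.
    """

    def __init__(self, dim_sum_per_rank):
        super().__init__()
        self._dim_sum_per_rank = dim_sum_per_rank


class RwSequenceEmbeddingSharding:
    """Stand-in for the sharding class in dynamicemb/planner/rw_sharding.py."""

    def __init__(self, dim_sum_per_rank):
        self._dim_sum_per_rank = dim_sum_per_rank

    def create_output_dist(self, device=None):
        # Return DynamicEmb's ported class instead of TorchRec's, so later
        # optimizations do not require patching TorchRec source.
        return RwSequenceEmbeddingDist(self._dim_sum_per_rank)
```

The same pattern would apply to RwPooledEmbeddingDist on the pooled path.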

PR 2: Optimize unbucketize permute with custom kernel

  • Design optimized data format for permute tensor
  • Implement CUDA kernel for efficient unbucketize operation
  • Integrate with output_dist.py
  • Benchmark and validate the performance improvement against an eager reference (see the sketch below)
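
One way to validate PR 2, sketched under assumptions: the custom CUDA kernel should produce exactly the same output as a plain eager gather, so an eager reference like the one below (function name and sizes are hypothetical) can serve as both correctness oracle and benchmark baseline.

```python
import torch

def unbucketize_reference(bucketed_emb: torch.Tensor,
                          permute: torch.Tensor) -> torch.Tensor:
    """Eager reference the custom kernel must match.

    bucketed_emb: [num_ids, dim] embeddings in bucket (rank) order.
    permute: [num_ids] mapping from original position to bucket-order row.
    """
    return bucketed_emb.index_select(0, permute)

# Round-robin is the worst case called out above: the gather reads rows
# strided across the whole tensor. Sizes below are illustrative.
if torch.cuda.is_available():
    n, d, world = 1_000_000, 128, 8
    emb = torch.randn(n, d, device="cuda")
    permute = (
        torch.arange(n, device="cuda").reshape(world, n // world).t().reshape(-1)
    )
    torch.cuda.synchronize()
    out = unbucketize_reference(emb, permute)
    torch.cuda.synchronize()
```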
