@Logiquo Logiquo commented Feb 4, 2026

Contributor: Yongda Fan (yongdaf2@illinois.edu)

Contribution Type: Interpretability

Description
The old implementation fails with:

torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 5 has a total capacity of 47.53 GiB of which 17.06 MiB is free. Including non-PyTorch memory, this process has 47.51 GiB memory in use. Of the allocated memory 47.20 GiB is allocated by PyTorch, and 2.27 MiB is reserved by PyTorch but unallocated. 

The new code accumulates tensors on the CPU while keeping most of the compute on the GPU.
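The pattern described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the model, function name, and batch handling are hypothetical. The key idea is that each batch's result is moved to the CPU immediately after the forward pass, so GPU memory holds only one batch's activations at a time instead of the whole accumulated tensor.

```python
import torch

def collect_activations(model, batches, device="cuda"):
    # Hypothetical sketch of the fix: compute on the GPU,
    # but accumulate the per-batch results on the CPU so the
    # growing concatenation never occupies GPU memory.
    chunks = []
    with torch.no_grad():
        for batch in batches:
            out = model(batch.to(device))      # forward pass stays on GPU
            chunks.append(out.detach().cpu())  # move result to CPU right away
    return torch.cat(chunks, dim=0)            # final tensor lives on the CPU
```

Only the concatenated result resides in host RAM; any downstream GPU work can move slices back with `.to(device)` as needed.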

@Logiquo Logiquo requested a review from jhnwu3 February 4, 2026 03:11
