Description
Specific objective: Running AF3 on large complexes (>5,120 tokens).
Specific issue: Flags previously used to enable unified memory do not seem to work on the most recent version of AF3.
Have I ever gotten AF3 to work? Yes! A previous version of AF3, from before the tokamax changes on November 29, 2025, ran with no problem.
Details: I built a Docker container from the most recent AF3 code (pulled from the latest update, 5 days ago). The new version will not run large complexes. I have pulled the most recent version and rebuilt my Docker container to make sure that recently fixed bugs are not causing the issue. I suspect that the previously recommended flags are not taking effect for some reason. In my Docker container I set
ENV XLA_PYTHON_CLIENT_PREALLOCATE=false
ENV TF_FORCE_UNIFIED_MEMORY=true
ENV XLA_CLIENT_MEM_FRACTION=3.2
as recommended in the documentation. I am still getting OOM errors. These flags were sufficient for the previous version to predict large complexes without issues.
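One way to rule out the variables simply not reaching the process is to print them from inside the running container. The following is a minimal sketch, assuming Python is available in the image; `check_env` and `EXPECTED` are hypothetical names introduced here for illustration and are not part of AF3. The expected values are copied from the `ENV` lines above.

```python
import os

# Expected values, copied from the Dockerfile ENV lines in this report.
EXPECTED = {
    "XLA_PYTHON_CLIENT_PREALLOCATE": "false",
    "TF_FORCE_UNIFIED_MEMORY": "true",
    "XLA_CLIENT_MEM_FRACTION": "3.2",
}


def check_env(env=None):
    """Return {name: (expected, actual)} for each variable that is missing or differs.

    `env` defaults to the current process environment; a plain dict can be
    passed instead to check a hypothetical configuration.
    """
    env = os.environ if env is None else env
    return {
        name: (want, env.get(name))
        for name, want in EXPECTED.items()
        if env.get(name) != want
    }


if __name__ == "__main__":
    problems = check_env()
    if not problems:
        print("All three variables are set as expected.")
    for name, (want, got) in problems.items():
        print(f"{name}: expected {want!r}, found {got!r}")
```

This only confirms that the values are visible to the process. It does not show whether the installed JAX/XLA build actually acts on them.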
Hardware specifications
GPU: NVIDIA RTX PRO 6000 Blackwell Workstation Edition (96GB)
CUDA Driver Version: 580.105.08
Local CUDA Version: 13.0
Note: This hardware and driver setup was not a problem when running the older version of AF3.