Running python tests at the same time as batch can cause calculation errors #29

@pechersky

Description

Built at tag v0.2.0 with CUDA 13, CCCL 3.0, and pip-installed rdkit 2025.9.1.

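Each of the following two test selections passes when run on its own. Compare them with the failures further below, where the same tests are run together in a single pytest session.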

$ python3.12 -m pytest --pyargs /app/nvMolKit/nvmolkit/tests -k test_memory_constrained
================================================================================================================ test session starts =================================================================================================================
platform linux -- Python 3.12.10, pytest-8.4.2, pluggy-1.6.0
rootdir: /app/nvMolKit
configfile: pyproject.toml
collected 111 items / 101 deselected / 10 selected                                                                                                                                                                                                   

../app/nvMolKit/nvmolkit/tests/test_similarity.py::test_memory_constrained_tanimoto_self PASSED                                                                                                                                                [ 10%]
../app/nvMolKit/nvmolkit/tests/test_similarity.py::test_memory_constrained_tanimoto_cross[nxmdims0] PASSED                                                                                                                                     [ 20%]
../app/nvMolKit/nvmolkit/tests/test_similarity.py::test_memory_constrained_tanimoto_cross[nxmdims1] PASSED                                                                                                                                     [ 30%]
../app/nvMolKit/nvmolkit/tests/test_similarity.py::test_memory_constrained_tanimoto_cross[nxmdims2] PASSED                                                                                                                                     [ 40%]
../app/nvMolKit/nvmolkit/tests/test_similarity.py::test_memory_constrained_cosine_self PASSED                                                                                                                                                  [ 50%]
../app/nvMolKit/nvmolkit/tests/test_similarity.py::test_memory_constrained_cosine_cross[nxmdims0] PASSED                                                                                                                                       [ 60%]
../app/nvMolKit/nvmolkit/tests/test_similarity.py::test_memory_constrained_cosine_cross[nxmdims1] PASSED                                                                                                                                       [ 70%]
../app/nvMolKit/nvmolkit/tests/test_similarity.py::test_memory_constrained_cosine_cross[nxmdims2] PASSED                                                                                                                                       [ 80%]
../app/nvMolKit/nvmolkit/tests/test_similarity.py::test_memory_constrained_segmented_path_large_cross[tanimoto] SKIPPED (Insufficient CPU/GPU memory delta to force segmented path)                                                            [ 90%]
../app/nvMolKit/nvmolkit/tests/test_similarity.py::test_memory_constrained_segmented_path_large_cross[cosine] SKIPPED (Insufficient CPU/GPU memory delta to force segmented path)                                                              [100%]

==================================================================================================== 8 passed, 2 skipped, 101 deselected in 3.18s ====================================================================================================
$ python3.12 -m pytest --pyargs /app/nvMolKit/nvmolkit/tests -k "mmff and batch"
================================================================================================================ test session starts =================================================================================================================
platform linux -- Python 3.12.10, pytest-8.4.2, pluggy-1.6.0
rootdir: /app/nvMolKit
configfile: pyproject.toml
collected 111 items / 93 deselected / 18 selected                                                                                                                                                                                                    

../app/nvMolKit/nvmolkit/tests/test_mmff_optimization.py::test_mmff_optimization_batch_vs_rdkit[1-0-gpu_ids0] SKIPPED (Test requires at least 2 GPUs for batch mode comparison)                                                                [  5%]
../app/nvMolKit/nvmolkit/tests/test_mmff_optimization.py::test_mmff_optimization_batch_vs_rdkit[1-0-gpu_ids1] PASSED                                                                                                                           [ 11%]
../app/nvMolKit/nvmolkit/tests/test_mmff_optimization.py::test_mmff_optimization_batch_vs_rdkit[1-0-gpu_ids2] SKIPPED (Test requires at least 2 GPUs for batch mode comparison)                                                                [ 16%]
../app/nvMolKit/nvmolkit/tests/test_mmff_optimization.py::test_mmff_optimization_batch_vs_rdkit[1-2-gpu_ids0] SKIPPED (Test requires at least 2 GPUs for batch mode comparison)                                                                [ 22%]
../app/nvMolKit/nvmolkit/tests/test_mmff_optimization.py::test_mmff_optimization_batch_vs_rdkit[1-2-gpu_ids1] PASSED                                                                                                                           [ 27%]
../app/nvMolKit/nvmolkit/tests/test_mmff_optimization.py::test_mmff_optimization_batch_vs_rdkit[1-2-gpu_ids2] SKIPPED (Test requires at least 2 GPUs for batch mode comparison)                                                                [ 33%]
../app/nvMolKit/nvmolkit/tests/test_mmff_optimization.py::test_mmff_optimization_batch_vs_rdkit[1-5-gpu_ids0] SKIPPED (Test requires at least 2 GPUs for batch mode comparison)                                                                [ 38%]
../app/nvMolKit/nvmolkit/tests/test_mmff_optimization.py::test_mmff_optimization_batch_vs_rdkit[1-5-gpu_ids1] PASSED                                                                                                                           [ 44%]
../app/nvMolKit/nvmolkit/tests/test_mmff_optimization.py::test_mmff_optimization_batch_vs_rdkit[1-5-gpu_ids2] SKIPPED (Test requires at least 2 GPUs for batch mode comparison)                                                                [ 50%]
../app/nvMolKit/nvmolkit/tests/test_mmff_optimization.py::test_mmff_optimization_batch_vs_rdkit[3-0-gpu_ids0] SKIPPED (Test requires at least 2 GPUs for batch mode comparison)                                                                [ 55%]
../app/nvMolKit/nvmolkit/tests/test_mmff_optimization.py::test_mmff_optimization_batch_vs_rdkit[3-0-gpu_ids1] PASSED                                                                                                                           [ 61%]
../app/nvMolKit/nvmolkit/tests/test_mmff_optimization.py::test_mmff_optimization_batch_vs_rdkit[3-0-gpu_ids2] SKIPPED (Test requires at least 2 GPUs for batch mode comparison)                                                                [ 66%]
../app/nvMolKit/nvmolkit/tests/test_mmff_optimization.py::test_mmff_optimization_batch_vs_rdkit[3-2-gpu_ids0] SKIPPED (Test requires at least 2 GPUs for batch mode comparison)                                                                [ 72%]
../app/nvMolKit/nvmolkit/tests/test_mmff_optimization.py::test_mmff_optimization_batch_vs_rdkit[3-2-gpu_ids1] PASSED                                                                                                                           [ 77%]
../app/nvMolKit/nvmolkit/tests/test_mmff_optimization.py::test_mmff_optimization_batch_vs_rdkit[3-2-gpu_ids2] SKIPPED (Test requires at least 2 GPUs for batch mode comparison)                                                                [ 83%]
../app/nvMolKit/nvmolkit/tests/test_mmff_optimization.py::test_mmff_optimization_batch_vs_rdkit[3-5-gpu_ids0] SKIPPED (Test requires at least 2 GPUs for batch mode comparison)                                                                [ 88%]
../app/nvMolKit/nvmolkit/tests/test_mmff_optimization.py::test_mmff_optimization_batch_vs_rdkit[3-5-gpu_ids1] PASSED                                                                                                                           [ 94%]
../app/nvMolKit/nvmolkit/tests/test_mmff_optimization.py::test_mmff_optimization_batch_vs_rdkit[3-5-gpu_ids2] SKIPPED (Test requires at least 2 GPUs for batch mode comparison)                                                                [100%]

==================================================================================================== 6 passed, 12 skipped, 93 deselected in 3.46s ====================================================================================================

Failures when the tests run together in one session:

$ python3.12 -m pytest --pyargs /app/nvMolKit/nvmolkit/tests -k "not async"
[...]
________________________________________________________________________________________________________ test_memory_constrained_cosine_self _________________________________________________________________________________________________________

size_limited_mols = [<rdkit.Chem.rdchem.Mol object at 0x7f6cf41492a0>, <rdkit.Chem.rdchem.Mol object at 0x7f6cf4149310>, <rdkit.Chem.rdche...7f6cf41494d0>, <rdkit.Chem.rdchem.Mol object at 0x7f6cf41493f0>, <rdkit.Chem.rdchem.Mol object at 0x7f6cf41495b0>, ...]

    def test_memory_constrained_cosine_self(size_limited_mols):
        fpgen = rdFingerprintGenerator.GetMorganGenerator(radius=3, fpSize=1024)
        nvmolkit_fpgen = MorganFingerprintGenerator(radius=3, fpSize=1024)
    
        fps = [fpgen.GetFingerprint(mol) for mol in size_limited_mols]
        ref = torch.empty(len(fps), len(fps), dtype=torch.float64)
        for i in range(len(fps)):
            ref[i] = torch.tensor(BulkCosineSimilarity(fps[i], fps))
    
        nvmolkit_fps_cu = nvmolkit_fpgen.GetFingerprints(size_limited_mols, num_threads=1)
        nvmolkit_fps_torch = nvmolkit_fps_cu.torch()
    
        got = crossCosineSimilarityMemoryConstrained(nvmolkit_fps_torch)
>       np.testing.assert_allclose(got, ref.cpu().numpy(), rtol=1e-5, atol=1e-5)
E       AssertionError: 
E       Not equal to tolerance rtol=1e-05, atol=1e-05
E       
E       Mismatched elements: 4890 / 7921 (61.7%)
E       Max absolute difference among violations: 1.
E       Max relative difference among violations: 1.
E        ACTUAL: array([[1.      , 0.171802, 0.100815, ..., 0.124065, 0.173676, 0.201357],
E              [0.171802, 1.      , 0.177822, ..., 0.328244, 0.134022, 0.18313 ],
E              [0.100815, 0.177822, 1.      , ..., 0.220707, 0.168526, 0.195387],...
E        DESIRED: array([[1.      , 0.171802, 0.100815, ..., 0.124065, 0.173676, 0.201357],
E              [0.171802, 1.      , 0.177822, ..., 0.328244, 0.134022, 0.18313 ],
E              [0.100815, 0.177822, 1.      , ..., 0.220707, 0.168526, 0.195387],...

/app/nvMolKit/nvmolkit/tests/test_similarity.py:247: AssertionError
============================================================================================================== short test summary info ===============================================================================================================
FAILED ../app/nvMolKit/nvmolkit/tests/test_mmff_optimization.py::test_mmff_optimization_serial_vs_rdkit - AssertionError: Molecule 3, Conformer 0: energy mismatch: RDKit=-207.435992, nvMolKit=-206.847232, abs_diff=0.588760, rel_error=0.002838
FAILED ../app/nvMolKit/nvmolkit/tests/test_mmff_optimization.py::test_mmff_optimization_batch_vs_rdkit[1-0-gpu_ids1] - AssertionError: Molecule 0, Conformer 0: energy mismatch: RDKit=26.874311, nvMolKit=125669.641792, abs_diff=125642.767481, rel_error=4675.199514
FAILED ../app/nvMolKit/nvmolkit/tests/test_mmff_optimization.py::test_mmff_optimization_batch_vs_rdkit[1-2-gpu_ids1] - AssertionError: Molecule 2, Conformer 0: energy mismatch: RDKit=-18.732622, nvMolKit=89871.538000, abs_diff=89890.270622, rel_error=4798.595123
FAILED ../app/nvMolKit/nvmolkit/tests/test_mmff_optimization.py::test_mmff_optimization_batch_vs_rdkit[1-5-gpu_ids1] - AssertionError: Molecule 0, Conformer 0: energy mismatch: RDKit=26.874311, nvMolKit=26.939253, abs_diff=0.064942, rel_error=0.002416
FAILED ../app/nvMolKit/nvmolkit/tests/test_mmff_optimization.py::test_mmff_optimization_batch_vs_rdkit[3-0-gpu_ids1] - AssertionError: Molecule 0, Conformer 0: energy mismatch: RDKit=26.874311, nvMolKit=125669.641792, abs_diff=125642.767481, rel_error=4675.199514
FAILED ../app/nvMolKit/nvmolkit/tests/test_mmff_optimization.py::test_mmff_optimization_batch_vs_rdkit[3-5-gpu_ids1] - AssertionError: Molecule 2, Conformer 0: energy mismatch: RDKit=-18.732622, nvMolKit=-17.874977, abs_diff=0.857646, rel_error=0.045784
FAILED ../app/nvMolKit/nvmolkit/tests/test_mmff_optimization.py::test_mmff_optimization_allows_large_molecule_interleaved - AssertionError: Molecule 0, Conformer 0: energy mismatch: RDKit=-2.627330, nvMolKit=1.910659, abs_diff=4.537989, rel_error=1.727224
FAILED ../app/nvMolKit/nvmolkit/tests/test_similarity.py::test_memory_constrained_tanimoto_self - AssertionError: 
FAILED ../app/nvMolKit/nvmolkit/tests/test_similarity.py::test_memory_constrained_cosine_self - AssertionError: 
========================================================================================= 9 failed, 73 passed, 28 skipped, 1 deselected in 88.10s (0:01:28) ==========================================================================================
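
A minimal sketch for repeating the comparison above, with each selection run in its own process so that no GPU state is shared between runs. The test path and the `-k` expressions are copied from the commands in this report. The names `SELECTIONS`, `build_command`, and `run_all` are made up for illustration and are not part of nvMolKit.

```python
import subprocess
import sys

# Test location and -k expressions copied from the commands in this report.
TEST_ROOT = "/app/nvMolKit/nvmolkit/tests"

# Hypothetical labels for the three runs shown above.
SELECTIONS = {
    "similarity_only": "test_memory_constrained",
    "mmff_batch_only": "mmff and batch",
    "combined": "not async",
}


def build_command(k_expr, test_root=TEST_ROOT):
    """Return the pytest argv for one -k selection, using the current interpreter."""
    return [sys.executable, "-m", "pytest", "--pyargs", test_root, "-k", k_expr]


def run_all(selections=SELECTIONS, runner=subprocess.run):
    """Run each selection in a separate process and return {label: exit code}.

    pytest exits with 0 when every selected test passes or is skipped,
    and with 1 when at least one test fails.
    """
    results = {}
    for label, k_expr in selections.items():
        completed = runner(build_command(k_expr), check=False)
        results[label] = completed.returncode
    return results
```

Calling `run_all()` and printing the returned dictionary gives one exit code per selection. If the behaviour in this report reproduces, the two isolated selections would be expected to exit with 0 and the combined one with 1, though that depends on the machine and build.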

nvidia-smi:

$ nvidia-smi
Tue Oct 21 03:25:06 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.02              Driver Version: 581.42         CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX A4500 Laptop GPU    On  |   00000000:01:00.0 Off |                  Off |
| N/A   50C    P0             33W /   91W |       0MiB /  16384MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |

Labels: bug