Refactor genomics.py: lazy-load optional dependencies #276

Open
andrewsu wants to merge 1 commit into snap-stanford:main from andrewsu:fix/lazy-import-optional-dependencies-clean
Conversation

@andrewsu
Contributor

Summary

This PR refactors biomni/tool/genomics.py to use lazy imports for optional/specialized dependencies, preventing import errors when these packages are not installed.

Problem

Currently, genomics.py imports several optional dependencies at module level:

  • esm (fair-esm) - only used by 1 function
  • gseapy - only used by 1 function
  • pybiomart.Dataset - only used by 1 function
  • tqdm - only used by 1 function

When any of these packages are missing, the entire module fails to load, blocking unrelated functionality. For example, ARCHS4 queries would fail with "No module named 'esm'" even though ARCHS4 doesn't use ESM.
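The failure mode can be reproduced in isolation. The sketch below is not taken from genomics.py: `definitely_not_installed_pkg` is a hypothetical stand-in for an optional package such as esm, and `load_module` is an illustrative helper that executes module source the way an import would.

```python
import types

# Hypothetical module source: an optional package is imported at module level,
# next to a function that never uses it.
EAGER_SOURCE = '''
import definitely_not_installed_pkg  # stand-in for an optional dependency

def core_query():
    return "core result"
'''


def load_module(name, source):
    """Execute module source in a fresh module object, as an import would."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    return module


try:
    load_module("eager_demo", EAGER_SOURCE)
    eager_loaded = True
    eager_error = None
except ModuleNotFoundError as exc:
    # The whole module fails to load, so core_query() is never defined.
    eager_loaded = False
    eager_error = str(exc)
```

Because the import statement runs before `core_query` is defined, the unrelated function becomes unreachable as well.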

Solution

Move these imports from module-level to function-level (lazy imports), so they're only loaded when the specific functions that need them are called.
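A minimal sketch of the pattern, assuming the optional package is absent. The function names and `definitely_not_installed_pkg` are hypothetical stand-ins, not the actual genomics.py code.

```python
def core_query():
    """Stand-in for a function with no optional dependencies."""
    return "core result"


def optional_feature():
    """Stand-in for a function that needs an optional package."""
    # The import runs only when this function is called, not at module load.
    import definitely_not_installed_pkg  # hypothetical name, assumed absent

    return definitely_not_installed_pkg.__name__


# The core function works even though the optional package is missing.
result = core_query()

# The missing package surfaces only when the dependent function is called.
try:
    optional_feature()
    optional_error = None
except ModuleNotFoundError as exc:
    optional_error = str(exc)
```

When the package is installed, Python caches it in `sys.modules` after the first call, so repeated calls pay only a dictionary lookup for the function-level import.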

Changes

  • Move import esm into generate_gene_embeddings_with_ESM_models()
  • Move import gseapy into get_gene_set_enrichment_analysis_supported_database_list()
  • Move from pybiomart import Dataset into interspecies_gene_conversion()
  • Move from tqdm import tqdm into generate_gene_embeddings_with_ESM_models()

Benefits

  1. Better modularity - Optional dependencies only loaded when needed
  2. Faster module loading - Fewer upfront imports
  3. Improved resilience - Missing optional packages don't block core functionality
  4. Fixes ARCHS4 issue - get_rna_seq_archs4() now works without ESM installed

Testing

  • Core functions like get_rna_seq_archs4() now work without optional dependencies
  • Functions using lazy-imported packages should work identically when those packages are installed
  • No behavioral changes, only import timing changes
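One way to check the first point without uninstalling anything is to block the optional packages in `sys.modules` for the duration of a test. The sketch below is an illustration rather than a test from this PR: `core_query` and `embed` are hypothetical stand-ins, and a real test would presumably import biomni.tool.genomics and call the actual functions instead.

```python
import sys
from unittest import mock


def core_query():
    """Stand-in for a core function with no optional dependencies."""
    return "core result"


def embed():
    """Stand-in for a function that lazily imports an optional package."""
    import esm  # lazy import; blocked below to simulate a missing package

    return esm


def check_without_optional(pkg_names):
    """Run both functions while the named packages are made unimportable."""
    # A None entry in sys.modules makes `import name` raise ImportError,
    # whether or not the package is actually installed.
    blocked = {name: None for name in pkg_names}
    with mock.patch.dict(sys.modules, blocked):
        core_ok = core_query() == "core result"
        try:
            embed()
            optional_raised = False
        except ImportError:
            optional_raised = True
    return core_ok, optional_raised


core_ok, optional_raised = check_without_optional(["esm"])
```

`mock.patch.dict` restores `sys.modules` on exit, so the block does not leak into other tests.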

Related

This change is particularly useful for minimal environment setups (e.g., using environment.yml instead of full setup.sh with all bioinformatics tools).

🤖 Generated with Claude Code

Move optional/specialized imports (esm, gseapy, pybiomart, tqdm) from
module-level to function-level to prevent import errors when these
packages are not installed.

This allows core genomics functions (e.g., get_rna_seq_archs4) to work
without requiring all optional dependencies to be installed.

Changes:
- Move 'import esm' into generate_gene_embeddings_with_ESM_models()
- Move 'import gseapy' into get_gene_set_enrichment_analysis_supported_database_list()
- Move 'from pybiomart import Dataset' into interspecies_gene_conversion()
- Move 'from tqdm import tqdm' into generate_gene_embeddings_with_ESM_models()

Benefits:
- Faster module loading (fewer upfront imports)
- Better modularity (dependencies only loaded when needed)
- Prevents missing optional dependencies from blocking unrelated functions
- Fixes issue where ARCHS4 queries failed due to missing ESM package

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>