Added more species support to convert the ensembl to symbol #738

Merged
Starlitnightly merged 3 commits into aristoteleo:master from Starlitnightly:master on Dec 6, 2025

@Starlitnightly
Collaborator

This pull request enhances species detection and gene symbol conversion utilities in the preprocessing pipeline, with a focus on supporting more species and improving robustness. It also adds an automatic installation step for the polars library in the GTF parser. The most important changes are grouped below:

Species Detection and Mapping Improvements

  • Expanded the supported species in `_infer_species_and_release` and the gene symbol conversion function to cover human, mouse, rat, zebrafish, fly, chicken, dog, pig, cow, and macaque, and improved Ensembl ID prefix matching so that IDs whose prefixes overlap are no longer misclassified.
  • Updated docstrings and documentation in convert2gene_symbol to reflect the expanded species support and clarify behavior for unknown or missing gene symbols.
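
The overlapping-prefix problem can be illustrated with a minimal sketch. It assumes the prefix table below (common Ensembl gene-ID stems, plus FlyBase `FBgn` for fly) rather than the exact table in `dynamo/preprocessing/utils.py`, and `species_of` / `infer_species` are hypothetical names, not the PR's `_infer_species_and_release`. Testing longer prefixes first keeps chicken IDs (`ENSGALG`) from being claimed by the human prefix `ENSG`, and a majority vote over the inputs tolerates a few unrecognized entries.

```python
from collections import Counter
from typing import Iterable, Optional

# Assumed Ensembl gene-ID prefixes (illustrative; not copied from the PR).
SPECIES_PREFIXES = {
    "human": "ENSG",
    "mouse": "ENSMUSG",
    "rat": "ENSRNOG",
    "zebrafish": "ENSDARG",
    "fly": "FBgn",
    "chicken": "ENSGALG",
    "dog": "ENSCAFG",
    "pig": "ENSSSCG",
    "cow": "ENSBTAG",
    "macaque": "ENSMMUG",
}

# Longest prefixes first, so "ENSGALG" (chicken) is tested before "ENSG" (human).
_ORDERED = sorted(SPECIES_PREFIXES.items(), key=lambda kv: len(kv[1]), reverse=True)


def species_of(gene_id: str) -> Optional[str]:
    """Return the species whose prefix matches gene_id, or None if none does."""
    for species, prefix in _ORDERED:
        if gene_id.startswith(prefix):
            return species
    return None


def infer_species(gene_ids: Iterable[str]) -> Optional[str]:
    """Hypothetical helper: majority vote over the species of each recognizable ID."""
    votes = Counter(s for s in map(species_of, gene_ids) if s is not None)
    return votes.most_common(1)[0][0] if votes else None
```

With a naive first-match over an unordered table, `ENSGALG00000000001` could be reported as human; sorting by prefix length removes that dependence on table order.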

Gene Symbol Conversion Robustness

  • Improved species prefix validation in gene symbol conversion, including a helper function to handle overlapping prefixes and ensure only valid IDs for the detected species are queried.
  • Extended the test gene IDs for database validation to include all supported species, improving reliability of the conversion process.
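
The validation step can be sketched as a filter that keeps only IDs belonging to the detected species. This is a hedged illustration: `is_valid_for_species` and `filter_ids` are hypothetical names and the prefix table is an assumption. An ID is accepted only if it starts with the species' prefix and does not also start with a longer prefix of another species that extends it (for human, `ENSG` is extended by chicken's `ENSGALG`).

```python
from typing import Dict, List

# Assumed prefix table (illustrative; not copied from the PR).
SPECIES_PREFIXES: Dict[str, str] = {
    "human": "ENSG", "mouse": "ENSMUSG", "rat": "ENSRNOG", "zebrafish": "ENSDARG",
    "fly": "FBgn", "chicken": "ENSGALG", "dog": "ENSCAFG", "pig": "ENSSSCG",
    "cow": "ENSBTAG", "macaque": "ENSMMUG",
}


def is_valid_for_species(gene_id: str, species: str) -> bool:
    """True if gene_id carries this species' prefix and no longer, overlapping one."""
    prefix = SPECIES_PREFIXES[species]
    if not gene_id.startswith(prefix):
        return False
    # Longer prefixes of other species that begin with this prefix (e.g. ENSGALG for ENSG).
    overlapping = [
        p for s, p in SPECIES_PREFIXES.items()
        if s != species and len(p) > len(prefix) and p.startswith(prefix)
    ]
    return not any(gene_id.startswith(p) for p in overlapping)


def filter_ids(gene_ids: List[str], species: str) -> List[str]:
    """Keep only the IDs that should be queried for the detected species."""
    return [g for g in gene_ids if is_valid_for_species(g, species)]
```

Filtering before the lookup means a mixed list such as `["ENSG00000167286", "ENSGALG00000000001"]` sends only the human ID to a human database.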

Default Behavior Adjustments

  • Changed the default Ensembl release used for database initialization from 109 to 77, presumably for compatibility or stability.

Dependency Management

  • Added logic to automatically install the polars library if not present when importing the GTF parser, improving ease of use for new users.

- Expanded the `_infer_species_and_release` function to support additional species and their corresponding Ensembl ID prefixes, improving accuracy in species detection.
- Refactored the `convert2gene_symbol` function to include a more comprehensive species prefix validation, ensuring correct mapping and handling of overlapping prefixes.
- Updated documentation to reflect new species support and clarify the behavior of the functions.
- Added a try-except block to check for the presence of the Polars library and automatically install it if not found, enhancing user experience.
- Included logging to inform users about the installation process and potential errors, ensuring clarity in case of installation failures.
- Changed the error handling in the Polars installation process to print a message instead of raising an ImportError, providing clearer feedback to users about installation issues.
- Set the `polars` variable to None if the installation fails, ensuring that the code can handle the absence of the library gracefully.
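
The import-or-install behavior described in these commits can be sketched as follows. This is a minimal sketch under stated assumptions: `import_or_install` and `_pip_install` are hypothetical helpers (the PR places its logic in `dynamo/external/gtfparse/read_gtf.py`), and installation through `python -m pip install` via `subprocess` may differ from what the PR does. The `installer` parameter exists only so the fallback path can be exercised without touching the network.

```python
import importlib
import subprocess
import sys
from types import ModuleType
from typing import Callable, Optional


def _pip_install(package: str) -> None:
    """Install a package into the running interpreter's environment via pip."""
    subprocess.check_call([sys.executable, "-m", "pip", "install", package])


def import_or_install(
    module_name: str,
    package: Optional[str] = None,
    installer: Callable[[str], None] = _pip_install,
) -> Optional[ModuleType]:
    """Import module_name, installing it first if missing; return None on failure."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        pass
    package = package or module_name
    print(f"{module_name} not found; attempting to install '{package}'...")
    try:
        installer(package)
        importlib.invalidate_caches()  # let the import system see the new package
        return importlib.import_module(module_name)
    except (subprocess.CalledProcessError, ImportError, OSError) as exc:
        # Report instead of raising, so callers can continue without the library.
        print(f"Could not install {package}: {exc}. Please install it manually.")
        return None
```

In the parser module this would be used as `polars = import_or_install("polars")`, leaving `polars` set to `None` when installation fails so later code can check for it.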
@codecov

codecov bot commented Dec 5, 2025

Codecov Report

❌ Patch coverage is 0% with 32 lines in your changes missing coverage. Please review.
✅ Project coverage is 27.08%. Comparing base (d1f5ee6) to head (f7cc739).
⚠️ Report is 9 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| dynamo/preprocessing/utils.py | 0.00% | 18 Missing ⚠️ |
| dynamo/external/gtfparse/read_gtf.py | 0.00% | 14 Missing ⚠️ |
Additional details and impacted files
```diff
@@           Coverage Diff           @@
##           master     #738   +/-   ##
=======================================
  Coverage   27.08%   27.08%
=======================================
  Files         324      324
  Lines       49452    49472   +20
=======================================
+ Hits        13392    13401    +9
- Misses      36060    36071   +11
```


@Starlitnightly
Collaborator Author

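```python
import dynamo as dyn
# Human Ensembl ID (ENSG prefix); uncomment ensembl_release to pin a release.
dyn.preprocessing.convert2gene_symbol(
    ['ENSG00000167286'],  # ensembl_release=109,
)
```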

Starlitnightly merged commit ce0e5fe into aristoteleo:master on Dec 6, 2025
9 of 10 checks passed
