Added more species support to convert the ensembl to symbol #738
Merged
Starlitnightly merged 3 commits into aristoteleo:master on Dec 6, 2025
- Expanded the `_infer_species_and_release` function to support additional species and their corresponding Ensembl ID prefixes, improving accuracy in species detection.
- Refactored the `convert2gene_symbol` function to include a more comprehensive species prefix validation, ensuring correct mapping and handling of overlapping prefixes.
- Updated documentation to reflect new species support and clarify the behavior of the functions.
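For readers unfamiliar with the overlapping-prefix problem: the human prefix `ENSG` is itself a prefix of the chicken prefix `ENSGALG`, so a naive `startswith` check can label chicken IDs as human. The sketch below shows one way to avoid that by testing longer prefixes first. It is an illustration only, not the code in this PR: the prefix table and the names `ENSEMBL_PREFIXES`, `guess_species`, and `guess_species_for_ids` are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional

# Hypothetical prefix table for illustration; the table used in the PR may differ.
ENSEMBL_PREFIXES = {
    "ENSG": "human",
    "ENSMUSG": "mouse",
    "ENSRNOG": "rat",
    "ENSDARG": "zebrafish",
    "FBgn": "fly",
    "ENSGALG": "chicken",
    "ENSCAFG": "dog",
    "ENSSSCG": "pig",
    "ENSBTAG": "cow",
    "ENSMMUG": "macaque",
}

# Longest prefixes first, so "ENSGALG" (chicken) is tested before "ENSG" (human).
_PREFIXES_BY_LENGTH = sorted(ENSEMBL_PREFIXES, key=len, reverse=True)


def guess_species(gene_id: str) -> Optional[str]:
    """Return the species whose prefix matches gene_id, or None if none does."""
    for prefix in _PREFIXES_BY_LENGTH:
        if gene_id.startswith(prefix):
            return ENSEMBL_PREFIXES[prefix]
    return None


def guess_species_for_ids(gene_ids: Iterable[str]) -> Optional[str]:
    """Return the most common species among the recognised IDs, or None."""
    counts = Counter(s for s in map(guess_species, gene_ids) if s is not None)
    return counts.most_common(1)[0][0] if counts else None
```

Sorting by length is a simple choice that works because a prefix can only be shadowed by a shorter one; matching on the full pattern (prefix followed by digits) would be a stricter alternative.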
- Added a try-except block to check for the presence of the Polars library and automatically install it if not found, enhancing user experience.
- Included logging to inform users about the installation process and potential errors, ensuring clarity in case of installation failures.
- Changed the error handling in the Polars installation process to print a message instead of raising an ImportError, providing clearer feedback to users about installation issues.
- Set the `polars` variable to None if the installation fails, ensuring that the code can handle the absence of the library gracefully.
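The import-then-install-then-fall-back-to-`None` pattern described in these commits can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code merged here: `pip_install` and `import_or_install` are hypothetical helper names, and the installer is passed in as a parameter so the fallback path can be exercised without network access.

```python
import importlib
import subprocess
import sys
from types import ModuleType
from typing import Callable, Optional


def pip_install(package: str) -> None:
    """Install a package with pip, using the interpreter that is currently running."""
    subprocess.check_call([sys.executable, "-m", "pip", "install", package])


def import_or_install(
    name: str, installer: Callable[[str], None] = pip_install
) -> Optional[ModuleType]:
    """Import a module, trying one install if it is missing; return None on failure."""
    try:
        return importlib.import_module(name)
    except ImportError:
        print(f"{name} not found; attempting to install it...")
    try:
        installer(name)
        importlib.invalidate_caches()  # make the newly installed package visible
        return importlib.import_module(name)
    except Exception as exc:
        # Report the problem instead of raising, and let callers check for None.
        print(f"Could not install {name}: {exc}")
        return None


# Assumed usage in a module that treats polars as optional:
# polars = import_or_install("polars")
# if polars is None:
#     ...  # fall back to a code path that does not need polars
```

Callers then need an explicit `is None` check before using the module, which is the trade-off of printing a message rather than raising `ImportError`.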
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@           Coverage Diff           @@
##           master     #738   +/-   ##
=======================================
  Coverage   27.08%   27.08%
=======================================
  Files         324      324
  Lines       49452    49472   +20
=======================================
+ Hits        13392    13401    +9
- Misses      36060    36071   +11


This pull request enhances species detection and gene symbol conversion utilities in the preprocessing pipeline, with a focus on supporting more species and improving robustness. It also adds an automatic installation step for the `polars` library in the GTF parser. The most important changes are grouped below:

Species Detection and Mapping Improvements
- Expanded species support in `_infer_species_and_release` and the gene symbol conversion function, now covering human, mouse, rat, zebrafish, fly, chicken, dog, pig, cow, and macaque, with improved Ensembl ID prefix matching logic to avoid misclassification due to overlapping prefixes. [1] [2]
- Updated the documentation of `convert2gene_symbol` to reflect the expanded species support and clarify behavior for unknown or missing gene symbols.

Gene Symbol Conversion Robustness

Default Behavior Adjustments

Dependency Management
- Automatically installs the `polars` library if not present when importing the GTF parser, improving ease of use for new users.