Fixed 1M cells's error in `cell_velocity` by Starlitnightly · Pull Request #737 · aristoteleo/dynamo-release

Starlitnightly · 2025-12-04T19:26:46Z

This pull request vendors two external packages, gtfparse and pyensembl, into the dynamo/external directory. It adds a complete implementation for parsing GTF (Gene Transfer Format) files and removes the need to install mygene. The parser covers attribute expansion, missing feature construction, robust error handling, and support for both Polars and Pandas DataFrames. The changes are grouped into new modules for GTF parsing and the integration of these modules via an updated __init__.py.

New GTF parsing functionality:

Added read_gtf.py, which implements the main GTF parsing logic, including attribute expansion, flexible column handling, support for both Polars and Pandas DataFrames, and biotype inference. It also defines the required columns and default data types for GTF files.
Added attribute_parsing.py, providing the expand_attribute_strings function for parsing and expanding the GTF attribute column into separate columns.
Added create_missing_features.py, which allows for the construction of missing features (e.g., genes or transcripts) from available annotations in cases where they are absent in the GTF file.
Added parsing_error.py, defining a custom ParsingError exception for robust error handling during parsing.
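
To illustrate what attribute expansion means for the modules listed above, here is a minimal standard-library sketch. The function name `expand_attributes_sketch` is hypothetical and this is not the vendored `expand_attribute_strings` implementation; it assumes attributes are separated by `;`, that each is a `key "value"` pair, and that repeated keys are joined with commas.

```python
def expand_attributes_sketch(attribute_strings):
    """Expand GTF attribute strings into {column name: list of values}.

    Hypothetical helper for illustration only. Rows that lack a key get
    an empty string. Values containing ';' are not supported.
    """
    n = len(attribute_strings)
    columns = {}
    for i, attr in enumerate(attribute_strings):
        for field in attr.split(";"):
            field = field.strip()
            if not field:
                continue
            # Split the key from its value at the first run of whitespace.
            parts = field.split(None, 1)
            if len(parts) != 2:
                continue  # skip malformed fields that have no value
            key, value = parts
            value = value.strip().strip('"')
            column = columns.setdefault(key, [""] * n)
            # Assumption: a key repeated within one row is comma-joined.
            column[i] = value if column[i] == "" else column[i] + "," + value
    return columns


rows = [
    'gene_id "ENSG1"; gene_name "A"; tag "x"; tag "y";',
    'gene_id "ENSG2";',
]
expanded = expand_attributes_sketch(rows)
```

Each key becomes one column with one entry per input row, which is the shape a DataFrame constructor (Pandas or Polars) can consume directly.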

Integration and module setup:

Updated __init__.py to expose all major functions and classes from the new modules, establish the module version, and define the public API for gtfparse.

Documentation update:

Updated the docs/tutorials/notebooks subproject commit, likely to reflect the new or updated tutorials related to GTF parsing.

This pull request includes updates across several files to improve functionality, fix potential issues, and prepare for a new release. The most significant changes are an improved neighbor index calculation, a version bump for the upcoming release candidate, and minor formatting and submodule updates.

Core functionality improvements:

Improved handling of neighbor indices in get_neighbor_indices within dynamo/tools/utils.py: Now uses NumPy arrays for index management and more robustly handles NaN values when appending new neighbors, reducing the risk of errors during neighbor calculations.
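
The NaN handling described above can be sketched as follows. This is not the code in dynamo/tools/utils.py: the helper name `append_neighbors_sketch`, its arguments, and the assumption that the neighbor matrix is padded with NaN for cells with fewer neighbors are all hypothetical and only illustrate the general technique.

```python
import numpy as np


def append_neighbors_sketch(current, knn, query):
    """Return `current` extended with the valid neighbors of the `query` cells.

    Hypothetical helper. `knn` is an (n_cells, k) array of neighbor indices
    that may contain NaN where a cell has fewer than k neighbors.
    """
    current = np.asarray(current, dtype=np.int64)
    query = np.asarray(query, dtype=np.int64)
    # Work in float so NaN padding can be detected before casting to int.
    candidates = np.asarray(knn, dtype=float)[query].ravel()
    candidates = candidates[~np.isnan(candidates)].astype(np.int64)
    # np.unique both removes duplicates and returns a sorted index array.
    return np.unique(np.concatenate([current, candidates]))


knn = np.array([[1, 2], [0, np.nan], [0, 1]])
neighbors = append_neighbors_sketch([0], knn, [0, 1])
```

Dropping NaN entries before the integer cast matters because casting NaN to an integer yields an undefined value rather than an error, which would silently corrupt the index array.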

Release and dependency updates:

Updated package version in setup.py from v1.4.3 to v1.4.4rc1 to mark a new release candidate.
Updated the submodule reference in docs/tutorials/notebooks to a newer commit, ensuring documentation is up to date.

Code style and formatting:

Reformatted the convert2gene_symbol function signature in dynamo/preprocessing/utils.py for improved readability and consistency.

codecov · 2025-12-04T19:40:17Z

Codecov Report

❌ Patch coverage is 0.39331% with 2026 lines in your changes missing coverage. Please review.
✅ Project coverage is 27.08%. Comparing base (4b9a620) to head (d1f5ee6).
⚠️ Report is 14 commits behind head on master.

Files with missing lines	Patch %	Lines
dynamo/external/pyensembl/genome.py	0.00%	354 Missing ⚠️
dynamo/external/pyensembl/database.py	0.00%	215 Missing ⚠️
dynamo/external/pyensembl/serializable.py	0.00%	191 Missing ⚠️
dynamo/external/pyensembl/transcript.py	0.00%	186 Missing ⚠️
dynamo/external/pyensembl/download_cache.py	0.00%	125 Missing ⚠️
dynamo/external/pyensembl/locus.py	0.00%	93 Missing ⚠️
dynamo/external/pyensembl/species.py	0.00%	93 Missing ⚠️
dynamo/external/gtfparse/read_gtf.py	0.00%	90 Missing ⚠️
dynamo/external/pyensembl/shell.py	0.00%	83 Missing ⚠️
dynamo/external/pyensembl/sequence_data.py	0.00%	77 Missing ⚠️
... and 20 more

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #737      +/-   ##
==========================================
- Coverage   28.24%   27.08%   -1.17%     
==========================================
  Files         297      324      +27     
  Lines       47431    49452    +2021     
==========================================
- Hits        13397    13392       -5     
- Misses      34034    36060    +2026

- Implemented a validation step to ensure the number of PCA components matches the count of genes marked for PCA usage in adata.var.
- Added a descriptive error message to guide users in resolving dimension mismatches, enhancing robustness of the perturbation function.
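
A validation step of this kind might look like the sketch below. It is not the check added to the perturbation function: the name `check_pca_dimensions_sketch` is hypothetical, and it assumes the comparison is between the gene dimension (rows) of the PCA loading matrix and the number of `True` entries in a boolean mask such as the one stored in adata.var.

```python
import numpy as np


def check_pca_dimensions_sketch(pcs, use_for_pca):
    """Raise ValueError if the loading matrix and the gene mask disagree.

    Hypothetical helper. `pcs` is assumed to have one row per gene used
    for PCA; `use_for_pca` is a boolean mask over all genes.
    """
    pcs = np.asarray(pcs)
    n_marked = int(np.count_nonzero(use_for_pca))
    if pcs.shape[0] != n_marked:
        raise ValueError(
            f"PCA loadings have {pcs.shape[0]} rows but {n_marked} genes are "
            "marked for PCA; recompute PCA or update the gene mask so the "
            "two agree."
        )
    return n_marked


n_genes = check_pca_dimensions_sketch(np.zeros((3, 2)), [True, False, True, True])
```

Failing early with both numbers in the message points the user at the mismatch instead of surfacing later as an opaque matrix shape error.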

- Improved smart quote removal in `expand_attribute_strings` to handle both single and double quotes for better compatibility with various GTF sources.
- Added checks in `read_gtf` to only process existing columns in the DataFrame, with warnings for missing columns, enhancing robustness.
- Converted categorical columns to object dtype before applying converters to prevent issues with shared categories in Polars.
- Updated `convert2gene_symbol` to utilize `pyensembl` for gene ID conversion, supporting auto-detection of species and release selection.
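
As a rough illustration of the quote handling in the first item, the sketch below strips one pair of matching single or double quotes from an attribute value and leaves everything else untouched. The helper `strip_quotes_sketch` is hypothetical; the actual logic in `expand_attribute_strings` may differ.

```python
def strip_quotes_sketch(value):
    """Remove one pair of matching surrounding quotes, if present.

    Hypothetical helper. Only a leading quote with an identical trailing
    quote is removed, so unbalanced or unquoted values pass through.
    """
    value = value.strip()
    if len(value) >= 2 and value[0] == value[-1] and value[0] in ('"', "'"):
        return value[1:-1]
    return value


cleaned = [strip_quotes_sketch(v) for v in ['"ENSG1"', "'abc'", "plain", '"open']]
```

Requiring a matching pair is a conservative choice: a value that merely starts with a quote character is kept as-is rather than being altered.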

Starlitnightly added 2 commits November 20, 2025 02:17

Update subproject reference in notebooks to latest commit 8b2958e7

f7e2535

Update version in setup.py to v1.4.4rc1 and refactor function signatures for improved readability in utils.py

e1503ce

Starlitnightly added 3 commits December 4, 2025 17:45

Add pyemsembl to external replace mygene

6ec5925

Starlitnightly merged commit 641f7f3 into aristoteleo:master Dec 5, 2025
7 of 10 checks passed

Starlitnightly merged 5 commits into aristoteleo:master from Starlitnightly:master
