@kusurin kusurin commented Feb 8, 2026

Summary

  • Align CSV/TSV delimiter selection across datatable and pandas paths.
  • Move duplicate checks after transpose to validate correct dimensions (genes/cells).
  • Add early validation for empty inputs.

Motivation

  • Previously, CSV files were parsed with different delimiters depending on whether datatable was installed (\t vs ,), leading to inconsistent behavior.
  • When the delimiter was wrong, parsing often did not fail immediately; errors surfaced later during downstream processing, which made debugging difficult.
  • The old datatable implementation validated uniqueness before the transpose, which inverted the gene/cell dimension checks.
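
The silent failure on a delimiter mismatch can be reproduced with pandas alone. This is a minimal sketch on a made-up comma-separated matrix (the gene and cell names are hypothetical, not taken from the project): reading it with a tab separator raises nothing and simply returns a single-column frame.

```python
import io

import pandas as pd

# Hypothetical toy matrix, comma-separated.
csv_text = "gene,cell_1,cell_2\nGeneA,1,0\nGeneB,0,3\n"

# Wrong delimiter: no exception, every line collapses into one string column.
wrong = pd.read_csv(io.StringIO(csv_text), sep="\t")

# Matching delimiter: three columns, as intended.
right = pd.read_csv(io.StringIO(csv_text), sep=",")

print(wrong.shape)  # (2, 1)
print(right.shape)  # (2, 3)
```

Because the mis-parsed frame is still a valid DataFrame, the problem only shows up once later steps expect numeric columns or a particular number of cells.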

Changes

  • Synchronize the delimiter selection logic in the pandas paths to match the datatable implementation.
  • Perform transpose before duplicate checks, so gene/cell uniqueness is validated on the correct axis.
  • Add an early check for an empty cell dimension so that loading fails fast.
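
The three changes can be sketched together as follows. This is an illustration using pandas only, not the code from this pull request: `pick_delimiter`, `validate_matrix`, the extension-based delimiter rule, and the assumption that the input stores cells as rows (so the transpose yields genes as rows and cells as columns) are all hypothetical.

```python
from pathlib import Path

import pandas as pd


def pick_delimiter(path: str) -> str:
    """Hypothetical rule: one delimiter choice shared by every reader backend."""
    suffixes = [s.lower() for s in Path(path).suffixes]
    return "\t" if ".tsv" in suffixes else ","


def validate_matrix(raw: pd.DataFrame) -> pd.DataFrame:
    """Transpose first, then validate, so each check runs on the intended axis."""
    # Assumption: the file stores cells as rows; after the transpose,
    # rows are genes and columns are cells.
    matrix = raw.T

    # Fail fast on an empty cell dimension.
    if matrix.shape[1] == 0:
        raise ValueError("input has no cells")

    dup_genes = matrix.index[matrix.index.duplicated()].unique().tolist()
    if dup_genes:
        raise ValueError(f"duplicate gene names: {dup_genes}")

    dup_cells = matrix.columns[matrix.columns.duplicated()].unique().tolist()
    if dup_cells:
        raise ValueError(f"duplicate cell names: {dup_cells}")

    return matrix
```

Calling `pick_delimiter` from both reader paths keeps the choice in one place, and running the checks on the transposed frame means an error message that mentions genes really refers to the gene axis.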
