feat: add `LazyCategoricalDtype` for lazy categorical columns by katosh · Pull Request #2288 · scverse/anndata

katosh · 2026-01-08T12:02:23Z

feat: add `LazyCategoricalDtype` for lazy categorical columns

Closes #2283 (feat: Efficient category count and partial loading for lazy AnnData)
Tests added
Release note added

Summary

Add LazyCategoricalDtype extending pd.CategoricalDtype with lazy loading support for categorical columns in lazy AnnData. This enables efficient access to categorical metadata without loading all categories into memory.

```python
import anndata as ad

lazy_adata = ad.experimental.read_lazy("large_dataset.h5ad")
dtype = lazy_adata.obs["cell_type"].dtype  # LazyCategoricalDtype

# Cheap metadata access (no I/O)
dtype.n_categories     # 100000
dtype.ordered          # False

# Partial reads (efficient)
dtype.head_categories()     # first 5 categories
dtype.head_categories(10)   # first 10 categories
dtype.tail_categories()     # last 5 categories
dtype.tail_categories(10)   # last 10 categories

# Full load (cached after first access)
dtype.categories       # pd.Index with all categories
```

Motivation

When working with lazy AnnData objects containing many categories (e.g., 100k+ cell IDs as categories), loading all categories just to display a preview or check metadata is inefficient. This is particularly important for:

repr/HTML display - showing category info without triggering full loads
Data exploration - quickly inspecting category names
Memory efficiency - avoiding unnecessary allocations

API Design

Following Ilan's suggestion, the API uses familiar pandas naming conventions:

| Property/Method | Returns | Behavior |
| --- | --- | --- |
| `.categories` | `pd.Index` | Full load, cached (standard pandas) |
| `.ordered` | `bool` | Standard pandas |
| `.n_categories` | `int` | Cheap metadata access |
| `.head_categories(n=5)` | `np.ndarray` | First n categories (partial read) |
| `.tail_categories(n=5)` | `np.ndarray` | Last n categories (partial read) |

The head/tail naming follows pandas DataFrame.head()/DataFrame.tail() conventions.

Implementation Details

LazyCategoricalDtype extends pd.CategoricalDtype to maintain compatibility
Categories are loaded lazily on first .categories access and cached
head_categories/tail_categories use read_elem_partial for efficient partial reads
Works with both zarr and h5ad backends
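The loading behavior above can be made concrete with a minimal pure-Python sketch. This is not the PR's implementation. A NumPy array stands in for the on-disk categories array, a counter stands in for I/O, and `ToyLazyCategories` is a hypothetical name. The sketch shows three things:

- `n_categories` comes from the shape alone.
- head/tail read only the requested slice.
- `categories` is loaded once via `functools.cached_property`, which stores the value in the instance `__dict__`. This is the same cache-detection idea this PR uses.

```python
from functools import cached_property

import numpy as np
import pandas as pd


class ToyLazyCategories:
    """Hypothetical stand-in for the lazy-loading logic (not the PR's code).

    `store` plays the role of the on-disk categories array; `reads` counts
    how many category values have been "read" from it.
    """

    def __init__(self, store: np.ndarray) -> None:
        self._store = store
        self.reads = 0

    @property
    def n_categories(self) -> int:
        # Metadata only: the shape is known without reading any values.
        return self._store.shape[0]

    def _get_slice(self, sl: slice) -> np.ndarray:
        if "categories" in self.__dict__:
            # Already fully loaded: serve from the cache, no new read.
            return np.asarray(self.categories[sl])
        # Stand-in for a partial read of just this slice.
        chunk = np.asarray(self._store[sl])
        self.reads += len(chunk)
        return chunk

    def head_categories(self, n: int = 5) -> np.ndarray:
        return self._get_slice(slice(0, min(n, self.n_categories)))

    def tail_categories(self, n: int = 5) -> np.ndarray:
        total = self.n_categories
        return self._get_slice(slice(max(total - n, 0), total))

    @cached_property
    def categories(self) -> pd.Index:
        # Full load on first access; cached_property stores the result in
        # the instance __dict__, which is what _get_slice checks.
        self.reads += self.n_categories
        return pd.Index(self._store)


store = np.array([f"cat_{i}" for i in range(100)], dtype=object)
lazy = ToyLazyCategories(store)
print(lazy.n_categories, lazy.reads)   # 100 0
print(list(lazy.head_categories(3)))   # ['cat_0', 'cat_1', 'cat_2']
print(lazy.reads)                      # 3
```

After `categories` has been accessed once, head/tail calls are served from the cached `pd.Index` and no longer touch the store.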

Benchmark Results

Tested with 100k categories (median of 5 runs):

| Method | H5AD | Zarr |
| --- | --- | --- |
| `n_categories` | 0.05 ms | 0.11 ms |
| `head_categories(10)` | 0.19 ms | 8.82 ms |
| `categories` (full) | 30.32 ms | 19.19 ms |

Speedups vs full load:

| Method | H5AD | Zarr |
| --- | --- | --- |
| `n_categories` | 621x | 168x |
| `head_categories(10)` | 160x | 2.2x |

Note: zarr speedup for partial reads is limited because categories are currently written without explicit chunking.

codecov · 2026-01-08T12:04:33Z

Codecov Report

❌ Patch coverage is 97.05882% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.64%. Comparing base (4376302) to head (edb04fc).
⚠️ Report is 6 commits behind head on main.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/anndata/experimental/backed/_lazy_arrays.py | 97.05% | 2 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2288      +/-   ##
==========================================
- Coverage   86.74%   84.64%   -2.10%     
==========================================
  Files          46       46              
  Lines        7204     7289      +85     
==========================================
- Hits         6249     6170      -79     
- Misses        955     1119     +164

| Files with missing lines | Coverage Δ |
| --- | --- |
| src/anndata/experimental/backed/_lazy_arrays.py | `93.67% <97.05%> (+1.94%)` ⬆️ |

... and 11 files with indirect coverage changes



ilan-gold

Thanks! Looking good!

src/anndata/experimental/backed/_lazy_arrays.py

ilan-gold · 2026-01-08T15:33:23Z

src/anndata/experimental/backed/_lazy_arrays.py

+
+        arr = self._get_categories_array()
+        total = self.n_categories
+        return read_elem_partial(arr, indices=slice(0, min(n, total)))


If arr is just a {H5,Zarr}Array, just use their raw slicing methods

Suggested change

return read_elem_partial(arr, indices=slice(0, min(n, total)))

return arr[0:min(n, total)]

Just to check that this is what you want: raw slicing can return encoded byte strings, whereas users would likely expect decoded strings.

```python
import tempfile

import h5py

from anndata._io.specs.registry import read_elem_partial

# Create HDF5 file with string data
with tempfile.NamedTemporaryFile(suffix=".h5") as f:
    with h5py.File(f.name, "w") as h5:
        # Store strings (HDF5 stores as bytes internally)
        h5.create_dataset("categories", data=["Cat_000", "Cat_001", "Cat_002"])

    with h5py.File(f.name, "r") as h5:
        arr = h5["categories"]

        # DIRECT SLICING: Returns bytes
        direct_result = arr[:2]
        print(f"Direct slice: {direct_result}")  # Output: [b'Cat_000' b'Cat_001']
        print(f"Type: {type(direct_result[0])}")  # Output: <class 'bytes'>

        # read_elem_partial: Returns decoded strings
        partial_result = read_elem_partial(arr, indices=slice(0, 2))
        print(f"read_elem_partial: {partial_result}")  # Output: ['Cat_000' 'Cat_001']
        print(f"Type: {type(partial_result[0])}")  # Output: <class 'str'>
```

Justification: read_elem_partial handles:

HDF5 byte-to-string decoding

Various string encodings (vlen strings, fixed-length)

Nullable string arrays with masks

ilan-gold · 2026-01-08T15:36:35Z

src/anndata/experimental/backed/_lazy_arrays.py

+        if self.__categories is not None:
+            return np.asarray(self.__categories[-n:])
+
+        if self._categories_array is None:
+            return np.array([])
+
+        from anndata._io.specs.registry import read_elem_partial
+
+        arr = self._get_categories_array()
+        total = self.n_categories
+        start = max(total - n, 0)
+        return read_elem_partial(arr, indices=slice(start, total))


Duplicated with head_categories - please deduplicate

src/anndata/experimental/backed/_lazy_arrays.py

- Use @cached_property for categories (cleaner than manual caching)
- Simplify cache detection to "categories" in self.__dict__
- Remove _cached_n_categories double caching (use shape[0] directly)
- Rename _categories_array to _categories_elem (reflects group case)
- Extract _read_partial_categories helper to deduplicate head/tail
- Add ZarrGroup | H5Group to type annotation (code handles it)

katosh · 2026-01-08T18:07:25Z

Thanks for the thorough review! I've implemented most of your suggestions:

Implemented:

@cached_property for categories - cleaner than manual caching
"categories" in self.__dict__ for cache detection
Removed _cached_n_categories double caching - now uses shape[0] directly
Renamed _categories_array → _categories_elem
Extracted _read_partial_categories helper to deduplicate head/tail logic
Added ZarrGroup | H5Group to type annotation (you were right - if the code handles it, types should reflect that)

Kept for now with justification:

read_elem_partial instead of direct slicing - required for HDF5 byte-to-string decoding (direct slicing returns b'Cat_000' instead of 'Cat_000')
None support in type annotation - kept for API completeness/defensive programming, though I confirmed it's never used in practice (even empty categoricals write an empty array, not None)
name property - essential for dtype == "category" comparison in merge.py (CI failed without it)
__hash__ method - required for sets/dicts (e.g., collecting unique dtypes, @lru_cache functions)
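The `__hash__` point follows from a general Python rule. A class that defines `__eq__` without also defining `__hash__` gets `__hash__ = None`. Its instances then cannot be used in sets, as dict keys, or as `functools.lru_cache` arguments, and the same rule applies to a `pd.CategoricalDtype` subclass. Here is a minimal sketch with hypothetical classes `EqOnly` and `EqAndHash`:

```python
class EqOnly:
    """Hypothetical class that overrides __eq__ but not __hash__."""

    def __init__(self, key: str) -> None:
        self.key = key

    def __eq__(self, other: object) -> bool:
        return isinstance(other, EqOnly) and self.key == other.key


class EqAndHash(EqOnly):
    """Hypothetical subclass that restores a hash consistent with __eq__."""

    def __hash__(self) -> int:
        return hash(self.key)


def is_hashable(obj: object) -> bool:
    try:
        hash(obj)
    except TypeError:
        return False
    return True


print(EqOnly.__hash__)                         # None: set implicitly because __eq__ is defined
print(is_hashable(EqOnly("a")))                # False
print(len({EqAndHash("a"), EqAndHash("a")}))   # 1
```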

src/anndata/experimental/backed/_lazy_arrays.py

ilan-gold · 2026-01-09T13:09:30Z

src/anndata/experimental/backed/_lazy_arrays.py

+    @property
+    def name(self) -> str:
+        """String identifier for this dtype."""
+        return "category"


I think there is no code overriding the existing name on CategoricalDtype, from which we inherit. I assume these lines work whether or not you have this property here, because self.name should still be defined.
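Here is a quick way to check this claim, assuming a recent pandas. `name` is a class attribute on `pd.CategoricalDtype`, so a subclass inherits it, and the base `__eq__` compares strings against `self.name`. `SubclassedDtype` below is a hypothetical name.

```python
import pandas as pd


class SubclassedDtype(pd.CategoricalDtype):
    """Hypothetical subclass that defines neither `name` nor `__eq__`."""


dtype = SubclassedDtype(categories=["a", "b", "c"])

print("name" in SubclassedDtype.__dict__)  # False: not defined on the subclass
print(dtype.name)                          # category (inherited class attribute)
print(dtype == "category")                 # True: base __eq__ compares strings to self.name
```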

src/anndata/experimental/backed/_lazy_arrays.py

ilan-gold · 2026-01-09T13:17:55Z

tests/lazy/test_read.py

+    from anndata.experimental.backed._lazy_arrays import LazyCategoricalDtype
+
+    categories = ["a", "b", "c"]
+    adata = AnnData(
+        X=np.zeros((3, 2)),
+        obs=pd.DataFrame({"cat": pd.Categorical(categories)}),
+    )
+
+    path = tmp_path / "test.zarr"
+    adata.write_zarr(path)
+
+    lazy = read_lazy(path)
+    dtype = lazy.obs["cat"].dtype
+    assert isinstance(dtype, LazyCategoricalDtype)


I don't think you need to go through AnnData for most of these tests. anndata.io.write_elem can handle writing a categorical, read_elem can return the in-memory one, and read_elem_lazy will give you a CategoricalArray. I do think one test that embeds this inside an AnnData object and then checks that read_lazy(path).to_memory() == in_memory_adata is good enough. Then you could reuse the categorical fixture you create :)

src/anndata/experimental/backed/_lazy_arrays.py

- Remove `name` property (inherited from CategoricalDtype)
- Remove `None` support from type annotations and guards
- Simplify `categories` property to use `read_elem` uniformly
- Unify `head_categories`/`tail_categories` into `_get_categories_slice` helper
- Keep `bool(ordered)` - required because HDF5 returns np.bool_
- Refactor tests to use `write_elem`/`read_elem_lazy` directly
- Update equality check for `None` categories comparison


katosh · 2026-01-09T14:47:20Z

Thanks for the thorough second review! I've addressed most of your suggestions. Here's a summary:

Implemented

Removed name property - You were right, it's inherited from CategoricalDtype as a class attribute. I had mistakenly thought it might be reset like __hash__ when defining __eq__, but that's not the case.
Removed None support - Removed from type annotations and all associated guards. The __eq__ method now returns False when comparing to a CategoricalDtype with None categories.
Simplified categories property - It now just returns pd.Index(read_elem(self._categories_elem)). You were right that read_elem handles both zarr and h5 uniformly.
Refactored tests to use write_elem/read_elem_lazy - Most unit tests now work at the element level. Added a _write_categorical_zarr() helper for creating test fixtures.

Implemented slightly differently

head_categories/tail_categories refactor - I am not entirely sure what you meant. I refactored both to use a single _get_categories_slice method with a from_end flag that switches between the two cases internally, while keeping the public API unchanged. Let me know if you'd prefer a different approach!
Integration test - I avoided the round trip through AnnData in most tests. I added only a single test_lazy_categorical_roundtrip_via_anndata integration test, which tests the full workflow including read_lazy(path).to_memory() == original_adata. It also verifies dtype caching and ordered categoricals through the AnnData path.
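One way such a shared helper can compute its bounds is sketched below. `categories_slice` is a hypothetical name, and the sketch covers only the slice arithmetic, not the read. Clamping `n` keeps head and tail symmetric and handles both `n > total` and `n == 0`.

```python
def categories_slice(n: int, total: int, *, from_end: bool = False) -> slice:
    """Hypothetical helper: bounds for the first or last `n` of `total` items."""
    n = max(0, min(n, total))  # clamp so that n > total or n < 0 stay in range
    if from_end:
        return slice(total - n, total)
    return slice(0, n)


items = list(range(100))
print(items[categories_slice(3, 100)])                 # [0, 1, 2]
print(items[categories_slice(3, 100, from_end=True)])  # [97, 98, 99]
print(items[categories_slice(0, 100, from_end=True)])  # []
```

Computing `slice(total - n, total)` explicitly also avoids a pitfall of the `arr[-n:]` form used in the earlier tail code. For `n == 0`, `arr[-0:]` is the same as `arr[0:]` and returns everything instead of nothing.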

Not yet implemented

bool(ordered) removal - I kept bool(ordered) because HDF5 returns np.bool_ instead of Python bool:
```
>>> with h5py.File('test.h5', 'r') as f:
...     ordered = f['cat'].attrs['ordered']
...     print(type(ordered))
<class 'numpy.bool'>
```
While np.bool_ works in most contexts, normalizing to Python bool ensures consistent behavior for hashing and serialization. That said, if you'd prefer to remove it and handle np.bool_ downstream or trust it works fine, I'm happy to change it!
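This short example shows where `np.bool_` and `bool` actually differ, assuming NumPy is installed. Here `np.bool_(True)` stands in for the attribute value h5py returns. Comparisons such as `ordered == True` behave as expected. Identity checks, `isinstance(..., bool)`, and the standard-library `json` encoder, however, do not treat it as a Python `bool`.

```python
import json

import numpy as np

# Stand-in for the value h5py returns for the `ordered` attribute.
ordered = np.bool_(True)

print(isinstance(ordered, bool))  # False: np.bool_ does not subclass bool
print(ordered is True)            # False

try:
    json.dumps({"ordered": ordered})
except TypeError as err:
    print("not JSON serializable:", err)

# Normalizing with bool() gives a plain Python bool that serializes cleanly.
print(json.dumps({"ordered": bool(ordered)}))  # {"ordered": true}
```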

Let me know if you'd like any adjustments to the approach.

src/anndata/experimental/backed/_lazy_arrays.py

ilan-gold · 2026-01-12T10:22:34Z

src/anndata/experimental/backed/_lazy_arrays.py

+        if not isinstance(other, pd.CategoricalDtype):
+            return False
+        # Compare with regular CategoricalDtype - need to load categories
+        if self.ordered != other.ordered:
+            return False
+        if other.categories is None:
+            return False  # LazyCategoricalDtype always has categories
+        return self.categories.equals(other.categories)


Considering how much more extensive the base implementation is, I think we should just get our specialized checks out of the way quickly and then fall back to it: https://github.com/pandas-dev/pandas/blob/v2.3.3/pandas/core/dtypes/dtypes.py#L401
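Below is a minimal sketch of the suggested layout: cheap fast paths first, then deferral to `pd.CategoricalDtype.__eq__`. It assumes pandas is installed. `FakeArrayHandle` and `LocatedDtype` are hypothetical, and the frozen dataclass mimics an array object whose equality depends on location rather than contents. In this toy the categories are materialized at construction, so it shows the control flow rather than the I/O savings.

```python
from dataclasses import dataclass, field

import pandas as pd


@dataclass(frozen=True)
class FakeArrayHandle:
    """Hypothetical stand-in for an on-disk array: equal iff same location."""

    path: str
    values: tuple = field(compare=False)  # contents are ignored by ==


class LocatedDtype(pd.CategoricalDtype):
    """Hypothetical CategoricalDtype subclass illustrating the __eq__ layout."""

    def __init__(self, handle: FakeArrayHandle, ordered: bool = False) -> None:
        super().__init__(categories=list(handle.values), ordered=ordered)
        self._handle = handle

    def __eq__(self, other: object) -> bool:
        # Fast path 1: same Python object.
        if other is self:
            return True
        # Fast path 2: both backed by the same on-disk location.
        if (
            isinstance(other, LocatedDtype)
            and self.ordered == other.ordered
            and self._handle == other._handle
        ):
            return True
        # Everything else (strings, plain CategoricalDtype, ordering rules)
        # is left to the pandas implementation.
        return super().__eq__(other)

    # Defining __eq__ implicitly sets __hash__ to None, so restore it.
    __hash__ = pd.CategoricalDtype.__hash__


h1 = FakeArrayHandle("file.zarr/obs/cell_type/categories", ("a", "b", "c"))
h2 = FakeArrayHandle("file.zarr/obs/cell_type/categories", ("a", "b", "c"))
h3 = FakeArrayHandle("other.zarr/obs/cell_type/categories", ("c", "b", "a"))
d1, d2, d3 = LocatedDtype(h1), LocatedDtype(h2), LocatedDtype(h3)

print(d1 == d2)          # True via the location fast path
print(d1 == d3)          # True via pandas: unordered, same set of categories
print(d1 == "category")  # True via pandas string comparison
```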

ilan-gold · 2026-01-12T10:28:23Z

src/anndata/experimental/backed/_lazy_arrays.py

+    def __repr__(self) -> str:
+        if "categories" in self.__dict__:
+            # Fully loaded - use standard repr
+            return f"CategoricalDtype(categories={self.categories!r}, ordered={self.ordered})"
+        return f"LazyCategoricalDtype(n_categories={self.n_categories}, ordered={self.ordered})"


Maybe the repr should always show categories, but just the first n for some nice-seeming n?
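The helper below sketches a repr that always shows categories but truncates long lists. It is a hypothetical standalone function (`truncated_categories_repr`), not the PR's `__repr__`. The output format mirrors the examples quoted later in this thread. Truncating only when there are more than `2 * n_show` categories is an assumption of this sketch.

```python
from collections.abc import Sequence


def truncated_categories_repr(categories: Sequence[str], n_show: int = 3) -> str:
    """Hypothetical helper: show all categories, or head/tail around '...'."""
    n = len(categories)
    if n <= 2 * n_show:
        return f"LazyCategoricalDtype(categories={list(categories)!r})"
    # n - n_show (rather than -n_show) keeps n_show == 0 from selecting everything.
    shown = [*categories[:n_show], "...", *categories[n - n_show:]]
    return f"LazyCategoricalDtype(categories={shown!r}, n={n})"


print(truncated_categories_repr(["a", "b", "c"]))
# LazyCategoricalDtype(categories=['a', 'b', 'c'])
print(truncated_categories_repr([f"cat_{i}" for i in range(100)]))
# LazyCategoricalDtype(categories=['cat_0', 'cat_1', 'cat_2', '...', 'cat_97', 'cat_98', 'cat_99'], n=100)
```

With `n_show=10`, the value later adopted in this PR, a truncated repr previews 20 categories.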

ilan-gold · 2026-01-12T10:32:02Z

tests/lazy/test_read.py

+    cat_group = _write_categorical_zarr(tmp_path, cat)
+    lazy_cat = read_elem_lazy(cat_group)


This is closer to what I want but I think the paradigm should be "write once, read many" i.e., write the fixture once (scope="session") and then have a fixture that does read_elem_lazy every time. You probably need one or two different fixtures (maybe for ordered and not and one or two other things) but I don't think every test (as appears here) needs its own special underlying categories array written to disc. You even have a few "fixtures" (the pd.Categorical at the beginning of each test that I would like to become a proper pytest.fixture) that are completely identical.

ilan-gold · 2026-01-12T10:33:50Z

Every review, fewer comments, getting there, thanks :)

…ixtures
- Simplify __eq__ to defer to pandas base implementation after fast paths:
  1. Same Python object (identity check)
  2. Same on-disk location (avoids loading categories when comparing dtypes from the same file opened multiple times)
- Update __repr__ to always show categories (truncated for large n):
  small: LazyCategoricalDtype(categories=['a', 'b', 'c'])
  large: LazyCategoricalDtype(categories=['a', ..., 'z'], n=100)
- Extract _N_CATEGORIES_REPR_SHOW constant to module level
- Refactor tests to use session-scoped fixtures (write once, read many) instead of creating new categoricals in each test

katosh · 2026-01-12T17:09:54Z

Thanks for the review! I am glad to get this polished. Addressed all three points:

1. __eq__ simplification: now defers to super().__eq__() for pandas edge cases. Added two fast paths to avoid loading categories:

Same Python object (is check)
New: same on-disk location check. I discovered that zarr/h5py arrays already compare equal by location (not content), so we just use arr1 == arr2 directly. This avoids loading categories when comparing dtypes from the same file opened multiple times.

2. __repr__ always shows categories, truncated for large counts:

LazyCategoricalDtype(categories=['a', 'b', 'c'])
LazyCategoricalDtype(categories=['cat_0', 'cat_1', 'cat_2', '...', 'cat_97', 'cat_98', 'cat_99'], n=100)

Moved constant to module level as _N_CATEGORIES_REPR_SHOW.

3. Test fixtures: refactored to a session-scoped "write once, read many" pattern with 5 reusable fixtures.

Edit: Additional testing improvements after further review:

Verified arr1 == arr2 location-based comparison behavior:

Investigated h5py and zarr source code to confirm equality is location-based, not content-based
h5py: compares HDF5 object IDs via self.id == other.id (source)
zarr 3.x: uses dataclass-generated __eq__ comparing StorePath (URL string comparison)
Both return True for same location (even from different open() calls), False for different files with same content

Parametrized all LazyCategoricalDtype tests for both backends:

Refactored fixtures with helper functions for writing categorical data to zarr/h5ad
Created session-scoped path fixtures for each category type and backend
Created parametrized store fixtures that automatically test both zarr and h5ad



Replace the previous same-location equality test with a more rigorous parametrized test that covers both zarr and h5py backends. The new test uses `unittest.mock.patch.object` to patch `__getitem__` on the underlying category arrays to raise `AssertionError` if called. This proves that both backends use location-based equality comparison that doesn't read array contents:
- h5py: compares HDF5 object IDs (file number + object number)
- zarr 3.x: compares StorePath (URL string comparison via dataclass)

The previous test only verified our `LazyCategoricalDtype.categories` cache wasn't populated, which doesn't prove the storage layer didn't load data internally.

Refactor categorical test fixtures to support both backends:
- Add helper functions for writing categorical data to zarr/h5ad
- Create path fixtures for each category type and backend (session-scoped)
- Create parametrized store fixtures that test both zarr and h5ad

All LazyCategoricalDtype tests now run for both backends, increasing test coverage from 12 to 24 tests:
- test_lazy_categorical_dtype_n_categories[zarr/h5ad]
- test_lazy_categorical_dtype_head_tail_categories[zarr/h5ad]
- test_lazy_categorical_dtype_categories_caching[zarr/h5ad]
- test_lazy_categorical_dtype_ordered[zarr/h5ad]
- test_lazy_categorical_dtype_repr[zarr-zarr/zarr-h5ad/h5ad-zarr/h5ad-h5ad]
- test_lazy_categorical_dtype_equality[zarr/h5ad]
- test_lazy_categorical_dtype_equality_no_load[zarr/h5ad]
- test_lazy_categorical_dtype_hash[zarr/h5ad]
- test_lazy_categorical_dtype_n_categories_from_cache[zarr/h5ad]
- test_lazy_categorical_dtype_name[zarr/h5ad]
- test_lazy_categorical_dtype_inequality_with_none_categories[zarr/h5ad]

…tion
Consolidate redundant tests and add proper verification for lazy behavior:
1. Merged n_categories tests:
   - test_lazy_categorical_dtype_n_categories now verifies:
     - Metadata-only access (categories not loaded)
     - Cache behavior after categories are loaded
   - Removed redundant test_lazy_categorical_dtype_n_categories_from_cache
2. Improved head_tail_categories test:
   - Added verification that partial reads don't load all categories
   - Each head/tail call now checks "categories" not in __dict__
3. Consolidated equality test:
   - Merged test_lazy_categorical_dtype_name (trivial 1-assertion test)
   - Merged test_lazy_categorical_dtype_inequality_with_none_categories
   - Now tests name property and None-categories edge case

Test count reduced from 24 to 18 while improving coverage quality:
- Tests now verify lazy behavior claims, not just return values
- Removed redundant test code without losing coverage

katosh · 2026-01-20T19:41:52Z

@ilan-gold if you like, I could also address #2296 in this PR by setting a default chunk size of 10,000 for category arrays at

anndata/src/anndata/_io/specs/methods.py, lines 1107 to 1112 in c6f6f54:

```python
_writer.write_elem(
    g,
    "categories",
    v.categories.to_numpy(),
    dataset_kwargs=dataset_kwargs,
)
```

by implementing

```python
categories = v.categories.to_numpy()
cat_kwargs = dataset_kwargs
if len(categories) > 10_000 and "chunks" not in dataset_kwargs:
    cat_kwargs = dict(dataset_kwargs, chunks=(10_000,))
_writer.write_elem(g, "categories", categories, dataset_kwargs=cat_kwargs)
```

This would increase the benefit of this PR for zarr stores.
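The proposed rule can be isolated as a pure function. That makes the threshold and the "respect explicit chunks" behavior easy to unit-test on their own. `categories_dataset_kwargs` and `CATEGORY_CHUNK` are hypothetical names, not anndata API; the 10,000 threshold is the one proposed above.

```python
from collections.abc import Mapping
from typing import Any

# Chunk length proposed in the comment above (hypothetical constant name).
CATEGORY_CHUNK = 10_000


def categories_dataset_kwargs(
    n_categories: int, dataset_kwargs: Mapping[str, Any]
) -> dict[str, Any]:
    """Hypothetical pure-function version of the proposed chunking rule.

    Adds 1-D chunking only for category arrays longer than CATEGORY_CHUNK
    and never overrides chunks the caller requested explicitly. Always
    returns a new dict, so the caller's kwargs are not mutated.
    """
    kwargs = dict(dataset_kwargs)
    if n_categories > CATEGORY_CHUNK and "chunks" not in kwargs:
        kwargs["chunks"] = (CATEGORY_CHUNK,)
    return kwargs


print(categories_dataset_kwargs(100, {}))                      # {}
print(categories_dataset_kwargs(100_000, {}))                  # {'chunks': (10000,)}
print(categories_dataset_kwargs(100_000, {"chunks": (500,)}))  # {'chunks': (500,)}
```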

ilan-gold

I will need to look at the tests in a bit, but in general they are still a little too repetitive. What is the difference between small, medium, large, and 50? Why not parametrize by ordered and n_obs?

ilan-gold · 2026-01-23T14:22:59Z

src/anndata/experimental/backed/_lazy_arrays.py

 )

+# Number of categories to show at head/tail in LazyCategoricalDtype repr
+_N_CATEGORIES_REPR_SHOW = 3


I don't think even pandas is this aggressive; it seems they use 10, so let's go with that.

Done. Note that this will add up to a total of 20 previewed categories.

src/anndata/experimental/backed/_lazy_arrays.py

Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>

Reduce repetition in categorical test fixtures by using a config-driven factory pattern instead of separate fixture groups for each category size. Changes:
- Replace 15 individual fixtures with 3 generated fixtures + 1 data fixture
- Consolidate n50 and n100 into single n100 config (serves both use cases)
- Use `_make_cat_fixture()` factory for zarr/h5ad parametrization
- Update tests to use new fixture names (cat_n3_store, cat_n100_store)

Addresses review feedback about fixture repetitiveness.

katosh · 2026-01-23T19:55:33Z

Hi @ilan-gold,

Thanks for the continued review feedback! I've addressed your comments about the test fixtures being too repetitive.

Test fixture consolidation (`ac1cab52`)

Refactored the categorical fixtures from 15 individual fixtures to a config-driven factory pattern:

```python
_CAT_CONFIGS = [
    ("n3", 3, False, ["a", "b", "c"]),      # basic tests, equality, hashing
    ("n100", 100, False, None),              # truncation, n_categories, head/tail
    ("ordered", 3, True, ["low", "medium", "high"]),
]
```

This follows the "write once, read many" pattern you suggested - data is written once per session via cat_data_paths, then _make_cat_fixture() generates store fixtures that open fresh handles for each test.

I also consolidated n50 and n100 into just n100 since it serves both the head/tail testing and truncation testing use cases.

Improved equality_no_load test (`edb04fc2`)

Switched from patching __getitem__ to patching read_elem:

__getitem__ on zarr/h5py arrays can't be reliably patched (C-level methods)
read_elem is the actual function called to load categories

Also added a positive control within the same test that verifies comparison with pd.CategoricalDtype does trigger read_elem, proving the patch approach works.
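The patch-plus-positive-control idea can be shown without zarr or h5py, using only the standard library. `storage`, `ToyDtype`, and the dict element are hypothetical stand-ins. `mock.patch.object(..., wraps=...)` replaces the reader with a spy that still calls the original, so the test can count reads. The positive control matters because a spy that is never hit could also mean the patch targeted the wrong name.

```python
from functools import cached_property
from types import SimpleNamespace
from unittest import mock

# Hypothetical stand-in for the module that provides read_elem.
storage = SimpleNamespace(read_elem=lambda elem: list(elem["values"]))


class ToyDtype:
    """Hypothetical lazy dtype whose categories are read via storage.read_elem.

    It defines __eq__ without __hash__, so it is unhashable; fine for a sketch.
    """

    def __init__(self, elem: dict) -> None:
        self._elem = elem

    @cached_property
    def categories(self) -> list:
        # Looked up at call time, so patching storage.read_elem intercepts it.
        return storage.read_elem(self._elem)

    def __eq__(self, other: object) -> bool:
        if isinstance(other, ToyDtype) and self._elem is other._elem:
            return True  # same "location": nothing needs to be read
        return self.categories == list(getattr(other, "categories", []))


elem = {"values": ["a", "b", "c"]}

with mock.patch.object(storage, "read_elem", wraps=storage.read_elem) as spy:
    # Fast path: two dtypes backed by the same element compare equal unread.
    assert ToyDtype(elem) == ToyDtype(elem)
    assert spy.call_count == 0

    # Positive control: a comparison that must load categories does hit
    # the spy, so the patch would have caught an unwanted read above.
    assert ToyDtype(elem) == SimpleNamespace(categories=["a", "b", "c"])
    assert spy.call_count == 1
```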

Note on force pushes

I made a few force pushes while iterating on the test improvements - apologies for the noise. The history should be clean now with just the two commits above on top of the previous work.

Let me know if there's anything else you'd like me to address!

- Switch from patching __getitem__ to patching read_elem (more reliable)
- Add positive control: comparison with pd.CategoricalDtype triggers read_elem
- This proves both that the optimization works AND that the patch detects loads

implement LazyCategoricalDtype

4b5d7ab

katosh mentioned this pull request Jan 8, 2026

feat: Efficient category count and partial loading for lazy AnnData #2283

Open

katosh added 3 commits January 8, 2026 13:29

fix: LazyCategoricalDtype.__eq__ handle string comparison

7edb510

The merge code checks `dtype == "category"` which requires LazyCategoricalDtype to handle string comparison in __eq__.

increase testing coverage of LazyCategoricalDtype

90ac52e

Merge remote-tracking branch 'origin/main' into feat/lazy-categorical-dtype

667c823

# Conflicts:
#   src/anndata/experimental/backed/_lazy_arrays.py
#   tests/lazy/test_read.py

katosh marked this pull request as ready for review January 8, 2026 15:08

manipulate cache for better testing

c6a68da

ilan-gold requested changes Jan 8, 2026


katosh added 2 commits January 8, 2026 19:10

remove unnecessary docstring

03fe0b0

remove remaining docstring examples

b57bdab

ilan-gold requested changes Jan 9, 2026


katosh added 2 commits January 9, 2026 15:34

test: refactor LazyCategoricalDtype tests to use write_elem/read_elem_lazy

9ff164c

ilan-gold requested changes Jan 12, 2026


katosh added 6 commits January 12, 2026 13:06

fix linting and simplify __eq__ using zarr/h5py built-in location equality

3d8bbea

- Fix RUF005: use list unpacking [*head, "...", *tail]
- Remove _same_disk_location helper - zarr/h5py arrays already compare equal by on-disk location, not content

test: add same-location equality check for LazyCategoricalDtype

e8ee005

Verify that comparing two dtypes from the same file (opened twice) uses the fast path and doesn't load categories.

test: fix misleading comment about hash requirement

6259d14

This was referenced Jan 14, 2026

feat: Apply reasonable default chunking to 1D arrays in obs/var #2295

Closed

feat: Chunking for categorical categories string arrays only #2296

Open

feat: Add HTML representation #2236

Open

ilan-gold reviewed Jan 23, 2026


katosh and others added 3 commits January 23, 2026 11:09

simplify LazyCategoricalDtype comparison

87d399b

Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>

increase number of preview categories in LazyCategoricalDtype

f740784

katosh force-pushed the feat/lazy-categorical-dtype branch 2 times, most recently from d5ee71f to 2fddb8c on January 23, 2026 21:33

katosh force-pushed the feat/lazy-categorical-dtype branch from 2fddb8c to edb04fc on January 23, 2026 21:34
