Add properties to get chunk and shard slices#3573
Add properties to get chunk and shard slices#3573dstansby wants to merge 1 commit intozarr-developers:mainfrom
Conversation
e3bee0b to
f5ea95a
Compare
| return await self.store_path.store.getsize_prefix(self.store_path.path) | ||
|
|
||
| @property | ||
| def chunk_slices(self) -> Generator[tuple[slice, ...]]: |
There was a problem hiding this comment.
nitpicking the name: i find "slice" to be kind of ambiguous between a verb and a noun, and it also locks us in to returning slice objects, so what if we use the word "region" instead?
and also I think it's helpful if the name of this routine makes it clear that it's an iterator. So what if we call it iter_chunk_regions, i.e. the name of the routine it wraps 😜
There was a problem hiding this comment.
re. name, I think "slices" is unambiguously a noun because it's plural? I looked at NumPy (https://numpy.org/doc/stable/user/basics.indexing.html#slicing-and-striding), and they use the terms "index", "selection tuple", or "slicing tuple". If we try and stay consistent with NumPy, how about "chunk_indices"?
I like "regions", but to me it's ambiguous whether that means the index, or the array data at that index. If we settle on it that could be fixed by documentation and consistent use though.
re. iterator, do you mean generator? A list/string etc. are also iterators, but using a yield makes this into more specifically a generator.
So perhaps generate_chunk_indices? I find that a bit clunky though,
for chunk_index in arr.chunk_indices:is much nicer than
for chunk_index in arr.generate_chunk_indices:(or iter_chunk_indices)
which is why I prefer simply chunk_indices. I don't knkow if there's prior art in other libraries for naming generators?
There was a problem hiding this comment.
generators are iterators, and my thinking was that putting "iter" in the name conveys that users should expect to iterate over the value returned by calling this method.
when I think of iterating over indices, I think of iterating over tuples of coordinates, e.g., (0, 0, 0), (0, 0, 1), .... which is what _iter_chunk_coords / _iter_shard_coords do right now.
There was a problem hiding this comment.
I like "regions", but to me it's ambiguous whether that means the index, or the array data at that index. If we settle on it that could be fixed by documentation and consistent use though.
I worry that this ambiguity will hold for any name we pick :)
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3573 +/- ##
==========================================
- Coverage 61.87% 61.83% -0.04%
==========================================
Files 85 85
Lines 10134 10146 +12
==========================================
+ Hits 6270 6274 +4
- Misses 3864 3872 +8
🚀 New features to boost your workflow:
|
Fixes #2454.