Description
This would be cool to have, though it definitely requires a bit of work.
Basically: HDF5 allows you to store arrays as a collection of smaller chunks, e.g. a 20x20 matrix as 16 5x5 chunks.
Chunks can then be compressed individually, accessed individually, and added incrementally.
-> Not all chunks need to exist when the array is first written.
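To make the idea concrete, here is a rough sketch in Python (standard library only, not JLD2 code) of a 20x20 array stored as 5x5 chunks. The names `chunk_index`, `write_chunk`, and `read_element` are made up for illustration, and the plain dict is only a stand-in for the on-disk index that HDF5 keeps; the fill value of 0.0 for missing chunks is likewise an assumption.

```python
import zlib
from array import array

ROWS, COLS = 20, 20        # full array shape
CH_R, CH_C = 5, 5          # chunk shape
FILL = 0.0                 # assumed value for chunks that were never written

# Hypothetical in-memory index: (row offset, col offset) -> compressed bytes.
# A real HDF5 file uses an on-disk index structure for this lookup.
chunk_index = {}

def write_chunk(chunk_row, chunk_col, values):
    """Compress and store one chunk, given as a flat row-major list."""
    assert len(values) == CH_R * CH_C
    key = (chunk_row * CH_R, chunk_col * CH_C)
    chunk_index[key] = zlib.compress(array("d", values).tobytes())

def read_element(i, j):
    """Read one element, decompressing only the chunk that contains it."""
    key = (i // CH_R * CH_R, j // CH_C * CH_C)
    blob = chunk_index.get(key)
    if blob is None:
        return FILL        # chunk does not exist yet
    chunk = array("d")
    chunk.frombytes(zlib.decompress(blob))
    return chunk[(i - key[0]) * CH_C + (j - key[1])]

# Number of chunks needed to cover the whole array.
n_chunks = (ROWS // CH_R) * (COLS // CH_C)

# Only two chunks are written; the rest are added later or never.
write_chunk(0, 0, [1.0] * 25)
write_chunk(3, 2, [float(k) for k in range(25)])
```

With this in place, `read_element(17, 12)` touches only the chunk at offset (15, 10), and `read_element(7, 7)` falls back to the fill value because that chunk was never written.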
This requires a bit of API, but more importantly, HDF5 constructs B-Trees to keep track of the chunk locations.
I added support for reading these in 2022 in b5c09ef.
The code is here:
https://github.com/JuliaIO/JLD2.jl/blob/master/src/fractal_heaps.jl
The best entry point into the HDF5 format spec is the description of the Data Layout header message, which can be set to chunked.
The description of V1 B-trees and everything else is referenced from there.
https://support.hdfgroup.org/documentation/hdf5/latest/_f_m_t4.html#subsubsec_fmt4_dataobject_hdr_msg_layout