Misc format changes

I'll gather a WIP list of format changes here. Once the items are formalized, I'll create separate issues.

* Change sub-track types to start at the LSB instead of the MSB, so that we can iterate with CTZ instead of CLZ, which is slightly faster.
* Swizzle constant rotations to facilitate AVX loads
* Merge the per-track format and segment range info into a single buffer? They are used together, so there would be less to prefetch.
* Remove segment data alignment, no longer required?
* Reverse bytes for raw f32 storage? This would allow us to always swizzle when unpacking with SIMD, or alternatively to remove the swizzle from variable packing.
* Store vec3 segment range data in SoA form for bulk unpacking
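
As a rough illustration of the first item, here is a minimal sketch of iterating the set bits of a mask from the LSB upward with a count-trailing-zeros step. The function names and the example mask are hypothetical and not taken from the codebase. One likely reason this direction is convenient is that clearing the lowest set bit is a single `mask & (mask - 1)`.

```python
def count_trailing_zeros(value: int) -> int:
    """Return the index of the lowest set bit (CTZ). value must be > 0."""
    if value <= 0:
        raise ValueError("value must be a positive integer")
    # value & -value isolates the lowest set bit.
    return (value & -value).bit_length() - 1


def iterate_set_bits(mask: int):
    """Yield the indices of the set bits in mask, starting at the LSB."""
    while mask:
        yield count_trailing_zeros(mask)
        # Clear the lowest set bit so the next CTZ finds the following one.
        mask &= mask - 1


if __name__ == "__main__":
    # Hypothetical mask where each set bit marks a sub-track to process.
    print(list(iterate_set_bits(0b101001)))
```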

Revisit variable segment data format. Currently, segment data is as follows:
[Segment0 8-bit metadata] [Segment0 24-byte range data] [Segment0 variable animated data per keyframe]
[Segment1 ...] Other segments follow
This is simple and nice from a conceptual point of view but it might not be ideal for performance. There are two cases we have to handle:
* We sample 2 keyframes within the same segment in which case a single set of metadata/range is used
* We sample 2 keyframes from two different segments in which case we need two sets of metadata/range data
In the latter case, the second set of segment metadata lives close to our keyframe data, but near the last bit of keyframe data we'll touch. As a result, we pay for the potential TLB misses upfront.
Even in the former case, if our clip has many tracks, the segment metadata, the range data, and the keyframe data might each end up on separate memory pages.
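
To make the two cases concrete, here is a minimal sketch that finds which segments the two interpolated keyframes fall into. It assumes a uniform sample rate and that every keyframe belongs to exactly one segment. `segments_touched`, `segment_start_frames`, and the other names are hypothetical and do not reflect the actual decompression code.

```python
import bisect


def segments_touched(sample_time, sample_rate, segment_start_frames, num_frames):
    """Return (key0, key1, segment0, segment1) for a sample time in seconds.

    segment_start_frames is a sorted list holding the first keyframe index
    of every segment, e.g. [0, 16, 32].
    """
    last_frame = num_frames - 1
    # Clamp the fractional keyframe position to the clip range.
    position = min(max(sample_time * sample_rate, 0.0), float(last_frame))
    key0 = int(position)
    key1 = min(key0 + 1, last_frame)
    # The owning segment is the last one that starts at or before the keyframe.
    segment0 = bisect.bisect_right(segment_start_frames, key0) - 1
    segment1 = bisect.bisect_right(segment_start_frames, key1) - 1
    return key0, key1, segment0, segment1


if __name__ == "__main__":
    starts = [0, 16, 32]
    # Both keyframes in the same segment: one set of metadata/range data.
    print(segments_touched(0.25, 30.0, starts, 48))
    # Keyframes straddle a segment boundary: two sets are needed.
    print(segments_touched(0.5, 30.0, starts, 48))
```

When `segment0 != segment1`, both sets of metadata and range data have to be read before any keyframe data can be unpacked.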

The end result of this is that when the algorithm starts unpacking, we first touch many new pages which all TLB miss at the same time (potentially, for medium/large track counts).

Can we use a [Z-order curve](https://en.wikipedia.org/wiki/Z-order_curve) (aka Morton code) to improve packing and traversal order? Can we think of our 3 data parts as 3 axes? And so we have a 3D tuple to represent each element touched: [metadata offset, range offset, animated data offset]. If we increment each offset for each keyframe, we end up traversing in a straight line in that 3D space.

In practice, we would have to interleave groups of data to facilitate unpacking (e.g. 4 sub-tracks together). It might complicate alignment requirements, especially for the animated data as it has bit alignment. Each group would need to pad up to the end of the byte, adding at most 7 bits of padding. 400 animated rotations would thus have 100 groups with each at most 7 bits of padding for a total of 87.5 bytes. If we make groups of 8 instead, the padding cost is halved.
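
The padding estimate above can be checked with a few lines of arithmetic. This is only a sketch of the worst case; the helper names are made up for illustration.

```python
import math


def padding_bits(group_bit_size: int) -> int:
    """Bits needed to pad a group of group_bit_size bits up to a byte boundary."""
    return (-group_bit_size) % 8


def worst_case_padding_bytes(num_sub_tracks: int, group_size: int) -> float:
    """Upper bound on padding when every group wastes the maximum of 7 bits."""
    num_groups = math.ceil(num_sub_tracks / group_size)
    return num_groups * 7 / 8


if __name__ == "__main__":
    print(worst_case_padding_bytes(400, 4))  # groups of 4 sub-tracks
    print(worst_case_padding_bytes(400, 8))  # groups of 8 sub-tracks
```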

See these links for details:
* https://www.forceflow.be/2013/10/07/morton-encodingdecoding-through-bit-interleaving-implementations/
* https://en.wikipedia.org/wiki/Z-order_curve
* https://fgiesen.wordpress.com/2009/12/13/decoding-morton-codes/
* https://github.com/trevorprater/pymorton
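
For reference, the bit interleaving described in those links can be sketched with a simple loop. This is a plain, unoptimized version, not the magic-number variants from the articles. Mapping the axes to [metadata offset, range offset, animated data offset] is only the idea floated above, not an implemented format, and the function names are hypothetical.

```python
def morton_encode_3d(x: int, y: int, z: int, bits: int = 10) -> int:
    """Interleave the low `bits` bits of x, y and z into one Morton code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)      # x takes bits 0, 3, 6, ...
        code |= ((y >> i) & 1) << (3 * i + 1)  # y takes bits 1, 4, 7, ...
        code |= ((z >> i) & 1) << (3 * i + 2)  # z takes bits 2, 5, 8, ...
    return code


def morton_decode_3d(code: int, bits: int = 10):
    """Recover (x, y, z) from a code produced by morton_encode_3d."""
    x = y = z = 0
    for i in range(bits):
        x |= ((code >> (3 * i)) & 1) << i
        y |= ((code >> (3 * i + 1)) & 1) << i
        z |= ((code >> (3 * i + 2)) & 1) << i
    return x, y, z


if __name__ == "__main__":
    # Incrementing all three offsets in lockstep walks the diagonal.
    for step in range(4):
        print(step, morton_encode_3d(step, step, step))
```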

Misc format changes #528

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Misc format changes #528

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions