Description
To support efficient storage and future analysis of multimodal simulation data (e.g., for agents that use vision, proprioception, rewards, and abstract state), we need a robust logging framework. HDF5 provides hierarchical organization, compression, and fast access to large arrays, making it an ideal format for this use case.
This issue tracks the design, implementation, and integration of HDF5 logging into our simulation framework.
Objectives:
✅ Phase 1: Research and Planning
- Investigate the `h5py` library for HDF5 manipulation in Python.
- Compare compression options (e.g., GZIP, LZF) and their tradeoffs.
- Define data access patterns:
  - Do we log every step?
  - Do we access by agent, timestep, or modality?
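One way to ground the compression comparison is a small benchmark like the sketch below. It assumes synthetic, mostly-zero frames as a stand-in for real vision data and one chunk per timestep; `compare_compression` is a hypothetical helper name, and real simulation data may compress very differently.

```python
import os
import tempfile
import time

import h5py
import numpy as np


def compare_compression(data, options=(None, "gzip", "lzf")):
    """Write `data` once per option; return {name: (bytes_stored, seconds)}."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for opt in options:
            name = opt or "none"
            path = os.path.join(tmp, f"{name}.h5")
            start = time.perf_counter()
            with h5py.File(path, "w") as f:
                # One chunk per timestep, matching a "log every step" pattern.
                f.create_dataset(
                    "vision",
                    data=data,
                    chunks=(1,) + data.shape[1:],
                    compression=opt,
                )
            elapsed = time.perf_counter() - start
            # Reopen so the measured size reflects what was written to disk.
            with h5py.File(path, "r") as f:
                stored = f["vision"].id.get_storage_size()
            results[name] = (stored, elapsed)
    return results


# Synthetic stand-in for vision frames: 50 steps of 16x16x3, mostly zeros.
rng = np.random.default_rng(0)
frames = np.zeros((50, 16, 16, 3), dtype=np.float32)
frames[:, 4:8, 4:8, :] = rng.random((50, 4, 4, 3), dtype=np.float32)

results = compare_compression(frames)
for name, (stored, elapsed) in results.items():
    print(f"{name:>5}: {stored:>8d} bytes, {elapsed * 1e3:.1f} ms")
```

Timings from a run this small are noisy, so a real comparison would use frames and episode lengths taken from an actual simulation.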
✅ Phase 2: Schema Design
- Design a clear HDF5 layout for storing multimodal data.
- Define naming conventions for groups, datasets, and metadata.
- Include support for:
  - Global metadata (e.g., simulation config, timestamp, seed).
  - Per-agent groups:
    - `observations/vision` (e.g., tensor: T × R × R × C)
    - `observations/proprioception` (e.g., T × D)
    - `observations/abstract_state` (optional high-level vars)
    - `rewards` (scalar per step)
    - `actions` (categorical or vector)
    - `states` (internal agent variables per step)
  - Environment state:
    - Optional snapshots of full grid every N steps
✅ Phase 3: Implementation
- Create `HDF5Logger` class:
  - `__init__(filename, schema, compression=None)`
  - `log_agent_step(agent_id, timestep, data_dict)`
  - `log_env_step(timestep, env_data)`
  - `finalize()` to flush and close
- Support auto-creation of agents/groups if they don't exist yet.
- Optionally: implement chunked writes for long simulations.
- Test logging performance for N agents over T steps.
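A minimal sketch of how the proposed `HDF5Logger` interface could look, assuming one resizable dataset per modality that grows along the time axis. The `agents/agent_<id>` path convention, the private `_append` helper, and ignoring `schema` are assumptions of this sketch, not decisions made in this issue.

```python
import os
import tempfile

import h5py
import numpy as np


class HDF5Logger:
    """Sketch: per-agent groups are created lazily; datasets grow along axis 0."""

    def __init__(self, filename, schema=None, compression=None):
        # `schema` matches the proposed signature but is unused in this sketch.
        self._file = h5py.File(filename, "w")
        self._compression = compression

    def _append(self, group, name, timestep, value):
        value = np.asarray(value)
        if name not in group:
            # maxshape=None on axis 0 makes the time axis resizable;
            # resizable datasets are stored in chunks.
            group.create_dataset(
                name,
                shape=(0,) + value.shape,
                maxshape=(None,) + value.shape,
                dtype=value.dtype,
                chunks=True,
                compression=self._compression,
            )
        dset = group[name]
        if timestep >= dset.shape[0]:
            # Steps that are skipped keep the dataset's fill value (0).
            dset.resize(timestep + 1, axis=0)
        dset[timestep] = value

    def log_agent_step(self, agent_id, timestep, data_dict):
        agent = self._file.require_group(f"agents/agent_{agent_id}")
        for name, value in data_dict.items():
            self._append(agent, name, timestep, value)

    def log_env_step(self, timestep, env_data):
        env = self._file.require_group("environment")
        for name, value in env_data.items():
            self._append(env, name, timestep, value)

    def finalize(self):
        self._file.flush()
        self._file.close()


# Usage: log five steps for one agent, then read the result back.
path = os.path.join(tempfile.mkdtemp(), "run.h5")
logger = HDF5Logger(path, compression="gzip")
for t in range(5):
    logger.log_agent_step(1, t, {
        "observations/vision": np.full((4, 4, 3), t, dtype=np.float32),
        "rewards": np.float32(t * 0.1),
        "actions": np.int32(t % 2),
    })
    logger.log_env_step(t, {"resource_count": np.float32(10 - t)})
logger.finalize()

with h5py.File(path, "r") as f:
    vision_shape = f["agents/agent_1/observations/vision"].shape
    rewards = f["agents/agent_1/rewards"][:]
```

Resizing by one step per call keeps the sketch short; a version aimed at long simulations would likely buffer several steps in memory and write them in one block.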
✅ Phase 4: Reader Utilities
- Create a utility for loading datasets by:
  - Agent ID
  - Modality (vision, reward, action)
  - Step range
- Support slicing large datasets (e.g., `load_observations(agent_id, start=100, end=200)`).
- Create a small helper to visualize one episode's data (for debugging).
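The slicing utility could be as small as the sketch below. It assumes the `agents/agent_<id>/observations/<modality>` path convention and adds `path` and `modality` parameters that the example call above does not have; both are assumptions made for illustration.

```python
import os
import tempfile

import h5py
import numpy as np


def load_observations(path, agent_id, modality="vision", start=None, end=None):
    """Return steps [start, end) of one observation modality as a NumPy array."""
    with h5py.File(path, "r") as f:
        dset = f[f"agents/agent_{agent_id}/observations/{modality}"]
        # Slicing an h5py dataset reads only the requested range from the file.
        return dset[start:end]


# Build a tiny file to read back: 20 steps of 4x4x3 frames, frame t filled with t.
path = os.path.join(tempfile.mkdtemp(), "episode.h5")
frames = np.stack([np.full((4, 4, 3), t, dtype=np.float32) for t in range(20)])
with h5py.File(path, "w") as f:
    f.create_dataset(
        "agents/agent_1/observations/vision",
        data=frames,
        chunks=(1, 4, 4, 3),
    )

window = load_observations(path, agent_id=1, start=5, end=15)
print(window.shape)  # (10, 4, 4, 3)
```

Leaving `start` and `end` as `None` loads the whole dataset, which is convenient for short episodes but defeats the purpose for large ones.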
✅ Phase 5: Documentation and Examples
- Add docstrings and in-line schema description in the logger.
- Write a notebook or script showing:
  - How to use `HDF5Logger` in a simulation loop.
  - How to read logged data post-simulation.
- Document the full HDF5 schema layout in `docs/`.
Suggested HDF5 File Layout:
    /simulation_0001/
        attrs:
            seed = 42
            config = JSON blob or YAML string
            date = "2025-06-27"
            environment_type = "grid_world"
        /agents/
            /agent_1/
                /observations/
                    vision: float32 [T x R x R x C]
                    proprioception: float32 [T x D]
                    abstract_state: float32 [T x D]
                actions: int32 [T]
                rewards: float32 [T]
                states: float32 [T x D]
            /agent_2/
                ...
        /environment/
            state_snapshots: int32 [T x H x W]
            resource_count: float32 [T]
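As a rough illustration, the layout above could be created up front with empty datasets whose time axis can grow. The helper name `create_layout`, the default sizes for R, C, D, H, and W, and the contents of the `config` attribute are placeholders chosen for this sketch.

```python
import json

import h5py


def create_layout(f, agent_ids, R=8, C=3, D=4, H=10, W=10, compression="gzip"):
    """Create the proposed groups, attributes, and empty, growable datasets."""
    sim = f.create_group("simulation_0001")
    sim.attrs["seed"] = 42
    sim.attrs["config"] = json.dumps({"grid_size": [H, W]})  # hypothetical config
    sim.attrs["date"] = "2025-06-27"
    sim.attrs["environment_type"] = "grid_world"

    def make(group, name, step_shape, dtype):
        # Length 0 along T for now; maxshape=None lets the time axis grow later.
        return group.create_dataset(
            name,
            shape=(0,) + step_shape,
            maxshape=(None,) + step_shape,
            dtype=dtype,
            chunks=True,
            compression=compression,
        )

    for agent_id in agent_ids:
        agent = sim.create_group(f"agents/agent_{agent_id}")
        obs = agent.create_group("observations")
        make(obs, "vision", (R, R, C), "float32")
        make(obs, "proprioception", (D,), "float32")
        make(obs, "abstract_state", (D,), "float32")
        make(agent, "actions", (), "int32")
        make(agent, "rewards", (), "float32")
        make(agent, "states", (D,), "float32")

    env = sim.create_group("environment")
    make(env, "state_snapshots", (H, W), "int32")
    make(env, "resource_count", (), "float32")
    return sim


# In-memory file (core driver, nothing written to disk) for a quick check.
with h5py.File("layout_demo.h5", "w", driver="core", backing_store=False) as f:
    sim = create_layout(f, agent_ids=[1, 2])
    vision_shape = sim["agents/agent_1/observations/vision"].shape
    vision_maxshape = sim["agents/agent_1/observations/vision"].maxshape
    seed = int(sim.attrs["seed"])
    top_level = sorted(sim.keys())
```

Storing `config` as a JSON string in an attribute keeps it readable from any HDF5 tool, at the cost of re-parsing it on load.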
Resources:
- [h5py API Reference](https://docs.h5py.org/en/stable/index.html)
- [HDF5 Format Intro (The HDF Group)](https://portal.hdfgroup.org/display/HDF5/HDF5)
- [SciPy Cookbook: HDF5](https://scipy-cookbook.readthedocs.io/items/HDF5.html)
Stretch Goals:
- Create a CLI tool to inspect and summarize `.h5` files.
- Add support for logging from distributed or parallel agents.
- Add TensorBoard integration or similar visualization hooks.
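For the inspection tool, a first pass might simply walk the file and print one line per group or dataset. The sketch below assumes nothing about the final schema; `summarize` and `main` are hypothetical names, and the demo file at the end exists only to exercise them.

```python
import argparse
import os
import tempfile

import h5py
import numpy as np


def summarize(path):
    """Return one summary line per group or dataset in the file."""
    lines = []

    def visit(name, obj):
        if isinstance(obj, h5py.Dataset):
            lines.append(
                f"{name}  shape={obj.shape} dtype={obj.dtype} "
                f"compression={obj.compression}"
            )
        else:
            lines.append(f"{name}/")

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return lines


def main(argv=None):
    parser = argparse.ArgumentParser(description="Summarize an HDF5 log file.")
    parser.add_argument("path", help="path to the .h5 file to inspect")
    args = parser.parse_args(argv)
    lines = summarize(args.path)
    print("\n".join(lines))
    return lines


# Demo: write a tiny file, then summarize it as the CLI would.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.h5")
with h5py.File(demo_path, "w") as f:
    f.create_dataset(
        "agents/agent_1/rewards",
        data=np.zeros(10, dtype=np.float32),
        compression="gzip",
    )
summary = main([demo_path])
```

In a standalone script, `main()` would be called under an `if __name__ == "__main__":` guard so that it reads the path from the command line.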