sanrishi (Contributor) commented Feb 3, 2026

Summary of Changes

This PR fixes a regression in read_hdf where reading HDF5 files created by older pandas versions (containing datetime64 metadata without an explicit unit) would raise a TypeError: 'generic' is not a valid TimeUnit.

Implementation Details

  • Added a check in pandas/io/pytables.py (in both restore_kwargs and read_array) to handle the bare string "datetime64".

  • If the dtype string is exactly "datetime64" (generic), it now defaults to "datetime64[ns]".

  • This mirrors the existing legacy handling logic for timedelta64 found immediately adjacent to the fix.
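
The defaulting rule described above can be sketched as a small standalone function. This is only an illustration under stated assumptions, not the code in pandas/io/pytables.py: the helper name `normalize_legacy_dtype` is hypothetical, and the sketch assumes the stored dtype arrives as a plain string (or is missing entirely).

```python
from typing import Optional


def normalize_legacy_dtype(dtype: Optional[str]) -> Optional[str]:
    """Map a unit-less legacy dtype string to a nanosecond dtype string.

    Hypothetical helper for illustration; the actual change lives inline
    in pandas/io/pytables.py and may be structured differently.
    """
    # Only the bare string is rewritten. An exact match (rather than
    # startswith) leaves explicit units such as "datetime64[s]" untouched.
    if dtype == "datetime64":
        return "datetime64[ns]"
    return dtype


print(normalize_legacy_dtype("datetime64"))      # bare legacy string
print(normalize_legacy_dtype("datetime64[ms]"))  # explicit unit is kept
```

Matching the exact string is the design choice that keeps the fix narrow: files written with an explicit unit take the same path as before.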

Verification

Since generating "broken" legacy files with modern pandas/numpy is difficult (as they now enforce units), I added a regression test in pandas/tests/io/pytables/test_store.py that uses h5py to manually strip unit metadata from a test file, simulating the legacy format.

Tests Added:

  1. test_read_hdf_datetime64_without_unit_gh64006: Verifies that generic datetime64 metadata (no unit) is read back as datetime64[ns].

  2. test_read_hdf_datetime_units_preserved (parametrized over s, ms, us, ns): Verifies that files with explicit units (e.g., [s], [ms]) keep those units and are not overwritten by this fix.

(pandas-dev) C:\Users\My\Documents\GitHub\pandas>pytest pandas/tests/io/pytables/test_store.py -k "gh64006 or datetime"
C:\Users\My\.conda\envs\pandas-dev\Lib\site-packages\pytest_cython\__init__.py:2: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  from pkg_resources import get_distribution
=========================================================================================================== test session starts ============================================================================================================
platform win32 -- Python 3.11.14, pytest-9.0.2, pluggy-1.6.0
PySide6 6.9.3 -- Qt runtime 6.9.3 -- Qt compiled 6.9.3
rootdir: C:\Users\My\Documents\GitHub\pandas
configfile: pyproject.toml
plugins: anyio-4.12.0, hypothesis-6.148.7, cov-7.0.0, cython-0.3.1, localserver-0.0.0, qt-4.5.0, xdist-3.8.0
collected 78 items / 72 deselected / 6 selected

pandas\tests\io\pytables\test_store.py s....s

---------------------------------------------------------------------------------- generated xml file: C:\Users\My\Documents\GitHub\pandas\test-data.xml -----------------------------------------------------------------------------------
=========================================================================================================== slowest 30 durations ===========================================================================================================
0.04s call     pandas/tests/io/pytables/test_store.py::test_read_hdf_datetime_units_preserved[s]
0.03s setup    pandas/tests/io/pytables/test_store.py::test_read_hdf_datetime64_without_unit_gh64006
0.02s call     pandas/tests/io/pytables/test_store.py::test_read_hdf_datetime_units_preserved[ns]
0.02s call     pandas/tests/io/pytables/test_store.py::test_read_hdf_datetime_units_preserved[us]
0.02s call     pandas/tests/io/pytables/test_store.py::test_read_hdf_datetime_units_preserved[ms]

(13 durations < 0.005s hidden.  Use -vv to show these durations.)
=============================================================================================== 4 passed, 2 skipped, 72 deselected in 1.12s ================================================================================================

sanrishi (Contributor, Author) commented Feb 3, 2026

Pre-commit.ci autofix

Development

Successfully merging this pull request may close these issues.

BUG: pd.read_hdf unable to retrieve a pd.Series with dtype as "datetime64" since v3.0.0
