
DataFrame Typing Changes in pandas 3.0.0 #2212

@erklem


Describe the bug
With pandas 3.0.0, pandera-typed DataFrames do not keep their type consistently. A dataframe that is a pandera-typed DataFrame inside a function reverts to a standard pandas DataFrame in the calling context.

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of pandera.
  • (optional) I have confirmed this bug exists on the main branch of pandera.

Code Sample, a copy-pastable example

The following code no longer works with pandas 3.0.0. This will also affect tests that use pandas.testing.assert_frame_equal(), which checks the DataFrame type by default (check_frame_type=True).

import pandas as pd
import pandera.typing as pat
from pandera.pandas import DataFrameModel, check_types

DATA_DICT = {
    'a': [1, 2, 3],
    'b': [4, 5, 6]
}


class TestSchema(DataFrameModel):
    """A simple pandera schema for testing."""

    a: int
    b: int


@check_types
def generate_test_dataframe() -> pat.DataFrame[TestSchema]:
    """Generate a test DataFrame conforming to TestSchema."""
    df = pd.DataFrame(DATA_DICT)

    typed_df = df.pipe(pat.DataFrame[TestSchema])

    print(f'Return type: {type(typed_df)}')

    return typed_df


expected_df = pat.DataFrame[TestSchema](DATA_DICT)
return_df = generate_test_dataframe()

assert isinstance(return_df, type(expected_df)), f'Expected {type(expected_df)}, got {type(return_df)}'
print('Successful completion')

pandas 2.3.2 and pandera 0.28.1

  • No mypy warnings/errors
  • generate_test_dataframe() prints Return type: <class 'pandera.typing.pandas.DataFrame'>
  • Successful completion is printed

pandas 2.3.3 and pandera 0.28.1

  • No mypy warnings/errors
  • generate_test_dataframe() prints Return type: <class 'pandera.typing.pandas.DataFrame'>
  • Successful completion is printed

pandas 3.0.0 and pandera 0.28.1

  • No mypy warnings/errors
  • generate_test_dataframe() prints Return type: <class 'pandera.typing.pandas.DataFrame'>

Running the script then fails with:

Traceback (most recent call last):
  File "c:\Users\m277249\Documents\56649_Speech_AI\CASLIM\pandera_typing_check.py", line 33, in <module>
    assert isinstance(return_df, type(expected_df)), f'Expected {type(expected_df)}, got {type(return_df)}'
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: Expected <class 'pandera.typing.pandas.DataFrame'>, got <class 'pandas.DataFrame'>
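
For anyone hitting this in tests, here is a minimal sketch of how the type mismatch interacts with pandas.testing.assert_frame_equal(). It does not use pandera at all: TypedFrame is a hypothetical stand-in subclass and frames_match is a hypothetical helper, both introduced only for illustration. Passing check_frame_type=False is a possible stopgap for test suites, not a fix for the underlying behavior.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


class TypedFrame(pd.DataFrame):
    """Hypothetical stand-in for a typed DataFrame subclass (not pandera's class)."""


def frames_match(left, right, check_frame_type=True):
    """Return True if assert_frame_equal passes, False if it raises AssertionError."""
    try:
        assert_frame_equal(left, right, check_frame_type=check_frame_type)
    except AssertionError:
        return False
    return True


data = {"a": [1, 2, 3], "b": [4, 5, 6]}
returned = pd.DataFrame(data)  # plain frame, like the one seen in the calling context
expected = TypedFrame(data)    # subclass instance, like the expected typed frame

# Default settings also compare the frame classes; the relaxed call compares data only.
strict = frames_match(returned, expected)
relaxed = frames_match(returned, expected, check_frame_type=False)
print(f"default check: {strict}, check_frame_type=False: {relaxed}")
```

With the plain frame on the left and the subclass on the right, the default comparison should fail on the class check alone, while the relaxed one only compares the contents.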

Expected behavior

I would expect the DataFrame type to be consistent within the function, when leaving the function (especially when checked via check_types), and in the calling context.

Desktop (please complete the following information):

  • OS: Windows 11
  • Version: pandera 0.28.1 and pandas 3.0.0

Metadata

    Labels

    enhancement (New feature or request)
