BUG: Pandas converts nullable int to float, even when this loses data #63925

Open
kjmin622 wants to merge 15 commits into pandas-dev:main from kjmin622:MapReturnValue
@kjmin622
Contributor

@kjmin622 kjmin622 commented Jan 29, 2026

Summary

Fix precision loss when using Series.apply() or Series.map() on nullable integer dtypes (Int64, UInt64, etc.) with None values.

Problem

When applying a function to a Series with nullable integer dtype containing NA values, the data was being converted to float64, causing precision loss for large integers that exceed float64's integer precision limit (2^53 ≈ 9×10^15).

import pandas as pd

def add_two(x):
    # Propagate missing values; otherwise do plain integer arithmetic.
    if pd.isna(x):
        return pd.NA
    return x + 2

sequence = [10000000000000001, None]  # above float64's exact-integer limit
ser = pd.Series(sequence, dtype='Int64')
result = ser.apply(add_two)

Before: 10000000000000002 (wrong - precision lost)
After: 10000000000000003 (correct)
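
The "Before" value can be reproduced without pandas. The following is a minimal sketch using only built-in Python types; the variable names are illustrative and not taken from the PR. It shows that the rounding happens as soon as the integer passes through a float64, before the user function runs.

```python
# float64 has a 53-bit significand, so 2**53 is the last point at which
# every integer is exactly representable.
LIMIT = 2**53  # 9007199254740992

big = 10_000_000_000_000_001  # the value used in the example above

# Converting to float rounds to the nearest representable value.
as_float = float(big)
round_tripped = int(as_float)

# Doing the arithmetic in float64 versus in exact integers.
float_result = int(as_float + 2)
int_result = big + 2

print(round_tripped)  # 10000000000000000
print(float_result)   # 10000000000000002
print(int_result)     # 10000000000000003
```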

Solution

Modified BaseMaskedArray.map() to:

  1. Use to_numpy(dtype=object, na_value=pd.NA) instead of to_numpy() to preserve integer values
  2. Apply _cast_pointwise_result() to restore the appropriate nullable dtype
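
The two steps above can be sketched with public pandas API only. This is an illustration under stated assumptions, not the code in the PR: `pd.array` with dtype inference stands in for the internal `_cast_pointwise_result()` step, and the list comprehension stands in for the internal map routine.

```python
import pandas as pd


def add_two(x):
    return pd.NA if pd.isna(x) else x + 2


ser = pd.Series([10000000000000001, None], dtype="Int64")

# Float conversion: the missing value becomes NaN and the large integer
# is rounded before the function ever sees it.
as_float = ser.to_numpy(dtype="float64", na_value=float("nan"))

# Object conversion: Python ints and pd.NA are kept as-is.
as_object = ser.to_numpy(dtype=object, na_value=pd.NA)

# Stand-in for the internal elementwise map.
mapped = [add_two(x) for x in as_object]

# Stand-in (assumption) for the internal cast step: let pandas infer a
# nullable dtype from the mapped values.
result = pd.array(mapped)

print(int(as_float[0]))  # 10000000000000000
print(result[0])         # 10000000000000003
print(result.dtype)      # Int64
```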

@aaron-seq

This comment was marked as spam.

@kjmin622
Contributor Author

@aaron-seq Thank you for the review.

As you suggested, adding a preserve_dtype parameter would eliminate the breaking change and remove potential issues. I will implement it.

However, all tests related to this change are currently passing. The one failing test is caused by #63936 and is unrelated to these code changes.

@mroeschke
Member

@aaron-seq do not post AI generated pull request reviews again. Please review our AI policy. Similar contributions in the future may lead to a ban.

@aaron-seq

@aaron-seq do not post AI generated pull request reviews again. Please review our AI policy. Similar contributions in the future may lead to a ban.

Thanks for this. I will keep it in mind when contributing in the future.

@kjmin622 kjmin622 force-pushed the MapReturnValue branch 2 times, most recently from f29bebb to 5ebba13, January 31, 2026 15:01
@kjmin622
Contributor Author

As you suggested, adding a preserve_dtype parameter would eliminate the breaking change and remove potential issues. I will implement it.

Instead of adding preserve_dtype, the map function was modified to return the same dtype as before.

Member

@rhshadrach rhshadrach left a comment

Thanks for the PR!

Comment on lines 1697 to 1700
try:
    return type(self)._from_sequence(result, dtype=self.dtype)
except (ValueError, TypeError):
    return result

Contributor Author

This code will preserve the type if it can be preserved.

Member

Agreed we should return a masked array here, but if the user returns floats we should not convert them back to integers even if they have no fractional value. E.g.

ser = Series([1, 2, 3], dtype="Int64")
result = ser.apply(lambda x: 3.0)

should result in Float64. Just use self._from_sequence I think.
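
A short sketch of why forcing the input dtype is a concern, using the public `pd.array` constructor as a stand-in for the internal `_from_sequence` call (an assumption made for illustration): whole-number floats cast to `Int64` without raising, so the `try`/`except` in the diff would not fall through for them.

```python
import pandas as pd

# What `lambda x: 3.0` returns for each of the three elements.
mapped = [3.0, 3.0, 3.0]

# Forcing the original dtype silently turns whole-number floats into integers.
forced = pd.array(mapped, dtype="Int64")

# Letting pandas infer the dtype keeps the result as a nullable float.
inferred = pd.array(mapped)

print(forced.dtype)    # Int64
print(inferred.dtype)  # Float64
```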

Member

@jorisvandenbossche jorisvandenbossche Feb 5, 2026

This can use self._cast_pointwise_result instead of _from_sequence, I think. That function (_cast_pointwise_result) was added for exactly this kind of situation.

xref #62164

@kjmin622
Contributor Author

kjmin622 commented Feb 3, 2026

Thanks for the PR!

@rhshadrach Thank you for your review. I have addressed your comments.

Member

@rhshadrach rhshadrach left a comment

I'm positive on preserving masked EAs in map/apply, things like this would be useful especially when making NumPy-nullable the default. But this can be a large break for current users. I'm thinking this could need a deprecation instead. Perhaps if we are going to make a feature flag for NumPy-nullable as a default this would go behind it?

cc @jbrockmendel @jorisvandenbossche @mroeschke

- Fixed a bug in :func:`col` where unary operators (``-``, ``+``, ``abs``) were not supported (:issue:`63939`)
- Fixed a bug in the :func:`comparison_op` raising a ``TypeError`` for zerodim
subclasses of ``np.ndarray`` (:issue:`63205`)
- Fixed bug in :meth:`Series.apply` and :meth:`Series.map` where nullable integer dtypes were converted to float, causing precision loss for large integers (:issue:`63903`)

Member

I believe this is not a regression, so the note should go in 3.1.0. We also need to note that map/apply now preserve NumPy-nullable EAs.

@jbrockmendel
Member

I'm positive on preserving masked EAs in map/apply, things like this would be useful especially when making NumPy-nullable the default. But this can be a large break for current users. I'm thinking this could need a deprecation instead. Perhaps if we are going to make a feature flag for NumPy-nullable as a default this would go behind it?

I lean towards "treat this as a bugfix" since "preserve dtype backend" is a mostly-consistent policy. But not a super-strong opinion.


Other enhancements
^^^^^^^^^^^^^^^^^^
- :meth:`Series.apply` and :meth:`Series.map` now preserve nullable (masked) extension array dtypes where appropriate; e.g. when the result is float, the output dtype is ``Float64`` rather than being cast back to the input dtype (:issue:`63903`).

Member

I think this can be removed and...


ExtensionArray
^^^^^^^^^^^^^^
- Fixed bug in :meth:`Series.apply` and :meth:`Series.map` where nullable integer dtypes were converted to float, causing precision loss for large integers (:issue:`63903`).

Member

...add a little detail here.

Suggested change
- Fixed bug in :meth:`Series.apply` and :meth:`Series.map` where nullable integer dtypes were converted to float, causing precision loss for large integers (:issue:`63903`).
- Fixed bug in :meth:`Series.apply` and :meth:`Series.map` where nullable integer dtypes were converted to float, causing precision loss for large integers; now the nullable dtype will be preserved (:issue:`63903`).

mapper,
na_action=na_action,
)
if isinstance(result, np.ndarray):

Member

I think this should always be a NumPy array; can you check? Change this to assert isinstance(result, np.ndarray) and see that tests still pass.

Contributor Author

@rhshadrach I confirmed that the test passes even when I add assert isinstance(result, np.ndarray). Thank you.

@rhshadrach rhshadrach added the Bug and NA - MaskedArrays (related to pd.NA and nullable extension arrays) labels Feb 11, 2026
@rhshadrach rhshadrach added the Apply (Apply, Aggregate, Transform, Map) label Feb 11, 2026
Member

@rhshadrach rhshadrach left a comment

lgtm

@rhshadrach rhshadrach added this to the 3.1 milestone Feb 11, 2026
@rhshadrach
Member

@jbrockmendel @mroeschke - plan to merge in a few days if you want to take a look.

Development

Successfully merging this pull request may close these issues.

BUG: Pandas converts nullable int to float, even when this loses data

6 participants