diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index b1b39dd863..544ff6e5a6 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -158,7 +158,14 @@ If you add code that should be tested, please add tests.
 - To run tests using `cudf.pandas`, run `NARWHALS_DEFAULT_CONSTRUCTORS=pandas python -m cudf.pandas -m pytest`
 - To run tests using `polars[gpu]`, run `NARWHALS_POLARS_GPU=1 pytest --constructors=polars[lazy]`
 
-### Backend-specific advice
+### General considerations
+
+In general we assume that dataframes are used to store and process columnar data. Therefore:
+
+- Iterating over rows in Python is never allowed. Assume that there's an infinite number of rows.
+- Iterating over columns is acceptable (though native APIs that do the iteration in a low-level language are preferred if possible!).
+
+### Backend-specific considerations
 
 - pandas:
 
@@ -175,6 +182,9 @@ If you add code that should be tested, please add tests.
     deprecated/removed, but please keep it for older pandas versions
     https://github.com/pandas-dev/pandas/pull/51466/files.
   - Instead of `rename`, prefer `alias` at the compliant level.
+  - pandas supports any hashable object as a column name, whereas other libraries tend to only support
+    strings. We tend to just type `: str` in places which accept column names, with the understanding
+    that for pandas, other data types will silently work.
 
 - Polars:
 
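
The "General considerations" added in this diff can be illustrated with a minimal, dependency-free sketch. It is not code from the project: the `Table` alias and the `column_sums` helper are hypothetical names, and a plain dict of lists stands in for a dataframe. The loop runs once per column, and the per-column work is handed to the built-in `sum`, so no Python-level loop visits individual rows.

```python
from typing import Dict, Mapping, Sequence

# Hypothetical toy columnar table: column name -> column values.
# Column names are typed as `str`, mirroring the `: str` convention in the diff.
Table = Mapping[str, Sequence[float]]


def column_sums(table: Table) -> Dict[str, float]:
    """Return the sum of each column without a Python-level loop over rows."""
    # One iteration per column; `sum` does the per-value work natively.
    return {name: sum(values) for name, values in table.items()}


table = {"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]}
totals = column_sums(table)
print(totals)
```

The number of columns is usually small and bounded, while the number of rows is assumed to be unbounded, which is why the loop is placed over columns.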