cc @sjshim, please let me know if you have any feedback on this! 👩‍💻
This is definitely much better! Hopefully the packaging materials will be closer to final after next week's meeting?
It will definitely be good to get more ideas down following that discussion 😸 @poldrack, let us know if this looks OK (enough, for now) to merge on your end!
def confirm_data_frame_index_alignment(df1, df2):
    assert all(df1.index == df2.index)
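A minimal sketch of the same check written as a test helper, assuming pandas is available. The name `assert_matching_indices` comes from the review suggestion in this thread; delegating to `pandas.testing.assert_index_equal` is an assumption of this sketch, not something the PR does. It is used here because an elementwise `==` between indices of different lengths tends to fail with a length error rather than a readable assertion message.

```python
import pandas as pd


def assert_matching_indices(df1: pd.DataFrame, df2: pd.DataFrame) -> None:
    """Raise AssertionError if the two data frames do not share the same index."""
    # assert_index_equal raises AssertionError and describes the difference.
    # With its default settings it also compares index names and dtypes,
    # which is stricter than comparing values alone.
    pd.testing.assert_index_equal(df1.index, df2.index)
```

If only the index values matter, passing `check_names=False` relaxes the name comparison.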
This is more of a unit test helper. Something like assert_matching_indices(dataframe1, dataframe2). A unit test would then be something like:
def test_dataframe_transformation():
    df = make_default_df()
    transformed_df = my_transformation(df)
    # Whatever else we do, do not break the index
    assert_matching_indices(df, transformed_df)

## Python packaging

Projects that aim to develop pip-installable packages should follow current best practices in Python packaging.
As of May 2024, these are outlined in [this blog post](https://effigies.gitlab.io/posts/python-packaging-2023/) by lab member Chris Markiewicz.
I might suggest https://www.pyopensci.org/python-package-guide/package-structure-code/python-package-build-tools.html as a more thorough guide.
h=read_csv('https://raw.githubusercontent.com/poldrack/clean_coding/master/data/health.csv',index_col=0)[hc].dropna().mean(1)
```

Compare this with a modular, portable refactoring:

```python
# load health data
def load_health_data(datadir, filename='health.csv'):
    return pd.read_csv(os.path.join(datadir, filename), index_col=0)
These don't quite do the same things. If you're going to say, don't do A, do B, it would be good if A and B produced the same result.
Do you want to add something like:
data = load_health_data(datadir)
demeaned = data[columns].dropna().mean(1)

Do you want to go into getting datadir from os.environ or sys.argv? Given the bullet points above, that might help make clear what the alternatives look like.
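One way the `os.environ` / `sys.argv` alternatives raised here might look, as a hedged sketch rather than the PR's actual code. The helper `get_datadir`, the environment variable name `HEALTH_DATADIR`, the `"data"` default, and the function `mean_health_score` are all hypothetical names introduced for illustration; `load_health_data` mirrors the refactoring in the diff, and the row-mean step mirrors the one-liner it replaces.

```python
import os
import sys

import pandas as pd


def get_datadir(argv=None, env=None, default="data"):
    """Pick the data directory: first CLI argument, then environment, then default."""
    argv = sys.argv if argv is None else argv
    env = os.environ if env is None else env
    if len(argv) > 1:
        return argv[1]
    # HEALTH_DATADIR is a hypothetical variable name chosen for this sketch.
    return env.get("HEALTH_DATADIR", default)


def load_health_data(datadir, filename="health.csv"):
    """Load the health data, using the first column as the index."""
    return pd.read_csv(os.path.join(datadir, filename), index_col=0)


def mean_health_score(data, columns):
    """Drop rows with missing values in `columns`, then average across them."""
    return data[columns].dropna().mean(axis=1)
```

A script would then call `load_health_data(get_datadir())` and pass the result, with its own list of columns, to `mean_health_score`. Taking `argv` and `env` as parameters keeps the lookup testable without touching the real process state.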
Addresses #31, #62