# Coding standards

Code is a core research product of the lab.
We expect lab members to write code with the intention that it will be reviewed (and potentially reused) by other lab members.

To support this, all lab members should be familiar with the principles of clean code.
Dr. Poldrack has a [presentation on this topic](https://github.com/poldrack/clean_coding/blob/master/CleanCoding_Python.pdf) that we encourage you to review.
For a more in-depth introduction, two particularly useful books are:
- [The Art of Readable Code](https://www.oreilly.com/library/view/the-art-of/9781449318482/)
- [Clean Code](https://www.oreilly.com/library/view/clean-code-a/9780136083238/)

Readable code can follow many different design patterns.
We therefore provide some general-purpose recommendations below, along with Python examples taken from [Dr. Poldrack's clean code tutorial](https://github.com/poldrack/clean_coding/tree/master/python_example).

We also recommend consulting relevant tutorials, such as the [Good Research Code Handbook](https://goodresearch.dev/index.html).

## Code should be modular

Write code so that it can be reviewed as individual "units," each of which has one well-scoped purpose.

- Functions should do a single thing that is clearly expressed in the function's name
- Functions should include a docstring that clearly specifies their inputs and outputs

For example, the following non-modular code selects the survey columns from a data frame `data`:

```python
sc=[]
for i in range(data.shape[1]):
    if data.columns[i].split('.')[0][-7:] == '_survey':
        sc=sc+[data.columns[i]]
data=data[sc]
```

Compare this with a modularized refactoring:

```python
def extract_surveys_from_behavioral_data(behavioral_data_raw):
    """
    Extract survey data from behavioral data.
    Survey variables are labeled <survey_name>_survey.<variable name>,
    so filter on variables that include "_survey" in their name.

    Parameters
    ----------
    behavioral_data_raw : pandas.DataFrame

    Returns
    -------
    pandas.DataFrame
        Only the survey columns of behavioral_data_raw.
    """
    survey_variables = [col for col in behavioral_data_raw.columns if '_survey' in col]
    return behavioral_data_raw[survey_variables]
```
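
The same idea can be pushed one step further, so that the naming rule itself becomes a small, separately testable unit. The sketch below is a hypothetical decomposition and is not part of Dr. Poldrack's tutorial: `is_survey_variable` and `select_survey_variables` are illustrative names. It also assumes a strict reading of the documented `<survey_name>_survey.<variable name>` convention, requiring that the part before the first `.` ends in `_survey`. That is closer to the original loop's check than the substring match above. Because it works on plain column names, it can be tried without any data.

```python
def is_survey_variable(column_name):
    """Return True if column_name follows <survey_name>_survey.<variable name>.

    Hypothetical helper: assumes the survey prefix is everything before the first '.'.
    """
    prefix, separator, _ = column_name.partition('.')
    return separator == '.' and prefix.endswith('_survey')


def select_survey_variables(column_names):
    """Return the names in column_names that are survey variables, preserving order."""
    return [name for name in column_names if is_survey_variable(name)]


print(select_survey_variables(['grit_survey.item1', 'stroop.rt', 'survey_notes']))
# ['grit_survey.item1']
```

With a data frame, the selection would then read `behavioral_data_raw[select_survey_variables(behavioral_data_raw.columns)]`.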

## Code should be portable

Aim to write code that can be run on a new machine without modification.

- Any absolute paths should be specified as a variable in a single location, or preferably as a command-line argument
- Any required environment variables should be clearly described
- Any non-standard requirements (e.g., Python libraries not available through PyPI) should be described, with instructions on how to install them

Here is an example of what _not_ to do, with a hard-coded data location, cryptic variable names, and several operations packed into one line:

```python
h=read_csv('https://raw.githubusercontent.com/poldrack/clean_coding/master/data/health.csv',index_col=0)[hc].dropna().mean(1)
```

Compare this with a modular, portable refactoring:

```python
import os
import pandas as pd

def load_health_data(datadir, filename='health.csv'):
    """Load the health data table from datadir, using the first column as the index."""
    return pd.read_csv(os.path.join(datadir, filename), index_col=0)
```
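
Following the first portability recommendation, the data directory can then be supplied on the command line instead of being written into the script. The sketch below uses Python's standard `argparse` module. The `--datadir` and `--filename` flags and the `HEALTH_DATA_DIR` environment variable are hypothetical names chosen for illustration. Falling back to an environment variable is optional, but if you do, describe that variable as the second recommendation asks.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse command-line arguments so that no data path is hard-coded in the script."""
    parser = argparse.ArgumentParser(description='Summarize the health data.')
    parser.add_argument(
        '--datadir',
        # Hypothetical environment variable used as a fallback when --datadir is omitted
        default=os.environ.get('HEALTH_DATA_DIR'),
        help='directory containing the health data (default: $HEALTH_DATA_DIR)',
    )
    parser.add_argument('--filename', default='health.csv',
                        help='name of the CSV file inside datadir')
    args = parser.parse_args(argv)
    if args.datadir is None:
        # parser.error prints a usage message and exits with status 2
        parser.error('provide --datadir or set the HEALTH_DATA_DIR environment variable')
    return args
```

A script built around this could be run as `python summarize_health.py --datadir /path/to/data` (the script name is hypothetical), with `args.datadir` and `args.filename` then passed to `load_health_data`.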

## Important functions should be tested

Functions that are critical for correct outputs should be tested.
At a minimum, write unit tests that check the correctness of their outputs.
For example, here is a minimal check that two data frames have the same index:

```python
def confirm_data_frame_index_alignment(df1, df2):
    # Index.equals returns False (rather than raising) when the lengths differ
    assert df1.index.equals(df2.index)
```

More detailed recommendations on testing, including testing frameworks such as pytest, are available in [The Hitchhiker's Guide to Python](https://docs.python-guide.org/writing/tests/).
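
As a sketch of what this can look like with pytest, the test below checks `load_health_data` against a tiny CSV file written to a temporary directory. The loader is repeated so that the block runs on its own. The CSV contents are made up, and `tmp_path` is pytest's built-in temporary-directory fixture. If saved in a file whose name starts with `test_`, the test would be collected and run by the `pytest` command.

```python
import os

import pandas as pd


def load_health_data(datadir, filename='health.csv'):
    """Load the health data table from datadir, using the first column as the index."""
    return pd.read_csv(os.path.join(datadir, filename), index_col=0)


def test_load_health_data_uses_first_column_as_index(tmp_path):
    # Write a small, made-up CSV file into the temporary directory pytest provides
    (tmp_path / 'health.csv').write_text('subject,score\ns1,3\ns2,5\n')
    data = load_health_data(tmp_path)
    # The first column becomes the index; the remaining column holds the values
    assert list(data.index) == ['s1', 's2']
    assert list(data['score']) == [3, 5]
```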

## Python packaging

Projects that aim to develop pip-installable packages should follow current best practices in Python packaging.
As of May 2024, these practices are outlined in [this blog post](https://effigies.gitlab.io/posts/python-packaging-2023/) by lab member Chris Markiewicz.