Commit 99db30b

doc: expand coding standards
1 parent e49e5af commit 99db30b

2 files changed: +78 -20 lines changed

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -1,3 +1,4 @@
1+
.DS_Store
12
.token
23
/.quarto/
34
config.toml
Lines changed: 77 additions & 20 deletions
@@ -1,30 +1,87 @@
11
# Coding standards
22

3-
- Code should be readable
3+
Code is a core research product of the lab.
4+
We expect lab members to write code with the intention that it will be reviewed (and potentially reused) by other lab members.
45

5-
- All lab members should be
6-
familiar with principles of readable coding:
6+
To help in this, all lab members should be familiar with principles of clean code.
7+
Dr. Poldrack has a [presentation on this topic](https://github.com/poldrack/clean_coding/blob/master/CleanCoding_Python.pdf) that we encourage you to review.
8+
For a more in-depth introduction, we particularly recommend two books:
79
- [Art of Readable Code](https://www.oreilly.com/library/view/the-art-of/9781449318482/)
810
- [Clean Code](https://www.oreilly.com/library/view/clean-code-a/9780136083238/)
911

10-
- Code should be modular
11-
- Functions should do a single
12-
thing that is clearly expressed in the name of the function
13-
- Functions should include a
14-
docstring that clearly specifies input and output
12+
When writing readable code, many different design patterns can be followed.
13+
We therefore provide some general-purpose recommendations below, as well as some Python examples taken from [Dr. Poldrack's clean code tutorial](https://github.com/poldrack/clean_coding/tree/master/python_example).
1514

16-
- Code should be portable
17-
- Any absolute paths should be
18-
specified as a variable in a single location, or preferably as a
19-
command line argument
20-
- Any required environment
21-
variables should be clearly described
22-
- Any non-standard requirements
23-
(e.g. Python libraries not available through PYPI) should be
24-
described with instructions on how to install
15+
We also recommend checking out relevant tutorials, like the [Good Research Code Handbook](https://goodresearch.dev/index.html).
2516

26-
- Important functions should be
27-
tested
17+
## Code should be modular
2818

19+
Write code so that it can be reviewed as individual "units," each of which has one well-scoped function.
2920

30-
21+
- Functions should do a single thing that is clearly expressed in the name of the function
22+
- Functions should include a docstring that clearly specifies input and output
23+
24+
```python
sc=[]
for i in range(data.shape[1]):
    if data.columns[i].split('.')[0][-7:] == '_survey':
        sc=sc+[data.columns[i]]
data=data[sc]
```
31+
32+
Compare this with a modularized refactoring:
33+
34+
```python
def extract_surveys_from_behavioral_data(behavioral_data_raw):
    """
    Extract survey data from behavioral data.
    Survey variables are labeled <survey_name>_survey.<variable name>,
    so filter on variables that include "_survey" in their name.

    Parameters
    ----------
    behavioral_data_raw : pandas.DataFrame

    Returns
    -------
    pandas.DataFrame
        The columns of the input whose names contain "_survey".
    """
    survey_variables = [i for i in behavioral_data_raw.columns if i.find('_survey') > -1]
    return behavioral_data_raw[survey_variables]
```
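
To illustrate the "single thing" guideline a little further, the column-name check could itself be split out into its own small function. The sketch below is not from the tutorial: `is_survey_variable` and `select_survey_variables` are hypothetical helpers, and they assume the stricter rule used in the unrefactored loop (the text before the first `.` must end in `_survey`) rather than a substring match.

```python
def is_survey_variable(column_name):
    """
    Check whether a column name follows the survey naming convention.

    Parameters
    ----------
    column_name : str
        A column label such as 'mysurvey_survey.item1'.

    Returns
    -------
    bool
        True if the text before the first '.' ends with '_survey'.
    """
    return column_name.split('.')[0].endswith('_survey')


def select_survey_variables(column_names):
    """Return the names that pass is_survey_variable, in their original order."""
    return [name for name in column_names if is_survey_variable(name)]
```

With this rule, a name such as `age_survey_date` is not treated as a survey variable, whereas a substring match on `_survey` would keep it. Which behavior is wanted depends on the dataset, so the choice is worth stating in the docstring.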
48+
49+
## Code should be portable
50+
51+
Aim to be able to execute your code on a new machine.
52+
53+
- Any absolute paths should be specified as a variable in a single location, or preferably as a command line argument
54+
- Any required environment variables should be clearly described
55+
- Any non-standard requirements (e.g. Python libraries not available through PyPI) should be described with instructions on how to install
56+
57+
Here is an example of what _not_ to do:
58+
59+
```python
h=read_csv('https://raw.githubusercontent.com/poldrack/clean_coding/master/data/health.csv',index_col=0)[hc].dropna().mean(1)
```
62+
63+
Compare this with a modular, portable refactoring:
64+
65+
```python
import os
import pandas as pd

# load health data
def load_health_data(datadir, filename='health.csv'):
    return pd.read_csv(os.path.join(datadir, filename), index_col=0)
```
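
The bullets above recommend supplying paths as command line arguments. One possible way to do this with the standard library's `argparse` is sketched below. The `--datadir` and `--filename` flags, the `HEALTH_DATADIR` environment variable, and the helper names are illustrative assumptions, not part of the lab's code.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse command line arguments; argv=None means read sys.argv."""
    parser = argparse.ArgumentParser(description='Locate the health data file.')
    parser.add_argument(
        '--datadir',
        # hypothetical environment variable used as a fallback default
        default=os.environ.get('HEALTH_DATADIR', 'data'),
        help='directory containing the data files',
    )
    parser.add_argument(
        '--filename',
        default='health.csv',
        help='name of the CSV file inside datadir',
    )
    return parser.parse_args(argv)


def build_data_path(args):
    """Join the data directory and filename into a single path."""
    return os.path.join(args.datadir, args.filename)
```

A script could then call `build_data_path(parse_args())` once at startup and pass the result to its loading function, so that no absolute path appears anywhere else in the code.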
70+
71+
## Important functions should be tested
72+
73+
Functions that are critical for correct outputs should be tested.
74+
At a minimum, unit tests should be written to check the correctness of outputs.
75+
For example, here is a minimal unit test to ensure that two data frames have the same index:
76+
77+
```python
def confirm_data_frame_index_alignment(df1, df2):
    assert all(df1.index == df2.index)
```
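
As a sketch of how such a check might be exercised, the functions below use the `test_*` naming that pytest discovers automatically. The example data frames and test names are made up for illustration, and a plain `try`/`except` is used so that the block depends only on pandas.

```python
import pandas as pd


def confirm_data_frame_index_alignment(df1, df2):
    assert all(df1.index == df2.index)


def test_aligned_frames_pass():
    # identical index labels in the same order should not raise
    df1 = pd.DataFrame({'a': [1, 2]}, index=['s1', 's2'])
    df2 = pd.DataFrame({'b': [3, 4]}, index=['s1', 's2'])
    confirm_data_frame_index_alignment(df1, df2)


def test_misaligned_frames_fail():
    # same labels in a different order should trip the assertion
    df1 = pd.DataFrame({'a': [1, 2]}, index=['s1', 's2'])
    df2 = pd.DataFrame({'b': [3, 4]}, index=['s2', 's1'])
    try:
        confirm_data_frame_index_alignment(df1, df2)
    except AssertionError:
        return  # the misalignment was detected, as expected
    raise RuntimeError('misaligned indices were not detected')
```

Note that the element-wise comparison assumes both indices have the same length; for frames of different sizes, `df1.index.equals(df2.index)` is one alternative that returns `False` instead of raising.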
81+
82+
More detailed recommendations on testing, including testing frameworks such as pytest, are available in the [Hitchhiker's Guide to Python](https://docs.python-guide.org/writing/tests/).
83+
84+
## Python packaging
85+
86+
Projects that aim to develop pip-installable packages should follow current best practices in Python packaging.
87+
As of May 2024, this is outlined in [this blog post](https://effigies.gitlab.io/posts/python-packaging-2023/) by lab member Chris Markiewicz.
