Code and software practices
Maintaining good code is essential to the efficient functioning of the lab. Best software practices mandate that the code is well-documented, commented, and modular. As code may be shared within our lab (and outside), these guidlines ensure that the code is understandable and usable quickly. There are plenty of great resources online regarding how to follow these practices.
Here's how we may format our code as a lab:
Repository Types
-
Project (“Lab Notebook”): A repository for managing the day-to-day progress of a research project — analyses in progress, exploratory modeling, notes, and experiment logs. The cookie cutter repository (as described in the good research code handbook) is a good place to start, which will get you started with a file structure and environment. Each project should have its own repository.
-
Package: In the course of a project, you’ll likely develop something reusable — e.g., a model, simulation tool, or analysis pipeline. A package is a cleaned-up, documented, and versioned version of that tool, designed so others (future lab members and the broader community) can easily use and extend it. This repository should be jointly owned by you and the lab organization (e.g., under the lab’s GitHub org). As your project moves from an exploration to an exploitation phase, work with Viggy to create and maintain a package repository.
-
Publication: A repository for reproducing all figures and analyses in a specific paper. This repository is owned by the lab organization (since publications represent lab outputs). As you prepare to submit a manuscript, work with Viggy to set up this repository. The paper repository should depend on the relevant package (don’t reimplement analysis code) and should pin specific package versions (e.g., via requirements.txt, pyproject.toml, or git submodules) to ensure reproducibility. Include figure-generation scripts, processed data, and notebooks.
Conventions
Patrick Mineault's code handbook above is a must-read before you start coding. It reviews the best ways to set up your project, to maintain clarity and cleanliness, to test your code, and to document your code as well. The handbook is the best way to learn these conventions, especially when it comes to efficiency and testing, which rely on great examples. However a few helpful tips to get you started in the right direction
- Use
gitoften, often more than you think you should. SeeResources and How-Tos/basic_github.mdfor additional information. - Setup your repository correctly, from the beginning.
- Mineault's base directory structure works really well to keep the inputs and outputs of your code separate:
datafolder: raw input datadocsfolder: documentation, kept separate for later publishingresultsfolder: code outputs, figures, tables, CSVsscriptsfolder: scripts for analysis like.ipynbnotebookssrcsfolder: reusable code that works at the base of all the scriptstestsfolder: unit tests
- This can be downloaded with the
true-neutral-cookiecutter.
- Mineault's base directory structure works really well to keep the inputs and outputs of your code separate:
- Maintain your environments. Packages you import may have dependences on other packages with specific versions, or perhaps even specific versions of python. Installing all packages into one environment will make it extremely difficult to move the code to another computer or user (i.e. it's less portable).
- We can use
condaas a package manager and virtual environment manager for this. condacan be installed via miniconda, which is a lighter-weight version of the original Anaconda Distribution of packages/python. This will allow you to usecondato make virtual environments and manage packages.- Use a
.ymlfile. These are special files that specify instructions on which packages comprise an environment. A different researcher using your code can automatically recreate your environment exactly as you have it by using this file, ensuring proper portability.
- We can use
- Maintain good style.
- This includes commenting, which should be done at the top of functions, classes, modules, files, etc. This helps both you and others understand code.
- Pythonic code can be confusing sometimes; you'll need to make the tradeoff between clarity and brevity.
- Download a linter (e.g. Ruff) to help you format your code.
VS Code
VS Code is a great IDE because it's open source (unlike e.g. PyCharm), highly customizable via extensions, and has great remote development support (which is important when working with the cluster). You can download it here. If you are unfamiliar with VS Code/IDEs in general, see the VS Code tutorial for more info on how to get started.