Cookiecutter for Science Projects

A cookiecutter template for science and data science projects that include data, code, and dissemination.

  • Optimized for data-based publications
  • Optimized for use with VS Code
  • Docker-based, version-controlled environment using VS Code Dev Containers
  • Conda-based environment inside the Dev Container: add packages to environment.yml and rebuild, and the whole team gets the same environment
  • Dev Container Features with Python, oh-my-zsh, and LaTeX pre-installed
  • Optimized for Python, but could also be used with Julia and R
  • Make commands for collecting data, generating figures, typesetting LaTeX, cleaning temporary files, and cleaning demo files
  • VS Code tasks to trigger data collection, plotting, and paper compilation
  • LaTeX-based paper
  • Path definitions in the project_package Python module
  • Kedro-inspired data folder structure
  • Filled with a demo, which can be removed with "make delete_demo"
  • Used in at least 5 papers

For more detailed information, please see the README of the resulting project.

Quick Start

cookiecutter https://github.com/tgoelles/cookiecutter_science

File Structure

├── .devcontainer                      # Definition of the Docker container and environment for VS Code
│   ├── Dockerfile                     # Defines the Docker container
│   ├── devcontainer.json              # Defines the devcontainer settings for VS Code
│   └── noop.txt                       # Placeholder file to ensure the COPY instruction does not fail if no environment.yml exists
├── .gitattributes                     # Git attributes for handling line endings and merge strategies
├── .gitignore                         # Git ignore file to exclude files and directories from version control
├── Makefile                           # Makefile with commands like `make data` and `make clean`
├── README.md                          # Project readme
├── code                               # Source code and notebooks
│   ├── notebooks                      # Jupyter notebooks
│   │   └── exploratory                # Data explorations
│   │       └── 1.0-tg-example.ipynb   # Jupyter notebook following the naming convention; tg are the author's initials
│   ├── project_package                # Project-specific Python package
│   │   ├── __init__.py                # Makes project_package a Python module
│   │   ├── data                       # Scripts to download, generate and parse data
│   │   │   ├── __init__.py
│   │   │   ├── config.py              # Project-wide path definitions
│   │   │   ├── example.py             # Example script
│   │   │   ├── import_data.py         # Functions to read raw data
│   │   │   └── make_dataset.py        # Scripts to download or generate data (used in the Makefile)
│   │   ├── tools                      # Scripts and functions for general use
│   │   │   ├── __init__.py
│   │   │   └── convert_latex.py       # Functions to convert elements for use in LaTeX
│   │   └── visualization              # Scripts and functions to create visualizations
│   │       ├── __init__.py
│   │       ├── make_plots.py          # Scripts to make all plots for the publication
│   │       └── visualize.py           # Functions to produce final plots
│   └── pyproject.toml                 # Configuration file for the project
├── data                               # Data directories
│   ├── 01_raw                         # The original, immutable data dump
│   │   └── demo.csv                   # Example raw data file
│   ├── 02_intermediate                # Intermediate processed data
│   ├── 03_primary                     # Cleaned data, used for the publication
│   ├── 04_feature                     # For machine learning, features based on the primary data
│   ├── 05_model_input                 # The final data used for machine learning
│   ├── 06_models                      # Stored, serialized pre-trained machine learning models
│   ├── 07_model_output                # Output from trained machine learning models
│   └── 08_reporting                   # Reporting data like log files
├── dissemination                      # Materials for dissemination
│   ├── figures                        # Figures for paper generated with Python
│   │   └── demo.png                   # Example figure file
│   ├── presentations                  # All related PowerPoint files, especially for deliverables
│   └── papers                         # LaTeX-based papers
│       └── paper.tex                  # Example LaTeX paper
├── environment.yml                    # Conda environment configuration file
└── literature                         # References and explanatory materials
    └── references.bib                 # Bibliography file for LaTeX documents
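
The tree lists a config.py for project-wide path definitions but does not show its contents. As a rough illustration only, path definitions for this layout could be sketched with pathlib as below. The names `DATA_LAYERS` and `project_paths` are hypothetical and are not taken from the template; the generated project's own config.py is the authoritative version.

```python
"""Hypothetical sketch of project-wide path definitions (not the template's config.py)."""
from pathlib import Path

# Folder names copied from the data directory in the file structure.
DATA_LAYERS = (
    "01_raw",
    "02_intermediate",
    "03_primary",
    "04_feature",
    "05_model_input",
    "06_models",
    "07_model_output",
    "08_reporting",
)


def project_paths(root: Path) -> dict[str, Path]:
    """Map short names (e.g. "raw", "figures") to folders below the project root.

    Assumption: the caller passes the project root explicitly; the real
    template may derive it differently.
    """
    data = root / "data"
    # "01_raw" becomes the key "raw", "05_model_input" becomes "model_input".
    paths = {layer.split("_", 1)[1]: data / layer for layer in DATA_LAYERS}
    paths["figures"] = root / "dissemination" / "figures"
    paths["papers"] = root / "dissemination" / "papers"
    return paths


if __name__ == "__main__":
    for name, path in project_paths(Path("my_project")).items():
        print(f"{name:>14}: {path}")
```

Keeping all paths in one place means notebooks and scripts can refer to, say, the raw data folder by name instead of hard-coding relative paths.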

Tasks

VS Code tasks trigger data collection, plotting, and paper compilation.

[Screenshot: VS Code tasks]

Requirements

  • Git (usually part of your OS; install it otherwise)
  • GitHub account
  • GitHub CLI
  • Docker Desktop
  • VS Code
  • VS Code extension: Remote Development
  • Cookiecutter Python package, installed with pip:
pip install cookiecutter

For Mac users:

brew install cookiecutter
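
Before generating a project, it can help to confirm that the command-line tools are actually on PATH. The following is a minimal sketch, not part of the template. The executable names (`gh` for the GitHub CLI, `code` for the VS Code launcher) are assumptions about a typical installation, and `missing_tools` is a hypothetical helper name.

```python
import shutil
from typing import Callable, Iterable, Optional

# Assumed executable names for the tools in the requirements list; `code` is
# the VS Code command-line launcher and is not on PATH on every installation.
REQUIRED_TOOLS = ("git", "gh", "docker", "code", "cookiecutter")


def missing_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required command-line tools were found.")
```

This only checks for executables. It cannot tell whether Docker Desktop is running, whether the Remote Development extension is installed, or whether you are logged in to GitHub.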

Getting Started

  1. Navigate to the folder where you want to create the project (on your local drive) and run:

    cookiecutter https://github.com/tgoelles/cookiecutter_science

  2. Answer the questions prompted by cookiecutter.

  3. A new VS Code window will open automatically.

  4. Click "OK" to reopen the folder in a container (only asked the first time).

  5. Read the README.md in the generated project folder.

Git and GitHub

Cookiecutter can generate a GitHub repository for you. This initializes the git repo and pushes it to GitHub. You can then invite your team members to join the project.

  • Each team member works on their local version of the project, regularly committing and pushing changes.
  • Avoid working on the same folder over a network.

Note for Windows Users

If you want to use git inside the container (recommended), clone the repo from WSL, because Windows can damage the .git folder. Git inside the container uses the same .gitconfig as Windows, which is copied into the container.

Ensure user.email and user.name are set (in PowerShell):

git config --global user.name "your_name"
git config --global user.email "your_email@example.com"
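
To verify that both values are set, a small check along these lines could be used. This is a sketch under the assumption that git is on PATH (otherwise `subprocess.run` raises `FileNotFoundError`); the helper names `git_global_value` and `missing_identity` are hypothetical.

```python
import subprocess
from typing import Callable


def git_global_value(key: str) -> str:
    """Read one value from the global git config; empty string if unset."""
    result = subprocess.run(
        ["git", "config", "--global", "--get", key],
        capture_output=True,
        text=True,
        check=False,  # git exits with a non-zero status when the key is unset
    )
    return result.stdout.strip()


def missing_identity(get: Callable[[str], str] = git_global_value) -> list[str]:
    """Return the identity keys (user.name, user.email) that are not set."""
    return [key for key in ("user.name", "user.email") if not get(key)]


if __name__ == "__main__":
    unset = missing_identity()
    print("Unset:", ", ".join(unset) if unset else "none")
```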