Keeping data science projects well organized is difficult but important. Ideally, every project should be reproducible: anyone starting from the same data, the same code, and similar hardware should be able to obtain the same results.
The goal of this template is to make our data science work at Talus reproducible and approachable. Using this template ensures that we organize code and data consistently across projects. This means when you join another project that is organized using this template, you will immediately know the lay of the land.
This template uses cookiecutter to create a consistent repository directory structure that is well suited to data science projects. The template has biases and assumptions: we assume that Python will be your primary tool (although it doesn't have to be) and that some amount of modeling will be performed as part of the project (although none is required). As such, this template is intended to serve as a guideline rather than a rule, and you should feel free to modify it as needed for your specific project.
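As a rough illustration of how a project might be generated from this template, the sketch below calls cookiecutter's Python API. It assumes the `cookiecutter` package is installed and that the template lives in a GitHub repository named `TalusBio/cookiecutter-data-science`; that location is inferred from the documentation URL rather than stated here, and the helper names `template_source` and `create_project` are hypothetical. The documentation remains the authoritative guide.

```python
"""Sketch: generate a new project from this template with cookiecutter."""


def template_source(owner: str = "TalusBio",
                    repo: str = "cookiecutter-data-science") -> str:
    # "gh:" is cookiecutter's built-in abbreviation for a GitHub repository.
    # ASSUMPTION: the owner/repo pair is inferred from the documentation URL.
    return f"gh:{owner}/{repo}"


def create_project(output_dir: str = ".") -> None:
    # Imported lazily so this file can be imported without cookiecutter installed.
    from cookiecutter.main import cookiecutter

    # Prompts for the template's options, then writes the new project
    # directory under output_dir.
    cookiecutter(template_source(), output_dir=output_dir)


if __name__ == "__main__":
    create_project()
```

Running the script prompts for the template's options and creates the project in the current directory. The command-line equivalent, under the same assumption about the repository location, would be `cookiecutter gh:TalusBio/cookiecutter-data-science`.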
This template was adapted from the Cookiecutter Data Science project and Bill Noble's "A Quick Guide to Organizing Computational Biology Projects".
See our documentation for how to use this template at https://TalusBio.github.io/cookiecutter-data-science