release 2.1 (#41)
* feat: add option to specify a temporary folder for the experiment. (#5)

* added option to rsync input and output data

* added docstring

* logging to stdout now

* fixed script for clusters - now using slurm tmpdir to write temp results

* fixing travis

* added missing docstring

* fixed tensorflow part (method signature change)

* renamed variables

* Seed for reproducibility (#6)

* added option to rsync input and output data

* added docstring

* logging to stdout now

* fixed script for clusters - now using slurm tmpdir to write temp results

* fixing travis

* added missing docstring

* fixed tensorflow part (method signature change)

* added seed for pytorch

* fixed typo

* added comment on how to use seed

* fixed flake8

* added test on reproducibility

* removed pytorch part from tensorflow

* fixed cookiecutter syntax

* added check for tensorflow

* fixed typo in test file

* added command to set the seed in tensorflow

* fixed flake8 error

* fixed typos

* removed duplicate log

* typo in docstring

* better error message in test

* added test to check repro using Orion (#8)

* added test to check repro using Orion

* more log into travis

* more info to debug travis

* running two trials for orion

* added seed to orion

* added orion test to tensorflow part

* better log messages in travis

* Add support for Keras and PyTorch Lightning (#12)

* added code for keras - still need to complete all tests

* fixed flake8

* started adding PyTorch Lightning support - note that mlflow and loading/saving model still does not work

* fixed api change

* fixed pytorch early stopping

* fixed flake8

* fixed flake8 for pytorch version

* fixed keras part for flake8

* added code to resume a model - for pytorch lightning

* removed forgotten diff

* fixed start_from_scratch (not loading a model even if present) / now printing the val loss in the logs

* pytorch lightning now correctly logging under the same run

* now pytorch is correctly resuming training and continues to plot in the same mlflow run

* added github actions

* using a different ubuntu image

* printing folder - trying to fix github actions

* telling git who I am..

* removed not useful test

* fixed typo in test folder

* removed travis configuration - using github actions from now on

* correctly handling the saved models in pytorch

* now passing the full hyper-parameter object to train_impl method (for more flexibility).

* added option to ask for gpus in pytorch

* improved error message

* Fixups for the lightning_and_keras PR (#12) (#22)

* Update torch model to pl-lightning model

* Refactor train+model impl w/ optim module

* Refactor data loader w/ data module for plightning

* removing codecov from cookiecutter. (#24)

* moving to github actions (#25)

* removing coverage computation

* moving from travis to github actions.

* setting fake name/email for git.

* removed (not-correct) duplicate for github actions config file.

* fixing tests.

* refactored pytorch models. (#26)

Co-authored-by: Mirko Bronzi <[email protected]>

* running CI also on develop.

Co-authored-by: Pierre-Luc St-Charles <[email protected]>

* Adding more CI backends. (#27)

* added github actions.

* moved python version to 3.9 - by default.

* added support for azure continuous integration.

* updated mlflow/orion dependencies.

* now correctly restoring models for pytorch. (#28)

* Now running test-coverage locally. (#30)

* running test coverage locally.

* fixed project name.

* correctly allowing mlflow to work in any folder. (#29)

* removed duplicate CI.

* Update cookiecutter doc url (#37)

* made the template generic by default - will add mila-specific aspects only if enabled at template instantiation time (#38)

* default branch is now main (#39)

* made the template generic by default - will add mila-specific aspects only if enabled at template instantiation time

* now using main as the default branch for github

* Fixed typo

Co-authored-by: Pierre-Luc St-Charles <[email protected]>
Co-authored-by: Mathieu Germain <[email protected]>
3 people authored Aug 20, 2021
1 parent f139850 commit f5c9701
Showing 18 changed files with 43 additions and 110 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -13,7 +13,7 @@ A cookiecutter is a generic project template that will instantiate a new project
* Flake8
* Pytest

More information on what a cookiecutter is [here.](https://cookiecutter.readthedocs.io/en/)
More information on what a cookiecutter is [here.](https://cookiecutter.readthedocs.io)

Quickstart
----------
1 change: 1 addition & 0 deletions cookiecutter.json
@@ -7,6 +7,7 @@
"project_short_description": "{{ cookiecutter.project_name }} is wonderful!",
"python_version": "3.8",
"dl_framework": ["pytorch", "tensorflow_cpu", "tensorflow_gpu"],
"environment": ["generic", "mila"],
"pypi_username": "{{ cookiecutter.github_username }}",
"version": "0.0.1",
"open_source_license": ["MIT license", "BSD license", "ISC license", "Apache Software License 2.0", "GNU General Public License v3", "Not open source"]
4 changes: 2 additions & 2 deletions {{cookiecutter.project_slug}}/.github/workflows/tests.yml
@@ -4,11 +4,11 @@ on:
# but only for the main/develop branch
push:
branches:
- master
- main
- develop
pull_request:
branches:
- master
- main
- develop
jobs:
build:
29 changes: 15 additions & 14 deletions {{cookiecutter.project_slug}}/README.md
@@ -1,5 +1,3 @@
[![Build Status](https://travis-ci.com/{{ cookiecutter.github_username }}/{{ cookiecutter.project_slug }}.png?branch=master)](https://travis-ci.com/{{ cookiecutter.github_username }}/{{ cookiecutter.project_slug }})

{% set is_open_source = cookiecutter.open_source_license != 'Not open source' -%}

# {{ cookiecutter.project_name }}
@@ -46,9 +44,12 @@ These hooks will:
Go on github and follow the instructions to create a new project.
When done, do not add any file, and follow the instructions to
link your local git to the remote project, which should look like this:
(PS: these instructions are reported here for your convenience.
We suggest to also look at the GitHub project page for more up-to-date info)

git remote add origin [email protected]:{{ cookiecutter.github_username }}/{{ cookiecutter.project_slug }}.git
git push -u origin master
git branch -M main
git push -u origin main
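
The added `git branch -M main` line force-renames the current local branch to `main` before the first push, whatever name `git init` gave it. A minimal sketch in a throwaway repository, assuming git is installed; the temporary directory and the `demo` identity are placeholders and are not part of the template:

```shell
# Create a throwaway repository (placeholder location).
repo="$(mktemp -d)"
git init -q "$repo"

# Make one commit so the branch exists; the identity is a placeholder.
git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"

# Force-rename the initial branch (often `master`) to `main`.
git -C "$repo" branch -M main

# Confirm which branch HEAD now points to.
current_branch="$(git -C "$repo" symbolic-ref --short HEAD)"
echo "$current_branch"

rm -rf "$repo"
```

In a real project, `git push -u origin main` would follow the rename, as in the instructions above.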

### Setup Continuous Integration

@@ -66,7 +67,7 @@ Check the following instructions for more details.
Github actions are already configured in `.github/workflows/tests.yml`.
Github actions are already enabled by default when using Github, so, when
pushing to github, they will be executed automatically for pull requests to
`master` and to `develop`.
`main` and to `develop`.

#### Travis

@@ -120,12 +121,10 @@ Note you have two new folders now:
You can run mlflow from this folder (`examples/local`) by running
`mlflow ui`.

#### Run on the Mila cluster
(NOTE: this example also apply to Compute Canada - use the folders
`slurm_cc` and `slurm_cc_orion` instead of `slurm_mila` and `slurm_mila_orion`.)
#### Run on a remote cluster (with Slurm)

First, bring you project on the Mila cluster (assuming you didn't create your
project directly there). To do so, simply login on the Mila cluster and git
First, bring you project on the cluster (assuming you didn't create your
project directly there). To do so, simply login on the cluster and git
clone your project:

git clone [email protected]:{{ cookiecutter.github_username }}/{{ cookiecutter.project_slug }}.git
@@ -135,12 +134,13 @@ Then activate your virtual env, and install the dependencies:
cd {{ cookiecutter.project_slug }}
pip install -e .

To run with SLURM, just:
To run with Slurm, just:

cd examples/slurm_mila
cd examples/slurm
sh run.sh

Check the log to see that you got an almost perfect loss (i.e., 0).
{%- if cookiecutter.environment == 'mila' %}

#### Measure GPU time (and others) on the Mila cluster

@@ -184,11 +184,12 @@ In a separate shell on your local computer, run the following command:
where `<username>` is your user name on the Mila cluster and `<hostname>` is the name of the machine your job is currenty running on (`leto35` in our example). You can then navigate your local browser to `http://localhost:19999/` to view the ressources being used on the cluster and monitor your job. You should see something like this:

![image](https://user-images.githubusercontent.com/18450628/88088807-fe2acd80-cb58-11ea-8ab2-bd090e8a826c.png)
{%- endif %}

#### Run with Orion on the Mila cluster
#### Run with Orion on the Slurm cluster

This example will run orion for 2 trials (see the orion config file).
To do so, go into `examples/slurm_mila_orion`.
To do so, go into `examples/slurm_orion`.
Here you can find the orion config file (`orion_config.yaml`), as well as the config
file (`config.yaml`) for your project (that contains the hyper-parameters).

@@ -204,7 +205,7 @@ Inside these folders, you can find the models (the best one and the last one), t
the hyper-parameters for this trial, and the log file.

You can check orion status with the following commands:
(to be run from `examples/slurm_mila_orion`)
(to be run from `examples/slurm_orion`)

export ORION_DB_ADDRESS='orion_db.pkl'
export ORION_DB_TYPE='pickleddb'
@@ -1,5 +1,16 @@
#!/bin/bash
#SBATCH --partition=long
{%- if cookiecutter.environment == 'mila' %}
## this is for the mila cluster (uncomment it if you need it):
##SBATCH --account=rrg-bengioy-ad
## this instead for ComputCanada (uncomment it if you need it):
##SBATCH --partition=long
# to attach a tag to your run (e.g., used to track the GPU time)
# uncomment the following line and add replace `my_tag` with the proper tag:
##SBATCH --wckey=my_tag
{%- endif %}
{%- if cookiecutter.environment == 'generic' %}
## set --account=... or --partition=... as needed.
{%- endif %}
#SBATCH --cpus-per-task=2
#SBATCH --gres=gpu:1
#SBATCH --mem=5G
16 changes: 0 additions & 16 deletions {{cookiecutter.project_slug}}/examples/slurm_cc/to_submit.sh

This file was deleted.

23 changes: 0 additions & 23 deletions {{cookiecutter.project_slug}}/examples/slurm_cc_orion/to_submit.sh

This file was deleted.

14 changes: 0 additions & 14 deletions {{cookiecutter.project_slug}}/examples/slurm_mila/config.yaml

This file was deleted.

2 changes: 0 additions & 2 deletions {{cookiecutter.project_slug}}/examples/slurm_mila/run.sh

This file was deleted.

This file was deleted.

This file was deleted.

This file was deleted.

@@ -1,16 +1,23 @@
#!/bin/bash
# __TODO__ fix options if needed
#SBATCH --job-name={{ cookiecutter.project_slug }}
#SBATCH --partition=long
{%- if cookiecutter.environment == 'mila' %}
## this is for the mila cluster (uncomment it if you need it):
##SBATCH --account=rrg-bengioy-ad
## this instead for ComputCanada (uncomment it if you need it):
##SBATCH --partition=long
# to attach a tag to your run (e.g., used to track the GPU time)
# uncomment the following line and add replace `my_tag` with the proper tag:
##SBATCH --wckey=my_tag
{%- endif %}
{%- if cookiecutter.environment == 'generic' %}
## set --account=... or --partition=... as needed.
{%- endif %}
#SBATCH --cpus-per-task=2
#SBATCH --gres=gpu:1
#SBATCH --mem=5G
#SBATCH --time=0:05:00
#SBATCH --output=logs/%x__%j.out
#SBATCH --error=logs/%x__%j.err
# to attach a tag to your run (e.g., used to track the GPU time)
# uncomment the following line and add replace `my_tag` with the proper tag:
##SBATCH --wckey=my_tag
# remove one # if you prefer receiving emails
##SBATCH --mail-type=all
##SBATCH --mail-user={{ cookiecutter.email }}
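
The "uncomment it" and "remove one #" comments in this script rely on a Slurm convention: `sbatch` reads only lines that begin with exactly `#SBATCH` as options, so a `##SBATCH` line is an ordinary comment until one `#` is removed. A minimal sketch that lists which options are active in a script; the sample job file is hypothetical and only modeled on the hunk above:

```shell
# Write a small sample job script (hypothetical content).
script="$(mktemp)"
cat > "$script" <<'EOF'
#!/bin/bash
#SBATCH --cpus-per-task=2
##SBATCH --partition=long
##SBATCH --mail-type=all
#SBATCH --mem=5G
EOF

# Lines starting with "#SBATCH" are options that sbatch will apply.
active="$(grep -E '^#SBATCH' "$script")"
# Lines starting with "##SBATCH" are disabled until one "#" is removed.
disabled="$(grep -E '^##SBATCH' "$script")"

echo "active:"
echo "$active"
echo "disabled:"
echo "$disabled"

rm -f "$script"
```

Running the same two `grep` commands on a generated `to_submit.sh` is a quick way to check which account, partition, or mail options will actually be sent to the scheduler.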
