An assortment of helper functions for machine learning. Heavily overfit to my coding idiosyncrasies, and meant to be used in conjunction with my project template.
To make these helpers globally available to all your projects, create a hidden directory in your home folder (e.g. `~/.python`) and set your `PYTHONPATH` in your bashrc via:

```bash
export PYTHONPATH=~/.python:$PYTHONPATH
```

For notebooks, store your boilerplate code in `init.ipynb` and run

```
%run ~/.python/init
```

at the beginning of each new notebook. Then you can use `import ml_helpers as mlh` in all your projects.
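For example, an `init.ipynb` in `~/.python` might contain a boilerplate cell like the following (the specific imports are just an illustration, not part of this repo):

```python
# ~/.python/init.ipynb -- illustrative boilerplate cell; adapt to your own workflow.
import numpy as np                # imports you find yourself repeating in every notebook
import matplotlib.pyplot as plt

import ml_helpers as mlh          # resolvable because ~/.python is on PYTHONPATH
```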
These scripts are highly opinionated, meaning they enforce a strict directory structure and are only set up to work with my machine learning project skeleton. That said, they should be easy to modify for your own use.
Submissions happen entirely through Python. See `submit.py` for an example: the user specifies a dictionary of job options and a dictionary of hyperparameters, then calls `submit()` from `job_submitter.py` (a minimal usage sketch follows the list below). A few comments:
- Rather than copying `static.py` and `job_submitter.py` into every new project, create a hidden directory in home (e.g. `~/.python`) and set your `PYTHONPATH` in your bashrc via `export PYTHONPATH=~/.python:$PYTHONPATH`; then you can use `import job_submitter` in all your projects. This is super handy for avoiding the situation where you have K copies of `job_submitter.py`, you've tweaked one of them, and you can't remember which.
- `submit.py` strictly enforces a few things (see the comments within `submit.py` for more info). If you use my project skeleton, all requirements should be met. Just make sure to create a new directory for each experiment and to call `submit.py` from within `modified_ml_project_skeleton/experiments/my_experiment_name/submit.py`. The strict enforcement of directory structure ensures that the output from each experiment is self-contained.
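As promised above, here is a minimal sketch of what an experiment's `submit.py` could look like. The job option keys and the exact `submit()` signature are assumptions for illustration; the real `submit.py` in this repo documents the enforced format.

```python
# experiments/my_experiment_name/submit.py -- illustrative sketch only.
# The job option keys and the submit() call signature are assumptions;
# see the actual submit.py in this repo for the required format.
from job_submitter import submit

job_opts = {            # hypothetical slurm-side options
    "time": "12:00:00",
    "mem": "16G",
}

my_hypers = {           # one list of values per hyperparameter
    "lr": [0.001, 0.0001],
    "seed": [1, 2],
    "other_hyper": ["cat"],
}

submit(job_opts, my_hypers)
```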
The logic of `job_submitter.py` is broken into a few components, each of which should be easily modifiable for your own purposes:
- First, it validates the directory structure. `verify_dirs()` checks that the required paths exist and loads global parameters to avoid passing paths around everywhere. It also creates a unique timestamped results directory within the experiments folder (sketched after this list).
- Then it processes the hyperparameters. This expects a dictionary (or a list of dictionaries) of the format

  ```python
  my_hypers = {
      "lr": [0.001, 0.0001, ...],
      "seed": [1, 2, ...],
      "other_hyper": ['cat']
  }
  ```

  and will return a list of strings of the form

  ```python
  my_hypers_strings = [
      "'lr=0.001' 'seed=1' 'other_hyper=cat'",
      "'lr=0.001' 'seed=2' 'other_hyper=cat'",
      "'lr=0.0001' 'seed=1' 'other_hyper=cat'",
      "'lr=0.0001' 'seed=2' 'other_hyper=cat'",
      ...
  ]
  ```

  Note the extra single-quotes within each string: this is tailored to sacred's command line interface. Modify line 157 for a different string format. (The expansion is sketched after this list.)
- Next, it iterates through each hyperparameter string and asks the user whether they want to submit the job. The point of this is that the first submission will invariably fail for one reason or another: submit a test job, wait to see that it runs correctly, then submit the rest.
- When a user submits a job, two command line strings are made in `make_commands()`. The first turns the hyperparameter string into a sacred-specific python command, which looks something like

  ```python
  python_command = "python main.py with 'lr=0.001' 'seed=1' 'other_hyper=cat' "
  ```

  Modify line 209 for a different python command. The second produces the slurm command itself and shouldn't need to be modified.
- Finally, in `make_bash_script()`, a bash script `submit.sh` is built from a prewritten template in `static.py` and the previously made python command, then saved. Modify `make_bash_script()` and `static.py` for different slurm configurations. Line 79 actually calls the bash command. `submit.sh` is rewritten each time to prevent a buildup of `submit.sh` files, but if you just want to generate the scripts and then submit them yourself for debugging purposes, use the `manual_mode=True` flag. (These last two steps are sketched together after this list.)
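The timestamped results directory mentioned in the first bullet above could be created along these lines. This is a sketch, not the actual `verify_dirs()` implementation:

```python
# Sketch of creating a unique, timestamped results directory inside an
# experiment folder; not the actual verify_dirs() code.
from datetime import datetime
from pathlib import Path

def make_results_dir(experiment_dir: str) -> Path:
    stamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    results_dir = Path(experiment_dir) / f"results_{stamp}"
    results_dir.mkdir(parents=True, exist_ok=False)  # fail loudly if it already exists
    return results_dir
```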
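The dictionary-to-string expansion from the hyperparameter bullet boils down to a Cartesian product over the value lists. A hedged sketch (not the actual code in `job_submitter.py`, which builds its strings around line 157):

```python
# Sketch of expanding a dict of hyperparameter lists into sacred-style
# command line strings; illustrative only.
from itertools import product

def expand_hypers(hypers: dict) -> list:
    keys = list(hypers)
    combos = product(*(hypers[k] for k in keys))
    # Each assignment is wrapped in single quotes, e.g. 'lr=0.001', for sacred's CLI.
    return [" ".join(f"'{k}={v}'" for k, v in zip(keys, combo)) for combo in combos]

my_hypers = {"lr": [0.001, 0.0001], "seed": [1, 2], "other_hyper": ["cat"]}
print(expand_hypers(my_hypers))
# ["'lr=0.001' 'seed=1' 'other_hyper=cat'", "'lr=0.001' 'seed=2' 'other_hyper=cat'", ...]
```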
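And the command construction plus `submit.sh` writing from the last two bullets might look roughly like this. The template text, the option keys, and the `sbatch` invocation are assumptions for illustration; the real versions live in `make_commands()`, `make_bash_script()`, and `static.py`:

```python
# Sketch of turning a hyperparameter string into a python command, writing
# submit.sh from a template, and submitting it; illustrative only.
import subprocess

BASH_TEMPLATE = """#!/bin/bash
#SBATCH --time={time}
#SBATCH --mem={mem}
{python_command}
"""

def submit_one(hyper_string: str, job_opts: dict, manual_mode: bool = False):
    python_command = f"python main.py with {hyper_string}"   # sacred-style command
    with open("submit.sh", "w") as f:                         # rewritten on every call
        f.write(BASH_TEMPLATE.format(python_command=python_command, **job_opts))
    if not manual_mode:                                       # in manual mode, just write the script
        subprocess.run(["sbatch", "submit.sh"], check=True)
```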