We have a DVC pipeline defined in the dvc.yaml file.
The pipeline is composed of stages using Python scripts, defined in src:
flowchart TD
node2[eval]
node3[get-data]
node4[split-data]
node5[train]
node3-->node4
node4-->node2
node4-->node5
node5-->node2
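The edges in the flowchart determine the order in which the stages have to run. As a minimal sketch (plain Python using the standard-library `graphlib`, not DVC's own scheduler), the same graph can be written as a mapping from each stage to the stages it depends on and sorted topologically:

```python
from graphlib import TopologicalSorter

# Each stage maps to the set of stages it depends on,
# transcribed from the edges of the flowchart above.
dependencies = {
    "get-data": set(),
    "split-data": {"get-data"},
    "train": {"split-data"},
    "eval": {"split-data", "train"},
}

# static_order() yields stages so that every dependency comes first.
order = list(TopologicalSorter(dependencies).static_order())
print(order)  # ['get-data', 'split-data', 'train', 'eval']
```

Because eval depends on train, there is only one valid order for this graph.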
We use DVC Params, defined in params.yaml, to configure the pipeline.
The pipeline enables local reproducibility and can be run with dvc repro / dvc exp run:
$ export GITHUB_TOKEN={YOUR_GITHUB_TOKEN}
$ export LOGURU_LEVEL=INFO
$ dvc exp run -S train.epochs=8
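The `-S train.epochs=8` flag overrides the `epochs` key under `train` in params.yaml for that experiment run. The sketch below illustrates what such a dotted-key override amounts to on a nested dictionary. It is plain Python, not DVC's implementation; `apply_override` is a hypothetical helper, and the starting value of 4 is illustrative only.

```python
import copy


def apply_override(params: dict, override: str) -> dict:
    """Return a copy of params with one 'dotted.key=value' override applied."""
    path, raw_value = override.split("=", 1)
    keys = path.split(".")
    updated = copy.deepcopy(params)
    node = updated
    # Walk down to the parent of the final key, creating levels if missing.
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    # Simplifying assumption: only integers are parsed; anything else stays a string.
    try:
        value = int(raw_value)
    except ValueError:
        value = raw_value
    node[keys[-1]] = value
    return updated


# Hypothetical stand-in for the contents of params.yaml.
params = {"train": {"epochs": 4}}
overridden = apply_override(params, "train.epochs=8")
print(overridden)  # {'train': {'epochs': 8}}
```

The original dictionary is left untouched, so the baseline values stay available for comparison.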
The pipeline generates DVC Metrics and DVC Plots to evaluate model performance. Both are written to outs and can be compared across runs:
$ dvc exp diff
$ dvc plots diff --open
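`dvc exp diff` reports how metrics and params changed between two revisions. Conceptually this is a per-key difference, which the sketch below illustrates. The helper `diff_metrics`, the metric names, and the values are all hypothetical and were not produced by this pipeline.

```python
def diff_metrics(old: dict, new: dict) -> dict:
    """Return new - old for every metric present in both dicts."""
    return {name: new[name] - old[name] for name in old if name in new}


# Hypothetical metric values, for illustration only.
baseline = {"accuracy": 0.5, "loss": 1.0}
experiment = {"accuracy": 0.75, "loss": 0.5}

changes = diff_metrics(baseline, experiment)
print(changes)  # {'accuracy': 0.25, 'loss': -0.5}
```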
Because the metrics and plots files are small enough to be tracked by git, after we run the pipeline we can share the results with others:
$ git add dvc.lock outs
$ git commit -m "Update experiment results"
$ git push
You can connect the repo to https://studio.iterative.ai/ to get a better visualization of the metrics, parameters and plots associated with each commit:
https://studio.iterative.ai/user/daavoo/views/workshop-uncool-mlops-5fgmd70rkt
However, the rest of the outputs are gitignored because they are too big to be tracked by git.