This repository is an example data analysis workflow with `targets`. The pipeline reads the data from a file, preprocesses it, visualizes it, and fits a regression model.
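To make the four stages concrete, here is a minimal base-R sketch that runs them eagerly, without `targets` and without any caching. It assumes that R's built-in `airquality` data frame stands in for `data/raw_data.csv`, and it uses `lm()` as an assumed choice of regression model, which may differ from the model this repository actually fits. The object names are illustrative only.

```r
# Minimal sketch of the pipeline's four stages in plain R (no targets, no caching).
# Assumption: R's built-in `airquality` data stands in for data/raw_data.csv.

# 1. Read the data from a file (written to a temporary CSV for this sketch).
raw_data_file <- tempfile(fileext = ".csv")
write.csv(datasets::airquality, raw_data_file, row.names = FALSE)
raw_data <- read.csv(raw_data_file)

# 2. Preprocess: keep only rows with an observed ozone value.
ozone_data <- raw_data[!is.na(raw_data$Ozone), ]

# 3. Visualize: compute the ozone histogram (use plot = TRUE to draw it).
ozone_hist <- hist(ozone_data$Ozone, plot = FALSE)

# 4. Fit a linear regression of ozone on wind speed and temperature.
fit <- lm(Ozone ~ Wind + Temp, data = ozone_data)
print(coef(fit))
```

In the real pipeline, `targets` stores each of these intermediate objects and skips any stage whose code and upstream data have not changed since the last run.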
You can try out this example project as long as you have a browser and an internet connection by opening it in an RStudio Cloud instance. Alternatively, you can clone or download this code repository and install the R packages it requires.
- Open the R console and call `renv::restore()` to install the required R packages.
- Call the `tar_make()` function to run the pipeline.
- Then, call `tar_read(hist)` to retrieve the histogram.
- Experiment with other functions such as `tar_visnetwork()` to learn how they work.
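The steps above can be collected into a single console session. This is a sketch that assumes the working directory is the project root and that `renv`, `targets`, and `visNetwork` (needed by `tar_visnetwork()`) are installable. The `in_project` guard is an addition of this example so the code does nothing when run elsewhere, and `tar_manifest()` and `tar_outdated()` are included as further `targets` functions worth experimenting with.

```r
# Run from the project root; the guard makes this a no-op anywhere else.
in_project <- file.exists("_targets.R") && file.exists("renv.lock")

if (in_project) {
  renv::restore()                       # install the package versions recorded in renv.lock
  print(targets::tar_manifest())        # list each target and its command
  targets::tar_make()                   # build the pipeline; up-to-date targets are skipped
  ozone_hist <- targets::tar_read(hist) # load the stored histogram target
  print(ozone_hist)
  print(targets::tar_outdated())        # typically character(0) right after tar_make()
  targets::tar_visnetwork()             # interactive dependency graph of the pipeline
}
```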
The most important files are:
```
├── _targets.R
├── R/
├──── functions.R
├── data/
├──── raw_data.csv
└── index.Rmd
```
| File | Purpose |
|---|---|
| `_targets.R` | The special R script that declares the `targets` pipeline. See `tar_script()` for details. |
| `R/functions.R` | An R script with user-defined functions. Unlike `_targets.R`, there is nothing special about the name or location of this script. In fact, for larger projects, it is good practice to partition functions into multiple files. |
| `data/raw_data.csv` | The raw `airquality` dataset. |
| `index.Rmd` | An R Markdown report that reruns in the pipeline whenever the histogram of ozone changes. |
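For orientation, here is a hedged sketch of what a `_targets.R` for this pipeline could look like; it is not the repository's actual file. The helper `create_plot()` is hypothetical (in the project, such helpers belong in `R/functions.R` and would be loaded with `source()`; it is inlined here only to keep the sketch in one block), `lm()` is an assumed choice of model, and `tar_render()` comes from the `tarchetypes` package. Because it declares a file target and an R Markdown report, it only runs inside the project directory.

```r
# _targets.R (sketch). Requires the targets, tarchetypes, and ggplot2 packages.
library(targets)
library(tarchetypes) # provides tar_render() for R Markdown reports

# Hypothetical helper; in the project it would live in R/functions.R.
create_plot <- function(data) {
  ggplot2::ggplot(data) +
    ggplot2::geom_histogram(ggplot2::aes(x = Ozone), bins = 12)
}

# The pipeline is the list of target objects returned at the end of the script.
list(
  # Track the raw CSV as a file so edits to it invalidate downstream targets.
  tar_target(raw_data_file, "data/raw_data.csv", format = "file"),
  tar_target(raw_data, read.csv(raw_data_file)),
  # Preprocess: drop rows with missing ozone.
  tar_target(data, raw_data[!is.na(raw_data$Ozone), ]),
  # Visualize: the histogram later retrieved with tar_read(hist).
  tar_target(hist, create_plot(data)),
  # Model: regress ozone on wind and temperature.
  tar_target(fit, lm(Ozone ~ Wind + Temp, data = data)),
  # Report: rerendered when a target it reads (e.g. hist) changes.
  tar_render(report, "index.Rmd")
)
```

`tar_render()` scans the active code chunks of `index.Rmd` for `tar_read()` and `tar_load()` calls and makes the report depend on those targets, which is how the report reruns whenever the ozone histogram changes.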
Minimal pipelines with low resource requirements are appropriate for continuous deployment. For example, when this particular GitHub repository is updated, its `targets` pipeline runs in a GitHub Actions workflow. The workflow pushes the results to the `targets-runs` branch, and GitHub Pages hosts the latest version of the rendered R Markdown report at https://wlandau.github.io/targets-minimal/. Subsequent runs restore the output files from the previous run so that up-to-date targets do not rebuild. Follow these steps to set up continuous deployment for your own minimal pipeline:
- Ensure your project stays within the storage and compute limitations of GitHub (that is, your pipeline is minimal). For storage, you may choose the AWS-backed storage formats (e.g. `tar_target(..., format = "aws_qs")`) for large outputs to reduce the burden on GitHub storage.
- Ensure GitHub Actions are enabled in the Settings tab of your GitHub repository’s website.
- Set up your project with `renv`:
  - Call `targets::tar_renv(extras = character(0))` to write a `_packages.R` file that exposes hidden dependencies.
  - Call `renv::init()` to initialize the `renv` lockfile `renv.lock`, or `renv::snapshot()` to update it.
  - Commit `renv.lock` to your Git repository.
- Write the `.github/workflows/targets.yaml` workflow file using `targets::tar_github_actions()` and commit this file to Git.
- Push to GitHub. A GitHub Actions workflow should run the pipeline and upload the results to the `targets-runs` branch of your repository. Subsequent runs should add new commits but not necessarily rerun targets.
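The R-side setup steps above can be run as one script from the project root. This is a sketch: the `in_project` guard is an addition of this example so that nothing is written outside a `targets` project, and committing and pushing still happen in Git, outside R.

```r
# One-time continuous-deployment setup; run from the project root.
in_project <- file.exists("_targets.R")

if (in_project) {
  # Write _packages.R with library() calls so renv detects packages
  # that the pipeline only loads indirectly.
  targets::tar_renv(extras = character(0))

  # Create renv.lock on the first run; afterwards, refresh it with a snapshot.
  if (file.exists("renv.lock")) {
    renv::snapshot()
  } else {
    renv::init()
  }

  # Write the GitHub Actions workflow to .github/workflows/targets.yaml.
  targets::tar_github_actions()
}
```

After it runs, commit `renv.lock` and `.github/workflows/targets.yaml`, then push to GitHub.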