diff --git a/_quarto.yml b/_quarto.yml
index 868e3b89..104ec55e 100644
--- a/_quarto.yml
+++ b/_quarto.yml
@@ -5,16 +5,14 @@ book:
title: "DevOps for Data Science"
author: "Alex K Gold"
page-footer:
- left: "Copyright 2022, Alex K Gold"
- right:
- - icon: github
- href: https://github.com/akgold/do4ds
- - icon: twitter
- href: https://twitter.com/alexkgold
+ left: |
+ DevOps for Data Science was written by Alex K Gold.
+ right: |
+ This book was built with Quarto.
search: true
site-url: https://do4ds.com
repo-url: https://github.com/akgold/do4ds
- repo-actions: [edit]
+ repo-actions: [edit, issue]
sharing: [twitter, facebook]
google-analytics: "G-EQR1RYSHQK"
chapters:
@@ -56,16 +54,25 @@ book:
execute:
eval: false
+code-line-numbers: false
+number-depth: 1
+toc-depth: 1
+width: 80%
+fig-align: "left"
+output: asis
+
+
format:
html:
theme: flatly
- code-line-numbers: false
- number-depth: 2
- toc-depth: 2
+
# docx:
# toc: true
# standalone: true
-
+ # pdf:
+ # keep-tex: true
+ # fig-pos: H
+ # documentclass: krantz
filters:
- include-code-files
diff --git a/chapters/append/auth.qmd b/chapters/append/auth.qmd
index 803fba4e..1854cf63 100644
--- a/chapters/append/auth.qmd
+++ b/chapters/append/auth.qmd
@@ -22,12 +22,12 @@ services and legacy systems that were designed for on-prem software.
| Auth Technology | Token-Based? | "Modern"? |
|-----------------|--------------|-----------|
-| Service-based | ❌ | ❌ |
-| Linux Accounts | ❌[^auth-2] | ❌ |
-| LDAP/AD | ❌ | ❌ |
-| Kerberos | ✅ | ❌ |
-| SAML | ✅ | ✅ |
-| OAuth | ✅ | ✅ |
+| Service-based | No | No |
+| Linux Accounts | No[^auth-2] | No |
+| LDAP/AD | No | No |
+| Kerberos | Yes | No |
+| SAML | Yes | Yes |
+| OAuth | Yes | Yes |
[^auth-2]: To be precise, possible if integrated with Kerberos, but
unlikely.
@@ -94,7 +94,7 @@ against different systems. The most common is to authenticate against
the underlying Linux server, but it can also use LDAP/AD (common) or
Kerberos tickets (uncommon).
-![](images/auth-pam.png){width="539"}
+![](images/auth-pam.png){fig-alt="A visual representation of the PAM auth flow described above."}
PAM can also be used to do things when users log in. The most common of
these is initializing Kerberos tickets to connect with databases or
@@ -137,8 +137,7 @@ base*, which is the subtree to look for users inside. Additionally, you
may configure LDAP/AD with *bind credentials* of a service account to
authenticate to the LDAP/AD server itself.
-![](images/auth-ldap.png){fig-alt="A diagram of the LDAP flow. 1-User provides username and password to service. 2-service sends bind credentials w/ ldapsearch to LDAP server. 3-LDAP server checks credentials. 4-LDAP server returns results to service."
-width="600"}
+![](images/auth-ldap.png){fig-alt="A diagram of the LDAP flow. 1-User provides username and password to service. 2-service sends bind credentials w/ ldapsearch to LDAP server. 3-LDAP server checks credentials. 4-LDAP server returns results to service."}
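+
+For instance, a service might check a user's credentials with a search
+like the following sketch (the hostname, bind credentials, and search
+base are all hypothetical):
+
+``` {.bash}
+# bind as the service account, then look the user up under the search base
+ldapsearch -H ldap://ldap.example.com \
+  -D "cn=service,dc=example,dc=org" -W \
+  -b "ou=users,dc=example,dc=org" "(uid=alex)"
+```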
::: callout-note
Depending on your application and LDAP/AD configuration, it may be
@@ -196,8 +195,7 @@ KDC again along with the service they're trying to access and get a
*session key* (sometimes referred to as a *service ticket*) that allows
access to a particular service.
-![](images/auth-kerberos.png){fig-alt="The kerberos flow. 1 - User on server requests TGT from KDC. 2 - TGT granted. 3 - user requests service ticket with TGT from KDC. 4 - Service ticket granted. 5 - user uses service ticket to access service."
-width="553"}
+![](images/auth-kerberos.png){fig-alt="The kerberos flow. 1 - User on server requests TGT from KDC. 2 - TGT granted. 3 - user requests service ticket with TGT from KDC. 4 - Service ticket granted. 5 - user uses service ticket to access service."}
Kerberos is only used inside a corporate network and is tightly linked
to the underlying servers. That makes it very secure. Even if someone
@@ -241,7 +239,7 @@ you.[^auth-3]
[^auth-3]: The diagram assumes you don't already have a token in your
browser. If the user has a token already, steps 2-5 get skipped.
-![](images/auth-saml.png){width="600"}
+![](images/auth-saml.png){fig-alt="A visual representation of SAML auth flow as described above."}
A SAML token contains several *claims*, which usually include a username
and may include groups or other attributes. Whoever controls the IdP can
@@ -286,7 +284,7 @@ For example, if you want to read my Google Calendar, you need a JWT that
includes a claim granting *read* access against the scope of *events on
Alex's calendar*.
-![](images/auth-oauth.png){width="600"}
+![](images/auth-oauth.png){fig-alt="A visual representation of OAuth flow as described above."}
Unlike in SAML where action occurs via browser redirects, OAuth makes no
assumptions about how this flow happens. The process of requesting and
diff --git a/chapters/append/cheatsheets.qmd b/chapters/append/cheatsheets.qmd
index 34b5c1c9..a060d1ac 100644
--- a/chapters/append/cheatsheets.qmd
+++ b/chapters/append/cheatsheets.qmd
@@ -4,218 +4,225 @@
### Checking library + repository status
-| Step | R Command | Python Command |
-|------------------------------------|------------------|------------------|
-| Check whether library is in sync with lockfile. | `re nv::status()` | None |
++----------------------------+-------------------+----------------+
+| **Step**                   | R Command         | Python Command |
++----------------------------+-------------------+----------------+
+| Check whether library is   | `renv::status()`  | None           |
+| in sync with lockfile.     |                   |                |
++----------------------------+-------------------+----------------+
### Creating and using a standalone project library
Make sure you're in a standalone project library.
-
-
-
-
-
-
-
-
-Step |
-R Command |
-Python Command |
-
-
-Create a standalone library. |
-renv::init()
-Tip: get {renv} w/
-install.p ackages("renv") |
-p ython -m venv <dir>
-Convention: use.venv for <dir>
-Tip: {venv} included w/ Python 3.5+ |
-
-
-Activate project library. |
-r env::activate()
-Happens automatically if in RStudio project. |
-source <dir> /bin/activate
|
-
-
-Install packages as normal. |
-install.pa ckages("<pkg>")
|
-python - m pip install <pkg>
|
-
-
-Snapshot package state. |
-r env::snapshot()
|
-pip freeze > requirements.txt
|
-
-
-Exit project environment. |
-Leave R project or re n v::deactivate() |
-deactivate
|
-
-
-
++--------------------+------------------------------+----------------------------------+
+| **Step**           | R Command                    | Python Command                   |
++--------------------+------------------------------+----------------------------------+
+| Create a           | `renv::init()`               | `python -m venv <dir>`           |
+| standalone         |                              |                                  |
+| library.           | Tip: get `{renv}` w/         | Convention: use `.venv` for      |
+|                    | `install.packages("renv")`   | `<dir>`                          |
+|                    |                              |                                  |
+|                    |                              | Tip: `{venv}` included w/        |
+|                    |                              | Python 3.5+                      |
++--------------------+------------------------------+----------------------------------+
+| Activate project   | `renv::activate()`           | `source <dir>/bin/activate`      |
+| library.           |                              |                                  |
+|                    | Happens automatically if     |                                  |
+|                    | in RStudio project.          |                                  |
++--------------------+------------------------------+----------------------------------+
+| Install packages   | `install.packages("<pkg>")`  | `python -m pip install <pkg>`    |
+| as normal.         |                              |                                  |
++--------------------+------------------------------+----------------------------------+
+| Snapshot package   | `renv::snapshot()`           | `pip freeze > requirements.txt`  |
+| state.             |                              |                                  |
++--------------------+------------------------------+----------------------------------+
+| Exit project       | Leave R project or           | `deactivate`                     |
+| environment.       | `renv::deactivate()`         |                                  |
++--------------------+------------------------------+----------------------------------+
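+
+For example, here's the whole Python-side flow in one shot (the
+package is just an illustration):
+
+``` {.bash}
+python -m venv .venv                # create the standalone library
+source .venv/bin/activate          # activate it
+python -m pip install pandas       # install packages as normal
+pip freeze > requirements.txt      # snapshot package state
+deactivate                         # exit the project environment
+```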
### Collaborating on someone else's project
Start by downloading the project into a directory on your machine.
-
-
-
-
-
-
-
-
-Step |
-R Command |
-Python Command |
-
-
-Move into project directory. |
-set wd ("< p roject-dir>")
-Or open project in RStudio. |
-cd <project-dir>
|
-
-
-Create project environment. |
-renv::init()
|
-python -m venv <dir>
-Recommend: use .venv for
-<dir> |
-
-
-Enter project environment. |
-Happens automatically or ren v::activate() |
-source <dir> /bin/activate
|
-
-
-Restore packages. |
-Happens automatically or re nv::restore() |
-pip install -r requirements.txt
|
-
-
-
++--------------+----------------------------+------------------------------------+
+| **Step**     | **R Command**              | **Python Command**                 |
++--------------+----------------------------+------------------------------------+
+| Move into    | `setwd("<project-dir>")`   | `cd <project-dir>`                 |
+| project      |                            |                                    |
+| directory.   | Or open project in         |                                    |
+|              | RStudio.                   |                                    |
++--------------+----------------------------+------------------------------------+
+| Create       | `renv::init()`             | `python -m venv <dir>`             |
+| project      |                            |                                    |
+| environment. |                            | Recommend: use `.venv` for         |
+|              |                            | `<dir>`                            |
++--------------+----------------------------+------------------------------------+
+| Enter        | Happens automatically or   | `source <dir>/bin/activate`        |
+| project      | `renv::activate()`.        |                                    |
+| environment. |                            |                                    |
++--------------+----------------------------+------------------------------------+
+| Restore      | Happens automatically or   | `pip install -r requirements.txt`  |
+| packages.    | `renv::restore()`.         |                                    |
++--------------+----------------------------+------------------------------------+
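+
+For example, the Python side of that flow looks like this (the project
+directory name is hypothetical):
+
+``` {.bash}
+cd penguins-project                # move into the project directory
+python -m venv .venv               # create the project environment
+source .venv/bin/activate         # enter it
+pip install -r requirements.txt   # restore packages
+```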
## HTTP code cheatsheet {#cheat-http}
As you work with HTTP traffic, you'll learn some of the common codes.
Here are some of those used most frequently.
-| Code | Meaning |
-|----------------------|--------------------------------------------------|
-| $200$ | Everyone's favorite, a successful response. |
-| $\text{3xx}$ | Your query was redirected somewhere else, usually ok. |
-| $\text{4xx}$ | Errors with the request |
-| $400$ | Bad request. This isn't a request the server can understand. |
-| $401$/$403$ | Unauthorized or forbidden. Required authentication hasn't been provided. |
-| $404$ | Not found. There isn't any content to access here. |
-| $\text{5xx}$ | Errors with the server once your request got there. |
-| $500$ | Generic server-side error. Your request was received, but there was an error processing it. |
-| $504$ | Gateway timeout. This means that a proxy or gateway between you and the server you're trying to access timed out before it got a response from the server. |
++---------------+------------------------------------------------+
+| Code | Meaning |
++===============+================================================+
+| $200$ | Everyone's favorite, a successful response. |
++---------------+------------------------------------------------+
+| $\text{3xx}$ | Your query was redirected somewhere else, |
+| | usually ok. |
++---------------+------------------------------------------------+
+| $\text{4xx}$ | Errors with the request |
++---------------+------------------------------------------------+
+| $400$ | Bad request. This isn't a request the server |
+| | can understand. |
++---------------+------------------------------------------------+
+| $401$/$403$ | Unauthorized or forbidden. Required |
+| | authentication hasn't been provided. |
++---------------+------------------------------------------------+
+| $404$ | Not found. There isn't any content to access |
+| | here. |
++---------------+------------------------------------------------+
+| $\text{5xx}$ | Errors with the server once your request got |
+| | there. |
++---------------+------------------------------------------------+
+| $500$ | Generic server-side error. Your request was |
+| | received, but there was an error processing |
+| | it. |
++---------------+------------------------------------------------+
+| $504$ | Gateway timeout. This means that a proxy or |
+| | gateway between you and the server you're |
+| | trying to access timed out before it got a |
+| | response from the server. |
++---------------+------------------------------------------------+
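+
+If you want to check a code yourself, here's one way with `curl`
+(assuming it's installed):
+
+``` {.bash}
+# print only the response status code for a URL
+curl -s -o /dev/null -w "%{http_code}" https://do4ds.com
+```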
## Git {#cheat-git}
-| | |
-|---------------------------|---------------------------------------------|
-| **Command** | **What it does** |
-| `git clone ` | Clone a remote repo -- make sure you're using SSH URL. |
-| `git add ` | Add files/directory to staging area. |
-| `git commit -m ` | Commit staging area. |
-| `git push origin ` | Push to a remote. |
-| `git pull origin ` | Pull from a remote. |
-| `git checkout ` | Checkout a branch. |
-| `git checkout -b ` | Create and checkout a branch. |
-| `git branch -d ` | Delete a branch. |
++------------------------------+------------------------------------------+
+| **Command (prefixed with     | **What it does**                         |
+| `git`)**                     |                                          |
++------------------------------+------------------------------------------+
+| `clone <repo-url>`           | Clone a remote repo -- make sure you're  |
+|                              | using the SSH URL.                       |
++------------------------------+------------------------------------------+
+| `add <files>`                | Add files/directory to staging area.     |
++------------------------------+------------------------------------------+
+| `commit -m <message>`        | Commit staging area.                     |
++------------------------------+------------------------------------------+
+| `push origin <branch>`       | Push to a remote.                        |
++------------------------------+------------------------------------------+
+| `pull origin <branch>`       | Pull from a remote.                      |
++------------------------------+------------------------------------------+
+| `checkout <branch>`          | Checkout a branch.                       |
++------------------------------+------------------------------------------+
+| `checkout -b <branch>`       | Create and checkout a branch.            |
++------------------------------+------------------------------------------+
+| `branch -d <branch>`         | Delete a branch.                         |
++------------------------------+------------------------------------------+
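+
+Strung together, a typical feature-branch flow looks like this (branch
+name and commit message are illustrative):
+
+``` {.bash}
+git checkout -b my-feature         # create and checkout a branch
+git add .                          # stage changed files
+git commit -m "Add my feature"     # commit the staging area
+git push origin my-feature         # push the branch to the remote
+```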
## Docker {#cheat-docker}
### Docker CLI commands
-
-
-
-
-
-
-
-
-
-Stage |
-Command |
-What it does |
-Notes and helpful options |
-
-
-Build |
-docker build <directory>
|
-Builds a directory into an image. |
--t <name:tag> provides a name to the
-container.
-tag is optional, defaults to
-latest .
|
-
-
-Move |
-docker push <image>
|
-Push a container to a registry. |
- |
-
-
-Move |
-docker pull <image>
|
-Pull a container from a registry. |
-Rarely needed because run pulls the container if
-needed. |
-
-
-Run |
-docker run <image>
|
-Run a container. |
-See flags in next table. |
-
-
-Run |
-docker stop <container>
|
-Stop a running container. |
-docker kill can be used if stop
-fails.
|
-
-
-Run |
-docker ps
|
-List running containers. |
-Useful to get container id to do things to
-it. |
-
-
-Run |
-docker exec <cont aine r> <command>
|
-Run a command inside a running container. |
-Basically always used to open a shell with
-d ocker exec -it <co ntainer> /bin/bash |
-
-
-Run |
-docker logs <container>
|
-Views logs for a container. |
- |
-
-
-
++-----------+------------------------------+----------------------+------------------------------------+
+| **Stage** | **Command (prefix w/         | **What it does**     | **Notes and helpful options**      |
+|           | `docker`)**                  |                      |                                    |
++-----------+------------------------------+----------------------+------------------------------------+
+| Build     | `build <directory>`          | Builds a directory   | `-t <name:tag>` provides a name    |
+|           |                              | into an image.       | to the container.                  |
+|           |                              |                      |                                    |
+|           |                              |                      | `tag` is optional, defaults to     |
+|           |                              |                      | `latest`.                          |
++-----------+------------------------------+----------------------+------------------------------------+
+| Move      | `push <image>`               | Push a container to  |                                    |
+|           |                              | a registry.          |                                    |
++-----------+------------------------------+----------------------+------------------------------------+
+| Move      | `pull <image>`               | Pull a container     | Rarely needed because `run` pulls  |
+|           |                              | from a registry.     | the container if needed.           |
++-----------+------------------------------+----------------------+------------------------------------+
+| Run       | `run <image>`                | Run a container.     | See flags in next table.           |
++-----------+------------------------------+----------------------+------------------------------------+
+| Run       | `stop <container>`           | Stop a running       | `docker kill` can be used if       |
+|           |                              | container.           | `stop` fails.                      |
++-----------+------------------------------+----------------------+------------------------------------+
+| Run       | `ps`                         | List running         | Useful to get container `id` to    |
+|           |                              | containers.          | do things to it.                   |
++-----------+------------------------------+----------------------+------------------------------------+
+| Run       | `exec <container>            | Run a command inside | Basically always used to open a    |
+|           | <command>`                   | a running container. | shell with `docker exec -it        |
+|           |                              |                      | <container> /bin/bash`             |
++-----------+------------------------------+----------------------+------------------------------------+
+| Run       | `logs <container>`           | Views logs for a     |                                    |
+|           |                              | container.           |                                    |
++-----------+------------------------------+----------------------+------------------------------------+
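+
+For example, building and publishing an image might look like this
+(image name and tag are hypothetical):
+
+``` {.bash}
+docker build -t my-app:latest .    # build this directory into an image
+docker push my-app:latest          # push it to a registry
+```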
### Flags for `docker run`
-| | | |
-|-----------------------|-----------------------|---------------------------|
-| Flag | Effect | Notes |
-| `--name ` | Give a name to container. | Optional. Auto-assigned if not provided |
-| `--rm` | Remove container when its stopped. | Don't use in production. You probably want to inspect failed containers. |
-| `-d` | Detach container (don't block the terminal). | Almost always used in production. |
-| `-p :` | Publish port from inside running container to outside. | Needed if you want to access an app or API inside the container. |
-| `-v :` | Mount volume into the container. | |
++--------------------------+--------------------+--------------------------+
+| Flag                     | Effect             | Notes                    |
++--------------------------+--------------------+--------------------------+
+| `--name <name>`          | Give a name to     | Optional. Auto-assigned  |
+|                          | container.         | if not provided.         |
++--------------------------+--------------------+--------------------------+
+| `--rm`                   | Remove container   | Don't use in production. |
+|                          | when it's stopped. | You probably want to     |
+|                          |                    | inspect failed           |
+|                          |                    | containers.              |
++--------------------------+--------------------+--------------------------+
+| `-d`                     | Detach container   | Almost always used in    |
+|                          | (don't block the   | production.              |
+|                          | terminal).         |                          |
++--------------------------+--------------------+--------------------------+
+| `-p <host>:<container>`  | Publish port from  | Needed if you want to    |
+|                          | inside running     | access an app or API     |
+|                          | container to       | inside the container.    |
+|                          | outside.           |                          |
++--------------------------+--------------------+--------------------------+
+| `-v <host>:<container>`  | Mount volume into  |                          |
+|                          | the container.     |                          |
++--------------------------+--------------------+--------------------------+
*Reminder: Order for `-p` and `-v` is `<host>:<container>`.*
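+
+Putting several flags together (names, ports, and paths are
+illustrative):
+
+``` {.bash}
+# run detached, publish a port, and mount a data directory
+docker run -d --name my-api -p 8080:8080 -v /data:/data my-app:latest
+docker logs my-api                 # check on it
+docker stop my-api                 # stop it when done
+```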
@@ -223,250 +230,226 @@ it.
These are the commands that go in a Dockerfile when you're building it.
-| | | |
-|-------------------|-------------------------|-----------------------------|
-| Command | Purpose | Example |
-| `FROM` | Indicate base container. | `FROM rocker/r-ver:4.1.0` |
-| `RUN` | Run a command when building. | `RUN apt-get update` |
-| `COPY` | Copy from build directory into the container. | `COPY . /app/` |
-| `CMD` | Specify the command to run when the container starts. | `CMD quarto render .` |
++------------+--------------------------+------------------------------+
+| Command    | Purpose                  | Example                      |
++------------+--------------------------+------------------------------+
+| `FROM`     | Indicate base container. | `FROM rocker/r-ver:4.1.0`    |
++------------+--------------------------+------------------------------+
+| `RUN`      | Run a command when       | `RUN apt-get update`         |
+|            | building.                |                              |
++------------+--------------------------+------------------------------+
+| `COPY`     | Copy from build          | `COPY . /app/`               |
+|            | directory into the       |                              |
+|            | container.               |                              |
++------------+--------------------------+------------------------------+
+| `CMD`      | Specify the command to   | `CMD quarto render .`        |
+|            | run when the container   |                              |
+|            | starts.                  |                              |
++------------+--------------------------+------------------------------+
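+
+Stacked in order, the example column above already forms a minimal
+(if simplistic) Dockerfile:
+
+``` {.dockerfile}
+FROM rocker/r-ver:4.1.0
+RUN apt-get update
+COPY . /app/
+# a real image would probably also set WORKDIR /app before this
+CMD quarto render .
+```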
## Cloud services
-| | | | |
-|------------------|------------------|------------------|------------------|
-| **Service** | **AWS** | **Azure** | **GCP** |
-| Kubernetes cluster | EKS or Fargate | AKS | GKE |
-| Run a container or application | ECS or Elastic Beanstalk | Azure Container Apps | Google App Engine |
-| Run an API | Lambda | Azure Functions | Google Cloud Functions |
-| Database | RDS | Azure SQL | Google Cloud Database |
-| Data Warehouse | Redshift | DataLake | BigQuery |
-| ML Platform | SageMaker | Azure ML | Vertex AI |
-| NAS | EFS or FSx | Azure File | Filestore |
++--------------+--------------+--------------+--------------+
+| **Service** | **AWS** | **Azure** | **GCP** |
++--------------+--------------+--------------+--------------+
+| Kubernetes | EKS or | AKS | GKE |
+| cluster | Fargate | | |
++--------------+--------------+--------------+--------------+
+| Run a | ECS or | Azure | Google App |
+| container or | Elastic | Container | Engine |
+| application | Beanstalk | Apps | |
++--------------+--------------+--------------+--------------+
+| Run an API | Lambda | Azure | Google Cloud |
+| | | Functions | Functions |
++--------------+--------------+--------------+--------------+
+| Database | RDS | Azure SQL | Google Cloud |
+| | | | Database |
++--------------+--------------+--------------+--------------+
+| Data | Redshift | DataLake | BigQuery |
+| Warehouse | | | |
++--------------+--------------+--------------+--------------+
+| ML Platform | SageMaker | Azure ML | Vertex AI |
++--------------+--------------+--------------+--------------+
+| NAS | EFS or FSx | Azure File | Filestore |
++--------------+--------------+--------------+--------------+
## Command line {#cheat-cli}
### General command line
-| Symbol | What it is |
-|-----------------|------------------------------------|
-| `man ` | Open manual for `command`. |
-| `q` | Quit the current screen. |
-| `\` | Continue bash command on new line. |
-| `ctrl + c` | Quit current execution. |
-| `echo ` | Print string (useful for piping). |
++------------------------+---------------------------------------+
+| Symbol                 | What it is                            |
++========================+=======================================+
+| `man <command>`        | Open manual for `command`.            |
++------------------------+---------------------------------------+
+| `q`                    | Quit the current screen.              |
++------------------------+---------------------------------------+
+| `\`                    | Continue bash command on new line.    |
++------------------------+---------------------------------------+
+| `ctrl + c`             | Quit current execution.               |
++------------------------+---------------------------------------+
+| `echo <string>`        | Print string (useful for piping).     |
++------------------------+---------------------------------------+
### Linux filesystem navigation
-
-
-
-
-
-
-
-
-
-
-
-/
|
-System root or file path separator. |
- |
-
-
-.
|
-Current working directory. |
- |
-
-
-..
|
-Parent of working directory. |
- |
-
-
-~
|
-Home directory of the current user. |
- |
-
-
-ls <dir>
|
-List objects in a directory. |
--l - format as list
--a - all (include hidden files that start with
-. )
|
-
-
-pwd
|
-Print working directory. |
- |
-
-
-cd <dir>
|
-Change directory. |
-Can use relative or absolute paths. |
-
-
-
++----------------+------------------------+----------------------------+
+| Command        | What it does/is        | Notes + Helpful options    |
++================+========================+============================+
+| `/`            | System root or file    |                            |
+|                | path separator.        |                            |
++----------------+------------------------+----------------------------+
+| `.`            | Current working        |                            |
+|                | directory.             |                            |
++----------------+------------------------+----------------------------+
+| `..`           | Parent of working      |                            |
+|                | directory.             |                            |
++----------------+------------------------+----------------------------+
+| `~`            | Home directory of the  |                            |
+|                | current user.          |                            |
++----------------+------------------------+----------------------------+
+| `ls <dir>`     | List objects in a      | `-l` - format as list      |
+|                | directory.             |                            |
+|                |                        | `-a` - all (include hidden |
+|                |                        | files that start with `.`) |
++----------------+------------------------+----------------------------+
+| `pwd`          | Print working          |                            |
+|                | directory.             |                            |
++----------------+------------------------+----------------------------+
+| `cd <dir>`     | Change directory.      | Can use relative or        |
+|                |                        | absolute paths.            |
++----------------+------------------------+----------------------------+
### Reading text files
-
-
-
-
-
-
-
-
-Command |
-What it does |
-Notes + Helpful options |
-
-
-cat <file>
|
-Print a file from the top. |
- |
-
-
-less <file>
|
-Print a file, but just a little. |
-Can be very helpful to look at a few rows of csv.
-Lazily reads lines, so can be much faster than
-cat for big files. |
-
-
-head <file>
|
-Look at the beginning of a file. |
-Defaults to 10 lines, can specify a different number with
--n <n> . |
-
-
-tail <file>
|
-Look at the end of a file. |
-Useful for logs where the newest part is last.
-The -f flag is useful to follow for a live
-view. |
-
-
-grep <expression>
|
-Search a file using regex. |
-Writing regex can be a pain. I suggest testing on .
-Often useful in combination with the pipe. |
-
-
-|
|
-The pipe. |
- |
-
-
-wc <file>
|
-Count words in a file. |
-Use -l to count lines, useful for .csv
-files. |
-
-
-
++---------------------+------------------+----------------------------------+
+| **Command**         | **What it does** | **Notes + Helpful options**      |
++---------------------+------------------+----------------------------------+
+| `cat <file>`        | Print a file     |                                  |
+|                     | from the top.    |                                  |
++---------------------+------------------+----------------------------------+
+| `less <file>`       | Print a file,    | Can be very helpful to look at   |
+|                     | but just a       | a few rows of a csv.             |
+|                     | little.          |                                  |
+|                     |                  | Lazily reads lines, so can be    |
+|                     |                  | *much* faster than `cat` for     |
+|                     |                  | big files.                       |
++---------------------+------------------+----------------------------------+
+| `head <file>`       | Look at the      | Defaults to 10 lines, can        |
+|                     | beginning of a   | specify a different number with  |
+|                     | file.            | `-n <n>`.                        |
++---------------------+------------------+----------------------------------+
+| `tail <file>`       | Look at the end  | Useful for logs where the        |
+|                     | of a file.       | newest part is last.             |
+|                     |                  |                                  |
+|                     |                  | The `-f` flag is useful to       |
+|                     |                  | follow for a live view.          |
++---------------------+------------------+----------------------------------+
+| `grep <expression>` | Search a file    | Writing regex can be a pain. I   |
+|                     | using regex.     | suggest testing on .             |
+|                     |                  |                                  |
+|                     |                  | Often useful in combination      |
+|                     |                  | with the pipe.                   |
++---------------------+------------------+----------------------------------+
+| `|`                 | The pipe.        |                                  |
++---------------------+------------------+----------------------------------+
+| `wc <file>`         | Count words in   | Use `-l` to count lines, useful  |
+|                     | a file.          | for `.csv` files.                |
++---------------------+------------------+----------------------------------+
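+
+For example, to size up an unfamiliar file (file names are
+illustrative):
+
+``` {.bash}
+head -n 5 penguins.csv              # peek at the first few rows
+wc -l penguins.csv                  # count the lines
+grep "Adelie" penguins.csv | wc -l  # count lines matching a pattern
+```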
### Manipulating files
-
-
-
-
-
-
-
-
-Command |
-What it does |
-Notes + Helpful Options |
-
-
-rm <path>
|
-Remove. |
--r - recursively remove everything below a file
-path
--f - force - dont ask for each file
-Be very careful, its permanent! |
-
-
-cp <from> <to>
|
-Copy. |
- |
-
-
-mv <from> <to>
|
-Move. |
- |
-
-
-*
|
-Wildcard. |
- |
-
-
-mkdir /rmdir
|
-Make/remove directory. |
--p - create any parts of path that dont
-exist
|
-
-
-
++--------------------+----------------+--------------------------------+
+| **Command**        | **What it      | **Notes + Helpful Options**    |
+|                    | does**         |                                |
++--------------------+----------------+--------------------------------+
+| `rm <path>`        | Remove.        | `-r` - recursively remove      |
+|                    |                | everything below a file path   |
+|                    |                |                                |
+|                    |                | `-f` - force - don't ask for   |
+|                    |                | each file                      |
+|                    |                |                                |
+|                    |                | **Be very careful, it's        |
+|                    |                | permanent!**                   |
++--------------------+----------------+--------------------------------+
+| `cp <from> <to>`   | Copy.          |                                |
++--------------------+----------------+--------------------------------+
+| `mv <from> <to>`   | Move.          |                                |
++--------------------+----------------+--------------------------------+
+| `*`                | Wildcard.      |                                |
++--------------------+----------------+--------------------------------+
+| `mkdir`/`rmdir`    | Make/remove    | `-p` - create any parts of     |
+|                    | directory.     | path that don't exist          |
++--------------------+----------------+--------------------------------+
### Move things to/from server
-
-
-
-
-
-
-
-
-Command |
-What it does |
-Notes + Helpful Options |
-
-
-tar
|
-Create/extract archive file. |
-Almost always used with flags.
-Create is usually
-tar -czf <archive name> <file(s)>
-Extract is usually
-tar -xfv <archive name> |
-
-
-scp
|
-Secure copy via ssh . |
-Run on laptop to server.
-Can use most ssh flags (like -i and
--v ). |
-
-
-
++-------------+-----------------+------------------------------------+
+| **Command** | **What it       | **Notes + Helpful Options**        |
+|             | does**          |                                    |
++-------------+-----------------+------------------------------------+
+| `tar`       | Create/extract  | Almost always used with flags.     |
+|             | archive file.   |                                    |
+|             |                 | Create is usually                  |
+|             |                 | `tar -czf <archive-name>           |
+|             |                 | <file(s)>`                         |
+|             |                 |                                    |
+|             |                 | Extract is usually                 |
+|             |                 | `tar -xvf <archive-name>`          |
++-------------+-----------------+------------------------------------+
+| `scp`       | Secure copy via | Run from your laptop to copy       |
+|             | `ssh`.          | files to or from a server.         |
+|             |                 |                                    |
+|             |                 | Can use most `ssh` flags (like     |
+|             |                 | `-i` and `-v`).                    |
++-------------+-----------------+------------------------------------+
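+
+For example, to bundle up a project and send it to a server (all names
+are hypothetical):
+
+``` {.bash}
+tar -czf project.tar.gz project/
+scp -i my-key.pem project.tar.gz alex@server.example.com:/home/alex/
+```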
### Write files from the command line
-| Command | What it does | Notes |
-|--------------------|---------------------|-------------------------------|
-| `touch` | Creates file if doesn't already exist. | Updates last updated to current time if it does exist. |
-| `>` | Overwrite file contents. | Creates a new file if it doesn't exist. |
-| `>>` | Concatenate to end of file. | Creates a new file if it doesn't exist. |
++-------------+------------------+-------------------------------+
+| Command     | What it does     | Notes                         |
++=============+==================+===============================+
+| `touch`     | Creates file if  | Updates last-updated time to  |
+|             | it doesn't       | current time if it does       |
+|             | already exist.   | exist.                        |
++-------------+------------------+-------------------------------+
+| `>`         | Overwrite file   | Creates a new file if it      |
+|             | contents.        | doesn't exist.                |
++-------------+------------------+-------------------------------+
+| `>>`        | Concatenate to   | Creates a new file if it      |
+|             | end of file.     | doesn't exist.                |
++-------------+------------------+-------------------------------+
### Command line text editors (Vim + Nano)
-| Command | What it does | Notes + Helpful options |
-|--------------------|---------------------------|--------------------------|
-| `^` | Prefix for file command in `nano` editor. | It's the `⌘` or `Ctrl` key, not the caret symbol. |
-| `i` | Enter insert mode (able to type) in `vim`. | |
-| `escape` | Enter normal mode (navigation) in `vim`. | |
-| `:w` | Write the current file in `vim` (from normal mode). | Can be combined to save and quit in one, `:wq`. |
-| `:q` | Quit `vim` (from normal mode). | `:q!` quit without saving. |
++-------------+-----------------------------+-------------------+
+| Command | What it does | Notes + Helpful |
+| | | options |
++=============+=============================+===================+
+| `^` | Prefix for file command in | It's the `⌘` or |
+| | `nano` editor. | `Ctrl` key, not |
+| | | the caret symbol. |
++-------------+-----------------------------+-------------------+
+| `i` | Enter insert mode (able to | |
+| | type) in `vim`. | |
++-------------+-----------------------------+-------------------+
+| `escape` | Enter normal mode | |
+| | (navigation) in `vim`. | |
++-------------+-----------------------------+-------------------+
+| `:w` | Write the current file in | Can be combined |
+| | `vim` (from normal mode). | to save and quit |
+| | | in one, `:wq`. |
++-------------+-----------------------------+-------------------+
+| `:q` | Quit `vim` (from normal | `:q!` quit |
+| | mode). | without saving. |
++-------------+-----------------------------+-------------------+
## SSH {#cheat-ssh}
@@ -476,118 +459,147 @@ General usage:
ssh <user>@<host>
```
-| Flag | What it does | Notes |
-|-------------------|----------------------|------------------------------|
-| `-v` | Verbose, good for debugging. | Add more `v`s as you please, `-vv` or `-vvv`. |
-| `-i` | Choose identity file (private key). | Not necessary with default key names. |
++------------+-----------------+--------------------------------+
+| Flag | What it does | Notes |
++============+=================+================================+
+| `-v` | Verbose, good | Add more `v`s as you please, |
+| | for debugging. | `-vv` or `-vvv`. |
++------------+-----------------+--------------------------------+
+| `-i` | Choose identity | Not necessary with default key |
+| | file (private | names. |
+| | key). | |
++------------+-----------------+--------------------------------+
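+
+For example, a verbose login with a specific key (key and host are
+hypothetical):
+
+``` {.bash}
+ssh -v -i ~/.ssh/my-key alex@server.example.com
+```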
## Linux admin
### Users
-| | | |
-|--------------------|--------------------------------|--------------------|
-| **Command** | **What it does** | **Helpful options + notes** |
-| `su ` | Change to be a different user. | |
-| `whoami` | Get username of current user. | |
-| `id` | Get full user + group info on current user. | |
-| `passwd` | Change password. | |
-| `useradd` | Add a new user. | |
-| `usermo d ` | Modify user `username`. | `-aG ` adds to a group (e.g.,`sudo`) |
++------------------+----------------------------+--------------------------+
+| **Command**      | **What it does**           | **Helpful options +      |
+|                  |                            | notes**                  |
++------------------+----------------------------+--------------------------+
+| `su <username>`  | Change to be a different   |                          |
+|                  | user.                      |                          |
++------------------+----------------------------+--------------------------+
+| `whoami`         | Get username of current    |                          |
+|                  | user.                      |                          |
++------------------+----------------------------+--------------------------+
+| `id`             | Get full user + group info |                          |
+|                  | on current user.           |                          |
++------------------+----------------------------+--------------------------+
+| `passwd`         | Change password.           |                          |
++------------------+----------------------------+--------------------------+
+| `useradd`        | Add a new user.            |                          |
++------------------+----------------------------+--------------------------+
+| `usermod         | Modify user `username`.    | `-aG <group>` adds the   |
+| <username>`      |                            | user to a group (e.g.,   |
+|                  |                            | `sudo`).                 |
++------------------+----------------------------+--------------------------+
### Permissions
-| Command | What it does | Helpful options + notes |
-|-------------------------|---------------------|-------------------------|
-| `chmod ` | Modifies permissions on a file or directory. | Number indicates permissions for user, group, others: add `4` for read, `2` for write, `1` for execute, `0` for nothing, e.g.,`644`. |
-| `chown ` | Change the owner of a file or directory. | Can be used for user or group, e.g.,`:my-group`. |
-| `sudo ` | Adopt root permissions for the following command. | |
++----------------------+--------------------+--------------------------+
+| Command              | What it does       | Helpful options + notes  |
++======================+====================+==========================+
+| `chmod <permissions> | Modifies           | Number indicates         |
+| <file>`              | permissions on a   | permissions for user,    |
+|                      | file or directory. | group, others: add `4`   |
+|                      |                    | for read, `2` for write, |
+|                      |                    | `1` for execute, `0` for |
+|                      |                    | nothing, e.g., `644`.    |
++----------------------+--------------------+--------------------------+
+| `chown <new-owner>   | Change the owner   | Can be used for user or  |
+| <file>`              | of a file or       | group, e.g.,             |
+|                      | directory.         | `:my-group`.             |
++----------------------+--------------------+--------------------------+
+| `sudo <command>`     | Adopt root         |                          |
+|                      | permissions for    |                          |
+|                      | the following      |                          |
+|                      | command.           |                          |
++----------------------+--------------------+--------------------------+
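+
+For instance (file and group names are hypothetical):
+
+``` {.bash}
+chmod 644 report.html                  # owner read/write; group + others read
+sudo chown alex:my-group report.html   # change owner and group
+```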
### Install applications (Ubuntu)
-| | |
-|---------------------------------------|---------------------------------|
-| **Command** | **What it does** |
-| `apt-get update && apt-get upgrade -y` | Fetch and install upgrades to system packages |
-| `apt-get install ` | Install a system package. |
-| `wget` | Download a file from a URL. |
-| `gdebi` | Install local `.deb` file. |
++----------------------------------------+------------------------------------+
+| **Command**                            | **What it does**                   |
++----------------------------------------+------------------------------------+
+| `apt-get update && apt-get upgrade -y` | Fetch and install upgrades to      |
+|                                        | system packages.                   |
++----------------------------------------+------------------------------------+
+| `apt-get install <package>`            | Install a system package.         |
++----------------------------------------+------------------------------------+
+| `wget`                                 | Download a file from a URL.        |
++----------------------------------------+------------------------------------+
+| `gdebi`                                | Install local `.deb` file.         |
++----------------------------------------+------------------------------------+
### Storage
-
-
-
-
-
-
-
-
-
-
-
-df
|
-Check storage space on device. |
--h for human readable file sizes.
|
-
-
-du
|
-Check size of files. |
-Most likely to be used as
-du -h <dir> | sort -h
-Also useful to combine with head . |
-
-
-
++-------------+-----------------+-------------------------------+
+| Command     | What it does    | Helpful options               |
++=============+=================+===============================+
+| `df`        | Check storage   | `-h` for human-readable file  |
+|             | space on        | sizes.                        |
+|             | device.         |                               |
++-------------+-----------------+-------------------------------+
+| `du`        | Check size of   | Most likely to be used as     |
+|             | files.          | `du -h <dir> | sort -h`       |
+|             |                 |                               |
+|             |                 | Also useful to combine with   |
+|             |                 | `head`.                       |
++-------------+-----------------+-------------------------------+
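+
+For example, to see what's eating disk (the directory is illustrative):
+
+``` {.bash}
+df -h                         # how full is each device?
+du -h /var/log | sort -h      # which files are biggest?
+```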
### Processes
-| Command | What it does | Helpful options |
-|--------------------|--------------------|---------------------------------|
-| `top` | See what's running on the system. | |
-| `ps aux` | See all system processes. | Consider using `--sort` and pipe into `head` or `grep`. |
-| `kill` | Kill a system process. | `-9` to force kill immediately |
++-------------+--------------------+---------------------------+
+| Command | What it does | Helpful options |
++=============+====================+===========================+
+| `top` | See what's running | |
+| | on the system. | |
++-------------+--------------------+---------------------------+
+| `ps aux` | See all system | Consider using `--sort` |
+| | processes. | and pipe into `head` or |
+| | | `grep`. |
++-------------+--------------------+---------------------------+
+| `kill` | Kill a system | `-9` to force kill |
+| | process. | immediately |
++-------------+--------------------+---------------------------+
### Networking
-
-
-
-
-
-
-
-
-Command |
-What it does |
-Helpful Options |
-
-
-netstat
|
-See ports and services using them. |
-Usually used with -tlp , for tcp listening
-applications, including pid . |
-
-
-ssh -L <port>:<i p>:<port>:<host>
|
-Port forwards a remote port on remote host to local. |
-Remote ip is usually localhost .
-Choose local port to match remote port. |
-
-
-
++--------------------------+----------------------+------------------------+
+| **Command**              | **What it does**     | **Helpful Options**    |
++--------------------------+----------------------+------------------------+
+| `netstat`                | See ports and        | Usually used with      |
+|                          | services using them. | `-tlp`, for TCP        |
+|                          |                      | listening              |
+|                          |                      | applications,          |
+|                          |                      | including `pid`.       |
++--------------------------+----------------------+------------------------+
+| `ssh -L                  | Port forwards a      | Remote `ip` is usually |
+| <port>:<ip>:<port>       | remote port on a     | `localhost`.           |
+| <host>`                  | remote host to       |                        |
+|                          | local.               | Choose local port to   |
+|                          |                      | match remote port.     |
++--------------------------+----------------------+------------------------+
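+
+For example, to reach RStudio Server (port $8787$) running on a remote
+machine (the hostname is hypothetical):
+
+``` {.bash}
+# forward local port 8787 to port 8787 on the server,
+# then visit localhost:8787 in a browser
+ssh -L 8787:localhost:8787 alex@server.example.com
+```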
### The path
-| | |
-|-----------------------------------|------------------------------------|
-| **Command** | **What it does** |
-| `which ` | Finds the location of the binary that runs when you run `command`. |
-| `ln -s :` | Creates a symlink from file/directory at `linked location` to `where to put symlink`. |
++----------------------------------+---------------------------------------------+
+| **Command**                      | **What it does**                            |
++----------------------------------+---------------------------------------------+
+| `which <command>`                | Finds the location of the binary that runs  |
+|                                  | when you run `command`.                     |
++----------------------------------+---------------------------------------------+
+| `ln -s <linked-location>         | Creates a symlink from the file/directory   |
+| <where-to-put-symlink>`          | at `linked-location` to                     |
+|                                  | `where-to-put-symlink`.                     |
++----------------------------------+---------------------------------------------+
### `systemd`
@@ -596,56 +608,63 @@ Daemonizing services is accomplished by configuring them in
The format of all commands is `systemctl <command> <service>`.
-| Command | Notes/Tips |
-|-------------------|----------------------------|
-| `status` | Report status. |
-| `start` | |
-| `stop` | |
-| `restart` | `stop` then `start`. |
-| `reload` | Reload configuration that doesn't require restart (depends on service). |
-| `enable` | Daemonize the service. |
-| `disable` | Un-daemonize the service. |
++-------------+-------------------------------------------------+
+| Command | Notes/Tips |
++=============+=================================================+
+| `status` | Report status. |
++-------------+-------------------------------------------------+
+| `start` | |
++-------------+-------------------------------------------------+
+| `stop` | |
++-------------+-------------------------------------------------+
+| `restart` | `stop` then `start`. |
++-------------+-------------------------------------------------+
+| `reload` | Reload configuration that doesn't require |
+| | restart (depends on service). |
++-------------+-------------------------------------------------+
+| `enable` | Daemonize the service. |
++-------------+-------------------------------------------------+
+| `disable` | Un-daemonize the service. |
++-------------+-------------------------------------------------+
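+
+For example, to check on and daemonize a service (using RStudio
+Server's `rstudio-server` service as an illustration):
+
+``` {.bash}
+sudo systemctl status rstudio-server
+sudo systemctl enable rstudio-server
+```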
## IP Addresses and ports {#cheat-ports}
### Special IP Addresses
-
-
-
-
-
-
-
-
-
-
- |
- or loopback – the machine that originated the request. |
-
-
-
-
- |
-Protected address blocks used for private IP addresses. |
-
-
-
++-------------------------+------------------------------------+
+| Address | Meaning |
++=========================+====================================+
+| $127.0.0.1$ | $\text{localhost}$ or loopback -- |
+| | the machine that originated the |
+| | request. |
++-------------------------+------------------------------------+
+| $\text{10.x.x.x}$ | Protected address blocks used for |
+| | private IP addresses. |
+| $\text{172.16.x.x}$ to | |
+| $\text{172.31.x.x}$ | |
+| | |
+| $\text{192.168.x.x}$ | |
++-------------------------+------------------------------------+
### Special ports
All ports below $1024$ are reserved for server tasks and cannot be
assigned to admin-controlled services.
-| Protocol/application | Default port |
-|----------------------|--------------|
-| HTTP | $80$ |
-| HTTPS | $443$ |
-| SSH | $22$ |
-| PostgreSQL | $5432$ |
-| RStudio Server | $8787$ |
-| Shiny Server | $3939$ |
-| JupyterHub | $8000$ |
++-----------------------+---------------------------------------+
+| Protocol/application | Default port |
++=======================+=======================================+
+| HTTP | $80$ |
++-----------------------+---------------------------------------+
+| HTTPS | $443$ |
++-----------------------+---------------------------------------+
+| SSH | $22$ |
++-----------------------+---------------------------------------+
+| PostgreSQL | $5432$ |
++-----------------------+---------------------------------------+
+| RStudio Server | $8787$ |
++-----------------------+---------------------------------------+
+| Shiny Server | $3939$ |
++-----------------------+---------------------------------------+
+| JupyterHub | $8000$ |
++-----------------------+---------------------------------------+
diff --git a/chapters/append/lab-map.qmd b/chapters/append/lab-map.qmd
index bd015acc..7b857ca6 100644
--- a/chapters/append/lab-map.qmd
+++ b/chapters/append/lab-map.qmd
@@ -3,49 +3,50 @@
This section aims to clarify the relationship between the assets you'll
make in each portfolio exercise and labs in this book.
-+----------------------+-----------------------------------------------------------------------------------+
-| Chapter | Lab Activity |
-+======================+===================================================================================+
-| [@sec-env-as-code]: | Create a Quarto site that uses `{renv}` and `{venv}` to create standalone R and |
-| Environments as Code | Python virtual environments. Add an R EDA page and Python modeling. |
-+----------------------+-----------------------------------------------------------------------------------+
-| [@sec-proj-arch]: | Create an API that serves a Python machine-learning model using `{vetiver}` and |
-| Project Architecture | `{fastAPI}`. Call that API from a Shiny App in both R and Python. |
-+----------------------+-----------------------------------------------------------------------------------+
-| [@sec-data-access]: | Move data into a DuckDB database and serve model predictions from an API. |
-| Data Architecture | |
-+----------------------+-----------------------------------------------------------------------------------+
-| [@sec-log-monitor]: | Add logging to the app from [Chapter @sec-proj-arch]. |
-| Logging and | |
-| Monitoring | |
-+----------------------+-----------------------------------------------------------------------------------+
-| [@sec-deployments]: | Put a static Quarto site up on GitHub Pages using GitHub Actions that renders the |
-| Deployments | project. |
-+----------------------+-----------------------------------------------------------------------------------+
-| [@sec-docker]: | Put API from [Chapter @sec-proj-arch] into Docker Container. |
-| Docker | |
-+----------------------+-----------------------------------------------------------------------------------+
-| [@sec-cloud]: Cloud | Stand up an EC2 instance. |
-| | |
-| | Put the model into S3. |
-+----------------------+-----------------------------------------------------------------------------------+
-| [@sec-cmd-line]: | Log into the server with `.pem` key and create SSH key. |
-| Command Line | |
-+----------------------+-----------------------------------------------------------------------------------+
-| [@sec-linux]: Linux | Create a user on the server and add SSH key. |
-| Admin | |
-+----------------------+-----------------------------------------------------------------------------------+
-| [@sec-app-admin]: | Add R, Python, RStudio Server, JupyterHub, API, and App to EC2 instance from |
-| Application Admin | [Chapter @sec-cloud]. |
-+----------------------+-----------------------------------------------------------------------------------+
-| [@sec-scale]: | Resize the server. |
-| Scaling | |
-+----------------------+-----------------------------------------------------------------------------------+
-| [@sec-networking]: | Add proxy (NGINX) to reach all services from the web. |
-| Networking | |
-+----------------------+-----------------------------------------------------------------------------------+
-| [@sec-dns]: DNS | Add a URL to the EC2 instance. Put the Shiny app into an iFrame on the Quarto |
-| | site. |
-+----------------------+-----------------------------------------------------------------------------------+
-| [@sec-ssl]: SSL | Add SSL/HTTPS to the EC2 instance. |
-+----------------------+-----------------------------------------------------------------------------------+
++-----------------------------+----------------------------------------------------+
+| Chapter | Lab Activity |
++=============================+====================================================+
+| [@sec-env-as-code]: | Create a Quarto site that uses `{renv}` and |
+| Environments as Code | `{venv}` to create standalone R and Python virtual |
+| | environments. Add an R EDA page and Python |
+| | modeling. |
++-----------------------------+----------------------------------------------------+
+| [@sec-proj-arch]: Project | Create an API that serves a Python |
+| Architecture | machine-learning model using `{vetiver}` and |
+| | `{fastAPI}`. Call that API from a Shiny App in |
+| | both R and Python. |
++-----------------------------+----------------------------------------------------+
+| [@sec-data-access]: Data | Move data into a DuckDB database and serve model |
+| Architecture | predictions from an API. |
++-----------------------------+----------------------------------------------------+
+| [@sec-log-monitor]: Logging | Add logging to the app from [Chapter |
+| and Monitoring | @sec-proj-arch]. |
++-----------------------------+----------------------------------------------------+
+| [@sec-deployments]: | Put a static Quarto site up on GitHub Pages using |
+| Deployments | GitHub Actions that renders the project. |
++-----------------------------+----------------------------------------------------+
+| [@sec-docker]: Docker | Put API from [Chapter @sec-proj-arch] into Docker |
+| | Container. |
++-----------------------------+----------------------------------------------------+
+| [@sec-cloud]: Cloud | Stand up an EC2 instance. |
+| | |
+| | Put the model into S3. |
++-----------------------------+----------------------------------------------------+
+| [@sec-cmd-line]: Command | Log into the server with `.pem` key and create SSH |
+| Line | key. |
++-----------------------------+----------------------------------------------------+
+| [@sec-linux]: Linux Admin | Create a user on the server and add SSH key. |
++-----------------------------+----------------------------------------------------+
+| [@sec-app-admin]: | Add R, Python, RStudio Server, JupyterHub, API, |
+| Application Admin | and App to EC2 instance from [Chapter @sec-cloud]. |
++-----------------------------+----------------------------------------------------+
+| [@sec-scale]: Scaling | Resize the server. |
++-----------------------------+----------------------------------------------------+
+| [@sec-networking]: | Add proxy (NGINX) to reach all services from the |
+| Networking | web. |
++-----------------------------+----------------------------------------------------+
+| [@sec-dns]: DNS | Add a URL to the EC2 instance. Put the Shiny app |
+| | into an iFrame on the Quarto site. |
++-----------------------------+----------------------------------------------------+
+| [@sec-ssl]: SSL | Add SSL/HTTPS to the EC2 instance. |
++-----------------------------+----------------------------------------------------+
diff --git a/chapters/append/lb.qmd b/chapters/append/lb.qmd
index 2469c6ca..c7b9b883 100644
--- a/chapters/append/lb.qmd
+++ b/chapters/append/lb.qmd
@@ -1,4 +1,4 @@
-# Technical Detail: Load balancers {#sec-append-lb}
+# Technical Detail: Load Balancers {#sec-append-lb}
[Chapter @sec-ent-scale] introduced the idea of a load balancer as the
"front door" to a computational cluster. This appendix chapter will
diff --git a/chapters/index.qmd b/chapters/index.qmd
new file mode 100644
index 00000000..7e3bb59d
--- /dev/null
+++ b/chapters/index.qmd
@@ -0,0 +1,49 @@
+# Welcome! {.unnumbered}
+
+In this book, you'll learn about DevOps conventions, tools, and
+practices that can be useful to you as a data scientist. You'll also
+learn how to work better with the IT/Admin team at your organization,
+and even how to do a little server administration of your own if you're
+pressed into service.
+
+This website is (and always will be) **free to use**, and is licensed
+under the [Creative Commons Attribution-NonCommercial-NoDerivs
+3.0](https://creativecommons.org/licenses/by-nc-nd/3.0/us/) license.
+
+If you'd like a **physical copy** of the book, copies will be available
+once it's finished!
+
+## About the Author
+
+Alex K Gold leads the Solutions Engineering team at Posit, formerly
+RStudio.
+
+In his free time, he enjoys landscaping, handstands, and Tai Chi.
+
+He occasionally blogs about data, management, and leadership at
+alexkgold.space.
+
+## Acknowledgments {.unnumbered}
+
+Thank you to current and former members of the Solutions Engineering
+team at Posit, who taught me most of what's in this book.
+
+Huge thanks to the R4DS book club, especially Jon Harmon, Gus Lipkin,
+and Tinashe Michael Tapera, who read an early (and rough!) copy of this
+book and gave me amazing feedback.
+
+Thanks to all others who provided improvements that ended up in this
+book (in alphabetical order): Carl Boettinger, Jon Harmon, Gus Lipkin,
+and Leungi.
+
+Thanks to Randi Cohen at Taylor and Francis and to Linda Kahn, who's
+always been more than an editor to me.
+
+Thanks to Eliot who cared enough about this project to insist he appear
+in the acknowledgments. Most of all, thanks to Shoshana for helping me
+live my best life.
+
+## Software information {.unnumbered}
+
+This book was written using the Quarto publishing system and was
+published to the web using GitHub Actions from
+[rOpenSci](https://github.com/orchid00/actions_sandbox).
diff --git a/chapters/intro.qmd b/chapters/intro.qmd
index b7ba93e4..0cbb1be0 100644
--- a/chapters/intro.qmd
+++ b/chapters/intro.qmd
@@ -210,7 +210,7 @@ science environment will also be subject to access control to ensure
that only the right people and systems have access.
![](intro-images/dsp.png){fig-alt="An image of a data science platform with access control going to a workbench and deployment and data and package supporting."
-width="600"}
+}
This book will help you understand the needs of each component of the
data science platform and how to articulate them to the IT/Admins at
@@ -272,7 +272,7 @@ them as comprehension questions in many chapters.
Here's an example for this book:
![](intro-images/mindmap.png){fig-alt="A mindmap for this book: I *wrote* and YOU *read* DO4DS. DO4DS *includes* EXERCISES, *some are* MIND MAPS."
-width="600"}
+}
Note how every node is a noun, and the edges (labels on the arrows) are
verbs. You've probably understood the content if you can write down the
@@ -310,7 +310,8 @@ Actions.
From an architectural perspective, it'll look something like this:
-![](intro-images/lab-arch.png)
+![](intro-images/lab-arch.png){fig-alt="A visual representation of the lab architecture as described in the paragraphs above."
+}
If you're interested in which pieces are completed in each chapter,
check out [Appendix @sec-append-lab].
diff --git a/chapters/sec1/1-1-env-as-code.qmd b/chapters/sec1/1-1-env-as-code.qmd
index 27f18468..e7713b3f 100644
--- a/chapters/sec1/1-1-env-as-code.qmd
+++ b/chapters/sec1/1-1-env-as-code.qmd
@@ -19,8 +19,7 @@ environment is the stack of software and hardware below your code, from
the R and Python packages you're using right down to the physical
hardware your code runs on.
-![](images/environment.png){fig-alt="Code and data going into an environment with hardware, R and Python, and packages."
-width="600"}
+![](images/environment.png){fig-alt="Code and data going into an environment with hardware, R and Python, and packages."}
Ignoring the readiness of the data science environment results in the
dreaded *it works on my machine* phenomenon with a failed attempt to
@@ -68,23 +67,22 @@ essential system libraries, and Python and/or R. Above that is the
**Layers of data science environments**
-+--------------+-----------------------+
-| Layer | Contents |
-+==============+=======================+
-| Packages | Python + R Packages |
-+--------------+-----------------------+
-| System | Python + R Language |
-| | Versions |
-| | |
-| | Other System |
-| | Libraries |
-| | |
-| | Operating System |
-+--------------+-----------------------+
-| Hardware | Virtual Hardware |
-| | |
-| | Physical Hardware |
-+--------------+-----------------------+
++-------------+--------------------------+
+| Layer | Contents |
++=============+==========================+
+| Packages | Python + R Packages |
++-------------+--------------------------+
+| System | Python + R Language |
+| | Versions |
+| | |
+| | Other System Libraries |
+| | |
+| | Operating System |
++-------------+--------------------------+
+| Hardware | Virtual Hardware |
+| | |
+| | Physical Hardware |
++-------------+--------------------------+
In an ideal world, the *hardware* and *system* layers should be the
responsibility of an IT/Admin. You may be responsible for them, but then
@@ -246,8 +244,7 @@ allowed. And when you load a package with `import` or `library`, it
searches the directories from `sys.path` or `.libPaths()` and returns
the package when it finds it.
-![](images/pkg-libs.png){fig-alt="A diagram of package libraries. An import or library statement triggers a call to sys.path or .libPaths(), which returns directories to search. That returns a package to the session."
-width="600"}
+![](images/pkg-libs.png){fig-alt="A diagram of package libraries. An import or library statement triggers a call to sys.path or .libPaths(), which returns directories to search. That returns a package to the session."}
Each library can contain, at most, one version of any package. So order
matters for the directories in `sys.path` or `.libPaths()`. Whatever
@@ -330,9 +327,47 @@ Before you add code, create and activate an `{renv}` environment with
Now, go ahead and do your analysis. Here's the contents of my `eda.qmd`.
-``` {.markdown filename="eda.qmd" include="../../_labs/eda/eda-basic.qmd"}
+```` {.markdown filename="eda.qmd"}
+---
+title: "Penguins EDA"
+format:
+ html:
+ code-fold: true
+---
+
+## Penguin Size and Mass by Sex and Species
+
+```{r}
+library(palmerpenguins)
+library(dplyr)
+library(ggplot2)
+
+df <- palmerpenguins::penguins
```
+```{r}
+df %>%
+  group_by(species, sex) %>%
+  summarise(
+    across(
+      where(is.numeric),
+      \(x) mean(x, na.rm = TRUE)
+    )
+  ) %>%
+  knitr::kable()
+```
+
+## Penguin Size vs Mass by Species
+
+```{r}
+df %>%
+  ggplot(aes(x = bill_length_mm, y = body_mass_g, color = species)) +
+  geom_point() +
+  geom_smooth(method = "lm")
+```
+
+````
+
Feel free to copy this Quarto doc into your website or to write your
own.
@@ -352,9 +387,50 @@ before you start `pip install`-ing packages.
Here's what's in my `model.qmd`, but you should feel free to include
whatever you want.
-``` {.yml filename="model.qmd" include="../../_labs/model/model-basic-py.qmd"}
+```` {.markdown filename="model.qmd"}
+---
+title: "Model"
+format:
+  html:
+    code-fold: true
+---
+
+```{python}
+from palmerpenguins import penguins
+from pandas import get_dummies
+import numpy as np
+from sklearn.linear_model import LinearRegression
+from sklearn import preprocessing
```
+## Get Data
+
+```{python}
+df = penguins.load_penguins().dropna()
+
+df.head(3)
+```
+
+## Define Model and Fit
+
+```{python}
+X = get_dummies(df[['bill_length_mm', 'species', 'sex']], drop_first = True)
+y = df['body_mass_g']
+
+model = LinearRegression().fit(X, y)
+```
+
+## Get some information
+
+```{python}
+print(f"R^2 {model.score(X,y)}")
+print(f"Intercept {model.intercept_}")
+print(f"Columns {X.columns}")
+print(f"Coefficients {model.coef_}")
+```
+
+````
+
Once you're happy with how the page works, capture your dependencies in
a `requirements.txt` using `pip freeze > requirements.txt` on the
command line.
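+
+If it's helpful to see that workflow end-to-end, here's a minimal
+sketch. The environment name `.venv` and the package list are just
+examples; use whatever your project needs.
+
+``` {.bash filename="Terminal"}
+# Create and activate a virtual environment
+python -m venv .venv
+source .venv/bin/activate
+
+# Install what the page needs, then snapshot the environment
+python -m pip install palmerpenguins pandas scikit-learn
+pip freeze > requirements.txt
+```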
diff --git a/chapters/sec1/1-2-proj-arch.qmd b/chapters/sec1/1-2-proj-arch.qmd
index f309bda3..ecd271d5 100644
--- a/chapters/sec1/1-2-proj-arch.qmd
+++ b/chapters/sec1/1-2-proj-arch.qmd
@@ -122,7 +122,7 @@ This flow chart illustrates how I decide which of the four types to
build.
![](images/presentation-layer.png){fig-alt="A flow chart of choosing an App, Report, API, or Job for the presentation layer as described in this section."
-width="600"}
+}
## Do less in the presentation layer
@@ -521,7 +521,7 @@ You may want to annotate your data flow charts with other attributes
like data types, update frequencies, and where data objects live.
![](images/data-flow-chart.png){fig-alt="A data flow chart showing how the palmer penguins data flows into the model creation API, which creates the model and the model creation report. The model serving API uses the model and feeds the model explorer API."
-width="600"}
+}
## Comprehension Questions
@@ -556,7 +556,9 @@ deploy, and monitor a machine learning model.
We can take our existing model, turn it into a `{vetiver}` model, and
save it to the `/data/model` folder with
-``` {.python include="../../_labs/model/model-vetiver.qmd" filename="model.qmd" start-line="45" end-line="51"}
+``` {.python filename="model.qmd"}
+from vetiver import VetiverModel
+v = VetiverModel(model, model_name='penguin_model', prototype_data=X)
```
If `/data/model` doesn't exist on your machine, you can create it, or
diff --git a/chapters/sec1/1-3-data-access.qmd b/chapters/sec1/1-3-data-access.qmd
index 6c64f339..ed0e1925 100644
--- a/chapters/sec1/1-3-data-access.qmd
+++ b/chapters/sec1/1-3-data-access.qmd
@@ -1,4 +1,4 @@
-# Using databases and data APIs {#sec-data-access}
+# Databases and Data APIs {#sec-data-access}
Your job as a data scientist is to sift through a massive pile of data
to extract nuggets of real information -- and then use that information.
@@ -139,7 +139,7 @@ person viewing the content and pass those along. This last option is
much more complex than the other two.
![](images/whose-creds.png){fig-alt="A diagram showing that using project creator credentials is simplest, service account is still easy, and viewer is hard."
-width="600"}
+}
If you have to use the viewer's credentials for data access, you can
write code to collect them from the viewer and pass them along. I don't
@@ -190,7 +190,7 @@ our purposes, we're talking about `http`-based `REST`-ful APIs.
send a request to the API and it sends a response back.
![](images/req_resp.png){fig-alt="A client requesting puppy pictures from a server and getting back a 200-coded response with a puppy."
-width="600"}
+}
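+
+You can watch a request and response happen from the command line with
+`curl`. This example hits GitHub's public API; the particular fields in
+the response don't matter here.
+
+``` {.bash filename="Terminal"}
+# -i includes the status line and headers in the output
+curl -i https://api.github.com/users/octocat
+```
+
+The response comes back with a status code like `200 OK`, a set of
+headers, and a JSON body describing the user.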
The best way to learn about a new API is to read the documentation,
which will include many details about usage. Let's go through some of
@@ -468,7 +468,10 @@ time by the user, this code will just work.
In this lab, we will build the data and the presentation layers for our
penguin mass model exploration. We're going to create an app to explore
-the model, which will look like this: ![](images/penguin_app.png)
+the model, which will look like this:
+
+![](images/penguin_app.png){fig-alt="A screenshot of a simple Shiny app."
+}
Let's start by moving the data into an actual data layer.
@@ -481,12 +484,22 @@ To start, let's load the data.
Here's what that looks like in R:
-``` {.r include="../../_labs/data-load/r-data-load.R"}
+``` {.r}
+con <- DBI::dbConnect(duckdb::duckdb(), dbdir = "my-db.duckdb")
+DBI::dbWriteTable(con, "penguins", palmerpenguins::penguins)
+DBI::dbDisconnect(con)
```
Or equivalently, in Python:
-``` {.python include="../../_labs/data-load/py-data-load.py"}
+``` {.python}
+import duckdb
+from palmerpenguins import penguins
+
+con = duckdb.connect('my-db.duckdb')
+df = penguins.load_penguins()
+con.execute('CREATE TABLE penguins AS SELECT * FROM df')
+con.close()
```
Now that the data is loaded, let's adjust our scripts to use the
@@ -495,7 +508,12 @@ database.
In R, we will replace our data loading with connecting to the database.
Leaving out all the parts that don't change, it looks like
-``` {.r include="../../_labs/eda/eda-db.qmd" filename="eda.qmd" start-line="14" end-line="19"}
+``` {.r filename="eda.qmd"}
+con <- DBI::dbConnect(
+  duckdb::duckdb(),
+  dbdir = "my-db.duckdb"
+)
+df <- dplyr::tbl(con, "penguins")
```
We also need to call `DBI::dbDisconnect(con)` at the end of the
@@ -505,7 +523,17 @@ We don't have to change anything because we wrote our data processing
code in `{dplyr}`. Under the hood, `{dplyr}` can switch seamlessly to a
database backend, which is really cool.
-``` {.r include="../../_labs/eda/eda-db.qmd" filename="eda.qmd" start-line="23" end-line="32"}
+``` {.r filename="eda.qmd"}
+df %>%
+  group_by(species, sex) %>%
+  summarise(
+    across(
+      ends_with("mm") | ends_with("g"),
+      \(x) mean(x, na.rm = TRUE)
+    )
+  ) %>%
+  dplyr::collect() %>%
+  knitr::kable()
```
It's unnecessary, but I've added a call to `dplyr::collect()` in line
@@ -517,7 +545,10 @@ benefit a larger dataset.
In Python, we're just going to load the entire dataset into memory for
modeling, so the line loading the dataset changes to
-``` {.python include="../../_labs/model/model-db.qmd" filename="model.qmd" start-line="18" end-line="21"}
+``` {.python filename="model.qmd"}
+con = duckdb.connect('my-db.duckdb')
+df = con.execute("SELECT * FROM penguins").fetchdf().dropna()
+con.close()
```
Now let's switch to figuring out the connection we'll need to our
@@ -596,12 +627,137 @@ yourself.
Either way, an app that looks like the picture above would look like
this in Python
-``` {.python include="../../_labs/app/app-api.py" filename="app.py"}
+``` {.python filename="app.py"}
+from shiny import App, render, ui, reactive
+import requests
+
+api_url = 'http://127.0.0.1:8080/predict'
+
+app_ui = ui.page_fluid(
+    ui.panel_title("Penguin Mass Predictor"),
+    ui.layout_sidebar(
+        ui.panel_sidebar(
+            [ui.input_slider("bill_length", "Bill Length (mm)",
+                             30, 60, 45, step = 0.1),
+             ui.input_select("sex", "Sex", ["Male", "Female"]),
+             ui.input_select("species", "Species",
+                             ["Adelie", "Chinstrap", "Gentoo"]),
+             ui.input_action_button("predict", "Predict")]
+        ),
+        ui.panel_main(
+            ui.h2("Penguin Parameters"),
+            ui.output_text_verbatim("vals_out"),
+            ui.h2("Predicted Penguin Mass (g)"),
+            ui.output_text("pred_out")
+        )
+    )
+)
+
+def server(input, output, session):
+    @reactive.Calc
+    def vals():
+        # Keys must match the dummy-encoded columns the model was
+        # trained on (see the get_dummies() call in model.qmd)
+        d = {
+            "bill_length_mm": input.bill_length(),
+            "sex_male": input.sex() == "Male",
+            "species_Gentoo": input.species() == "Gentoo",
+            "species_Chinstrap": input.species() == "Chinstrap"
+        }
+        return d
+
+    @reactive.Calc
+    @reactive.event(input.predict)
+    def pred():
+        r = requests.post(api_url, json = vals())
+        return r.json().get('predict')[0]
+
+    @output
+    @render.text
+    def vals_out():
+        return f"{vals()}"
+
+    @output
+    @render.text
+    def pred_out():
+        return f"{round(pred())}"
+
+app = App(app_ui, server)
+
```
And like this in R
-``` {.python include="../../_labs/app/app-api.R" filename="app.R"}
+``` {.r filename="app.R"}
+library(shiny)
+
+api_url <- "http://127.0.0.1:8080/predict"
+
+ui <- fluidPage(
+  titlePanel("Penguin Mass Predictor"),
+
+  # Model input values
+  sidebarLayout(
+    sidebarPanel(
+      sliderInput(
+        "bill_length",
+        "Bill Length (mm)",
+        min = 30,
+        max = 60,
+        value = 45,
+        step = 0.1
+      ),
+      selectInput(
+        "sex",
+        "Sex",
+        c("Male", "Female")
+      ),
+      selectInput(
+        "species",
+        "Species",
+        c("Adelie", "Chinstrap", "Gentoo")
+      ),
+      # Get model predictions
+      actionButton(
+        "predict",
+        "Predict"
+      )
+    ),
+
+    mainPanel(
+      h2("Penguin Parameters"),
+      verbatimTextOutput("vals"),
+      h2("Predicted Penguin Mass (g)"),
+      textOutput("pred")
+    )
+  )
+)
+
+server <- function(input, output) {
+  # Input params
+  vals <- reactive(
+    list(
+      bill_length_mm = input$bill_length,
+      species_Chinstrap = input$species == "Chinstrap",
+      species_Gentoo = input$species == "Gentoo",
+      sex_male = input$sex == "Male"
+    )
+  )
+
+  # Fetch prediction from API
+  pred <- eventReactive(
+    input$predict,
+    httr2::request(api_url) |>
+      httr2::req_body_json(vals()) |>
+      httr2::req_perform() |>
+      httr2::resp_body_json(),
+    ignoreInit = TRUE
+  )
+
+  # Render to UI
+  output$pred <- renderText(pred()$predict[[1]])
+  output$vals <- renderPrint(vals())
+}
+
+# Run the application
+shinyApp(ui = ui, server = server)
```
Over the next few chapters, we will implement more architectural best
diff --git a/chapters/sec1/1-4-monitor-log.qmd b/chapters/sec1/1-4-monitor-log.qmd
index 714acb13..30dae2cb 100644
--- a/chapters/sec1/1-4-monitor-log.qmd
+++ b/chapters/sec1/1-4-monitor-log.qmd
@@ -114,7 +114,7 @@ When the log statement runs, it creates a *log entry*.
For example, here's what logging for an app starting up might look like
in Python
-```{python filename="app.py"}
+```{.python filename="app.py"}
import logging
# Configure the log object
@@ -129,7 +129,7 @@ logging.info("App Started")
And here's what that looks like using `{log4r}`
-```{r filename="app.R"}
+```{.r filename="app.R"}
# Configure the log object
log <- log4r::logger()
@@ -333,14 +333,96 @@ I decided to log when the app starts, just before and after each
request, and an error logger if an HTTP error code comes back from the
API.
-With the logging now added, here's what the app looks like in R:
-
-``` {.r include="../../_labs/app/app-log.R" filename="app.R"}
+This is what the server block of the app looks like now in R:
+
+``` {.r filename="app.R"}
+server <- function(input, output) {
+  log <- log4r::logger()
+  log4r::info(log, "App Started")
+  # Input params
+  vals <- reactive(
+    list(
+      bill_length_mm = input$bill_length,
+      species_Chinstrap = input$species == "Chinstrap",
+      species_Gentoo = input$species == "Gentoo",
+      sex_male = input$sex == "Male"
+    )
+  )
+
+  # Fetch prediction from API
+  pred <- eventReactive(
+    input$predict,
+    {
+      log4r::info(log, "Prediction Requested")
+      r <- httr2::request(api_url) |>
+        httr2::req_body_json(vals()) |>
+        httr2::req_perform()
+      log4r::info(log, "Prediction Returned")
+
+      if (httr2::resp_is_error(r)) {
+        log4r::error(log, "HTTP Error")
+      }
+
+      httr2::resp_body_json(r)
+    },
+    ignoreInit = TRUE
+  )
+
+  # Render to UI
+  output$pred <- renderText(pred()$predict[[1]])
+  output$vals <- renderPrint(vals())
+}
```
-And in Python:
+At the outset of the Python app, I've added:
+
+``` {.python filename="app.py"}
+import logging
+
+logging.basicConfig(
+    format='%(asctime)s - %(message)s',
+    level=logging.INFO
+)
+```
-``` {.python include="../../_labs/app/app-log.py" filename="app.py"}
+And the server block now looks like:
+
+``` {.python filename="app.py"}
+def server(input, output, session):
+    logging.info("App start")
+
+    @reactive.Calc
+    def vals():
+        d = {
+            "bill_length_mm": input.bill_length(),
+            "sex_male": input.sex() == "Male",
+            "species_Gentoo": input.species() == "Gentoo",
+            "species_Chinstrap": input.species() == "Chinstrap"
+        }
+        return d
+
+    @reactive.Calc
+    @reactive.event(input.predict)
+    def pred():
+        logging.info("Request Made")
+        r = requests.post(api_url, json = vals())
+        logging.info("Request Returned")
+
+        if r.status_code != 200:
+            logging.error("HTTP error returned")
+
+        return r.json().get('predict')[0]
+
+    @output
+    @render.text
+    def vals_out():
+        return f"{vals()}"
+
+    @output
+    @render.text
+    def pred_out():
+        return f"{round(pred())}"
```
Now, if you load up this app locally, you can see the logs of what's
diff --git a/chapters/sec1/1-5-deployments.qmd b/chapters/sec1/1-5-deployments.qmd
index f5057761..28d77cf8 100644
--- a/chapters/sec1/1-5-deployments.qmd
+++ b/chapters/sec1/1-5-deployments.qmd
@@ -1,4 +1,4 @@
-# Deployments and code promotion {#sec-deployments}
+# Deployments and Code Promotion {#sec-deployments}
Your work doesn't matter if it never leaves your computer. You want your
work to be useful, and it only becomes useful if you share it with the
@@ -43,7 +43,7 @@ test are collectively called the *lower environments* and prod the
*higher environment*.
![](images/dev-test-prod.png){fig-alt="An app moving through dev, test, and prod environments."
-width="600"}
+}
While the dev/test/prod triad is the most traditional, some
organizations have more than two lower environments and some have only
@@ -205,7 +205,7 @@ you were satisfied.
Here's what the Git graph for that sequence of events might look like:
![](images/git-branches.png){fig-alt="Diagram showing branching strategy. A feature branch called new plot is created from and then merged back to test. A bug is revealed, so another commit fixing the bug is merged into test and then into main."
-width="600"}
+}
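+
+If it helps to see that sequence as commands, here's roughly what it
+might look like; the branch name is taken from the diagram and is just
+an example.
+
+``` {.bash filename="Terminal"}
+# Branch off of test to build the feature
+git checkout test
+git checkout -b new-plot
+
+# ...work happens, then the feature merges back to test
+git checkout test
+git merge new-plot
+
+# After a commit fixing the bug revealed in test, promote to main
+git checkout main
+git merge test
+```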
One of the tenets of a good CI/CD practice is that changes are merged
frequently and incrementally into production.
@@ -277,7 +277,7 @@ For example, let's say you have a project that should use a special
read-only database in dev and switch to writing in the prod database in
prod. You might write the config file below to describe this behavior:
-```{yaml filename="config.yml"}
+```{.yaml filename="config.yml"}
dev:
+  write: false
+  db-path: dev-db
@@ -346,7 +346,32 @@ Following those instructions will accomplish three things for you:
Here's the basic GitHub Actions file (or close to it) that the process
will auto-generate for you.
-``` {.yaml include="../../_labs/gha/publish-basic.yml" filename=".github/workflows/publish.yml"}
+``` {.yaml filename=".github/workflows/publish.yml"}
+on:
+  workflow_dispatch:
+  push:
+    branches: main
+
+name: Quarto Publish
+
+jobs:
+  build-deploy:
+    runs-on: ubuntu-latest
+    permissions:
+      contents: write
+    steps:
+      - name: Check out repository
+        uses: actions/checkout@v2
+
+      - name: Set up Quarto
+        uses: quarto-dev/quarto-actions/setup@v2
+
+      - name: Render and Publish
+        uses: quarto-dev/quarto-actions/publish@v2
+        with:
+          target: gh-pages
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```
Like all GitHub Actions, this action is defined in a `.yml` file in the
@@ -394,7 +419,19 @@ code working in CI/CD you **should not** freeze your environment.
First, add the commands to install R, `{renv}`, and the packages for
your content to the GitHub Actions workflow.
-``` {.yml filename=".github/workflows/publish.yml" include="../../_labs/gha/publish-r-py.yml" start-line="20" end-line="31"}
+``` {.yml filename=".github/workflows/publish.yml"}
+      - name: Install R
+        uses: r-lib/actions/setup-r@v2
+        with:
+          r-version: '4.2.0'
+          use-public-rspm: true
+
+      - name: Setup renv and install packages
+        uses: r-lib/actions/setup-renv@v2
+        with:
+          cache-version: 1
+        env:
+          RENV_CONFIG_REPOS_OVERRIDE: https://packagemanager.rstudio.com/all/latest
```
::: callout-note
@@ -409,7 +446,14 @@ Posit Package Manager, which does have Linux binaries.
You'll also need to add a workflow to GitHub Actions to install Python
and the necessary Python packages from the `requirements.txt`.
-``` {.yml filename=".github/workflows/publish.yml" include="../../_labs/gha/publish-r-py.yml" start-line="33" end-line="39"}
+``` {.yml filename=".github/workflows/publish.yml"}
+      - name: Install Python and Dependencies
+        uses: actions/setup-python@v4
+        with:
+          python-version: '3.10'
+          cache: 'pip'
+      - run: pip install jupyter
+      - run: pip install -r requirements.txt
```
Note that we run the Python environment restore commands with `run`
diff --git a/chapters/sec1/1-6-docker.qmd b/chapters/sec1/1-6-docker.qmd
index c4481413..c47e0062 100644
--- a/chapters/sec1/1-6-docker.qmd
+++ b/chapters/sec1/1-6-docker.qmd
@@ -103,7 +103,7 @@ The graphic below shows the different states for a container and the CLI
commands to move from one to another.
![](images/docker-lifecycle.png){fig-alt="A diagram. A dockerfile turns into a Docker Image with docker build. The image can push or pull to or from an image registry. The image can run as a container instance."
-width="600"}
+}
::: callout-note
I've included `docker pull` on the graphic for completeness, but you'll
@@ -196,8 +196,7 @@ docker run -v /home/alex/data:/data
Here's a diagram of how this works.
-![](images/docker-on-host.png){fig-alt="A diagram showing how the flag -v /home/data:/data mounts the /home/data directory of the host to /data in the container."
-width="450"}
+![](images/docker-on-host.png){fig-alt="A diagram showing how the flag -v /home/data:/data mounts the /home/data directory of the host to /data in the container."}
Similarly, if you have a service running in a container on a particular
port, you'll need to map the container port to a host port with the `-p`
@@ -259,7 +258,7 @@ Every Dockerfile command defines a new *layer*. A great feature of
Docker is that it only rebuilds the layers it needs to when you make
changes. For example, take the following Dockerfile:
-``` {.dockerfile Filename="Dockerfile"}
+``` {.dockerfile filename="Dockerfile"}
FROM ubuntu:latest
COPY my-data.csv /data/data.csv
@@ -304,7 +303,28 @@ at the package documentation for details.
Once you've generated your Dockerfile, take a look at it. Here's the one
for my model:
-``` {.dockerfile include="../../_labs/docker/docker-local/docker/Dockerfile" filename="Dockerfile"}
+``` {.dockerfile filename="Dockerfile"}
+# # Generated by the vetiver package; edit with care
+# start with python base image
+FROM python:3.9
+
+# create directory in container for vetiver files
+WORKDIR /vetiver
+
+# copy and install requirements
+COPY vetiver_requirements.txt /vetiver/requirements.txt
+
+# install the requirements
+RUN pip install --no-cache-dir --upgrade -r /vetiver/requirements.txt
+
+# copy app file
+COPY app.py /vetiver/app/app.py
+
+# expose port
+EXPOSE 8080
+
+# run vetiver API
+CMD ["uvicorn", "app.app:api", "--host", "0.0.0.0", "--port", "8080"]
```
This auto-generated Dockerfile is nicely commented so it's easy to
@@ -321,7 +341,7 @@ Now build the container using `docker build -t penguin-model .`.
You can run the container using
-``` {.bash eval="false"}
+``` {.bash filename="Terminal"}
docker run --rm -d \
-p 8080:8080 \
--name penguin-model \
@@ -335,7 +355,17 @@ command) you'll get some feedback that might be helpful.
In line 15 of the Dockerfile, we copy `app.py` into the container.
Let's look at that file to see if we can find any hints.
-``` {.python include="../../_labs/docker/docker-local/docker/app.py" filename="app.py"}
+``` {.python filename="app.py"}
+from vetiver import VetiverModel
+import vetiver
+import pins
+
+
+b = pins.board_folder('./model', allow_pickle_read=True)
+v = VetiverModel.from_pin(b, 'penguin_model', version = '20230422T102952Z-cb1f9')
+
+vetiver_api = vetiver.VetiverAPI(v)
+api = vetiver_api.app
```
Look at that (very long) line 6. The API is connecting to a local
diff --git a/chapters/sec2/2-1-cloud.qmd b/chapters/sec2/2-1-cloud.qmd
index a365034a..c85cb606 100644
--- a/chapters/sec2/2-1-cloud.qmd
+++ b/chapters/sec2/2-1-cloud.qmd
@@ -527,7 +527,12 @@ model to S3 is easy by changing the `{vetiver}` board type to
It'll look something like this.
-``` {.python include="../../_labs/model/model-vetiver-s3.qmd" filename="model.qmd" start-line="64" end-line="68"}
+``` {.python filename="model.qmd"}
+from pins import board_s3
+from vetiver import vetiver_pin_write
+
+board = board_s3("do4ds-lab", allow_pickle_read=True)
+vetiver_pin_write(board, v)
```
Under the hood, `{vetiver}` uses standard R and Python tooling to access
@@ -547,7 +552,16 @@ than the local folder.
Now, the script to build the `Dockerfile` looks like this:
-``` {.python include="../../_labs/docker/docker-s3/build-docker-s3.qmd" start_line="11" end_line="19"}
+``` {.python}
+from dotenv import load_dotenv
+
+load_dotenv()
+
+from pins import board_s3
+from vetiver import vetiver_prepare_docker
+
+board = board_s3("do4ds-lab", allow_pickle_read=True)
+vetiver_prepare_docker(board, "penguin_model")
```
### Step 4: Give GitHub Actions S3 credentials
@@ -562,7 +576,12 @@ of the Action.
Once you're done, that section of the `publish.yml` should look
something like this.
-``` {.yaml include="../../_labs/gha/publish-s3.yml" filename=".github/workflows/publish.yml" start_line="45" end_line="49"}
+``` {.yaml filename=".github/workflows/publish.yml"}
+ env:
+ GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+ AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
+ AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
+ AWS_REGION: us-east-1
```
Now, unlike the `GITHUB_TOKEN` secret, which GitHub Actions
diff --git a/chapters/sec2/2-2-cmd-line.qmd b/chapters/sec2/2-2-cmd-line.qmd
index e74e574c..0db35a52 100644
--- a/chapters/sec2/2-2-cmd-line.qmd
+++ b/chapters/sec2/2-2-cmd-line.qmd
@@ -1,4 +1,4 @@
-# Using the command line {#sec-cmd-line}
+# The Command Line {#sec-cmd-line}
Interacting with your personal computer or phone happens via taps and
clicks, opening applications, and navigating tabs and windows. But
@@ -220,7 +220,7 @@ request. The remote host verifies the private key with the public key
and opens an encrypted connection.
![](images/ssh.png){fig-alt="A diagram of SSH initialization. The local host sends the private key, the remote checks against the public key, and then opens the session."
-width="450"}
+}
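+
+As a preview, the day-to-day mechanics look something like the sketch
+below. The key type and the username and address are examples, not
+requirements.
+
+``` {.bash filename="Terminal"}
+# Generate a keypair on your machine; the private key never leaves it
+ssh-keygen -t ed25519
+
+# After the public key is added to ~/.ssh/authorized_keys on the server
+ssh test-user@server.example.com
+```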
It can be hard to remember how to configure SSH. So let's detour into
*public key cryptography*, the underlying technology. Once you've built
diff --git a/chapters/sec2/2-3-linux.qmd b/chapters/sec2/2-3-linux.qmd
index 158835fa..093c7f88 100644
--- a/chapters/sec2/2-3-linux.qmd
+++ b/chapters/sec2/2-3-linux.qmd
@@ -1,4 +1,4 @@
-# Intro to Linux Administration {#sec-linux}
+# Linux Administration {#sec-linux}
You're accustomed to interacting with a computer and phone running
MacOS, Windows, iOS, or Android. But most servers don't run any of
@@ -248,7 +248,7 @@ my MacBook, I'm a member of several groups, with the primary group
``` {.bash filename="Terminal"}
> id
-uid=501(alexkgold) gid=20(staff) groups=20(staff),12(everyone),61(localaccounts),79(_appserverusr),80(admin),81(_appserveradm),98(_lpadmin),701(com.apple.sharepoint.group.1),33(_appstore),100(_lpoperator),204(_developer),250(_analyticsusers),395(com.apple.access_ftp),398(com.apple.access_screensharing),400(com.apple.access_remote_ae)
+uid=501(alexkgold) gid=20(staff) groups=20(staff),12(everyone), ...
```
If you ever need to add users to a server, the easiest way is with the
@@ -289,8 +289,7 @@ For example, here's a set of permissions that you might have for a
program that you wanted anyone to be able to run, group members to
inspect, and only the owner to change.
-![](./images/perms-ex.png){fig-alt="A 3x3 grid read, write, execute, on one side and owner, owning group, and everyone else at the top. Green checks in all of the execute, write for the owner, and read for owner and owning group. Red xs everywhere else."
-width="400"}
+![](./images/perms-ex.png){fig-alt="A 3x3 grid read, write, execute, on one side and owner, owning group, and everyone else at the top. Green checks in all of the execute, write for the owner, and read for owner and owning group. Red xs everywhere else."}
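+
+In octal notation, that grid works out to `751`: read + write + execute
+(7) for the owner, read + execute (5) for the group, and execute only
+(1) for everyone else. You could set it like this (the filename is just
+an example):
+
+``` {.bash filename="Terminal"}
+chmod 751 run-report.sh
+```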
Directories also have permissions -- read allows the user to see what's
in the directory, write allows the user to alter what's in the
@@ -413,8 +412,7 @@ sub-directory of `/` or...you get the picture. The `/home/alex` *file
path* defines a particular location, the `alex` sub-directory of
`/home`, itself a sub-directory of the root directory, `/`.
-![](images/directories.png){fig-alt="A tree of directories. / is the root, /home is a sub directory, /home/alex is a sub-sub-directory, and /etc is another sub-directory."
-width="600"}
+![](images/directories.png){fig-alt="A tree of directories. / is the root, /home is a sub directory, /home/alex is a sub-sub-directory, and /etc is another sub-directory."}
::: callout-tip
It's never necessary, but viewing the tree-like layout for a directory
diff --git a/chapters/sec2/2-4-app-admin.qmd b/chapters/sec2/2-4-app-admin.qmd
index 6a7acf6b..63dae9e5 100644
--- a/chapters/sec2/2-4-app-admin.qmd
+++ b/chapters/sec2/2-4-app-admin.qmd
@@ -1,4 +1,4 @@
-# Application administration {#sec-app-admin}
+# Application Administration {#sec-app-admin}
The last few chapters have focused on how to run a Linux server. But you
don't care about running a Linux server -- you care about doing data
diff --git a/chapters/sec2/2-5-scale.qmd b/chapters/sec2/2-5-scale.qmd
index 1c9dc23b..38e94197 100644
--- a/chapters/sec2/2-5-scale.qmd
+++ b/chapters/sec2/2-5-scale.qmd
@@ -294,16 +294,19 @@ system at any given time. The `top` command is a good first stop. `top`
shows information about the processes consuming the most CPU in
real-time.
-Here's the `top` output from my machine as I write this sentence.
+Here's the `top` output from my machine as I write this
+sentence.[^2-5-scale-4]
+
+[^2-5-scale-4]: I've cut out a few columns for readability.
``` {.bash filename="Terminal"}
-PID COMMAND %CPU TIME #TH #WQ #PORT MEM PURG CMPRS PGRP
-0 kernel_task 16.1 03:56:53 530/10 0 0 2272K 0B 0B 0
-16329 WindowServer 16.0 01:53:20 23 6 3717 941M- 16M+ 124M 16329
-24484 iTerm2 11.3 00:38.20 5 2 266- 71M- 128K 18M- 24484
-29519 top 9.7 00:04.30 1/1 0 36 9729K 0B 0B 29519
-16795 Magnet 3.1 00:39.16 3 1 206 82M 0B 39M 16795
-16934 Arc 1.8 18:18.49 45 6 938 310M 144K 61M 16934
+PID    COMMAND       %CPU  TIME      #PORT  MEM
+0      kernel_task   16.1  03:56:53  0      2272K
+16329  WindowServer  16.0  01:53:20  3717   941M-
+24484  iTerm2        11.3  00:38.20  266-   71M-
+29519  top           9.7   00:04.30  36     9729K
+16795  Magnet        3.1   00:39.16  206    82M
+16934  Arc           1.8   18:18.49  938    310M
```
In most instances, the first three columns are the most useful. The
@@ -340,16 +343,16 @@ resources than you intended. If you have some sense of the name or who
started it, you may want to pipe the output of `ps aux` into `grep` to
find the `pid`.
-For example, I might run `ps aux | grep RStudio` to get[^2-5-scale-4]
+For example, I might run `ps aux | grep RStudio` to get[^2-5-scale-5]
-[^2-5-scale-4]: I've done a bunch of doctoring to the output to make it
+[^2-5-scale-5]: I've done a bunch of doctoring to the output to make it
easier to read.
``` {.bash filename="Terminal"}
> ps aux | grep RStudio
USER PID %CPU %MEM STARTED TIME COMMAND
-alexkgold 23583 0.9 1.7 Sat09AM 17:15.27 /Applications/RStudio.app/RStudio
-alexkgold 23605 0.5 0.4 Sat09AM 1:58.16 /Applications/RStudio.app/rsession
+alexkgold 23583 0.9 1.7 Sat09AM 17:15.27 RStudio
+alexkgold 23605 0.5 0.4 Sat09AM 1:58.16 rsession
```
RStudio is behaving nicely on my machine, but if it were not responsive,
@@ -364,14 +367,14 @@ you've got: `du` and `df`. These commands are almost always used with
the `-h` flag to put file sizes in human-readable formats.
`df` (disk free) shows the capacity left on the device where the
-directory sits. For example, here's the result of running the `df`
-command on the chapters directory on my laptop that includes this
-chapter.
+directory sits. For example, here's the first few columns from running
+the `df` command on the chapters directory on my laptop that includes
+this chapter.
``` {.bash filename="Terminal"}
> df -h chapters
-Filesystem Size Used Avail Capacity iused ifree %iused Mounted on
-/dev/disk3s5 926Gi 227Gi 686Gi 25% 1496100 7188673280 0% /System/Volumes/Data
+Filesystem    Size   Used   Avail  Capacity
+/dev/disk3s5  926Gi  227Gi  686Gi  25%
```
You can see that the `chapters` folder lives on a disk called
@@ -453,7 +456,7 @@ instances are optimized for different kinds of workloads.
Here's a table of common instance types for data science purposes:
| Instance Type | What it is |
-|--------------------|----------------------------------------------------|
+|---------------------|---------------------------------------------------|
| `t3` | The "standard" configuration. Relatively cheap. Sizes may be limited. |
| `C` | CPU-optimized instances, aka faster CPUs. |
| `R` | Higher ratio of RAM to CPU relative to `t3`. |
@@ -535,13 +538,11 @@ before and after the switch reveals that I've got more RAM:
```
test-user@ip-172-31-53-181:~$ free -h
- total used free shared buff/cache available
-Mem: 966Mi 412Mi 215Mi 0.0Ki 338Mi 404Mi
-Swap: 0B 0B 0B
+       total  used   free   available
+Mem:   966Mi  412Mi  215Mi  404Mi
test-user@ip-172-31-53-181:~$ free -h
- total used free shared buff/cache available
-Mem: 1.9Gi 225Mi 1.3Gi 0.0Ki 447Mi 1.6Gi
-Swap: 0B 0B 0B
+       total  used   free   available
+Mem:   1.9Gi  412Mi  1.4Gi  1.6Gi
```
There's twice as much after the change!
diff --git a/chapters/sec2/2-6-networking.qmd b/chapters/sec2/2-6-networking.qmd
index 11435cc0..88c23892 100644
--- a/chapters/sec2/2-6-networking.qmd
+++ b/chapters/sec2/2-6-networking.qmd
@@ -216,8 +216,7 @@ go to your service and it hangs with no response before eventually
timing out.
:::
-![](images/firewall-proxy.png){fig-alt="Traffic coming through firewall only on port 80. Proxy routes /jupyter to port 8000 on server and /rstudio to port 8787."
-width="600"}
+![](images/firewall-proxy.png){fig-alt="Traffic coming through firewall only on port 80. Proxy routes /jupyter to port 8000 on server and /rstudio to port 8787."}
We've been talking exclusively about HTTP and HTTPS traffic arriving on
$80$ and $443$, because web traffic arrives as a series of HTTP `GET`
@@ -412,21 +411,18 @@ and RStudio Server:
2. Install NGINX with `sudo apt install nginx`.
3. Save a backup of the default `nginx.conf`,
`cp /etc/nginx/nginx.conf /etc/nginx/nginx-backup.conf`.[^2-6-networking-7]
-4. Edit the NGINX configuration with `sudo vim /etc/nginx/nginx.conf`
- and replace it with:
+4. Edit the NGINX configuration with `sudo vim /etc/nginx/nginx.conf`.
+    There is an example in the Git repo for this book under
+    `_labs/server-config`.
+5. Test that your configuration is valid with `sudo nginx -t`.
+6. Start NGINX with `sudo systemctl start nginx`. If you see nothing,
+    all is well.
[^2-6-networking-7]: This is generally a good practice before you start
messing with config files. Bad configuration is usually preferable
to a service that can't start at all because you've messed up the
config so badly. It happens.
-``` {.bash include="../../_labs/server-config/http-nginx.conf" filename="/etc/nginx/nginx.conf"}
-```
-
-4. Test that your configuration is valid `sudo nginx -t`.
-5. Start NGINX with `sudo systemctl start nginx`. If you see nothing
- all is well.
-
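+
+If you just want a sense of the shape of that config before opening the
+repo, here's a minimal sketch of a path-based reverse proxy. The real
+config in `_labs/server-config` handles additional details, like the
+headers RStudio Server and JupyterHub need behind a proxy.
+
+``` {.bash filename="/etc/nginx/nginx.conf"}
+events {}
+
+http {
+  server {
+    listen 80;
+
+    # Route /rstudio to RStudio Server on port 8787
+    location /rstudio/ {
+      proxy_pass http://localhost:8787/;
+    }
+
+    # Route /jupyter to JupyterHub on port 8000
+    location /jupyter/ {
+      proxy_pass http://localhost:8000/;
+    }
+  }
+}
+```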
If you need to change anything, update the config and then restart with
`sudo systemctl restart nginx`.
@@ -523,3 +519,4 @@ good results.
One thing to consider is whether the model API should be publicly
accessible at all. If the only thing calling it is the Shiny app, maybe
it shouldn't be.
+
diff --git a/chapters/sec2/2-7-dns.qmd b/chapters/sec2/2-7-dns.qmd
index 2be57696..6fac937a 100644
--- a/chapters/sec2/2-7-dns.qmd
+++ b/chapters/sec2/2-7-dns.qmd
@@ -1,4 +1,4 @@
-# DNS is for human-readable addresses {#sec-dns}
+# Domains and DNS {#sec-dns}
In [Chapter @sec-networking] you learned that IP Addresses are where a
host actually lives on a computer network. But you've been using the
@@ -34,7 +34,7 @@ purchasing a domain, you register the association between your domain
and the IP Address with the DNS nameservers so users can look them up.
![](images/dns_resolution.png){fig-alt="An image of the user querying a DNS nameserver for example.com and getting back an IP address."
-width="600"}
+}
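+
+You can watch this lookup happen with `dig` (or `nslookup`) from the
+command line:
+
+``` {.bash filename="Terminal"}
+dig +short do4ds.com
+```
+
+The response is the IP Address (or chain of records) the domain
+currently resolves to.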
A complete domain is called a *Fully-Qualified Domain Name (FQDN)* and
consists of three parts:
@@ -305,5 +305,5 @@ get loaded in automatically. I want the app on the landing page of my
site, `index.qmd`. So I've added a block that looks like:
``` {.html filename="index.qmd"}
-
+
```
diff --git a/chapters/sec2/2-8-ssl.qmd b/chapters/sec2/2-8-ssl.qmd
index 90282454..57ee9c8c 100644
--- a/chapters/sec2/2-8-ssl.qmd
+++ b/chapters/sec2/2-8-ssl.qmd
@@ -1,4 +1,4 @@
-# You should use SSL/HTTPS {#sec-ssl}
+# SSL/TLS and HTTPS {#sec-ssl}
In [Chapter @sec-networking], I used the analogy of putting a letter in
the mail for sending HTTP traffic over the web. But there's more to it.
@@ -106,7 +106,7 @@ sending real data, now encrypted securely inside a digital envelope.
work the same in both directions.
![](images/ssl.png){fig-alt="SSL initialization. 1 client request, 2 public key sent by server, 3 validate key against CA store, 4 establish session w/ session keys."
-width="600"}
+}
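+
+If you want to see this exchange for yourself, `openssl` will show you
+the certificate a server presents during the handshake:
+
+``` {.bash filename="Terminal"}
+openssl s_client -connect do4ds.com:443 < /dev/null
+```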
## Getting and using SSL certificates
diff --git a/chapters/sec3/3-0-sec-intro.qmd b/chapters/sec3/3-0-sec-intro.qmd
index 49de9969..893f3490 100644
--- a/chapters/sec3/3-0-sec-intro.qmd
+++ b/chapters/sec3/3-0-sec-intro.qmd
@@ -90,7 +90,7 @@ way down the list, is that users don't have a bad experience using the
environment.
![](images/it_hierarchy.png){fig-alt="A hierarchy of IT/Admin concerns. From biggest to smallest: data theft, resource hijacking, data loss, lost time, incorrect work, good user experience."
-width="600"}
+}
## Enterprise tools and techniques
diff --git a/chapters/sec3/3-1-ent-networks.qmd b/chapters/sec3/3-1-ent-networks.qmd
index 8aa59cc1..18b5ef58 100644
--- a/chapters/sec3/3-1-ent-networks.qmd
+++ b/chapters/sec3/3-1-ent-networks.qmd
@@ -60,7 +60,7 @@ of these servers to be more available than needed. Providing precisely
the right level of networking access isn't a trivial undertaking.
![](images/private_network.png){fig-alt="A picture of traffic coming into a private network from laptops going to a workbench. There's a connection from the workbench to a database and package repository, but only to there."
-width="600"}
+}
Those are just the servers for actually doing work. Enterprise networks
also include various devices that control the network traffic itself.
@@ -130,7 +130,7 @@ servers live.[^3-1-ent-networks-2]
be resilient to failures in one availability zone.
![](images/subnets.png){fig-alt="A private network where people come to public subnet with HTTP proxy and Bastion Host. Access to Work Nodes in Private Subnet is only from Public Subnet."
-width="600"}
+}
Aside from the security benefits, putting the important servers in the
private subnet is also more convenient because the IT/Admins can use
@@ -168,7 +168,7 @@ easier to remember and IT/Admins always understand what I mean.
:::
![](images/proxy-dir.png){fig-alt="Inbound/Reverse proxies handle traffic into the private network. Outbound/Forward proxies handle traffic going out."
-width="600"}
+}
The first step in debugging networking issues is to ask whether one or
more proxies might be in the middle. You can jumpstart that discussion
diff --git a/chapters/sec3/3-2-auth.qmd b/chapters/sec3/3-2-auth.qmd
index 370e3f51..a569934f 100644
--- a/chapters/sec3/3-2-auth.qmd
+++ b/chapters/sec3/3-2-auth.qmd
@@ -56,7 +56,7 @@ process is called *authorization (authz)*. The combination of authn and
authz comprise auth.
![](images/auth.png){fig-alt="Authentication -- someone proving who they are with an ID card. Authorization -- someone asking if they can come in and a list being consulted."
-width="600"}
+}
Many organizations start simply. They add services one at a time and
allow each to use built-in functionality to issue service-specific
@@ -64,7 +64,7 @@ usernames and passwords to users. This would be similar to posting a
guard at each room's door to create a unique credential for each user.
![](images/simple_auth.png){fig-alt="A user logging into 3 different services with 3 different usernames and passwords."
-width="600"}
+}
This quickly becomes a mess for everyone. It's bad for users because
they either need to keep many credentials straight or reuse the same
@@ -91,7 +91,7 @@ the rooms are using a similar level of security and if credentials are
compromised, it's easy to swap them out.
![](images/ldap-ad.png){fig-alt="A user logging into different services with the same username and password with an LDAP/AD server in the back."
-width="600"}
+}
LDAP/AD also provides a straightforward way to create Linux users with a
home directory, so it's often used in data science workbench contexts
@@ -149,7 +149,7 @@ send a request to the central security office, where the room can be
remotely unlocked if the request is approved.
![](images/sso.png){fig-alt="A user getting an SSO token, which they use to log in to each service."
-width="600"}
+}
SSO isn't a technology. It describes a user and admin experience almost
always accomplished through a standalone identity provider like Okta,
@@ -213,7 +213,7 @@ means "passthrough" is a misnomer, and a much more complicated exchange
occurs.
![](images/passthrough_auth.png){fig-alt="The user logs into the data science platform with an SSO token and then can automatically access the data source with the proper token."
-width="600"}
+}
OAuth and IAM are quickly becoming industry standards for accessing data
sources, but automated handoffs for every combination of SSO technology,
diff --git a/chapters/sec3/3-3-ent-scale.qmd b/chapters/sec3/3-3-ent-scale.qmd
index 3d7251bb..5a801e5d 100644
--- a/chapters/sec3/3-3-ent-scale.qmd
+++ b/chapters/sec3/3-3-ent-scale.qmd
@@ -1,4 +1,4 @@
-# Compute at enterprise scale {#sec-ent-scale}
+# Compute at Enterprise Scale {#sec-ent-scale}
Many enterprise data science platforms have requirements that quickly
outstrip the capacity of a modest server. It's common to have way more
@@ -85,8 +85,7 @@ working on data science projects in Dev and Test within the Prod
IT/Admin environment. Ultimately, the goal is to create an extremely
reliable Prod-Prod environment.
-![](images/dev-test-prod.png){fig-alt="The IT/Admin promotes the complete staging environment, then the data scientist or IT/Admin promote within Prod."
-width="600"}
+![](images/dev-test-prod.png){fig-alt="The IT/Admin promotes the complete staging environment, then the data scientist or IT/Admin promote within Prod."}
In enterprises, moves from staging to prod, including upgrades to
applications or operating systems or adding system libraries have rules
@@ -174,8 +173,7 @@ doesn't stay on the nodes. Instead, it lives in separate storage, most
often a database and/or file share, that is symmetrically accessible to
all the nodes in the cluster.
-![](images/lb-cluster.png){fig-alt="Users come to the load balancer, which sends them to the nodes, which connect to the state."
-width="600"}
+![](images/lb-cluster.png){fig-alt="Users come to the load balancer, which sends them to the nodes, which connect to the state."}
If you are a solo data scientist reading this, please do not try to run
a load-balanced data science cluster. When you undertake load balancing,
@@ -228,8 +226,7 @@ about where each pod goes. The control plane *schedules* the pods on the
nodes without a human having to consider networking or application
requirements.
-![](images/k8s.png){fig-alt="Image of a Kubernetes cluster with 3 nodes and 6 pods of various sizes arranged across the nodes."
-width="600"}
+![](images/k8s.png){fig-alt="Image of a Kubernetes cluster with 3 nodes and 6 pods of various sizes arranged across the nodes."}
From the IT/Admin's perspective, this is wonderful because they ensure
the cluster has sufficient horsepower and all the app requirements come
@@ -390,10 +387,10 @@ Even if you don't have any node sizing issues, HPC can be an excellent
fit for autoscaling or a cluster with heterogeneous nodes. HPC
frameworks are generally quite session-aware, making them a good choice
for autoscaling a data science workbench. Most also support different
-categories of work (called *queues* in Slurm) right out of the box. If
-you're interested in trying out Slurm, AWS has a service called
-*ParallelCluster* that allows users to set up an HPC cluster with no
-additional cost beyond the EC2 instances in the cluster.
+categories of work, often called *queues* or *partitions*, right out of
+the box. If you're interested in trying out Slurm, AWS has a service
+called *ParallelCluster* that allows users to set up an HPC cluster with
+no additional cost beyond the EC2 instances in the cluster.
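+
+To make "categories of work" concrete, a Slurm job script just names
+the partition it wants. The partition and resource values here are
+illustrative.
+
+``` {.bash filename="train-model.sh"}
+#!/bin/bash
+#SBATCH --partition=gpu
+#SBATCH --cpus-per-task=4
+#SBATCH --mem=16G
+
+python train.py
+```
+
+You'd submit it with `sbatch train-model.sh` and check on it with
+`squeue`.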
The upshot is that running a single data science cluster that autoscales
is hard. It's usually easier to create one-off data science environments
diff --git a/chapters/sec3/3-4-ent-pm.qmd b/chapters/sec3/3-4-ent-pm.qmd
index 4f3f5d85..1806ac66 100644
--- a/chapters/sec3/3-4-ent-pm.qmd
+++ b/chapters/sec3/3-4-ent-pm.qmd
@@ -213,7 +213,7 @@ airgapped environments, IT/Admins are comfortable having narrow
exceptions so the package repository can download packages.
![](images/pm-solution.png){fig-alt="A data science environment getting packages from a package repository, while all other connections bounce back inside the firewall."
-width="600"}
+}
This tends to work best when the IT/Admin is the one who controls which
packages are allowed into the repository and when. Then you, as the data
diff --git a/index.qmd b/index.qmd
index 61c92846..b4335fe3 100644
--- a/index.qmd
+++ b/index.qmd
@@ -1,75 +1,24 @@
# Welcome! {.unnumbered}
-This is the website for the book *DevOps for Data Science*, currently in
-draft form.
-
-::: callout-warning
-This book is very much still a work in progress.
-
-The content is mostly here, but I am still rewriting, polishing, and
-rearranging.
-
-Content is likely to move from section to section and chapter to
-chapter.
-
-Links are likely to break.
-
-Thanks for your patience.
-:::
-
In this book, you'll learn about DevOps conventions, tools, and
practices that can be useful to you as a data scientist. You'll also
learn how to work better with the IT/Admin team at your organization,
and even how to do a little server administration of your own if you're
pressed into service.
-This website is (and always will be) **free to use**, and is licensed
-under the [Creative Commons Attribution-NonCommercial-NoDerivs
-3.0](https://creativecommons.org/licenses/by-nc-nd/3.0/us/) license.
-
-If you'd like a **physical copy** of the book, they will be available
-once it's finished!
-
-## Software information {.unnumbered}
-
-I used the **knitr**\index{knitr} package [@xie2015] and the
-**quarto**\index{quarto} package [@quarto] to compile my book.
-
-This book is published to the web using GitHub Actions from
-[rOpenSci](https://github.com/orchid00/actions_sandbox).
-
## About the Author
-Alex Gold is the Director of Solutions Engineering at [Posit](posit.co),
-formerly RStudio.
-
-The Solutions Engineering team works with Posit's customers to help them
-deploy, configure, and use Posit's professional software and open-source
-tooling in R and Python.
+Alex K Gold leads the Solutions Engineering team at Posit, formerly RStudio.
In his free time, he enjoys landscaping, handstands, and Tai Chi.
He occasionally blogs about data, management, and leadership at
-[alexkgold.space](http://alexkgold.space).
+alexkgold.space.
## Acknowledgments {.unnumbered}
-I have so many people to thank for their help in getting this book out
-the door.
-
-The biggest thanks to current and former members of the Solutions
-Engineering Team at Posit, who taught me so much about DevOps, Data
-Science, and how to be a great team.
-
-Thanks to my family, especially my brother, who is a great brother and
-cared enough about this project to insist he appear in the
-acknowledgments.
-
-Thanks to Randi Cohen at Taylor and Francis, who has been great to work
-with, and to my editor, Linda Kahn, who's always been more than an
-editor to me.
-
-Most of all, thanks to Shoshana for helping me live my best life.
+Thank you to current and former members of the Solutions Engineering
+team at Posit, who taught me most of what's in this book.
Huge thanks to the R4DS book club, especially Jon Harmon, Gus Lipkin,
and Tinashe Michael Tapera, who read an early (and rough!) copy of this
@@ -79,14 +28,15 @@ Thanks to all others who provided improvements that ended up in this
book (in alphabetical order): Carl Boettinger, Jon Harmon, Gus Lipkin,
and Leungi.
-## Color palette
-
-Tea Green: #CAFFDO
-
-Steel Blue: #3E7CB1
+Thanks to Randi Cohen at Taylor and Francis and to Linda Kahn, who's
+always been more than an editor to me.
-Kombu Green: #273c2c
+Thanks to Eliot, who cared enough about this project to insist he appear
+in the acknowledgments. Most of all, thanks to Shoshana for helping me
+live my best life.
-Bright Maroon: #B33951
+## Software information {.unnumbered}
-Sandy Brown: #FCAA67
+This book was written using the Quarto publishing system and was
+published to the web using GitHub Actions from
+[rOpenSci](https://github.com/orchid00/actions_sandbox).