From 512d461d6d5a4b68c89aeb3d39f448069c5784be Mon Sep 17 00:00:00 2001 From: Quarto GHA Workflow Runner Date: Wed, 25 Oct 2023 15:09:25 +0000 Subject: [PATCH] Built site for gh-pages --- .nojekyll | 2 +- chapters/append/cheatsheets.html | 1449 ++++++++++++++++++++-------- chapters/append/lab-map.html | 45 +- chapters/sec1/1-1-env-as-code.html | 4 +- chapters/sec1/1-2-proj-arch.html | 4 +- chapters/sec1/1-3-data-access.html | 6 +- chapters/sec1/1-4-monitor-log.html | 6 +- chapters/sec1/1-5-deployments.html | 4 +- search.json | 62 +- sitemap.xml | 54 +- 10 files changed, 1167 insertions(+), 469 deletions(-) diff --git a/.nojekyll b/.nojekyll index e625decc..ecd1f1f9 100644 --- a/.nojekyll +++ b/.nojekyll @@ -1 +1 @@ -07824c48 \ No newline at end of file +869dbe33 \ No newline at end of file diff --git a/chapters/append/cheatsheets.html b/chapters/append/cheatsheets.html index a388af1c..3568fa4d 100644 --- a/chapters/append/cheatsheets.html +++ b/chapters/append/cheatsheets.html @@ -360,33 +360,33 @@

Table of contents

@@ -426,14 +426,14 @@

-

D.1 Environments as Code

+

D.1 Environments as code

D.1.1 Checking library + repository status

---+++ @@ -444,107 +444,256 @@

-

- + +
Check whether library in sync with lockfile.renv::status()Check whether library is in sync with lockfile.renv::status() None
-

D.1.2 Creating and Using a Standalone Project Library

+

D.1.2 Creating and using a standalone project library

Make sure you’re in a standalone project library.

+ ---+++ - - - + + + - - - - - - - - + + + + + + + + - - - - - - - - + + + + + + + + - - - + + + +
StepR CommandPython Command +

+Step +

+
+

+R Command +

+
+

+Python Command +

+

Create a standalone library.

-

Tip: Make sure you’ve got {renv}/{venv}: in stall.packages("renv") {venv} included w/ Python 3.5+

renv::init()

python -m venv <dir>

-

Convention: use.venv for <dir>

Activate project library.

renv::activate()

-

Happens automatically if in RStudio project.

source <dir> /bin/activate +

+Create a standalone library. +

+
+

+renv::init() +

+

+Tip: get {renv} w/ install.packages("renv") +

+
+

+python -m venv <dir> +

+

+Convention: use .venv for <dir> +

+

+Tip: {venv} included w/ Python 3.5+ +

+
+

+Activate project library. +

+
+

+renv::activate() +

+

+Happens automatically if in RStudio project. +

+
+

+source <dir> /bin/activate +

+
Install packages as normal.install.p ackages("<pkg>")python -m pip install <pkg>
Snapshot package state.renv::snapshot()pip freez e > requirements.txt +

+Install packages as normal. +

+
+

+install.packages("<pkg>") +

+
+

+python -m pip install <pkg> +

+
+

+Snapshot package state. +

+
+

+renv::snapshot() +

+
+

+pip freeze > requirements.txt +

+
Exit project environment.Leave R project or re nv::deactivate()deactivate +

+Exit project environment. +

+
+

+Leave R project or renv::deactivate() +

+
+

+deactivate +

+

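Strung together, the Python column of the table above is a short shell session. This is only a sketch: `.venv` is just the conventional directory name, and the install step is shown as a comment so the snippet also works offline.

```shell
python3 -m venv .venv            # create a standalone library (convention: .venv)
source .venv/bin/activate        # activate the project library (Linux/macOS)
# python -m pip install <pkg>    # install packages as normal
pip freeze > requirements.txt    # snapshot package state
deactivate                       # exit the project environment
```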
D.1.3 Collaborating on someone else’s project

Start by downloading the project into a directory on your machine.

+ ---+++ - - - + + + - - - - - - - - + + + + + + + + - - - - - - - - + + + + + + + + +
StepR CommandPython Command +

+Step +

+
+

+R Command +

+
+

+Python Command +

+
Move into project directory.

set wd ("< project-dir>")

-

Or open project in RStudio.

cd <project-dir>
Create project environment.renv::init()

python -m venv <dir>

-

Recommend: use .venv for <dir>

+

+Move into project directory. +

+
+

+setwd("<project-dir>") +

+

+Or open project in RStudio. +

+
+

+cd <project-dir> +

+
+

+Create project environment. +

+
+

+renv::init() +

+
+

+python -m venv <dir> +

+

+Recommend: use .venv for <dir> +

+
Enter project environment.Happens automatically or re nv::activate()source <dir> /bin/activate
Restore packages.Happens automatically or r env::restore()pi p install -r requirements.txt +

+Enter project environment. +

+
+

+Happens automatically or renv::activate() +

+
+

+source <dir> /bin/activate +

+
+

+Restore packages. +

+
+

+Happens automatically or renv::restore() +

+
+

+pip install -r requirements.txt +

+
-

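On the Python side, picking up someone else's project collapses to a few lines. A sketch only: `penguin-project` is a made-up directory name, and the empty `requirements.txt` created here stands in for the one that would ship with the project.

```shell
mkdir -p penguin-project && cd penguin-project   # in real life: clone or download it
touch requirements.txt            # stand-in for the project's snapshot file
python3 -m venv .venv             # create the project environment
source .venv/bin/activate         # enter the project environment
pip install -r requirements.txt   # restore the snapshotted packages
```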
D.2 HTTP Code Cheatsheet

-

As you work more with HTTP traffic, you’ll learn some of the common codes. Here’s a cheatsheet for some of the most frequent you’ll see.

+

D.2 HTTP code cheatsheet

+

As you work with HTTP traffic, you’ll learn some of the common codes. Here are some of those used most frequently.

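One way to see that these codes and their phrases are standardized: Python’s standard library knows them all. A quick sketch:

```python
from http import HTTPStatus

# Look up the standard reason phrase for each code in the table.
for code in (200, 400, 401, 403, 404, 500, 504):
    status = HTTPStatus(code)
    print(code, status.phrase)
```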
--++ @@ -554,39 +703,39 @@

-

+ - + - + - + - + - + - + - + - + @@ -596,8 +745,8 @@

D.3 Git

200\(200\) Everyone’s favorite, a successful response.
3xx\(\text{3xx}\) Your query was redirected somewhere else, usually ok.
4xx\(\text{4xx}\) Errors with the request
400\(400\) Bad request. This isn’t a request the server can understand.
401 and 403\(401\)/\(403\) Unauthorized or forbidden. Required authentication hasn’t been provided.
404\(404\) Not found. There isn’t any content to access here.
5xx\(\text{5xx}\) Errors with the server once your request got there.
500\(500\) Generic server-side error. Your request was received, but there was an error processing it.
504\(504\) Gateway timeout. This means that a proxy or gateway between you and the server you’re trying to access timed out before it got a response from the server.
--++ @@ -610,30 +759,30 @@

- + - + - + - + - + - + @@ -642,80 +791,222 @@

D.4 Docker

-

D.4.1 Docker CLI Commands

+

D.4.1 Docker CLI commands

+

git add <files/dir>Add files/dir to staging area.Add files/directory to staging area.
git commit -m <message> Commit staging area.
git p ush origin <branch>git push origin <branch> Push to a remote.
git p ull origin <branch>git pull origin <branch> Pull from a remote.
git che ckout <branch name>git checkout <branch name> Checkout a branch.
git checko ut -b <branch name>git checkout -b <branch name> Create and checkout a branch.
git bran ch -d <branch name>git branch -d <branch name> Delete a branch.
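A typical cycle through the Git commands above might look like this. A sketch: the repository, file, branch, and identity are all examples, and the `git config` lines are only needed on a machine where Git isn’t set up yet.

```shell
git init demo && cd demo
git config user.name "Ada"                # only needed if git isn't configured
git config user.email "ada@example.com"
echo "# Demo" > README.md
git add README.md                         # stage the file
git commit -m "Add README"                # commit the staging area
git checkout -b feature                   # create and check out a branch
git checkout -                            # jump back to the previous branch
git branch -d feature                     # delete the branch
```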
----++++ - - - - + + + + - - - - - - - - - - + + + + + + + + + + - - - - - - - - - - + + + + + + + + + + - - - - - - - - - - + + + + + + + + + + - - - - - - - - - - + + + + + + + + + + +
S ta geCommandWhat it doesNotes and helpful options +

+Stage +

+
+

+Command +

+
+

+What it does +

+
+

+Notes and helpful options +

+
B ui lddocker b uild <directory>Builds a directory into an image.

-t <name:tag> provides a name to the container.

-

tag is optional, defaults to latest.

Mo vedoc ker push <image>Push a container to a registry. +

+Build +

+
+

+docker build <directory> +

+
+

+Builds a directory into an image. +

+
+

+-t <name:tag> provides a name to the image. +

+

+tag is optional, defaults to latest. +

+
+

+Move +

+
+

+docker push <image> +

+
+

+Push a container to a registry. +

+
+
Mo vedoc ker pull <image>Pull a container from a registry.Rarely needed because run pulls the container if needed.
R undo cker run <image>Run a container.See flags in next table. +

+Move +

+
+

+docker pull <image> +

+
+

+Pull a container from a registry. +

+
+

+Rarely needed because run pulls the container if needed. +

+
+

+Run +

+
+

+docker run <image> +

+
+

+Run a container. +

+
+

+See flags in next table. +

+
R undocker stop <container>Stop a running container.docker kill can be used if stop fails.
R undocker psList running containers.Useful to get container id to do things to it. +

+Run +

+
+

+docker stop <container> +

+
+

+Stop a running container. +

+
+

+docker kill can be used if stop fails. +

+
+

+Run +

+
+

+docker ps +

+
+

+List running containers. +

+
+

+Useful to get container id to do things to it. +

+
R undocker exec <cont ainer> <command>Run a command inside a running container.Basically always used to open a shell with docker exec -it <container> /bin/bash
R undocker logs <container>Views logs for a container. +

+Run +

+
+

+docker exec <container> <command> +

+
+

+Run a command inside a running container. +

+
+

+Basically always used to open a shell with docker exec -it <container> /bin/bash +

+
+

+Run +

+
+

+docker logs <container> +

+
+

+View logs for a container. +

+
+

D.4.2 Flags for docker run

---+++ @@ -724,7 +1015,7 @@

<

- + @@ -739,12 +1030,12 @@

<

- - + + - + @@ -753,13 +1044,13 @@

<

Reminder: Order for -p and -v is <host>:<container>

-

D.4.3 Dockerfile Commands

+

D.4.3 Dockerfile commands

These are the commands that go in a Dockerfile when you’re building it.

Notes
--n ame <name>--name <name> Give a name to container. Optional. Auto-assigned if not provided
Almost always used in production.
-p <po rt>:<port>Publish port from inside running inside container to outside.-p <port>:<port>Publish port from inside running container to outside. Needed if you want to access an app or API inside the container.
-v< dir>:<dir>-v <dir>:<dir> Mount volume into the container.
-+-+ @@ -792,13 +1083,13 @@

-

D.5 Cloud Services

+

D.5 Cloud services

----++++ @@ -853,9 +1144,9 @@

-

D.6 Command Line

+

D.6 Command line

-

D.6.1 General Command Line

+

D.6.1 General command line

@@ -866,222 +1157,513 @@

<

- + - + - + - + - +
man <command>Open manual for commandOpen manual for command.
qQuit the current screenQuit the current screen.
\Continue bash command on new lineContinue bash command on new line.
ctrl + cQuit current executionQuit current execution.
echo <string>Print string (useful for piping)Print string (useful for piping).
-
-

D.6.2 Linux Navigation

+
+

D.6.2 Linux filesystem navigation

+ ---+++ - - - + + + - - - + + + - - - - - - - - + + + + + + + + - - - - - - - - + + + + + + + + - - - - - - - - + + + + + + + + +
CommandWhat it does/isNotes + Helpful options +

+Command +

+
+

+What it does/is +

+
+

+Notes + Helpful options +

+
/System root or file path separator +

+/ +

+
+

+System root or file path separator. +

+
+
.current working directory
..Parent of working directory +

+. +

+
+

+Current working directory. +

+
+
+

+.. +

+
+

+Parent of working directory. +

+
+
~Home directory of the current user
ls <dir>List objects in a directory

-l - format as list

-

-a - all (include hidden files that start with .)

+

+~ +

+
+

+Home directory of the current user. +

+
+
+

+ls <dir> +

+
+

+List objects in a directory. +

+
+

+-l - format as list +

+

+-a - all (include hidden files that start with .) +

+
pwdPrint working directory
cd <dir>Change directoryCan use relative or absolute paths +

+pwd +

+
+

+Print working directory. +

+
+
+

+cd <dir> +

+
+

+Change directory. +

+
+

+Can use relative or absolute paths. +

+
-

D.6.3 Reading Text Files

+

D.6.3 Reading text files

+ ---+++ - - - + + + - - - - - - - - + + + + + + + + - - - - - - - - + + + + + + + + - - - - - - - - + + + + + + + + - - - + + + +
CommandWhat it doesNotes + Helpful options +

+Command +

+
+

+What it does +

+
+

+Notes + Helpful options +

+
cat <file>Print a file from the top.
less <file>Print a file, but just a little.

Can be very helpful to look at a few rows of csv.

-

Lazily reads lines, so can be much faster than cat for big files.

+

+cat <file> +

+
+

+Print a file from the top. +

+
+
+

+less <file> +

+
+

+Print a file, but just a little. +

+
+

+Can be very helpful to look at a few rows of csv. +

+

+Lazily reads lines, so can be much faster than cat for big files. +

+
head <file>Look at the beginning of a file.Defaults to 10 lines, can specify a different number with -n <n>.
tail <file>Look at the end of a file.

Useful for logs where the newest part is last.

-

The -f flag is useful to follow for a live view.

+

+head <file> +

+
+

+Look at the beginning of a file. +

+
+

+Defaults to 10 lines, can specify a different number with -n <n>. +

+
+

+tail <file> +

+
+

+Look at the end of a file. +

+
+

+Useful for logs where the newest part is last. +

+

+The -f flag is useful to follow for a live view. +

+
gre p <expression>Search a file using regex.

Writing regex can be a pain. I suggest testing on \(\text{regex101.com}\).

-

Often useful in combination with the pipe.

|The pipe +

+grep <expression> +

+
+

+Search a file using regex. +

+
+

+Writing regex can be a pain. I suggest testing on regex101.com. +

+

+Often useful in combination with the pipe. +

+
+

+| +

+
+

+The pipe. +

+
+
wc <file>Count words in a fileUse -l to count lines, useful for .csv files. +

+wc <file> +

+
+

+Count words in a file. +

+
+

+Use -l to count lines, useful for .csv files. +

+
-
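For example, on a small CSV (generated here so the sketch is self-contained; the contents are examples):

```shell
printf 'species,island\nAdelie,Torgersen\nGentoo,Biscoe\nAdelie,Dream\n' > penguins.csv
head -n 2 penguins.csv               # peek at the top
tail -n 1 penguins.csv               # peek at the bottom
wc -l penguins.csv                   # count lines (4: header + 3 rows)
grep "Adelie" penguins.csv | wc -l   # rows matching a pattern, via the pipe
```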

D.6.4 Manipulating Files

+

D.6.4 Manipulating files

+ --++ - - - + + + - - - - - - - - + + + + + + + + - - - - - - - - + + + + + + + + - - - - - - - - + + + +
CommandWhat it doesNotes + Helpful Options +

+Command +

+
+

+What it does +

+
+

+Notes + Helpful Options +

+
rm <path>Remove

-r - recursively remove everything below a file path

-

-f - force - don’t ask for each file

-

Be very careful, it’s permanent

+

+rm <path> +

+
+

+Remove. +

+
+

+-r - recursively remove everything below a file path +

+

+-f - force - don’t ask for each file +

+

+Be very careful, it’s permanent! +

+
+

+cp <from> <to> +

+
+

+Copy. +

+
+
c p <from> <to>Copy
m v <from> <to>Move +

+mv <from> <to> +

+
+

+Move. +

+
+
+

+* +

+
+

+Wildcard. +

+
+
*Wildcard
mkdir/rmdirMake/remove directory-p - create any parts of path that don’t exist +

+mkdir/rmdir +

+
+

+Make/remove directory. +

+
+

+-p - create any parts of path that don’t exist +

+

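A quick round trip through these commands (a sketch; all file and directory names are examples):

```shell
mkdir -p work/reports                 # -p creates any missing parents
echo "draft" > work/notes.txt
cp work/notes.txt work/reports/       # copy
mv work/notes.txt work/notes-old.txt  # move (here: rename)
ls work/*.txt                         # * is the wildcard
rm -r work/reports                    # careful: this is permanent
```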
D.6.5 Move things to/from server

+ ---+++ - - - + + + - - - - - - - - + + + + + + + + +
CommandWhat it doesNotes + Helpful Options +

+Command +

+
+

+What it does +

+
+

+Notes + Helpful Options +

+
tarCreate/extract archive file

Almost always used with flags.

-

Create is usually tar -czf <a rchive name> <file(s)>

-

Extract is usually t ar -xfv <archive name>

scpSecure copy via ssh

Run on laptop to server

-

Can use most ssh flags (like -i and -v)

+

+tar +

+
+

+Create/extract archive file. +

+
+

+Almost always used with flags. +

+

+Create is usually tar -czf <archive name> <file(s)> +

+

+Extract is usually tar -xvf <archive name> +

+
+

+scp +

+
+

+Secure copy via ssh. +

+
+

+Run on laptop to server. +

+

+Can use most ssh flags (like -i and -v). +

+

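Putting the two `tar` invocations together (a sketch; the file names are examples):

```shell
mkdir -p logs && echo "started" > logs/app.log
tar -czf logs.tar.gz logs         # c = create, z = gzip, f = write to this file
mkdir restore
tar -xzf logs.tar.gz -C restore   # x = extract, -C chooses the destination
```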
D.6.6 Write files from the command line

-+ @@ -1099,24 +1681,24 @@

- - + + - - + +
>Overwrite file contentsCreates a new file if it doesn’t existOverwrite file contents.Creates a new file if it doesn’t exist.
>>Concatenate to end of fileCreates a new file if it doesn’t existConcatenate to end of file.Creates a new file if it doesn’t exist.
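For example (a sketch; `notes.txt` is an example name):

```shell
echo "first"  >  notes.txt   # > overwrites (creates the file if needed)
echo "second" >> notes.txt   # >> appends to the end
cat notes.txt                # shows both lines, in order
```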
-

D.6.7 Command Line Text Editors (Vim + Nano)

+

D.6.7 Command line text editors (Vim + Nano)

-+-+ @@ -1129,11 +1711,11 @@

- + - + @@ -1143,26 +1725,27 @@

- - + + - - + +
^ Prefix for file command in nano editor.Its the or Ctrl key, not the caret symbol.It’s the Ctrl key, not the caret symbol.
iEnter insert mode (able to type) in vimEnter insert mode (able to type) in vim.
:wWrite the current file in vim (from normal mode)Can be combined to save and quit in one, :wqWrite the current file in vim (from normal mode).Can be combined to save and quit in one, :wq.
:qQuit vim (from normal mode):q! quit without savingQuit vim (from normal mode).:q! quit without saving.
-

D.7 ssh

+

D.7 SSH

+

General usage:

ssh <user>@<host>
-+-+ @@ -1179,21 +1762,21 @@

- +
-iChoose identity file (private key)Choose identity file (private key). Not necessary with default key names.
-

D.8 Linux Admin

+

D.8 Linux admin

D.8.1 Users

---+++ @@ -1202,7 +1785,7 @@

-

+ @@ -1227,9 +1810,9 @@

-

- - + + +
s u <username>su <username> Change to be a different user.
usermo d <username>Modify user username-aG <group> adds to a group (e.g. sudo)usermod <username>Modify user username.-aG <group> adds to a group (e.g., sudo)
@@ -1238,9 +1821,9 @@

D.8.2 Permissions

-+-+ @@ -1251,14 +1834,14 @@

- + - + - + - + @@ -1272,8 +1855,8 @@

D.8.3 Install applications (Ubuntu)

chmod <permissions> <file>chmod <permissions> <file> Modifies permissions on a file or directory.Number indicates permissions for user, group, others: add 4 for read, 2 for write, 1 for execute, 0 for nothing, e.g. 644.Number indicates permissions for user, group, others: add 4 for read, 2 for write, 1 for execute, 0 for nothing, e.g., 644.
chow n <user/group> <file>chown <user/group> <file> Change the owner of a file or directory.Can be used for user or group, e.g. :my-group.Can be used for user or group, e.g., :my-group.
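A quick demonstration of the numeric permissions (a sketch; the file name is an example, and the `chown` line is commented out because the `devs` group is hypothetical):

```shell
touch deploy.sh
chmod 755 deploy.sh     # user rwx (4+2+1), group and others r-x (4+1)
chmod 644 deploy.sh     # user rw (4+2), group and others r (4)
# chown :devs deploy.sh # change only the group; 'devs' is a hypothetical group
```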
sudo <command>
--++ @@ -1281,7 +1864,7 @@

-

+ @@ -1301,40 +1884,80 @@

D.8.4 Storage

+
apt-get u pdate && apt-get upgrade -yapt-get update && apt-get upgrade -y Fetch and install upgrades to system packages
---+++ - - - + + + - - - + + + - - - + + + +
CommandWhat it doesHelpful options +

+Command +

+
+

+What it does +

+
+

+Helpful options +

+
dfCheck storage space on device.-h for human readable file sizes. +

+df +

+
+

+Check storage space on device. +

+
+

+-h for human readable file sizes. +

+
duCheck size of files.

Most likely to be used as d u - h <dir> | sort -h

-

Also useful to combine with head.

+

+du +

+
+

+Check size of files. +

+
+

+Most likely to be used as du -h <dir> | sort -h +

+

+Also useful to combine with head. +

+

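The two commands combine naturally (a sketch):

```shell
df -h .                         # free space on this filesystem
du -h . | sort -h | tail -n 3   # the three largest entries under here
```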
D.8.5 Processes

--++ @@ -1353,7 +1976,7 @@

- + @@ -1365,38 +1988,78 @@

D.8.6 Networking

+

ps aux See all system processes.Consider using --sort and pipe into head or grepConsider using --sort and pipe into head or grep.
kill
---+++ - - - + + + - - - - - - - - + + + + + + + + +
CommandWhat it doesHelpful Options +

+Command +

+
+

+What it does +

+
+

+Helpful Options +

+
netstatSee ports and services using them.Usually used with -tlp, for tcp listening applications, including pid
ssh -L <port>:<i p>:<port>:<host>Port forwards a remote port on remote host to local.

Remote ip is usually localhost.

-

Choose local port to match remote port.

+

+netstat +

+
+

+See ports and services using them. +

+
+

+Usually used with -tlp, for tcp listening applications, including pid. +

+
+

+ssh -L <port>:<ip>:<port> <host> +

+
+

+Port forwards a remote port on remote host to local. +

+
+

+Remote ip is usually localhost. +

+

+Choose local port to match remote port. +

+

D.8.7 The path

--++ @@ -1408,8 +2071,8 @@

-

- + +
ln -s <location to l ink>:<location of symlink>Creates a symlink from file at location to link to location of symlink.ln -s <linked location> <where to put symlink>Creates a symlink from file/directory at linked location to where to put symlink.
@@ -1420,8 +2083,8 @@

+ - @@ -1432,7 +2095,7 @@

status -Report status +Report status. start @@ -1444,61 +2107,87 @@

restart -stop then start +stop then start. reload -Reload configuration that doesn’t require restart (depends on service) +Reload configuration that doesn’t require restart (depends on service). enable -Daemonize the service +Daemonize the service. disable -Un-daemonize the service +Un-daemonize the service.

-

D.9 IP Addresses and Ports

+

D.9 IP addresses and ports

D.9.1 Special IP Addresses

+ --++ - - + + - - + + - - + + +
AddressMeaning +

+Address +

+
+

+Meaning +

+
\(\text{127.0.0.1}\)\(\text{localhost}\) or loopback – the machine that originated the request. +

+

+
+

+127.0.0.1, i.e., localhost or loopback – the machine that originated the request. +

+

\(\text{192.168.x.x}\)

-

\(\text{172.16.x.x.x}\)

-

\(\text{10.x.x.x}\)

Protected address blocks used for private IP addresses. +

+

+

+

+

+

+
+

+192.168.x.x, 172.16.x.x, and 10.x.x.x – protected address blocks used for private IP addresses. +

+
-

D.9.2 Special Ports

+

D.9.2 Special ports

All ports below \(1024\) are reserved for server tasks and cannot be assigned to admin-controlled services.

- - + + diff --git a/chapters/append/lab-map.html b/chapters/append/lab-map.html index 93768180..fd7dd4ce 100644 --- a/chapters/append/lab-map.html +++ b/chapters/append/lab-map.html @@ -346,8 +346,8 @@

Ap

This section aims to clarify the relationship between the assets you’ll make in each portfolio exercise and labs in this book.

Protocol/ApplicationDefault PortProtocol/applicationDefault port
--++ @@ -358,22 +358,22 @@

Ap

- + - - - - + + + + - + @@ -382,27 +382,36 @@

Ap

- + + + + + - + + + + + - - + + - - + + - - + + - - + +
Chapter 1: Environments as CodeCreate a Quarto side that uses {renv} and {venv} to create standalone R and Python virtual environments, create a page on the website for each.Create a Quarto site that uses {renv} and {venv} to create standalone R and Python virtual environments. Add an R EDA page and Python modeling.
Chapter 3: Data ArchitectureMove data into a DuckDB database.
Chapter 2: Project Architecture Create an API that serves a Python machine-learning model using {vetiver} and {fastAPI}. Call that API from a Shiny App in both R and Python.
Chapter 3: Data ArchitectureMove data into a DuckDB database and serve model predictions from an API.
Chapter 4: Logging and Monitoring Add logging to the app from Chapter 2.
Chapter 5: Code PromotionChapter 5: Deployments Put a static Quarto site up on GitHub Pages using GitHub Actions that renders the project.
Chapter 7: CloudStand up an EC2 instance. Put model into S3.

Stand up an EC2 instance.

+

Put the model into S3.

Chapter 8: Command LineLog into the server with .pem key and create SSH key.
Chapter 9: Linux AdminAdd R, Python, RStudio Server, JupyterHub, palmer penguin fastAPI + App.Create a user on the server and add SSH key.
Chapter 10: Application AdminAdd R, Python, RStudio Server, JupyterHub, API, and App to EC2 instance from Chapter 7.
Chapter 12: NetworkingAdd proxy (nginx) to reach all services from the web.Chapter 11: ScalingResize the server.
Chapter 13: DNSAdd a real URL to the EC2 instance. Put the Shiny app into an iFrame on the site.Chapter 12: NetworkingAdd proxy (NGINX) to reach all services from the web.
Chapter 14: SSLAdd SSL/HTTPS to the EC2 instance.Chapter 13: DNSAdd a URL to the EC2 instance. Put the Shiny app into an iFrame on the Quarto site.
Chapter 11: ServersResize servers.Chapter 14: SSLAdd SSL/HTTPS to the EC2 instance.
diff --git a/chapters/sec1/1-1-env-as-code.html b/chapters/sec1/1-1-env-as-code.html index 04e4dad2..eb7cbbbc 100644 --- a/chapters/sec1/1-1-env-as-code.html +++ b/chapters/sec1/1-1-env-as-code.html @@ -367,7 +367,7 @@

Table of contents

  • 1.4 What’s happening under the hood
  • 1.5 Comprehension Questions
  • -
  • 1.6 Lab 1: Create and use a virtual environment +
  • 1.6 Lab: Create and use a virtual environment
  • -

    1.6 Lab 1: Create and use a virtual environment

    +

    1.6 Lab: Create and use a virtual environment

    In this lab, we will start working on our penguin explorer website. We will create a simple website using Quarto, an open-source scientific and technical publishing system that makes it easy to render R and Python code into beautiful documents, websites, reports, and presentations.

    We will create pages for a simple exploratory data analysis and model building from the Palmer Penguins dataset. To get to practice with both R and Python, I’m going to do the EDA page in R and the modeling in Python. By the end of this lab, we’ll have both pages created using standalone Python and R virtual environments.

    If you’re starting, check out the Quarto website to use Quarto in the editor of your choice.

    diff --git a/chapters/sec1/1-2-proj-arch.html b/chapters/sec1/1-2-proj-arch.html index 8713af07..de862135 100644 --- a/chapters/sec1/1-2-proj-arch.html +++ b/chapters/sec1/1-2-proj-arch.html @@ -392,7 +392,7 @@

    Table of contents

  • 2.8 Create an API if you need it
  • 2.9 Write a data flow chart
  • 2.10 Comprehension Questions
  • -
  • 2.11 Lab 2: Build the processing layer +
  • 2.11 Lab: Build the processing layer
  • -

    2.11 Lab 2: Build the processing layer

    +

    2.11 Lab: Build the processing layer

    In Chapter 1, we did some EDA of the Palmer Penguins data set and built an ML model. In this lab, we will take that work we did and turn it into the actual presentation layer for our project.

    2.11.1 Step 1: Write the model outside the bundle

    diff --git a/chapters/sec1/1-3-data-access.html b/chapters/sec1/1-3-data-access.html index 824008b0..52763bdc 100644 --- a/chapters/sec1/1-3-data-access.html +++ b/chapters/sec1/1-3-data-access.html @@ -379,7 +379,7 @@

    Table of contents

  • 3.5 Data Connection Packages
  • 3.6 Comprehension Questions
  • -
  • 3.7 Lab 3: Use a database and an API +
  • 3.7 Lab: Use a database and an API
  • -
    -

    3.7 Lab 3: Use a database and an API

    +
    +

    3.7 Lab: Use a database and an API

    In this lab, we will build the data and the presentation layers for our penguin mass model exploration. We’re going to create an app to explore the model, which will look like this:

    Let’s start by moving the data into an actual data layer.

    diff --git a/chapters/sec1/1-4-monitor-log.html b/chapters/sec1/1-4-monitor-log.html index 65e42066..7610787d 100644 --- a/chapters/sec1/1-4-monitor-log.html +++ b/chapters/sec1/1-4-monitor-log.html @@ -370,7 +370,7 @@

    Table of contents

  • 4.3.1 Working with Metrics
  • 4.4 Comprehension Questions
  • -
  • 4.5 Lab 4: An App with Logging
  • +
  • 4.5 Lab: An App with Logging
  • @@ -531,8 +531,8 @@

    -
    -

    4.5 Lab 4: An App with Logging

    +
    +

    4.5 Lab: An App with Logging

    Let’s return to the last lab’s prediction generator app and add a little logging. This is easy in both R and Python. We declare that we’re using the logger and then put logging statements into our code.

    I decided to log when the app starts, just before and after each request, and an error logger if an HTTP error code comes back from the API.

    With the logging now added, here’s what the app looks like in R:

    diff --git a/chapters/sec1/1-5-deployments.html b/chapters/sec1/1-5-deployments.html index 6a60a268..c2dc4267 100644 --- a/chapters/sec1/1-5-deployments.html +++ b/chapters/sec1/1-5-deployments.html @@ -369,7 +369,7 @@

    Table of contents

  • 5.3.1 Configuring per-environment behavior
  • 5.4 Comprehension Questions
  • -
  • 5.5 Lab 5: Host a website with automatic updates
  • +
  • 5.5 Lab: Host a website with automatic updates
  • @@ -523,7 +523,7 @@

    -

    5.5 Lab 5: Host a website with automatic updates

    +

    5.5 Lab: Host a website with automatic updates

    In labs 1 through 4, you’ve created a Quarto website for the penguin model. You’ve got sections on EDA and model building. But it’s still just on your computer.

    In this lab, we will deploy that website to a public site on GitHub and set up GitHub Actions as CI/CD so the EDA and modeling steps re-render every time we make changes.

    Before we get into the meat of the lab, there are a few things you need to do on your own. If you don’t know how, there are plenty of great tutorials online.

    diff --git a/search.json b/search.json index 06c35691..323868eb 100644 --- a/search.json +++ b/search.json @@ -122,8 +122,8 @@ "objectID": "chapters/sec1/1-1-env-as-code.html#lab1", "href": "chapters/sec1/1-1-env-as-code.html#lab1", "title": "1  Environments as Code", - "section": "1.6 Lab 1: Create and use a virtual environment", - "text": "1.6 Lab 1: Create and use a virtual environment\nIn this lab, we will start working on our penguin explorer website. We will create a simple website using Quarto, an open-source scientific and technical publishing system that makes it easy to render R and Python code into beautiful documents, websites, reports, and presentations.\nWe will create pages for a simple exploratory data analysis and model building from the Palmer Penguins dataset. To get to practice with both R and Python, I’m going to do the EDA page in R and the modeling in Python. By the end of this lab, we’ll have both pages created using standalone Python and R virtual environments.\nIf you’re starting, check out the Quarto website to use Quarto in the editor of your choice.\n\n\n\n\n\n\nTip\n\n\n\nEnsure you add each page below to your _quarto.yml so Quarto knows to render them.\n\n\n\n1.6.1 EDA in R\nLet’s add a simple R-language EDA of the Palmer Penguins data set to our website by adding a file called eda.qmd in the project’s root directory.\nBefore you add code, create and activate an {renv} environment with renv::init().\nNow, go ahead and do your analysis. 
Here’s the contents of my eda.qmd.\n\n\neda.qmd\n\n---\ntitle: \"Penguins EDA\"\nformat:\n html:\n code-fold: true\n---\n\n## Penguin Size and Mass by Sex and Species\n\n```{r}\nlibrary(palmerpenguins)\nlibrary(dplyr)\nlibrary(ggplot2)\n\ndf <- palmerpenguins::penguins\n```\n\n```{r}\ndf %>%\n group_by(species, sex) %>%\n summarise(\n across(\n where(is.numeric), \n \\(x) mean(x, na.rm = TRUE)\n )\n ) %>%\n knitr::kable()\n```\n\n## Penguin Size vs Mass by Species\n\n```{r}\ndf %>%\n ggplot(aes(x = bill_length_mm, y = body_mass_g, color = species)) +\n geom_point() + \n geom_smooth(method = \"lm\")\n```\n\nFeel free to copy this Quarto doc into your website or to write your own.\nOnce you’ve finished writing your EDA script and checked that it previews nicely into the website, save the doc, and create your lockfile with renv::snapshot().\n\n\n1.6.2 Modeling in Python\nNow let’s build a {scikit-learn} model for predicting penguin weight based on bill length in a Python notebook by adding a model.qmd to the root of our project.\nAgain, you’ll want to create and activate your virtual environment before you start pip install-ing packages.\nHere’s what’s in my model.qmd, but you should feel free to include whatever you want.\n\n\nmodel.qmd\n\n---\ntitle: \"Model\"\nformat:\n html:\n code-fold: true\n---\n\n```{python}\nfrom palmerpenguins import penguins\nfrom pandas import get_dummies\nimport numpy as np\nfrom sklearn.linear_model import LinearRegression\nfrom sklearn import preprocessing\n```\n\n## Get Data\n\n```{python}\ndf = penguins.load_penguins().dropna()\n\ndf.head(3)\n```\n\n## Define Model and Fit\n\n```{python}\nX = get_dummies(df[['bill_length_mm', 'species', 'sex']], drop_first = True)\ny = df['body_mass_g']\n\nmodel = LinearRegression().fit(X, y)\n```\n\n## Get some information\n\n```{python}\nprint(f\"R^2 {model.score(X,y)}\")\nprint(f\"Intercept {model.intercept_}\")\nprint(f\"Columns {X.columns}\")\nprint(f\"Coefficients {model.coef_}\")\n```\n\nOnce 
you’re happy with how the page works, capture your dependencies in a requirements.txt using pip freeze > requirements.txt on the command line." + "section": "1.6 Lab: Create and use a virtual environment", + "text": "1.6 Lab: Create and use a virtual environment\nIn this lab, we will start working on our penguin explorer website. We will create a simple website using Quarto, an open-source scientific and technical publishing system that makes it easy to render R and Python code into beautiful documents, websites, reports, and presentations.\nWe will create pages for a simple exploratory data analysis and model building from the Palmer Penguins dataset. To get to practice with both R and Python, I’m going to do the EDA page in R and the modeling in Python. By the end of this lab, we’ll have both pages created using standalone Python and R virtual environments.\nIf you’re starting, check out the Quarto website to use Quarto in the editor of your choice.\n\n\n\n\n\n\nTip\n\n\n\nEnsure you add each page below to your _quarto.yml so Quarto knows to render them.\n\n\n\n1.6.1 EDA in R\nLet’s add a simple R-language EDA of the Palmer Penguins data set to our website by adding a file called eda.qmd in the project’s root directory.\nBefore you add code, create and activate an {renv} environment with renv::init().\nNow, go ahead and do your analysis. 
Here’s the contents of my eda.qmd.\n\n\neda.qmd\n\n---\ntitle: \"Penguins EDA\"\nformat:\n html:\n code-fold: true\n---\n\n## Penguin Size and Mass by Sex and Species\n\n```{r}\nlibrary(palmerpenguins)\nlibrary(dplyr)\nlibrary(ggplot2)\n\ndf <- palmerpenguins::penguins\n```\n\n```{r}\ndf %>%\n group_by(species, sex) %>%\n summarise(\n across(\n where(is.numeric), \n \\(x) mean(x, na.rm = TRUE)\n )\n ) %>%\n knitr::kable()\n```\n\n## Penguin Size vs Mass by Species\n\n```{r}\ndf %>%\n ggplot(aes(x = bill_length_mm, y = body_mass_g, color = species)) +\n geom_point() + \n geom_smooth(method = \"lm\")\n```\n\nFeel free to copy this Quarto doc into your website or to write your own.\nOnce you’ve finished writing your EDA script and checked that it previews nicely into the website, save the doc, and create your lockfile with renv::snapshot().\n\n\n1.6.2 Modeling in Python\nNow let’s build a {scikit-learn} model for predicting penguin weight based on bill length in a Python notebook by adding a model.qmd to the root of our project.\nAgain, you’ll want to create and activate your virtual environment before you start pip install-ing packages.\nHere’s what’s in my model.qmd, but you should feel free to include whatever you want.\n\n\nmodel.qmd\n\n---\ntitle: \"Model\"\nformat:\n html:\n code-fold: true\n---\n\n```{python}\nfrom palmerpenguins import penguins\nfrom pandas import get_dummies\nimport numpy as np\nfrom sklearn.linear_model import LinearRegression\nfrom sklearn import preprocessing\n```\n\n## Get Data\n\n```{python}\ndf = penguins.load_penguins().dropna()\n\ndf.head(3)\n```\n\n## Define Model and Fit\n\n```{python}\nX = get_dummies(df[['bill_length_mm', 'species', 'sex']], drop_first = True)\ny = df['body_mass_g']\n\nmodel = LinearRegression().fit(X, y)\n```\n\n## Get some information\n\n```{python}\nprint(f\"R^2 {model.score(X,y)}\")\nprint(f\"Intercept {model.intercept_}\")\nprint(f\"Columns {X.columns}\")\nprint(f\"Coefficients {model.coef_}\")\n```\n\nOnce 
you’re happy with how the page works, capture your dependencies in a requirements.txt using pip freeze > requirements.txt on the command line." }, { "objectID": "chapters/sec1/1-1-env-as-code.html#footnotes", @@ -206,8 +206,8 @@ "objectID": "chapters/sec1/1-2-proj-arch.html#lab2", "href": "chapters/sec1/1-2-proj-arch.html#lab2", "title": "2  Data Project Architecture", - "section": "2.11 Lab 2: Build the processing layer", - "text": "2.11 Lab 2: Build the processing layer\nIn Chapter 1, we did some EDA of the Palmer Penguins data set and built an ML model. In this lab, we will take that work we did and turn it into the actual presentation layer for our project.\n\n2.11.1 Step 1: Write the model outside the bundle\nWhen we originally wrote our model.qmd script, we didn’t save the model at all.\nOur model will likely be updated more frequently than our app, so we don’t want to store it in the app bundle. Later in the book, I’ll show you how to store it in the cloud. For now, I will store it in a directory on my computer.\nI will use the {vetiver} package, an R and Python package to version, deploy, and monitor a machine learning model.\nWe can take our existing model, turn it into a {vetiver} model, and save it to the /data/model folder with\n\n\nmodel.qmd\n\n\n```{python}\nfrom vetiver import VetiverModel\nv = VetiverModel(model, model_name='penguin_model', prototype_data=X)\n```\n\n## Save to Board\n\nIf /data/model doesn’t exist on your machine, you can create it, or use a directory that does exist.\n\n\n2.11.2 Step 2: Create an API for model predictions\nI’ll serve the model from an API to allow for real-time predictions.\nAs the point of this lab is to focus on the architecture, I’m just going to use the auto-generation capabilities of {vetiver}. 
If you want to improve at writing APIs, I encourage you to consult the documentation for {plumber} or {fastAPI}.\nIf you’ve closed your modeling code, you can get your model back from your pin with:\n\nb = pins.board_folder('data/model', allow_pickle_read=True)\nv = VetiverModel.from_pin(b, 'penguin_model')\n\nThen you can auto-generate a {fastAPI} from this model with\n\napp = VetiverAPI(v, check_prototype=True)\n\nYou can run this in your Python session with app.run(port = 8080). You can then access run your model API by navigating to http://localhost:8080 in your browser.\nYou can play around with the front end there, including trying the provided examples." + "section": "2.11 Lab: Build the processing layer", + "text": "2.11 Lab: Build the processing layer\nIn Chapter 1, we did some EDA of the Palmer Penguins data set and built an ML model. In this lab, we will take that work we did and turn it into the actual presentation layer for our project.\n\n2.11.1 Step 1: Write the model outside the bundle\nWhen we originally wrote our model.qmd script, we didn’t save the model at all.\nOur model will likely be updated more frequently than our app, so we don’t want to store it in the app bundle. Later in the book, I’ll show you how to store it in the cloud. 
For now, I will store it in a directory on my computer.\nI will use the {vetiver} package, an R and Python package to version, deploy, and monitor a machine learning model.\nWe can take our existing model, turn it into a {vetiver} model, and save it to the /data/model folder with\n\n\nmodel.qmd\n\n\n```{python}\nfrom vetiver import VetiverModel\nv = VetiverModel(model, model_name='penguin_model', prototype_data=X)\n```\n\n## Save to Board\n\nIf /data/model doesn’t exist on your machine, you can create it, or use a directory that does exist.\n\n\n2.11.2 Step 2: Create an API for model predictions\nI’ll serve the model from an API to allow for real-time predictions.\nAs the point of this lab is to focus on the architecture, I’m just going to use the auto-generation capabilities of {vetiver}. If you want to improve at writing APIs, I encourage you to consult the documentation for {plumber} or {fastAPI}.\nIf you’ve closed your modeling code, you can get your model back from your pin with:\n\nb = pins.board_folder('data/model', allow_pickle_read=True)\nv = VetiverModel.from_pin(b, 'penguin_model')\n\nThen you can auto-generate a {fastAPI} from this model with\n\napp = VetiverAPI(v, check_prototype=True)\n\nYou can run this in your Python session with app.run(port = 8080). You can then access your model API by navigating to http://localhost:8080 in your browser.\nYou can play around with the front end there, including trying the provided examples."
  },
  {
    "objectID": "chapters/sec1/1-2-proj-arch.html#footnotes",
@@ -259,11 +259,11 @@
    "text": "3.6 Comprehension Questions\n\nDraw two mental maps for connecting to a database, one using a database driver in a Python or R package vs an ODBC or JDBC driver. You should (at a minimum) include the nodes database package, DBI (R only), driver, system driver, ODBC, JDBC, and database.\nDraw a mental map for using an API from R or Python. 
You should (at a minimum) include nodes for {requests}/{httr2}, request, http verb/request method, headers, query parameters, body, json, response, and response code.\nHow can environment variables be used to keep secrets secure in your code?" }, { - "objectID": "chapters/sec1/1-3-data-access.html#lab-3-use-a-database-and-an-api", - "href": "chapters/sec1/1-3-data-access.html#lab-3-use-a-database-and-an-api", + "objectID": "chapters/sec1/1-3-data-access.html#lab-use-a-database-and-an-api", + "href": "chapters/sec1/1-3-data-access.html#lab-use-a-database-and-an-api", "title": "3  Using databases and data APIs", - "section": "3.7 Lab 3: Use a database and an API", - "text": "3.7 Lab 3: Use a database and an API\nIn this lab, we will build the data and the presentation layers for our penguin mass model exploration. We’re going to create an app to explore the model, which will look like this: \nLet’s start by moving the data into an actual data layer.\n\n3.7.1 Step 1: Put the data in DuckDB\nLet’s start by moving the data into a DuckDB database and use it from there for the modeling and EDA scripts.\nTo start, let’s load the data.\nHere’s what that looks like in R:\ncon <- DBI::dbConnect(duckdb::duckdb(), dbdir = \"my-db.duckdb\")\nDBI::dbWriteTable(con, \"penguins\", palmerpenguins::penguins)\nDBI::dbDisconnect(con)\nOr equivalently, in Python:\nimport duckdb\nfrom palmerpenguins import penguins\n\ncon = duckdb.connect('my-db.duckdb')\ndf = penguins.load_penguins()\ncon.execute('CREATE TABLE penguins AS SELECT * FROM df')\ncon.close()\nNow that the data is loaded, let’s adjust our scripts to use the database.\nIn R, we will replace our data loading with connecting to the database. 
Leaving out all the parts that don’t change, it looks like\n\n\neda.qmd\n\n\ncon <- DBI::dbConnect(\n duckdb::duckdb(), \n dbdir = \"my-db.duckdb\"\n )\ndf <- dplyr::tbl(con, \"penguins\")\n\nWe also need to call to DBI::dbDisconnect(con) at the end of the script.\nWe don’t have to change anything because we wrote our data processing code in {dplyr}. Under the hood, {dplyr} can switch seamlessly to a database backend, which is really cool.\n\n\neda.qmd\n\ndf %>%\n group_by(species, sex) %>%\n summarise(\n across(\n ends_with(\"mm\") | ends_with(\"g\"),\n \\(x) mean(x, na.rm = TRUE)\n )\n ) %>%\n dplyr::collect() %>%\n knitr::kable()\n\nIt’s unnecessary, but I’ve added a call to dplyr::collect() in line 31. It will be implied if I don’t put it there manually, but it helps make it obvious that all the work before there has been pushed off to the database. It doesn’t matter for this small dataset, but it could benefit a larger dataset.\nIn Python, we’re just going to load the entire dataset into memory for modeling, so the line loading the dataset changes to\n\n\nmodel.qmd\n\ncon = duckdb.connect('my-db.duckdb')\ndf = con.execute(\"SELECT * FROM penguins\").fetchdf().dropna()\ncon.close()\n\n\nNow let’s switch to figuring out the connection we’ll need to our processing layer in the presentation layer.\n\n\n3.7.2 Step 2: Call the model API from code\nBefore you start, ensure the API is running on your machine from the last lab.\n\n\n\n\n\n\nNote\n\n\n\nI’m assuming it’s running on port 8080 in this lab. If you’ve put it somewhere else, change the 8080 in the code below to match the port on your machine.\n\n\nIf you want to call the model in code, you can use any http request library. 
In R you should use httr2 and in Python you should use requests.\nHere’s what it looks like to call the API in Python\n\nimport requests\n\nreq_data = {\n \"bill_length_mm\": 0,\n \"species_Chinstrap\": False,\n \"species_Gentoo\": False,\n \"sex_male\": False\n}\nreq = requests.post('http://127.0.0.1:8080/predict', json = req_data)\nres = req.json().get('predict')[0]\n\nor equivalently in R\n\nreq <- httr2::request(\"http://127.0.0.1:8080/predict\") |>\n httr2::req_body_json(\n list(\n \"bill_length_mm\" = 0,\n \"species_Chinstrap\" = FALSE,\n \"species_Gentoo\" = FALSE,\n \"sex_male\" = FALSE\n )\n ) |>\n httr2::req_perform()\nres <- httr2::resp_body_json(r)$predict[[1]]\n\nNote that there’s no translation necessary to send the request. The {requests} and{httr2} packages automatically know what to do with the Python dictionary and the R list.\nGetting the result back takes more work to find the right spot in the JSON returned. This is quite common.\n\n\n\n\n\n\nNote\n\n\n\nThe {vetiver} package also includes the ability to auto-query a {vetiver} API. I’m not using it here to expose the details of calling an API.\n\n\nLet’s take this API-calling code and build the presentation layer around it.\n\n\n3.7.3 Step 3: Build a shiny app\nWe will use the {shiny} R and Python package for creating interactive web apps using just Python code. 
If you don’t know much about {shiny}, you can unthinkingly follow the examples here or spend time with the Mastering Shiny book to learn to use it yourself.\nEither way, an app that looks like the picture above would look like this in Python\n\n\napp.py\n\nfrom shiny import App, render, ui, reactive\nimport requests\n\napi_url = 'http://127.0.0.1:8080/predict'\n\napp_ui = ui.page_fluid(\n ui.panel_title(\"Penguin Mass Predictor\"), \n ui.layout_sidebar(\n ui.panel_sidebar(\n [ui.input_slider(\"bill_length\", \"Bill Length (mm)\", 30, 60, 45, step = 0.1),\n ui.input_select(\"sex\", \"Sex\", [\"Male\", \"Female\"]),\n ui.input_select(\"species\", \"Species\", [\"Adelie\", \"Chinstrap\", \"Gentoo\"]),\n ui.input_action_button(\"predict\", \"Predict\")]\n ),\n ui.panel_main(\n ui.h2(\"Penguin Parameters\"),\n ui.output_text_verbatim(\"vals_out\"),\n ui.h2(\"Predicted Penguin Mass (g)\"), \n ui.output_text(\"pred_out\")\n )\n ) \n)\n\ndef server(input, output, session):\n @reactive.Calc\n def vals():\n d = {\n \"bill_length_mm\" : input.bill_length(),\n \"sex_Male\" : input.sex() == \"Male\",\n \"species_Gentoo\" : input.species() == \"Gentoo\", \n \"species_Chinstrap\" : input.species() == \"Chinstrap\"\n\n }\n return d\n \n @reactive.Calc\n @reactive.event(input.predict)\n def pred():\n r = requests.post(api_url, json = vals())\n return r.json().get('predict')[0]\n\n @output\n @render.text\n def vals_out():\n return f\"{vals()}\"\n\n @output\n @render.text\n def pred_out():\n return f\"{round(pred())}\"\n\napp = App(app_ui, server)\n\nAnd like this in R\n\n\napp.R\n\nlibrary(shiny)\n\napi_url <- \"http://127.0.0.1:8080/predict\"\n\nui <- fluidPage(\n titlePanel(\"Penguin Mass Predictor\"),\n\n # Model input values\n sidebarLayout(\n sidebarPanel(\n sliderInput(\n \"bill_length\",\n \"Bill Length (mm)\",\n min = 30,\n max = 60,\n value = 45,\n step = 0.1\n ),\n selectInput(\n \"sex\",\n \"Sex\",\n c(\"Male\", \"Female\")\n ),\n selectInput(\n \"species\",\n 
\"Species\",\n c(\"Adelie\", \"Chinstrap\", \"Gentoo\")\n ),\n # Get model predictions\n actionButton(\n \"predict\",\n \"Predict\"\n )\n ),\n\n mainPanel(\n h2(\"Penguin Parameters\"),\n verbatimTextOutput(\"vals\"),\n h2(\"Predicted Penguin Mass (g)\"),\n textOutput(\"pred\")\n )\n )\n)\n\nserver <- function(input, output) {\n # Input params\n vals <- reactive(\n list(\n bill_length_mm = input$bill_length,\n species_Chinstrap = input$species == \"Chinstrap\",\n species_Gentoo = input$species == \"Gentoo\",\n sex_male = input$sex == \"Male\"\n )\n )\n\n # Fetch prediction from API\n pred <- eventReactive(\n input$predict,\n httr2::request(api_url) |>\n httr2::req_body_json(vals()) |>\n httr2::req_perform() |>\n httr2::resp_body_json(),\n ignoreInit = TRUE\n )\n\n # Render to UI\n output$pred <- renderText(pred()$predict[[1]])\n output$vals <- renderPrint(vals())\n}\n\n# Run the application\nshinyApp(ui = ui, server = server)\n\nOver the next few chapters, we will implement more architectural best practices for the app and eventually go to deployment." + "section": "3.7 Lab: Use a database and an API", + "text": "3.7 Lab: Use a database and an API\nIn this lab, we will build the data and the presentation layers for our penguin mass model exploration. 
We’re going to create an app to explore the model, which will look like this: \nLet’s start by moving the data into an actual data layer.\n\n3.7.1 Step 1: Put the data in DuckDB\nLet’s start by moving the data into a DuckDB database and using it from there for the modeling and EDA scripts.\nTo start, let’s load the data.\nHere’s what that looks like in R:\ncon <- DBI::dbConnect(duckdb::duckdb(), dbdir = \"my-db.duckdb\")\nDBI::dbWriteTable(con, \"penguins\", palmerpenguins::penguins)\nDBI::dbDisconnect(con)\nOr equivalently, in Python:\nimport duckdb\nfrom palmerpenguins import penguins\n\ncon = duckdb.connect('my-db.duckdb')\ndf = penguins.load_penguins()\ncon.execute('CREATE TABLE penguins AS SELECT * FROM df')\ncon.close()\nNow that the data is loaded, let’s adjust our scripts to use the database.\nIn R, we will replace our data loading with connecting to the database. Leaving out all the parts that don’t change, it looks like\n\n\neda.qmd\n\n\ncon <- DBI::dbConnect(\n  duckdb::duckdb(), \n  dbdir = \"my-db.duckdb\"\n  )\ndf <- dplyr::tbl(con, \"penguins\")\n\nWe also need to call DBI::dbDisconnect(con) at the end of the script.\nWe don’t have to change anything because we wrote our data processing code in {dplyr}. Under the hood, {dplyr} can switch seamlessly to a database backend, which is really cool.\n\n\neda.qmd\n\ndf %>%\n  group_by(species, sex) %>%\n  summarise(\n    across(\n      ends_with(\"mm\") | ends_with(\"g\"),\n      \\(x) mean(x, na.rm = TRUE)\n    )\n  ) %>%\n  dplyr::collect() %>%\n  knitr::kable()\n\nIt’s unnecessary, but I’ve added a call to dplyr::collect() in line 31. It will be implied if I don’t put it there manually, but it helps make it obvious that all the work before there has been pushed off to the database. 
It doesn’t matter for this small dataset, but it could benefit a larger dataset.\nIn Python, we’re just going to load the entire dataset into memory for modeling, so the line loading the dataset changes to\n\n\nmodel.qmd\n\ncon = duckdb.connect('my-db.duckdb')\ndf = con.execute(\"SELECT * FROM penguins\").fetchdf().dropna()\ncon.close()\n\n\nNow let’s switch to figuring out the connection we’ll need to our processing layer in the presentation layer.\n\n\n3.7.2 Step 2: Call the model API from code\nBefore you start, ensure the API is running on your machine from the last lab.\n\n\n\n\n\n\nNote\n\n\n\nI’m assuming it’s running on port 8080 in this lab. If you’ve put it somewhere else, change the 8080 in the code below to match the port on your machine.\n\n\nIf you want to call the model in code, you can use any http request library. In R you should use httr2 and in Python you should use requests.\nHere’s what it looks like to call the API in Python\n\nimport requests\n\nreq_data = {\n  \"bill_length_mm\": 0,\n  \"species_Chinstrap\": False,\n  \"species_Gentoo\": False,\n  \"sex_male\": False\n}\nreq = requests.post('http://127.0.0.1:8080/predict', json = req_data)\nres = req.json().get('predict')[0]\n\nor equivalently in R\n\nreq <- httr2::request(\"http://127.0.0.1:8080/predict\") |>\n  httr2::req_body_json(\n    list(\n      \"bill_length_mm\" = 0,\n      \"species_Chinstrap\" = FALSE,\n      \"species_Gentoo\" = FALSE,\n      \"sex_male\" = FALSE\n    )\n  ) |>\n  httr2::req_perform()\nres <- httr2::resp_body_json(req)$predict[[1]]\n\nNote that there’s no translation necessary to send the request. The {requests} and {httr2} packages automatically know what to do with the Python dictionary and the R list.\nGetting the result back takes more work to find the right spot in the JSON returned. This is quite common.\n\n\n\n\n\n\nNote\n\n\n\nThe {vetiver} package also includes the ability to auto-query a {vetiver} API. 
I’m not using it here to expose the details of calling an API.\n\n\nLet’s take this API-calling code and build the presentation layer around it.\n\n\n3.7.3 Step 3: Build a shiny app\nWe will use the {shiny} R and Python package for creating interactive web apps using just R or Python code. If you don’t know much about {shiny}, you can simply follow the examples here or spend time with the Mastering Shiny book to learn to use it yourself.\nEither way, an app that looks like the picture above would look like this in Python\n\n\napp.py\n\nfrom shiny import App, render, ui, reactive\nimport requests\n\napi_url = 'http://127.0.0.1:8080/predict'\n\napp_ui = ui.page_fluid(\n    ui.panel_title(\"Penguin Mass Predictor\"), \n    ui.layout_sidebar(\n        ui.panel_sidebar(\n            [ui.input_slider(\"bill_length\", \"Bill Length (mm)\", 30, 60, 45, step = 0.1),\n            ui.input_select(\"sex\", \"Sex\", [\"Male\", \"Female\"]),\n            ui.input_select(\"species\", \"Species\", [\"Adelie\", \"Chinstrap\", \"Gentoo\"]),\n            ui.input_action_button(\"predict\", \"Predict\")]\n        ),\n        ui.panel_main(\n            ui.h2(\"Penguin Parameters\"),\n            ui.output_text_verbatim(\"vals_out\"),\n            ui.h2(\"Predicted Penguin Mass (g)\"), \n            ui.output_text(\"pred_out\")\n        )\n    ) \n)\n\ndef server(input, output, session):\n    @reactive.Calc\n    def vals():\n        d = {\n            \"bill_length_mm\" : input.bill_length(),\n            \"sex_Male\" : input.sex() == \"Male\",\n            \"species_Gentoo\" : input.species() == \"Gentoo\", \n            \"species_Chinstrap\" : input.species() == \"Chinstrap\"\n\n        }\n        return d\n    \n    @reactive.Calc\n    @reactive.event(input.predict)\n    def pred():\n        r = requests.post(api_url, json = vals())\n        return r.json().get('predict')[0]\n\n    @output\n    @render.text\n    def vals_out():\n        return f\"{vals()}\"\n\n    @output\n    @render.text\n    def pred_out():\n        return f\"{round(pred())}\"\n\napp = App(app_ui, server)\n\nAnd like this in R\n\n\napp.R\n\nlibrary(shiny)\n\napi_url <- \"http://127.0.0.1:8080/predict\"\n\nui <- fluidPage(\n  titlePanel(\"Penguin Mass 
Predictor\"),\n\n # Model input values\n sidebarLayout(\n sidebarPanel(\n sliderInput(\n \"bill_length\",\n \"Bill Length (mm)\",\n min = 30,\n max = 60,\n value = 45,\n step = 0.1\n ),\n selectInput(\n \"sex\",\n \"Sex\",\n c(\"Male\", \"Female\")\n ),\n selectInput(\n \"species\",\n \"Species\",\n c(\"Adelie\", \"Chinstrap\", \"Gentoo\")\n ),\n # Get model predictions\n actionButton(\n \"predict\",\n \"Predict\"\n )\n ),\n\n mainPanel(\n h2(\"Penguin Parameters\"),\n verbatimTextOutput(\"vals\"),\n h2(\"Predicted Penguin Mass (g)\"),\n textOutput(\"pred\")\n )\n )\n)\n\nserver <- function(input, output) {\n # Input params\n vals <- reactive(\n list(\n bill_length_mm = input$bill_length,\n species_Chinstrap = input$species == \"Chinstrap\",\n species_Gentoo = input$species == \"Gentoo\",\n sex_male = input$sex == \"Male\"\n )\n )\n\n # Fetch prediction from API\n pred <- eventReactive(\n input$predict,\n httr2::request(api_url) |>\n httr2::req_body_json(vals()) |>\n httr2::req_perform() |>\n httr2::resp_body_json(),\n ignoreInit = TRUE\n )\n\n # Render to UI\n output$pred <- renderText(pred()$predict[[1]])\n output$vals <- renderPrint(vals())\n}\n\n# Run the application\nshinyApp(ui = ui, server = server)\n\nOver the next few chapters, we will implement more architectural best practices for the app and eventually go to deployment." }, { "objectID": "chapters/sec1/1-3-data-access.html#footnotes", @@ -301,11 +301,11 @@ "text": "4.4 Comprehension Questions\n\nWhat is the difference between monitoring and logging? What are the two halves of the monitoring and logging process?\nLogging is generally good, but what are some things you should be careful not to log?\nAt what level would you log each of the following events:\n\nSomeone clicks on a particular tab in your Shiny app.\nSomeone puts an invalid entry into a text entry box.\nAn HTTP call your app makes to an external API fails.\nThe numeric values that are going into your computational function." 
}, { - "objectID": "chapters/sec1/1-4-monitor-log.html#lab-4-an-app-with-logging", - "href": "chapters/sec1/1-4-monitor-log.html#lab-4-an-app-with-logging", + "objectID": "chapters/sec1/1-4-monitor-log.html#lab-an-app-with-logging", + "href": "chapters/sec1/1-4-monitor-log.html#lab-an-app-with-logging", "title": "4  Logging and Monitoring", - "section": "4.5 Lab 4: An App with Logging", - "text": "4.5 Lab 4: An App with Logging\nLet’s return to the last lab’s prediction generator app and add a little logging. This is easy in both R and Python. We declare that we’re using the logger and then put logging statements into our code.\nI decided to log when the app starts, just before and after each request, and an error logger if an HTTP error code comes back from the API.\nWith the logging now added, here’s what the app looks like in R:\n\n\napp.R\n\nlibrary(shiny)\n\napi_url <- \"http://127.0.0.1:8080/predict\"\nlog <- log4r::logger()\n\nui <- fluidPage(\n titlePanel(\"Penguin Mass Predictor\"),\n\n # Model input values\n sidebarLayout(\n sidebarPanel(\n sliderInput(\n \"bill_length\",\n \"Bill Length (mm)\",\n min = 30,\n max = 60,\n value = 45,\n step = 0.1\n ),\n selectInput(\n \"sex\",\n \"Sex\",\n c(\"Male\", \"Female\")\n ),\n selectInput(\n \"species\",\n \"Species\",\n c(\"Adelie\", \"Chinstrap\", \"Gentoo\")\n ),\n # Get model predictions\n actionButton(\n \"predict\",\n \"Predict\"\n )\n ),\n\n mainPanel(\n h2(\"Penguin Parameters\"),\n verbatimTextOutput(\"vals\"),\n h2(\"Predicted Penguin Mass (g)\"),\n textOutput(\"pred\")\n )\n )\n)\n\nserver <- function(input, output) {\n log4r::info(log, \"App Started\")\n # Input params\n vals <- reactive(\n list(\n bill_length_mm = input$bill_length,\n species_Chinstrap = input$species == \"Chinstrap\",\n species_Gentoo = input$species == \"Gentoo\",\n sex_male = input$sex == \"Male\"\n )\n )\n\n # Fetch prediction from API\n pred <- eventReactive(\n input$predict,\n {\n log4r::info(log, \"Prediction Requested\")\n r 
<- httr2::request(api_url) |>\n httr2::req_body_json(vals()) |>\n httr2::req_perform()\n log4r::info(log, \"Prediction Returned\")\n\n if (httr2::resp_is_error(r)) {\n log4r::error(log, paste(\"HTTP Error\"))\n }\n\n httr2::resp_body_json(r)\n },\n ignoreInit = TRUE\n )\n\n # Render to UI\n output$pred <- renderText(pred()$predict[[1]])\n output$vals <- renderPrint(vals())\n}\n\n# Run the application\nshinyApp(ui = ui, server = server)\n\nAnd in Python:\n\n\napp.py\n\nfrom shiny import App, render, ui, reactive\nimport requests\nimport logging\n\napi_url = 'http://127.0.0.1:8080/predict'\nlogging.basicConfig(\n format='%(asctime)s - %(message)s',\n level=logging.INFO\n)\n\napp_ui = ui.page_fluid(\n ui.panel_title(\"Penguin Mass Predictor\"), \n ui.layout_sidebar(\n ui.panel_sidebar(\n [ui.input_slider(\"bill_length\", \"Bill Length (mm)\", 30, 60, 45, step = 0.1),\n ui.input_select(\"sex\", \"Sex\", [\"Male\", \"Female\"]),\n ui.input_select(\"species\", \"Species\", [\"Adelie\", \"Chinstrap\", \"Gentoo\"]),\n ui.input_action_button(\"predict\", \"Predict\")]\n ),\n ui.panel_main(\n ui.h2(\"Penguin Parameters\"),\n ui.output_text_verbatim(\"vals_out\"),\n ui.h2(\"Predicted Penguin Mass (g)\"), \n ui.output_text(\"pred_out\")\n )\n ) \n)\n\ndef server(input, output, session):\n logging.info(\"App start\")\n\n @reactive.Calc\n def vals():\n d = {\n \"bill_length_mm\" : input.bill_length(),\n \"sex_Male\" : input.sex() == \"Male\",\n \"species_Gentoo\" : input.species() == \"Gentoo\", \n \"species_Chinstrap\" : input.species() == \"Chinstrap\"\n\n }\n return d\n \n @reactive.Calc\n @reactive.event(input.predict)\n def pred():\n logging.info(\"Request Made\")\n r = requests.post(api_url, json = vals())\n logging.info(\"Request Returned\")\n\n if r.status_code != 200:\n logging.error(\"HTTP error returned\")\n\n return r.json().get('predict')[0]\n\n @output\n @render.text\n def vals_out():\n return f\"{vals()}\"\n\n @output\n @render.text\n def pred_out():\n return 
f\"{round(pred())}\"\n\napp = App(app_ui, server)\n\nNow, if you load up this app locally, you can see the logs of what’s happening stream in as you press buttons.\nYou can feel free to log whatever you think is helpful. For example, getting the actual error contents would probably be helpful if an HTTP error comes back."
+    "section": "4.5 Lab: An App with Logging",
+    "text": "4.5 Lab: An App with Logging\nLet’s return to the last lab’s prediction generator app and add a little logging. This is easy in both R and Python. We declare that we’re using the logger and then put logging statements into our code.\nI decided to log when the app starts, just before and after each request, and to log an error if an HTTP error code comes back from the API.\nWith the logging now added, here’s what the app looks like in R:\n\n\napp.R\n\nlibrary(shiny)\n\napi_url <- \"http://127.0.0.1:8080/predict\"\nlog <- log4r::logger()\n\nui <- fluidPage(\n  titlePanel(\"Penguin Mass Predictor\"),\n\n  # Model input values\n  sidebarLayout(\n    sidebarPanel(\n      sliderInput(\n        \"bill_length\",\n        \"Bill Length (mm)\",\n        min = 30,\n        max = 60,\n        value = 45,\n        step = 0.1\n      ),\n      selectInput(\n        \"sex\",\n        \"Sex\",\n        c(\"Male\", \"Female\")\n      ),\n      selectInput(\n        \"species\",\n        \"Species\",\n        c(\"Adelie\", \"Chinstrap\", \"Gentoo\")\n      ),\n      # Get model predictions\n      actionButton(\n        \"predict\",\n        \"Predict\"\n      )\n    ),\n\n    mainPanel(\n      h2(\"Penguin Parameters\"),\n      verbatimTextOutput(\"vals\"),\n      h2(\"Predicted Penguin Mass (g)\"),\n      textOutput(\"pred\")\n    )\n  )\n)\n\nserver <- function(input, output) {\n  log4r::info(log, \"App Started\")\n  # Input params\n  vals <- reactive(\n    list(\n      bill_length_mm = input$bill_length,\n      species_Chinstrap = input$species == \"Chinstrap\",\n      species_Gentoo = input$species == \"Gentoo\",\n      sex_male = input$sex == \"Male\"\n    )\n  )\n\n  # Fetch prediction from API\n  pred <- eventReactive(\n    input$predict,\n    {\n      log4r::info(log, \"Prediction Requested\")\n      r <- 
httr2::request(api_url) |>\n httr2::req_body_json(vals()) |>\n httr2::req_perform()\n log4r::info(log, \"Prediction Returned\")\n\n if (httr2::resp_is_error(r)) {\n log4r::error(log, paste(\"HTTP Error\"))\n }\n\n httr2::resp_body_json(r)\n },\n ignoreInit = TRUE\n )\n\n # Render to UI\n output$pred <- renderText(pred()$predict[[1]])\n output$vals <- renderPrint(vals())\n}\n\n# Run the application\nshinyApp(ui = ui, server = server)\n\nAnd in Python:\n\n\napp.py\n\nfrom shiny import App, render, ui, reactive\nimport requests\nimport logging\n\napi_url = 'http://127.0.0.1:8080/predict'\nlogging.basicConfig(\n format='%(asctime)s - %(message)s',\n level=logging.INFO\n)\n\napp_ui = ui.page_fluid(\n ui.panel_title(\"Penguin Mass Predictor\"), \n ui.layout_sidebar(\n ui.panel_sidebar(\n [ui.input_slider(\"bill_length\", \"Bill Length (mm)\", 30, 60, 45, step = 0.1),\n ui.input_select(\"sex\", \"Sex\", [\"Male\", \"Female\"]),\n ui.input_select(\"species\", \"Species\", [\"Adelie\", \"Chinstrap\", \"Gentoo\"]),\n ui.input_action_button(\"predict\", \"Predict\")]\n ),\n ui.panel_main(\n ui.h2(\"Penguin Parameters\"),\n ui.output_text_verbatim(\"vals_out\"),\n ui.h2(\"Predicted Penguin Mass (g)\"), \n ui.output_text(\"pred_out\")\n )\n ) \n)\n\ndef server(input, output, session):\n logging.info(\"App start\")\n\n @reactive.Calc\n def vals():\n d = {\n \"bill_length_mm\" : input.bill_length(),\n \"sex_Male\" : input.sex() == \"Male\",\n \"species_Gentoo\" : input.species() == \"Gentoo\", \n \"species_Chinstrap\" : input.species() == \"Chinstrap\"\n\n }\n return d\n \n @reactive.Calc\n @reactive.event(input.predict)\n def pred():\n logging.info(\"Request Made\")\n r = requests.post(api_url, json = vals())\n logging.info(\"Request Returned\")\n\n if r.status_code != 200:\n logging.error(\"HTTP error returned\")\n\n return r.json().get('predict')[0]\n\n @output\n @render.text\n def vals_out():\n return f\"{vals()}\"\n\n @output\n @render.text\n def pred_out():\n return 
f\"{round(pred())}\"\n\napp = App(app_ui, server)\n\nNow, if you load up this app locally, you can see the logs of what’s happening stream in as you press buttons.\nYou can feel free to log whatever you think is helpful. For example, getting the actual error contents would probably be helpful if an HTTP error comes back." }, { "objectID": "chapters/sec1/1-4-monitor-log.html#footnotes", @@ -346,8 +346,8 @@ "objectID": "chapters/sec1/1-5-deployments.html#lab5", "href": "chapters/sec1/1-5-deployments.html#lab5", "title": "5  Deployments and code promotion", - "section": "5.5 Lab 5: Host a website with automatic updates", - "text": "5.5 Lab 5: Host a website with automatic updates\nIn labs 1 through 4, you’ve created a Quarto website for the penguin model. You’ve got sections on EDA and model building. But it’s still just on your computer.\nIn this lab, we will deploy that website to a public site on GitHub and set up GitHub Actions as CI/CD so the EDA and modeling steps re-render every time we make changes.\nBefore we get into the meat of the lab, there are a few things you need to do on your own. If you don’t know how, there are plenty of great tutorials online.\n\nCreate an empty public repo on GitHub.\nConfigure the repo as the remote for your Quarto project directory.\n\nOnce you’ve connected the GitHub repo to your project, you will set up the Quarto project to publish via GitHub Actions. 
There are great directions on configuring that on the Quarto website.\nFollowing those instructions will accomplish three things for you:\n\nGenerate a _publish.yml, which is a Quarto-specific file for configuring publishing locations.\nConfigure GitHub Pages to serve your website off a long-running standalone branch called gh-pages.\nGenerate a GitHub Actions workflow file, which will live at .github/workflows/publish.yml.\n\nHere’s the basic GitHub Actions file (or close to it) that the process will auto-generate for you.\n\n\n.github/workflows/publish.yml\n\non:\n workflow_dispatch:\n push:\n branches: main\n\nname: Quarto Publish\n\njobs:\n build-deploy:\n runs-on: ubuntu-latest\n permissions:\n contents: write\n steps:\n - name: Check out repository\n uses: actions/checkout@v2\n\n - name: Set up Quarto\n uses: quarto-dev/quarto-actions/setup@v2\n\n - name: Render and Publish\n uses: quarto-dev/quarto-actions/publish@v2\n with:\n target: gh-pages\n env:\n GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}\n\nLike all GitHub Actions, this action is defined in a .yml file in the .github/workflows directory of a project. It contains several sections:\n\nThe on section defines when the workflow occurs. In this case, we’ve configured the workflow only to trigger on a push to the main branch.2 Another common case would be to trigger on a pull request to main or another branch.\nThe jobs section defines what happens in steps, that occur in sequential order.\n\nThe runs-on field specifies which runner to use. A runner is a virtual machine that is started from scratch every time the action runs. GitHub Actions offers runners with Ubuntu, Windows, and MacOS. 
You can also add custom runners.\nMost steps are defined with uses, which calls a preexisting GitHub Actions step that someone else has written.\nYou can inject variables using with and environment variables using env.\n\n\nIf you try to run this, it probably won’t work.\nThe CI/CD process occurs in a completely isolated environment. This auto-generated action doesn’t include setting up versions of R and Python or the packages to run our EDA and modeling scripts. We need to get that configured before this action will work.\n\n\n\n\n\n\nNote\n\n\n\nIf you read the Quarto documentation, they recommend freezing your computations. Freezing is useful if you want to render your R or Python code only once and update only the text of your document. You wouldn’t need to set up R or Python in CI/CD, and the document would render faster.\nThat said, freezing isn’t an option if you intend the CI/CD environment to re-run the R or Python code.\nBecause the main point here is to learn about getting environments as code working in CI/CD you should not freeze your environment.\n\n\nFirst, add the commands to install R, {renv}, and the packages for your content to the GitHub Actions workflow.\n\n\n.github/workflows/publish.yml\n\n - name: Install R\n uses: r-lib/actions/setup-r@v2\n with:\n r-version: '4.2.0'\n use-public-rspm: true\n\n - name: Setup renv and install packages\n uses: r-lib/actions/setup-renv@v2\n with:\n cache-version: 1\n env:\n RENV_CONFIG_REPOS_OVERRIDE: https://packagemanager.rstudio.com/all/latest\n\n\n\n\n\n\n\nNote\n\n\n\nIf you’re having slow package installs in CI/CD for R, I’d strongly recommend using a repos override like in the example above.\nThe issue is that CRAN doesn’t serve binary packages for Linux, which means slow installs. 
You need to direct {renv} to install from Public Posit Package Manager, which does have Linux binaries.\n\n\nYou’ll also need to add a workflow to GitHub Actions to install Python and the necessary Python packages from the requirements.txt.\n\n\n.github/workflows/publish.yml\n\n - name: Install Python and Dependencies\n uses: actions/setup-python@v4\n with:\n python-version: '3.10'\n cache: 'pip'\n - run: pip install jupyter\n - run: pip install -r requirements.txt\n\nNote that, we run the Python environment restore commands with run rather than uses. Where uses takes an existing GitHub Action and runs it, run executes the shell command natively.\nOnce you’ve made those changes, try pushing or merging your project to main. If you click on the Actions tab on GitHub you’ll be able to see the Action running.\nIn all honesty, it will probably fail the first time or five. You will rarely get your Actions correct on the first try. Breathe deeply and know we’ve all been there. You’ll figure it out.\nOnce it’s up, your website will be available at https://<username>.github.io/<repo-name>." + "section": "5.5 Lab: Host a website with automatic updates", + "text": "5.5 Lab: Host a website with automatic updates\nIn labs 1 through 4, you’ve created a Quarto website for the penguin model. You’ve got sections on EDA and model building. But it’s still just on your computer.\nIn this lab, we will deploy that website to a public site on GitHub and set up GitHub Actions as CI/CD so the EDA and modeling steps re-render every time we make changes.\nBefore we get into the meat of the lab, there are a few things you need to do on your own. If you don’t know how, there are plenty of great tutorials online.\n\nCreate an empty public repo on GitHub.\nConfigure the repo as the remote for your Quarto project directory.\n\nOnce you’ve connected the GitHub repo to your project, you will set up the Quarto project to publish via GitHub Actions. 
There are great directions on configuring that on the Quarto website.\nFollowing those instructions will accomplish three things for you:\n\nGenerate a _publish.yml, which is a Quarto-specific file for configuring publishing locations.\nConfigure GitHub Pages to serve your website off a long-running standalone branch called gh-pages.\nGenerate a GitHub Actions workflow file, which will live at .github/workflows/publish.yml.\n\nHere’s the basic GitHub Actions file (or close to it) that the process will auto-generate for you.\n\n\n.github/workflows/publish.yml\n\non:\n workflow_dispatch:\n push:\n branches: main\n\nname: Quarto Publish\n\njobs:\n build-deploy:\n runs-on: ubuntu-latest\n permissions:\n contents: write\n steps:\n - name: Check out repository\n uses: actions/checkout@v2\n\n - name: Set up Quarto\n uses: quarto-dev/quarto-actions/setup@v2\n\n - name: Render and Publish\n uses: quarto-dev/quarto-actions/publish@v2\n with:\n target: gh-pages\n env:\n GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}\n\nLike all GitHub Actions, this action is defined in a .yml file in the .github/workflows directory of a project. It contains several sections:\n\nThe on section defines when the workflow occurs. In this case, we’ve configured the workflow only to trigger on a push to the main branch.2 Another common case would be to trigger on a pull request to main or another branch.\nThe jobs section defines what happens in steps that occur in sequential order.\n\nThe runs-on field specifies which runner to use. A runner is a virtual machine that is started from scratch every time the action runs. GitHub Actions offers runners with Ubuntu, Windows, and macOS. 
You can also add custom runners.\nMost steps are defined with uses, which calls a preexisting GitHub Actions step that someone else has written.\nYou can inject variables using with and environment variables using env.\n\n\nIf you try to run this, it probably won’t work.\nThe CI/CD process occurs in a completely isolated environment. This auto-generated action doesn’t include setting up versions of R and Python or the packages to run our EDA and modeling scripts. We need to get that configured before this action will work.\n\n\n\n\n\n\nNote\n\n\n\nIf you read the Quarto documentation, they recommend freezing your computations. Freezing is useful if you want to render your R or Python code only once and update only the text of your document. You wouldn’t need to set up R or Python in CI/CD, and the document would render faster.\nThat said, freezing isn’t an option if you intend the CI/CD environment to re-run the R or Python code.\nBecause the main point here is to learn about getting environments as code working in CI/CD, you should not freeze your environment.\n\n\nFirst, add the commands to install R, {renv}, and the packages for your content to the GitHub Actions workflow.\n\n\n.github/workflows/publish.yml\n\n - name: Install R\n uses: r-lib/actions/setup-r@v2\n with:\n r-version: '4.2.0'\n use-public-rspm: true\n\n - name: Setup renv and install packages\n uses: r-lib/actions/setup-renv@v2\n with:\n cache-version: 1\n env:\n RENV_CONFIG_REPOS_OVERRIDE: https://packagemanager.rstudio.com/all/latest\n\n\n\n\n\n\n\nNote\n\n\n\nIf you’re having slow package installs in CI/CD for R, I’d strongly recommend using a repos override like in the example above.\nThe issue is that CRAN doesn’t serve binary packages for Linux, which means slow installs. 
You need to direct {renv} to install from Public Posit Package Manager, which does have Linux binaries.\n\n\nYou’ll also need to add a workflow to GitHub Actions to install Python and the necessary Python packages from the requirements.txt.\n\n\n.github/workflows/publish.yml\n\n - name: Install Python and Dependencies\n uses: actions/setup-python@v4\n with:\n python-version: '3.10'\n cache: 'pip'\n - run: pip install jupyter\n - run: pip install -r requirements.txt\n\nNote that, we run the Python environment restore commands with run rather than uses. Where uses takes an existing GitHub Action and runs it, run executes the shell command natively.\nOnce you’ve made those changes, try pushing or merging your project to main. If you click on the Actions tab on GitHub you’ll be able to see the Action running.\nIn all honesty, it will probably fail the first time or five. You will rarely get your Actions correct on the first try. Breathe deeply and know we’ve all been there. You’ll figure it out.\nOnce it’s up, your website will be available at https://<username>.github.io/<repo-name>." }, { "objectID": "chapters/sec1/1-5-deployments.html#footnotes", @@ -1208,69 +1208,69 @@ "href": "chapters/append/lab-map.html", "title": "Appendix C — Lab Map", "section": "", - "text": "This section aims to clarify the relationship between the assets you’ll make in each portfolio exercise and labs in this book.\n\n\n\n\n\n\n\nChapter\nLab Activity\n\n\n\n\nChapter 1: Environments as Code\nCreate a Quarto side that uses {renv} and {venv} to create standalone R and Python virtual environments, create a page on the website for each.\n\n\nChapter 3: Data Architecture\nMove data into a DuckDB database.\n\n\nChapter 2: Project Architecture\nCreate an API that serves a Python machine-learning model using {vetiver} and {fastAPI}. 
Call that API from a Shiny App in both R and Python.\n\n\nChapter 4: Logging and Monitoring\nAdd logging to the app from Chapter 2.\n\n\nChapter 5: Code Promotion\nPut a static Quarto site up on GitHub Pages using GitHub Actions that renders the project.\n\n\nChapter 6: Docker\nPut API from Chapter 2 into Docker Container.\n\n\nChapter 7: Cloud\nStand up an EC2 instance. Put model into S3.\n\n\nChapter 9: Linux Admin\nAdd R, Python, RStudio Server, JupyterHub, palmer penguin fastAPI + App.\n\n\nChapter 12: Networking\nAdd proxy (nginx) to reach all services from the web.\n\n\nChapter 13: DNS\nAdd a real URL to the EC2 instance. Put the Shiny app into an iFrame on the site.\n\n\nChapter 14: SSL\nAdd SSL/HTTPS to the EC2 instance.\n\n\nChapter 11: Servers\nResize servers." + "text": "This section aims to clarify the relationship between the assets you’ll make in each portfolio exercise and labs in this book.\n\n\n\n\n\n\n\nChapter\nLab Activity\n\n\n\n\nChapter 1: Environments as Code\nCreate a Quarto site that uses {renv} and {venv} to create standalone R and Python virtual environments. Add an R EDA page and Python modeling.\n\n\nChapter 2: Project Architecture\nCreate an API that serves a Python machine-learning model using {vetiver} and {fastAPI}. 
Call that API from a Shiny App in both R and Python.\n\n\nChapter 3: Data Architecture\nMove data into a DuckDB database and serve model predictions from an API.\n\n\nChapter 4: Logging and Monitoring\nAdd logging to the app from Chapter 2.\n\n\nChapter 5: Deployments\nPut a static Quarto site up on GitHub Pages using GitHub Actions that renders the project.\n\n\nChapter 6: Docker\nPut the API from Chapter 2 into a Docker container.\n\n\nChapter 7: Cloud\nStand up an EC2 instance.\nPut the model into S3.\n\n\nChapter 8: Command Line\nLog into the server with a .pem key and create an SSH key.\n\n\nChapter 9: Linux Admin\nCreate a user on the server and add an SSH key.\n\n\nChapter 10: Application Admin\nAdd R, Python, RStudio Server, JupyterHub, API, and App to the EC2 instance from Chapter 7.\n\n\nChapter 11: Scaling\nResize the server.\n\n\nChapter 12: Networking\nAdd a proxy (NGINX) to reach all services from the web.\n\n\nChapter 13: DNS\nAdd a URL to the EC2 instance. Put the Shiny app into an iFrame on the Quarto site.\n\n\nChapter 14: SSL\nAdd SSL/HTTPS to the EC2 instance."
}, { "objectID": "chapters/append/cheatsheets.html#environments-as-code", "href": "chapters/append/cheatsheets.html#environments-as-code", "title": "Appendix D — Cheatsheets", "section": "D.1 Environments as Code", "text": "D.1 Environments as Code\n\nD.1.1 Checking library + repository status\n\n\n\n\n\n\n\n\nStep\nR Command\nPython Command\n\n\n\n\nCheck whether library in sync with lockfile.\nrenv::status()\nNone\n\n\n\n\n\nD.1.2 Creating and Using a Standalone Project Library\nMake sure you’re in a standalone project library.\n\n\n\n\n\n\n\n\nStep\nR Command\nPython Command\n\n\nCreate a standalone library.\nTip: Make sure you’ve got {renv}/{venv}: in stall.packages(\"renv\") {venv} included w/ Python 3.5+\nrenv::init()\npython -m venv <dir>\nConvention: use.venv for <dir>\n\n\nActivate project library.\nrenv::activate()\nHappens automatically if in RStudio project.\nsource <dir> /bin/activate\n\n\nInstall packages as normal.\ninstall.p ackages(\"<pkg>\")\npython -m pip install <pkg>\n\n\nSnapshot package state.\nrenv::snapshot()\npip freez e > requirements.txt\n\n\nExit project environment.\nLeave R project or re nv::deactivate()\ndeactivate\n\n\n\n\n\nD.1.3 Collaborating on someone else’s project\nStart by downloading the project into a directory on your machine.\n\n\n\n\n\n\n\n\nStep\nR Command\nPython Command\n\n\nMove into project directory.\nset wd (\"< project-dir>\")\nOr open project in RStudio.\ncd <project-dir>\n\n\nCreate project environment.\nrenv::init()\npython -m venv <dir>\nRecommend: use .venv for <dir>\n\n\nEnter project environment.\nHappens automatically or re nv::activate()\nsource <dir> /bin/activate\n\n\nRestore packages.\nHappens automatically or r env::restore()\npi p install -r requirements.txt" "section": "D.1 Environments as code", "text": "D.1 Environments as code\n\nD.1.1 Checking library + repository status\n\n\n\n\n\n\n\n\nStep\nR Command\nPython Command\n\n\n\n\nCheck whether library is in sync with lockfile.\n
renv::status()\nNone\n\n\n\n\n\nD.1.2 Creating and using a standalone project library\nMake sure you’re in a standalone project library.\n\n\n\n\n\n\n\n\n\n\n\nStep\n\n\n\n\nR Command\n\n\n\n\nPython Command\n\n\n\n\n\n\nCreate a standalone library.\n\n\n\n\nrenv::init()\n\n\nTip: get {renv} w/ install.packages(“renv”)\n\n\n\n\npython -m venv <dir>\n\n\nConvention: use .venv for <dir>\n\n\nTip: {venv} included w/ Python 3.5+\n\n\n\n\n\n\nActivate project library.\n\n\n\n\nrenv::activate()\n\n\nHappens automatically if in RStudio project.\n\n\n\n\nsource <dir> /bin/activate\n\n\n\n\n\n\nInstall packages as normal.\n\n\n\n\ninstall.packages(“<pkg>”)\n\n\n\n\npython -m pip install <pkg>\n\n\n\n\n\n\nSnapshot package state.\n\n\n\n\nrenv::snapshot()\n\n\n\n\npip freeze > requirements.txt\n\n\n\n\n\n\nExit project environment.\n\n\n\n\nLeave R project or renv::deactivate()\n\n\n\n\ndeactivate\n\n\n\n\n\n\n\n\nD.1.3 Collaborating on someone else’s project\nStart by downloading the project into a directory on your machine.\n\n\n\n\n\n\n\n\n\n\n\nStep\n\n\n\n\nR Command\n\n\n\n\nPython Command\n\n\n\n\n\n\nMove into project directory.\n\n\n\n\nsetwd(“<project-dir>”)\n\n\nOr open project in RStudio.\n\n\n\n\ncd <project-dir>\n\n\n\n\n\n\nCreate project environment.\n\n\n\n\nrenv::init()\n\n\n\n\npython -m venv <dir>\n\n\nRecommend: use .venv for <dir>\n\n\n\n\n\n\nEnter project environment.\n\n\n\n\nHappens automatically or renv::activate()\n\n\n\n\nsource <dir> /bin/activate\n\n\n\n\n\n\nRestore packages.\n\n\n\n\nHappens automatically or renv::restore()\n\n\n\n\npip install -r requirements.txt" }, { "objectID": "chapters/append/cheatsheets.html#cheat-http", "href": "chapters/append/cheatsheets.html#cheat-http", "title": "Appendix D — Cheatsheets", "section": "D.2 HTTP Code Cheatsheet", "text": "D.2 HTTP Code Cheatsheet\nAs you work more with HTTP traffic, you’ll learn some of the common codes. 
Here’s a cheatsheet for some of the most frequent you’ll see.\n\n\n\n\n\n\n\nCode\nMeaning\n\n\n\n\n200\nEveryone’s favorite, a successful response.\n\n\n3xx\nYour query was redirected somewhere else, usually ok.\n\n\n4xx\nErrors with the request\n\n\n400\nBad request. This isn’t a request the server can understand.\n\n\n401 and 403\nUnauthorized or forbidden. Required authentication hasn’t been provided.\n\n\n404\nNot found. There isn’t any content to access here.\n\n\n5xx\nErrors with the server once your request got there.\n\n\n500\nGeneric server-side error. Your request was received, but there was an error processing it.\n\n\n504\nGateway timeout. This means that a proxy or gateway between you and the server you’re trying to access timed out before it got a response from the server." "section": "D.2 HTTP code cheatsheet", "text": "D.2 HTTP code cheatsheet\nAs you work with HTTP traffic, you’ll learn some of the common codes. Here are some of those used most frequently.\n\n\n\n\n\n\n\nCode\nMeaning\n\n\n\n\n\\(200\\)\nEveryone’s favorite, a successful response.\n\n\n\\(\\text{3xx}\\)\nYour query was redirected somewhere else, usually ok.\n\n\n\\(\\text{4xx}\\)\nErrors with the request.\n\n\n\\(400\\)\nBad request. This isn’t a request the server can understand.\n\n\n\\(401\\)/\\(403\\)\nUnauthorized or forbidden. Required authentication hasn’t been provided.\n\n\n\\(404\\)\nNot found. There isn’t any content to access here.\n\n\n\\(\\text{5xx}\\)\nErrors with the server once your request got there.\n\n\n\\(500\\)\nGeneric server-side error. Your request was received, but there was an error processing it.\n\n\n\\(504\\)\nGateway timeout. This means that a proxy or gateway between you and the server you’re trying to access timed out before it got a response from the server."
}, { "objectID": "chapters/append/cheatsheets.html#cheat-git", "href": "chapters/append/cheatsheets.html#cheat-git", "title": "Appendix D — Cheatsheets", "section": "D.3 Git", - "text": "D.3 Git\n\n\n\n\n\n\n\nCommand\nWhat it does\n\n\ngit clone <remote>\nClone a remote repo – make sure you’re using SSH URL.\n\n\ngit add <files/dir>\nAdd files/dir to staging area.\n\n\ngit commit -m <message>\nCommit staging area.\n\n\ngit p ush origin <branch>\nPush to a remote.\n\n\ngit p ull origin <branch>\nPull from a remote.\n\n\ngit che ckout <branch name>\nCheckout a branch.\n\n\ngit checko ut -b <branch name>\nCreate and checkout a branch.\n\n\ngit bran ch -d <branch name>\nDelete a branch." + "text": "D.3 Git\n\n\n\n\n\n\n\nCommand\nWhat it does\n\n\ngit clone <remote>\nClone a remote repo – make sure you’re using SSH URL.\n\n\ngit add <files/dir>\nAdd files/directory to staging area.\n\n\ngit commit -m <message>\nCommit staging area.\n\n\ngit push origin <branch>\nPush to a remote.\n\n\ngit pull origin <branch>\nPull from a remote.\n\n\ngit checkout <branch name>\nCheckout a branch.\n\n\ngit checkout -b <branch name>\nCreate and checkout a branch.\n\n\ngit branch -d <branch name>\nDelete a branch." 
}, { "objectID": "chapters/append/cheatsheets.html#cheat-docker", "href": "chapters/append/cheatsheets.html#cheat-docker", "title": "Appendix D — Cheatsheets", "section": "D.4 Docker", - "text": "D.4 Docker\n\nD.4.1 Docker CLI Commands\n\n\n\n\n\n\n\n\n\nS ta ge\nCommand\nWhat it does\nNotes and helpful options\n\n\nB ui ld\ndocker b uild <directory>\nBuilds a directory into an image.\n-t <name:tag> provides a name to the container.\ntag is optional, defaults to latest.\n\n\nMo ve\ndoc ker push <image>\nPush a container to a registry.\n\n\n\nMo ve\ndoc ker pull <image>\nPull a container from a registry.\nRarely needed because run pulls the container if needed.\n\n\nR un\ndo cker run <image>\nRun a container.\nSee flags in next table.\n\n\nR un\ndocker stop <container>\nStop a running container.\ndocker kill can be used if stop fails.\n\n\nR un\ndocker ps\nList running containers.\nUseful to get container id to do things to it.\n\n\nR un\ndocker exec <cont ainer> <command>\nRun a command inside a running container.\nBasically always used to open a shell with docker exec -it <container> /bin/bash\n\n\nR un\ndocker logs <container>\nViews logs for a container.\n\n\n\n\n\n\nD.4.2 Flags for docker run\n\n\n\n\n\n\n\n\nFlag\nEffect\nNotes\n\n\n--n ame <name>\nGive a name to container.\nOptional. Auto-assigned if not provided\n\n\n--rm\nRemove container when its stopped.\nDon’t use in production. 
You probably want to inspect failed containers.\n\n\n-d\nDetach container (don’t block the terminal).\nAlmost always used in production.\n\n\n-p <po rt>:<port>\nPublish port from inside running inside container to outside.\nNeeded if you want to access an app or API inside the container.\n\n\n-v< dir>:<dir>\nMount volume into the container.\n\n\n\n\nReminder: Order for -p and -v is <host>:<container>\n\n\nD.4.3 Dockerfile Commands\nThese are the commands that go in a Dockerfile when you’re building it.\n\n\n\n\n\n\n\n\nCommand\nPurpose\nExample\n\n\nFROM\nIndicate base container.\nFROM rocker/r-ver:4.1.0\n\n\nRUN\nRun a command when building.\nRUN apt-get update\n\n\nCOPY\nCopy from build directory into the container.\nCOPY . /app/\n\n\nCMD\nSpecify the command to run when the container starts.\nCMD quarto render ." "section": "D.4 Docker", "text": "D.4 Docker\n\nD.4.1 Docker CLI commands\n\n\n\n\n\n\n\n\n\n\n\nStage\n\n\n\n\nCommand\n\n\n\n\nWhat it does\n\n\n\n\nNotes and helpful options\n\n\n\n\n\n\nBuild\n\n\n\n\ndocker build <directory>\n\n\n\n\nBuilds a directory into an image.\n\n\n\n\n-t <name:tag> provides a name to the container.\n\n\ntag is optional, defaults to latest.\n\n\n\n\n\n\nMove\n\n\n\n\ndocker push <image>\n\n\n\n\nPush a container to a registry.\n\n\n\n\n\n\n\n\nMove\n\n\n\n\ndocker pull <image>\n\n\n\n\nPull a container from a registry.\n\n\n\n\nRarely needed because run pulls the container if needed.\n\n\n\n\n\n\nRun\n\n\n\n\ndocker run <image>\n\n\n\n\nRun a container.\n\n\n\n\nSee flags in next table.\n\n\n\n\n\n\nRun\n\n\n\n\ndocker stop <container>\n\n\n\n\nStop a running container.\n\n\n\n\ndocker kill can be used if stop fails.\n\n\n\n\n\n\nRun\n\n\n\n\ndocker ps\n\n\n\n\nList running containers.\n\n\n\n\nUseful to get container id to do things to it.\n\n\n\n\n\n\nRun\n\n\n\n\ndocker exec <container> <command>\n\n\n\n\nRun a command inside a running container.\n\n\n\n\nBasically always used to open a shell with docker exec -it <container> 
/bin/bash\n\n\n\n\n\n\nRun\n\n\n\n\ndocker logs <container>\n\n\n\n\nViews logs for a container.\n\n\n\n\n\n\n\n\n\n\nD.4.2 Flags for docker run\n\n\n\n\n\n\n\n\nFlag\nEffect\nNotes\n\n\n--name <name>\nGive a name to the container.\nOptional. Auto-assigned if not provided.\n\n\n--rm\nRemove container when it’s stopped.\nDon’t use in production. You probably want to inspect failed containers.\n\n\n-d\nDetach container (don’t block the terminal).\nAlmost always used in production.\n\n\n-p <port>:<port>\nPublish port from inside running container to outside.\nNeeded if you want to access an app or API inside the container.\n\n\n-v <dir>:<dir>\nMount volume into the container.\n\n\n\n\nReminder: Order for -p and -v is <host>:<container>\n\n\nD.4.3 Dockerfile commands\nThese are the commands that go in a Dockerfile when you’re building it.\n\n\n\n\n\n\n\n\nCommand\nPurpose\nExample\n\n\nFROM\nIndicate base container.\nFROM rocker/r-ver:4.1.0\n\n\nRUN\nRun a command when building.\nRUN apt-get update\n\n\nCOPY\nCopy from build directory into the container.\nCOPY . /app/\n\n\nCMD\nSpecify the command to run when the container starts.\nCMD quarto render ."
}, { "objectID": "chapters/append/cheatsheets.html#cloud-services", "href": "chapters/append/cheatsheets.html#cloud-services", "title": "Appendix D — Cheatsheets", - "section": "D.5 Cloud Services", - "text": "D.5 Cloud Services\n\n\n\n\n\n\n\n\n\nService\nAWS\nAzure\nGCP\n\n\nKubernetes cluster\nEKS or Fargate\nAKS\nGKE\n\n\nRun a container or application\nECS or Elastic Beanstalk\nAzure Container Apps\nGoogle App Engine\n\n\nRun an API\nLambda\nAzure Functions\nGoogle Cloud Functions\n\n\nDatabase\nRDS\nAzure SQL\nGoogle Cloud Database\n\n\nData Warehouse\nRedshift\nDataLake\nBigQuery\n\n\nML Platform\nSageMaker\nAzure ML\nVertex AI\n\n\nNAS\nEFS or FSx\nAzure File\nFilestore" + "section": "D.5 Cloud services", + "text": "D.5 Cloud services\n\n\n\n\n\n\n\n\n\nService\nAWS\nAzure\nGCP\n\n\nKubernetes cluster\nEKS or Fargate\nAKS\nGKE\n\n\nRun a container or application\nECS or Elastic Beanstalk\nAzure Container Apps\nGoogle App Engine\n\n\nRun an API\nLambda\nAzure Functions\nGoogle Cloud Functions\n\n\nDatabase\nRDS\nAzure SQL\nGoogle Cloud Database\n\n\nData Warehouse\nRedshift\nDataLake\nBigQuery\n\n\nML Platform\nSageMaker\nAzure ML\nVertex AI\n\n\nNAS\nEFS or FSx\nAzure File\nFilestore" }, { "objectID": "chapters/append/cheatsheets.html#cheat-cli", "href": "chapters/append/cheatsheets.html#cheat-cli", "title": "Appendix D — Cheatsheets", - "section": "D.6 Command Line", - "text": "D.6 Command Line\n\nD.6.1 General Command Line\n\n\n\nSymbol\nWhat it is\n\n\n\n\nman <command>\nOpen manual for command\n\n\nq\nQuit the current screen\n\n\n\\\nContinue bash command on new line\n\n\nctrl + c\nQuit current execution\n\n\necho <string>\nPrint string (useful for piping)\n\n\n\n\n\nD.6.2 Linux Navigation\n\n\n\n\n\n\n\n\nCommand\nWhat it does/is\nNotes + Helpful options\n\n\n\n\n/\nSystem root or file path separator\n\n\n\n.\ncurrent working directory\n\n\n\n..\nParent of working directory\n\n\n\n~\nHome directory of the current user\n\n\n\nls <dir>\nList objects in a 
directory\n-l - format as list\n-a - all (include hidden files that start with .)\n\n\npwd\nPrint working directory\n\n\n\ncd <dir>\nChange directory\nCan use relative or absolute paths\n\n\n\n\n\nD.6.3 Reading Text Files\n\n\n\n\n\n\n\n\nCommand\nWhat it does\nNotes + Helpful options\n\n\ncat <file>\nPrint a file from the top.\n\n\n\nless <file>\nPrint a file, but just a little.\nCan be very helpful to look at a few rows of csv.\nLazily reads lines, so can be much faster than cat for big files.\n\n\nhead <file>\nLook at the beginning of a file.\nDefaults to 10 lines, can specify a different number with -n <n>.\n\n\ntail <file>\nLook at the end of a file.\nUseful for logs where the newest part is last.\nThe -f flag is useful to follow for a live view.\n\n\ngre p <expression>\nSearch a file using regex.\nWriting regex can be a pain. I suggest testing on \\(\\text{regex101.com}\\).\nOften useful in combination with the pipe.\n\n\n|\nThe pipe\n\n\n\nwc <file>\nCount words in a file\nUse -l to count lines, useful for .csv files.\n\n\n\n\n\nD.6.4 Manipulating Files\n\n\n\n\n\n\n\n\nCommand\nWhat it does\nNotes + Helpful Options\n\n\nrm <path>\nRemove\n-r - recursively remove everything below a file path\n-f - force - don’t ask for each file\nBe very careful, it’s permanent\n\n\n\n\n\n\n\nc p <from> <to>\nCopy\n\n\n\nm v <from> <to>\nMove\n\n\n\n*\nWildcard\n\n\n\nmkdir/rmdir\nMake/remove directory\n-p - create any parts of path that don’t exist\n\n\n\n\n\nD.6.5 Move things to/from server\n\n\n\n\n\n\n\n\nCommand\nWhat it does\nNotes + Helpful Options\n\n\ntar\nCreate/extract archive file\nAlmost always used with flags.\nCreate is usually tar -czf <a rchive name> <file(s)>\nExtract is usually t ar -xfv <archive name>\n\n\nscp\nSecure copy via ssh\nRun on laptop to server\nCan use most ssh flags (like -i and -v)\n\n\n\n\n\nD.6.6 Write files from the command line\n\n\n\n\n\n\n\n\nCommand\nWhat it does\nNotes\n\n\n\n\ntouch\nCreates file if doesn’t already exist.\nUpdates 
last updated to current time if it does exist.\n\n\n>\nOverwrite file contents\nCreates a new file if it doesn’t exist\n\n\n>>\nConcatenate to end of file\nCreates a new file if it doesn’t exist\n\n\n\n\n\nD.6.7 Command Line Text Editors (Vim + Nano)\n\n\n\n\n\n\n\n\nCommand\nWhat it does\nNotes + Helpful options\n\n\n\n\n^\nPrefix for file command in nano editor.\nIts the ⌘ or Ctrl key, not the caret symbol.\n\n\ni\nEnter insert mode (able to type) in vim\n\n\n\nescape\nEnter normal mode (navigation) in vim.\n\n\n\n:w\nWrite the current file in vim (from normal mode)\nCan be combined to save and quit in one, :wq\n\n\n:q\nQuit vim (from normal mode)\n:q! quit without saving" + "section": "D.6 Command line", + "text": "D.6 Command line\n\nD.6.1 General command line\n\n\n\nSymbol\nWhat it is\n\n\n\n\nman <command>\nOpen manual for command.\n\n\nq\nQuit the current screen.\n\n\n\\\nContinue bash command on new line.\n\n\nctrl + c\nQuit current execution.\n\n\necho <string>\nPrint string (useful for piping).\n\n\n\n\n\nD.6.2 Linux filesystem navigation\n\n\n\n\n\n\n\n\n\n\n\nCommand\n\n\n\n\nWhat it does/is\n\n\n\n\nNotes + Helpful options\n\n\n\n\n\n\n\n\n/\n\n\n\n\nSystem root or file path separator.\n\n\n\n\n\n\n\n\n.\n\n\n\n\nCurrent working directory.\n\n\n\n\n\n\n\n\n..\n\n\n\n\nParent of working directory.\n\n\n\n\n\n\n\n\n~\n\n\n\n\nHome directory of the current user.\n\n\n\n\n\n\n\n\nls <dir>\n\n\n\n\nList objects in a directory.\n\n\n\n\n-l - format as list\n\n\n-a - all (include hidden files that start with .)\n\n\n\n\n\n\npwd\n\n\n\n\nPrint working directory.\n\n\n\n\n\n\n\n\ncd <dir>\n\n\n\n\nChange directory.\n\n\n\n\nCan use relative or absolute paths.\n\n\n\n\n\n\n\n\nD.6.3 Reading text files\n\n\n\n\n\n\n\n\n\n\n\nCommand\n\n\n\n\nWhat it does\n\n\n\n\nNotes + Helpful options\n\n\n\n\n\n\ncat <file>\n\n\n\n\nPrint a file from the top.\n\n\n\n\n\n\n\n\nless <file>\n\n\n\n\nPrint a file, but just a little.\n\n\n\n\nCan be very helpful to look at a few 
rows of csv.\n\n\nLazily reads lines, so can be much faster than cat for big files.\n\n\n\n\n\n\nhead <file>\n\n\n\n\nLook at the beginning of a file.\n\n\n\n\nDefaults to 10 lines, can specify a different number with -n <n>.\n\n\n\n\n\n\ntail <file>\n\n\n\n\nLook at the end of a file.\n\n\n\n\nUseful for logs where the newest part is last.\n\n\nThe -f flag is useful to follow for a live view.\n\n\n\n\n\n\ngrep <expression>\n\n\n\n\nSearch a file using regex.\n\n\n\n\nWriting regex can be a pain. I suggest testing on regex101.com.\n\n\nOften useful in combination with the pipe.\n\n\n\n\n\n\n|\n\n\n\n\nThe pipe.\n\n\n\n\n\n\n\n\nwc <file>\n\n\n\n\nCount words in a file.\n\n\n\n\nUse -l to count lines, useful for .csv files.\n\n\n\n\n\n\n\n\nD.6.4 Manipulating files\n\n\n\n\n\n\n\n\n\n\n\nCommand\n\n\n\n\nWhat it does\n\n\n\n\nNotes + Helpful Options\n\n\n\n\n\n\nrm <path>\n\n\n\n\nRemove.\n\n\n\n\n-r - recursively remove everything below a file path\n\n\n-f - force - don’t ask for each file\n\n\nBe very careful, it’s permanent!\n\n\n\n\n\n\ncp <from> <to>\n\n\n\n\nCopy.\n\n\n\n\n\n\n\n\nmv <from> <to>\n\n\n\n\nMove.\n\n\n\n\n\n\n\n\n*\n\n\n\n\nWildcard.\n\n\n\n\n\n\n\n\nmkdir/rmdir\n\n\n\n\nMake/remove directory.\n\n\n\n\n-p - create any parts of path that don’t exist\n\n\n\n\n\n\n\n\nD.6.5 Move things to/from server\n\n\n\n\n\n\n\n\n\n\n\nCommand\n\n\n\n\nWhat it does\n\n\n\n\nNotes + Helpful Options\n\n\n\n\n\n\ntar\n\n\n\n\nCreate/extract archive file.\n\n\n\n\nAlmost always used with flags.\n\n\nCreate is usually tar -czf <archive name> <file(s)>\n\n\nExtract is usually tar -xfv <archive name>\n\n\n\n\n\n\nscp\n\n\n\n\nSecure copy via ssh.\n\n\n\n\nRun on laptop to server.\n\n\nCan use most ssh flags (like -i and -v).\n\n\n\n\n\n\n\n\nD.6.6 Write files from the command line\n\n\n\n\n\n\n\n\nCommand\nWhat it does\nNotes\n\n\n\n\ntouch\nCreates file if doesn’t already exist.\nUpdates last updated to current time if it does exist.\n\n\n>\nOverwrite file contents.\nCreates a new 
file if it doesn’t exist.\n\n\n>>\nConcatenate to end of file.\nCreates a new file if it doesn’t exist.\n\n\n\n\n\nD.6.7 Command line text editors (Vim + Nano)\n\n\n\n\n\n\n\n\nCommand\nWhat it does\nNotes + Helpful options\n\n\n\n\n^\nPrefix for file command in nano editor.\nIt’s the ⌘ or Ctrl key, not the caret symbol.\n\n\ni\nEnter insert mode (able to type) in vim.\n\n\n\nescape\nEnter normal mode (navigation) in vim.\n\n\n\n:w\nWrite the current file in vim (from normal mode).\nCan be combined to save and quit in one, :wq.\n\n\n:q\nQuit vim (from normal mode).\n:q! quit without saving." }, { "objectID": "chapters/append/cheatsheets.html#cheat-ssh", "href": "chapters/append/cheatsheets.html#cheat-ssh", "title": "Appendix D — Cheatsheets", - "section": "D.7 ssh", - "text": "D.7 ssh\nssh <user>@<host>\n\n\n\n\n\n\n\n\nFlag\nWhat it does\nNotes\n\n\n\n\n-v\nVerbose, good for debugging.\nAdd more vs as you please, -vv or -vvv.\n\n\n-i\nChoose identity file (private key)\nNot necessary with default key names." + "section": "D.7 SSH", + "text": "D.7 SSH\nGeneral usage:\nssh <user>@<host>\n\n\n\n\n\n\n\n\nFlag\nWhat it does\nNotes\n\n\n\n\n-v\nVerbose, good for debugging.\nAdd more vs as you please, -vv or -vvv.\n\n\n-i\nChoose identity file (private key).\nNot necessary with default key names." }, { "objectID": "chapters/append/cheatsheets.html#linux-admin", "href": "chapters/append/cheatsheets.html#linux-admin", "title": "Appendix D — Cheatsheets", - "section": "D.8 Linux Admin", - "text": "D.8 Linux Admin\n\nD.8.1 Users\n\n\n\n\n\n\n\n\nCommand\nWhat it does\nHelpful options + notes\n\n\ns u <username>\nChange to be a different user.\n\n\n\nwhoami\nGet username of current user.\n\n\n\nid\nGet full user + group info on current user.\n\n\n\npasswd\nChange password.\n\n\n\nuseradd\nAdd a new user.\n\n\n\nusermo d <username>\nModify user username\n-aG <group> adds to a group (e.g. 
sudo)\n\n\n\n\n\nD.8.2 Permissions\n\n\n\n\n\n\n\n\nCommand\nWhat it does\nHelpful options + notes\n\n\n\n\nchmod <permissions> <file>\nModifies permissions on a file or directory.\nNumber indicates permissions for user, group, others: add 4 for read, 2 for write, 1 for execute, 0 for nothing, e.g. 644.\n\n\nchow n <user/group> <file>\nChange the owner of a file or directory.\nCan be used for user or group, e.g. :my-group.\n\n\nsudo <command>\nAdopt root permissions for the following command.\n\n\n\n\n\n\nD.8.3 Install applications (Ubuntu)\n\n\n\n\n\n\n\nCommand\nWhat it does\n\n\napt-get u pdate && apt-get upgrade -y\nFetch and install upgrades to system packages\n\n\napt-get install <package>\nInstall a system package.\n\n\nwget\nDownload a file from a URL.\n\n\ngdebi\nInstall local .deb file.\n\n\n\n\n\nD.8.4 Storage\n\n\n\n\n\n\n\n\nCommand\nWhat it does\nHelpful options\n\n\n\n\ndf\nCheck storage space on device.\n-h for human readable file sizes.\n\n\ndu\nCheck size of files.\nMost likely to be used as d u - h <dir> | sort -h\nAlso useful to combine with head.\n\n\n\n\n\nD.8.5 Processes\n\n\n\n\n\n\n\n\nCommand\nWhat it does\nHelpful options\n\n\n\n\ntop\nSee what’s running on the system.\n\n\n\nps aux\nSee all system processes.\nConsider using --sort and pipe into head or grep\n\n\nkill\nKill a system process.\n-9 to force kill immediately\n\n\n\n\n\nD.8.6 Networking\n\n\n\n\n\n\n\n\nCommand\nWhat it does\nHelpful Options\n\n\nnetstat\nSee ports and services using them.\nUsually used with -tlp, for tcp listening applications, including pid\n\n\nssh -L <port>:<i p>:<port>:<host>\nPort forwards a remote port on remote host to local.\nRemote ip is usually localhost.\nChoose local port to match remote port.\n\n\n\n\n\nD.8.7 The path\n\n\n\n\n\n\n\nCommand\nWhat it does\n\n\nwhich <command>\nFinds the location of the binary that runs when you run command.\n\n\nln -s <location to l ink>:<location of symlink>\nCreates a symlink from file at location to link to 
location of symlink.\n\n\n\n\n\nD.8.8 systemd\nDaemonizing services is accomplished by configuring them in /etc/systemd/system/<service name>.service.\nThe format of all commands is systemctl <command> <application>.\n\n\n\n\n\n\n\nCommand\nNotes/Tips\n\n\n\n\nstatus\nReport status\n\n\nstart\n\n\n\nstop\n\n\n\nrestart\nstop then start\n\n\nreload\nReload configuration that doesn’t require restart (depends on service)\n\n\nenable\nDaemonize the service\n\n\ndisable\nUn-daemonize the service" + "section": "D.8 Linux admin", + "text": "D.8 Linux admin\n\nD.8.1 Users\n\n\n\n\n\n\n\n\nCommand\nWhat it does\nHelpful options + notes\n\n\nsu <username>\nChange to be a different user.\n\n\n\nwhoami\nGet username of current user.\n\n\n\nid\nGet full user + group info on current user.\n\n\n\npasswd\nChange password.\n\n\n\nuseradd\nAdd a new user.\n\n\n\nusermo d <username>\nModify user username.\n-aG <group> adds to a group (e.g.,sudo)\n\n\n\n\n\nD.8.2 Permissions\n\n\n\n\n\n\n\n\nCommand\nWhat it does\nHelpful options + notes\n\n\n\n\nchmod <permissions> <file>\nModifies permissions on a file or directory.\nNumber indicates permissions for user, group, others: add 4 for read, 2 for write, 1 for execute, 0 for nothing, e.g.,644.\n\n\nchown <user/group> <file>\nChange the owner of a file or directory.\nCan be used for user or group, e.g.,:my-group.\n\n\nsudo <command>\nAdopt root permissions for the following command.\n\n\n\n\n\n\nD.8.3 Install applications (Ubuntu)\n\n\n\n\n\n\n\nCommand\nWhat it does\n\n\napt-get update && apt-get upgrade -y\nFetch and install upgrades to system packages\n\n\napt-get install <package>\nInstall a system package.\n\n\nwget\nDownload a file from a URL.\n\n\ngdebi\nInstall local .deb file.\n\n\n\n\n\nD.8.4 Storage\n\n\n\n\n\n\n\n\n\n\n\nCommand\n\n\n\n\nWhat it does\n\n\n\n\nHelpful options\n\n\n\n\n\n\n\n\ndf\n\n\n\n\nCheck storage space on device.\n\n\n\n\n-h for human readable file sizes.\n\n\n\n\n\n\ndu\n\n\n\n\nCheck size of 
files.\n\n\n\n\nMost likely to be used as du -h <dir> | sort -h\n\n\nAlso useful to combine with head.\n\n\n\n\n\n\n\n\nD.8.5 Processes\n\n\n\n\n\n\n\n\nCommand\nWhat it does\nHelpful options\n\n\n\n\ntop\nSee what’s running on the system.\n\n\n\nps aux\nSee all system processes.\nConsider using --sort and pipe into head or grep.\n\n\nkill\nKill a system process.\n-9 to force kill immediately\n\n\n\n\n\nD.8.6 Networking\n\n\n\n\n\n\n\n\n\n\n\nCommand\n\n\n\n\nWhat it does\n\n\n\n\nHelpful Options\n\n\n\n\n\n\nnetstat\n\n\n\n\nSee ports and services using them.\n\n\n\n\nUsually used with -tlp, for tcp listening applications, including pid.\n\n\n\n\n\n\nssh -L <port>:<i p>:<port>:<host>\n\n\n\n\nPort forwards a remote port on remote host to local.\n\n\n\n\nRemote ip is usually localhost.\n\n\nChoose local port to match remote port.\n\n\n\n\n\n\n\n\nD.8.7 The path\n\n\n\n\n\n\n\nCommand\nWhat it does\n\n\nwhich <command>\nFinds the location of the binary that runs when you run command.\n\n\nln -s <linked location>:<where to put symlink>\nCreates a symlink from file/directory at linked location to where to put symlink.\n\n\n\n\n\nD.8.8 systemd\nDaemonizing services is accomplished by configuring them in /etc/systemd/system/<service name>.service.\nThe format of all commands is systemctl <command> <application>.\n\n\n\n\n\n\n\nCommand\nNotes/Tips\n\n\n\n\nstatus\nReport status.\n\n\nstart\n\n\n\nstop\n\n\n\nrestart\nstop then start.\n\n\nreload\nReload configuration that doesn’t require restart (depends on service).\n\n\nenable\nDaemonize the service.\n\n\ndisable\nUn-daemonize the service." 
}, { "objectID": "chapters/append/cheatsheets.html#cheat-ports", "href": "chapters/append/cheatsheets.html#cheat-ports", "title": "Appendix D — Cheatsheets", - "section": "D.9 IP Addresses and Ports", - "text": "D.9 IP Addresses and Ports\n\nD.9.1 Special IP Addresses\n\n\n\n\n\n\n\nAddress\nMeaning\n\n\n\n\n\\(\\text{127.0.0.1}\\)\n\\(\\text{localhost}\\) or loopback – the machine that originated the request.\n\n\n\\(\\text{192.168.x.x}\\)\n\\(\\text{172.16.x.x.x}\\)\n\\(\\text{10.x.x.x}\\)\nProtected address blocks used for private IP addresses.\n\n\n\n\n\nD.9.2 Special Ports\nAll ports below \\(1024\\) are reserved for server tasks and cannot be assigned to admin-controlled services.\n\n\n\nProtocol/Application\nDefault Port\n\n\n\n\nHTTP\n\\(80\\)\n\n\nHTTPS\n\\(443\\)\n\n\nSSH\n\\(22\\)\n\n\nPostgreSQL\n\\(5432\\)\n\n\nRStudio Server\n\\(8787\\)\n\n\nShiny Server\n\\(3939\\)\n\n\nJupyterHub\n\\(8000\\)" + "section": "D.9 IP Addresses and ports", + "text": "D.9 IP Addresses and ports\n\nD.9.1 Special IP Addresses\n\n\n\n\n\n\n\n\n\n\nAddress\n\n\n\n\nMeaning\n\n\n\n\n\n\n\n\n\n\n\n\nor loopback – the machine that originated the request.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nProtected address blocks used for private IP addresses.\n\n\n\n\n\n\n\n\nD.9.2 Special ports\nAll ports below \\(1024\\) are reserved for server tasks and cannot be assigned to admin-controlled services.\n\n\n\nProtocol/application\nDefault port\n\n\n\n\nHTTP\n\\(80\\)\n\n\nHTTPS\n\\(443\\)\n\n\nSSH\n\\(22\\)\n\n\nPostgreSQL\n\\(5432\\)\n\n\nRStudio Server\n\\(8787\\)\n\n\nShiny Server\n\\(3939\\)\n\n\nJupyterHub\n\\(8000\\)" } ] \ No newline at end of file diff --git a/sitemap.xml b/sitemap.xml index 6e2fec05..bed9084a 100644 --- a/sitemap.xml +++ b/sitemap.xml @@ -2,110 +2,110 @@ https://do4ds.com/index.html - 2023-10-25T14:37:21.282Z + 2023-10-25T15:09:24.452Z https://do4ds.com/chapters/intro.html - 2023-10-25T14:37:21.294Z + 2023-10-25T15:09:24.460Z 
https://do4ds.com/chapters/sec1/1-0-sec-intro.html - 2023-10-25T14:37:21.302Z + 2023-10-25T15:09:24.472Z https://do4ds.com/chapters/sec1/1-1-env-as-code.html - 2023-10-25T14:37:21.322Z + 2023-10-25T15:09:24.492Z https://do4ds.com/chapters/sec1/1-2-proj-arch.html - 2023-10-25T14:37:21.338Z + 2023-10-25T15:09:24.508Z https://do4ds.com/chapters/sec1/1-3-data-access.html - 2023-10-25T14:37:21.382Z + 2023-10-25T15:09:24.552Z https://do4ds.com/chapters/sec1/1-4-monitor-log.html - 2023-10-25T14:37:21.410Z + 2023-10-25T15:09:24.584Z https://do4ds.com/chapters/sec1/1-5-deployments.html - 2023-10-25T14:37:21.430Z + 2023-10-25T15:09:24.600Z https://do4ds.com/chapters/sec1/1-6-docker.html - 2023-10-25T14:37:21.446Z + 2023-10-25T15:09:24.616Z https://do4ds.com/chapters/sec2/2-0-sec-intro.html - 2023-10-25T14:37:21.458Z + 2023-10-25T15:09:24.632Z https://do4ds.com/chapters/sec2/2-1-cloud.html - 2023-10-25T14:37:21.482Z + 2023-10-25T15:09:24.656Z https://do4ds.com/chapters/sec2/2-2-cmd-line.html - 2023-10-25T14:37:21.506Z + 2023-10-25T15:09:24.680Z https://do4ds.com/chapters/sec2/2-3-linux.html - 2023-10-25T14:37:21.530Z + 2023-10-25T15:09:24.704Z https://do4ds.com/chapters/sec2/2-4-app-admin.html - 2023-10-25T14:37:21.562Z + 2023-10-25T15:09:24.732Z https://do4ds.com/chapters/sec2/2-5-scale.html - 2023-10-25T14:37:21.598Z + 2023-10-25T15:09:24.772Z https://do4ds.com/chapters/sec2/2-6-networking.html - 2023-10-25T14:37:21.650Z + 2023-10-25T15:09:24.812Z https://do4ds.com/chapters/sec2/2-7-dns.html - 2023-10-25T14:37:21.662Z + 2023-10-25T15:09:24.828Z https://do4ds.com/chapters/sec2/2-8-ssl.html - 2023-10-25T14:37:21.674Z + 2023-10-25T15:09:24.840Z https://do4ds.com/chapters/sec3/3-0-sec-intro.html - 2023-10-25T14:37:21.682Z + 2023-10-25T15:09:24.848Z https://do4ds.com/chapters/sec3/3-1-ent-networks.html - 2023-10-25T14:37:21.694Z + 2023-10-25T15:09:24.864Z https://do4ds.com/chapters/sec3/3-2-auth.html - 2023-10-25T14:37:21.706Z + 2023-10-25T15:09:24.872Z 
https://do4ds.com/chapters/sec3/3-3-ent-scale.html - 2023-10-25T14:37:21.718Z + 2023-10-25T15:09:24.888Z https://do4ds.com/chapters/sec3/3-4-ent-pm.html - 2023-10-25T14:37:21.730Z + 2023-10-25T15:09:24.896Z https://do4ds.com/chapters/append/auth.html - 2023-10-25T14:37:21.750Z + 2023-10-25T15:09:24.920Z https://do4ds.com/chapters/append/lb.html - 2023-10-25T14:37:21.758Z + 2023-10-25T15:09:24.928Z https://do4ds.com/chapters/append/lab-map.html - 2023-10-25T14:37:21.766Z + 2023-10-25T15:09:24.936Z https://do4ds.com/chapters/append/cheatsheets.html - 2023-10-25T14:37:21.806Z + 2023-10-25T15:09:24.980Z