rust-mlops-template

A work in progress to build out solutions in Rust for MLOps. It will be covered in the O'Reilly book Implementing MLOps in the Enterprise.

Demo Hitlist (will hopefully solve one almost every day or week)

Do an inline python example
Train a model in PyTorch with CPU: https://github.com/LaurentMazare/tch-rs
Train a model in PyTorch with GPU: https://github.com/LaurentMazare/tch-rs
Serve out ONNX with a Rust web framework like Actix
ONNX Command-Line Tool
Simple async network example: (network discovery or chat system)
Rust SQLite Example
Rust AWS Lambda
Simple Rust GUI
Rust Whisper Tool with C++ Bindings
Fast Keyword Extraction (NLP)
Emit Random Mediterranean Meals via CLI
Web Assembly Rust

Advanced Aspirational Demos

Building a database in Rust
Building a search engine in Rust
Building a web server in Rust
Building a batch processing system in Rust
Build a command-line chat system
Build a locate clone
Build a load-testing tool

Motivation

One of the key goals of this project is to determine workflows for #mlops that do not involve the #jcpennys (Jupyter, Conda, Pandas, Numpy, Sklearn) stack. In particular, I am not a fan of the conda installation tool (it is superfluous, as I demonstrate in the Python MLOps Template) compared with containerized workflows built on standard Python tooling (Docker + pip + virtualenv), and this is a good excuse to find other solutions outside of that stack. For example:

Why not find a more performant DataFrame library?
Why not have a compiler?
Why not have a simple packaging solution?
Why not have very fast computational speed?
Why not be able to write both for the Linux Kernel and general purpose scripting?
Why not see if there is a better solution than Python (which is essentially two languages: scientific Python and regular Python)?
Python is one of the least green languages in terms of energy efficiency, but Rust is one of the best.

In The Beginning Was the Command-Line

What could #mlops and #datascience look like in 2023 without #jupyternotebook and "God Tools" as the center of the universe? It could be the command line. In the beginning, it was the command line, and it may be the best solution for this domain.

"What would the engineer say after you had explained your problem and enumerated all the dissatisfactions in your life? He would probably tell you that life is a very hard and complicated thing; that no interface can change that; that anyone who believes otherwise is a sucker; and that if you don't like having choices made for you, you should start making your own." -Neal Stephenson

Using Data (i.e. Data Science)

The Stack Overflow 2022 Developer Survey https://survey.stackoverflow.co/2022/#section-most-loved-dreaded-and-wanted-programming-scripting-and-markup-languages[states that #rust is in its 7th year as the most loved language, with 87% of developers wanting to continue developing in it], and that it ties with Python as the most wanted technology. Clearly there is traction.
According to http://www.modulecounts.com/[Modulecounts], the number of Rust crates appears to be following an exponential growth curve.

Getting Started

This repository is a GitHub Template and you can use it to create a new repository that uses GitHub Codespaces. It is pre-configured with Rust, Cargo and other useful extensions like GitHub Copilot.

Install and Setup

There are a few options:

You can follow the Official Install Guide for Rust
Create a repo with this template

Once you install you should check to see things work:

rustc --version

Another option is to run make rust-version, which prints the versions of rustc, cargo, and the other Rust command-line tools. To run everything locally, use make all; this will format, lint, and test all projects in this repository.

Rust CLI Tools Ecosystem

There are several tools that help you get things done in Rust:

rust-version:
	@echo "Rust command-line utility versions:"
	rustc --version 			#rust compiler
	cargo --version 			#rust package manager
	rustfmt --version			#rust code formatter
	rustup --version			#rust toolchain manager
	clippy-driver --version		#rust linter

Hello World Setup

This is an intentionally simple, full end-to-end hello world example. I used some excellent ideas from @kyclark, author of the Command-Line Rust book from O'Reilly. You can recreate it on your own by following these steps:

Create a project directory

cargo new hello

This creates a structure you can see with tree hello

hello/
├── Cargo.toml
└── src
    └── main.rs
1 directory, 2 files

The Cargo.toml file is where the project is configured, i.e. if you needed to add a dependency. The source code file has the following content in main.rs. It looks a lot like Python or any other modern language and this function prints a message.

fn main() {
    println!("Hello, world MLOPs!");
}

To run the project you cd into hello and run cargo run i.e. cd hello && cargo run. The output looks like the following:

@noahgift ➜ /workspaces/rust-mlops-template/hello (main ✗) $ cargo run
   Compiling hello v0.1.0 (/workspaces/rust-mlops-template/hello)
    Finished dev [unoptimized + debuginfo] target(s) in 0.36s
     Running `target/debug/hello`
Hello, world MLOPs!

To run without all of the noise: cargo run --quiet. To run the binary that was created: ./target/debug/hello

Run with GitHub Actions

GitHub Actions uses a Makefile to simplify automation

name: Rust CI/CD Pipeline
on:
  push:
    branches: [ "main" ]
  pull_request:
    branches: [ "main" ]
env:
  CARGO_TERM_COLOR: always
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v1
    - uses: actions-rs/toolchain@v1
      with:
          toolchain: stable
          profile: minimal
          components: clippy, rustfmt
          override: true
    - name: update linux
      run: sudo apt update 
    - name: update Rust
      run: make install
    - name: Check Rust versions
      run: make rust-version
    - name: Format
      run: make format
    - name: Lint
      run: make lint
    - name: Test
      run: make test

To run everything locally do: make all.

Simple Marco-Polo Game

Change into the MarcoPolo directory and run cargo run -- play --name Marco. You should see the following output:

Polo
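
The source of the game is not shown in this README. As a minimal sketch, assuming the command simply answers "Polo" when the name is "Marco", the core logic could live in a plain function. The function name `marco_polo` and the fallback reply "Not Marco" are assumptions made for illustration.

```rust
/// Reply for the Marco-Polo game.
/// Assumption: any name other than "Marco" gets a fallback reply; the
/// exact wording used by the real CLI is not shown in this README.
pub fn marco_polo(name: &str) -> String {
    if name == "Marco" {
        "Polo".to_string()
    } else {
        // Hypothetical fallback reply
        "Not Marco".to_string()
    }
}
```

Keeping the logic in a function that returns a String, rather than printing directly, makes it easy to unit test before wiring it to a command-line parser.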

First Big Project: Deduplication Command-Line Tool

I have written command-line deduplication tools in many languages, so this is what I chose to build as a substantial example. The general approach I use is as follows:

Walk the filesystem and create a checksum for each file
If the checksum matches an existing checksum, then mark it as a duplicate file
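
The two steps above can be sketched with only the standard library. This is an illustration under stated assumptions, not the project's code: the names `walk`, `checksum`, and `find_duplicates` are hypothetical, and `DefaultHasher` stands in for a real digest such as MD5 or SHA-256 (a 64-bit hash can collide, so a production tool would use a stronger checksum or compare bytes).

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::fs;
use std::hash::{Hash, Hasher};
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collect every regular file under `root`.
/// Symlinks are skipped because `DirEntry::file_type` does not follow them.
fn walk(root: &Path, files: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(root)? {
        let entry = entry?;
        let file_type = entry.file_type()?;
        if file_type.is_dir() {
            walk(&entry.path(), files)?;
        } else if file_type.is_file() {
            files.push(entry.path());
        }
    }
    Ok(())
}

/// Stand-in checksum: hash the file bytes with the standard library hasher.
fn checksum(path: &Path) -> io::Result<u64> {
    let bytes = fs::read(path)?;
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    Ok(hasher.finish())
}

/// Group files by checksum and keep only groups with more than one file.
pub fn find_duplicates(root: &Path) -> io::Result<Vec<Vec<PathBuf>>> {
    let mut files = Vec::new();
    walk(root, &mut files)?;

    let mut groups: HashMap<u64, Vec<PathBuf>> = HashMap::new();
    for file in files {
        groups.entry(checksum(&file)?).or_default().push(file);
    }

    // Sort inside each group and across groups so the output is stable.
    let mut duplicates: Vec<Vec<PathBuf>> = groups
        .into_values()
        .filter(|group| group.len() > 1)
        .map(|mut group| {
            group.sort();
            group
        })
        .collect();
    duplicates.sort();
    Ok(duplicates)
}
```

Calling `find_duplicates(Path::new("/tmp"))` returns one inner vector per set of identical files.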

Getting Started

Create a new project: cargo new dedupe
Check the latest clap version at https://crates.io/crates/clap and put this version in Cargo.toml. The file should look similar to this:

[package]
name = "dedupe"
version = "0.1.0"
edition = "2021"

[dependencies]
clap = "4.0.32"

[dev-dependencies]
assert_cmd = "2"

Next, make a tests directory parallel to src (mkdir tests) and put a cli.rs inside.
Create a lib.rs file (touch src/lib.rs) and use it for the logic, then run cargo run.
Inside this project I also created a Makefile to easily do everything at once:

format:
	cargo fmt --quiet

lint:
	cargo clippy --quiet

test:
	cargo test --quiet

run:
	cargo run --quiet

all: format lint test run

Now as I build code, I can simply do: make all and get a high quality build.

Next, let's create some test files:

echo "foo" > /tmp/one.txt
echo "foo" > /tmp/two.txt
echo "bar" > /tmp/three.txt

The final version works: cargo run -- --path /tmp

@noahgift ➜ /workspaces/rust-mlops-template/dedupe (main ✗) $ cargo run -- --path /tmp
    Finished dev [unoptimized + debuginfo] target(s) in 0.03s
     Running `target/debug/dedupe --path /tmp`
Searching path: "/tmp"
Found 5 files
Found 1 duplicates
Duplicate files: ["/tmp/two.txt", "/tmp/one.txt"]

Next things to complete for dedupe (in another repo):

Switch to subcommands and create a search and dedupe subcommand
Add better testing with sample test files
Figure out how to release packages for multiple OS versions in GitHub

More MLOps project ideas

Query Hugging Face dataset cli
Summarize News CLI
Microservice Web Framework, trying actix to start, that has a calculator API
Microservice Web Framework deploys pre-trained model
Descriptive Statistics on a well known dataset using https://www.pola.rs/[Polars] inside a CLI
Train a model with PyTorch (probably via bindings to Rust)

Actix Microservice

Refer to Actix getting started guide
cargo new calc && cd calc
add dependency to Cargo.toml

[package]
name = "calc"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
actix-web = "4"

Create src/lib.rs and place the following inside:

//calculator functions

//Add two numbers
pub fn add(a: i32, b: i32) -> i32 {
    a + b
}

//Subtract two numbers
pub fn subtract(a: i32, b: i32) -> i32 {
    a - b
}

//Multiply two numbers
pub fn multiply(a: i32, b: i32) -> i32 {
    a * b
}

//Divide two numbers (integer division; panics if b is zero)
pub fn divide(a: i32, b: i32) -> i32 {
    a / b
}
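
Integer division in Rust panics when the divisor is zero, so a call such as divide(1, 0) would panic. A minimal sketch of a non-panicking variant uses the standard library's `i32::checked_div`; the name `safe_divide` is hypothetical and is not part of the project's lib.rs.

```rust
/// Divide two numbers, returning None instead of panicking when the
/// divisor is zero or when the result would overflow (i32::MIN / -1).
pub fn safe_divide(a: i32, b: i32) -> Option<i32> {
    a.checked_div(b)
}
```

A route handler could then map `None` to an error response rather than letting the panic propagate.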

In the main.rs put the following:

//Calculator Microservice
use actix_web::{get, web, App, HttpResponse, HttpServer, Responder};

#[get("/")]
async fn index() -> impl Responder {
    HttpResponse::Ok().body("This is a calculator microservice")
}

//library add route using lib.rs
#[get("/add/{a}/{b}")]
async fn add(info: web::Path<(i32, i32)>) -> impl Responder {
    let result = calc::add(info.0, info.1);
    HttpResponse::Ok().body(result.to_string())
}

//library subtract route using lib.rs
#[get("/subtract/{a}/{b}")]
async fn subtract(info: web::Path<(i32, i32)>) -> impl Responder {
    let result = calc::subtract(info.0, info.1);
    HttpResponse::Ok().body(result.to_string())
}

//library multiply route using lib.rs
#[get("/multiply/{a}/{b}")]
async fn multiply(info: web::Path<(i32, i32)>) -> impl Responder {
    let result = calc::multiply(info.0, info.1);
    HttpResponse::Ok().body(result.to_string())
}

//library divide route using lib.rs
#[get("/divide/{a}/{b}")]
async fn divide(info: web::Path<(i32, i32)>) -> impl Responder {
    let result = calc::divide(info.0, info.1);
    HttpResponse::Ok().body(result.to_string())
}

//run it
#[actix_web::main]
async fn main() -> std::io::Result<()> {
    HttpServer::new(|| {
        App::new()
            .service(index)
            .service(add)
            .service(subtract)
            .service(multiply)
            .service(divide)
    })
    .bind(("127.0.0.1", 8080))?
    .run()
    .await
}

Next, use a Makefile to ensure a simple workflow

format:
	cargo fmt --quiet

lint:
	cargo clippy --quiet

test:
	cargo test --quiet

run:
	cargo run 

all: format lint test run

Run make all then test out the route by adding two numbers at /add/2/2

Hugging Face Example

Uses rust-bert crate
Create new project cargo new hfdemo and cd into it: cd hfdemo
Create a new library file: touch src/lib.rs
Add packages to Cargo.toml

[package]
name = "hfdemo"
version = "0.1.0"
edition = "2021"

[dependencies]
rust-bert = "0.19.0"
clap = {version="4.0.32", features=["derive"]}
wikipedia = "0.3.4"

The library code is in lib.rs and the subcommands from clap live in main.rs. Here is the tool in action:

@noahgift ➜ /workspaces/rust-mlops-template/hfdemo (main ✗) $ cargo run sumwiki --page argentina
    Finished dev [unoptimized + debuginfo] target(s) in 4.59s
     Running `target/debug/hfdemo sumwiki --page argentina`
Argentina is a country in the southern half of South America. It covers an area of 2,780,400 km2 (1,073,500 sq mi), making it the second-largest country in South America after Brazil. It is also the fourth-largest nation in the Americas and the eighth-largest in the world.

Hugging Face Q/A Example

cd into hf-qa and run cargo run

```bash
cargo run --quiet -- answer --question "What is the best book from 1880 to read?" --context "The Adventures of Huckleberry Finn was released in 1880"
Answer: The Adventures of Huckleberry Finn
```

Hugging Face Lyrics Analysis using Zero Shot Classification with SQLite

Listen to Maná - En El Muelle De San Blas

@noahgift ➜ /workspaces/rust-mlops-template/sqlite-hf (main ✗) $ cargo run --quiet -- classify
Classify lyrics.txt
rock: 0.06948944181203842
pop: 0.27735018730163574
hip hop: 0.034089818596839905
country: 0.7835917472839355
latin: 0.6906086802482605

Print the lyrics:

cargo run --quiet -- lyrics | less | head

Lyrics lyrics.txt
Uh-uh-uh-uh, uh-uh
Ella despidió a su amor
El partió en un barco en el muelle de San Blas
El juró que volvería
Y empapada en llanto, ella juró que esperaría
Miles de lunas pasaron
Y siempre ella estaba en el muelle, esperando
Muchas tardes se anidaron
Se anidaron en su pelo y en sus labios

Polars Example

Example here
cd into polarsdf and run cargo run

cargo run -- sort --rows 10

You can see an example of how Polars can be used to sort a dataframe in a Rust CLI program.

Parallelism

One of the outstanding features of Rust is safe, yet easy parallelism. This project demos parallelism by benchmarking a checksum of several files.

We can see how trivial it is to speed up a program with threads:

Here is the function for the serial version:

use std::collections::HashMap;
use std::error::Error;

// Create a checksum of each file and store it in a HashMap. If the checksum
// already exists, add the file to the vector of files with that checksum.
pub fn checksum(files: Vec<String>) -> Result<HashMap<String, Vec<String>>, Box<dyn Error>> {
    let mut checksums = HashMap::new();
    for file in files {
        let checksum = md5::compute(std::fs::read(&file)?);
        let checksum = format!("{:x}", checksum);
        checksums
            .entry(checksum)
            .or_insert_with(Vec::new)
            .push(file);
    }
    Ok(checksums)
}

cargo --quiet run -- serial

➜  parallel git:(main) ✗ time cargo --quiet run -- serial
Serial version of the program
d41d8cd98f00b204e9800998ecf8427e:
        src/data/subdir/not_utils_four-score.m4a
        src/data/not_utils_four-score.m4a
b39d1840d7beacfece35d9b45652eee1:
        src/data/utils_four-score3.m4a
        src/data/utils_four-score2.m4a
        src/data/subdir/utils_four-score3.m4a
        src/data/subdir/utils_four-score2.m4a
        src/data/subdir/utils_four-score5.m4a
        src/data/subdir/utils_four-score4.m4a
        src/data/subdir/utils_four-score.m4a
        src/data/utils_four-score5.m4a
        src/data/utils_four-score4.m4a
        src/data/utils_four-score.m4a
cargo --quiet run -- serial  0.57s user 0.02s system 81% cpu 0.729 total

vs threads

time cargo --quiet run -- parallel
Parallel version of the program
d41d8cd98f00b204e9800998ecf8427e:
        src/data/subdir/not_utils_four-score.m4a
        src/data/not_utils_four-score.m4a
b39d1840d7beacfece35d9b45652eee1:
        src/data/utils_four-score5.m4a
        src/data/subdir/utils_four-score3.m4a
        src/data/utils_four-score3.m4a
        src/data/utils_four-score.m4a
        src/data/subdir/utils_four-score.m4a
        src/data/subdir/utils_four-score2.m4a
        src/data/utils_four-score4.m4a
        src/data/utils_four-score2.m4a
        src/data/subdir/utils_four-score4.m4a
        src/data/subdir/utils_four-score5.m4a
cargo --quiet run -- parallel  0.65s user 0.04s system 262% cpu 0.262 total

Ok, so let's look at the code:

use rayon::prelude::*; // provides par_iter

// Parallel version of checksum using rayon with a mutex to ensure
// that the HashMap is not accessed by multiple threads at the same time
pub fn checksum_par(files: Vec<String>) -> Result<HashMap<String, Vec<String>>, Box<dyn Error>> {
    let checksums = std::sync::Mutex::new(HashMap::new());
    files.par_iter().for_each(|file| {
        let checksum = md5::compute(std::fs::read(file).unwrap());
        let checksum = format!("{:x}", checksum);
        checksums
            .lock()
            .unwrap()
            .entry(checksum)
            .or_insert_with(Vec::new)
            .push(file.to_string());
    });
    Ok(checksums.into_inner().unwrap())
}

The main takeaway is that we use a mutex to ensure that the HashMap is not accessed by multiple threads at the same time. This is a very common pattern in Rust.
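
The same mutex-guarded pattern can be sketched with only the standard library, which makes the moving parts visible. This is an illustration under assumptions, not the project's code: it takes in-memory (name, bytes) pairs instead of file paths, uses `std::thread::scope` in place of rayon, and substitutes `DefaultHasher` for MD5. The names `digest` and `checksum_threads` are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::Mutex;
use std::thread;

/// Stand-in for md5::compute: hash bytes with the standard library hasher.
fn digest(bytes: &[u8]) -> String {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    format!("{:x}", hasher.finish())
}

/// Group item names by the digest of their contents, one thread per chunk.
pub fn checksum_threads(
    items: &[(String, Vec<u8>)],
    workers: usize,
) -> HashMap<String, Vec<String>> {
    let checksums: Mutex<HashMap<String, Vec<String>>> = Mutex::new(HashMap::new());
    // Split the input into roughly equal chunks; the max(1) calls guard
    // against a zero worker count and a zero chunk size.
    let workers = workers.max(1);
    let chunk_size = ((items.len() + workers - 1) / workers).max(1);

    // Scoped threads may borrow `items` and `checksums` from this stack frame;
    // the scope joins every thread before it returns.
    thread::scope(|scope| {
        for chunk in items.chunks(chunk_size) {
            let checksums = &checksums;
            scope.spawn(move || {
                for (name, bytes) in chunk {
                    // Hash outside the lock so threads only contend on the insert.
                    let key = digest(bytes);
                    checksums
                        .lock()
                        .unwrap()
                        .entry(key)
                        .or_insert_with(Vec::new)
                        .push(name.clone());
                }
            });
        }
    });

    checksums.into_inner().unwrap()
}
```

The lock is held only for the map update, so the expensive hashing work still runs in parallel. The order of names inside each group depends on thread scheduling, which matches the differing order seen in the serial and parallel outputs.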

Logging in Rust Example

cd into clilog and type: cargo run -- --level TRACE

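use rand::Rng; // brings gen_range into scope

//function returns a random fruit and logs it to the console
//FRUITS is a constant array defined elsewhere in clilog; the 0..5 range assumes it has five entries
pub fn random_fruit() -> String {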
    //randomly select a fruit
    let fruit = FRUITS[rand::thread_rng().gen_range(0..5)];
    //log the fruit
    log::info!("fruit-info: {}", fruit);
    log::trace!("fruit-trace: {}", fruit);
    log::warn!("fruit-warn: {}", fruit);
    fruit.to_string()
}
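
How clilog maps the --level flag to a filter is not shown here. As a dependency-free sketch of the idea, assuming the usual ordering used by the log crate (error, warn, info, debug, trace), a message is emitted when its level is at or below the configured maximum. The `Level` enum and `enabled` function below are re-created for illustration and are not the log crate's own types.

```rust
use std::str::FromStr;

/// Severity levels ordered from least to most verbose.
/// The derived ordering follows declaration order, so Error < Warn < ... < Trace.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Level {
    Error,
    Warn,
    Info,
    Debug,
    Trace,
}

impl FromStr for Level {
    type Err = String;

    /// Parse a level name case-insensitively, e.g. "TRACE" or "warn".
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_uppercase().as_str() {
            "ERROR" => Ok(Level::Error),
            "WARN" => Ok(Level::Warn),
            "INFO" => Ok(Level::Info),
            "DEBUG" => Ok(Level::Debug),
            "TRACE" => Ok(Level::Trace),
            other => Err(format!("unknown level: {}", other)),
        }
    }
}

/// A message is emitted when its level is at or below the configured maximum.
pub fn enabled(message: Level, max: Level) -> bool {
    message <= max
}
```

Under this model, a maximum of TRACE lets the info, trace, and warn lines in random_fruit through, while a maximum of WARN would show only the warn line.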

AWS

Rust AWS S3 Bucket Metadata Information

You can get it here

Running an optimized version, I was able to sum all the objects in my AWS account in about 1 second: ./target/release/awsmetas3 account-size

Client-Server Example

Example lives here: https://github.com/noahgift/rust-mlops-template/tree/main/rrgame

Current Status

Client server echo working

cargo run -- client --message "hi"
cargo run -- server
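
The rrgame source is not reproduced in this README. As a rough sketch of what a client-server echo can look like with only the standard library, assuming a line-based protocol over TCP (the function names `spawn_echo_server` and `send_message` are hypothetical):

```rust
use std::io::{self, BufRead, BufReader, Write};
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::thread;

/// Bind to `addr`, then echo one line back to the first client that connects.
/// The server runs in a background thread; the returned address is the one
/// actually bound, which is useful when asking for port 0.
pub fn spawn_echo_server(
    addr: &str,
) -> io::Result<(SocketAddr, thread::JoinHandle<io::Result<()>>)> {
    let listener = TcpListener::bind(addr)?;
    let local = listener.local_addr()?;
    let handle = thread::spawn(move || -> io::Result<()> {
        let (stream, _peer) = listener.accept()?;
        // Clone the handle so one side reads while the other writes.
        let mut reader = BufReader::new(stream.try_clone()?);
        let mut writer = stream;
        let mut line = String::new();
        reader.read_line(&mut line)?;
        writer.write_all(line.as_bytes())?;
        Ok(())
    });
    Ok((local, handle))
}

/// Send `message` as a single line and return the echoed reply.
pub fn send_message(addr: SocketAddr, message: &str) -> io::Result<String> {
    let mut stream = TcpStream::connect(addr)?;
    stream.write_all(message.as_bytes())?;
    stream.write_all(b"\n")?;
    let mut reply = String::new();
    BufReader::new(stream).read_line(&mut reply)?;
    Ok(reply.trim_end().to_string())
}
```

Binding the listener before spawning the thread means the client can connect immediately, so the server must be started before the client sends its message.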

A bigger example lives here: https://github.com/noahgift/rust-multiplayer-roulette-game

Containerized Rust Applications

Working Containerized Rust CLI Example

FROM rust:latest AS builder
ENV APP containerized_marco_polo_cli
WORKDIR /usr/src/$APP
COPY . .
RUN cargo install --path .

# ENV values do not carry over between build stages, so APP is declared again
FROM debian:buster-slim
ENV APP containerized_marco_polo_cli
RUN apt-get update && rm -rf /var/lib/apt/lists/*
COPY --from=builder /usr/local/cargo/bin/$APP /usr/local/bin/$APP
ENTRYPOINT [ "/usr/local/bin/containerized_marco_polo_cli" ]

Containerized PyTorch Rust

cd into: pytorch-rust-docker

Here is the Dockerfile

FROM rust:latest as builder
ENV APP pytorch-rust-docker
WORKDIR /usr/src/$APP
COPY . .
RUN apt-get update && rm -rf /var/lib/apt/lists/*
RUN cargo install --path .
RUN cargo build -j 6

Build and run the image:

docker build -t pytorch-rust-docker .
docker run -it pytorch-rust-docker

Next, inside the container, run: cargo run -- resnet18.ot Walking_tiger_female.jpg

TensorFlow Rust Bindings

See tf-rust-example

/*Rust Tensorflow Hello World */

extern crate tensorflow;
use tensorflow::Tensor;

fn main() {
    let mut x = Tensor::new(&[1]);
    x[0] = 2i32;
    //print the value of x
    println!("{:?}", x[0]);
    //print the shape of x
    println!("{:?}", x.shape());
    //create a multidimensional tensor
    let mut y = Tensor::new(&[2, 2]);
    y[0] = 1i32;
    y[1] = 2i32;
    y[2] = 3i32;
    y[3] = 4i32;
    //print the value of y
    println!("{:?}", y[0]);
    //print the shape of y
    println!("{:?}", y.shape());
}
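
The four assignments to y fill a 2x2 tensor through a flat index. As a sketch of that mapping, assuming a row-major layout, a (row, col) position translates to a flat position as follows. The helper `flat_index` is hypothetical and is not part of the tensorflow crate.

```rust
/// Map a (row, col) position in a tensor with `cols` columns to its
/// position in the flat, row-major buffer.
pub fn flat_index(row: usize, col: usize, cols: usize) -> usize {
    row * cols + col
}
```

Under that assumption, y[2] = 3 sets the element at row 1, column 0, and y[3] = 4 sets the element at row 1, column 1.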

PyTorch

Pre-trained model: cd into pytorch-rust-example then run: cargo run -- resnet18.ot Walking_tiger_female.jpg

Web Assembly in Rust

cd into hello-wasm-bindgen and run make install, then make serve

You should see something like this:

/* hello world Rust webassembly*/
use wasm_bindgen::prelude::*;

#[wasm_bindgen]
extern "C" {
    fn alert(s: &str);
}

//export the function to javascript
#[wasm_bindgen]
pub fn marco_polo(s: &str) {
    //if the string is "Marco", alert "Polo"
    if s == "Marco" {
        alert("Polo");
    }
    //if the string is anything else, alert "Not Marco"
    else {
        alert("Not Marco");
    }
}

Kmeans Example

cd into linfa-kmeans and run cargo run -- cluster

Lasso Regression CLI

@noahgift ➜ /workspaces/rust-mlops-template/regression-cli (main ✗) $ cargo run -- train --ratio .9
    Finished dev [unoptimized + debuginfo] target(s) in 0.05s
     Running `target/debug/regression-cli train --ratio .9`
Training ratio: 0.9
intercept:  152.1586901763224
params: [0, -0, 503.58067499818077, 167.75801599203626, -0, -0, -121.6828192430516, 0, 427.9593531331433, 6.412796328606638]
z score: Ok([0.0, -0.0, 6.5939908998261245, 2.2719123245079786, -0.0, -0.0, -0.5183690897253823, 0.0, 2.2777581181031765, 0.0858408096568952], shape=[10], strides=[1], layout=CFcf (0xf), const ndim=1)
predicted variance: -0.014761955865436382

Rust Stable Diffusion Demo

After all the weights are downloaded, run:

cargo run --example stable-diffusion --features clap -- --prompt "A very rusty robot holding a fire torch to notebooks"

Stable Diffusion 2.1 Pegging GPU

Rusty Robot Torching Notebooks

Randomly Select Rust Crates To Work On

cd into rust-ideas

cargo run -- --help
cargo run -- popular --number 4
cargo run -- random

@noahgift ➜ /workspaces/rust-mlops-template/rust-ideas (main ✗) $ cargo run -- random
    Finished dev [unoptimized + debuginfo] target(s) in 0.09s
     Running `target/debug/rust-ideas random`
Random crate: "libc"

Onnx Example

cd into OnnxDemo and run make install then cargo run -- infer which invokes a squeezenet model.

Build System

This build system is a bit unique because it recurses into many Rust projects and tests them all!
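
The repository's own build scripts are not shown here. As a sketch of the general idea, assuming each project is a top-level directory containing a Cargo.toml, the projects could be discovered and tested like this. The function names `find_cargo_projects` and `test_all` are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::process::Command;

/// Return every directory directly under `root` that contains a Cargo.toml.
pub fn find_cargo_projects(root: &Path) -> io::Result<Vec<PathBuf>> {
    let mut projects = Vec::new();
    for entry in fs::read_dir(root)? {
        let path = entry?.path();
        if path.is_dir() && path.join("Cargo.toml").is_file() {
            projects.push(path);
        }
    }
    // Sort so projects are always visited in the same order.
    projects.sort();
    Ok(projects)
}

/// Run `cargo test --quiet` in each project and report whether all passed.
pub fn test_all(projects: &[PathBuf]) -> io::Result<bool> {
    let mut all_ok = true;
    for project in projects {
        let status = Command::new("cargo")
            .args(["test", "--quiet"])
            .current_dir(project)
            .status()?;
        all_ok &= status.success();
    }
    Ok(all_ok)
}
```

The same loop could run cargo fmt or cargo clippy instead of cargo test to cover the format and lint steps.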

Language References and Tutorials

MLOps/ML Engineering and Data Science

Cloud Computing

AWS

Linux Kernel

Rust makes way to Linux Kernel

Systems Tools

An extended deduplication example command-line tool

Deep Learning

Web Microservices and Serverless

Data Frames

Polars. You can see an example here.

Authoring Tools

One goal is to reduce using Notebooks in favor of lightweight markdown tools (i.e. the goal is MLOps vs interactive notebooks)


License
