GitHub - jingli-wtbox/phoenix: ML Observability in a Notebook - Uncover Insights, Surface Problems, Monitor, and Fine Tune your Generative LLM, CV and Tabular Models

Phoenix provides MLOps insights at lightning speed with zero-config observability for model drift, performance, and data quality. Phoenix is a notebook-first Python library that leverages embeddings to uncover problematic cohorts of your LLM, CV, NLP, and tabular models.

Installation

pip install arize-phoenix

Quickstart

Import libraries.

from dataclasses import replace
import pandas as pd
import phoenix as px

Download curated datasets and load them into pandas DataFrames.

train_df = pd.read_parquet(
    "https://storage.googleapis.com/arize-assets/phoenix/datasets/unstructured/cv/human-actions/human_actions_training.parquet"
)
prod_df = pd.read_parquet(
    "https://storage.googleapis.com/arize-assets/phoenix/datasets/unstructured/cv/human-actions/human_actions_production.parquet"
)

Define schemas that tell Phoenix which columns of your DataFrames correspond to features, predictions, actuals (i.e., ground truth), embeddings, etc.

train_schema = px.Schema(
    prediction_id_column_name="prediction_id",
    timestamp_column_name="prediction_ts",
    prediction_label_column_name="predicted_action",
    actual_label_column_name="actual_action",
    embedding_feature_column_names={
        "image_embedding": px.EmbeddingColumnNames(
            vector_column_name="image_vector",
            link_to_data_column_name="url",
        ),
    },
)
prod_schema = replace(train_schema, actual_label_column_name=None)
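
The `replace` call above comes from the standard library's `dataclasses` module: it returns a copy of a dataclass instance with selected fields overridden and leaves the original untouched. The sketch below shows that behavior on a stand-in class. `ToySchema` is hypothetical and only mimics two of the fields used above; it is not the Phoenix `Schema` class.

```python
from dataclasses import dataclass, replace
from typing import Optional


# Hypothetical stand-in for a schema object; NOT the real px.Schema.
@dataclass(frozen=True)
class ToySchema:
    prediction_label_column_name: Optional[str] = None
    actual_label_column_name: Optional[str] = None


train = ToySchema(
    prediction_label_column_name="predicted_action",
    actual_label_column_name="actual_action",
)

# Production data has no ground truth yet, so drop the actuals column
# while keeping every other field identical to the training schema.
prod = replace(train, actual_label_column_name=None)

print(train)
print(prod)
```

Deriving the production schema from the training schema this way keeps the two in sync: only the fields that genuinely differ are spelled out.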

Define your production and training datasets.

prod_ds = px.Dataset(prod_df, prod_schema)
train_ds = px.Dataset(train_df, train_schema)

Launch the app.

session = px.launch_app(prod_ds, train_ds)

You can open Phoenix by copying and pasting the output of session.url into a new browser tab.

session.url

Alternatively, you can open the Phoenix UI in your notebook with:

session.view()

When you're done, don't forget to close the app.

px.close_app()

Features

Embedding Drift Analysis

Explore UMAP point-clouds at times of high Euclidean distance and identify clusters of drift.
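
This README does not spell out how the Euclidean distance is computed. One common approach, assumed here purely for illustration, is to take the distance between the centroid (mean vector) of the training embeddings and the centroid of the production embeddings. The helper names `centroid` and `euclidean_drift` and the toy vectors are hypothetical, not part of the Phoenix API.

```python
import math
from typing import List, Sequence


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Return the element-wise mean of a non-empty set of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def euclidean_drift(
    reference: Sequence[Sequence[float]],
    primary: Sequence[Sequence[float]],
) -> float:
    """Euclidean distance between the centroids of two embedding sets."""
    return math.dist(centroid(reference), centroid(primary))


# Toy 2-D embeddings (assumed data, for illustration only).
train_vectors = [[0.0, 0.0], [2.0, 0.0]]  # centroid is (1, 0)
prod_vectors = [[3.0, 4.0], [5.0, 4.0]]   # centroid is (4, 4)

drift = euclidean_drift(train_vectors, prod_vectors)
print(drift)
```

A larger value means the production embeddings have moved further from the training embeddings, which is the kind of time window worth inspecting in the point-cloud.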

UMAP-based Exploratory Data Analysis

Color your UMAP point-clouds by your model's dimensions, drift, and performance to identify problematic cohorts.

Cluster-driven Drift and Performance Analysis

Break apart your data into clusters of high drift or bad performance using HDBSCAN.
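
As a rough sketch of what a cluster-level performance breakdown can look like, the example below ranks clusters by accuracy. It assumes cluster labels have already been produced by a clustering step and does not run HDBSCAN itself; the function name `accuracy_by_cluster` and the toy labels are hypothetical. HDBSCAN marks noise points with the label -1, so those are skipped.

```python
from collections import defaultdict
from typing import Dict, Sequence


def accuracy_by_cluster(
    cluster_ids: Sequence[int],
    predicted: Sequence[str],
    actual: Sequence[str],
) -> Dict[int, float]:
    """Fraction of correct predictions per cluster, ignoring noise (-1)."""
    hits: Dict[int, int] = defaultdict(int)
    totals: Dict[int, int] = defaultdict(int)
    for cluster_id, pred, truth in zip(cluster_ids, predicted, actual):
        if cluster_id == -1:  # noise point, not assigned to any cluster
            continue
        totals[cluster_id] += 1
        hits[cluster_id] += int(pred == truth)
    return {cid: hits[cid] / totals[cid] for cid in totals}


# Toy inputs (assumed data, for illustration only).
cluster_ids = [0, 0, 1, 1, 1, -1]
predicted = ["run", "sit", "run", "run", "sit", "run"]
actual = ["run", "run", "run", "run", "sit", "sit"]

scores = accuracy_by_cluster(cluster_ids, predicted, actual)

# Sort so the worst-performing clusters come first.
worst_first = sorted(scores.items(), key=lambda item: item[1])
print(worst_first)
```

Sorting worst-first surfaces the cohorts most worth inspecting or exporting for fine-tuning.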

Exportable Clusters

Export your clusters to parquet files or dataframes for further analysis and fine-tuning.

Documentation

For in-depth examples and explanations, read the docs.

Community

Join our community to connect with thousands of machine learning practitioners and ML observability enthusiasts.

🌍 Join our Slack community.
💡 Ask questions and provide feedback in the #phoenix-support channel.
🌟 Leave a star on our GitHub.
🐞 Report bugs with GitHub Issues.
💌️ Sign up for our mailing list.
🗺️ Check out our roadmap to see where we're heading next.
🎓 Learn the fundamentals of ML observability with our introductory and advanced courses.

Thanks

UMAP: For unlocking the ability to visualize and reason about embeddings.
HDBSCAN: For providing a clustering algorithm to aid in the discovery of drift and performance degradation.

Copyright, Patent, and License

Portions of this code are patent protected by one or more U.S. Patents. See IP_NOTICE.

This software is licensed under the terms of the Elastic License 2.0 (ELv2). See LICENSE.

