This repository has been archived by the owner on Oct 27, 2023. It is now read-only.

Merge branch 'main' into change-temp
Nicole White authored Aug 2, 2023
2 parents ae839c9 + 8d16d68 commit 600e83c
Showing 8 changed files with 97 additions and 30 deletions.
8 changes: 0 additions & 8 deletions .github/workflows/autoblocks-replays.yml
@@ -38,11 +38,6 @@ jobs:
poetry config virtualenvs.in-project true
poetry install
- name: Install Autoblocks SDK
run: |
source ${{ github.workspace }}/.venv/bin/activate
pip install git+https://nicolewhite:${{ secrets.NICOLES_GITHUB_TOKEN_DO_NOT_USE }}@github.com/autoblocksai/python-sdk.git@v0
- name: Start the app
run: poetry run start &
env:
@@ -69,13 +64,10 @@ jobs:
message: request.payload
mappers:
query: properties.payload.query
__autoblocks_replay_trace_id: traceId
# Filter out properties that are expected to be different on each
# run to prevent the replay diffs from containing unnecessary noise
property-filter-config: |
request.payload:
- payload.__autoblocks_replay_trace_id
ai.intermediate.response:
- response.id
- response.created
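
To make the effect of `property-filter-config` concrete, here is a minimal Python sketch of removing dotted property paths from an event before diffing. It is an illustration only and not the replay action's implementation; `filter_properties` and the sample events are hypothetical.

```python
import copy


def filter_properties(properties: dict, paths: list[str]) -> dict:
    """Return a copy of `properties` with each dotted path removed."""
    filtered = copy.deepcopy(properties)
    for path in paths:
        *parents, leaf = path.split(".")
        node = filtered
        for key in parents:
            node = node.get(key)
            if not isinstance(node, dict):
                break  # the path does not exist in this event; nothing to remove
        else:
            node.pop(leaf, None)
    return filtered


# Hypothetical events: only the per-run fields (id, created) differ.
original = {"response": {"id": "chatcmpl-1", "created": 1690000000, "text": "Hello"}}
replay = {"response": {"id": "chatcmpl-2", "created": 1690000500, "text": "Hello"}}
paths = ["response.id", "response.created"]

# After filtering, the two events compare equal, so the diff would be empty.
print(filter_properties(original, paths) == filter_properties(replay, paths))
```

Filtering both sides with the same path list keeps the comparison symmetric, so only meaningful differences remain in the diff.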
2 changes: 2 additions & 0 deletions .gitignore
@@ -1,6 +1,8 @@
__pycache__/
.ruff_cache/

.idea/

*.pyc
*.DS_Store

71 changes: 63 additions & 8 deletions README.md
@@ -1,5 +1,66 @@
# Autoblocks Replays

This repository demonstrates how to integrate LLM chain replays into your code review process. It contains:

* a [`simpleaichat`](https://github.com/minimaxir/simpleaichat) application that uses the [Autoblocks Python SDK](https://github.com/autoblocksai/python-sdk) to send events to the Autoblocks API
* a GitHub Actions workflow that **replays** real, past events from end users on every push to a feature branch

With our GitHub integration enabled, your teammates are not only reviewing your code, but also the impact that code will have on your LLM chains, and therefore your end users.

<img width="1159" alt="Screenshot 2023-07-23 at 11 02 18 AM" src="https://github.com/autoblocksai/actions/assets/7498009/80055f36-a310-4056-8ac3-f2f0e4ac2b3f">

Unlike other solutions for testing LLMs, Autoblocks Replays are end-to-end. They test your LLM chains from the moment a user sends an input to your application to the moment your application sends a response to the user. This means you can not only review changes to the final response to the user, but also any intermediate steps that might have
changed along the way. This is especially useful for complicated chains that involve multiple services and multiple steps, e.g. if you're using a vector database, tool selection, etc. If you are only ever looking at the final response, it is hard to know which of the intermediate steps in your chain is causing the change.

## Examples

### Updating `simpleaichat`'s `character` input to `"Michael Scott"`

* [Pull request](https://github.com/autoblocksai/demo-replays/pull/6)
* [Replay results](https://github.com/autoblocksai/demo-replays/pull/6#issuecomment-1652606398)

<img width="1062" alt="Screenshot 2023-07-26 at 6 24 27 PM" src="https://github.com/autoblocksai/actions/assets/7498009/ef1b5e70-c2c0-41b8-a52b-af73cdcca11c">

This small change leads to a large change in the final response to the user:

<img width="1073" alt="Screenshot 2023-07-26 at 6 44 16 PM" src="https://github.com/autoblocksai/actions/assets/7498009/1b14f6cb-d666-42a5-a37e-dcfe3ea96742">

It also doesn't sound like Michael Scott from The Office. Digging into the intermediate steps, we can see `simpleaichat` updated the prompt with character instructions, but with the wrong Michael Scott:

<img width="979" alt="Screenshot 2023-07-26 at 6 46 32 PM" src="https://github.com/autoblocksai/actions/assets/7498009/ebad9b66-be9b-4f0c-b189-f362dd2a9956">

### Increasing the `temperature` parameter

* [Pull request](https://github.com/autoblocksai/demo-replays/pull/7)
* [Replay results](https://github.com/autoblocksai/demo-replays/pull/7#issuecomment-1652649904)

<img width="967" alt="Screenshot 2023-07-26 at 6 57 27 PM" src="https://github.com/autoblocksai/actions/assets/7498009/1825db5a-84b9-4bfc-bd28-faa2008ddfd4">

This change has fairly innocuous results on the final response to the user. The model
changes a few words here and there, but the messaging is very similar.

Query about San Francisco:

<img width="1213" alt="Screenshot 2023-07-26 at 6 59 17 PM" src="https://github.com/autoblocksai/actions/assets/7498009/69ded03a-2d29-4287-b014-e6385526a1e6">

Query about highest points:

<img width="941" alt="Screenshot 2023-07-26 at 7 00 13 PM" src="https://github.com/autoblocksai/actions/assets/7498009/b68f5ac9-296f-4935-85dd-5b74e2c7e551">

### Changing the description of the tools

* [Pull request](https://github.com/autoblocksai/demo-replays/pull/2)
* [Replay results](https://github.com/autoblocksai/demo-replays/pull/2#issuecomment-1652129031)

Autoblocks helps you better understand how your code changes affect the intermediate steps in your chain, especially if you're using a wrapper like `simpleaichat` or `LangChain`, both of which are higher-level wrappers around calls to LLMs. For example, perhaps a teammate has not fully read the `simpleaichat` documentation and doesn't realize that the
docstrings of the functions passed to the `tools` array are actually used in the prompts!

<img width="1058" alt="Screenshot 2023-07-26 at 7 12 01 PM" src="https://github.com/autoblocksai/actions/assets/7498009/204bfde2-86b4-487e-b92f-86ecacbf4d00">

Autoblocks would easily surface this change during the code review process:

<img width="1071" alt="Screenshot 2023-07-26 at 7 11 28 PM" src="https://github.com/autoblocksai/actions/assets/7498009/5f6117ce-d24e-488f-99f2-256f3115f741">
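
As a rough illustration of why a docstring edit can change a prompt, the sketch below builds a tool listing from each function's docstring using `inspect.getdoc`. This is not `simpleaichat`'s actual code; `build_tool_prompt`, `search`, and `lookup` are hypothetical names chosen for the example.

```python
import inspect
from typing import Callable


def search(query: str) -> str:
    """Search the web for a query."""
    return f"results for {query}"


def lookup(query: str) -> str:
    """Look up a fact in a knowledge base."""
    return f"facts about {query}"


def build_tool_prompt(tools: list[Callable]) -> str:
    """Number each tool and describe it with its docstring."""
    lines = [f"{i}. {inspect.getdoc(tool)}" for i, tool in enumerate(tools, start=1)]
    return "Pick the most relevant tool:\n" + "\n".join(lines)


print(build_tool_prompt([search, lookup]))
```

Because the docstring text flows straight into the prompt in a design like this, rewording a docstring is a prompt change even though no "prompt" string was touched.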

## Replaying Locally

Start the application with replays enabled:
@@ -21,9 +82,7 @@ poetry run replay --view-id clkeamsei0001l908cmjjtqrf --num-traces 3
```

```
-################################################################################
Your replay id is 2023-07-23_09-36-36
-################################################################################
Replaying event {'id': 'geepag24zence2kbe0ppagt9', 'traceId': '7cb3ec98-b320-4e62-9a51-b15d0218ae4c', 'timestamp': '2023-07-22T18:32:51.862Z', 'message': 'request.payload', 'properties': {'payload': {'query': 'What are all of the airports in London?'}, 'source': 'DEMO_REPLAYS'}}
```
@@ -56,10 +115,6 @@ diff \

Use the [`autoblocksai/actions/replay`](https://github.com/autoblocksai/actions/tree/main/replay) action to replay events in a GitHub Actions workflow. This is similar to replaying events locally but allows you to automate replays in your CI workflow and view results in the GitHub UI.

-The action will leave a comment on your commit with a summary of the replay results:
+The action will leave a comment on your pull request with a summary of the replay results:

-<img width="785" alt="Screenshot 2023-07-23 at 11 49 25 AM" src="https://github.com/autoblocksai/actions/assets/7498009/b6507fde-9a04-4c4d-9049-2bdefb35f933">
-
-You can view diffs of individual events or entire traces:
-
-<img width="1159" alt="Screenshot 2023-07-23 at 11 02 18 AM" src="https://github.com/autoblocksai/actions/assets/7498009/80055f36-a310-4056-8ac3-f2f0e4ac2b3f">
+<img width="857" alt="Screenshot 2023-07-26 at 6 50 31 PM" src="https://github.com/autoblocksai/actions/assets/7498009/ebfb31da-af70-45bf-b9b5-5e640a4fa104">
4 changes: 2 additions & 2 deletions demo_replays/app.py
@@ -5,7 +5,7 @@
from flask import request

from demo_replays import bot
-from demo_replays.settings import AUTOBLOCKS_REPLAYS_TRACE_ID_PARAM_NAME
+from demo_replays.settings import AUTOBLOCKS_REPLAY_TRACE_ID_HEADER_NAME
from demo_replays.settings import env

app = Flask(__name__)
@@ -26,7 +26,7 @@ def main():

# In production we generate a new trace id for each request,
# but in a replay scenario we use the trace id passed in from the replay
-trace_id = payload.get(AUTOBLOCKS_REPLAYS_TRACE_ID_PARAM_NAME) or str(uuid.uuid4())
+trace_id = request.headers.get(AUTOBLOCKS_REPLAY_TRACE_ID_HEADER_NAME) or str(uuid.uuid4())

autoblocks = AutoblocksTracer(
env.AUTOBLOCKS_INGESTION_KEY, trace_id=trace_id, properties=dict(source="DEMO_REPLAYS")
17 changes: 11 additions & 6 deletions demo_replays/replay.py
@@ -4,7 +4,7 @@
from autoblocks.replays import replay_events_from_view
from autoblocks.replays import start_replay

-from demo_replays.settings import AUTOBLOCKS_REPLAYS_TRACE_ID_PARAM_NAME
+from demo_replays.settings import AUTOBLOCKS_REPLAY_TRACE_ID_HEADER_NAME
from demo_replays.settings import env


@@ -21,7 +21,11 @@ def static():
("eiffel", "Eiffel Tower"),
]:
print(f"Testing static event {trace_id} - {query}")
-requests.post("http://localhost:5000", json={AUTOBLOCKS_REPLAYS_TRACE_ID_PARAM_NAME: trace_id, "query": query})
+requests.post(
+    "http://localhost:5000",
+    json={"query": query},
+    headers={AUTOBLOCKS_REPLAY_TRACE_ID_HEADER_NAME: trace_id},
+)


def dynamic():
@@ -51,8 +55,9 @@ def dynamic():
# The original payload
payload = event.properties["payload"]

-# Modify the payload to pass in the replay trace id
-payload[AUTOBLOCKS_REPLAYS_TRACE_ID_PARAM_NAME] = event.trace_id
-
# Replay the request
-requests.post("http://localhost:5000", json=payload)
+requests.post(
+    "http://localhost:5000",
+    json=payload,
+    headers={AUTOBLOCKS_REPLAY_TRACE_ID_HEADER_NAME: event.trace_id},
+)
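
The change above moves the replay trace ID out of the JSON payload and into a request header. A minimal, framework-free sketch of the server-side lookup might look like the following; `resolve_trace_id` is a hypothetical helper, and a plain `dict` stands in for Flask's `request.headers`.

```python
import uuid
from typing import Mapping

# Header name taken from demo_replays/settings.py in this commit.
AUTOBLOCKS_REPLAY_TRACE_ID_HEADER_NAME = "x-autoblocks-replay-trace-id"


def resolve_trace_id(headers: Mapping[str, str]) -> str:
    """Use the replay's trace ID if the header is present, else generate one."""
    # Note: a plain dict lookup is case-sensitive, unlike Flask's request.headers.
    return headers.get(AUTOBLOCKS_REPLAY_TRACE_ID_HEADER_NAME) or str(uuid.uuid4())


# A replayed request carries the original trace ID in the header.
replayed = resolve_trace_id({AUTOBLOCKS_REPLAY_TRACE_ID_HEADER_NAME: "eiffel"})

# A production request has no such header, so a fresh UUID is generated.
fresh = resolve_trace_id({})

print(replayed, fresh)
```

Keeping the trace ID in a header leaves the replayed payload identical to the original one, which is why the payload mapper and property filter entries for `__autoblocks_replay_trace_id` could be dropped from the workflow.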
7 changes: 3 additions & 4 deletions demo_replays/settings.py
@@ -1,9 +1,8 @@
from pydantic_settings import BaseSettings

-# A hidden param that is used to override the trace id that would usually
-# be randomly generated for each request with the trace id of the event
-# that is being replayed
-AUTOBLOCKS_REPLAYS_TRACE_ID_PARAM_NAME = "__autoblocks_replay_trace_id"
+# When a request is from a replay, this header contains the trace ID of
+# the event being replayed.
+AUTOBLOCKS_REPLAY_TRACE_ID_HEADER_NAME = "x-autoblocks-replay-trace-id"


# Environment variables
16 changes: 15 additions & 1 deletion poetry.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion pyproject.toml
@@ -13,7 +13,7 @@ requests = "^2.31.0"
flask = "^2.3.2"
pydantic-settings = "^2.0.2"
simpleaichat = "^0.2.2"
-# autoblocksai = { git = "https://github.com/autoblocksai/python-sdk.git", branch = "v0" }
+autoblocksai = "^0.0.1"

[tool.poetry.group.dev.dependencies]
pre-commit = "^3.3.3"

1 comment on commit 600e83c

@github-actions

Autoblocks Replay Results

Trace 1d3baf6c-2928-459d-845d-dec611b5d140

Replay Inputs

request.payload - cvsy500a8blezkts8ebbsshh

{
  "query": "Alcatraz Island"
}

Replay Outputs

| Message | Original | Replay | Difference |
| --- | --- | --- | --- |
| request.payload | Original | Replay | +0 / -0 |
| ai.intermediate.request | Original | Replay | +0 / -0 |
| ai.intermediate.response | Original | Replay | +0 / -0 |
| ai.tool.selected | Original | Replay | +0 / -0 |
| ai.tool.response | Original | Replay | +0 / -0 |
| ai.intermediate.request | Original | Replay | +1 / -1 |
| ai.intermediate.response | Original | Replay | +1 / -1 |
| ai.final.response | Original | Replay | +1 / -1 |
| request.response | Original | Replay | +1 / -1 |
| --- ALL --- | Original | Replay | +4 / -4 |

Trace a3be7623-be9e-45a0-93d8-7edd7beeccb6

Replay Inputs

request.payload - in3ym5sjrsqjqhjeayr8yqx6

{
  "query": "San Francisco tourist attractions"
}

Replay Outputs

| Message | Original | Replay | Difference |
| --- | --- | --- | --- |
| request.payload | Original | Replay | +0 / -0 |
| ai.intermediate.request | Original | Replay | +0 / -0 |
| ai.intermediate.response | Original | Replay | +0 / -0 |
| ai.tool.selected | Original | Replay | +0 / -0 |
| ai.tool.response | Original | Replay | +3 / -3 |
| ai.intermediate.request | Original | Replay | +2 / -2 |
| ai.intermediate.response | Original | Replay | +1 / -1 |
| ai.final.response | Original | Replay | +4 / -4 |
| request.response | Original | Replay | +1 / -1 |
| --- ALL --- | Original | Replay | +11 / -11 |

Trace 3db671ca-5462-418b-bbdf-ed07bb375584

Replay Inputs

request.payload - b5svpjqmnip44seqz2113hep

{
  "query": "Highest points in the world"
}

Replay Outputs

| Message | Original | Replay | Difference |
| --- | --- | --- | --- |
| request.payload | Original | Replay | +0 / -0 |
| ai.intermediate.request | Original | Replay | +0 / -0 |
| ai.intermediate.response | Original | Replay | +0 / -0 |
| ai.tool.selected | Original | Replay | +0 / -0 |
| ai.tool.response | Original | Replay | +2 / -2 |
| ai.intermediate.request | Original | Replay | +4 / -4 |
| ai.intermediate.response | Original | Replay | +3 / -3 |
| ai.final.response | Original | Replay | +3 / -3 |
| request.response | Original | Replay | +1 / -1 |
| --- ALL --- | Original | Replay | +11 / -11 |
