[DERCBOT-1173] RAG Evaluation - langfuse
Morgan Diverrez authored and assouktim committed Nov 6, 2024
1 parent 29de731 commit 3c14892
Showing 11 changed files with 1,593 additions and 1,216 deletions.
1,794 changes: 912 additions & 882 deletions gen-ai/orchestrator-server/src/main/python/server/poetry.lock

Large diffs are not rendered by default.

@@ -21,7 +21,7 @@ colorlog = "^6.8.2"
boto3 = "^1.35.37"
urllib3 = "^2.2.3"
jinja2 = "^3.1.4"
langfuse = "^2.52.0"
langfuse = "2.36.2"
httpx-auth-awssigv4 = "^0.1.4"
langchain-postgres = "^0.0.12"
google-cloud-secret-manager = "^2.20.2"
@@ -0,0 +1,11 @@
#for LangFuse dataset provider
LANGFUSE_SECRET_KEY=
LANGFUSE_PUBLIC_KEY=
LANGFUSE_HOST=

# for LangSmith dataset_provider
LANGCHAIN_API_KEY=

# for smarttribune_consumer.py script
API_KEY=
API_SECRET=
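
As a rough sketch of how a script might verify these variables before contacting a dataset provider, the helper below reports which keys are unset. The function name `missing_env_vars` and the provider-to-variable mapping are assumptions read off the comments in the template above; they are not code from this commit.

```python
import os

# Assumed mapping, taken from the comments in the environment template above.
REQUIRED_ENV_VARS = {
    "langfuse": ["LANGFUSE_SECRET_KEY", "LANGFUSE_PUBLIC_KEY", "LANGFUSE_HOST"],
    "langsmith": ["LANGCHAIN_API_KEY"],
}


def missing_env_vars(provider, env=None):
    """Return the variables that are unset or empty for the given provider."""
    env = os.environ if env is None else env
    if provider not in REQUIRED_ENV_VARS:
        raise ValueError(f"unknown dataset provider: {provider!r}")
    return [name for name in REQUIRED_ENV_VARS[provider] if not env.get(name)]
```

Calling `missing_env_vars("langfuse")` early in a script would let it fail with a clear message instead of an authentication error from the SDK.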
@@ -205,12 +205,12 @@ To configure the default vector store, you can use the following environment var

### generate_dataset.py

Generates a testing dataset based on an input file. The input file should have the correct format (see generate_datset_input.xlsx for sample). The generated dataset can be saved on filesystem, using the --csv-output option, on langsmith, using the --langsmith-dataset-name option, or both.
Generates a testing dataset based on an input file. The input file should have the correct format (see generate_datset_input.xlsx for sample). The generated dataset can be saved on filesystem, using the --csv-output option, on langsmith, using the --langsmith-dataset-name option, on langfuse, using the --langfuse-dataset-name option, or any combination of these.

```
Usage:
generate_dataset.py [-v] <input_excel> --range=<s> [--csv-output=<path>] [ --langsmith-dataset-name=<name> ] [--locale=<locale>] [--no-answer=<na>]
generate_dataset.py [-v] <input_excel> --sheet=<n>... [--csv-output=<path>] [ --langsmith-dataset-name=<name> ] [--locale=<locale>] [--no-answer=<na>]
generate_dataset.py [-v] <input_excel> --range=<s> [--csv-output=<path>] [ --langsmith-dataset-name=<name> ] [ --langfuse-dataset-name=<name> ] [--locale=<locale>] [--no-answer=<na>]
generate_dataset.py [-v] <input_excel> --sheet=<n>... [--csv-output=<path>] [ --langsmith-dataset-name=<name> ] [ --langfuse-dataset-name=<name> ] [--locale=<locale>] [--no-answer=<na>]
Arguments:
input_excel path to the input excel file
@@ -220,22 +220,22 @@ Options:
--sheet=<n> Sheet numbers to be parsed. Indices are 0-indexed.
--csv-output=<path> Output path of csv file to be generated.
--langsmith-dataset-name=<name> Name of the dataset to be saved on langsmith.
--langfuse-dataset-name=<name> Name of the dataset to be saved on langfuse.
--locale=<locale> Locale to be included in the dataset. [default: French]
--no-answer=<na> Label of no_answer to be included in the dataset. [default: NO_RAG_SENTENCE]
-h --help Show this screen
--version Show version
-v Verbose output for debugging (without this option, script will be silent but for errors)
Generates a testing dataset based on an input file. The input file should have the correct format (see generate_datset_input.xlsx for sample). The generated dataset can be saved on filesystem, using the --csv-output option, on langsmith, using the --langsmith-dataset-name option, or both.
Generates a testing dataset based on an input file. The input file should have the correct format (see generate_datset_input.xlsx for sample). The generated dataset can be saved on filesystem, using the --csv-output option, on langsmith, using the --langsmith-dataset-name option, on langfuse, using the --langfuse-dataset-name option, or any combination of these.
```
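
To make the option combinations in the usage string concrete, here is a minimal sketch that assembles a command line for `generate_dataset.py`. The helper `build_generate_dataset_cmd` and its parameter names are hypothetical; only the flags come from the usage text above. The value format of `--range` is not shown in this diff, so it is passed through unchanged.

```python
import sys


def build_generate_dataset_cmd(input_excel, sheets=None, cell_range=None,
                               csv_output=None, langsmith_name=None,
                               langfuse_name=None, verbose=False):
    """Assemble an argv list that follows the usage string shown above."""
    # The usage string offers two forms: one with --range, one with --sheet.
    if bool(sheets) == bool(cell_range):
        raise ValueError("pass exactly one of sheets or cell_range")
    cmd = [sys.executable, "generate_dataset.py"]
    if verbose:
        cmd.append("-v")
    cmd.append(input_excel)
    if cell_range:
        cmd.append(f"--range={cell_range}")
    else:
        # --sheet=<n>... may be repeated, one flag per sheet index.
        cmd.extend(f"--sheet={n}" for n in sheets)
    # Every output target is optional and they can be combined freely.
    if csv_output:
        cmd.append(f"--csv-output={csv_output}")
    if langsmith_name:
        cmd.append(f"--langsmith-dataset-name={langsmith_name}")
    if langfuse_name:
        cmd.append(f"--langfuse-dataset-name={langfuse_name}")
    return cmd
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`, assuming the script is in the current working directory.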

### rag_testing_tool.py

Retrieval-Augmented Generation (RAG) endpoint settings testing tool based on LangSmith's SDK: runs a specific RAG Settings configuration against a reference dataset.
Retrieval-Augmented Generation (RAG) endpoint settings testing tool based on LangSmith's or LangFuse's SDK: runs a specific RAG Settings configuration against a reference dataset.

```
Usage:
rag_testing_tool.py [-v] <rag_query> <dataset_name> <test_name> [<delay>]
rag_testing_tool.py [-v] <rag_query> <dataset_provider> <dataset_name> <test_name> [<delay>]
rag_testing_tool.py -h | --help
rag_testing_tool.py --version
@@ -245,6 +245,7 @@ Arguments:
provider, indexation session's unique id, and 'k', i.e. nb
of retrieved docs (question and chat history are ignored,
as they will come from the dataset)
dataset_provider the dataset provider (langsmith or langfuse)
dataset_name the reference dataset name
test_name name of the test run
@@ -256,7 +257,7 @@ Options:
be silent but for errors)
```

Build a RAG (Lang)chain from the RAG Query and runs it against the provided LangSmith dataset. The chain is created anew for each entry of the dataset, and if a delay is provided each chain creation will be delayed accordingly.
Builds a RAG (Lang)chain from the RAG Query and runs it against the provided LangSmith or LangFuse dataset. The chain is created anew for each entry of the dataset, and if a delay is provided each chain creation will be delayed accordingly.
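
The per-entry behaviour described above can be sketched as a plain loop. This is an illustration, not the tool's implementation: `run_over_dataset` and `build_chain` are hypothetical names, the chain is modelled as a simple callable, and the delay is assumed to be in seconds.

```python
import time


def run_over_dataset(entries, build_chain, delay=0.0, sleep=time.sleep):
    """Run a freshly built chain on every dataset entry."""
    results = []
    for entry in entries:
        if delay:
            sleep(delay)  # postpone this entry's chain creation
        chain = build_chain()  # a new chain per entry, never reused
        results.append(chain(entry))
    return results
```

Injecting `sleep` keeps the loop testable without waiting; a real run would leave the default `time.sleep` in place.
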
### export_run_results.py

Exports the run results of a LangSmith dataset in CSV format.
@@ -280,3 +281,27 @@ The exported CSV file will have these columns :
'Reference input'|'Reference output'|'Response 1'|'Sources 1'|...|'Response N'|'Sources N'
NB: There will be as many responses as run sessions
```

### export_run_results_langfuse.py

Exports the run results of a LangFuse dataset in CSV format.

```
Usage:
export_run_results_langfuse.py [-v] <dataset_name> <runs_names>...
export_run_results_langfuse.py -h | --help
export_run_results_langfuse.py --version
Arguments:
dataset_name dataset id
runs_names list of session ids
Options:
-h --help Show this screen
--version Show version
-v Verbose output for debugging
The exported CSV file will have these columns :
'Reference input'|'Reference output'|'Response 1'|'Sources 1'|...|'Response N'|'Sources N'
NB: There will be as many responses as run sessions
```
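
The column layout described above grows by two columns per run session. A minimal sketch of building that header and writing rows follows; the helper names are hypothetical, and the `|` delimiter is an assumption read off the column list, so the actual script may use a different separator.

```python
import csv
import io


def csv_header(n_runs):
    """Build the column names for n_runs run sessions."""
    header = ["Reference input", "Reference output"]
    for i in range(1, n_runs + 1):
        header += [f"Response {i}", f"Sources {i}"]
    return header


def write_results(rows, n_runs, delimiter="|"):
    """Serialize the header and rows to CSV text held in memory."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter)
    writer.writerow(csv_header(n_runs))
    writer.writerows(rows)
    return buffer.getvalue()
```

Each row is expected to hold the reference input and output followed by one response and one sources cell per run, in the same order as the run names given on the command line.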