python[patch]: accept simple evaluators #1200

baskaryan · 2024-11-09T01:39:39Z

With this change, you can write evaluators like this:

from langsmith import evaluate

def simp(inputs: dict, outputs: dict, reference_outputs: dict) -> dict:
    # Return several feedback keys at once via {"results": [...]}.
    return {"results": [
        {"score": inputs == outputs, "key": "identity"},
        {"score": outputs == reference_outputs, "key": "correct"},
    ]}

evaluate(
    (lambda x: x),  # identity target: the run outputs equal the example inputs
    data="Sample Dataset 3",
    evaluators=[simp],
)
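Because a simple evaluator is just a plain function over dicts, it can be sanity-checked directly without calling evaluate(). A minimal sketch; the sample dicts are made up for illustration and nothing here touches the LangSmith API:

```python
def simp(inputs: dict, outputs: dict, reference_outputs: dict) -> dict:
    # Same evaluator as above: two feedback keys returned in one dict.
    return {"results": [
        {"score": inputs == outputs, "key": "identity"},
        {"score": outputs == reference_outputs, "key": "correct"},
    ]}


# Hypothetical sample data; with the identity target (lambda x: x),
# the run outputs are equal to the example inputs.
inputs = {"question": "2 + 2?"}
outputs = dict(inputs)
reference_outputs = {"answer": "4"}

result = simp(inputs, outputs, reference_outputs)
scores = {r["key"]: r["score"] for r in result["results"]}
print(scores)  # {'identity': True, 'correct': False}
```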

Example experiment: left-tray-86 https://dev.smith.langchain.com/public/e7782ea0-3de5-4352-8cd4-7b2cdbb03e4c/d

hinthornw

Nice, I like the start. Will re-review this morning.

hinthornw · 2024-11-11T14:56:19Z

python/langsmith/evaluation/evaluator.py

+        if not (
+            num_positional in (2, 3) or (num_positional <= 3 and has_positional_var)
+        ):
+            msg = ""


python/langsmith/evaluation/evaluator.py

Co-authored-by: William FH <[email protected]>

…i/langsmith-sdk into bagatur/rfc_simple_evaluator

jakerachleff · 2024-11-12T18:15:37Z

python/langsmith/evaluation/evaluator.py

@@ -632,3 +636,70 @@ def comparison_evaluator(
 ) -> DynamicComparisonRunEvaluator:
    """Create a comparison evaluator from a function."""
    return DynamicComparisonRunEvaluator(func)
+
+
+def _normalize_evaluator_func(


Might be nice to add a couple of unit tests for this to make it obvious it's working.

jakerachleff

I think this makes sense, but I would add some extra tests to confirm it works properly.
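The kind of unit tests being asked for could look roughly like the pytest-style sketch below. Since _normalize_evaluator_func's body is not shown here, the sketch defines its own hypothetical normalize_simple_evaluator, which fills parameters by name using the same mapping as the diff's arg_map ("outputs": run.outputs or {}, "reference_outputs": example.outputs or {}) and assumes inputs comes from example.inputs. Run and Example are replaced by SimpleNamespace stand-ins.

```python
import inspect
from types import SimpleNamespace


def normalize_simple_evaluator(func):
    """Hypothetical normalizer: adapt an evaluator taking any of
    inputs / outputs / reference_outputs into a (run, example) callable."""
    names = [
        p.name
        for p in inspect.signature(func).parameters.values()
        if p.kind in (p.POSITIONAL_ONLY, p.POSITIONAL_OR_KEYWORD)
    ]

    def wrapper(run, example):
        # Mirrors the arg_map in the diff; "inputs" source is an assumption.
        arg_map = {
            "inputs": example.inputs,
            "outputs": run.outputs or {},
            "reference_outputs": example.outputs or {},
        }
        return func(*(arg_map[name] for name in names))

    return wrapper


# Stand-ins for langsmith's Run and Example (only the attributes read above).
RUN = SimpleNamespace(outputs={"answer": "4"})
EXAMPLE = SimpleNamespace(inputs={"question": "2 + 2?"}, outputs={"answer": "4"})


def test_three_arg_evaluator():
    def correct(inputs, outputs, reference_outputs):
        return {"key": "correct", "score": outputs == reference_outputs}

    result = normalize_simple_evaluator(correct)(RUN, EXAMPLE)
    assert result == {"key": "correct", "score": True}


def test_two_arg_evaluator_receives_matching_values():
    def echo(inputs, outputs):
        return {"key": "echo", "score": (inputs, outputs)}

    result = normalize_simple_evaluator(echo)(RUN, EXAMPLE)
    assert result["score"] == (EXAMPLE.inputs, RUN.outputs)


def test_missing_outputs_default_to_empty_dict():
    def both_empty(outputs, reference_outputs):
        return {"key": "empty", "score": outputs == {} and reference_outputs == {}}

    run = SimpleNamespace(outputs=None)
    example = SimpleNamespace(inputs={}, outputs=None)
    assert normalize_simple_evaluator(both_empty)(run, example)["score"] is True
```

Running these under pytest (or calling them directly) checks argument routing for the 3-arg and 2-arg cases as well as the `or {}` fallback for missing outputs.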

jakerachleff · 2024-11-12T18:28:18Z

python/langsmith/evaluation/evaluator.py

+]:
+    # for backwards compatibility, if args are untyped we assume they correspond to
+    # Run and Example:
+    if not (type_hints := get_type_hints(func)):


Do we want to add debug logs letting you know which function type is being used? Might be helpful, since we tell people to enable debug logs when debugging issues in the SDK.
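A minimal sketch of such a debug log, using only the standard logging module. The helper name detect_evaluator_style and its classification rule (untyped, or first two params named run/example, means legacy) are assumptions for illustration, loosely following the "if args are untyped we assume they correspond to Run and Example" comment in the diff:

```python
import inspect
import logging
from typing import Callable, get_type_hints

logger = logging.getLogger(__name__)


def detect_evaluator_style(func: Callable) -> str:
    """Hypothetical helper: pick a calling convention and log the choice."""
    params = list(inspect.signature(func).parameters)
    # Untyped functions keep the legacy (run, example) behavior for
    # backwards compatibility; so do functions explicitly named that way.
    if not get_type_hints(func) or params[:2] == ["run", "example"]:
        style = "legacy"
    else:
        style = "simple"
    logger.debug(
        "Evaluator %s: using %s signature (params=%s)",
        getattr(func, "__name__", repr(func)),
        style,
        params,
    )
    return style
```

With logging.basicConfig(level=logging.DEBUG) enabled, each evaluator would emit one line saying which convention was chosen, which is the information a user would need when a score comes out unexpectedly empty.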

hinthornw · 2024-11-12T19:00:01Z

Do we want it to be like pytest, where it's all by name? e.g. run, example, inputs, predictions, reference

baskaryan · 2024-11-12T19:16:38Z

Do we want it to be like pytest where it's all by name? run, example, inputs, predictions, reference

Yeah, I like that. For backwards compatibility we can't enforce run/example by name, but we can enforce the others.
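A rough sketch of pytest-style, name-based dispatch that still tolerates legacy evaluators. It uses the argument names from the later diff's arg_map (outputs, reference_outputs) rather than predictions/reference, and call_by_name is a hypothetical helper, not the SDK's implementation. If any parameter name is unknown, the function is assumed to be a legacy evaluator and receives (run, example) positionally, since those names can't be enforced:

```python
import inspect
from types import SimpleNamespace
from typing import Any, Callable

# Names assumed for this sketch: run/example plus the keys of the diff's arg_map.
SUPPORTED_ARGS = ("run", "example", "inputs", "outputs", "reference_outputs")


def call_by_name(func: Callable[..., Any], run: Any, example: Any) -> Any:
    """Hypothetical dispatcher: fill each parameter by its name."""
    params = inspect.signature(func).parameters.values()
    if not all(p.name in SUPPORTED_ARGS for p in params):
        # Legacy evaluators may name their two args anything.
        return func(run, example)
    available = {
        "run": run,
        "example": example,
        "inputs": example.inputs,
        "outputs": run.outputs or {},
        "reference_outputs": example.outputs or {},
    }
    args, kwargs = [], {}
    for p in params:
        if p.kind is p.KEYWORD_ONLY:
            kwargs[p.name] = available[p.name]
        else:
            args.append(available[p.name])
    return func(*args, **kwargs)


run = SimpleNamespace(outputs={"answer": "4"})
example = SimpleNamespace(inputs={"q": "2+2"}, outputs={"answer": "4"})


def correct(outputs, reference_outputs):
    return outputs == reference_outputs


def legacy(r, e):
    return r.outputs == e.outputs


print(call_by_name(correct, run, example))  # True
print(call_by_name(legacy, run, example))   # True
```

The upside of matching by name is that parameter order stops mattering; the cost is that a typo in a parameter name silently falls back to the legacy path in this sketch, which is one argument for validating names up front.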

agola11 · 2024-11-13T19:11:36Z

python/langsmith/evaluation/evaluator.py

+]:
+    # for backwards compatibility, if args are untyped we assume they correspond to
+    # Run and Example:
+    if not (type_hints := get_type_hints(func)):


Shouldn't we check the number of args here? Traditional evaluators take run and example, whereas the simple evaluators take 3 args.

agola11 · 2024-11-13T19:12:24Z

python/langsmith/evaluation/evaluator.py

+        if not (
+            num_positional in (2, 3) or (num_positional <= 3 and has_positional_var)
+        ):
+            msg = (
+                "Invalid evaluator function. Expected to take either 2 or 3 positional "
+                "arguments. Please see "
+                "https://docs.smith.langchain.com/evaluation/how_to_guides/evaluation/evaluate_llm_application#use-custom-evaluators"  # noqa: E501
+            )
+            raise ValueError(msg)


Seems like this check on the number of args should be moved up.
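Moving the check up means validating arity before any type-hint inspection, so a malformed evaluator fails fast whether or not it is annotated. A minimal sketch that reuses the exact condition from the diff; validate_arity is a hypothetical name:

```python
import inspect
from typing import Callable


def validate_arity(func: Callable) -> None:
    """Hypothetical up-front check mirroring the condition in the diff:
    2 or 3 positional params, or at most 3 plus *args."""
    params = inspect.signature(func).parameters.values()
    num_positional = sum(
        p.kind in (p.POSITIONAL_ONLY, p.POSITIONAL_OR_KEYWORD) for p in params
    )
    has_positional_var = any(p.kind is p.VAR_POSITIONAL for p in params)
    if not (
        num_positional in (2, 3) or (num_positional <= 3 and has_positional_var)
    ):
        raise ValueError(
            "Invalid evaluator function. Expected to take either 2 or 3 "
            "positional arguments."
        )
```

Calling this as the first step of normalization would also address the earlier point about checking the number of args before relying on get_type_hints.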

hinthornw · 2024-11-14T21:48:02Z

python/langsmith/evaluation/evaluator.py

+        msg = (
+            f"Invalid evaluator function. Must have at least one positional "
+            f"argument. Supported positional arguments are {supported_args}. Please "
+            f"see https://docs.smith.langchain.com/evaluation/how_to_guides/evaluation/evaluate_llm_application#use-custom-evaluators"


This link feels like it's not gonna have a long shelf-life

Yeah, I'm updating it as we speak, but there will be redirects.

hinthornw · 2024-11-14T22:27:27Z

python/langsmith/evaluation/evaluator.py

+        if p.kind in (p.POSITIONAL_OR_KEYWORD, p.POSITIONAL_ONLY)
+        and p.default is p.empty
+    ]
+    if not positional_no_default or (


Out of curiosity, why do we require at least one positional argument?

We only pass the supported args in as positional args, so this is equivalent to enforcing that there's at least one supported arg.
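The equivalence can be made concrete with a sketch of the positional_no_default filter from the diff. The diff's condition is cut off after `or (`, so the second clause here (rejecting unsupported names) is an assumption, and SUPPORTED_ARGS and both helper names are hypothetical:

```python
import inspect
from typing import Callable, List

# Assumed supported names: run/example plus the keys of the diff's arg_map.
SUPPORTED_ARGS = ("run", "example", "inputs", "outputs", "reference_outputs")


def required_positional_args(func: Callable) -> List[str]:
    """Positional params without defaults, as in positional_no_default."""
    return [
        p.name
        for p in inspect.signature(func).parameters.values()
        if p.kind in (p.POSITIONAL_OR_KEYWORD, p.POSITIONAL_ONLY)
        and p.default is p.empty
    ]


def check_supported_args(func: Callable) -> List[str]:
    names = required_positional_args(func)
    # Only supported args are ever passed (positionally), so "at least one
    # positional arg" is the same as "at least one supported arg".
    unsupported = [n for n in names if n not in SUPPORTED_ARGS]
    if not names or unsupported:
        raise ValueError(
            "Invalid evaluator function. Must have at least one positional "
            f"argument. Supported positional arguments are {SUPPORTED_ARGS}."
        )
    return names
```

An evaluator like `def f(**kwargs)` has no required positional args, so it would be called with no data at all; rejecting it up front is the useful behavior.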

hinthornw · 2024-11-14T22:29:23Z

python/langsmith/evaluation/evaluator.py

+                    "outputs": run.outputs or {},
+                    "reference_outputs": example.outputs or {},
+                }
+                args = (arg_map[arg] for arg in positional_no_default)


If I give an arg a default value, this silently never provides the matching value. We'd want to either validate ahead of time that no default is provided (preferred) or pass it in anyway (I think not preferred).
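The preferred option, validating ahead of time, could be sketched as below. reject_defaulted_args and SUPPORTED_ARGS are hypothetical; the point is to fail loudly when a supported argument has a default, instead of leaving it out of positional_no_default and never populating it:

```python
import inspect
from typing import Callable

# Assumed supported names for this sketch.
SUPPORTED_ARGS = ("run", "example", "inputs", "outputs", "reference_outputs")


def reject_defaulted_args(func: Callable) -> None:
    """Hypothetical up-front validation for supported args with defaults."""
    defaulted = [
        p.name
        for p in inspect.signature(func).parameters.values()
        if p.name in SUPPORTED_ARGS and p.default is not p.empty
    ]
    if defaulted:
        raise ValueError(
            f"Evaluator arguments {defaulted} have default values, so they "
            "would never be populated. Remove the defaults."
        )
```

For example, `def ev(outputs, reference_outputs=None)` would raise here rather than silently evaluating against a reference_outputs of None.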

hinthornw · 2024-11-14T22:30:28Z

python/langsmith/evaluation/_runner.py

@@ -87,6 +87,7 @@
        [schemas.Run, Optional[schemas.Example]],
        Union[EvaluationResult, EvaluationResults],
    ],
+    Callable[..., Union[dict, EvaluationResults, EvaluationResult]],


Could we update the docstring for evaluate() and aevaluate() to have examples or link to a docs page that shows the valid arguments?

hinthornw · 2024-11-14T22:36:12Z

python/langsmith/evaluation/evaluator.py

+    ):
+        msg = (
+            f"Invalid evaluator function. Must have at least one positional "
+            f"argument. Supported positional arguments are {supported_args}. Please "


Could we include a description of what each argument is here?

Feels like people should just look at the API reference for that?

rfc: accept simple evaluators

7e901ad

baskaryan requested a review from hinthornw November 9, 2024 01:39

hinthornw reviewed Nov 11, 2024


baskaryan and others added 4 commits November 11, 2024 15:08

Merge branch 'main' into bagatur/rfc_simple_evaluator

3877b36

Update python/langsmith/evaluation/evaluator.py

9b26f6b

Co-authored-by: William FH <[email protected]>

Merge branch 'bagatur/rfc_simple_evaluator' of github.com:langchain-a…

4a24f9f

…i/langsmith-sdk into bagatur/rfc_simple_evaluator

fmt

b3b841f

baskaryan marked this pull request as ready for review November 12, 2024 01:52

baskaryan changed the title ~~rfc: accept simple evaluators~~ python[patch]: accept simple evaluators Nov 12, 2024

baskaryan requested a review from hinthornw November 12, 2024 14:55

jakerachleff reviewed Nov 12, 2024


jakerachleff approved these changes Nov 12, 2024


jakerachleff reviewed Nov 12, 2024


agola11 reviewed Nov 13, 2024


baskaryan added 4 commits November 14, 2024 09:54

Merge branch 'main' into bagatur/rfc_simple_evaluator

8604117

cr

03506c5

merge

a4d2b03

fmt

448bbf3

hinthornw reviewed Nov 14, 2024
