python[patch]: evaluators can return primitives #1203

baskaryan · 2024-11-11T18:35:27Z

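```python
from langsmith import evaluate

def foo(run, example):
    # A numeric return is logged as a score.
    return 0

def bar(run, example):
    # A string return is logged as a categorical value.
    return "long"

# removed, needs to be list of dict
# def baz(run, example):
#     return [0, 0.2, "how are ya"]

def app(inputs):
    return inputs

# baz is commented out above, so it is no longer passed as an evaluator.
evaluate(app, data="Sample Dataset 3", evaluators=[foo, bar])
```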

baskaryan · 2024-11-11T18:38:30Z

how ^ example gets logged

hinthornw · 2024-11-11T18:41:33Z

The list return type seems unexpected to me - I wouldn't expect it to make separate keys for each value

baskaryan · 2024-11-11T18:41:58Z

> The list return type seems unexpected to me - I wouldn't expect it to make separate keys for each value

are we able to support list values as feedback?

hinthornw · 2024-11-11T18:43:55Z

> > The list return type seems unexpected to me - I wouldn't expect it to make separate keys for each value
>
> are we able to support list values as feedback?

Only dict or string it seems.

Maybe doing the `_ix` suffix is preferable. I'm not sure, honestly. Probably less surprising than just logging duplicates to the same key.

It just messes with the experiment averages (we'd be averaging over index 1 and over index 2).
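To make the trade-off concrete, here is a minimal standalone sketch (not LangSmith code) comparing the two options for a list return: logging every element under one key versus suffixing the key with each element's index. The helper names, the zero-based `_0`/`_1` suffix, and treating a per-key mean as the "experiment average" are all assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List

Feedback = Dict[str, object]


def same_key(key: str, values: List[float]) -> List[Feedback]:
    """Hypothetical: log every list element under the same feedback key."""
    return [{"key": key, "score": v} for v in values]


def ix_suffixed(key: str, values: List[float]) -> List[Feedback]:
    """Hypothetical: give each list element its own key, suffixed with its index."""
    return [{"key": f"{key}_{i}", "score": v} for i, v in enumerate(values)]


def experiment_averages(feedback: List[Feedback]) -> Dict[str, float]:
    """Average the scores per feedback key, as an experiment summary might."""
    by_key = defaultdict(list)
    for fb in feedback:
        by_key[fb["key"]].append(fb["score"])
    return {k: mean(v) for k, v in by_key.items()}


# Two runs whose (hypothetical) evaluator returns a two-element list.
run_outputs = [[0.0, 1.0], [0.5, 1.0]]

shared = [fb for vals in run_outputs for fb in same_key("baz", vals)]
suffixed = [fb for vals in run_outputs for fb in ix_suffixed("baz", vals)]

print(experiment_averages(shared))    # both list positions blended into one average
print(experiment_averages(suffixed))  # one average per list position
```

With a shared key, the two positions collapse into a single number, which is the averaging problem described above; the suffixed keys keep one average per position.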

baskaryan · 2024-11-11T18:46:56Z

> > > The list return type seems unexpected to me - I wouldn't expect it to make separate keys for each value
> >
> > are we able to support list values as feedback?
>
> Only dict or string it seems.
>
> Maybe doing the `_ix` suffix is preferable. I'm not sure, honestly. Probably less surprising than just logging duplicates to the same key.
>
> It just messes with the experiment averages (we'd be averaging over index 1 and over index 2).

I feel less strongly about list behavior either way; I think the main use case is supporting int/float/bool/str.

hinthornw · 2024-11-11T18:50:06Z

Maybe let's land with the numeric and string value support but hold off on list behavior?

Maybe add support for `list[evaluationresultlike]` to make the `"results"` key not necessary.
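A minimal sketch of that suggestion, assuming multiple results currently have to be wrapped as `{"results": [...]}`: a hypothetical normalizer that also accepts a bare list of result dicts. The function name `normalize_results`, the wrapping of a single dict, and the rejection of non-dict list items are assumptions, not LangSmith behavior.

```python
from typing import Any, Dict, List, Union

ResultDict = Dict[str, Any]


def normalize_results(
    output: Union[ResultDict, List[ResultDict]],
) -> Dict[str, List[ResultDict]]:
    """Hypothetical: accept {"results": [...]} or a bare list of result dicts."""
    if isinstance(output, list):
        # A bare list is only valid if every item is itself a result dict.
        if not all(isinstance(r, dict) for r in output):
            raise ValueError("a list return must contain only result dicts")
        return {"results": list(output)}
    if isinstance(output, dict) and "results" in output:
        # Already in the wrapped form; copy the list through unchanged.
        return {"results": list(output["results"])}
    if isinstance(output, dict):
        # A single result dict becomes a one-element list.
        return {"results": [output]}
    raise TypeError(f"unsupported evaluator output: {type(output).__name__}")


wrapped = {"results": [{"key": "a", "score": 1}, {"key": "b", "score": 0}]}
bare = [{"key": "a", "score": 1}, {"key": "b", "score": 0}]
print(normalize_results(wrapped) == normalize_results(bare))
```

Under this sketch, both return styles produce the same normalized structure, so the `"results"` wrapper becomes optional rather than required.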

jakerachleff · 2024-11-12T18:30:15Z

`python/langsmith/evaluation/evaluator.py`

```diff
        source_run_id: uuid.UUID,
    ) -> Union[EvaluationResult, EvaluationResults]:
-        if isinstance(result, EvaluationResult):
+        if isinstance(result, (bool, float, int)):
+            result = {"score": result}
```


So if I have four categories that are numbers for some reason (I'm classifying college class levels, and they're 100 level, 200 level, 300 level, etc.), then I should do `str(value)` to explicitly use categorical scores?

or return as `{"value": 200}` (something for us to clearly document)
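A minimal sketch of the coercion being discussed, assuming (per the diff) that `bool`/`int`/`float` returns become `{"score": ...}`, and (per this thread) that strings are treated as categorical `value`s and dicts pass through. The helper `coerce_primitive` and the default key taken from the evaluator's function name are hypothetical, not the actual LangSmith implementation. It illustrates why a numeric category like `200` would be logged as a score unless it is returned as `str(200)` or `{"value": 200}`.

```python
from typing import Any, Callable, Dict


def coerce_primitive(evaluator: Callable[..., Any], result: Any) -> Dict[str, Any]:
    """Hypothetical: turn an evaluator's return value into a feedback dict."""
    if isinstance(result, (bool, float, int)):
        # Numbers and bools are treated as scores (and would be averaged).
        feedback = {"score": result}
    elif isinstance(result, str):
        # Strings are treated as categorical values.
        feedback = {"value": result}
    elif isinstance(result, dict):
        feedback = dict(result)
    else:
        raise TypeError(f"unsupported return type: {type(result).__name__}")
    # Assumption: fall back to the evaluator's name when no key is given.
    feedback.setdefault("key", evaluator.__name__)
    return feedback


def level_as_number(run, example):
    return 200  # coerced to a numeric score


def level_as_string(run, example):
    return str(200)  # coerced to the categorical value "200"


def level_as_dict(run, example):
    return {"value": 200}  # explicit value, passed through as-is


for ev in (level_as_number, level_as_string, level_as_dict):
    print(coerce_primitive(ev, ev(None, None)))
```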

jakerachleff · 2024-11-12T18:43:44Z

`python/tests/unit_tests/evaluation/test_runner.py`

```diff
+        ordering_of_stuff.append("evaluate")
+        return "good"
+
+    async def eval_list(run, example):
```


Maybe good to confirm that a list of ints, for example, doesn't work?
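A sketch of such a check, written against a hypothetical stand-in `coerce_result` rather than the real `_coerce_evaluation_result` (whose exact error type is not assumed here). It accepts primitives, dicts, and lists of dicts, and the tests assert that a list of ints is rejected while a list of result dicts is accepted.

```python
from typing import Any, Dict, List


def coerce_result(result: Any) -> List[Dict[str, Any]]:
    """Hypothetical stand-in: accept primitives, dicts, or lists of dicts."""
    if isinstance(result, (bool, int, float)):
        return [{"score": result}]
    if isinstance(result, str):
        return [{"value": result}]
    if isinstance(result, dict):
        return [result]
    if isinstance(result, list) and all(isinstance(r, dict) for r in result):
        return list(result)
    # Anything else, including a list of ints, is rejected.
    raise ValueError(f"unsupported evaluator return value: {result!r}")


def test_list_of_ints_is_rejected():
    try:
        coerce_result([1, 2, 3])
    except ValueError:
        return
    raise AssertionError("expected a list of ints to be rejected")


def test_list_of_dicts_is_accepted():
    results = [{"key": "a", "score": 1}, {"key": "b", "score": 0}]
    assert coerce_result(results) == results


# Run directly; under pytest these would be collected automatically.
test_list_of_ints_is_rejected()
test_list_of_dicts_is_accepted()
```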

hinthornw · 2024-11-12T18:53:09Z

`python/langsmith/evaluation/evaluator.py`

```diff
@@ -260,32 +259,46 @@ def _coerce_evaluation_results(
            cp = results.copy()
            cp["results"] = [
                self._coerce_evaluation_result(r, source_run_id=source_run_id)
-                for r in results["results"]
+                for i, r in enumerate(results["results"])
```


is i needed anymore?

agola11

I think it's worth holding off on the categorical dict case for now and just interpreting strings as categories. We need to think about the UX for allowing users to specify configuration for the feedback they send with their evaluators.

rfc: evaluators can return primitives

388764f

baskaryan requested a review from hinthornw November 11, 2024 18:35

fmt

a18c139

baskaryan marked this pull request as ready for review November 11, 2024 19:55

baskaryan and others added 2 commits November 11, 2024 11:56

Merge branch 'main' into bagatur/rfc_eval_simple_returns

73c5d5b

undo

7ea4897

baskaryan changed the title ~~rfc: evaluators can return primitives~~ python[patch: evaluators can return primitives Nov 11, 2024

baskaryan changed the title ~~python[patch: evaluators can return primitives~~ python[patch]: evaluators can return primitives Nov 11, 2024

jakerachleff approved these changes Nov 12, 2024


hinthornw approved these changes Nov 12, 2024


agola11 approved these changes Nov 13, 2024


baskaryan added 2 commits November 14, 2024 09:17

Merge branch 'main' into bagatur/rfc_eval_simple_returns

9601d89

fmt

cf80a82

baskaryan merged commit d8adcde into main Nov 14, 2024
9 checks passed

baskaryan deleted the bagatur/rfc_eval_simple_returns branch November 14, 2024 19:14
