Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add LLMEvaluator to JS #1186

Open
wants to merge 20 commits into
base: main
Choose a base branch
from
Open

add LLMEvaluator to JS #1186

wants to merge 20 commits into from

Conversation

isahers1
Copy link
Contributor

@isahers1 isahers1 commented Nov 7, 2024

No description provided.

@isahers1 isahers1 changed the title wip add LLMEvaluator to JS Nov 7, 2024
@isahers1 isahers1 marked this pull request as ready for review November 7, 2024 02:29
import * as uuid from "uuid";
import { EvaluationResult, EvaluationResults, RunEvaluator } from "./evaluator.js";
import type { Run, Example } from "../schemas.js";
export class CategoricalScoreConfig {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add good typedocs :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tried my best, lmk what needs improving

js/src/evaluation/llm_evaluator.ts Outdated Show resolved Hide resolved
js/src/evaluation/llm_evaluator.ts Outdated Show resolved Hide resolved
properties.score = {
type: "string",
enum: scoreConfig.choices,
description: `The score for the evaluation, one of ${scoreConfig.choices.join(", ")}.`,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really confusing to me tbh.

Why do we put categorical in the numeric score field? Why call it a score? Why not call it "category" or something obvious?

Score implies sortability / comparability to me

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the key to "value", but we also do this in the output parsing which I think solves this a little? not 100% sure, lmk what you think:

if ("choices" in this.scoreConfig) {
      const value = output.score;
      const explanation = output[this.reasoningKey];
      return {
        key: this.scoreConfig.key,
        value,
        comment: explanation,
        sourceRunId: output.runId
      };
    }

js/src/evaluation/llm_evaluator.ts Outdated Show resolved Hide resolved
js/package.json Outdated
@@ -106,14 +106,15 @@
"p-queue": "^6.6.2",
"p-retry": "4",
"semver": "^7.6.3",
"uuid": "^10.0.0"
"uuid": "^10.0.0",
"langchain": "^0.3.3",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't make langchain a dependency of langsmith - it would be cyclic and not all langsmith users use langchain.

I believe we'll have to set them as peerDependencies

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Synced with Jacob to double check. Steps to handle this:

  1. Rm from dependencies
  2. Make the llm evaluator a separate entrypoint (see traceablea nd the create_entrypoints.js script)
  3. We should probably refrain from importing init chat model from the top since that requires langchain to be installed as well as the sub modules. Maybe accept a chat model as an argument and have them initialize, and in our docs we show using that. Core I think more OK

"openai": "^4.67.3",
"prettier": "^2.8.8",
"ts-jest": "^29.1.0",
"ts-node": "^10.9.1",
"typescript": "^5.4.5",
"zod": "^3.23.8"
"zod": "^3.23.8",
"@langchain/core": "^0.3.14",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to keep these sorted and avoid touching unnecessary lines

@@ -12,6 +12,7 @@ const entrypoints = {
traceable: "traceable",
evaluation: "evaluation/index",
"evaluation/langchain": "evaluation/langchain",
"evaluation/llm": "evaluation/llm_evaluator",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming should match filepath

So rename file below to evaluation/llm.ts

) {
try {
// Store the configuration
this.scoreConfig = scoreConfig;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More conventional to make this a constructor instead of a factory method

return variables;
}

private parseOutput(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer protected instead of private to make subclassing possible


constructor() {}

static async create(params: LLMEvaluatorParams): Promise<LLMEvaluator> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You definitely shouldn't have two methods for this, ideally you just use the constructor

mapVariables?: (run: Run, example?: Example) => Record<string, any>;
}

export class LLMEvaluator implements RunEvaluator {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No docstring?


export class LLMEvaluator implements RunEvaluator {
prompt: any;
mapVariables?: (run: Run, example?: Example) => Record<string, any>;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are mapVariables?

max: 1,
}),
chatModel: CHAT_MODEL,
mapVariables: (run: any, _example?: any) => ({
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a better name for this we can use? mapVariables sounds like a map of variables or something and .map() has a specific meaning in JS (and most languages?)

interface LLMEvaluatorParams {
promptTemplate: string | [string, string][];
scoreConfig: ScoreConfig;
chatModel: BaseLanguageModel;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be BaseChatModel

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants