
Add precheck scoring functionality #356

Open · wants to merge 17 commits into main from allow-precheck-scoring-endpoint
Conversation

@Gregory-Pereira (Collaborator) commented May 16, 2024

Draft implementation of scoring answers based on LAB: LARGE-SCALE ALIGNMENT FOR CHATBOTS.
/cc @mingxzhao @nerdalert @vishnoianil

@lhawthorn (Member)

@Gregory-Pereira Isn't BAM an IBM internal only service? If so, I am concerned that you may be leaking company confidential details in this PR.

I would also note that separating out founding participant corporate matters from open source project community operations is quite difficult when a community is very young - InstructLab is our 13 week old puppy as of next Monday, May 20, 2024. However, it is not best practice nor highly awesome for IBM or Red Hat - or any company for that matter - to mention internal-only matters as part of our operations in this open source community space.

We InstructLab participants want to be both highly awesome and excellent stewards of this community so everyone feels welcomed and encouraged to join us as participants and leaders.

Perhaps we could have this discussion elsewhere or reconsider how we are framing the PR text in the future.

Yours in kindness, LH

@Gregory-Pereira (Collaborator, Author)

Apologies @lhawthorn, still getting used to this new paradigm. Thank you for your kind and swift response. Hopefully not much is impacted, as I only referenced the service name and did not provide any details about what it is, how it works, or how to use it.

@russellb (Member)

I'm not sure what was here originally, but it's no secret that this bot uses model endpoint APIs hosted on IBM infrastructure.

@nerdalert nerdalert requested a review from mingxzhao May 17, 2024 19:46
Signed-off-by: greg pereira <[email protected]>
This change allows us to use different model names for the precheckEndpoint and the precheckScoringEndpoint.

Signed-off-by: greg pereira <[email protected]>
@Gregory-Pereira Gregory-Pereira force-pushed the allow-precheck-scoring-endpoint branch from ca19ae6 to e20843c Compare May 18, 2024 01:15
@Gregory-Pereira Gregory-Pereira marked this pull request as ready for review May 18, 2024 01:32
@Gregory-Pereira (Collaborator, Author)

RFR cc @nerdalert @vishnoianil

update scoring prompt

Signed-off-by: Mingxuan Zhao <[email protected]>
Signed-off-by: Mingxuan Zhao <[email protected]>
@vishnoianil (Member) commented May 18, 2024

I am wondering if we should expose the precheck-score command to the user. I am assuming that the precheck and precheck scoring endpoints are two separate endpoints to hit. @Gregory-Pereira @mingxzhao

@mingxzhao (Member)

I believe the end goal is to tie the precheck score directly to the original precheck. With the current scoring process it is still more expensive than precheck, so if precheck is not open to users, precheck score likely wouldn't be either.

@mingxzhao (Member)

Theoretically, precheck and precheck score hit endpoints that are equally expensive.

@Gregory-Pereira (Collaborator, Author)

I agree with Ming's stance on this. A middle ground that I would find acceptable (putting aside the implementation details for now) would be allowing users to select an individual human answer / endpoint answer pairing and score that. It feels a bit heavy-handed to give a user access to running the batch run of all seed data scoring.

@Gregory-Pereira Gregory-Pereira force-pushed the allow-precheck-scoring-endpoint branch from 2506bfa to ac9db5c Compare May 19, 2024 00:50
Swap question target location

Signed-off-by: Mingxuan Zhao <[email protected]>
typo

Signed-off-by: Mingxuan Zhao <[email protected]>
prompt update


Signed-off-by: Mingxuan Zhao <[email protected]>
Signed-off-by: Mingxuan Zhao <[email protected]>
lint error

Signed-off-by: Mingxuan Zhao <[email protected]>
@Gregory-Pereira Gregory-Pereira changed the title WIP: add precheck scoring functionality Add precheck scoring functionality May 19, 2024
@Gregory-Pereira (Collaborator, Author)

Current status for provenance: the functionality is there. At this point we're prompt engineering for performance.

@vishnoianil (Member) left a comment

Overall it looks good to me, but I am a bit skeptical about the overall user experience, so let's narrow that down. For example, "@instructlab-bot precheck" will give you scoring in the YAML output if the "scoring" endpoint is set, otherwise it won't. That's two different outputs for the same "precheck" command depending on backend configuration, which the user might not be aware of (because it's the backend configuration that we admins set). So it seems like we are intermingling two separate things here.

We could always set both the precheck-endpoint and the scoring-endpoint, in which case precheck will always give you scoring. That sounds a bit better, but if that's the case, then I think we should just assume that precheck always gives a score and make sure that both the precheck-endpoint and the scoring-endpoint must be configured for the backend.

The other option is to add a new "@instructlab-bot precheck-score" command, which gives the user scoring as well. I think either of these options would probably be less confusing. WDYT? Happy to hear any other ideas as well.
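One way to realize the "precheck always gives a score" option would be a fail-fast configuration check. The sketch below is only illustrative, with hypothetical function and variable names rather than the bot's actual code:

package main

import (
	"errors"
	"fmt"
)

// validatePrecheckConfig is a hypothetical startup check for the
// "precheck always scores" option: refuse to run unless both the precheck
// endpoint and the scoring endpoint are configured, so users never see two
// different precheck outputs depending on backend configuration they can't see.
func validatePrecheckConfig(precheckURL, scoringURL string) error {
	if precheckURL == "" {
		return errors.New("precheck-endpoint-url must be configured")
	}
	if scoringURL == "" {
		return errors.New("precheck-scoring-endpoint-url must be configured when scoring is always on")
	}
	return nil
}

func main() {
	// Example: the scoring endpoint is missing, so the check fails.
	if err := validatePrecheckConfig("http://localhost:8000/v1", ""); err != nil {
		fmt.Println("config error:", err)
	}
}

With a check like this, the conditional behavior goes away entirely: precheck either always scores or the backend refuses to start.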

ui/apiserver/apiserver.go
@@ -118,6 +121,7 @@ func init() {
generateCmd.Flags().StringVarP(&WorkDir, "work-dir", "w", "", "Directory to work in")
generateCmd.Flags().StringVarP(&VenvDir, "venv-dir", "v", "", "The virtual environment directory")
generateCmd.Flags().StringVarP(&PreCheckEndpointURL, "precheck-endpoint-url", "e", "http://localhost:8000/v1", "Endpoint hosting the model API. Default, it assumes the model is served locally.")
generateCmd.Flags().StringVarP(&PreCheckScoringEndpointURL, "precheck-scoring-endpoint-url", "", PreCheckEndpointURL, "Endpoint hosting the model API that will be scoring the output of precheck against the answers supplied in the PR. Default, it assumes the model is the same as precheck model and is served locally.")
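One caveat worth checking with this default: PreCheckEndpointURL is evaluated when the flag is registered in init(), so a --precheck-endpoint-url value supplied at runtime would not propagate to the scoring default. A minimal sketch (illustrative names, not the PR's code, assuming the usual cobra/pflag behavior) of resolving the fallback after flags are parsed instead:

package main

import (
	"fmt"
	"os"

	"github.com/spf13/cobra"
)

var (
	preCheckEndpointURL        string
	preCheckScoringEndpointURL string
)

func main() {
	cmd := &cobra.Command{
		Use: "generate",
		PreRun: func(cmd *cobra.Command, args []string) {
			// If the scoring endpoint was not set explicitly, reuse whatever
			// precheck endpoint was actually supplied (default or override).
			if !cmd.Flags().Changed("precheck-scoring-endpoint-url") {
				preCheckScoringEndpointURL = preCheckEndpointURL
			}
		},
		Run: func(cmd *cobra.Command, args []string) {
			fmt.Println("precheck endpoint:", preCheckEndpointURL)
			fmt.Println("scoring endpoint: ", preCheckScoringEndpointURL)
		},
	}
	cmd.Flags().StringVarP(&preCheckEndpointURL, "precheck-endpoint-url", "e", "http://localhost:8000/v1", "Endpoint hosting the model API for precheck.")
	cmd.Flags().StringVarP(&preCheckScoringEndpointURL, "precheck-scoring-endpoint-url", "", "", "Endpoint hosting the model API used to score precheck output; defaults to the precheck endpoint.")
	if err := cmd.Execute(); err != nil {
		os.Exit(1)
	}
}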
Member:
Question for my own understanding: if the user doesn't provide the ScoringEndpoint and it ends up using the default PrecheckEndpoint, who does the scoring?

Collaborator Author:
You nailed it. A model will do the scoring, whether it's the same model as precheck or a different one. Note, by scoring we do not mean anything used in training, just a summary of whether the model was close to the human answer, for triager convenience.
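To make that concrete, here is a rough sketch of what building a scoring prompt for one question/answer pair might look like. This is not the PR's generatePrecheckScoringPrompt (its body isn't shown here), and the prompt wording is entirely made up:

package main

import "fmt"

// buildScoringPrompt is an illustrative stand-in for the PR's prompt
// generation: it asks the scoring model to judge how close the precheck
// model's answer is to the human answer from the PR.
func buildScoringPrompt(prQuestion, prAnswer, endpointAnswer string) string {
	return fmt.Sprintf(
		"You are grading a model's answer against a human-written reference answer.\n"+
			"Question: %s\n"+
			"Reference answer: %s\n"+
			"Model answer: %s\n"+
			"On a scale of 1-10, how closely does the model answer match the reference answer? "+
			"Reply with the score and a one-sentence justification.",
		prQuestion, prAnswer, endpointAnswer)
}

func main() {
	fmt.Println(buildScoringPrompt(
		"What is the capital of France?",
		"Paris is the capital of France.",
		"The capital of France is Paris."))
}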

for i := 0; i < len(precheckPRAnswers); i++ {
err, promptTemplate := generatePrecheckScoringPrompt(precheckPRAnswers[i], precheckEndpointAnswers[i], precheckPRQuestions[i])
if err != nil {
w.logger.Errorf("Failed to generate a prompt for precheck scorring: %v", err)
Member:
scorring -> scoring

@@ -378,6 +472,12 @@ func (w *Worker) runPrecheck(lab, outputDir, modelName string) error {
w.logger.Error("Question not found or not a string")
continue
}
answer, ok := example["answer"].(string)
if !ok {
w.logger.Error("Question not found or not a string")
Member:
Need to fix the log?

@@ -450,7 +553,8 @@ func (w *Worker) runPrecheck(lab, outputDir, modelName string) error {
time.Sleep(1 * time.Second)
}
}
return nil
return nil, precheckPRAnswers, precheckEndpointAnswers, precheckPRQuestions
// return nil, precheckPRAnswers, precheckPRQuestions
Member:
Suggested change
// return nil, precheckPRAnswers, precheckPRQuestions
// return nil, precheckPRAnswers, precheckPRQuestions

@vishnoianil (Member) commented May 20, 2024

I agree with Ming's stance on this. A middle ground that I would find acceptable (putting aside the implementation details for now), would be allowing users to select an individual human answer - endpoint answer pairing and score that. It feels a bit heavy handed to give a user access to running the batch run of all seed data scoring.

I understand the cost aspect, but I think the user experience is a bit confusing the way it's currently implemented; I added some more details about it in the comment above. On the question of single-pair vs. batch scoring, I think we just need to see how the user can actually do that from a GitHub comment. Overall the intention is to keep the end-to-end gobot UX simple and less confusing.

Signed-off-by: Mingxuan Zhao <[email protected]>
Signed-off-by: Mingxuan Zhao <[email protected]>
Signed-off-by: Mingxuan Zhao <[email protected]>