
Enable messages api #581

Closed · wants to merge 19 commits

Conversation

@radames (Contributor) commented Mar 26, 2024

Addresses #574: the idea is to use the endpoint as the base client, requiring users to specify the model name.

  • I had to replace `model` with `endpointUrl`, because `model` is a required parameter for all APIs, including TGI.
  • Types are still incomplete, but `choices?: Choice[];` is included in the `StreamResponse` when using the messages API; maybe we can have a way to differentiate the input.
  • For custom stream endpoints, options are not needed and often trigger a backend error, so it's preferable to exclude them in `streamRequest`.

Given that the `model` parameter is required, we might not need `TaskWithNoAccessTokenNoModel`. WDYT?
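
For illustration, here's a minimal sketch of the messages API this PR targets, written as a raw fetch against a TGI endpoint's OpenAI-compatible route (the endpoint URL and token are placeholders, and the exact client surface in this PR may differ):

```ts
// Hypothetical endpoint URL and token; TGI exposes the messages API
// under /v1/chat/completions, next to the plain text-generation route.
const response = await fetch("https://my-tgi-endpoint.example/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: "Bearer hf_...",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    // `model` is required by the route even behind a dedicated endpoint;
    // TGI accepts "tgi" as a placeholder value here.
    model: "tgi",
    messages: [{ role: "user", content: "Hello!" }],
    stream: false,
  }),
});
const result = await response.json();
console.log(result.choices[0].message.content);
```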

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@julien-c (Member) left a comment

be sure to tag @SBrandeis and @Wauplin for a review when this is ready (given we'd use the generated types)

@radames (Contributor, Author) commented Mar 26, 2024

Interesting point, @julien-c. Reading @Wauplin's implementation in huggingface/huggingface_hub#2094, it's very complete! In that regard, it makes sense to also support a similar API in the js client in another PR, where we could use @xenova's jinja + hub to build the chat_template in case the user wants to use chat completion with a plain text-generation model.
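
A rough sketch of that idea, assuming we fetch the model's chat_template from its tokenizer_config.json on the Hub and render it with @huggingface/jinja (the model id is illustrative, and note that some configs store special tokens as objects rather than plain strings):

```ts
import { Template } from "@huggingface/jinja";

const modelId = "mistralai/Mistral-7B-Instruct-v0.2"; // illustrative model

// tokenizer_config.json carries the Jinja chat template for chat-tuned models.
const config = await (
  await fetch(`https://huggingface.co/${modelId}/resolve/main/tokenizer_config.json`)
).json();

const template = new Template(config.chat_template);
const prompt = template.render({
  messages: [{ role: "user", content: "Hello!" }],
  bos_token: config.bos_token,
  eos_token: config.eos_token,
  add_generation_prompt: true,
});
// `prompt` can then be sent to a plain text-generation endpoint.
```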

@Wauplin (Contributor) commented Mar 27, 2024

> Interesting point, @julien-c. Reading @Wauplin's implementation in huggingface/huggingface_hub#2094, it's very complete! In that regard, it makes sense to also support a similar API in the js client in another PR, where we could use @xenova's jinja + hub to build the chat_template in case the user wants to use chat completion with a plain text-generation model.

@radames I agree the focus should be on handling the /v1/chat/completions route (i.e. server-side template rendering). The jinja+hub solution is nice but, in the end, only useful for very few models AFAIK, typically the microsoft/DialoGPT-*** family, because all major chat-based LLMs are already supported and served with TGI. If I had to start over the PR in huggingface_hub, I wouldn't start with the client-side rendering 😬

EDIT: especially the part where I try to handle Inference Endpoints URLs for which I don't have the model_id / chat_template. It's quite complex logic (hence the figure) for very little impact IMO, in retrospect 😕

@coyotte508 previously approved these changes Mar 28, 2024
@Wauplin (Contributor) left a comment


I do think it'd be best to completely separate `chatCompletion` from the `textGeneration` method. Both generate text, but their APIs are very different. In the new spec-ed types we differentiated them (see chat completion here, and text generation here).

In particular, parameters/options are not sent at the same level. For all HF tasks, we have a `parameters` key that is a mapping with all parameters. For chat completion, we wanted to mimic OpenAI's API, which sets all the options at the root level. Also, the output types are completely different, and we don't benefit from combining them IMO.

Also, the chat completion URL is not the same for models served on the serverless Inference API. Usually the URL is https://api-inference.huggingface.co/models/{model_id}; for chat completion, it's https://api-inference.huggingface.co/models/{model_id}/v1/chat/completions. In huggingface_hub, this is handled here. The same rule applies to TGI-served models, where / serves the text-generation API while /v1/chat/completions serves the chat-completion API. A concrete comparison of the two request shapes is sketched below.
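
To make the contrast concrete, here's an illustrative (not spec-exact) comparison of the two request bodies and routes:

```ts
// text-generation: POST https://api-inference.huggingface.co/models/{model_id}
const textGenerationBody = {
  inputs: "Once upon a time",
  // HF-task style: all options nested under a `parameters` mapping
  parameters: { max_new_tokens: 20, temperature: 0.7 },
};

// chat-completion: POST https://api-inference.huggingface.co/models/{model_id}/v1/chat/completions
const chatCompletionBody = {
  messages: [{ role: "user", content: "Once upon a time" }],
  // OpenAI style: options sit at the root level, next to `messages`
  max_tokens: 20,
  temperature: 0.7,
};
```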

(Resolved review thread on packages/inference/test/HfInference.spec.ts)
@coyotte508 dismissed their stale review on March 29, 2024, following @Wauplin's review.

@julien-c (Member) left a comment


thanks for the awesome review + help + feedback from the Python implem, @Wauplin

@radames (Contributor, Author) commented Mar 30, 2024

Hi @Wauplin, thank you for the feedback. I think separating `chatCompletion` and `textGeneration` is a great idea!
Also, @coyotte508, is the plan to remove all types from the inference package and unify on the types from tasks? It's already possible to do this:
```ts
import type { TextGenerationInput, TextGenerationStreamOutput } from "@huggingface/tasks";
```

@julien-c (Member) commented Apr 2, 2024

> Also, @coyotte508, is the plan to remove all types from the inference package and unify on the types from tasks?

indeed

@coyotte508 (Member) commented

Yes, but we need to handle #584 first (and using the types for validation would be great too).

@radames (Contributor, Author) commented Apr 2, 2024

Shall we wait for that, then, before splitting this into `chatCompletion` and `textGeneration`?

@coyotte508 (Member) commented

Yes, it should definitely be split.

Maybe make a separate PR / start from scratch? Maybe we should pass `model` along only for `chatCompletion` as well? (We can still move from `model` to `endpointUrl`.)

@gary149 (Collaborator) commented Apr 29, 2024

Any news on this? It's quite needed IMO.

@radames (Contributor, Author) commented Apr 29, 2024

> Any news on this? It's quite needed IMO.

Hi @gary149, I'll open a new PR today. This one will be thrown away in favor of a split that creates a new `chatCompletion` method, which will be cleaner than doing it all through `textGeneration` and will match the Python API.
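
For reference, a hedged sketch of what the split API could look like (method and field names follow the Python client's chat-completion surface and may differ from what the new PR ships):

```ts
import { HfInference } from "@huggingface/inference";

const hf = new HfInference("hf_..."); // placeholder token

// Non-streaming chat completion
const out = await hf.chatCompletion({
  model: "mistralai/Mistral-7B-Instruct-v0.2",
  messages: [{ role: "user", content: "Hello!" }],
  max_tokens: 100,
});
console.log(out.choices[0].message.content);

// Streaming variant (Node.js-style output)
for await (const chunk of hf.chatCompletionStream({
  model: "mistralai/Mistral-7B-Instruct-v0.2",
  messages: [{ role: "user", content: "Hello!" }],
  max_tokens: 100,
})) {
  process.stdout.write(chunk.choices[0].delta.content ?? "");
}
```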

@radames (Contributor, Author) commented May 1, 2024

Closing in favor of #645.

@radames closed this May 1, 2024
coyotte508 added a commit that referenced this pull request May 13, 2024
Supersedes #581. Thanks to @Wauplin, I can import the types from
`@huggingface/tasks`.
I've followed the pattern for `textGeneration` and
`textGenerationStream`.

---------

Co-authored-by: coyotte508 <[email protected]>
Co-authored-by: Julien Chaumond <[email protected]>