
[PROPOSAL] Add support for asymmetric embedding models to neural-search #620

Open
br3no opened this issue Feb 29, 2024 · 13 comments
Labels: Features (Introduces a new unit of functionality that satisfies a requirement)
br3no commented Feb 29, 2024

What/Why

What are you proposing?

With opensearch-project/ml-commons#1799, OpenSearch can now use asymmetric embedding models such as e5. Asymmetric embedding models differ from symmetric ones in that they embed queries and passages differently. Many of them require the text to be prefixed with special strings (e.g. "query: " and "passage: " for the e5 model family). The issue above documents the work that led to ml-commons supporting these kinds of models: new model configurations were added, a new AsymmetricTextEmbeddingParameters type was introduced, and the correct handling of inference for these models was implemented. Here I propose adding support for this in the neural-search plugin.
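To make the asymmetry concrete, here is a minimal sketch of the input convention. The prefixes come from the e5 model cards; the helper function itself is purely illustrative and not part of any OpenSearch API:

```python
# Sketch of the input convention used by asymmetric models such as the
# e5 family: queries and passages get different prefixes before encoding.
def prepare_for_embedding(text: str, content_type: str) -> str:
    """Prefix `text` according to its role ('query' or 'passage')."""
    prefixes = {"query": "query: ", "passage": "passage: "}
    if content_type not in prefixes:
        raise ValueError(f"unknown content_type: {content_type!r}")
    return prefixes[content_type] + text

# A query and a passage are prepared differently before being embedded:
print(prepare_for_embedding("how do solar panels work", "query"))
print(prepare_for_embedding("Solar panels convert sunlight into power.", "passage"))
```

The embedding model then produces different vectors for the two roles, which is exactly what neural-search would need to account for at ingest time versus query time.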

What users have asked for this feature?

I'm a user and would like to have this feature. 😉
Many of the best-performing embedding models are asymmetric (cf. the MTEB leaderboard: https://huggingface.co/spaces/mteb/leaderboard). For OpenSearch it is important to keep up with the latest developments in information-retrieval research and to give users the freedom to choose from as many different models as possible.

What problems are you trying to solve?

Neural-search should be able to use the newly introduced capability in ml-commons.

What is the developer experience going to be?

My proposal does not affect any API. The idea is for neural-search to automatically detect that a particular model is asymmetric (this can be inferred from the model configuration) and then call the inference method with the right parameters. In the embedding processor we need to add the information that we are embedding a passage; in the query builder, that we are embedding a query. That's it.
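For illustration, the ml-commons work referenced above lets the prefixes live in the model configuration, which is what would allow neural-search to detect asymmetry and pick the right prefix automatically. A registration body might look roughly like the following; the query_prefix/passage_prefix field names follow that work, but treat the exact shape as a sketch rather than authoritative API documentation:

```json
{
  "name": "intfloat/multilingual-e5-small",
  "version": "1.0.0",
  "model_format": "TORCH_SCRIPT",
  "model_config": {
    "model_type": "bert",
    "embedding_dimension": 384,
    "framework_type": "sentence_transformers",
    "query_prefix": "query: ",
    "passage_prefix": "passage: "
  }
}
```

A model whose configuration carries such fields can be treated as asymmetric; one without them would behave exactly as before, which is why no API change is needed.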

Are there any security considerations?

No.

Are there any breaking changes to the API

No.

What is the user experience going to be?

The user experience will not change in any way.

Are there breaking changes to the User Experience?

No.

Why should it be built? Any reason not to?

Not building this reduces the attractiveness of OpenSearch for dense retrieval. Other vector databases already support these models.

What will it take to execute?

The ground work has been done in ml-commons, here we only need to build on top of it. I will open a PR soon on which we can base the discussion.

Any remaining open questions?

I can't think of any.

model-collapse (Collaborator) commented Mar 4, 2024

It seems you are trying to add prompts for text embedding, right? Here are my thoughts.

1. It is tough to deploy a GPT-based LLM inside an OpenSearch cluster, since we don't support model-parallel deployment of LLMs across GPUs. We therefore strongly suggest deploying e5-Mistral-7b on a third-party model inference platform such as TorchServe, Triton, or AWS SageMaker.
2. With the remote connector feature, we create an ml-commons model that connects to the inference endpoint.
3. The remote connector can be configured with a request body template, where you can put the prompt in. Please refer to this doc: https://opensearch.org/docs/latest/ml-commons-plugin/remote-models/index/
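To sketch point 3: the prefix can be baked into the connector's request body template. Everything below that is not part of the connector format itself (the endpoint URL, the inputs field expected by the remote service) is hypothetical and depends on the serving platform:

```json
{
  "name": "e5 passage-embedding connector",
  "version": 1,
  "protocol": "http",
  "actions": [
    {
      "action_type": "predict",
      "method": "POST",
      "url": "https://my-inference-host/embed",
      "headers": { "Content-Type": "application/json" },
      "request_body": "{ \"inputs\": \"passage: ${parameters.input}\" }"
    }
  ]
}
```

Note that a template like this hard-codes a single prefix, so queries and passages would need separate connectors (or separate models), which is part of what this proposal tries to avoid.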

model-collapse (Collaborator)

@br3no does my reply solve your problem? If yes, this issue will be closed.

br3no (Author) commented Mar 13, 2024

@model-collapse, these have been busy days; I haven't had time to go through your reply yet. Please don't close the issue, I'll get to it soon.

@vamshin vamshin added Features Introduces a new unit of functionality that satisfies a requirement and removed untriaged labels Mar 21, 2024
reuschling

I also vote for supporting embedding models that need their input wrapped in a template in order to produce meaningful embeddings. multilingual-e5 in particular is one of the few multilingual embedding models currently available.
Making direct requests to these models, which are hosted locally (crucial for us), seems to be supported now, but support inside neural-search, e.g. for hybrid search, is still missing, which is a real blocker at the moment.

br3no (Author) commented Apr 12, 2024

@model-collapse my use-case is the same as @reuschling's. I'm aware that running larger models in OpenSearch might not be the best approach, but there are plenty of small asymmetric models that could be interesting (cf. https://huggingface.co/intfloat/multilingual-e5-small).

reuschling

Are there any new thoughts on this?
Currently it is possible to load these embedding models into OpenSearch and use them via _predict. It is simply not possible to use them in neural search (ingest and search pipelines) in order to implement a document search.

br3no (Author) commented Apr 19, 2024

I have a PR ready that will allow you to use the neural search plugin with an e5 family model. As soon as opensearch-project/ml-commons#2318 gets merged I will open it here.

@model-collapse maybe you could have a look at the issue above. There’s just one approval missing.

br3no (Author) commented Apr 25, 2024

@reuschling @model-collapse I have opened the PR (#710) mentioned above.

model-collapse (Collaborator)

@br3no @reuschling if possible, let's book a video meeting to talk about this. You can reach me in the public slack channel. The name is Charlie Yang.

yuye-aws (Member)

@br3no You can use the script processor in OpenSearch: https://opensearch.org/docs/latest/ingest-pipelines/processors/script/. Here is a processor that concatenates two fields.

{
  "description": "Concatenate two fields",
  "processors": [
    {
      "script": {
        "lang": "painless",
        "source": "ctx.concatenated_field = ctx.field1 + ' ' + ctx.field2"
      }
    }
  ]
}
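Applied to this thread's problem, the same mechanism could prepend the passage prefix before the embedding processor runs. The field names (text, text_to_embed) below are placeholders, and the downstream embedding processor would need to be pointed at the prefixed field:

```json
{
  "description": "Prepend the e5 passage prefix before embedding",
  "processors": [
    {
      "script": {
        "lang": "painless",
        "source": "ctx.text_to_embed = 'passage: ' + ctx.text"
      }
    }
  ]
}
```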

zhichao-aws (Member)

Hi @br3no, have you considered using search templates for the query? We could create a search template that concatenates the query prefix and run queries through that template.
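A sketch of that idea, assuming a vector field named passage_embedding and a made-up template name; the mustache source is shown as an object for readability:

```json
PUT _scripts/e5_neural_search
{
  "script": {
    "lang": "mustache",
    "source": {
      "query": {
        "neural": {
          "passage_embedding": {
            "query_text": "query: {{query_text}}",
            "model_id": "{{model_id}}",
            "k": 10
          }
        }
      }
    }
  }
}
```

This covers the query side only; the passage prefix at ingest time would still need a separate mechanism, such as a script processor.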

reuschling commented Oct 16, 2024

I have the impression that we need to refresh our understanding of what this proposal and the corresponding PR #710 are about. In December 2023, @br3no created the issue Support for asymmetric embedding models in ml-commons; the corresponding PR, asymmetric embeddings, was merged in February 2024. It makes it possible to add a situation-aware prefix specification to a model configuration at model registration time.

The two supported 'situations' are query and passage, because asymmetric embedding models are trained to create different embeddings for the query situation and for the doc/chunk indexing situation (passage). To specify the current situation, it is now possible to set "content_type": "query" inside the parameters section when calling the model. This currently works if you do it manually with the _predict API; see an example here.
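For reference, such a manual call looks roughly like this (model ID elided; the body shape follows the ml-commons text-embedding _predict API, with the content_type parameter described above):

```json
POST /_plugins/_ml/models/<model_id>/_predict
{
  "parameters": {
    "content_type": "query"
  },
  "text_docs": ["how do solar panels work?"]
}
```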

But you almost never do this manually: in OpenSearch you index/ingest documents and add embeddings with an ingest processor, which has to specify the situation "passage" (rename it to 'index' if you like). The ingest processor has to do it because it is the caller of the model; only it can set this simple parameter, which doesn't influence anything else (it just calls an existing Java method: AsymmetricTextEmbeddingParameters.builder().embeddingContentType(EmbeddingContentType.PASSAGE).build()). There is no other way to modify this call to the model. This is what PR #710 does for this proposal, in the TextEmbeddingProcessor.

The same applies to the situation "query": it has to be specified when you make a neural query call, and PR #710 sets it inside the NeuralQueryBuilder.

The functionality is already available in ml-commons models; it takes only these ~two lines of code so that it can be used in OpenSearch search and indexing as well.

model-collapse (Collaborator) commented Oct 19, 2024

@reuschling thanks for this clarification! We weren't aware that the fundamentals had already been merged in ml-commons. Although we have some concerns about this design, consistency between the ml-commons and neural-search repos is the primary value. We are OK to move forward and will start code review ASAP. cc: @br3no
Later, we might raise another issue to propose a more generic solution. Comments welcome.
