
DJL support for embedding models using sentence-transformers #2755

Closed

pchamart opened this issue Aug 17, 2023 · 5 comments

Comments

pchamart commented Aug 17, 2023

Hi,

Per this doc, it seems that only the tasks below are supported.

[screenshot: table of supported tasks from the DJL docs]

Are there any plans to include the feature-extraction task as well in the future?

It would be great if we could use text embedding models (both bi- and cross-encoders) from Hugging Face, e.g.

  • sentence-transformers/all-MiniLM-L6-v2 (bi-encoder) and
  • cross-encoders/mmarco-mMiniLMv2-L12-H384-v1 (cross-encoder).

Thanks!

@frankfliu
Contributor

@pchamart What you need is the text-embedding task.


@pchamart
Author

pchamart commented Aug 17, 2023

Thanks @frankfliu

We're using the DJL DLC on Amazon SageMaker for inference. In serving.properties, if we specify translatorFactory=TextEmbedding, would that suffice?

engine=MPI
translatorFactory=TextEmbedding
option.model_id=sentence-transformers/all-MiniLM-L6-v2
option.trust_remote_code=true
option.tensor_parallel_degree=1
...

Also, can you please confirm whether you plan to support cross-encoders as well in the future, e.g. cross-encoders/mmarco-mMiniLMv2-L12-H384-v1 (cross-encoder)?

@frankfliu
Contributor

@pchamart

We do have this model in our model zoo, but it is a text-classification model, not a text-embedding model.

If you want to serve the sentence-transformers/all-MiniLM-L6-v2 model, we strongly recommend the Java engine instead of the Python engine; it gives at least 2x the throughput.

You can also use a CPU container for this model: https://github.com/aws/deep-learning-containers/blob/master/available_images.md#djl-cpu-full-inference-containers

import sagemaker
from sagemaker import Model

# role and region are assumed to be defined in the notebook session
image_uri = f"763104351884.dkr.ecr.{region}.amazonaws.com/djl-inference:0.23.0-cpu-full"

# load the model directly from the DJL model zoo
env = {
    "SERVING_LOAD_MODELS":
        "djl://ai.djl.huggingface.pytorch/sentence-transformers/all-MiniLM-L6-v2"
}
endpoint_name = sagemaker.utils.name_from_base("textembedding")

model = Model(
    image_uri=image_uri,
    env=env,
    role=role,
)

# deploy the endpoint
model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
    endpoint_name=endpoint_name,
)

@frankfliu
Contributor

Feel free to reopen this issue if you still have questions.
