Dynamically select pod instead of statically passing Pods/PodIPs into ext proc #12

Open
Xunzhuo opened this issue Sep 29, 2024 · 2 comments

Comments

@Xunzhuo
Member

Xunzhuo commented Sep 29, 2024

Currently we pass static pod names and IPs to the ext proc server. We should use a more dynamic approach and fetch this data from the Kubernetes cluster, for example by label selectors.
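To make the selector idea concrete, here is a minimal sketch of equality-based label matching over a pod list. It uses only the standard library; `Pod`, `matches`, and `selectPodIPs` are hypothetical names for illustration, not types from this project or from client-go. A real implementation would presumably watch pods through the Kubernetes API rather than receive a slice.

```go
package main

import (
	"fmt"
	"sort"
)

// Pod is a hypothetical, simplified stand-in for a Kubernetes pod.
type Pod struct {
	Name   string
	IP     string
	Labels map[string]string
}

// matches reports whether labels contain every key/value pair in selector.
// Only equality-based matching is sketched here; an empty selector matches
// every pod.
func matches(selector, labels map[string]string) bool {
	for k, v := range selector {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// selectPodIPs returns the sorted IPs of all pods that match selector and
// already have an IP assigned.
func selectPodIPs(pods []Pod, selector map[string]string) []string {
	var ips []string
	for _, p := range pods {
		if p.IP != "" && matches(selector, p.Labels) {
			ips = append(ips, p.IP)
		}
	}
	sort.Strings(ips)
	return ips
}

func main() {
	// Example data is made up for illustration.
	pods := []Pod{
		{Name: "vllm-0", IP: "10.0.0.2", Labels: map[string]string{"app": "vllm"}},
		{Name: "vllm-1", IP: "10.0.0.1", Labels: map[string]string{"app": "vllm"}},
		{Name: "web-0", IP: "10.0.0.9", Labels: map[string]string{"app": "web"}},
	}
	fmt.Println(selectPodIPs(pods, map[string]string{"app": "vllm"}))
}
```

With this shape, the ext proc server would only need the selector as configuration, and the pod set could change without a redeploy.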

@Joffref

Joffref commented Sep 29, 2024

In the near future, a new CR, tentatively called BackendPool, will be introduced.

**BackendPool**
```golang
// The BackendPool is a construct for pooling compute (often model servers) to
// serve large models, that have the ability to share capacity across multiple
// use cases (such as through prompt engineering, LoRA adapters, etc).
// BackendPools have a dependency on a Gateway that is compatible with ext-proc
// (External Processing). When a new BP object is created, a new ext proc
// deployment is created. BackendPools require at minimum a single UseCase to
// be subscribed to them to accept traffic, any traffic with a model not
// defined within a UseCase will be rejected.
type BackendPool struct {
	metav1.ObjectMeta
	metav1.TypeMeta
	Spec BackendPoolSpec
}

type BackendPoolSpec struct {
	// Select the distinct services to include in the backend pool. These
	// services should be consumed by only the backendpool they are part
	// of. Should this behavior be breached, routing behavior is not
	// guaranteed.
	ServiceRef []corev1.ObjectReference
}
```

This resource will reference the services exposing inference servers that share certain characteristics, mainly the same set of loaded adapters. The final name of this resource is still being discussed, and you can review the related documents for more information: https://docs.google.com/document/d/1v1Rp6v_AfY5EfwpLqDadDpAaCg7OcnrUutzBUNxGoJE/edit?pli=1

Once introduced, BackendPool will be referenced in the backendRefs field of HTTPRoute to manage routing.

The idea of referencing services directly could be re-evaluated, given that using selectors was the original approach. However, directly referencing pods would involve managing a structure similar to EndpointSlices within the gateway. IMO, this adds unnecessary complexity and introduces potential security risks.
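As a rough illustration of the service-reference approach, the sketch below resolves a pool's service references to ready endpoint addresses through a lookup table. Everything here is an assumption for illustration: `ServiceRef` is a trimmed-down stand-in for `corev1.ObjectReference`, `Endpoint` and `EndpointIndex` stand in for the EndpointSlice data the cluster already maintains, and `resolvePool` is a hypothetical helper, not part of any proposed API.

```go
package main

import "fmt"

// ServiceRef is a hypothetical, trimmed-down stand-in for
// corev1.ObjectReference, keeping only the fields needed for a lookup.
type ServiceRef struct {
	Namespace string
	Name      string
}

// Endpoint is a simplified stand-in for a single EndpointSlice endpoint.
type Endpoint struct {
	Address string
	Ready   bool
}

// EndpointIndex maps each service to its endpoints. In a cluster this data
// would come from EndpointSlices; here it is a plain in-memory map.
type EndpointIndex map[ServiceRef][]Endpoint

// resolvePool returns the ready endpoint addresses behind the referenced
// services, in reference order and without duplicates.
func resolvePool(refs []ServiceRef, idx EndpointIndex) []string {
	seen := make(map[string]bool)
	var addrs []string
	for _, ref := range refs {
		for _, ep := range idx[ref] {
			if ep.Ready && !seen[ep.Address] {
				seen[ep.Address] = true
				addrs = append(addrs, ep.Address)
			}
		}
	}
	return addrs
}

func main() {
	// Example data is made up for illustration.
	svc := ServiceRef{Namespace: "default", Name: "vllm"}
	idx := EndpointIndex{
		svc: {
			{Address: "10.0.0.1", Ready: true},
			{Address: "10.0.0.2", Ready: false},
		},
	}
	fmt.Println(resolvePool([]ServiceRef{svc}, idx))
}
```

The point of the sketch is that the gateway side only performs a lookup; tracking which pods back each service stays with the cluster's existing endpoint machinery.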

Let me know your thoughts on this—I’d be happy to discuss it further.

@liu-cong
Contributor

liu-cong commented Oct 1, 2024

cc @kfswain @robscott
