I ran into a question when using multi-adapter support. Loading different PEFT adapters and calling them by adapter_name / adapter_id works fine. However, can I call the vanilla LLM? For example, if I deploy Llama 2 with multiple adapters, can I disable the adapters and run inference with the original Llama 2 model through the framework? Looking forward to your reply.
Currently, we do not yet support unloading LoRA layers. This is because unloading models from memory is quite slow from what I have tested so far, when around 10-15 adapters are loaded.
Another approach is to not load the LoRA layers into memory up front and instead load them dynamically per request, but in a distributed environment there is no way to guarantee that every model pod loads the adapter correctly.
I think the ability to use the base model alongside multiple adapters could be supported, but it is probably very low priority right now.
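For reference, a minimal sketch of how this could work at the PEFT library level (not this framework's API): `PeftModel` exposes a `disable_adapter()` context manager that temporarily bypasses the LoRA layers, so generation runs against the base weights. The adapter paths and names below are placeholders.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-2-7b-hf"
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Load two adapters (placeholder paths) and select one by name.
model = PeftModel.from_pretrained(base, "path/to/adapter_a", adapter_name="adapter_a")
model.load_adapter("path/to/adapter_b", adapter_name="adapter_b")
model.set_adapter("adapter_a")

inputs = tokenizer("Hello", return_tensors="pt")

# Inference with the active adapter.
with torch.no_grad():
    out_adapter = model.generate(**inputs, max_new_tokens=32)

# Inference with the vanilla base model: LoRA layers are bypassed inside the context.
with model.disable_adapter(), torch.no_grad():
    out_base = model.generate(**inputs, max_new_tokens=32)
```

A serving framework could expose the same toggle per request without unloading any adapters from memory, since the context manager only skips the LoRA forward path rather than deleting weights.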