When will buffer queues be enabled? #17

bodybreaker · 2024-08-19T06:08:00Z

I noticed that it states that requests can queue when all llama.cpp instances are busy.
Is the queuing done per llama.cpp server or per slot?
I am currently trying to scale up from one llama.cpp server to several, and the paddler_requests_buffered metric is always 0.

mcharytoniuk · 2024-08-20T10:35:22Z

@bodybreaker I will check if those metrics work correctly and get back to you.

mcharytoniuk added the "question" label (Further information is requested) on Aug 20, 2024
