[libshortfin] Initial implementation of LLM inference server. #181
Conversation
Force-pushed from 5b0b230 to de95c51
This patch moves logging functions from nod-ai#181
Force-pushed from 359eca8 to 2748783
Force-pushed from 806fb13 to a03658a
I don't have any deep insights on this yet; I've left a few nit comments for typos, but you've already noted the areas needing the most improvement.
I see that some of this could be abstracted out and reused for the next inference server implementation, but some of it, like the batcher, might not have a very substantial base class to carve out (and we might want to resolve some of your bugaboos about the current implementation before trying to reuse it).
Anyway, this seems to cover all the points I have in my head, and there's lots for us to iterate on. Thanks.
Some class naming conventions were a bit general, but it seems like an intentional choice, and I see no problem besides a slight readability cost -- e.g. InferencePhase, InferenceExecRequest. I doubt this would ever make much of a difference in developer experience, and it saves us long, ugly class names, but it seemed worth mentioning even if it's extremely subjective. A minimal sketch of the kind of classes in question follows.
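For context, here is a minimal sketch of the kind of classes being discussed. The class names come from the review above, but the fields are illustrative assumptions, not the actual definitions in this patch:

```python
from dataclasses import dataclass, field
from enum import Enum


class InferencePhase(Enum):
    """Phase of LLM inference a request is in (illustrative, not the patch's definition)."""
    PREFILL = 1
    DECODE = 2


@dataclass
class InferenceExecRequest:
    """A single in-flight inference request (fields are assumptions).

    The general-sounding name trades a longer, more specific one
    (e.g. LlmBatchedInferenceExecRequest) for readability at call sites.
    """
    phase: InferencePhase
    input_token_ids: list[int] = field(default_factory=list)
    result_token_ids: list[int] = field(default_factory=list)
```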
libshortfin/python/shortfin_apps/llm/components/config_struct.py
Overall this looks good. I noticed a few TODOs which I assume are still WIP, but the structure felt right to me. My only high-level thought was to have some utility / abstraction for data transfer. The direct invocations of the IREE h2d or d2h commands felt slightly out of place (but that's just a nit).
Yeah, I swallowed my own objections while typing those transfers. Doing them right needs more system configuration, which I don't have yet (the way you stage transfers and allocation is system- and use-case-specific), so I did the least bad thing and wrote it out longhand for now. That's why I'm building out the system config layer more today -- that is where you root fixes for things like this.
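To make the nit concrete, here is a rough sketch of the kind of thin transfer utility being suggested. The helper and the h2d/d2h callables it wraps are hypothetical placeholders, not shortfin or IREE APIs; a real version would be rooted in the system config layer mentioned above:

```python
from typing import Callable, Protocol


class DeviceBuffer(Protocol):
    """Placeholder protocol for a device-resident buffer (hypothetical)."""
    ...


class TransferPlan:
    """Hypothetical utility that hides direct h2d/d2h invocations.

    A real implementation would be derived from the system configuration
    (staging strategy, allocation pools), which is exactly the layer that
    does not exist yet in this draft.
    """

    def __init__(
        self,
        h2d_copy: Callable[[bytes, DeviceBuffer], None],
        d2h_copy: Callable[[DeviceBuffer], bytes],
    ):
        self._h2d_copy = h2d_copy
        self._d2h_copy = d2h_copy

    def upload(self, host_data: bytes, dst: DeviceBuffer) -> None:
        # Single choke point for host-to-device transfers, so staging
        # policy can change without touching call sites.
        self._h2d_copy(host_data, dst)

    def download(self, src: DeviceBuffer) -> bytes:
        # Likewise for device-to-host readback.
        return self._d2h_copy(src)
```

Call sites would then take a TransferPlan rather than invoking the runtime's copy commands directly, which keeps the longhand transfers in one place until the config layer lands.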
This is very much a first draft that needs a fair bit of work to turn into its final form. As the first real user of libshortfin's core APIs, it required working out a number of rough spots even to get to this first stage. Just a few things that need attention: