
Better documentation around how unloading is triggered #511

Open
dsgibbons opened this issue Jun 5, 2024 · 3 comments

Comments

@dsgibbons

dsgibbons commented Jun 5, 2024

When loading some models, I receive the WARN log Memory over-allocation due to under-prediction of model size... (which stems from here), followed by the INFO log Eviction triggered for model ... (I couldn't find exactly where this comes from). This unloading happens even though it is the only model on a large machine with 64 GB RAM and 40 GB VRAM, with all of the k8s resource limits set to their maximums.

I've tried to piece together how to avoid this from various GitHub issues (e.g., this one), but I would really appreciate some clear documentation on how unloading is triggered in ModelMesh. Even variables such as MODELSIZE_MULTIPLIER, as referenced by this reply, aren't properly documented, and I can't find where they are used in either the modelmesh or the modelmesh-serving source code.

Could the documentation please be updated to formally describe how models are prioritized and subsequently unloaded, with more discussion of the various configurations that can be altered on a per-runtime/per-isvc basis? I'm happy to contribute by helping to update the documentation, but I don't fully understand the underlying design decisions.

@dsgibbons
Author

dsgibbons commented Jun 5, 2024

For some additional context, I'm using the Python backend in Triton. An example model that triggers unloading has custom dependencies via conda pack and has a file tree like so:

├── 1/
│   ├── model.py
│   └── model.pkl (approx 100MiB)
├── config.pbtxt
└── conda_env.tar.gz (approx 3GiB)

I'm not sure whether using models in this way messes with how ModelMesh computes usage.
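For reference, a model laid out this way typically points the Triton Python backend at the packed conda environment through its config.pbtxt. A minimal sketch (the model name, tensor names, and shapes here are placeholders, not from my actual model):

```
name: "my_model"
backend: "python"
max_batch_size: 8

input [
  {
    name: "INPUT0"
    data_type: TYPE_FP32
    dims: [ -1 ]
  }
]
output [
  {
    name: "OUTPUT0"
    data_type: TYPE_FP32
    dims: [ -1 ]
  }
]

# Tell the Python backend to run model.py inside the packed conda env;
# $$TRITON_MODEL_DIRECTORY resolves to this model's directory at load time.
parameters: {
  key: "EXECUTION_ENV_PATH",
  value: { string_value: "$$TRITON_MODEL_DIRECTORY/conda_env.tar.gz" }
}
```

My question is whether the ~3 GiB conda_env.tar.gz referenced this way is counted toward the model's size estimate, or whether only the 1/ directory contents are.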

@GolanLevy

Hi @dsgibbons,
Please see my reply at kserve/modelmesh#82 (comment); it might help you with documenting how ModelMesh decides to load/unload models.

Maybe the DEFAULT_MODELSIZE property can help you, especially if most of your models are roughly the same size.
DEFAULT_MODELSIZE is used to estimate a model's size when nothing is known about the model before loading it.
According to the code documentation:

// conservative "default" model size,
// such that "most" models are smaller than this

Since most of our models are the same size, setting it to the correct value eliminated the WARN log you are seeing and helped ModelMesh make better model-allocation decisions.
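For illustration only, here is roughly how we wire such a property into the runtime in our setup. This is a hypothetical sketch: the exact property name, units, and whether it belongs on the builtInAdapter env (vs. elsewhere) are assumptions you should verify against the modelmesh source, not documented behavior:

```
# ASSUMED configuration sketch -- verify the property name and placement
# against the modelmesh / modelmesh-serving source before relying on it.
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: triton-2.x
spec:
  supportedModelFormats:
    - name: python
      autoSelect: true
  containers:
    - name: triton
      image: nvcr.io/nvidia/tritonserver:23.04-py3  # placeholder image tag
  builtInAdapter:
    serverType: triton
    env:
      # Assumed env var name; set close to your real model footprint
      # (model artifacts plus the packed conda env) so the initial
      # estimate isn't a large under-prediction.
      - name: DEFAULT_MODELSIZE
        value: "3500000000"
```

The key point is just that the default estimate should be close to your actual per-model footprint; the delivery mechanism above is our guess at one way to set it.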

@dsgibbons
Author

Thank you for linking your reply, @GolanLevy. I'd still love to see some formal documentation for this, as it seems like critical information that shouldn't require trawling through the issue tracker. I'll see how I go this week; I hope I'll eventually understand ModelMesh well enough to submit a PR addressing this issue.
