Questions regarding config search #805

Open
Talavig opened this issue Dec 29, 2023 · 11 comments
Labels: enhancement (New feature or request)


Talavig commented Dec 29, 2023

We want to include model analyzer in our model deployment pipeline: as part of the rollout, we run a search on each model and automatically pick the best configuration. After experimenting with the tool, we would like to raise a few points regarding the config search functionality:

  1. We have noticed that when we try to run a search on a model that has already been configured, the search immediately stops; we only manage to trigger a search when we strip the instance group and the dynamic batching and set max batch size to 1. Is this by design, or is it a bug? No error is thrown when it happens: model analyzer simply creates the configurations and says it's done running the search. This happens in both quick and brute searches.
  2. For security reasons, and because we run the pipeline on k8s, we have to run model analyzer in remote mode. We have noticed that when we run the triton-sdk pod without assigning it a GPU, model analyzer can only offer configurations that use instance groups of kind KIND_CPU. We understand this for the local, docker, and c_api modes, but why for remote as well?
  3. We have noticed that dynamic batching in quick search can only use the default configuration, while brute search is a bit more advanced, with the added control of the max_queue_delay_microseconds parameter. Are there any plans to add more advanced dynamic batching configuration to quick search in the future? It could really help us.
  4. We would also like to know about support for the max_queue_size parameter in dynamic batching: as the instance count increases, it makes sense to increase the queue size accordingly. Do you plan to support this parameter in both quick and brute search, or at least calculate it based on the throughput?
  5. We know quick search is faster and its configuration space is smaller, but could you add support for multi-model brute force searches? We ask because we want the finer control that brute force allows, and we do not mind waiting for the best results.
    We really love the product and would appreciate it if you could answer these questions :)
tgerdesnv (Collaborator) commented

  1. I need to investigate further. Is this for quick search? You are saying that if you start with a model configuration with instance_count and/or max_batch_size not equal to 1, the search doesn't happen?

  2. This sounds like a bug. I will investigate.

  3. If you set max_queue_delay_microseconds in the default config, that value should persist through all created configurations. Are you saying that isn't happening? Or are you asking if that value can be 'searched' as well? (See the sketch after this list for what setting it in the default config looks like.)

  4. We don't currently support it in quick search. You should be able to specify a list of possible values if you wanted to sweep it in brute, although I haven't verified that it works. I've created a ticket to investigate adding support for searching or intelligently setting max_queue_size.

  5. Our observation was that multi-model brute search could take weeks or even months to finish. It is something we could add support for, but we don't want people using it accidentally. Please confirm that this is something you really want.
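As a footnote to point 3, here is a minimal sketch of a default config.pbtxt that pins max_queue_delay_microseconds (the value 100 is an arbitrary placeholder, not a recommendation):

```
# Default config.pbtxt snippet: model analyzer should carry this value
# through every configuration it creates.
dynamic_batching {
  max_queue_delay_microseconds: 100
}
```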


Talavig commented Jan 4, 2024

  1. This happens in both quick and brute force searches. What you have described is exactly what happens: when max_batch_size is not configured to 1, model analyzer immediately says "done with config search" and continues to profile the model.
  2. Ok, thanks for that.
  3. The max_queue_delay_microseconds parameter works as you have described; I'm asking for the max_queue_delay_microseconds parameter to be added to quick search. From what we have seen, the only thing model analyzer configures there is whether dynamic batching is enabled or disabled, and a lot of the time that is not enough. If this is configurable in brute force, we would expect the same in quick search.
  4. We tried defining max_queue_size in our config and got an error saying our configuration was invalid. We have also not found a single appearance of the parameter in your code or your documentation, so we are unsure if it is supported at all.
  5. What exactly bloats this number so much? Is it simply the configuration space being too big? From what we understand, configuring a multi-model deployment comes down to a zero-sum game between all the model configs, which boils down to the product of all the underlying models' config spaces. The maximum number of models we want to run this on is around 12; is that too much to handle? Or maybe the way we are going about configuring these Tritons is wrong? If the other features listed above can be implemented for quick search, maybe this will become irrelevant.

In the meantime, we found a new weird phenomenon we'd like to discuss: when pointing model analyzer at a k8s ingress route behind which two Triton pods are deployed, model analyzer crashes after a short period and says that perf analyzer failed to find the requested model version, while running regular perf analyzer against the same Triton produces no error. From this we have deduced that the error is somehow related to the model switching done by model analyzer. Do you know what could cause this and what we can do about it? We ask because we want our automation to configure the number of Triton pods running, and we want to profile the results using model analyzer.


tgerdesnv commented Jan 4, 2024

1- I will try to reproduce.

3- We do have a ticket to try to make quick search more customizable, instead of the fixed parameters that are currently searched. This won't be a small change, however.

4- There are a ton of parameters inside of Triton, as well as for the various backends. In theory we support them all, although most of them are not explicitly called out in the code base. I will try to come up with an example of a brute search with max_queue_size. Finishing bullet #3 would also potentially allow max_queue_size to be quick searched.

5- Yes, the space grows exponentially with each model. You want to do brute with 12 different models? Even if you only tried 5 configs for each model, that would be 5^12 (~244 million) different tests to run. 12 models is definitely too many for brute. With a limited search space and only 2 models it is somewhat viable, but beyond that it just grows too fast.

For your latest comment, can you create a new issue? Add any further information that would be helpful in trying to reproduce.

tgerdesnv (Collaborator) commented

I have not been able to reproduce number 1. Can you provide a starting config that stops right away for you? Note that model analyzer keeps a checkpoint, so if you run model analyzer and then run it again, it will look at all of the results it already has and conclude that it doesn't need any new measurements.
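If stale checkpoints turn out to be the culprit, clearing the checkpoint directory forces a fresh search. A minimal sketch, assuming the default ./checkpoints location (adjust if you have overridden the checkpoint directory):

```
# Remove saved measurements so the next profile run starts from scratch.
rm -rf ./checkpoints
model-analyzer -v profile -f config.yml
```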

As for number 2, there is a PR up that fixes the main issue (although the reporting still has some typos): #806


Talavig commented Jan 5, 2024

  1. Here is the config yaml file:

```yaml
triton_launch_mode: remote
output_model_repository_path: /output_models/ourput
export_path: profile_results
override_output_model_repository: true
triton_metrics_url: {my_url}
triton_http_endpoint: {my_endpoint}
triton_grpc_endpoint: {my_endpoint}
run_config_search_mode: quick
profile_models:
  - my_model
```

And here is the config.pbtxt:

```
platform: "pytorch_libtorch"
version_policy {
  specific {
    versions: [1]
  }
}
max_batch_size: 1
input {
  name: "input"
  data_type: TYPE_FP32
  dims: [3, 112, 112]
}
output {
  name: "output"
  data_type: TYPE_FP32
  dims: [512]
}
```

When max_batch_size is 2, I immediately get "skipping all illegal configurations" and "done with quick search". When it's 1, everything works as expected.
  2. Then we'll simply wait for it to be included in a stable release.

  3. This is extremely important to us: we have seen major improvements when tweaking dynamic batching by hand, and since we automate the process, it makes sense for it to be there, especially when brute force already supports it.

  4. I did not manage to use it in my models as you described; if you could provide an example, or even add the relevant info to the documentation, that would be great.

  5. If that's what you say, we'll give up on it for now. We originally asked for it because of the missing options in quick search, and because we have seen quick search miss the mark by a lot, so we thought to ask for a more accurate search option. But from how you described it, it does not seem feasible.

tgerdesnv (Collaborator) commented

I'm still unable to reproduce 1), but your comment about illegal configurations is a good clue. If there is an illegal configuration at the start of quick search, it doesn't know where to go and will stop. Can you run with -v (model-analyzer -v profile -f config.yml) and paste what is printed? Generally an illegal configuration involves some bad combination of batch sizes.

tgerdesnv (Collaborator) commented

For 4), anything in the model config can be manually swept via brute search. It just takes some yaml magic, which is hard to explain in the documentation.

There are some examples in this documentation.

Here is an example that includes max_queue_size:

```yaml
run_config_search_mode: brute
profile_models:
  my_model:
    model_config_parameters:
      dynamic_batching:
        max_queue_delay_microseconds: [100, 200]
        default_queue_policy:
          max_queue_size: [1000, 2000]
```

and the output from MA:

```
[Model Analyzer] Creating model config: my_model_config_0
[Model Analyzer]   Setting dynamic_batching to {'max_queue_delay_microseconds': 100, 'default_queue_policy': {'max_queue_size': 1000}}
[Model Analyzer]
[Model Analyzer] Creating model config: my_model_config_1
[Model Analyzer]   Setting dynamic_batching to {'max_queue_delay_microseconds': 100, 'default_queue_policy': {'max_queue_size': 2000}}
[Model Analyzer]
[Model Analyzer] Creating model config: my_model_config_2
[Model Analyzer]   Setting dynamic_batching to {'max_queue_delay_microseconds': 200, 'default_queue_policy': {'max_queue_size': 1000}}
[Model Analyzer]
[Model Analyzer] Creating model config: my_model_config_3
[Model Analyzer]   Setting dynamic_batching to {'max_queue_delay_microseconds': 200, 'default_queue_policy': {'max_queue_size': 2000}}
```


Talavig commented Jan 5, 2024

Regarding 1, I tried it and, as it turns out, the problem also occurs when we enable dynamic batching; without it, the search works as expected. We get the following message:
"Illegal model run config because maximum of my model's preferred batch size 2 is greater than model max batch size 1." From what I know, the preferred batch size is derived from the max batch size, so it is still a bit weird, but thanks for the help.
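For illustration, a hypothetical config.pbtxt combination that would produce this error: a leftover preferred_batch_size in dynamic_batching that exceeds max_batch_size (values chosen to match the message above):

```
# Hypothetical illegal combination: a preferred batch size of 2 can
# never be satisfied when max_batch_size is 1, so every candidate run
# config is rejected and the search stops immediately.
max_batch_size: 1
dynamic_batching {
  preferred_batch_size: [2]
}
```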
Regarding 4, we tried it and it worked this time, thanks for that. Maybe the docs should make it clearer that all dynamic batching parameters are available.
I would also like to expand my question regarding your comment that anything can be swept: do you really mean anything? For example, if we theoretically have different optimizations, like graph optimizations and such, or different backends/platforms we would like to try, is this theoretically possible in brute search? If so, I didn't get that from the docs.


tgerdesnv commented Jan 5, 2024

For 1), this all makes sense now. There was a fix a few weeks ago for this; see bullet 3 of the description here. It won't actually help you if you already have a config with preferred_batch_size specified, but if you start from scratch, that value will no longer magically appear and cause you problems on further reruns. The fix will be included in the 24.01 release.

For 4), I'll look into improving the documentation

For sweeping anything - Model Analyzer should be able to sweep over any model config option that can be specified in the config.pbtxt file. I believe this also includes any options that are specific to whatever backend you are using. For example, ONNX and TensorFlow have a bunch of "execution accelerators" that can be specified, and you should be able to brute search through legal values of those.
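Untested, but extrapolating the nested-keys-with-value-lists pattern from the max_queue_size example above, a sweep over graph optimization levels might look something like this (the exact yaml shape here is an assumption, not verified):

```yaml
run_config_search_mode: brute
profile_models:
  my_model:
    model_config_parameters:
      optimization:
        # Each value in the list should generate one model config to test.
        graph:
          level: [0, 1]
```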
If you have different versions of the same model across different build optimizations or backends, Model Analyzer supports running each of them one at a time and then finding the best configs across all of them. This can be achieved by passing multiple models to Model Analyzer, setting --num-top-model-configs to something like 3, and NOT specifying --run-config-profile-models-concurrently-enable. What should happen is that it picks the top 3 performing configs across all of the models it profiled.
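A rough sketch of such an invocation, where my_model_onnx and my_model_trt are hypothetical builds of the same model sitting in the repository:

```
# Profile both builds sequentially and report the top 3 configs overall.
model-analyzer profile \
    --model-repository /path/to/model_repository \
    --profile-models my_model_onnx,my_model_trt \
    --num-top-model-configs 3
```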
I know you were talking "theoretically", but if you do have something more concrete I can confirm and/or give an example.


Talavig commented Jan 7, 2024

Ok, thanks for the help!
Do you have any plans for the quick search improvements we discussed?

tgerdesnv (Collaborator) commented

I created a ticket to design and implement the ability to quick search custom fields. I do think there is a lot of value in this, beyond just the cases you are asking about. It is something we plan to do, but it likely won't get started until the product lead gets back from medical leave. Unfortunately, I can't imagine it getting done for at least a few months.

I also created a ticket to look into Model Analyzer setting max_queue_size in some cases. This has a higher chance of getting done in the near term since the scope is smaller, but it probably won't be prioritized unless it is a really important feature for you.
