XGBoost MLeap bundle speed #833
This will depend on which XGBoost runtime you are using. We have two XGBoost runtimes; see https://github.com/combust/mleap/tree/master/mleap-xgboost-runtime for details on the two and how to swap between them.

P.S. I'm guessing your chart is showing the stats per row and not the aggregate for the batch size? I.e., the mean time for batch_size=20 is 0.625*20 in aggregate. It would be pretty surprising to me if a whole 50-row call were genuinely faster than a 1-row call.
Thanks! I ran 1000 iterations for each fixed batch size, so for example 1000 iterations of batch size 1 took 1.05 * 1000 ms. For batch size 20 it was 0.625 * 1000 ms, and for batch size 50 it was 0.468 * 1000 ms. So yes, I'm showing predict(50_rows) < predict(1_row) per call, which is what is curious. Is this not expected? Do you have a Slack channel, by the way?
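For illustration, here is a minimal sketch of the kind of timing loop described above (this is not code from the thread; the bundle path, the 10-element feature vectors, and the `features` column name are placeholders, and the import paths assume a recent MLeap 0.x release):

```scala
import ml.combust.bundle.BundleFile
import ml.combust.mleap.core.types.{BasicType, StructField, StructType, TensorType}
import ml.combust.mleap.runtime.MleapSupport._
import ml.combust.mleap.runtime.frame.{DefaultLeapFrame, Row}
import ml.combust.mleap.tensor.Tensor
import resource.managed

object NaiveBatchTiming extends App {
  // Placeholder bundle path; replace with your own serialized MLeap bundle.
  val transformer = (for (bf <- managed(BundleFile("jar:file:/tmp/xgboost.model.zip")))
    yield bf.loadMleapBundle().get.root).opt.get

  // Assumes the pipeline takes a single double-tensor column named "features".
  val schema = StructType(StructField("features", TensorType(BasicType.Double))).get

  def frameOf(rows: Int): DefaultLeapFrame =
    DefaultLeapFrame(schema, Seq.fill(rows)(Row(Tensor.denseVector(Array.fill(10)(0.5)))))

  for (batchSize <- Seq(1, 20, 50)) {
    val frame = frameOf(batchSize)
    (1 to 100).foreach(_ => transformer.transform(frame).get) // crude warm-up
    val start = System.nanoTime()
    (1 to 1000).foreach(_ => transformer.transform(frame).get)
    val perCallMs = (System.nanoTime() - start) / 1e6 / 1000
    println(f"batch=$batchSize%2d  mean per-call latency: $perCallMs%.3f ms")
  }
}
```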
I definitely would not expect that.

The only ideas I have are some weirdness in the benchmarking setup, like cache warming, startup costs, etc. If you're not doing so already, using JMH for benchmarking is usually helpful for eliminating that kind of noise.
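As a rough illustration of that suggestion, a JMH harness for the comparison could look something like the sketch below (assuming sbt-jmh; the bundle path, feature schema, and `features` column name are again placeholders, and the MLeap import paths assume a recent 0.x release):

```scala
import java.util.concurrent.TimeUnit

import ml.combust.bundle.BundleFile
import ml.combust.mleap.core.types.{BasicType, StructField, StructType, TensorType}
import ml.combust.mleap.runtime.MleapSupport._
import ml.combust.mleap.runtime.frame.{DefaultLeapFrame, Row, Transformer}
import ml.combust.mleap.tensor.Tensor
import org.openjdk.jmh.annotations._
import resource.managed

@State(Scope.Benchmark)
@BenchmarkMode(Array(Mode.AverageTime))
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Warmup(iterations = 5)
@Measurement(iterations = 10)
@Fork(1)
class XgbBatchSizeBench {
  var transformer: Transformer = _
  var frame1: DefaultLeapFrame = _
  var frame50: DefaultLeapFrame = _

  // Assumes a pipeline with a single double-tensor input column named "features".
  private def frameOf(rows: Int): DefaultLeapFrame = {
    val schema = StructType(StructField("features", TensorType(BasicType.Double))).get
    DefaultLeapFrame(schema, Seq.fill(rows)(Row(Tensor.denseVector(Array.fill(10)(0.5)))))
  }

  @Setup
  def setup(): Unit = {
    // Placeholder bundle path; replace with your own serialized MLeap bundle.
    transformer = (for (bf <- managed(BundleFile("jar:file:/tmp/xgboost.model.zip")))
      yield bf.loadMleapBundle().get.root).opt.get
    frame1 = frameOf(1)
    frame50 = frameOf(50)
  }

  // JMH reports mean time per call; divide by the batch size to get per-row cost.
  @Benchmark
  def predictBatch1(): DefaultLeapFrame = transformer.transform(frame1).get

  @Benchmark
  def predictBatch50(): DefaultLeapFrame = transformer.transform(frame50).get
}
```

Running the benchmark in a separate fork with warm-up iterations should help rule out JIT compilation and cache warming as the source of the inversion.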
Right, I'm also a bit weirded out by this, but in production the worst latencies I saw came from requests with a single feature row; requests with more rows seem to do better (a large batch has better latency than one row, i.e. predict(50_rows) < predict(1_row)). So the benchmark confirms what I see, but it doesn't make sense and I'm trying to understand it. Is it possible that with small batches a large number of threads gets spun up and then has to "wait" to wind down, and that introduces some inefficiency?

I haven't used JMH yet, but I'm also loading another model, and for that one latency grows with the number of rows, which makes sense (predict(50_rows) > predict(1_row)). The only explanation I can come up with so far is that the threading inside the bundle has some optimization specific to larger batches that is detrimental to smaller ones. I can try JMH and come back, or maybe do a quick Zoom?
Hi, I originally asked the question below in another repo that I think is no longer active, so I'm pasting it here. Basically, I don't quite understand why the MLeap XGBoost bundle seems to run faster with bigger batch sizes. I assume it is threading, but I'm unsure. Please let me know, and also whether I can turn off such optimizations; I'm trying to compare against something that is currently unoptimized.