Add support for torch.compile #1684
Hi @Fabioomega 👋, thanks for opening the feature request, that's a really cool idea. The optimal way would look something like this:
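A minimal sketch of that idea, assuming the predictor factories accept an already-built model instance for `det_arch`/`reco_arch` (the entry points and parameter names here are illustrative, not a confirmed API):

```python
import torch
from doctr.models import crnn_vgg16_bn, fast_base, ocr_predictor

# compile the detection and recognition models up front, then hand the
# compiled instances to the predictor factory; no backend-specific
# parameters leak into the doctr interface
det_model = torch.compile(fast_base(pretrained=True))
reco_model = torch.compile(crnn_vgg16_bn(pretrained=True))

predictor = ocr_predictor(det_arch=det_model, reco_arch=reco_model)
```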
In this case we would avoid adding backend-specific parameters to the interface.
For running on older hardware you could also check out https://github.com/felixdittrich92/OnnxTR, which is optimized for inference and more lightweight. Wdyt? :)

PS: I haven't had the time to check your branch, but I remember I already tried this with PyTorch 2.0, and most models either failed in the postprocessing or were slower than before!?
Hey! Thank you for the reply. I've already tried OnnxTR, but for some reason, and I'm unsure if it was because of the detection or recognition models, it actually performed worse. Maybe I did something wrong? I'm unsure.

Also, as far as my tests go, the first time the model runs it takes significantly longer because of the compiling step. But after that it becomes somewhat faster, with a 20-30% improvement on my machine. It did have some problems in the post-processing step with some OpenCV functions, which I fixed by wrapping them into custom operators; I haven't had an issue since.

As for the interface, I didn't change it at all. I only added an environment variable to control whether to run `torch.compile` (see the sketch below).

I will write some documentation and add unit tests as soon as I have some free time. Also, should I write a unit test for each model? As far as I checked, some models may need some specific tweaks to make them react better to the compilation step, but they should work fine without it.

Thank you for your amazing work and for making it public!
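For illustration, such a gate could look roughly like this; the variable name `DOCTR_TORCH_COMPILE` is hypothetical, not necessarily the one used in the branch:

```python
import os

import torch
from doctr.models import crnn_vgg16_bn

model = crnn_vgg16_bn(pretrained=True)

# opt-in via environment variable, so the public interface is untouched
if os.environ.get("DOCTR_TORCH_COMPILE", "0") == "1":
    model = torch.compile(model)
```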
Hey @Fabioomega 👋,
I'm not sure which version I used, but I believe it was before 0.3.2; it's been a couple of months, so I'm definitely trying the new one! Also, I should have handled the issue with the shared postprocessors, because I've made a cv2 fallback option in case the new wrapped cv2 functions are not being used. But again, the way I structured it may be kind of annoying. Either way, I'm going to open the PR, and thank you very much for letting me contribute to such an amazing project!
👍 Happy to get your feedback. Have you tested how big the impact is if we compile only the model itself, without the postprocessing step? This would make it much easier to integrate. Happy about every new contributor 🤗
I haven't tested that. In fact, I'm unsure whether it is even possible. I may be mistaken, but I think the postprocessing is called in the forward function of the model, which will make torch.compile try to compile it as well. It would be easier to integrate if it were not part of the forward function, though I imagine that would require some annoying refactoring that I'm not sure is worth the effort. Do you have any idea how I could test that? The only way I can see to force torch.compile not to compile those functions would be to wrap them in a custom operator (sketched below), which, as you said before, is not ideal. One solution that is maybe better would be to do something similar to the …

Anyway, based on the performance metrics I gathered while wrapping the cv2 functions into custom ops, I think the postprocessing step shouldn't affect anything by being compiled, or the improvement is negligible. Mostly I did the wrapping because some functions were causing crashes on my machine while being compiled. I should probably investigate that further, shouldn't I? Either way, I will try to remove the custom operators and see if there's some significant performance loss, or if I can narrow down the crashing function. Who knows, maybe I was just hallucinating. Also, I'll try to think of some solutions that are not so extensive in changes, or that focus on the user, as you said before.
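As an illustration of the wrapping idea, here is a rough sketch of hiding an OpenCV call behind a custom operator so that torch.compile treats it as opaque instead of tracing into it. It assumes PyTorch >= 2.4 for `torch.library.custom_op`; the op name and the particular cv2 call are made up for the example, not taken from the branch:

```python
import cv2
import numpy as np
import torch

# torch.compile will not trace into a custom op; it only sees an
# operator with a known output shape/dtype
@torch.library.custom_op("doctr::box_points", mutates_args=())
def box_points(coords: torch.Tensor) -> torch.Tensor:
    rect = cv2.minAreaRect(coords.cpu().numpy().astype(np.float32))
    return torch.from_numpy(cv2.boxPoints(rect))

# shape/dtype description so the compiler can reason about the op
# without actually running cv2
@box_points.register_fake
def _(coords: torch.Tensor) -> torch.Tensor:
    return coords.new_empty((4, 2), dtype=torch.float32)
```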
I've just confirmed: it seems that not compiling the postprocessor is not possible with the current architecture, at least to my knowledge. Though it seems I was just wrong when I said it couldn't compile without any changes to the postprocessor; I've just tested it, and while there's a little degradation in performance, it could be used regardless. Also, the performance gain is almost insignificant on my machine when used with …
Hey @Fabioomega 👋, excuse the late response. In this case the scope of the PR would be more related to writing test cases and a well-defined documentation section, plus maybe small model-related changes. Pseudo-code example test case (the same applies to the classification and detection models):
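The snippet itself seems to have been stripped from the thread; a plausible reconstruction of such a test, assuming the recognition models accept `return_preds=True` in their forward pass:

```python
import pytest
import torch

from doctr.models import recognition

# sketch: compile each recognition model and check that it decodes the
# same predictions as the eager model
@pytest.mark.parametrize("arch", ["crnn_vgg16_bn", "vitstr_small", "parseq"])
def test_models_torch_compile(arch):
    model = recognition.__dict__[arch](pretrained=True).eval()
    inp = torch.rand((2, 3, 32, 128))

    with torch.inference_mode():
        eager_preds = model(inp, return_preds=True)["preds"]

    compiled_model = torch.compile(model)
    with torch.inference_mode():
        compiled_preds = compiled_model(inp, return_preds=True)["preds"]

    # preds are (value, confidence) pairs; compare the decoded strings
    assert [p[0] for p in eager_preds] == [p[0] for p in compiled_preds]
```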
Yeah, with …
All recognition models seem to work out of the box with torch.compile - no fullgraph (except …).

So I would suggest the following for the current PR:

- revert the postprocessing-related changes
- add test cases for all models and skip the failing ones
- add a documentation section with an example usage and an overview of which models can be compiled
If that's done, we can take care of the failing models and fix them one by one (so 1 fix == 1 PR). That way we have a clear structure and can keep track of the progress. Wdyt?
Hey! Sorry for the delay; I've been busy with university stuff! Your idea sounds good! I should be able to run the tests this week, hopefully. Should I add a new commit to remove the changes in the post-processing, or start a new branch and create a new PR? I also like the idea of separating the models into different PRs. Are there any failing models right now? Or by "taking care of the failing models" did you mean enabling fullgraph support?
Hey @Fabioomega 🤗, no stress :) Correct, I would say let's update your current PR 👍

I have only quickly tested the recognition models: compile with …

For the documentation we could provide a table in that section for the first iteration, for example (see the table sketch below).

So in the first PR we provide the tests for all models and skip the failing ones (they should run but raise only a warning - same as we have for the onnx tests), plus the documentation section with an example usage and an overview of which models can be compiled. In follow-up PRs we pick a model which does not work, fix it (if possible), update the table -> done -> next model :)
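The example table itself appears to be missing from the thread; a plausible shape for it, based on the model list at the bottom of the issue (the support marks are placeholders, not measured results):

| Model | torch.compile | fullgraph=True |
|---|---|---|
| crnn_vgg16_bn | ✅ | ⏳ |
| parseq | ✅ | ⏳ |
| ... | ... | ... |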
PS: The last column should be …

PS 2: I updated the issue to keep track of the progress.
🚀 The feature
torch.compile is a PyTorch feature that compiles the model into static kernels using a JIT compiler. More information: https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html
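A minimal, self-contained illustration of what torch.compile does (a generic example, not taken from the linked tutorial):

```python
import torch

def pointwise(x: torch.Tensor) -> torch.Tensor:
    return torch.sin(x) + torch.cos(x)

# the first call triggers JIT compilation into optimized kernels;
# subsequent calls reuse them
compiled = torch.compile(pointwise)
print(compiled(torch.randn(8)))
```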
Motivation, pitch

I've been using doctr for some of my projects, but some of the more robust models take a long time on old hardware. Using some of those nifty zero-overhead features of PyTorch seemed like a good way to improve performance.
Alternatives
I've already considered using PyTorch JIT script or trace, but the torch.compile approach seemed like the easiest to implement while being 'zero-overhead'.
Additional context
I've already implemented a version of doctr that has support for the feature: https://github.com/Fabioomega/doctr/tree/pytorch-compile-support. But, because of the significant code changes and incomplete fullgraph support for some models, I was unsure whether it was of any interest, so I thought it would be prudent to open an issue first. As of right now, I have finished implementing all that I could for the detection models, the crop and orientation models, and fullgraph support for parseq.
Model tasks

Recognition
- crnn_vgg16_bn (fullgraph=True)
- crnn_mobilenet_v3_ (fullgraph=True)
- sar_resnet31 (fullgraph=True)
- master (fullgraph=True)
- vitstr_ (fullgraph=True)
- parseq (fullgraph=True)

Detection
- db_ (fullgraph=True)
- fast_ (fullgraph=True)
- linknet_ (fullgraph=True)

Classification
- mobilenet_v3_small_crop_orientation (fullgraph=True)
- mobilenet_v3_small_page_orientation (fullgraph=True)