-
Notifications
You must be signed in to change notification settings - Fork 659
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unable to query engine in offline mode to ask user whether to download #2875
Comments
I create an PR to address your |
Thanks @frankfliu for looking into this so quickly. I should clarify that PyTorch isn't the only engine affected - I expect that all that support downloading are affected. If I switch PyTorch to TensorFlow in my example code the same problem occurs:
Therefore this fix only helps with PyTorch, but it's still not possible to query all engines offline without risking that all access to the engine will be broken through Is it possible to roll back #2826 for all engines, or address this another way? |
Necessary because of deepjavalibrary/djl#2875
One thing you can take a look at is the DJL |
Thanks @zachgk I hadn't seen DJL Serving Dependency Manager - this looks like it could be a good way for us to add additional engines in the future. I don't think I can use it to solve the problem here. We provide PyTorch and TensorFlow engines (although developers can build from source with other engines if needed) using the recommended We then provide a dialog showing potentially-available engines, so the user can manually request them to be downloaded (which temporarily switches the off In the screenshot, the TensorFlow native libraries have been downloaded by the PyTorch ones have not. Our problem is that we can only know that an engine is really available (i.e. natives available) by querying it through In 0.24.0 we could query with In 0.25.0 we can't, because querying with #2876 addresses this for PyTorch, but not for TensorFlow. I'm not sure if other engines are affected by this issue. I realize we're doing things in a non-standard way, but it feels like a bug for a failed request for |
We can make changes for TensorFlow. but changing system properties at runtime isn't thread-safe and system properties are global, it may impact other external modules. The Engine is meant for singleton. Rely on a lazy load singleton implementation is not a good idea. Instead of try and fail approach, can you check the cache folder before you load |
Thanks @frankfliu I understand there are ways to break the use of the system property, but I couldn't find a better way. The calls in our app are through a method that uses try/finally and synchronisation to try to reduce the risk of things going wrong. I don't know how to check for the correct cache directory existing, because I don't see a way to determine which version and flavor of any specific engine will be required in advance. And even if I knew the cache directory, I wouldn't know in advance that it contains all the required dependencies. In fact, with TensorFlow it is especially challenging if an incompatible CUDA is found - even when the CPU engine has already been downloaded. This seems to be because a call to The ideal solution for us would be to have two
An option for progress monitoring any download would also be helpful.... but not essential. Having 1. would enable us to know whether calling 2. is likely to return quickly or slowly, and we wouldn't need to work with the system property at all. If this isn't possible, we may be able to figure out some other approach in our app that informs our users that any use of DJL requires going online. But I think the requirement for 'check an engine and control whether it is downloaded' could be general enough that if it can be solved in DJL itself that would be great. |
Thanks @frankfliu ! |
fixes deepjavalibrary#2875 reverts deepjavalibrary#2876 This adds new support for DJL manual initialization of engines to support `DJL_ENGINE_MANUAL_INIT`. Once done, no engines providers will be found or loaded on startup. Instead, they can be added manually by: ```java PtEngineProvider provider = new PtEngineProvider(); provider.getEngine(); // Optional, throws exception if the provider can not load Engine.registerEngine(provider); Engine.setDefaultEngine(provider.getEngineName()); // Optional, sets as default ```
@petebankhead I think the issue here is that the Engine was not really intended to be used that way. It started out as loading all engines strictly on startup and would throw exceptions immediately. Then we made it lazy for performance, which also accidentally means it could handle that case. But it wasn't really designed for it. I opened up a PR #2885 to support it better. Basically, it takes out the manual initialization so you can intentionally trigger the engine registration. You can also verify that the engines can initialize properly and then only perform the engine registration afterwards |
Excellent, thank you @zachgk! |
* Creates DJL manual engine initialization fixes #2875 reverts #2876 This adds new support for DJL manual initialization of engines to support `DJL_ENGINE_MANUAL_INIT`. Once done, no engines providers will be found or loaded on startup. Instead, they can be added manually by: ```java PtEngineProvider provider = new PtEngineProvider(); provider.getEngine(); // Optional, throws exception if the provider can not load Engine.registerEngine(provider); Engine.setDefaultEngine(provider.getEngineName()); // Optional, sets as default ``` * Revert "[tensorflow] Revert InstanceHolder for TensorFlow engine (#2884)" This reverts commit 586bb07. * Revert "[api] Replace double-check singlton with lazy initialization (#2826)" This reverts commit 3927867. * Make engines initialized This makes several updates: - engines will now initialize once per instance of EngineProvider rather than re-attempt to initialize - Registering an engine can overwrite the existing one - All engines now use the synchronized form rather than the static instance holder. This allows them to have multiple versions and be local to the instance rather than global (but the instance is saved globally) * Throws Exception on bad getEngine * Removes unnecessary check
* Creates DJL manual engine initialization fixes #2875 reverts #2876 This adds new support for DJL manual initialization of engines to support `DJL_ENGINE_MANUAL_INIT`. Once done, no engines providers will be found or loaded on startup. Instead, they can be added manually by: ```java PtEngineProvider provider = new PtEngineProvider(); provider.getEngine(); // Optional, throws exception if the provider can not load Engine.registerEngine(provider); Engine.setDefaultEngine(provider.getEngineName()); // Optional, sets as default ``` * Revert "[tensorflow] Revert InstanceHolder for TensorFlow engine (#2884)" This reverts commit 586bb07. * Revert "[api] Replace double-check singlton with lazy initialization (#2826)" This reverts commit 3927867. * Make engines initialized This makes several updates: - engines will now initialize once per instance of EngineProvider rather than re-attempt to initialize - Registering an engine can overwrite the existing one - All engines now use the synchronized form rather than the static instance holder. This allows them to have multiple versions and be local to the instance rather than global (but the instance is saved globally) * Throws Exception on bad getEngine * Removes unnecessary check
Description
I maintain an open-source desktop application that uses DJL (which is great, thanks!).
Our complication is that we
DJL's ability to download engines on demand solves 1.
What we need for 2. is a way to query whether an engine is available without prompting any download if it is not.
I've called this a 'bug' rather than 'enhancement' because it was possible to solve this in DJL v0.24.0 and earlier using
System.setProperty("offline", "true");
.However since #2826 this strategy no longer works, so it fails in v0.25.0.
Related issue:
How to Reproduce?
I've written a short demo of the issue at
https://gist.github.com/petebankhead/836c8e0db8a4e4d9f0d5ecdb032a08ba
I have focussed only on PyTorch.
Error Message
Current behavior, after #2826 (v0.25.0)
I get the following messages using the current
master
:Without #2826 (v0.24.0)
Replacing
PtEngineProvider
to remove the changes in #2826 it works as expectedExpected Behavior
It should be possible to query programmatically whether an engine exists without immediately prompting the download.
Any failure to download the engine (because of offline mode) shouldn't prevent requesting the engine later.
What have you tried to solve it?
I can't see a workaround in v0.25, so we will likely need to stay with v0.24.
It seems that
InstanceHolder
can't be initialized in offline mode if the engine isn't available, so the class then isn't available.Whenever that happens, all subsequent attempts fail with
NoClassDefFoundError
even ifSystem.setProperty("offline", "false");
has been called.Restoring the behavior prior to #2826 would enable our previous workaround to work, although I think a cleaner solution would be to provide an API to query a model that doesn't prompt a download.
I am reluctant to rely upon
"offline"
as a system property, and share @docent's opinion at #2632 (comment) that an alternative name would help avoid potential clashes.Environment Info
I'm using Java 17 on Apple Silicon & can provide further detail if needed, but I expect the issue to affect all platforms.
The text was updated successfully, but these errors were encountered: