Consider removing `lstm` and `gru` operators #689
Comments
🔔 For RNN, GRU, LSTM: [table of supported (✅) and unsupported (❌) activations per operator; the table itself is not preserved here]
Thanks for the investigation! 🙏 Updated the table.
We decomposed the `gru` operator in the RNNoise sample into other operators supported by WebNN, and we ran the sample on several integrated and discrete GPUs. (The table of benchmark results is not preserved here.)
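For concreteness, here is a rough JavaScript sketch, not the sample's actual code, of how a single GRU step can be expressed with basic WebNN ops. The helper name, the gate order (z, r, n), and the "reset after" formulation are illustrative assumptions.

```js
// Hypothetical sketch of one GRU step decomposed into basic WebNN ops.
// Assumptions: gate order [update z, reset r, new n], "reset after" formulation,
// x: [batch, inputSize], hPrev: [batch, hiddenSize],
// w: [inputSize, 3 * hiddenSize], rw: [hiddenSize, 3 * hiddenSize],
// b and rb: [3 * hiddenSize] (broadcast over the batch dimension).
function gruStepDecomposed(builder, x, hPrev, w, rw, b, rb, batch, hiddenSize) {
  // Slice one gate's [batch, hiddenSize] chunk out of a [batch, 3 * hiddenSize] tensor.
  const gate = (t, i) => builder.slice(t, [0, i * hiddenSize], [batch, hiddenSize]);

  const xGates = builder.add(builder.matmul(x, w), b);       // x·W + b
  const hGates = builder.add(builder.matmul(hPrev, rw), rb); // hPrev·R + rb

  const z = builder.sigmoid(builder.add(gate(xGates, 0), gate(hGates, 0)));  // update gate
  const rs = builder.sigmoid(builder.add(gate(xGates, 1), gate(hGates, 1))); // reset gate
  const n = builder.tanh(
      builder.add(gate(xGates, 2), builder.mul(rs, gate(hGates, 2))));       // candidate state

  // hNew = (1 - z) ⊙ n + z ⊙ hPrev, written as n + z ⊙ (hPrev - n) to avoid a constant.
  return builder.add(n, builder.mul(z, builder.sub(hPrev, n)));
}
```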
Thanks for the investigation, @miaobin! Thinking about the options I mentioned above, this RNNoise model is a nice example of a model which could benefit from the third option:
The sample always uses ...

This may sound familiar! :) https://github.com/webmachinelearning/webnn/pull/188/files#diff-5e793325cd2bfc452e268a4aa2f02b4024dd9584bd1db3c2595f61f1ecf7b985L909-L913

Given that, there are still open questions with regard to supporting these operators. I propose the following (tracked in the Proposal Status section below):

WDYT?
I see there are a few thumbs-up on my previous comment, so I just put up #718. PTAL :)
/cc @BruceDai
@huningxin Thanks for the reminder!
@a-sully: It is a beautiful table, btw.
FYI, I added a Proposal Status header to the issue description to track progress on my proposal from #689 (comment). @BruceDai please post an update on this issue when conformance WPTs are added!
To add gru/gruCell and lstm/lstmCell conformance tests to WPT ASAP, I created two more CLs, CL-1' and CL-2', by leveraging test data and tolerance criteria from the existing gru and lstm tests of the ONNX CTS. We can then continue working on defining exact tolerance criteria for these complex operators and add more tests for better coverage. Updated
Thanks for the update, @BruceDai! I've filed https://crbug.com/360052663 to track the next step: implementation of these operators in Chromium on TFLite and CoreML |
Here is an update on the CoreML implementation: CoreML's gru only gives correct results with the "cpu" target when GPU sandboxing is disabled, and gives incorrect results for all device targets when the GPU sandbox is on. I've reported the bug to @mwyrzykowski. For now, given the bug, we can't use CoreML's gru and have to implement the full decomposition too. If/when the bug is resolved, here are the API differences that could be emulated:
- Constant requirement: weights and bias need to be constant, so weight/bias manipulation needs to be done as a preprocessing step before passing to CoreML.
- direction: WebNN supports bidirectional gru; CoreML only supports forward/backward.
- Bias shape: [numDirections, 3 * hiddenSize] -> [3 * hiddenSize]
- Merged bias and reset_after: CoreML takes a single bias value; it's the result of add(bias, recurrentBias).
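For illustration only (this is not Chromium's actual code), the bias handling described above amounts to a small preprocessing step; the function name is hypothetical and a forward-only GRU is assumed.

```js
// Illustrative sketch of the bias emulation described above, not Chromium's code.
// WebNN's gru takes bias and recurrentBias of shape [numDirections, 3 * hiddenSize];
// CoreML's gru takes a single merged bias of shape [3 * hiddenSize], which is
// effectively add(bias, recurrentBias). Assumes numDirections === 1 (forward only).
function mergeGruBiasForCoreML(bias, recurrentBias, hiddenSize) {
  const merged = new Float32Array(3 * hiddenSize);
  for (let i = 0; i < 3 * hiddenSize; ++i) {
    merged[i] = bias[i] + recurrentBias[i]; // elementwise add; numDirections dim dropped
  }
  return merged;
}
```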
Circling back, given that there is only a single backend supporting this, we are back to the original question: should we just remove `lstm` and `gru`?
The data in #689 (comment) shows a significant performance improvement from using the native gru op. If we are going to remove lstm and gru, my question is: how can WebNN help an implementation recognize the decomposed form of lstm / gru and fuse it into the native lstm / gru op when available?
Proposal Status

From #689 (comment):

- Remove `MLActivation` in favor of restoring the former `MLRecurrentNetworkActivation` enum (described here)
- Add conformance tests for `lstm()` and `gru()`
- Decide whether to remove `lstm()` and `gru()` based on the results of the previous step

Related to #453. One recommendation of that issue is:
Current support across backends
† TFLite delegates generally do not support the entire TFLite opset. For instance, TFLite GPU delegates only support a very basic variant of LSTM which does not support most of the parameters specified by WebNN.
What does "supporting" LSTM really mean?
Higher-level operators such as `lstm` and `gru` tend to have more knobs to turn, and each additional knob increases the likelihood that WebNN will not actually be able to use the backend's operator of the same name. There are many variants of LSTM. Just because a backend has an LSTM operator does not mean that operator can express everything required by the variant of LSTM that WebNN specifies. Meanwhile, frameworks sitting on top of WebNN may only take advantage of WebNN's `lstm` operator if it is exactly the variant the calling framework needs.

For example, TFLite's variant of LSTM uses coupled "input" and "forget" gates (CIFG, where the forget gate is computed as 1 minus the input gate rather than from its own weights), whereas this option is not available in WebNN; Chromium's DML implementation currently does not couple these gates. User agents implementing WebNN on TFLite cannot use its LSTM operator, and neither can frameworks calling into WebNN use its LSTM operator if they want the CIFG behavior.
Let's look at the problem @philloooo mentioned about the activations supported by LSTM across various backends: #573 (comment)
Supported activation functions for LSTM

(per-backend table not preserved here)

† I couldn't find documentation in DirectML saying which activations are supported by which operators. @ folks familiar with DML, please feel free to chime in! (done! #689 (comment))
†† Does not support passing `alpha` nor `beta` values, as far as I can tell

Aside: Now that `MLActivation`s are no longer used for op fusion, we should consider removing `MLActivation`s which do not make sense for use with recurrent operators.

What activations can be specified on each backend?
Let's also remember that WebNN's `lstm` operator has multiple activations.

| Gate | DML† | CoreML | TFLite | WebNN |
| --- | --- | --- | --- | --- |
| input (i) | f() | recurrent_activation | fused_activation_function | activations[0] |
| output (o) | f() | recurrent_activation | fused_activation_function | activations[0] |
| forget (f) | f() | recurrent_activation | fused_activation_function | activations[0] |
| cell (g) | g() | cell_activation | | activations[1] |
| hidden (h) | h() | activation | | activations[2] |

† DML also supports passing different activations for LSTM's forward and backward passes
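For reference, a minimal sketch of how those three activation slots surface through WebNN's `lstm()` options; the option names follow the current spec draft, and the wrapper function is hypothetical.

```js
// Minimal sketch (names per the current spec draft; treat as illustrative) of the three
// LSTM activation slots from the table above: activations[0] = f() for the
// input/output/forget gates, activations[1] = g() for the cell gate,
// activations[2] = h() for the hidden state.
function buildLstm(builder, input, weight, recurrentWeight, steps, hiddenSize) {
  return builder.lstm(input, weight, recurrentWeight, steps, hiddenSize, {
    activations: ['sigmoid', 'tanh', 'tanh'],  // [f(), g(), h()], the spec's defaults
    returnSequence: true,
  });
}
```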
Summary
Reviewing the Guidelines for new operations to retroactively evaluate whether `lstm` and `gru` meet these guidelines, this stands out: the rationale for having these operators in WebNN is that the user agent's implementation of WebNN can plumb them directly to the backend's operator of the same name; otherwise there's no benefit compared to having the framework sitting on top of WebNN decompose the operator itself.
While there is some overlap - DML and CoreML are the most similar - there are still far more differences than similarities. For a web developer looking to deploy a model on WebNN across platforms, the tables suggest that aside from a few exceptional cases, these operators would need to be decomposed by the user agent.
If a high-level operator:
...should it still be in WebNN? :)
Options:
Updates

2024-06-07:
- Removed `clamp` and `softmax` activations from the table as per Remove MLActivations definitely not usable with recurrent ops #703
- Updated the table per Consider removing `lstm` and `gru` operators #689 (comment) by @fdwr (thanks!)

2024-07-23: