How to define the algorithm of L2_Pool2d? #278
Comments
@huningxin, please report your proposed approach in this issue for the WG to review once you've discussed it with @mingmingtasd.
@mingmingtasd: What sources are you seeing that average the elements before taking the root?
Agreed. L2Pool2d should follow and be based on the Lp-normalization function Y = (X1^P + X2^P + ... + Xn^P) ^ (1/P).
@mingmingtasd Is there anything remaining unresolved on this one, or can it be closed?
Let's close it, thanks.
TFLite also averages over the count of summed elements.
As discussed before, l2 normalization should be calculated by the Lp-normalization function above. BTW, do we have implementation experience with CoreML's l2_pool? https://apple.github.io/coremltools/source/coremltools.converters.mil.mil.ops.defs.html#coremltools.converters.mil.mil.ops.defs.iOS15.pool.l2_pool @philloooo
🤔 If TFLite defines it that way, that may be a useful operation, but it's something besides L2 pooling 👀.

```
function poolLebesgue(input, axes, windowDimensions, padding, strides, dilations, exponent)
    // y = (x1^p + x2^p + ... + xn^p) ^ (1/p), where y is the reduced output over all applicable inputs
    return root(poolSum(pow(input, exponent), axes, windowDimensions, padding, strides, dilations), exponent)
endfunction
```
Agreed. It seems like a TFLite WebNN backend will have to decompose l2Pool2d.

1. Is this decomposition expressible in WebNN? Recall that we eventually need to clearly define all WebNN operators (#462).
2. Is this decomposition expressible in TFLite? Anything is possible - especially if a device type is not mandated (#749) - but potentially with severe performance cliffs, especially for non-CPU backends (e.g.), and especially if the operator implementation has to be hand-rolled.

FWIW #689 has the same issue: if at least two backends (i.e. Core ML and DML) have consistent behavior, then maybe that's okay (at least once discrepancies like #180 are resolved!)
I just realized #180 (comment) provides a decomposition of this operation.
@a-sully The more time I devote to thinking of expressing aggregate operators in terms of more fundamental operators, the more I realize some primitive ops (like the poolSum below) would be useful:

```
function poolSum(input, axes, windowDimensions, padding, strides, dilations)
    return poolGeneric(input, axes, windowDimensions, padding, strides, dilations, add, 0)
    // OR convolve(input, filter = ones(windowDimensions), axes, windowDimensions, padding, strides, dilations)
endfunction
```
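For reference, a minimal runnable JavaScript sketch of that poolSum idea for a single-channel 2-D input, assuming "valid"-style padding and unit dilations (poolSum2d and its parameters are illustrative, not a spec definition):

```js
// Windowed sum over a 2-D array (rows of numbers), no padding.
function poolSum2d(input, windowH, windowW, strideH = 1, strideW = 1) {
  const outH = Math.floor((input.length - windowH) / strideH) + 1;
  const outW = Math.floor((input[0].length - windowW) / strideW) + 1;
  const output = [];
  for (let oy = 0; oy < outH; ++oy) {
    const row = [];
    for (let ox = 0; ox < outW; ++ox) {
      let sum = 0;
      for (let ky = 0; ky < windowH; ++ky)
        for (let kx = 0; kx < windowW; ++kx)
          sum += input[oy * strideH + ky][ox * strideW + kx];
      row.push(sum);
    }
    output.push(row);
  }
  return output;
}
```

An L2 pool then falls out of the decomposition above: square elementwise, poolSum, then take the square root of each output element.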
1. Yep.
2. Yep.

Well, these {ONNX, DML, NNAPI, the original paper} agree (and I suspect CoreML too).
CoreML agrees. l2Pool2d WPT test cases pass on CoreML except for those with dilation and rounding.
Thanks for the decomposition! It's very helpful. IIUC, we may need to set the convolution groups to the input channel count, and shape the all-ones filter as [groups, 1, windowDimensions.height, windowDimensions.width].
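To make that concrete, here is a hedged sketch of the decomposition against the WebNN MLGraphBuilder API, assuming an NCHW input with a known channel count (l2Pool2dDecomposed and its parameters are illustrative, and the constant-descriptor field names may differ across spec revisions):

```js
function l2Pool2dDecomposed(builder, input, channels, windowDimensions, options = {}) {
  const [kh, kw] = windowDimensions;
  // All-ones filter of shape [groups, 1, kh, kw] with groups == input channels,
  // so each channel is pooled independently.
  const onesData = new Float32Array(channels * kh * kw).fill(1);
  const filter = builder.constant(
      {dataType: 'float32', shape: [channels, 1, kh, kw]}, onesData);
  const squared = builder.mul(input, input);        // x^2, elementwise
  const summed = builder.conv2d(squared, filter, {  // windowed sum of squares
    groups: channels,
    padding: options.padding,
    strides: options.strides,
    dilations: options.dilations,
  });
  return builder.sqrt(summed);                      // (sum of squares)^(1/2)
}
```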
I filed the issue in TensorFlow; they also consider it an issue with the TFLite kernel implementation, so maybe they will fix it later.
Junwei: Thanks for filing. So it appears WebNN's TFLite backend would need a decomposition until any future TFLite fix.
Austin: If it's any perf consolation, LP pooling is evidently not so common (only a few models in my little stash of 1411 model files).
Regarding our new op proposal checklist, there are two aspects: cross-framework support and cross-platform implementability. I understand we usually study ONNX / ONNX Runtime as one example of a framework, among TensorFlow and PyTorch etc., and investigate DML as one example of a platform API, among TFLite and CoreML.
If that's the case and there's a straightforward decomposition which could be performed in "userspace"... is this operator needed in WebNN at all? Put another way, if this operator wasn't already in the WebNN spec, would it pass the new op proposal checklist? |
Woot, it appears that TFLite is already fixed (per https://github.com/tensorflow/tensorflow/pull/74079/files), which means it's just contingent on Chromium updating its TF version. Given there will now be a direct call to TFLite with no decomposition, does that change the difficulty of implementing this?
I think it's still worth adding to the complete collection of pooling operations, and several backends offer implementations faster than the decomposition, suggesting it's useful. Even if it's rare in my mini model collection, I do see people asking questions about it on forums, indicating utility. Barring that one implementation bug (now fixed), the backends implement it consistently too (unlike the potentially more dubious localResponseNormalization, where multiple implementations have small complicating differences).
Thanks for thoroughly following through with this issue @fdwr. TFLite's alignment with the other platforms does improve the "Cross-framework support" line item. Seems reasonable to me 👍. Filed https://crbug.com/361717758 to track implementation in Chromium. Can we close this issue?
Mingming, I'm closing it from the spec perspective, as Austin created a Chromium issue for it. 👍 Thank you for raising it.
As you know, the algorithm of L2_Pool2d is based on the Lp-normalization function, which should be Y = (X1^P + X2^P + ... + Xn^P) ^ (1/P). But for L2_Pool2d, I am not sure whether we need to average the sum of elements, as
Y = ((X1^2 + X2^2 + ... + Xn^2)/n) ^ (1/2),
or directly use the Lp-normalization function, as
Y = (X1^2 + X2^2 + ... + Xn^2) ^ (1/2).
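To see how the two candidates diverge, here is a small plain-JavaScript check over a single 2-element window [3, 4] (variable names are illustrative):

```js
const xs = [3, 4];
const sumOfSquares = xs.reduce((acc, x) => acc + x * x, 0); // 9 + 16 = 25
const withoutAverage = Math.sqrt(sumOfSquares);             // 5 (plain Lp norm, p = 2)
const withAverage = Math.sqrt(sumOfSquares / xs.length);    // ≈ 3.536 (mean of squares first)
```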
I found two papers, https://sci-hub.yncjkj.com/10.1109/cvpr.2011.5995370 and https://sci-hub.yncjkj.com/10.1109/tcsvt.2015.2461978, and they describe Lp-normalization the same way.
So I confirm that the Lp-normalization function should be Y = (X1^P + X2^P + ... + Xn^P) ^ (1/P), but I am still not sure whether we need to average the sum of elements for L2_Pool2d. I went through some framework API specs and found the descriptions below:
1. NNAPI ANEURALNETWORKS_L2_POOL_2D:

```
output[b, i, j, c] =
    sqrt(sum_{di, dj} pow(input[b, strides[1] * i + di, strides[2] * j + dj, c], 2) /
         sum(1))
```

(Here sum(1) counts the elements in the window, so the sum of squares is averaged before the square root.)
2. ONNX LpPool:
LpPool consumes an input tensor X and applies Lp pooling across the tensor according to kernel sizes, stride sizes, and pad lengths. Lp pooling consists of computing the Lp norm on all values of a subset of the input tensor according to the kernel size and downsampling the data into the output tensor Y for further processing.
3. OpenVINO: Not Supported
4. DML DML_LP_POOLING_OPERATOR_DESC :
Computes the Lp-normalized value across the elements within the sliding window over the input tensor. The value of the P variable is used in the Lp-normalization function Y = (X1^P + X2^P + ... + Xn^P) ^ (1/P), where X1 to Xn represent each of the values within the sliding window. In common use cases, this value is set to either 1 or 2, representing the L1 or L2 normalization respectively.
So it seems that NNAPI ANEURALNETWORKS_L2_POOL_2D should average, but after verifying on DML, DML_LP_POOLING_OPERATOR_DESC doesn't average. Thus the algorithm and implementation for l2_pool2d in these frameworks may differ.