DotLayer should respect the dyn size (or seq mask) when it reduces over dynamic axes, just as ReduceLayer and similar layers do. Currently DotLayer ignores it completely. That is fine when the user explicitly masks beforehand; e.g. SoftmaxOverSpatialLayer does the right thing. Otherwise the result is wrong in general. This is especially a problem for the concept of dynamic axes with extended dynamic sizes.
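To make the problem concrete, here is a minimal sketch in plain Python (not RETURNN code; `dot_over_time` and its `seq_len` argument are hypothetical names for illustration). It reduces over a padded time axis with and without taking the sequence length into account:

```python
def dot_over_time(a, b, seq_len=None):
    """Compute sum_t a[t] * b[t] over a padded time axis.

    If seq_len is given, frames with t >= seq_len are treated as padding
    and skipped. If it is None, padding is (wrongly) included.
    """
    n = len(a) if seq_len is None else seq_len
    return sum(x * y for t, (x, y) in enumerate(zip(a, b)) if t < n)


# One sequence of true length 2, padded to length 4 with non-zero values.
a = [1.0, 2.0, 7.0, 7.0]
b = [3.0, 4.0, 5.0, 5.0]

unmasked = dot_over_time(a, b)           # includes the padded frames
masked = dot_over_time(a, b, seq_len=2)  # only the two valid frames
```

Here `masked` is 1*3 + 2*4, while `unmasked` additionally sums the two padded frames, so the result depends on whatever happens to be in the padding.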
Implementing this for DotLayer is not so nice, though, because the additional masking is not needed in the common case where a SoftmaxOverSpatialLayer came before. The additional masking would not be wrong, but it would make things a bit slower, so we definitely want to avoid it. But how can we know when it can be skipped? DotLayer could explicitly check whether one of its inputs comes from SoftmaxOverSpatialLayer, but this would be very ugly and hacky, and it would also fail in some cases (e.g. with any automatic internal wrapping layers such as SelectSearchSourcesLayer). Is there a better, more generic way? Some way for the input to say "I'm already masked, padded values are 0, no need to mask again". Other layers like ReduceLayer could also use this information. The value of the padded entries matters as well. For DotLayer we want 0. For ReduceLayer with sum or avg we want 0. For ReduceLayer with max, and also for SoftmaxOverSpatialLayer, we want -inf. But then, I'm not sure whether we are making it too complicated in the end.
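One way such an "I'm already masked" marker could look is sketched below in plain Python. This is only an illustration of the idea, not an existing RETURNN API: the `Padded` class, its `pad_value` field, and the helper functions are all hypothetical. The producer records which value its padded frames hold, and a consumer masks only when that value is not the one it needs.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Padded:
    """Hypothetical container: values over a padded time axis plus metadata."""
    values: List[float]
    seq_len: int
    pad_value: Optional[float] = None  # None means "padding content unknown"


def ensure_padding(x: Padded, wanted: float) -> Padded:
    """Return x with padded frames set to `wanted`, skipping the work if known."""
    if x.pad_value == wanted:
        return x  # already masked with the right value, nothing to do
    vals = [v if t < x.seq_len else wanted for t, v in enumerate(x.values)]
    return Padded(vals, x.seq_len, wanted)


def reduce_sum(x: Padded) -> float:
    return sum(ensure_padding(x, 0.0).values)  # sum wants 0 in the padding


def reduce_max(x: Padded) -> float:
    return max(ensure_padding(x, -math.inf).values)  # max wants -inf


def dot(a: Padded, b: Padded) -> float:
    # Zeros in one input suffice, assuming the other input's padding is finite.
    if a.pad_value != 0.0 and b.pad_value != 0.0:
        a = ensure_padding(a, 0.0)
    return sum(x * y for x, y in zip(a.values, b.values))


x = Padded([1.0, 2.0, 9.0], seq_len=2)                 # padding unknown
w = Padded([0.5, 0.5, 0.0], seq_len=2, pad_value=0.0)  # e.g. softmax output
```

With this, `dot(w, x)` skips the extra masking because `w` is tagged as zero-padded, while `dot(x, x)` masks once. Whether carrying this extra metadata around is worth the complexity is exactly the open question.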
Originally posted by @albertz in #391 (comment)