DotLayer, mask for dynamic axis #631

Merged (10 commits into master, Sep 7, 2021)
Conversation

albertz (Member) commented on Sep 4, 2021

Fix/implement #629.

albertz (Member, Author) commented on Sep 5, 2021

I was planning to add padding_value to Data (as discussed in #391). However, I then realized that the padding might differ per dynamic dim, so it would rather be a dict mapping dim tag -> value.

Also, padding_value in Data has the problem that most existing code would just carry it over as-is (via copy or get_kwargs), which would then be incorrect. So it would be better the other way around, i.e. it should be set explicitly. But should it still be carried over, e.g. in copy? Even there, this might break the logic when placeholder gets reassigned, except maybe if we reset it on every placeholder assignment. I'm not sure.

Or we attach it to the tf.Tensor object directly, like we do for other things. This is probably much easier and also safer w.r.t. the behavior we want.
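
To make the idea concrete, here is a minimal sketch of what attaching such info to the tensor could look like, assuming graph-mode tf.Tensor objects (which accept extra Python attributes). The helper and attribute names are hypothetical, not RETURNN's actual API:

```python
import tensorflow as tf


def set_padding_info(tensor, padding_values):
  """Attach per-dynamic-dim padding values to a (graph-mode) tf.Tensor.

  :param tf.Tensor tensor: e.g. a layer output placeholder
  :param dict padding_values: dim tag (or axis) -> scalar padding value
  """
  # Graph-mode tf.Tensor objects are plain Python objects, so extra
  # attributes can be attached and travel together with the tensor.
  # The attribute name here is purely illustrative.
  tensor._illustrative_padding_values = dict(padding_values)


def get_padding_info(tensor):
  """Return the attached padding info, or None if nothing is known."""
  return getattr(tensor, "_illustrative_padding_values", None)
```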

Zettelkasten (Member) commented:

> Or we attach it to the tf.Tensor object directly, like we do for other things. This is probably much easier and also safer w.r.t. the behavior we want.

I was about to argue for something entirely different, but I think this is the cleanest solution overall.
I think it's important to distinguish the tasks between get_out_data_from_opts and __init__ of a layer.
get_out_data_from_opts should really construct the template only (perhaps get_out_template_from_opts would be a better name?), and __init__ then the placeholder (and stuff that is related to the placeholder).
If padding_values were a direct property of Data, it would not be immediately clear whether it should be set in get_out_data_from_opts or in __init__. Conceptually, __init__ makes more sense I think, because this is really not part of the template, it's part of the placeholder itself. So making this a property of the tensor makes that very clear.
Another option would be to add this property to LayerBase instead, just like we do right now for allow_inf_in_output. But that's inconvenient (the utility methods in Data couldn't automatically set it). I'd also argue that allow_inf_in_output should rather be attached to the placeholder tensor itself, for the same reasons.

One problem we maybe have when attaching this to the tensor itself is that two Data objects might want to share the same placeholder but have different padding_values. For example, if the user themselves is sure that some layer output is masked but RETURNN cannot figure this out, they might want to introduce a layer (or extend ReinterpretDataLayer) to update this padding_values info. Such a layer would then need a tf.identity op, but that's fine I think, because this all happens after template construction, within __init__.
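
A rough sketch of such an update step, continuing the hypothetical attribute convention from the sketch above: tf.identity yields a new tf.Tensor object, so attaching different info to it does not clash with the metadata on the original tensor.

```python
import tensorflow as tf


def with_updated_padding_info(tensor, padding_values):
  """Return a copy of `tensor` that carries different padding metadata.

  tf.identity creates a new tf.Tensor object (a no-op in the graph),
  so attaching new info here leaves the original tensor's info intact.
  The attribute name is illustrative only.
  """
  new_tensor = tf.identity(tensor)
  new_tensor._illustrative_padding_values = dict(padding_values)
  return new_tensor
```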

albertz (Member, Author) commented on Sep 6, 2021

> One problem we maybe have when attaching this to the tensor itself is that two Data objects might want to share the same placeholder but have different padding_values. For example, if the user themselves is sure that some layer output is masked but RETURNN cannot figure this out, they might want to introduce a layer (or extend ReinterpretDataLayer) to update this padding_values info.

I don't think it makes sense that a tensor can have different padding values (per dyn dim tag).
(For simplicity, let's assume there is only a single dyn dim tag, like in a standard [B,T,D] tensor.)
The padding value is either undefined/unknown, or known to be a specific value. It cannot really be multiple different values at once.
It might make sense that the user can explicitly set this information (as you say, maybe via ReinterpretDataLayer or another new layer). But the logic should be: when it is unset, set it; when it is already set, assert that it is the same. Again with the reasoning that it cannot be something different.
Or a layer like SeqLenMaskLayer will explicitly overwrite the padding anyway, and thus also set the padding value.
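
A sketch of both behaviors, again with the hypothetical attribute name from above; the mask function assumes a [B,T,D] float tensor and is not RETURNN's actual SeqLenMaskLayer code:

```python
import tensorflow as tf


def set_or_check_padding_value(tensor, dim_tag, value):
  """When the padding value for `dim_tag` is unset, set it; when it is
  already set, assert that it is the same value (it cannot differ)."""
  info = getattr(tensor, "_illustrative_padding_values", None)
  if info is None:
    info = tensor._illustrative_padding_values = {}
  if dim_tag not in info:
    info[dim_tag] = value
  else:
    assert info[dim_tag] == value, (
      "conflicting padding values %r vs %r for %r" % (info[dim_tag], value, dim_tag))


def apply_seq_len_mask(x, seq_lens, padding_value=0.0):
  """Overwrite all frames beyond seq_lens with `padding_value`,
  so the padding value of the result is known by construction.

  :param tf.Tensor x: assumed shape [B, T, D]
  :param tf.Tensor seq_lens: shape [B]
  """
  mask = tf.sequence_mask(seq_lens, maxlen=tf.shape(x)[1])  # [B, T], bool
  mask = tf.broadcast_to(mask[:, :, None], tf.shape(x))     # [B, T, D]
  pad = tf.fill(tf.shape(x), tf.cast(padding_value, x.dtype))
  return tf.where(mask, x, pad)
```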

albertz marked this pull request as ready for review on Sep 7, 2021, 22:00
albertz requested a review from a team as a code owner on Sep 7, 2021, 22:00
albertz merged commit 8f5142b into master on Sep 7, 2021
albertz deleted the albert-dot-mask branch on Sep 7, 2021, 22:14
albertz (Member, Author) commented on Sep 7, 2021

Already merged now, but we might still want to change this in some way.
