DotLayer, mask for dynamic axis #631

Merged (10 commits into master, Sep 7, 2021)
Conversation

albertz (Member) commented on Sep 4, 2021

Fix/implement #629.

albertz (Member, Author) commented on Sep 5, 2021

I was planning to add padding_value to Data (as discussed in #391). However, I then realized that the padding might differ per dynamic dim, so it would rather be a dict mapping dim tag -> value.

Also, padding_value in Data has the problem that most existing code would just carry it over as-is (via copy or get_kwargs), which would then be incorrect. So it would be better the other way around, i.e. it should be set explicitly. But should it still be carried over, e.g. in copy? Even there, this might break the logic when placeholder gets reassigned, except maybe if we reset it on every placeholder assignment. I'm not sure.

Or we attach it to the tf.Tensor object directly, like we do for other things. This is probably much easier and also safer w.r.t. the behavior we want.
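
To make the idea concrete, here is a minimal sketch of what attaching such info to the tensor could look like, assuming graph-mode tf.Tensor objects (which accept extra Python attributes). The helper and attribute names are hypothetical, not RETURNN's actual API:

```python
import tensorflow as tf


def set_padding_info(tensor, padding_values):
  """Attach per-dynamic-dim padding values to a (graph-mode) tf.Tensor.

  :param tf.Tensor tensor: e.g. a layer output placeholder
  :param dict padding_values: dim tag (or axis) -> scalar padding value
  """
  # Graph-mode tf.Tensor objects are plain Python objects, so extra
  # attributes can be attached and travel together with the tensor.
  # The attribute name here is purely illustrative.
  tensor._illustrative_padding_values = dict(padding_values)


def get_padding_info(tensor):
  """Return the attached padding info, or None if nothing is known."""
  return getattr(tensor, "_illustrative_padding_values", None)
```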

Zettelkasten (Member) commented:

> Or we attach it to the tf.Tensor object directly, like we do for other things. This is probably much easier and also safer w.r.t. the behavior we want.

I was about to argue for something entirely different, but I think this is the cleanest solution overall.
I think it's important to distinguish the tasks between get_out_data_from_opts and __init__ of a layer.
get_out_data_from_opts should really construct the template only (perhaps get_out_template_from_opts would be a better name?), and __init__ then the placeholder (and stuff that is related to the placeholder).
If padding_values were a direct property of Data, it would not be immediately clear whether it should be set in get_out_data_from_opts or in __init__. Conceptually, __init__ makes more sense I think, because this is really not part of the template, it's part of the placeholder itself. So making this a property of the tensor makes that very clear.
Another option would be to add this property to LayerBase instead, just like we do right now for allow_inf_in_output. But that's inconvenient (the utility methods in Data couldn't automatically set it). I'd also argue that allow_inf_in_output should rather be attached to the placeholder tensor itself, for the same reasons.

One problem we maybe have when attaching this to the tensor itself is that two Data objects might want to share the same placeholder but have different padding_values. For example, if the user themselves is sure that some layer output is masked but RETURNN cannot figure this out, they might want to introduce a layer (or extend ReinterpretDataLayer) to update this padding_values info. Such a layer would then need a tf.identity op, but that's fine I think, because this all happens after template construction, within __init__.
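
A rough sketch of such an update step, continuing the hypothetical attribute convention from the sketch above: tf.identity yields a new tf.Tensor object, so attaching different info to it does not clash with the metadata on the original tensor.

```python
import tensorflow as tf


def with_updated_padding_info(tensor, padding_values):
  """Return a copy of `tensor` that carries different padding metadata.

  tf.identity creates a new tf.Tensor object (a no-op in the graph),
  so attaching new info here leaves the original tensor's info intact.
  The attribute name is illustrative only.
  """
  new_tensor = tf.identity(tensor)
  new_tensor._illustrative_padding_values = dict(padding_values)
  return new_tensor
```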

albertz (Member, Author) commented on Sep 6, 2021

> One problem we maybe have when attaching this to the tensor itself is that two Data objects might want to share the same placeholder but have different padding_values. For example, if the user themselves is sure that some layer output is masked but RETURNN cannot figure this out, they might want to introduce a layer (or extend ReinterpretDataLayer) to update this padding_values info.

I don't think it makes sense that a tensor can have different padding values (per dyn dim tag).
(For simplicity, let's assume there is only a single dyn dim tag, like in a standard [B,T,D] tensor.)
The padding value is either undefined/unknown, or known to be a specific value. It cannot really be multiple different values at once.
It might make sense that the user can explicitly set this information (as you say, maybe via ReinterpretDataLayer or another new layer). But the logic should be: when it is unset, set it; when it is already set, assert that it is the same. Again with the reasoning that it cannot be something different.
Or a layer like SeqLenMaskLayer will explicitly overwrite the padding anyway, and thus also set the padding value.
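
A sketch of both behaviors, again with the hypothetical attribute name from above; the mask function assumes a [B,T,D] float tensor and is not RETURNN's actual SeqLenMaskLayer code:

```python
import tensorflow as tf


def set_or_check_padding_value(tensor, dim_tag, value):
  """When the padding value for `dim_tag` is unset, set it; when it is
  already set, assert that it is the same value (it cannot differ)."""
  info = getattr(tensor, "_illustrative_padding_values", None)
  if info is None:
    info = tensor._illustrative_padding_values = {}
  if dim_tag not in info:
    info[dim_tag] = value
  else:
    assert info[dim_tag] == value, (
      "conflicting padding values %r vs %r for %r" % (info[dim_tag], value, dim_tag))


def apply_seq_len_mask(x, seq_lens, padding_value=0.0):
  """Overwrite all frames beyond seq_lens with `padding_value`,
  so the padding value of the result is known by construction.

  :param tf.Tensor x: assumed shape [B, T, D]
  :param tf.Tensor seq_lens: shape [B]
  """
  mask = tf.sequence_mask(seq_lens, maxlen=tf.shape(x)[1])  # [B, T], bool
  mask = tf.broadcast_to(mask[:, :, None], tf.shape(x))     # [B, T, D]
  pad = tf.fill(tf.shape(x), tf.cast(padding_value, x.dtype))
  return tf.where(mask, x, pad)
```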

albertz marked this pull request as ready for review on Sep 7, 2021, 22:00
albertz requested a review from a team as a code owner on Sep 7, 2021, 22:00
albertz merged commit 8f5142b into master on Sep 7, 2021
albertz deleted the albert-dot-mask branch on Sep 7, 2021, 22:14
albertz (Member, Author) commented on Sep 7, 2021

Already merged now, but we might still want to change this in some way.
