
Specify expected output shape for verification #706

Closed
albertz opened this issue Oct 1, 2021 · 12 comments · Fixed by #757

Comments

@albertz
Member

albertz commented Oct 1, 2021

This was one result of the discussion in #597, although it is somewhat orthogonal to that issue.

The proposal was to introduce the out_shape option, which is of type set[DimensionTag].
It's a set to emphasize that the order should never matter.
Although for scalars, we might want to allow () as well.
We could potentially also extend this to allow integers (for static dimensions) and strings (e.g. "B", via get_axis_from_description).

This would only be used for verification.
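A rough sketch of how this could look in a layer dict (only illustrative; the option does not exist yet at this point, and the dim tag names are made up):

```python
# Assume batch_dim, time_dim and feat_dim are dim tags (DimensionTag instances)
# which are already defined in the config.
network = {
    "output": {
        "class": "copy", "from": "data",
        # out_shape is a set of dim tags; being a set emphasizes
        # that the order of the axes never matters for the check.
        "out_shape": {batch_dim, time_dim, feat_dim},
    },
}
```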

@albertz
Member Author

albertz commented Oct 21, 2021

Another open question: Should this also include extra (implicit) dimensions from dyn_size_ext of other axes?
(@Zettelkasten maybe you have some opinion on this?)

@albertz
Member Author

albertz commented Oct 21, 2021

Also: Should this include a potential (implicit) sparse dimension (when sparse=True)?
(Related: #722)

@albertz
Member Author

albertz commented Oct 21, 2021

When we add those implicit dimensions (sparse_dim or from dyn_size_ext), would we enforce them to be specified?

Or do we want to have two variants of out_shape here, one which includes the implicit dimensions (maybe call this one out_implicit_shape), and one which only contains those dims which are really present?

@Zettelkasten
Member

Yea, I think being able to specify these implicit dims too would be good.
In general, I think this should essentially specify the things that Data.__repr__ contains too.

When we add those implicit dimensions (sparse_dim or from dyn_size_ext), would we enforce them to be specified?

Or do we want to have two variants of out_shape here, one which includes the implicit dimensions (maybe call this one out_implicit_shape), and one which only contains those dims which are really present?

I'm not sure; probably it's not too annoying to have to explicitly specify all implicit dimensions. They aren't used too frequently anyway, and are easy to specify.

Apart from including implicit dims somehow, I think it's important to also mark them as being virtual.
E.g. like you proposed by having the out_shape / out_implicit_shape distinction, but repeating the non-implicit dims in out_implicit_shape seems annoying. Or would out_implicit_shape not include the non-implicit dims? (maybe then the name seems confusing?)

I think I like it more if you only have one shape. Just like you would write it out on paper or in comments - it should be relatively simple to understand and intuitive.
For the sparse_dim, it is anyway explicit that this dim is virtual.
For implicit dims from dyn_size_ext, maybe we can mark them as virtual (e.g. out_shape={batch_dim, trg_dim, Virtual(src_dim)}).
Or, adapt the syntax we currently have for the Data.__repr__ output, something like out_shape={batch_dim, trg_dim[batch_dim, src_dim]} by overloading the index operator.
With this approach, you could just skip specifying dyn_size_ext if you feel like it.
Or, in returnn-common, we could even parse any arbitrary string, and then map this to whatever RETURNN expects here. (But then you don't get IDE type hints and so on, which is annoying.)
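A toy sketch of the two notations mentioned above, with stand-in classes just to make the idea concrete (neither Virtual nor the index-operator overload exists; all names are assumptions):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Dim:
    """Stand-in for a RETURNN dim tag, only for illustrating the syntax."""
    name: str
    def __getitem__(self, dyn_size_dims):
        # Variant B: trg_dim[batch_dim, src_dim] would mark that this dim has a
        # dyn_size_ext over (batch_dim, src_dim), mirroring the Data.__repr__ notation.
        return (self, dyn_size_dims)

@dataclass(frozen=True)
class Virtual:
    """Variant A: explicit wrapper marking a dim as implicit/virtual."""
    dim: Dim

batch_dim, src_dim, trg_dim = Dim("B"), Dim("src"), Dim("trg")

out_shape_a = {batch_dim, trg_dim, Virtual(src_dim)}    # variant A
out_shape_b = {batch_dim, trg_dim[batch_dim, src_dim]}  # variant B
```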

@albertz
Member Author

albertz commented Oct 21, 2021

Another aspect:

Now that we are discussing multiple options (not just out_shape), maybe we can actually reuse or extend the existing out_type, which serves a very similar purpose. Currently, out_type is used in some rare cases to define the shape (e.g. when you use x: {class: eval, from: [prev:x], ...}, it must be defined somehow), but otherwise for verification (although currently many layers just ignore it).

The options of out_type are supposed to be compatible with Data, which maybe makes it a bit unintuitive (batch_shape vs shape, etc.).

So the proposed out_shape is kind of equivalent to using out_type: {"dim_tags": out_shape}, although not exactly once we allow mixing in integers. But we could extend or relax out_type.

But maybe it is simpler for the user to have it as a separate option out_shape.
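Illustratively, the rough correspondence could look like this (assuming batch_dim, time_dim, feat_dim are existing dim tags; out_shape is still only the proposal):

```python
out_shape = {batch_dim, time_dim, feat_dim}

# Proposed new option:
layer_a = {"class": "copy", "from": "data", "out_shape": out_shape}

# Roughly the same intent via the existing out_type option, whose keys
# mirror the Data constructor (as said above, not an exact equivalence):
layer_b = {"class": "copy", "from": "data", "out_type": {"dim_tags": out_shape}}
```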

@albertz
Member Author

albertz commented Nov 9, 2021

I think we agreed that when implicit dimensions are specified, they should be marked clearly as implicit somehow. E.g. like:

out_implicit_shape = {batch, time, Implicit(classes)}

To clarify:

Do we want out_shape without implicit dimensions? Or with implicit dimensions? (Optional implicit dimensions in out_shape would not make sense.)

Do we want out_implicit_shape in addition, with implicit dimensions?

Do we want some other option to separately specify implicit dimensions? Maybe out_implicit_dims? This would not need the Implicit(...) from above.

So maybe out_shape without implicit dimensions, and out_implicit_dims? Both are optional anyway.

Or also additionally out_implicit_shape? Or is this too much?

Or only out_implicit_shape? But then we could also rename it to out_shape and require it to include the implicit dimensions as well.
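To make these variants concrete, hypothetical config snippets for a sparse output (batch, time, classes are assumed dim tags; none of the option names is final):

```python
# (a) out_shape with only the real dims, implicit dims in a separate option:
layer = {"class": "copy", "from": "data",
         "out_shape": {batch, time},
         "out_implicit_dims": {classes}}

# (b) a single option which also lists the implicit dims, explicitly marked:
layer = {"class": "copy", "from": "data",
         "out_implicit_shape": {batch, time, Implicit(classes)}}
```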

@Zettelkasten
Member

Zettelkasten commented Nov 9, 2021

I would want to make it as easy as possible to specify these explicit and implicit dims for the user.
So in particular, I would avoid that you need to specify an implicit dim twice. Instead, I prefer

  1. just having one big out_shape with both explicit + implicit dims, but implicit ones marked as Implicit(dim_tag) or so.
  2. Or maybe have out_shape + out_implicit_dims (which both are disjoint sets).

For Returnn Common, I would still find it very cool to use type annotations to specify this out_shape. Then, something like solution (1) makes more sense I think. But Returnn Common can of course map this to whatever format Returnn expects.
But for Returnn itself, where you specify the layer as a dict, I think (2) is slightly nicer? With out_implicit_dims=set() being the default?

Do we want to force the user to specify implicit dims if they specify out_shape? I tend to think yes (if you think about recurrent self attention e.g., it's super important that the keys tensor has an implicit query-time dim. I think, if I were to write this down on paper, I would also include it always.)
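For illustration, the self-attention case mentioned above could look roughly like this (all dim names are made up, and the Implicit(...) marking from option 1 is assumed):

```python
# Inside a rec loop, the accumulated keys have an implicit dim over the query
# (decoder) positions, via the dyn_size_ext of the accumulated key/value time dim.
keys_out_shape = {batch_dim, kv_time_dim, key_feat_dim, Implicit(query_time_dim)}
```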

@albertz
Member Author

albertz commented Nov 11, 2021

I would want to make it as easy as possible to specify these explicit and implicit dims for the user. So in particular, I would avoid that you need to specify an implicit dim twice. Instead, I prefer

  1. just having one big out_shape with both explicit + implicit dims, but implicit ones marked as Implicit(dim_tag) or so.
  2. Or maybe have out_shape + out_implicit_dims (which both are disjoint sets).

Yes. That's basically also what I suggested in my last post. So, which of these do you prefer? The second variant is slightly more flexible, as a user might just define out_shape and not out_implicit_dims. In the first variant, the user always has to specify both. Although, maybe that's a good thing?

For Returnn Common, I would still find it very cool to use type annotations to specify this out_shape. Then, something like solution (1) makes more sense I think.

It's still early in its design phase. If you have some specific suggestion, please open some issue in returnn_common. Very related is this: rwth-i6/returnn_common#17

But Returnn Common can of course map this to whatever format Returnn expects.

Yes.

But for Returnn itself, where you specify the layer as a dict, I think (2) is slightly nicer? With out_implicit_dims=set() being the default?

No, the default would be None or NotSpecified, which implies not checking this. A set() would imply that there are no implicit dims.

Do we want to force the user to specify implicit dims if they specify out_shape? I tend to think yes (if you think about recurrent self attention e.g., it's super important that the keys tensor has an implicit query-time dim. I think, if I were to write this down on paper, I would also include it always.)

If we did that, then I think we should just have one single out_implicit_shape. Or we could also rename that to out_shape for simplicity and include the implicit dims in there as well (but specifically marked via implicit or Implicit or so).

Yes, I think this is good.

Implicit would then be a class which wraps a DimensionTag? The hash would be something like hash((type(self), self.dim_tag)), and __eq__ accordingly.

Should we distinguish between sparse dims and dyn-size-ext implicit dims? So then have one _ImplicitBase, and derived ImplicitSparseDim(...) and ImplicitDynSizeDim(...)?
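A minimal sketch of what these wrapper classes could look like, following the names in this thread (not the actual RETURNN implementation):

```python
class _ImplicitBase:
    """Wraps a dim tag which is only implicitly present in a tensor's shape."""

    def __init__(self, dim_tag):
        self.dim_tag = dim_tag

    def __hash__(self):
        # Include the type so that e.g. ImplicitSparseDim(d) != ImplicitDynSizeDim(d).
        return hash((type(self), self.dim_tag))

    def __eq__(self, other):
        return type(self) is type(other) and self.dim_tag == other.dim_tag

    def __repr__(self):
        return "%s(%r)" % (type(self).__name__, self.dim_tag)


class ImplicitSparseDim(_ImplicitBase):
    """Implicit dim coming from sparse_dim (sparse=True)."""


class ImplicitDynSizeDim(_ImplicitBase):
    """Implicit dim coming from the dyn_size_ext of another dim."""
```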

@albertz
Member Author

albertz commented Nov 12, 2021

One other small thing, just to mention explicitly: The options we describe here are always for the canonical cases, not taking any optimization or other things into account. When RETURNN does some automatic optimization (e.g. moving layers out of a rec layer), the layers should maybe adapt the corresponding options (including out_shape or so), similar to how DotLayer.transform_config_dict does it.

@albertz
Member Author

albertz commented Nov 14, 2021

So, out_shape would contain all the dims, implicit and explicit. And implicit dims are specially marked, via ImplicitSparseDim or ImplicitDynSizeDim.

I wonder whether we should maybe make this special marking optional. So e.g. for some sparse tensor, you could specify both {batch, time, ImplicitSparseDim(classes)} or also {batch, time, classes}.

This could maybe allow for future optimizations which make some dims implicit, or maybe other variants, like the dim is still there but more compressed, or so.

Not sure. (I think I also had other reasons in mind, although I cannot remember right now.)

@albertz
Member Author

albertz commented Nov 14, 2021

Another related thing: The batch dim is kind of special, and can itself contain multiple dims (all handled in BatchInfo, called "virtual dims"). It could also e.g. be a packed version of the time dim. Or it can be a beam. And maybe other things.

I wonder if out_shape should also handle that in some way or not. The beam, or a packed dyn dim, or other things embedded inside the batch could also be interpreted as other implicit dims? Or maybe not...

It's also a question about the equality of dim tags (#634). When you specify batch, does this match just any batch dim (no matter whether it has a beam, or whatever)? Currently, all dim tags of kind batch are treated as equal to each other, ignoring this. The beam search logic anyway handles this externally, such that all inputs to some layer get automatically converted to use the same beam, so this logic makes sure that the batch dim is indeed the same in the end. Although this would not handle other things embedded in the batch dim...

@albertz
Member Author

albertz commented Nov 22, 2021

Another thing to clarify: Without enforced unique dim tags (#632), it could happen that the set over dim tags (or implicit dim tags) collapses multiple dims. However, we want out_shape to be specified as a set (unless it is empty, where anything else is fine as well), and thus it cannot really handle that case (or rather, it might hide one of the dims).

My opinion is that this is fine anyway, and then later enforced unique dim tags (#632) would anyway fix this.
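A toy illustration of the collapsing (plain strings stand in for dim tags here):

```python
# Two axes share the same dim tag (no enforced unique dim tags yet, see #632).
dim_tags = ["time", "time", "feature"]
out_shape = set(dim_tags)  # -> {"time", "feature"}: one of the duplicated dims is hidden
assert len(dim_tags) == 3 and len(out_shape) == 2
```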

albertz added commits that referenced this issue Nov 22, 2021