
Specify expected output shape for verification #706

Closed
albertz opened this issue Oct 1, 2021 · 12 comments · Fixed by #757

Comments

@albertz
Member

albertz commented Oct 1, 2021

This was one result of the discussion in #597, although it is somewhat orthogonal to that issue.

The proposal was to introduce the out_shape option, which is of type set[DimensionTag].
It's a set to emphasize that the order should never matter.
Although for scalars, we might want to allow () as well.
We could potentially also extend this to allow integers (for static dimensions) and strings (e.g. "B", via get_axis_from_description).

This would only be used for verification.
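A rough sketch of how this could look in a layer dict (only illustrative; the option does not exist yet at this point, and the dim tag names are made up):

```python
# Assume batch_dim, time_dim and feat_dim are dim tags (DimensionTag instances)
# which are already defined in the config.
network = {
    "output": {
        "class": "copy", "from": "data",
        # out_shape is a set of dim tags; being a set emphasizes
        # that the order of the axes never matters for the check.
        "out_shape": {batch_dim, time_dim, feat_dim},
    },
}
```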

@albertz
Member Author

albertz commented Oct 21, 2021

Another open question: Should this also include extra (implicit) dimensions from dyn_size_ext of other axes?
(@Zettelkasten maybe you have some opinion on this?)

@albertz
Member Author

albertz commented Oct 21, 2021

Also: Should this include a potential (implicit) sparse dimension (when sparse=True)?
(Related: #722)

@albertz
Member Author

albertz commented Oct 21, 2021

When we add those implicit dimensions (sparse_dim or from dyn_size_ext), would we enforce them to be specified?

Or do we want to have two variants of out_shape here, one which includes the implicit dimensions (maybe call this one out_implicit_shape), and one which only contains those dims which are really present?

@Zettelkasten
Member

Yea, I think being able to specify these implicit dims too would be good.
In general, I think this should essentially specify the things that Data.__repr__ contains too.

When we add those implicit dimensions (sparse_dim or from dyn_size_ext), would we enforce them to be specified?

Or do we want to have two variants of out_shape here, one which includes the implicit dimensions (maybe call this one out_implicit_shape), and one which only contains those dims which are really present?

I'm not sure; probably it's not too annoying to have to explicitly specify all implicit dimensions. They aren't used too frequently anyway, and are easy to specify.

Apart from including implicit dims somehow, I think it's important to also mark them as being virtual.
E.g. like you proposed by having the out_shape / out_implicit_shape distinction, but repeating the non-implicit dims in out_implicit_shape seems annoying. Or would out_implicit_shape not include the non-implicit dims? (maybe then the name seems confusing?)

I think I like it more if you only have one shape. Just like you would write it out on paper or in comments - it should be relatively simple to understand and intuitive.
For the sparse_dim, it is anyway explicit that this dim is virtual.
For implicit dims from dyn_size_ext, maybe we can mark them as virtual (e.g. out_shape={batch_dim, trg_dim, Virtual(src_dim)}).
Or, adapt the syntax we currently have for the Data.__repr__ output, something like out_shape={batch_dim, trg_dim[batch_dim, src_dim]} by overloading the index operator.
With this approach, you could just skip specifying dyn_size_ext if you feel like it.
Or, in returnn-common, we could even parse any arbitrary string, and then map this to whatever RETURNN expects here. (But then you don't get IDE type hints and so on, which is annoying.)
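A toy sketch of the two notations mentioned above, with stand-in classes just to make the idea concrete (neither Virtual nor the index-operator overload exists; all names are assumptions):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Dim:
    """Stand-in for a RETURNN dim tag, only for illustrating the syntax."""
    name: str
    def __getitem__(self, dyn_size_dims):
        # Variant B: trg_dim[batch_dim, src_dim] would mark that this dim has a
        # dyn_size_ext over (batch_dim, src_dim), mirroring the Data.__repr__ notation.
        return (self, dyn_size_dims)

@dataclass(frozen=True)
class Virtual:
    """Variant A: explicit wrapper marking a dim as implicit/virtual."""
    dim: Dim

batch_dim, src_dim, trg_dim = Dim("B"), Dim("src"), Dim("trg")

out_shape_a = {batch_dim, trg_dim, Virtual(src_dim)}    # variant A
out_shape_b = {batch_dim, trg_dim[batch_dim, src_dim]}  # variant B
```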

@albertz
Member Author

albertz commented Oct 21, 2021

Another aspect:

Now that we are discussing multiple options (not just out_shape), maybe we can actually reuse or extend the existing out_type, which serves a very similar purpose. Currently, out_type is used in some rare cases to define the shape (e.g. when you use x: {class: eval, from: [prev:x], ...}, it must be defined somehow), but otherwise for verification (although currently many layers just ignore it).

The options of out_type are supposed to be compatible with Data, which maybe makes it a bit unintuitive (batch_shape vs shape, etc.).

So the proposed out_shape is kind of equivalent to using out_type: {"dim_tags": out_shape}, although not exactly once we allow mixing in integers. But we could extend or relax out_type.

But maybe it is simpler for the user to have it as a separate option out_shape.
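Illustratively, the rough correspondence could look like this (assuming batch_dim, time_dim, feat_dim are existing dim tags; out_shape is still only the proposal):

```python
out_shape = {batch_dim, time_dim, feat_dim}

# Proposed new option:
layer_a = {"class": "copy", "from": "data", "out_shape": out_shape}

# Roughly the same intent via the existing out_type option, whose keys
# mirror the Data constructor (as said above, not an exact equivalence):
layer_b = {"class": "copy", "from": "data", "out_type": {"dim_tags": out_shape}}
```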

@albertz
Member Author

albertz commented Nov 9, 2021

I think we agreed that when implicit dimensions are specified, they should be marked clearly as implicit somehow. E.g. like:

out_implicit_shape = {batch, time, Implicit(classes)}

To clarify:

Do we want out_shape without implicit dimensions? Or with implicit dimensions? (Optional implicit dimensions in out_shape would not make sense.)

Do we want out_implicit_shape in addition, with implicit dimensions?

Do we want some other option to separately specify implicit dimensions? Maybe out_implicit_dims? This would not need the Implicit(...) from above.

So maybe out_shape without implicit dimensions, and out_implicit_dims? Both are optional anyway.

Or also additionally out_implicit_shape? Or is this too much?

Or only out_implicit_shape? But then we could also rename it to out_shape and require it to include the implicit dimensions as well.
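To make these variants concrete, hypothetical config snippets for a sparse output (batch, time, classes are assumed dim tags; none of the option names is final):

```python
# (a) out_shape with only the real dims, implicit dims in a separate option:
layer = {"class": "copy", "from": "data",
         "out_shape": {batch, time},
         "out_implicit_dims": {classes}}

# (b) a single option which also lists the implicit dims, explicitly marked:
layer = {"class": "copy", "from": "data",
         "out_implicit_shape": {batch, time, Implicit(classes)}}
```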

@Zettelkasten
Member

Zettelkasten commented Nov 9, 2021

I would want to make it as easy as possible to specify these explicit and implicit dims for the user.
So in particular, I would avoid that you need to specify an implicit dim twice. Instead, I prefer

  1. just having one big out_shape with both explicit + implicit dims, but implicit ones marked as Implicit(dim_tag) or so.
  2. Or maybe have out_shape + out_implicit_dims (which both are disjoint sets).

For Returnn Common, I would still find it very cool to use type annotations to specify this out_shape. Then, something like solution (1) makes more sense I think. But Returnn Common can of course map this to whatever format Returnn expects.
But for Returnn itself, where you specify the layer as a dict, I think (2) is slightly nicer? With out_implicit_dims=set() being the default?

Do we want to force the user to specify implicit dims if they specify out_shape? I tend to think yes (if you think about recurrent self attention e.g., it's super important that the keys tensor has an implicit query-time dim. I think, if I were to write this down on paper, I would also include it always.)
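For illustration, the self-attention case mentioned above could look roughly like this (all dim names are made up, and the Implicit(...) marking from option 1 is assumed):

```python
# Inside a rec loop, the accumulated keys have an implicit dim over the query
# (decoder) positions, via the dyn_size_ext of the accumulated key/value time dim.
keys_out_shape = {batch_dim, kv_time_dim, key_feat_dim, Implicit(query_time_dim)}
```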

@albertz
Member Author

albertz commented Nov 11, 2021

I would want to make it as easy as possible to specify these explicit and implicit dims for the user. So in particular, I would avoid that you need to specify an implicit dim twice. Instead, I prefer

  1. just having one big out_shape with both explicit + implicit dims, but implicit ones marked as Implicit(dim_tag) or so.
  2. Or maybe have out_shape + out_implicit_dims (which both are disjoint sets).

Yes. That's basically also what I suggested in my last post. So, which of these do you prefer? The second variant is slightly more flexible, as a user might just define out_shape and not out_implicit_dims. In the first variant, the user always has to specify both. Although, maybe that's a good thing?

For Returnn Common, I would still find it very cool to use type annotations to specify this out_shape. Then, something like solution (1) makes more sense I think.

It's still early in its design phase. If you have some specific suggestion, please open some issue in returnn_common. Very related is this: rwth-i6/returnn_common#17

But Returnn Common can of course map this to whatever format Returnn expects.

Yes.

But for Returnn itself, where you specify the layer as a dict, I think (2) is slightly nicer? With out_implicit_dims=set() being the default?

No, the default would be None or NotSpecified, which implies not checking this. A set() would imply that there are no implicit dims.

Do we want to force the user to specify implicit dims if they specify out_shape? I tend to think yes (if you think about recurrent self attention e.g., it's super important that the keys tensor has an implicit query-time dim. I think, if I were to write this down on paper, I would also include it always.)

If we did that, then I think we should just have one single out_implicit_shape. Or we could also rename that to out_shape for simplicity and include the implicit dims in there as well (but specifically marked via implicit or Implicit or so).

Yes, I think this is good.

Implicit would then be a class which wraps a DimensionTag? The hash would be something like hash((type(self), self.dim_tag)), and __eq__ accordingly.

Should we distinguish between sparse dims and dyn-size-ext implicit dims? So then have one _ImplicitBase, and derived ImplicitSparseDim(...) and ImplicitDynSizeDim(...)?
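A minimal sketch of what these wrapper classes could look like, following the names in this thread (not the actual RETURNN implementation):

```python
class _ImplicitBase:
    """Wraps a dim tag which is only implicitly present in a tensor's shape."""

    def __init__(self, dim_tag):
        self.dim_tag = dim_tag

    def __hash__(self):
        # Include the type so that e.g. ImplicitSparseDim(d) != ImplicitDynSizeDim(d).
        return hash((type(self), self.dim_tag))

    def __eq__(self, other):
        return type(self) is type(other) and self.dim_tag == other.dim_tag

    def __repr__(self):
        return "%s(%r)" % (type(self).__name__, self.dim_tag)


class ImplicitSparseDim(_ImplicitBase):
    """Implicit dim coming from sparse_dim (sparse=True)."""


class ImplicitDynSizeDim(_ImplicitBase):
    """Implicit dim coming from the dyn_size_ext of another dim."""
```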

@albertz
Member Author

albertz commented Nov 12, 2021

One other small thing, just to mention explicitly: The options we describe here are always for the canonical cases, not taking any optimization or other things into account. When RETURNN does some automatic optimization (e.g. moving layers out of a rec layer), the layers should maybe adapt the corresponding options (including out_shape or so), similar to how DotLayer.transform_config_dict does it.

@albertz
Member Author

albertz commented Nov 14, 2021

So, out_shape would contain all the dims, implicit and explicit. And implicit dims are specially marked, via ImplicitSparseDim or ImplicitDynSizeDim.

I wonder whether we should maybe make this special marking optional. So e.g. for some sparse tensor, you could specify both {batch, time, ImplicitSparseDim(classes)} or also {batch, time, classes}.

This could maybe allow for future optimizations which make some dims implicit, or maybe other variants, like the dim is still there but more compressed, or so.

Not sure. (I think I also had other reasons in mind, although I cannot remember right now.)

@albertz
Member Author

albertz commented Nov 14, 2021

Another related thing: The batch dim is kind of special, and can itself contain multiple dims (all handled in BatchInfo, called "virtual dims"). It could also e.g. be a packed version of the time dim. Or it can be a beam. And maybe other things.

I wonder if out_shape should also handle that in some way or not. The beam, or a packed dyn dim, or other things embedded inside the batch could also be interpreted as other implicit dims? Or maybe not...

It's also a question about the equality of dim tags (#634). When you specify batch, does this match just any batch dim (no matter whether it has a beam, or whatever)? Currently, all dim tags of kind batch are treated as equal to each other, ignoring this. The beam search logic anyway handles this externally, such that all inputs to some layer get automatically converted to use the same beam, so this logic makes sure that the batch dim is indeed the same in the end. Although this would not handle other things embedded in the batch dim...

@albertz
Member Author

albertz commented Nov 22, 2021

Another thing to clarify: Without enforced unique dim tags (#632), it could happen that the set over dim tags (or implicit dim tags) collapses multiple dims. However, we want out_shape to be specified as a set (unless it is empty, where anything else is fine as well), and thus it cannot really handle that case (or rather, it might hide one of the dims).

My opinion is that this is fine anyway, and then later enforced unique dim tags (#632) would anyway fix this.
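A toy illustration of the collapsing (plain strings stand in for dim tags here):

```python
# Two axes share the same dim tag (no enforced unique dim tags yet, see #632).
dim_tags = ["time", "time", "feature"]
out_shape = set(dim_tags)  # -> {"time", "feature"}: one of the duplicated dims is hidden
assert len(dim_tags) == 3 and len(out_shape) == 2
```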

albertz added commits that referenced this issue Nov 22, 2021