Ideas for RETURNN behaviour changes #508

JackTemaki · 2021-05-19T14:55:28Z

As there are now many different ways (as in parameter names, syntax styles etc...) to make RETURNN do the same things, it might be a good idea to add a behaviour_version parameter to prohibit using "deprecated" parameters and suboptimal behaviour of RETURNN. It can also be used to change the default behaviour of some layers which is currently suboptimal, such as the batch norm layers. With each desired change (e.g. commit or PR) to the behaviour, the version number would have to be increased by one.

Here are some things we already recommend to avoid according to the documentation, e.g.:

num_inputs and num_outputs should not be used in favor of extern_data
using the "dict" syntax for extern_data instead of the tuple syntax
prohibit using old theano parameters such as "multiprocessing" when working with tensorflow
prohibit direct optimizer parameters, so writing optimizer = {'class': "adam"} or maybe optimizer="adam" instead of adam=True
always use explicit input and target definitions (no default fallback to "data" or "classes")

Then there are other redundant parameters which I can not name explicitely, but I think we have:

overlapping/redundant parameters names for the dataset (e.g. eval versus search_data or so)
learning_rate vs learning_rates

And then we have sub-optimal default parameters, that should be changed or not have a default in the first place, e.g.:

batch norm layer parameters
ExtractAudioFeatures should have no defaults, so that the feature settings are clearly visible in the config
no default lstm unit for rec layer

Other things I would personally recommend doing in configs, e.g.:

always provide an explicit available_for_inference for the extern_data entries
always use seq_order_control_dataset within a MetaDataset
not using out_type for any layer except maybe eval layer. Although here I am not sure if this has any major implications. Maybe there are some networks with rather instable construction and you want to have this fixed by hand.

There is probably a lot more which needs to be collected/discussed, so this is just intended to give some initial ideas.

The text was updated successfully, but these errors were encountered:

albertz · 2021-05-19T15:34:18Z

As discussed, this is just an example list. For every individual potential change, we should make a separate issue, and discuss things there. Every such issue should use the potential-new-behavior tag (and maybe reference this issue here).

This issue here probably can be closed once we have a first version of behaviour_version.
The setup of behaviour_version can probably be done in a similar way as use_tensorflow.
(Btw, use_tensorflow is also a default we can change by this.)

JackTemaki · 2021-05-21T09:03:10Z

Here is also a full list of all possible parameters:
https://gist.github.com/JackTemaki/b8c23b8eba61e4951ea22561f30e54a1

Many of those only belong to the Theano backend, and can possibly be ignored

albertz · 2021-05-21T09:07:45Z

I would just start by one single change (e.g. num_inputs/num_outputs + disallowing tuples) and introduce behaviour_version=1 for that.

Also, I would prefer to keep each increment of behaviour_version mostly for a single logical change, and not mix up different things (eg optimizer and num_inputs are totally independent of each other, should not be mixed up).

albertz · 2021-05-21T09:31:09Z

Btw, as this came up now with a proposed change on enforcing available_for_inference, but I want to put this more general:

In principle, I would say that it should be easy for most users to use their existing setups and increase behaviour_version, and if they have followed good practice and our general recommendations, it would all still work (maybe just get better, by better random init defaults or so).

I'm not sure on changes which would break the majority of setups. E.g. enforcing available_for_inference is such an example. I think really no-one explicitly specifies this (unless really needed, when the default is wrong). (Let's not discuss available_for_inference specifically here but leave that to a separate issue.)

So for a user which wants to keep up by using always the latest behaviour_version, how simple should that be for him/her? Should we still try to keep this simple, and avoid breaking changes unless for very good reasons? But maybe this just has to be discussed and decided for each change individually, and there is no generic answer.

albertz · 2021-05-21T09:34:20Z

Ah, another thing: We should have a dedicated documentation for this. Where users can read what is being changed for each behavior_version. (Maybe similar as the changelog.)

albertz · 2021-05-21T11:45:54Z

Btw, one other idea, maybe should be discussed now before we start working on any of this:

Instead of having behavior_version (an integer), maybe we should explicitly (or additionally) have flags for each change?

I wonder for the case of a Sisyphus setup which consists of many experiments, where maybe some older experiments use some older behavior_version, and some others use a newer behavior_version. And when Sisyphus interacts with these configs, some interactions might then also depend on the behavior_version (e.g. random example: the search job needs to use search_data, and sth changed there). But then this might clash. Sisyphus is just an example. There might be other cases maybe (not sure how often, or how difficult it is to deal with that). If we also have explicit flags, we can maybe overwrite some of this, like behavior_enable = {flag_x} or so.

Just an idea. Maybe can also be added later.

JackTemaki · 2021-05-21T14:23:35Z

This is not what I meant. I meant that e.g. some job, e.g. search job, might require some specific behavior

Ah okay, no I do not think this will ever become a problem. And if for some reason this should ever happen, the job can just get an assert that the behaviour_version needs to be higher and thats it.

albertz · 2021-05-25T19:46:10Z

Another aspect: Sometimes (or even often), the discussion will not be about behavior change, but just about removing some functionality, in favor of some better way.

We don't strictly need to disallow this then. We can just put a deprecation warning with information what to use instead.

The deprecation warning should probably be printed in any case, no matter the behavior_version. So the question is just, should sth strictly be disallowed with a new behavior_version or is the deprecation warning enough?

Maybe this again has to be decided just individually per case.

JackTemaki · 2021-05-26T13:21:14Z

I would prefer to strictly disallow this with new behavior versions, because warnings are often missed.

This is of course no reason to not add warnings to everything "deprecated" even before adding something as behavior version.

I think this is what we discuss in each issue: should we put only a warning, or should this be added as new behavior version and then be strictly enforced.

albertz · 2021-11-09T13:28:45Z

I think all the open questions here have been clarified, and we are already having the first couple of behavior versions. So we can close this.

This is required to ensure that layers can reorder axes in whatever way to allow for potential optimizations. Nothing should depend on the order of axes. See the [RETURNN principles](https://github.com/rwth-i6/returnn/wiki/RETURNN-principles). This is also for #792 to allow for an easier transition. This introduces a new behavior version (#508). While it probably requires changes for many configs, the changes should still be quite simple.

Fix #787. Introduces new behavior version (#508).

Fix #666. New behavior version 11 (#508).

Fix #522 This introduces a new behavior version (#508).

albertz · 2022-01-18T16:31:34Z

PyTorch might introduce sth similar (specifically for their param init defaults): pytorch/pytorch#41638

torch.nn.init.set_init_version('1.7.1')

Edit They decided against it:

It doesn't look to me like this would fully work, so I'd be inclined to close it, even if I still think that this is one of PyTorch's murkier bits.

A check on matching time dim of RecLayer sub output layer to the RecLayer time dim. Fix #1140 This introduces a new behavior version 13 (#508).

JackTemaki added the potential-new-behavior Discussions about RETURNN behaviour label May 19, 2021

albertz mentioned this issue May 20, 2021

Subnetwork: add construction warning #506

Closed

JackTemaki closed this as completed May 21, 2021

JackTemaki reopened this May 21, 2021

This comment has been minimized.

Sign in to view

JackTemaki mentioned this issue May 21, 2021

No default epsilon of 1e-16 for all optimizers #525

Open

albertz mentioned this issue May 25, 2021

SubnetworkLayer: deprecate from with concat_sources=False #531

Open

albertz closed this as completed Nov 9, 2021

albertz added a commit that referenced this issue Dec 7, 2021

ConvLayer, with_bias=True by default

e132494

Fix #787. Introduces new behavior version (#508).

albertz mentioned this issue Dec 7, 2021

ConvLayer, with_bias=True by default #842

Merged

albertz added a commit that referenced this issue Dec 7, 2021

ConvLayer, with_bias=True by default (#842)

4ac9998

Fix #787. Introduces new behavior version (#508).

albertz mentioned this issue Dec 16, 2021

Fix #666, broadcast no longer matches dims #864

Merged

albertz pushed a commit that referenced this issue Dec 17, 2021

Fix #666, broadcast no longer matches dims (#864)

af7a588

Fix #666. New behavior version 11 (#508).

albertz added a commit that referenced this issue Jan 6, 2022

batch norm, new defaults

f663348

Fix #522 This introduces a new behavior version (#508).

albertz mentioned this issue Jan 6, 2022

batch norm, new defaults #899

Merged

albertz added a commit that referenced this issue Jan 6, 2022

batch norm, new defaults

6a46ae9

Fix #522 This introduces a new behavior version (#508).

albertz added a commit that referenced this issue Jan 6, 2022

batch norm, new defaults (#899)

fe591de

Fix #522 This introduces a new behavior version (#508).

albertz mentioned this issue Apr 13, 2022

Allophone State Fsa bug for arc's weight leaving silence rwth-i6/rasr#14

Merged

albertz mentioned this issue Oct 13, 2022

Warning: assuming dim tags are same with different size placeholders - make exception with new behavior version #1141

Open

albertz added a commit that referenced this issue Oct 13, 2022

Recover old relaxed behavior, strict with new behavior version

f239e08

A check on matching time dim of RecLayer sub output layer to the RecLayer time dim. Fix #1140 This introduces a new behavior version 13 (#508).

albertz mentioned this issue Oct 13, 2022

Recover relaxed behavior, strict with new behavior version #1144

Merged

albertz added a commit that referenced this issue Oct 13, 2022

Recover relaxed behavior, strict with new behavior version (#1144)

f38f5ae

A check on matching time dim of RecLayer sub output layer to the RecLayer time dim. Fix #1140 This introduces a new behavior version 13 (#508).

This was referenced Oct 14, 2022

learning_rate_control_error_measure should match exactly #1146

Open

Change warning for placeholder size to exception #1143

Open

DotLayer fix incorrect dim matching #1155

Merged

albertz mentioned this issue Nov 14, 2022

Behavior version? rwth-i6/returnn_common#247

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Ideas for RETURNN behaviour changes #508

Ideas for RETURNN behaviour changes #508

JackTemaki commented May 19, 2021

albertz commented May 19, 2021

JackTemaki commented May 21, 2021 •

edited

Loading

albertz commented May 21, 2021 •

edited by JackTemaki

Loading

albertz commented May 21, 2021

albertz commented May 21, 2021

albertz commented May 21, 2021

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

JackTemaki commented May 21, 2021

albertz commented May 25, 2021

JackTemaki commented May 26, 2021

albertz commented Nov 9, 2021

albertz commented Jan 18, 2022 •

edited

Loading

Ideas for RETURNN behaviour changes #508

Ideas for RETURNN behaviour changes #508

Comments

JackTemaki commented May 19, 2021

albertz commented May 19, 2021

JackTemaki commented May 21, 2021 • edited Loading

albertz commented May 21, 2021 • edited by JackTemaki Loading

albertz commented May 21, 2021

albertz commented May 21, 2021

albertz commented May 21, 2021

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

JackTemaki commented May 21, 2021

albertz commented May 25, 2021

JackTemaki commented May 26, 2021

albertz commented Nov 9, 2021

albertz commented Jan 18, 2022 • edited Loading

JackTemaki commented May 21, 2021 •

edited

Loading

albertz commented May 21, 2021 •

edited by JackTemaki

Loading

albertz commented Jan 18, 2022 •

edited

Loading