
Allophone State Fsa bug for arc's weight leaving silence #14

Merged: 8 commits, Dec 2, 2022

Conversation

@Marvin84 (Contributor) commented Apr 13, 2022

Given a static allophone Finite State Acceptor (FSA), the ClassicTransducerBuilder applies the transition penalties on the arcs. The FSA at this level makes use of epsilon arcs in order to distinguish between different word ends, including silence. In the current implementation, any arc coming after an epsilon arc receives the speech forward penalty. This means that after trimming the epsilon arcs, the transition penalty for leaving the silence and final speech states is always the respective exit penalty plus speech forward. This should be corrected in the doExit() and doForward() functions in order to correctly apply the silence forward penalty when leaving silence.
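For illustration, a hypothetical numeric sketch of the effect (not code from this PR; the TDP values are made up, and in negative-log space the semiring extend() is an addition and invert() a negation):

#include <cassert>

int main() {
    // Hypothetical TDP values in negative-log space.
    double speechForward  = 3.0;   // speech forward penalty
    double silenceForward = 10.0;  // silence forward penalty
    double silenceExit    = 20.0;  // silence exit penalty

    // Current (buggy) behavior: the arc after the epsilon arc always gets
    // the speech forward penalty, so leaving silence costs
    // silenceExit + speechForward.
    double buggyLeaveSilence = silenceExit + speechForward;  // 23.0

    // Intended behavior: subtract the wrongly applied speech forward and
    // apply the silence forward instead, giving silenceExit + silenceForward.
    double correctedLeaveSilence = buggyLeaveSilence - speechForward + silenceForward;  // 30.0

    assert(correctedLeaveSilence == silenceExit + silenceForward);
    return 0;
}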

@Marvin84 self-assigned this Apr 13, 2022
@curufinwe (Contributor):

We don't use this much anymore, but does this code also work with skip connections? What if we have state-repetition set to 2 (because you work with phon0 here, but there is also phon1)?

@Marvin84 closed this Apr 13, 2022
@Marvin84 reopened this Apr 13, 2022
@Marvin84 (Author) commented Apr 13, 2022

> What if we have state-repetition set to 2 (because you work with phon0 here, but there is also phon1)?

As far as I can see here, this is a transition-specific state type; it is not about the distinction between different state labels, but just speech and silence. There is actually no use of phone1 in any part of the code.
[Edit] current.weights here is actually silence. I need to check which state type is used on the epsilon arc, since this is the additional weight that we need to subtract.

@Marvin84 (Author):

> but does this code also work with skip connections?

Good point, this should also be corrected for the skip, where we have a similar problem. I will do the changes.

@albertz (Member) commented Apr 13, 2022

Should we introduce an option to enable the new, fixed behavior? I assume people who upgrade RASR would not expect that the behavior is suddenly different from before.

In RETURNN, we introduced the "behavior version" (rwth-i6/returnn#508) such that the default settings can be reasonable when people use a recent behavior version while not breaking any older setup which uses an older behavior version. I think RASR doesn't have such a concept yet. Not sure if we want to introduce it (but this should maybe be a separate discussion).

In any case, I think the behavior should not just suddenly change, even if the old behavior was somehow wrong. For other things, we have also always introduced options (@Marvin84, remember the other options regarding FSA stuff).

@Marvin84 (Author):

> the behavior should not just suddenly change

Here, in my opinion, we are not dealing with a behavior change due to the introduction of a new feature or functionality, but with a pure algorithmic bug. Surely, everybody should be notified about this, but continuing to use the 'old behavior' means applying a transition model that is incorrect.

@albertz (Member) commented Apr 13, 2022

> ... but continuing to use the 'old behavior' means applying a transition model that is incorrect.

Yes. But we also have this case for some of the other options I introduced before.

"incorrect" is maybe also a strong word. Maybe "unexpected" is better. Because it "worked" already before (since many years).

People who would update RASR now suddenly might even get worse results when they don't check and maybe adapt the config?

@Marvin84 (Author):

"incorrect" is maybe also a strong word. Maybe "unexpected" is better.

Leaving the silence state with the probability of speech forward is not unexpected, it is just incorrect.

@Marvin84 (Author):

> People who would update RASR now suddenly might even get worse results when they don't check and maybe adapt the config?

Yes, this should be tested. But this is independent of fixing the bug.

@albertz (Member) commented Apr 13, 2022

> > People who would update RASR now suddenly might even get worse results when they don't check and maybe adapt the config?
>
> Yes, this should be tested. But this is independent of fixing the bug.

Well, I personally would not introduce some new changed behavior (even when the old behavior was "wrong") without having an explicit option to enable the new behavior.

At least in RETURNN, we are very strict about never breaking old setups or changing the behavior of old setups. I think this is a good policy in general, also for RASR. But maybe @curufinwe, @michelwi, or also @ZhouW321 have different opinions on this? @JackTemaki maybe?

@@ -532,11 +532,25 @@ class TransitionModel::Applicator {
    void doExit(const StackItem& current, const Fsa::Arc* ra) {
        require(alphabet_->isDisambiguator(ra->input()));
        verify(!applyExitTransitionToFinalStates_);
        Fsa::Weight correctedWeight;
        if (current.emission == silenceLabel_){
            //we need to correct for the algorithmic bug introduced by epsilon arcs in case of exit,

Member:

The comment style is inconsistent with the remaining comment style in this file. I think there should be a space after //.

        if (current.emission == silenceLabel_){

Member:

There should be a space before { (consistency).

            StateType silType = TransitionModel::silence;
            Fsa::Weight speechFwdToSubtract = t_->semiring()->invert(Fsa::Weight((*transitionModel_[speechType])[StateTransitionModel::forward]));
            Fsa::Weight correctedTdp = t_->semiring()->extend(ra->weight(), speechFwdToSubtract);
            correctedWeight = t_->semiring()->extend(correctedTdp, Fsa::Weight((*transitionModel_[silType])[StateTransitionModel::forward]));

Member:

I wonder if this is the best place to have this. (It's long ago that I worked with this code, so I don't really remember anymore.) For example, can't we directly do this when we introduce the eps arcs? It sounds a bit like we need this here as a workaround for another bug.

@Marvin84 (Author), Apr 13, 2022:

Yes, this solution is kind of hacky, because it corrects for this specific algorithmic solution and this specific FSA.
The clean solution would be to extend the State struct to also carry the emission label of the previous arc. But this would be an even bigger change and would probably need a separate module.
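A minimal sketch of what such an extended state could look like (hypothetical names, not the code of this PR):

struct StackItemWithContext {
    Fsa::StateId state;            // state in the automaton being traversed
    Fsa::LabelId emission;         // emission label of the current arc
    Fsa::LabelId previousEmission; // emission label of the arc we came from
};

With the previous emission available, doExit() and doForward() could directly choose between the silence and speech forward penalties instead of correcting the weight after the fact.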

Author:

> because you work with phon0 here

I just saw that the statetype(Fsa::LabelId emission) function here in this class also distinguishes only between silence and phone0.

@Marvin84 (Author) commented Apr 17, 2022

Hi everybody,
Following @albertz's suggestion, I am proposing a less hacky version, where a proper FSA implementation with context is provided. Furthermore, I introduced a flag that decides between the default and the new behavior.
Looking forward to your suggestions.

@JackTemaki (Contributor):

In my opinion, as we were previously not using this particular version of RASR in experiments, I think it is fine to directly make the new behavior the default.

@albertz (Member) commented May 27, 2022

> In my opinion, as we were previously not using this particular version of RASR in experiments, I think it is fine to directly make the new behavior the default.

I don't understand this argument. We would expect that all versions of RASR behave the same, or not? Otherwise it complicates everything to switch over to another RASR version.

@JackTemaki (Contributor):

> > In my opinion, as we were previously not using this particular version of RASR in experiments, I think it is fine to directly make the new behavior the default.
>
> I don't understand this argument. We would expect that all versions of RASR behave the same, or not? Otherwise it complicates everything to switch over to another RASR version.

I do not consider them to behave the same. It only complicates things if this affects not only full-sum training, because in that case we anyway have multiple different versions of RASR. Also, changes to RASR are very rare (the last algorithmic change was more than one year ago), so we could even just start (or rather continue) giving version numbers to RASR itself.

@Marvin84 (Author):

I already changed the first implementation proposed in the original pull request to have a flag that selects the new or old behavior. I would appreciate it if you could give feedback on the specific implementation. @curufinwe @JackTemaki

"fix-tdp-leaving-epsilon-arc",
"Calls a different applicator that keeps track of the emission label at previous step "
"This solved the inconsistent transition model application due to epsilon arcs ",
false);

Contributor:

What I meant in my comments is changing this to true.

Member:

I very much disagree here. I think there should never be behavior changes between RASR versions. Users should be able to switch or update RASR versions without being afraid that this would break their setup.

I have no idea how you get to this idea that it should be the other way around.

Author:

I think I already mentioned that the old behavior is incorrect on the theoretical level. We already tested the new code, and in practice we get the same WER. I am fine with having this parameter false, but it should be clear that the default behavior does not correspond to a proper transition model when leaving silence states.

Contributor:

> I have no idea how you get to this idea that it should be the other way around

Again, I am stating that in my opinion we should (in this specific case) change this flag to default true:

  • It does not affect our regular hybrid training (if it does in a relevant manner, then surely it should be false).
  • It is (as Tina said) definitely a wrong implementation, not only a different one, so I would consider this a bug-fix and not a behavioural change.
  • I do not know of anyone using the GitHub master branch of RASR in experiments, and we are currently in the process of making the master usable for i6, so I would want the correct behavior as default when I switch.
  • We have multiple incompatible RASR versions, and the master branch here will be the one where we try to unify at least some parts of them.

Member:

> It does not affect our regular hybrid training

I thought it does affect realignments, full-sum training, and maybe also search?

> It is (as Tina said) definitely a wrong implementation, not only a different one, so I would consider this a bug-fix and not a behavioural change.

A change in behavior is a change in behavior.

> I do not know of anyone using the GitHub master branch of RASR in experiments, and we are currently in the process of making the master usable for i6, so I would want the correct behavior as default when I switch.

I don't understand. I would want that when I switch RASR versions, my setups behave as before, even if the behavior was incorrect before.

> We have multiple incompatible RASR versions, and the master branch here will be the one where we try to unify at least some parts of them.

Yes, and this makes it even more important that people can just switch to the master branch and don't have to expect that their setups will suddenly work differently.

But anyway, I think this is a decision for @curufinwe.

In RETURNN, we definitely also handle it like that: even if there are bugs, we introduce a new behavior version, such that people can always safely stay on the master branch.

Contributor:

I know that the default batch-norm settings are far from "correct" in the sense of what you would expect, and this was fixed with a new "behavior" version, but I don't know of anything else.

Member:

Yes, the batch-norm settings are maybe the most popular example. But see here for a list: https://returnn.readthedocs.io/en/latest/configuration_reference/behavior_version.html

Author:

I took a quick look at the link above. In my opinion, these are all real behavior changes. Also in the case of batch normalization, it seems to me that this is more of a different implementation choice, involving hyperparameters and aspects of complexity optimization. How is this comparable with our example here, where we have a conceptually wrong modeling approach, i.e. an automaton in which the probability of leaving silence equals the probability of leaving speech plus exiting silence?

Member:

It's exactly the same. In all cases, the behavior was wrong before and is now correct. E.g. the batch norm settings before were really totally wrong, conceptually wrong, did not make any sense. The other examples are all similar.

You could also call this option you introduce here just a hyperparameter.

Author:

I see it differently, but I think there is no point in continuing this discussion. The real issue we need to discuss is whether in RASR we can consider similar behavior versioning as in RETURNN. And this should solve the issue.

@@ -641,6 +641,431 @@ Fsa::ConstAutomatonRef TransitionModel::apply(Fsa::ConstAutomatonRef in,
    return ap.apply(in);
}

class TransitionModel::ApplicatorWithContext {

Contributor:

There is a lot of duplication of code in this class. Please refactor this into a common base class and have Applicator and ApplicatorWithContext inherit from it.

@Marvin84 (Author), Jun 8, 2022:

Ok, this might indeed be possible. But still: I am arguing that the Applicator has buggy behavior. Do you actually want to keep the buggy code as an option?

Contributor:

Yes

@@ -121,6 +122,11 @@ public:
            Fsa::LabelId silenceLabel,
            bool applyExitTransitionToFinalStates) const;

    Fsa::ConstAutomatonRef applyWithContext(

Contributor:

Add documentation describing how this function is different from apply.
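A sketch of what such documentation could say, based on the discussion above (hypothetical wording, not from the PR):

/**
 * Like apply(), but uses an applicator that additionally tracks the
 * emission label of the previous arc. This allows applying the silence
 * forward penalty (instead of the speech forward penalty) on arcs that
 * leave silence via epsilon arcs.
 */
Fsa::ConstAutomatonRef applyWithContext(/* ... */);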

@Marvin84 (Author):

With the help of @curufinwe, the architecture refactoring following the implemented logic has been carried out.
@SimBe195 @JackTemaki @curufinwe the current code has already been tested. Please add your review.

"tying-type",
&choiceTyingType,
"type of tying scheme",
global);

Core::Choice Am::TransitionModel::choiceApplicatorType(
"legacy-buggy", LegacyBuggyApplicator,
"corrected", CorrectedApplicator,

Collaborator:

I would maybe just call these "legacy" and "corrected".

Core::ParameterChoice Am::TransitionModel::paramApplicatorType(
        "applicator-type",
        &choiceApplicatorType,
        "The applicator used for adding weights on the FSA, legacy version has a bug",

Collaborator:

A short one-sentence description of what's wrong with the legacy version could be added here.

};

template<class AppState>
class AbstractApplicator : public Applicator {

Collaborator:

Why do we have the split into Applicator and AbstractApplicator? Can they be merged?

Contributor:

This is because AbstractApplicator is a template type and we later on need a single base class in line 899.
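A minimal sketch of this pattern (hypothetical and simplified, not the actual code): the templated applicators share a non-template base so that one pointer type can hold any instantiation.

class Applicator {  // non-template base class
public:
    virtual ~Applicator() = default;
    virtual Fsa::ConstAutomatonRef apply(Fsa::ConstAutomatonRef in) = 0;
};

template<class AppState>
class AbstractApplicator : public Applicator {
    // common implementation, parameterized over the applicator state type
};

// e.g. a std::unique_ptr<Applicator> can then hold any concrete applicator.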

            ap.reset(new StandardApplicator(*this));
            break;
        case CorrectedApplicator:
            ap.reset(new ApplicatorWithContext(*this));

Collaborator:

I don't really like that the ApplicatorTypes refer to classes which are named differently and have names which themselves sound like subclasses of AbstractApplicator. I would perhaps somewhat unify the naming scheme like this:

  • StandardApplicator -> LegacyApplicator
  • LegacyBuggyApplicator -> Legacy or LegacyType
  • "legacy-buggy" -> "legacy"
  • ApplicatorWithContext -> CorrectedApplicator
  • CorrectedApplicator -> Corrected or CorrectedType

That way it's clearer which ones belong together.

Contributor:

I agree with renaming, but would propose that we make the enum scoped and give the enum values the same name as the classes. E.g. we would then have LegacyApplicator and ApplicatorType::LegacyApplicator.
Then the full renaming is:
StandardApplicator -> LegacyApplicator
"legacy-buggy" -> "legacy"
ApplicatorWithContext -> CorrectedApplicator


enum TyingType { global,
                 globalPlusNonWord,
                 cart };
enum ApplicatorType { LegacyBuggyApplicator,

Contributor:

let's make this one a scoped enum:

enum class ApplicatorType {
    LegacyApplicator = 1,
    CorrectedApplicator = 2
};

Contributor:

PS: we then should probably also do the same change for the TyingType to be consistent.


@@ -77,20 +77,23 @@ public:
        phone1,
        nStateTypes
    };
    class Applicator;
    enum TyingType {

Contributor:

These are not scoped enums. Scoped enums are defined by adding either the class or struct keyword after the enum keyword. This is why you could not name the enum values LegacyApplicator and CorrectedApplicator: in an unscoped enum, these names are added to the surrounding scope, which already contains these names for the classes.
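For illustration, a minimal self-contained example (hypothetical, not from this PR):

class TransitionModelExample {  // stand-in for the real class
public:
    class LegacyApplicator {};     // class names live in the enclosing scope
    class CorrectedApplicator {};

    // With a plain (unscoped) enum, the enumerator names would be added to
    // this same scope and hide the class names above, breaking plain
    // references to the classes. A scoped enum keeps the enumerators inside
    // the enum's own scope, so the names can be reused safely:
    enum class ApplicatorType { LegacyApplicator,
                                CorrectedApplicator };
};

// Usage then requires qualification:
// TransitionModelExample::ApplicatorType t = TransitionModelExample::ApplicatorType::LegacyApplicator;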

Author:

Could we please open another pull request for this? This does not have any effect on the config side and can be fixed later.
