
Allophone State Fsa bug for arc's weight leaving silence #14

Merged: 8 commits, Dec 2, 2022

Conversation

@Marvin84 (Contributor) commented Apr 13, 2022

Given a static allophone Finite State Acceptor (FSA), the ClassicTransducerBuilder applies the transition penalties on the arcs. The FSA at this level makes use of epsilon arcs in order to distinguish between different word ends, including silence. In the current implementation, any arc coming after an epsilon arc receives the speech forward penalty. This means that after trimming the epsilon arcs, the transition penalty for leaving the silence and final speech states is always the respective exit penalty plus speech forward. This should be corrected in the doExit() and doForward() functions in order to correctly apply the silence forward penalty when leaving silence.
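For illustration, a hypothetical numeric sketch of the effect (not code from this PR; the TDP values are made up, and in negative-log space the semiring extend() is an addition and invert() a negation):

#include <cassert>

int main() {
    // Hypothetical TDP values in negative-log space.
    double speechForward  = 3.0;   // speech forward penalty
    double silenceForward = 10.0;  // silence forward penalty
    double silenceExit    = 20.0;  // silence exit penalty

    // Current (buggy) behavior: the arc after the epsilon arc always gets
    // the speech forward penalty, so leaving silence costs
    // silenceExit + speechForward.
    double buggyLeaveSilence = silenceExit + speechForward;  // 23.0

    // Intended behavior: subtract the wrongly applied speech forward and
    // apply the silence forward instead, giving silenceExit + silenceForward.
    double correctedLeaveSilence = buggyLeaveSilence - speechForward + silenceForward;  // 30.0

    assert(correctedLeaveSilence == silenceExit + silenceForward);
    return 0;
}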

@Marvin84 self-assigned this Apr 13, 2022
@curufinwe (Contributor):

We don't use this much anymore, but does this code also work with skip connections? What if we have state-repetition set to 2 (because you work with phon0 here, but there is also phon1)?

@Marvin84 closed this Apr 13, 2022
@Marvin84 reopened this Apr 13, 2022
@Marvin84 (Author) commented Apr 13, 2022

> What if we have state-repetition set to 2 (because you work with phon0 here, but there is also phon1)?

As far as I can see here, this is a transition-specific state type; it is not about the distinction between different state labels, but just speech and silence. There is actually no use of phone1 in any part of the code.
[Edit] current.weights here is actually silence. I need to check which state type is used on the epsilon arc, since this is the additional weight that we need to subtract.

@Marvin84 (Author):

> but does this code also work with skip connections?

Good point, this should also be corrected for the skip, where we have a similar problem. I will do the changes.

@albertz (Member) commented Apr 13, 2022

Should we introduce an option to enable the new, fixed behavior? I assume people who upgrade RASR would not expect that the behavior is suddenly different from before.

In RETURNN, we introduced the "behavior version" (rwth-i6/returnn#508) such that the default settings can be reasonable when people use a recent behavior version while not breaking any older setup which uses an older behavior version. I think RASR doesn't have such a concept yet. Not sure if we want to introduce it (but this should maybe be a separate discussion).

In any case, I think the behavior should not just suddenly change, even if the old behavior was somehow wrong. For other things, we have also always introduced options (@Marvin84, remember the other options regarding FSA stuff).

@Marvin84 (Author):

> the behavior should not just suddenly change

Here, in my opinion, we are not dealing with a behavior change due to the introduction of a new feature or functionality, but with a pure algorithmic bug. Surely, everybody should be notified about this, but continuing to use the 'old behavior' means applying a transition model that is incorrect.

@albertz (Member) commented Apr 13, 2022

> ... but continuing to use the 'old behavior' means applying a transition model that is incorrect.

Yes. But we also have this case for some of the other options I introduced before.

"incorrect" is maybe also a strong word. Maybe "unexpected" is better. Because it "worked" already before (since many years).

People who would update RASR now suddenly might even get worse results when they don't check and maybe adapt the config?

@Marvin84 (Author):

"incorrect" is maybe also a strong word. Maybe "unexpected" is better.

Leaving the silence state with the probability of speech forward is not unexpected, it is just incorrect.

@Marvin84 (Author):

> People who would update RASR now suddenly might even get worse results when they don't check and maybe adapt the config?

Yes, this should be tested. But this is independent of fixing the bug.

@albertz (Member) commented Apr 13, 2022

> > People who would update RASR now suddenly might even get worse results when they don't check and maybe adapt the config?
>
> Yes, this should be tested. But this is independent of fixing the bug.

Well, I personally would not introduce some new changed behavior (even when the old behavior was "wrong") without having an explicit option to enable the new behavior.

At least in RETURNN, we are very strict about never breaking old setups or changing the behavior of old setups. I think this is a good policy in general, also for RASR. But maybe @curufinwe, @michelwi, or also @ZhouW321 have different opinions on this? @JackTemaki maybe?

@@ -532,11 +532,25 @@ class TransitionModel::Applicator {
    void doExit(const StackItem& current, const Fsa::Arc* ra) {
        require(alphabet_->isDisambiguator(ra->input()));
        verify(!applyExitTransitionToFinalStates_);
        Fsa::Weight correctedWeight;
        if (current.emission == silenceLabel_){
            //we need to correct for the algorithmic bug introduced by epsilon arcs in case of exit,

Member:

The comment style is inconsistent with the remaining comment style in this file. I think there should be a space after //.

        if (current.emission == silenceLabel_){

Member:

There should be a space before { (consistency).

            StateType silType = TransitionModel::silence;
            Fsa::Weight speechFwdToSubtract = t_->semiring()->invert(Fsa::Weight((*transitionModel_[speechType])[StateTransitionModel::forward]));
            Fsa::Weight correctedTdp = t_->semiring()->extend(ra->weight(), speechFwdToSubtract);
            correctedWeight = t_->semiring()->extend(correctedTdp, Fsa::Weight((*transitionModel_[silType])[StateTransitionModel::forward]));

Member:

I wonder if this is the best place to have this. (It's long ago that I worked with this code, so I don't really remember anymore.) For example, can't we directly do this when we introduce the eps arcs? It sounds a bit like we need this here as a workaround for another bug.

@Marvin84 (Author), Apr 13, 2022:

Yes, this solution is kind of hacky, because it corrects for this specific algorithmic solution and this specific FSA.
The clean solution would be to extend the State struct to also carry the emission label of the previous arc. But this would be an even bigger change and would probably need a separate module.
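A minimal sketch of what such an extended state could look like (hypothetical names, not the code of this PR):

struct StackItemWithContext {
    Fsa::StateId state;            // state in the automaton being traversed
    Fsa::LabelId emission;         // emission label of the current arc
    Fsa::LabelId previousEmission; // emission label of the arc we came from
};

With the previous emission available, doExit() and doForward() could directly choose between the silence and speech forward penalties instead of correcting the weight after the fact.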

Author:

> because you work with phon0 here

I just saw that the statetype(Fsa::LabelId emission) function here in this class also distinguishes only between silence and phone0.

@Marvin84 (Author) commented Apr 17, 2022

Hi everybody,
Following @albertz's suggestion, I am proposing a less hacky version, where a proper FSA implementation with context is provided. Furthermore, I introduced a flag that decides between the default and the new behavior.
Looking forward to your suggestions.

@JackTemaki (Contributor):

In my opinion, as we were previously not using this particular version of RASR in experiments, I think it is fine to directly make the new behavior the default.

@albertz (Member) commented May 27, 2022

> In my opinion, as we were previously not using this particular version of RASR in experiments, I think it is fine to directly make the new behavior the default.

I don't understand this argument. We would expect that all versions of RASR behave the same, or not? Otherwise it complicates everything to switch over to another RASR version.

@JackTemaki (Contributor):

> > In my opinion, as we were previously not using this particular version of RASR in experiments, I think it is fine to directly make the new behavior the default.
>
> I don't understand this argument. We would expect that all versions of RASR behave the same, or not? Otherwise it complicates everything to switch over to another RASR version.

I do not consider them to behave the same. It only complicates things if this affects not only full-sum training, because in that case we anyway have multiple different versions of RASR. Also, changes to RASR are very rare (the last algorithmic change was more than one year ago), so we could even just start (or rather continue) giving version numbers to RASR itself.

@Marvin84 (Author):

I already changed the first implementation proposed in the original pull request to have a flag that selects the new or old behavior. I would appreciate it if you could give feedback on the specific implementation. @curufinwe @JackTemaki

"fix-tdp-leaving-epsilon-arc",
"Calls a different applicator that keeps track of the emission label at previous step "
"This solved the inconsistent transition model application due to epsilon arcs ",
false);

Contributor:

What I meant in my comments is changing this to true.

Member:

I very much disagree here. I think there should never be behavior changes between RASR versions. Users should be able to switch or update RASR versions without being afraid that this would break their setup.

I have no idea how you get to this idea that it should be the other way around.

Author:

I think I already mentioned that the old behavior is incorrect on the theoretical level. We already tested the new code, and in practice we get the same WER. I am fine with having this parameter false, but it should be clear that the default behavior does not correspond to a proper transition model when leaving silence states.

Contributor:

> I have no idea how you get to this idea that it should be the other way around

Again, I am stating that in my opinion we should (in this specific case) change this flag to default true:

  • It does not affect our regular hybrid training (if it does in a relevant manner, then surely it should be false).
  • It is (as Tina said) definitely a wrong implementation, not only a different one, so I would consider this a bug-fix and not a behavioural change.
  • I do not know of anyone using the GitHub master branch of RASR in experiments, and we are currently in the process of making the master usable for i6, so I would want the correct behavior as default when I switch.
  • We have multiple incompatible RASR versions, and the master branch here will be the one where we try to unify at least some parts of them.

Member:

> It does not affect our regular hybrid training

I thought it does affect realignments, full-sum training, and maybe also search?

> It is (as Tina said) definitely a wrong implementation, not only a different one, so I would consider this a bug-fix and not a behavioural change.

A change in behavior is a change in behavior.

> I do not know of anyone using the GitHub master branch of RASR in experiments, and we are currently in the process of making the master usable for i6, so I would want the correct behavior as default when I switch.

I don't understand. I would want that when I switch RASR versions, my setups behave as before, even if the behavior was incorrect before.

> We have multiple incompatible RASR versions, and the master branch here will be the one where we try to unify at least some parts of them.

Yes, and this makes it even more important that people can just switch to the master branch and don't have to expect that their setups will suddenly work differently.

But anyway, I think this is a decision for @curufinwe.

In RETURNN, we definitely also handle it like that: even if there are bugs, we introduce a new behavior version, such that people can always safely stay on the master branch.

Contributor:

I know that the default batch-norm settings are far from "correct" in the sense of what you would expect, and this was fixed with a new "behavior" version, but I don't know of anything else.

Member:

Yes, the batch-norm settings are maybe the most popular example. But see here for a list: https://returnn.readthedocs.io/en/latest/configuration_reference/behavior_version.html

Author:

I took a quick look at the link above. In my opinion, these are all real behavior changes. Also in the case of batch normalization, it seems to me that this is more of a different implementation choice, involving hyperparameters and aspects of complexity optimization. How is this comparable with our example here, where we have a conceptually wrong modeling approach, i.e. an automaton in which the probability of leaving silence equals the probability of leaving speech plus exiting silence?

Member:

It's exactly the same. In all cases, the behavior was wrong before and is now correct. E.g. the batch norm settings before were really totally wrong, conceptually wrong, did not make any sense. The other examples are all similar.

You could also call this option you introduce here just a hyperparameter.

Author:

I see it differently, but I think there is no point in continuing this discussion. The real issue we need to discuss is whether in RASR we can consider similar behavior versioning as in RETURNN. And this should solve the issue.

@@ -641,6 +641,431 @@ Fsa::ConstAutomatonRef TransitionModel::apply(Fsa::ConstAutomatonRef in,
    return ap.apply(in);
}

class TransitionModel::ApplicatorWithContext {

Contributor:

There is a lot of duplication of code in this class. Please refactor this into a common base class and have Applicator and ApplicatorWithContext inherit from it.

@Marvin84 (Author), Jun 8, 2022:

Ok, this might indeed be possible. But still: I am arguing that the Applicator has buggy behavior. Do you actually want to keep the buggy code as an option?

Contributor:

Yes

@@ -121,6 +122,11 @@ public:
            Fsa::LabelId silenceLabel,
            bool applyExitTransitionToFinalStates) const;

    Fsa::ConstAutomatonRef applyWithContext(

Contributor:

Add documentation describing how this function is different from apply.
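A sketch of what such documentation could say, based on the discussion above (hypothetical wording, not from the PR):

/**
 * Like apply(), but uses an applicator that additionally tracks the
 * emission label of the previous arc. This allows applying the silence
 * forward penalty (instead of the speech forward penalty) on arcs that
 * leave silence via epsilon arcs.
 */
Fsa::ConstAutomatonRef applyWithContext(/* ... */);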

@Marvin84 (Author):

With the help of @curufinwe, the architecture refactoring following the implemented logic has been carried out.
@SimBe195 @JackTemaki @curufinwe the current code has already been tested. Please add your review.

"tying-type",
&choiceTyingType,
"type of tying scheme",
global);

Core::Choice Am::TransitionModel::choiceApplicatorType(
"legacy-buggy", LegacyBuggyApplicator,
"corrected", CorrectedApplicator,

Collaborator:

I would maybe just call these "legacy" and "corrected".

Core::ParameterChoice Am::TransitionModel::paramApplicatorType(
        "applicator-type",
        &choiceApplicatorType,
        "The applicator used for adding weights on the FSA, legacy version has a bug",

Collaborator:

A short one-sentence description of what's wrong with the legacy version could be added here.

};

template<class AppState>
class AbstractApplicator : public Applicator {

Collaborator:

Why do we have the split into Applicator and AbstractApplicator? Can they be merged?

Contributor:

This is because AbstractApplicator is a template type and we later on need a single base class in line 899.
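A minimal sketch of this pattern (hypothetical and simplified, not the actual code): the templated applicators share a non-template base so that one pointer type can hold any instantiation.

class Applicator {  // non-template base class
public:
    virtual ~Applicator() = default;
    virtual Fsa::ConstAutomatonRef apply(Fsa::ConstAutomatonRef in) = 0;
};

template<class AppState>
class AbstractApplicator : public Applicator {
    // common implementation, parameterized over the applicator state type
};

// e.g. a std::unique_ptr<Applicator> can then hold any concrete applicator.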

            ap.reset(new StandardApplicator(*this));
            break;
        case CorrectedApplicator:
            ap.reset(new ApplicatorWithContext(*this));

Collaborator:

I don't really like that the ApplicatorTypes refer to classes which are named differently and have names which themselves sound like subclasses of AbstractApplicator. I would perhaps somewhat unify the naming scheme like this:

  • StandardApplicator -> LegacyApplicator
  • LegacyBuggyApplicator -> Legacy or LegacyType
  • "legacy-buggy" -> "legacy"
  • ApplicatorWithContext -> CorrectedApplicator
  • CorrectedApplicator -> Corrected or CorrectedType

That way it's clearer which ones belong together.

Contributor:

I agree with renaming, but would propose that we make the enum scoped and give the enum values the same name as the classes. E.g. we would then have LegacyApplicator and ApplicatorType::LegacyApplicator.
Then the full renaming is:
StandardApplicator -> LegacyApplicator
"legacy-buggy" -> "legacy"
ApplicatorWithContext -> CorrectedApplicator


enum TyingType { global,
                 globalPlusNonWord,
                 cart };
enum ApplicatorType { LegacyBuggyApplicator,

Contributor:

let's make this one a scoped enum:

enum class ApplicatorType {
    LegacyApplicator = 1,
    CorrectedApplicator = 2
};

Contributor:

PS: we then should probably also do the same change for the TyingType to be consistent.


@@ -77,20 +77,23 @@ public:
        phone1,
        nStateTypes
    };
    class Applicator;
    enum TyingType {

Contributor:

These are not scoped enums. Scoped enums are defined by adding either the class or struct keyword after the enum keyword. This is why you could not name the enum values LegacyApplicator and CorrectedApplicator: in an unscoped enum, these names are added to the surrounding scope, which already contains these names for the classes.
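For illustration, a minimal self-contained example (hypothetical, not from this PR):

class TransitionModelExample {  // stand-in for the real class
public:
    class LegacyApplicator {};     // class names live in the enclosing scope
    class CorrectedApplicator {};

    // With a plain (unscoped) enum, the enumerator names would be added to
    // this same scope and hide the class names above, breaking plain
    // references to the classes. A scoped enum keeps the enumerators inside
    // the enum's own scope, so the names can be reused safely:
    enum class ApplicatorType { LegacyApplicator,
                                CorrectedApplicator };
};

// Usage then requires qualification:
// TransitionModelExample::ApplicatorType t = TransitionModelExample::ApplicatorType::LegacyApplicator;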

Author:

Could we please open another pull request for this? This does not have any effect on the config side and can be fixed later.
