RangeInAxisLayer, allow axis to be the rec time dim #820
Comments
Well, yeah, that would be nice. But why wouldn't you just use layer
I'm not sure if this maybe makes
My use case would be like this: I have some relatively complicated self-attention code, which I want to work both non-recurrently and recurrently. Currently, for the recurrent case, we simply use
If
because that works no matter whether I'm inside a rec layer or not. But that would then also be (slightly?) less efficient in the non-recurrent case because of this "no-op gather". So maybe this is a bad idea anyway.
Is this a case where the automatic rec optimization breaks it? I.e. it works without rec optimization but then breaks when moved out? Then yes, this must be fixed. But I'm not sure if you have that. Or is this for sharing code between an auto-regressive formulation and a (non-autoregressive) sequence-level formulation? I don't see much of a concern there, as you also have other differences like
This sounds like what you do in auto-regressive self-attention as well. But then when you do the attention, i.e. calculate the weighted sum (or doing matmul, using So now instead of attention, you do something different (e.g. LSTM or so) and then take the last step? But why would you not use (Btw, maybe even a bit better would be to use Isn't the problem just that
Yes, this is the case (this is not an issue with the rec optimizations, but just about this).
Yeah well, conceptually yes. My "attention function" calculation depends not only on a single query, but on all queries (I am aware that this means that rec optimization changes the semantics of my network if I'm not careful).
Using
Ah thanks, good point, yeah, that's a bit dangerous.
I don't think gather layer is part of my problem?
Hm, yeah, that is a bit weird. I would also find it misleading if I will close this issue for now, I think
Similar to CumsumLayer and WindowLayer, we could support this. This would then have the same behavior as RecStepInfoLayer (:i), i.e. RangeInAxisLayer would just return a single int if it operates step-by-step.
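To make the proposed semantics concrete, here is a minimal NumPy sketch. It is not RETURNN code: the helper names `range_in_axis` and `range_in_axis_step` are hypothetical and only model the behavior described above, where the layer yields the full index range when it sees the whole time axis and a single scalar index when it runs inside the loop.

```python
import numpy as np


def range_in_axis(time_len):
    """Sequence-level view (hypothetical helper): one index per frame of the time axis."""
    return np.arange(time_len, dtype=np.int32)


def range_in_axis_step(step):
    """Step-by-step view (hypothetical helper): only the current step index, as a scalar."""
    return np.int32(step)


time_len = 5

# Outside the loop, the whole range is produced at once.
outside_loop = range_in_axis(time_len)

# Inside the loop, each step yields a single int; stacking the steps
# over time recovers the sequence-level result.
inside_loop = np.stack([range_in_axis_step(i) for i in range(time_len)])

assert np.array_equal(outside_loop, inside_loop)
```

If the two views agree in this way, code written against the per-step value could in principle be moved out of the loop without changing its result, which is the property the proposal relies on.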