CumConcatLayer #589

Merged: 2 commits merged into master, Sep 12, 2021
Conversation

albertz (Member) commented Aug 26, 2021

This is for #391.

@albertz albertz mentioned this pull request Aug 26, 2021
@albertz albertz force-pushed the albert-generalized-self-att branch from 265929c to 38ef10b on August 26, 2021 16:14
@albertz albertz marked this pull request as ready for review August 26, 2021 17:39
@albertz albertz force-pushed the albert-generalized-self-att branch 2 times, most recently from 27504f3 to 580fcee on August 26, 2021 21:23
@albertz albertz requested a review from a team as a code owner August 26, 2021 21:23
albertz (Member, Author) commented Aug 26, 2021

Implementing a test case is a bit tricky here.

I cannot use DotLayer at the moment to reduce the new dim tag (see this comment).

I also cannot really use any other layer to reduce the new dim tag, because none of them would handle the extended dynamic size correctly. Also, operations using the extended dynamic size would usually expect all axes of the extended dynamic size (rec_time_dim) to be present, which is not the case inside the loop; only DotLayer would introduce this, as outlined in #391.

I also currently cannot return the CumConcatLayer directly as the output inside the rec layer, because the rec layer logic expects that the output template inside the rec layer becomes the final output with an additional rec_time_dim. This is not true for CumConcatLayer, where rec_time_dim is not added (except in the extended dynamic size).
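
For orientation, here is a minimal sketch (plain TensorFlow, not the RETURNN layer API; all names are hypothetical) of what cumulative concatenation does per decoder step. It shows where the new "accumulated time" dim tag comes from, i.e. the axis that some later layer (such as DotLayer) then has to reduce again:

```python
import tensorflow as tf

def cum_concat_step(accumulated, frame):
    """One step of cumulative concatenation (hypothetical helper, illustration only).

    accumulated: [B, t, D]   frames of steps 0..t-1
    frame:       [B, D]      frame of the current step t
    returns:     [B, t+1, D] frames of steps 0..t
    """
    return tf.concat([accumulated, frame[:, tf.newaxis, :]], axis=1)

# Toy loop: the accumulated-time axis grows by one every step.
acc = tf.zeros([2, 0, 4])                 # batch 2, feature dim 4, empty accumulator
for t in range(3):
    frame = tf.random.normal([2, 4])
    acc = cum_concat_step(acc, frame)
print(acc.shape)                          # (2, 3, 4)
```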

Zettelkasten (Member) left a comment

Looks good! I only had minor comments.
Like you suggested, I think we can then merge this and go from there. EDIT: after we fix the other issues.

Review threads (all outdated, resolved): returnn/tf/util/data.py (×2), returnn/tf/layers/rec.py (×2)
@albertz albertz force-pushed the albert-generalized-self-att branch from f53d68e to 4791acd on August 27, 2021 09:07
albertz (Member, Author) commented Aug 27, 2021

Maybe I will directly implement all the needed things in this PR (as minimal as possible) so that the test can run, i.e. also:

@albertz albertz force-pushed the albert-generalized-self-att branch 3 times, most recently from c9d9d8b to e99be05 on August 27, 2021 11:46
@albertz albertz force-pushed the albert-generalized-self-att branch 2 times, most recently from 2ed67bb to 2384c1b on September 1, 2021 13:09
albertz (Member, Author) commented Sep 1, 2021

I wonder a bit how to best handle such huge PRs (e.g. how to do the cleanup effectively).
I wrote that down here on Reddit as a question, but I am not sure that is the best place to ask such things, and I don't really know of any good place where I could ask them. Any recommendations on where to ask such things?

albertz (Member, Author) commented Sep 1, 2021

I will probably first move everything out into a new branch, except for the two new test cases and the CumConcatLayer itself (which is what this PR is about, plus the further things we need for the test cases to pass), because all the remaining changes should always pass all the tests.

albertz (Member, Author) commented Sep 2, 2021

Wow, this was quite some effort just to get the basics ready so that I can continue working on this itself.

Basically:
#592 #599 #600 #601 #602 #603 #604 #605 #606 #607 #608 #609 #610 #611 #612 #613 #614 #615 #616 #617 #618 #619 #620 #621 #622 #623 #624

albertz and others added 2 commits September 12, 2021 02:40
This is for generalized self attention (#391).
Fixes #391.

Co-authored-by: Frithjof <[email protected]>
albertz (Member, Author) commented Sep 12, 2021

The test case is a complete yet simple test of auto-regressive self-attention, so it covers #391. CumConcatLayer is the final missing piece, so this finally fixes #391 (at least the basic support).
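
For a concrete picture of what such a test exercises, below is a conceptual sketch (plain TensorFlow, not the RETURNN layer API; all names are illustrative) of auto-regressive self-attention built on cumulative concatenation. The accumulated-time axis created at every step is exactly the new dim tag that the final dot-product reduction removes again:

```python
import tensorflow as tf

def self_att_step(q_t, k_acc, v_acc):
    """One auto-regressive self-attention step (conceptual sketch only).

    q_t:   [B, D]      query of the current step
    k_acc: [B, T', D]  keys accumulated over steps 0..t (T' = t + 1)
    v_acc: [B, T', D]  values accumulated the same way
    returns: [B, D]    attention context for the current step
    """
    d_k = tf.cast(tf.shape(k_acc)[-1], tf.float32)
    energy = tf.einsum("bd,btd->bt", q_t, k_acc) / tf.sqrt(d_k)
    weights = tf.nn.softmax(energy, axis=-1)         # softmax over the accumulated-time axis
    return tf.einsum("bt,btd->bd", weights, v_acc)   # reduces the accumulated-time axis again

# Toy loop: keys/values grow by one frame per step, like cumulative concatenation in the rec loop.
B, D, steps = 2, 4, 3
k_acc = tf.zeros([B, 0, D])
v_acc = tf.zeros([B, 0, D])
for t in range(steps):
    x_t = tf.random.normal([B, D])                   # stand-in for the per-step input
    k_acc = tf.concat([k_acc, x_t[:, None, :]], axis=1)
    v_acc = tf.concat([v_acc, x_t[:, None, :]], axis=1)
    context = self_att_step(x_t, k_acc, v_acc)
    print(t, context.shape)                          # (B, D) each step
```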

@albertz albertz merged commit 3ab8667 into master Sep 12, 2021
albertz added a commit that referenced this pull request Sep 12, 2021
This is for generalized self attention (#391).
Fixes #391.

Co-authored-by: Frithjof <[email protected]>
@albertz albertz deleted the albert-generalized-self-att branch September 12, 2021 00:59