Is It Necessary to Transpose Dimensions in Multi-Head Attention or Can We Reshape Directly? #399

Answered by rasbt
rohanwinsor asked this question in Q&A

Hey there,

this is a really good question. At first glance, it looks like this should work, because the resulting dimensions would be the same. But note that reshaping and transposing arrange the elements differently for the matrix multiplication that follows: reshaping just reinterprets the underlying memory layout without moving any elements, whereas transposing actually swaps the axes. So, no, the two are not interchangeable. The same question came up in #167, where the answer gives some more concrete insight.
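To make the difference concrete, here is a minimal NumPy sketch (the shapes are chosen purely for illustration, with 1 batch, 3 tokens, 2 heads, and a head dimension of 4). Both operations produce the same target shape, but only the transpose keeps each head's token sequence together; the reshape mixes tokens and heads.

```python
import numpy as np

# Toy tensor: (batch, num_tokens, num_heads, head_dim)
x = np.arange(24).reshape(1, 3, 2, 4)

# Goal shape for attention: (batch, num_heads, num_tokens, head_dim)
transposed = x.transpose(0, 2, 1, 3)   # swaps the token and head axes
reshaped = x.reshape(1, 2, 3, 4)       # only reinterprets the flat memory layout

# Same shape, different contents:
print(transposed.shape == reshaped.shape)            # True
print(np.array_equal(transposed, reshaped))          # False

# Transpose: head 0, "token 1" really is token 1's head-0 slice.
print(np.array_equal(transposed[0, 0, 1], x[0, 1, 0]))   # True

# Reshape: the same slot instead holds token 0's head-1 slice,
# so tokens from different heads get scrambled together.
print(np.array_equal(reshaped[0, 0, 1], x[0, 0, 1]))     # True
```

Because the subsequent `queries @ keys.transpose(...)` multiplies along the token and head-dim axes per head, feeding it the reshaped tensor would compute attention scores between mismatched token/head slices.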

Anyways, thanks for asking!
