
Question: What are some recommended better alternatives to RPB? #105

Open
fzimmermann89 opened this issue Jun 4, 2024 · 5 comments

@fzimmermann89

First of all, thank you for providing this library.

I want to move a 2D Swin image->image model to neighbourhood attention. So far, I have been using the relative positional embeddings as in the original Swin repo.

Both in the issues and in the documentation of the fused attention, you mention that there will most likely never be an implementation of RPB in the fused kernels, and that there are better alternatives.
... Could you maybe give me some pointers to techniques that, in your experience, work well with neighborhood attention?

Cheers
Felix

@alihassanijr
Member

Thank you; I'm very glad you found it useful.

With regard to RPB, yes, there actually are very good alternatives that bias the inputs to the attention operator instead of attention weights, and they not only provide similar or better accuracy than RPB, but are also easier to train and (usually) cheaper. This is actually what made us not bother with RPB / attention bias, because it usually defeats the purpose of kernel fusion and further bottlenecks an already complicated backwards kernel.

We're going to push out a new preprint in the coming weeks that directly addresses this, and of course everything will be open sourced at that time.
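A minimal sketch of the contrast being described, with toy shapes and a hypothetical bias table rather than NATTEN's actual code: RPB adds a learned, relative-position-indexed bias to the attention logits, which a fused kernel would have to apply (and differentiate through) per attention weight, while input-biasing alternatives transform the queries and keys before the matmul and leave the kernel itself bias-free.

```python
# Toy illustration of "bias the attention weights" (RPB) vs. "bias the inputs".
# Shapes and the bias table are assumptions for this sketch, not NATTEN's code.
import torch

B, H, N, D = 2, 4, 64, 32                   # batch, heads, tokens, head dim
q = torch.randn(B, H, N, D)
k = torch.randn(B, H, N, D)

logits = q @ k.transpose(-2, -1) / D ** 0.5             # [B, H, N, N]

# RPB-style: a learned table indexed by relative position is added to the logits,
# so the bias has to live inside the (fused) attention kernel, forward and backward.
rpb = torch.randn(H, 2 * N - 1)                                       # hypothetical 1-D bias table
rel = torch.arange(N)[:, None] - torch.arange(N)[None, :] + (N - 1)   # relative index in [0, 2N-2]
logits_with_rpb = logits + rpb[:, rel]                                # bias applied to the weights

# Input-biasing alternatives (e.g. rotary embeddings, sketched later in this thread)
# instead modify q and k before the matmul, so the attention kernel stays bias-free.
```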

@zaptrem

zaptrem commented Jul 6, 2024

> Thank you; I'm very glad you found it useful.
>
> With regard to RPB, yes, there actually are very good alternatives that bias the inputs to the attention operator instead of attention weights, and they not only provide similar or better accuracy than RPB, but are also easier to train and (usually) cheaper. This is actually what made us not bother with RPB / attention bias, because it usually defeats the purpose of kernel fusion and further bottlenecks an already complicated backwards kernel.
>
> We're going to push out a new preprint in the coming weeks that directly addresses this, and of course everything will be open sourced at that time.

Are you saying there are existing techniques that are better (in which case, could you name them explicitly so we can use them?), or that you have invented a new one (which you'd understandably like to publish alongside your preprint)?

Also, do these techniques support inference on unseen sequence lengths (like ConvNeXT)? Thanks!

@alihassanijr
Member

Yes, rotary embeddings, if tuned correctly, often outperform RPB, and they are easier to implement and to optimize for performance in a lot of ways.
And I can't speak to ConvNeXt, but we've found that rotary embeddings are also more stable than RPB when dealing with varying sequence lengths.
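As a rough illustration of the rotary-embedding route, here is a standard 1D RoPE sketch applied to the queries and keys before attention; the helper and shapes below are assumptions for the example, not NAT/DiNAT's or NATTEN's actual API. For a 2D image model like the one in the question, RoPE is typically applied separately along the row and column axes, each rotation acting on half of the head channels.

```python
# Minimal rotary embedding (RoPE) sketch in PyTorch; names and shapes are
# illustrative assumptions, not NATTEN's API. Positions are encoded by rotating
# (x1, x2) channel pairs, so relative position falls out of the q·k dot product.
import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary embeddings along the token axis of x: [..., tokens, head_dim]."""
    *_, n, d = x.shape
    half = d // 2
    inv_freq = 1.0 / (base ** (torch.arange(half, dtype=x.dtype) / half))
    angles = torch.arange(n, dtype=x.dtype)[:, None] * inv_freq[None, :]   # [tokens, half]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

q = torch.randn(2, 4, 64, 32)    # [batch, heads, tokens, head_dim]
k = torch.randn(2, 4, 64, 32)
q, k = rope(q), rope(k)          # bias the inputs, then run (neighborhood) attention as usual
```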

@mliuschi

> We're going to push out a new preprint in the coming weeks that directly addresses this, and of course everything will be open sourced at that time.

@alihassanijr Would you happen to have a link to the preprint? I'm also curious to learn more about alternatives to RPB for neighborhood attention. Thanks!

@alihassanijr alihassanijr transferred this issue from SHI-Labs/NATTEN Oct 1, 2024
@alihassanijr alihassanijr reopened this Oct 1, 2024
@alihassanijr
Member

I moved this issue here since it is more related to NAT/DiNAT than to NATTEN.

We'll be updating this thread soon.
