Implement bi-directionality #52

Open
wants to merge 6 commits into main
Conversation

@yair-schiff (Contributor) commented Dec 13, 2023

Edit:

  • Implement bi-directionality by applying the Mamba module twice: (1) to the forward sequence and (2) to the backward (reversed) sequence.
  • Implement 3 strategies (sketched below) for combining the forward / backward Mamba hidden states:
    1. add: Add the states.
    2. concat: Concatenate the states. This doubles the hidden dimension, d_model, which also prevents weight tying between the embedding and lm_head weights.
    3. ew_multiply: Perform element-wise multiplication between the states.
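A minimal sketch of the idea; the BiMambaBlock class name and its constructor arguments are illustrative assumptions, not the PR's actual API:

import torch
import torch.nn as nn
from mamba_ssm import Mamba

class BiMambaBlock(nn.Module):
    """Run Mamba over the forward and the reversed sequence, then combine."""
    def __init__(self, d_model, combine="add"):
        super().__init__()
        assert combine in ("add", "concat", "ew_multiply")
        self.combine = combine
        self.fwd = Mamba(d_model=d_model)
        self.bwd = Mamba(d_model=d_model)

    def forward(self, x):  # x: (batch, length, d_model)
        h_fwd = self.fwd(x)
        # Second pass on the time-reversed sequence, flipped back into the original order.
        h_bwd = self.bwd(x.flip(1)).flip(1)
        if self.combine == "add":
            return h_fwd + h_bwd
        if self.combine == "ew_multiply":
            return h_fwd * h_bwd
        # concat doubles the hidden dimension to 2 * d_model,
        # which is what prevents tying the embedding and lm_head weights.
        return torch.cat([h_fwd, h_bwd], dim=-1)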


@Skylion007 left a comment


Left some nits

mamba_ssm/models/mixer_seq_simple.py (review comment resolved)
@sentialx

What if the sequences have padding? E.g. the input is
[1 2 3 0 0 0]
so the flipped input would be
[0 0 0 3 2 1].
Shouldn't it be
[3 2 1 0 0 0]?

@yair-schiff (Contributor, Author)

@sentialx, agreed. That's a good catch.

@jimmieliu

How does the speed compare to uni-directional?

@yair-schiff (Contributor, Author) commented Jan 3, 2024

How does the speed compare to uni-directional?

@jimmieliu, it's about 2x the uni-directional runtime, since the Mamba module is applied twice.

@albertfgu mentioned this pull request Jan 11, 2024
@pengzhangzhi

@yair-schiff I am just curious, did you solve the issue below?

What if the sequences have padding? E.g. the input is [1 2 3 0 0 0], so the flipped input would be [0 0 0 3 2 1]. Shouldn't it be [3 2 1 0 0 0]?

@pengzhangzhi

I came up with a solution to the padding issue. Say we have a tensor [1,2,3,0,0], where 0 is the padding token. We flip it to get [0,0,3,2,1], pass it through the network, and flip the output back. Because of the double flip, the backward output ends up aligned with the original token order.

given: x
out = x + f(x.flip()).flip()
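A runnable sketch of this double-flip idea, using a plain Mamba layer as f; the shapes and variable names here are my own, for illustration only:

import torch
from mamba_ssm import Mamba

batch, length, dim = 2, 8, 16
f = Mamba(d_model=dim).to("cuda")

x = torch.randn(batch, length, dim).to("cuda")  # stand-in for a right-padded batch

# Flip the sequence dimension, run the backward pass, and flip the output back
# so it lines up with the original token order before the residual add.
out = x + f(x.flip(1)).flip(1)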

@xuanwuji

I came up with a solution to the padding issue. Say we have a tensor [1,2,3,0,0], where 0 is the padding token. We flip it to get [0,0,3,2,1], pass it through the network, and flip the output back. Because of the double flip, the backward output ends up aligned with the original token order.

given: x
out = x + f(x.flip()).flip()

Hi, your approach is clever! But I have a question: if you flip the input to [0,0,3,2,1], does the padding in front affect how the sequence's hidden features are learned? I.e., does it produce a different result (a worse representation of the sequence) than an input of [3,2,1,0,0]?
I don't know enough about this; could you possibly give me some guidance? It would help me a lot. Thank you very much!

@Museum7432 commented Jul 14, 2024

@xuanwuji Well, you can remove the leading paddings by shifting each row of x before flipping x. As for their effect: since the hidden state is initialized to 0, it should still be filled with 0 after scanning through the paddings, so those paddings shouldn't have any effect on the result. However, you can use the following function just to be sure.

import torch

def flip_padded_hidden_states(hidden_states, seq_lens):
    # Flip right-padded sequences so that each row's valid tokens come out
    # reversed at the front; positions past seq_lens[b] end up holding values
    # from other rows and should be ignored/masked.
    # seq_lens: (batch,) tensor of valid lengths on the same device as hidden_states.
    batch_size, seq_len, hidden_dim = hidden_states.shape

    # Flat index of every (batch, position) slot.
    indices = torch.arange(batch_size * seq_len, device=hidden_states.device).reshape(
        batch_size, seq_len
    )

    # Number of trailing padding positions in each row.
    indices_offset = seq_len - seq_lens

    # Shift each row left by its padding count; the modulo keeps indices in range.
    indices = (indices - indices_offset.unsqueeze(1)) % (seq_len * batch_size)

    # Reverse along the sequence dimension.
    indices = indices.flip(1)

    return hidden_states.reshape(batch_size * seq_len, hidden_dim)[indices]
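For example (the seq_lens values here are made up for illustration, running on CPU):

import torch

batch_size, seq_len, hidden_dim = 2, 5, 3
hidden_states = torch.arange(
    batch_size * seq_len * hidden_dim, dtype=torch.float32
).reshape(batch_size, seq_len, hidden_dim)
seq_lens = torch.tensor([3, 5])  # row 0 has 2 trailing padding positions

flipped = flip_padded_hidden_states(hidden_states, seq_lens)
# Row 0: the 3 valid timesteps come out reversed at the front; the last 2
# positions are padding slots whose contents should be ignored or masked.
# Row 1 (no padding): a plain flip along the sequence dimension.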

To check the effect of padding:

import torch
from mamba_ssm import Mamba
from torch.nn import functional as F

batch, length, dim = 2, 64, 16

model = Mamba(
    d_model=dim, # Model dimension d_model
    d_state=16,  # SSM state expansion factor
    d_conv=4,    # Local convolution width
    expand=2,    # Block expansion factor
).to("cuda")

x = torch.randn(batch, length, dim).to("cuda")
# Prepend 4 zero (padding) frames along the sequence dimension.
padded_x = F.pad(x, (0, 0, 4, 0))

y = model(x)
padded_y = model(padded_x)

# Drop the outputs at the 4 padding positions before comparing.
unpadded_y = padded_y[:, 4:]

print(f'Output max diff: {(unpadded_y - y).abs().max().item()}')
print(f'Output mean diff: {(unpadded_y - y).abs().mean().item()}')

However, these errors do accumulate across multiple layers, so you should use the flip_padded_hidden_states function to be certain.
