The definition of WHISPER_AHEADS_LARGE_V3_TURBO is missing from the whisper_alignment_heads_preset enum #2462

Open
ppcfan opened this issue Oct 7, 2024 · 4 comments

Comments


ppcfan commented Oct 7, 2024

whisper.h currently defines only the following presets; there is no entry for the large-v3-turbo model:

enum whisper_alignment_heads_preset {
    WHISPER_AHEADS_NONE,
    WHISPER_AHEADS_N_TOP_MOST, // All heads from the N-top-most text-layers
    WHISPER_AHEADS_CUSTOM,
    WHISPER_AHEADS_TINY_EN,
    WHISPER_AHEADS_TINY,
    WHISPER_AHEADS_BASE_EN,
    WHISPER_AHEADS_BASE,
    WHISPER_AHEADS_SMALL_EN,
    WHISPER_AHEADS_SMALL,
    WHISPER_AHEADS_MEDIUM_EN,
    WHISPER_AHEADS_MEDIUM,
    WHISPER_AHEADS_LARGE_V1,
    WHISPER_AHEADS_LARGE_V2,
    WHISPER_AHEADS_LARGE_V3,
};
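
For reference, a minimal sketch of how the enum could be extended; the name WHISPER_AHEADS_LARGE_V3_TURBO follows the existing naming convention but is not yet part of the upstream header:

enum whisper_alignment_heads_preset {
    ...
    WHISPER_AHEADS_LARGE_V3,
    WHISPER_AHEADS_LARGE_V3_TURBO, // proposed addition for the large-v3-turbo model
};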

lithium0003 commented:

The needed values are:

static const whisper_ahead g_aheads_large_v3_turbo[] = { {2, 4}, {2, 11}, {3, 3}, {3, 6}, {3, 11}, {3, 14} };

static const std::map<whisper_alignment_heads_preset, whisper_aheads> g_aheads {
    ...
    { WHISPER_AHEADS_LARGE_V3_TURBO, { 6, g_aheads_large_v3_turbo } },
};
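
For context, the preset is selected when the whisper context is created. A rough usage sketch follows, assuming the dtw_token_timestamps and dtw_aheads_preset fields of whisper_context_params from recent whisper.h; the model path is only a placeholder:

struct whisper_context_params cparams = whisper_context_default_params();
cparams.dtw_token_timestamps = true;                          // enable DTW token-level timestamps
cparams.dtw_aheads_preset    = WHISPER_AHEADS_LARGE_V3_TURBO; // proposed preset
struct whisper_context * ctx =
    whisper_init_from_file_with_params("ggml-large-v3-turbo.bin", cparams);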

ppcfan (Author) commented Oct 7, 2024

@lithium0003 Thank you so much! Could you explain how you got these values?

lithium0003 commented:

The original source (openai/whisper) says:
https://github.com/openai/whisper/blob/25639fc17ddc013d56c594bfbf7644f2185fad84/whisper/__init__.py#L49

    "large-v3-turbo": b"ABzY8j^C+e0{>%RARaKHP%t(lGR*)0g!tONPyhe`",

https://github.com/openai/whisper/blob/25639fc17ddc013d56c594bfbf7644f2185fad84/whisper/model.py#L278

        array = np.frombuffer(
            gzip.decompress(base64.b85decode(dump)), dtype=bool
        ).copy()

So it can be decoded like this:

>>> import gzip, base64
>>> import numpy as np
>>> dump = b"ABzY8j^C+e0{>%RARaKHP%t(lGR*)0g!tONPyhe`"
>>> array = np.frombuffer(gzip.decompress(base64.b85decode(dump)), dtype=bool)
>>> idx = np.where(array)[0]   # flat indices of the active alignment heads
>>> n_text_head = 20           # heads per text layer in this model
>>> idx_pair = np.array(list(zip(idx // n_text_head, idx % n_text_head)))
>>> idx
array([44, 51, 63, 66, 71, 74])
>>> idx_pair
array([[ 2,  4],
       [ 2, 11],
       [ 3,  3],
       [ 3,  6],
       [ 3, 11],
       [ 3, 14]])
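
That is, each flat index encodes a (text_layer, head) pair as index = text_layer * n_text_head + head; for example, 44 maps to (44 // 20, 44 % 20) = (2, 4), which is exactly the first {2, 4} entry in the snippet above.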


chnbr commented Oct 10, 2024

Is it sufficient to put these lines into the code? I mean, how can we be sure that they are actually used?
So far I have been able to load the v3-turbo model without these lines and it worked; the question is what it really did, though. I think the changes should be incorporated into the repository.
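
As far as I can tell, the alignment heads are only consulted for DTW token-level timestamps, so plain loading and transcription work without the preset. One way to check that it is actually applied is sketched below, under the assumption that recent whisper.h exposes an int64_t t_dtw field in whisper_token_data (it should remain -1 when no DTW timestamp was produced); ctx is a context created with dtw_token_timestamps enabled as in the sketch further above:

// after whisper_full(ctx, ...) has run
for (int i = 0; i < whisper_full_n_segments(ctx); ++i) {
    for (int j = 0; j < whisper_full_n_tokens(ctx, i); ++j) {
        const whisper_token_data td = whisper_full_get_token_data(ctx, i, j);
        printf("%s -> t_dtw = %lld\n", whisper_full_get_token_text(ctx, i, j), (long long) td.t_dtw);
    }
}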
