
whisper : add batched decoding #1486

Merged
merged 14 commits into from
Nov 15, 2023
Conversation

ggerganov
Owner

@ggerganov ggerganov commented Nov 14, 2023

ref #1048

Description

This PR implements efficient batched decoding. With CUDA, decoding with 5 beams is as fast as decoding with 1 beam, so there is likely no reason to ever use 1 beam. With Metal, using more than 1 beam incurs some slowdown, since the Metal kernels do not scale as well with the batch size. Still, it is much faster than what was on master, and the improved transcription quality might be worth it.

Also, this PR:

  • enables beam search with 5 beams by default
  • allows GPU usage to be completely disabled for CUDA builds via CUDA_VISIBLE_DEVICES=-1
  • adds batched decoding speed to the bench tool
  • makes the bench tool display time/tok
  • makes sampling and logits processing multi-threaded
  • changes the temperature step from 0.4f to 0.2f
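The last bullet affects how many retry temperatures the decoder gets before giving up. As a rough sketch (assuming the fallback sweeps from 0 up to 1.0 in increments of the step; this is an illustration of the schedule, not whisper.cpp's exact code), the smaller step doubles the number of fallback attempts:

```python
def fallback_temperatures(step: float, t_max: float = 1.0) -> list[float]:
    """Enumerate retry temperatures from 0 up to t_max in increments of `step`."""
    temps = []
    t = 0.0
    while t <= t_max + 1e-9:  # epsilon guards against float accumulation drift
        temps.append(round(t, 2))
        t += step
    return temps

print(fallback_temperatures(0.2))  # new step: 6 attempts
print(fallback_temperatures(0.4))  # old step: 3 attempts
```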

Tests

V100 bs=1 `master`
WHISPER_CUBLAS=1 make -j && ./main -m ./models-mnt/ggml-large-v2.bin -f ./samples/gb0.wav -bs 1

system_info: n_threads = 4 / 6 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 1 | COREML = 0 | OPENVINO = 0 | 

main: processing './samples/gb0.wav' (2037760 samples, 127.4 sec), 4 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...


[00:00:00.000 --> 00:00:03.240]   Good morning. This Tuesday is election day.
[00:00:03.240 --> 00:00:06.000]   After months of spirited debate and vigorous campaigning,
[00:00:06.000 --> 00:00:08.640]   the time has come for Americans to make important decisions
[00:00:08.640 --> 00:00:10.120]   about our nation's future.
[00:00:10.120 --> 00:00:13.760]   I encourage all Americans to go to the polls and vote.
[00:00:13.760 --> 00:00:16.120]   Election season brings out the spirit of competition
[00:00:16.120 --> 00:00:18.080]   between our political parties,
[00:00:18.080 --> 00:00:20.280]   and that competition is an essential part
[00:00:20.280 --> 00:00:21.760]   of a healthy democracy.
[00:00:21.760 --> 00:00:23.520]   But as the campaigns come to a close,
[00:00:23.520 --> 00:00:26.000]   Republicans, Democrats, and Independents
[00:00:26.000 --> 00:00:29.140]   can find common ground on at least one point,
[00:00:29.140 --> 00:00:31.560]   our system of representative democracy
[00:00:31.560 --> 00:00:34.440]   is one of America's greatest strengths.
[00:00:34.440 --> 00:00:36.280]   The United States was founded on the belief
[00:00:36.280 --> 00:00:38.280]   that all men are created equal.
[00:00:38.280 --> 00:00:41.440]   Every election day, millions of Americans of all races,
[00:00:41.440 --> 00:00:43.920]   religions, and backgrounds step into voting booths
[00:00:43.920 --> 00:00:45.320]   throughout the nation.
[00:00:45.320 --> 00:00:47.780]   Whether they are rich or poor, old or young,
[00:00:47.780 --> 00:00:50.680]   each of them has an equal share in choosing the path
[00:00:50.680 --> 00:00:52.440]   that our country will take.
[00:00:52.440 --> 00:00:54.880]   And every ballot they cast is a reminder
[00:00:54.880 --> 00:00:58.300]   that our founding principles are alive and well.
[00:00:58.300 --> 00:00:59.760]   Voting is one of the great privileges
[00:00:59.760 --> 00:01:01.760]   of American citizenship,
[00:01:01.760 --> 00:01:04.520]   and it has always required brave defenders.
[00:01:04.520 --> 00:01:06.040]   As you head to the polls next week,
[00:01:06.040 --> 00:01:08.420]   remember the sacrifices that have been made
[00:01:08.420 --> 00:01:11.040]   by generations of Americans in uniform
[00:01:11.040 --> 00:01:13.000]   to preserve our way of life.
[00:01:13.000 --> 00:01:14.840]   From Bunker Hill to Baghdad,
[00:01:14.840 --> 00:01:16.720]   the men and women of American Armed Forces
[00:01:16.720 --> 00:01:19.960]   have been devoted guardians of our democracy.
[00:01:19.960 --> 00:01:21.800]   All of us owe them and their families
[00:01:21.800 --> 00:01:25.240]   a special debt of gratitude on Election Day.
[00:01:25.240 --> 00:01:27.560]   Americans should also remember the important example
[00:01:27.560 --> 00:01:30.080]   that our election set throughout the world.
[00:01:30.080 --> 00:01:32.100]   Young democracies from Georgia and Ukraine
[00:01:32.100 --> 00:01:34.560]   to Afghanistan and Iraq can look to the United States
[00:01:34.560 --> 00:01:37.560]   for proof that self-government can endure,
[00:01:37.560 --> 00:01:40.440]   and nations that still live under tyranny and oppression
[00:01:40.440 --> 00:01:44.120]   can find hope and inspiration in our commitment to liberty.
[00:01:44.120 --> 00:01:45.200]   For more than two centuries,
[00:01:45.200 --> 00:01:47.800]   Americans have demonstrated the ability of free people
[00:01:47.800 --> 00:01:49.640]   to choose their own leaders.
[00:01:49.640 --> 00:01:51.920]   Our nation has flourished because of its commitment
[00:01:51.920 --> 00:01:54.680]   to trusting the wisdom of our citizenry.
[00:01:54.680 --> 00:01:56.120]   In this year's election,
[00:01:56.120 --> 00:01:58.480]   we will see this tradition continue,
[00:01:58.480 --> 00:02:00.320]   and we will be reminded once again
[00:02:00.320 --> 00:02:02.680]   that we are blessed to live in a free nation
[00:02:02.680 --> 00:02:05.560]   guided by the will of the people.
[00:02:05.560 --> 00:02:06.760]   Thank you for listening.


whisper_print_timings:     load time =  1413.19 ms
whisper_print_timings:     fallbacks =   0 p /   0 h
whisper_print_timings:      mel time =   146.51 ms
whisper_print_timings:   sample time =   356.33 ms /   548 runs (    0.65 ms per run)
whisper_print_timings:   encode time =   892.13 ms /     5 runs (  178.43 ms per run)
whisper_print_timings:   decode time =  8836.33 ms /   543 runs (   16.27 ms per run)
whisper_print_timings:   prompt time =   266.43 ms /     5 runs (   53.29 ms per run)
whisper_print_timings:    total time = 11927.28 ms
V100 bs=5 `master`
WHISPER_CUBLAS=1 make -j && ./main -m ./models-mnt/ggml-large-v2.bin -f ./samples/gb0.wav -bs 5

system_info: n_threads = 4 / 6 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 1 | COREML = 0 | OPENVINO = 0 | 

main: processing './samples/gb0.wav' (2037760 samples, 127.4 sec), 4 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...


[00:00:00.000 --> 00:00:03.260]   Good morning. This Tuesday is election day.
[00:00:03.260 --> 00:00:06.020]   After months of spirited debate and vigorous campaigning,
[00:00:06.020 --> 00:00:08.640]   the time has come for Americans to make important decisions
[00:00:08.640 --> 00:00:10.120]   about our nation's future.
[00:00:10.120 --> 00:00:13.760]   I encourage all Americans to go to the polls and vote.
[00:00:13.760 --> 00:00:16.120]   Election season brings out the spirit of competition
[00:00:16.120 --> 00:00:18.080]   between our political parties.
[00:00:18.080 --> 00:00:20.260]   And that competition is an essential part
[00:00:20.260 --> 00:00:21.760]   of a healthy democracy.
[00:00:21.760 --> 00:00:23.520]   But as the campaigns come to a close,
[00:00:23.520 --> 00:00:26.000]   Republicans, Democrats, and Independents
[00:00:26.000 --> 00:00:29.140]   can find common ground on at least one point.
[00:00:29.140 --> 00:00:31.560]   Our system of representative democracy
[00:00:31.560 --> 00:00:34.440]   is one of America's greatest strengths.
[00:00:34.440 --> 00:00:36.280]   The United States was founded on the belief
[00:00:36.280 --> 00:00:38.260]   that all men are created equal.
[00:00:38.260 --> 00:00:40.760]   Every election day, millions of Americans
[00:00:40.760 --> 00:00:42.680]   of all races, religions, and backgrounds
[00:00:42.680 --> 00:00:45.280]   step into voting booths throughout the nation.
[00:00:45.280 --> 00:00:47.760]   Whether they are rich or poor, old or young,
[00:00:47.760 --> 00:00:50.720]   each of them has an equal share in choosing the path
[00:00:50.720 --> 00:00:52.440]   that our country will take.
[00:00:52.440 --> 00:00:54.920]   And every ballot they cast is a reminder
[00:00:54.920 --> 00:00:58.280]   that our founding principles are alive and well.
[00:00:58.280 --> 00:00:59.780]   Voting is one of the great privileges
[00:00:59.780 --> 00:01:01.760]   of American citizenship.
[00:01:01.760 --> 00:01:04.520]   And it has always required brave defenders.
[00:01:04.520 --> 00:01:06.040]   As you head to the polls next week,
[00:01:06.040 --> 00:01:08.440]   remember the sacrifices that have been made
[00:01:08.440 --> 00:01:11.020]   by generations of Americans in uniform
[00:01:11.020 --> 00:01:12.980]   to preserve our way of life.
[00:01:12.980 --> 00:01:14.820]   From Bunker Hill to Baghdad,
[00:01:14.820 --> 00:01:16.740]   the men and women of American Armed Forces
[00:01:16.740 --> 00:01:19.940]   have been devoted guardians of our democracy.
[00:01:19.940 --> 00:01:21.780]   All of us owe them and their families
[00:01:21.780 --> 00:01:24.260]   a special debt of gratitude on election day.
[00:01:24.260 --> 00:01:27.540]   Americans should also remember the important example
[00:01:27.540 --> 00:01:30.060]   that our election set throughout the world.
[00:01:30.060 --> 00:01:32.100]   Young democracies from Georgia and Ukraine
[00:01:32.100 --> 00:01:34.580]   to Afghanistan and Iraq can look to the United States
[00:01:34.580 --> 00:01:37.520]   for proof that self-government can endure.
[00:01:37.520 --> 00:01:40.400]   And nations that still live under tyranny and oppression
[00:01:40.400 --> 00:01:44.080]   can find hope and inspiration in our commitment to liberty.
[00:01:44.080 --> 00:01:45.200]   For more than two centuries,
[00:01:45.200 --> 00:01:47.820]   Americans have demonstrated the ability of free people
[00:01:47.820 --> 00:01:49.600]   to choose their own leaders.
[00:01:49.600 --> 00:01:51.900]   Our nation has flourished because of its commitment
[00:01:51.900 --> 00:01:54.620]   to trusting the wisdom of our citizenry.
[00:01:54.620 --> 00:01:56.060]   In this year's election,
[00:01:56.060 --> 00:01:58.440]   we will see this tradition continue.
[00:01:58.440 --> 00:02:00.260]   And we will be reminded once again
[00:02:00.260 --> 00:02:02.620]   that we are blessed to live in a free nation
[00:02:02.620 --> 00:02:05.500]   guided by the will of the people.
[00:02:05.500 --> 00:02:06.700]   Thank you for listening.


whisper_print_timings:     load time =  1480.84 ms
whisper_print_timings:     fallbacks =   0 p /   0 h
whisper_print_timings:      mel time =   141.40 ms
whisper_print_timings:   sample time = 13516.33 ms /  2744 runs (    4.93 ms per run)
whisper_print_timings:   encode time =   898.57 ms /     5 runs (  179.71 ms per run)
whisper_print_timings:   decode time = 43944.78 ms /  2719 runs (   16.16 ms per run)
whisper_print_timings:   prompt time =   265.78 ms /     5 runs (   53.16 ms per run)
whisper_print_timings:    total time = 60280.66 ms
V100 bs=5 `batched`
WHISPER_CUBLAS=1 make -j && ./main -m ./models-mnt/ggml-large-v2.bin -f ./samples/gb0.wav -bs 5 -t 6

system_info: n_threads = 6 / 6 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 1 | COREML = 0 | OPENVINO = 0 | 

main: processing './samples/gb0.wav' (2037760 samples, 127.4 sec), 6 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...


[00:00:00.000 --> 00:00:03.240]   Good morning. This Tuesday is election day.
[00:00:03.240 --> 00:00:06.000]   After months of spirited debate and vigorous campaigning,
[00:00:06.000 --> 00:00:08.640]   the time has come for Americans to make important decisions
[00:00:08.640 --> 00:00:10.120]   about our nation's future.
[00:00:10.120 --> 00:00:13.740]   I encourage all Americans to go to the polls and vote.
[00:00:13.740 --> 00:00:16.120]   Election season brings out the spirit of competition
[00:00:16.120 --> 00:00:18.080]   between our political parties.
[00:00:18.080 --> 00:00:20.260]   And that competition is an essential part
[00:00:20.260 --> 00:00:21.760]   of a healthy democracy.
[00:00:21.760 --> 00:00:23.520]   But as the campaigns come to a close,
[00:00:23.520 --> 00:00:26.000]   Republicans, Democrats, and Independents
[00:00:26.000 --> 00:00:29.140]   can find common ground on at least one point.
[00:00:29.140 --> 00:00:31.560]   Our system of representative democracy
[00:00:31.560 --> 00:00:34.440]   is one of America's greatest strengths.
[00:00:34.440 --> 00:00:36.240]   The United States was founded on the belief
[00:00:36.240 --> 00:00:38.280]   that all men are created equal.
[00:00:38.280 --> 00:00:41.440]   Every election day, millions of Americans of all races,
[00:00:41.440 --> 00:00:43.920]   religions, and backgrounds step into voting booths
[00:00:43.920 --> 00:00:45.320]   throughout the nation.
[00:00:45.320 --> 00:00:47.780]   Whether they are rich or poor, old or young,
[00:00:47.780 --> 00:00:50.680]   each of them has an equal share in choosing the path
[00:00:50.680 --> 00:00:52.420]   that our country will take.
[00:00:52.420 --> 00:00:54.900]   And every ballot they cast is a reminder
[00:00:54.900 --> 00:00:58.300]   that our founding principles are alive and well.
[00:00:58.300 --> 00:00:59.760]   Voting is one of the great privileges
[00:00:59.760 --> 00:01:01.760]   of American citizenship.
[00:01:01.760 --> 00:01:04.540]   And it has always required brave defenders.
[00:01:04.540 --> 00:01:06.040]   As you head to the polls next week,
[00:01:06.040 --> 00:01:08.400]   remember the sacrifices that have been made
[00:01:08.400 --> 00:01:11.040]   by generations of Americans in uniform
[00:01:11.040 --> 00:01:13.000]   to preserve our way of life.
[00:01:13.000 --> 00:01:14.840]   From Bunker Hill to Baghdad,
[00:01:14.840 --> 00:01:16.720]   the men and women of American Armed Forces
[00:01:16.720 --> 00:01:19.940]   have been devoted guardians of our democracy.
[00:01:19.940 --> 00:01:21.820]   All of us owe them and their families
[00:01:21.820 --> 00:01:25.260]   a special debt of gratitude on Election Day.
[00:01:25.260 --> 00:01:27.560]   Americans should also remember the important example
[00:01:27.560 --> 00:01:30.060]   that our election set throughout the world.
[00:01:30.060 --> 00:01:32.100]   Young democracies from Georgia and Ukraine
[00:01:32.100 --> 00:01:34.560]   to Afghanistan and Iraq can look to the United States
[00:01:34.560 --> 00:01:37.540]   for proof that self-government can endure.
[00:01:37.540 --> 00:01:40.400]   And nations that still live under tyranny and oppression
[00:01:40.400 --> 00:01:44.100]   can find hope and inspiration in our commitment to liberty.
[00:01:44.100 --> 00:01:45.240]   For more than two centuries,
[00:01:45.240 --> 00:01:47.140]   Americans have demonstrated the ability
[00:01:47.140 --> 00:01:49.620]   of free people to choose their own leaders.
[00:01:49.620 --> 00:01:51.920]   Our nation has flourished because of its commitment
[00:01:51.920 --> 00:01:54.660]   to trusting the wisdom of our citizenry.
[00:01:54.660 --> 00:01:56.080]   In this year's election,
[00:01:56.080 --> 00:01:58.460]   we will see this tradition continue.
[00:01:58.460 --> 00:02:00.260]   And we will be reminded once again
[00:02:00.260 --> 00:02:02.620]   that we are blessed to live in a free nation
[00:02:02.620 --> 00:02:05.500]   guided by the will of the people.
[00:02:05.500 --> 00:02:06.760]   Thank you for listening.


whisper_print_timings:     load time =  1402.55 ms
whisper_print_timings:     fallbacks =   0 p /   0 h
whisper_print_timings:      mel time =   114.69 ms
whisper_print_timings:   sample time =  1066.75 ms /  2739 runs (    0.39 ms per run)
whisper_print_timings:   encode time =   892.24 ms /     5 runs (  178.45 ms per run)
whisper_print_timings:   decode time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper_print_timings:   prompt time =  8071.69 ms /   548 runs (   14.73 ms per run)
whisper_print_timings:    total time = 11570.88 ms

In this case, we observe that beam search with 5 beams runs at the same speed as 1 beam.
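That observation can be checked directly against the timing logs above (all numbers copied from the `whisper_print_timings` output):

```python
# Total times from the three runs above (ms)
master_bs1  = 11927.28   # V100 bs=1 master
master_bs5  = 60280.66   # V100 bs=5 master
batched_bs5 = 11570.88   # V100 bs=5 batched

# Batched beam search is ~5.2x faster than master at bs=5 ...
print(f"bs=5 speedup over master: {master_bs5 / batched_bs5:.2f}x")

# ... and effectively matches the bs=1 run, as claimed in the description.
print(f"batched bs=5 vs master bs=1: {batched_bs5 / master_bs1:.2f}x")

# Per-token decode cost in the batched run (prompt time / runs)
print(f"time/tok: {8071.69 / 548:.2f} ms")
```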


Benches

| GPU | Config | Model | Th | Enc. | Dec. | Bch5 | PP | Commit |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| V100 | AVX2 BLAS CUDA | tiny | 1 | 9.00 | 1.80 | 0.36 | 0.02 | ae1bd69 |
| V100 | AVX2 BLAS CUDA | tiny-q5_0 | 1 | 8.53 | 1.51 | 0.37 | 0.02 | ae1bd69 |
| V100 | AVX2 BLAS CUDA | tiny-q5_1 | 1 | 8.52 | 1.38 | 0.31 | 0.02 | ae1bd69 |
| V100 | AVX2 BLAS CUDA | base | 1 | 14.89 | 2.56 | 0.53 | 0.03 | ae1bd69 |
| V100 | AVX2 BLAS CUDA | base-q5_0 | 1 | 15.16 | 1.81 | 0.46 | 0.03 | ae1bd69 |
| V100 | AVX2 BLAS CUDA | base-q5_1 | 1 | 15.18 | 1.85 | 0.42 | 0.03 | ae1bd69 |
| V100 | AVX2 BLAS CUDA | small | 1 | 40.56 | 5.00 | 1.01 | 0.05 | ae1bd69 |
| V100 | AVX2 BLAS CUDA | small-q5_0 | 1 | 41.49 | 3.47 | 0.99 | 0.05 | ae1bd69 |
| V100 | AVX2 BLAS CUDA | small-q5_1 | 1 | 41.32 | 3.39 | 0.86 | 0.05 | ae1bd69 |
| V100 | AVX2 BLAS CUDA | medium | 1 | 105.38 | 10.39 | 1.88 | 0.11 | ae1bd69 |
| V100 | AVX2 BLAS CUDA | medium-q5_0 | 1 | 107.73 | 6.56 | 2.17 | 0.12 | ae1bd69 |
| V100 | AVX2 BLAS CUDA | medium-q5_1 | 1 | 107.83 | 6.58 | 1.85 | 0.12 | ae1bd69 |
| V100 | AVX2 BLAS CUDA | large | 1 | 172.64 | 15.81 | 2.70 | 0.17 | ae1bd69 |
| V100 | AVX2 BLAS CUDA | large-q5_0 | 1 | 177.72 | 9.38 | 3.25 | 0.19 | ae1bd69 |
| V100 | AVX2 BLAS CUDA | large-q5_1 | 1 | 177.50 | 8.95 | 2.66 | 0.19 | ae1bd69 |
| GPU | Config | Model | Th | Enc. | Dec. | Bch5 | PP | Commit |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| M2 Ultra | NEON BLAS METAL | tiny | 1 | 12.66 | 1.43 | 0.50 | 0.01 | ae1bd69 |
| M2 Ultra | NEON BLAS METAL | tiny-q5_0 | 1 | 10.87 | 1.39 | 0.53 | 0.01 | ae1bd69 |
| M2 Ultra | NEON BLAS METAL | tiny-q5_1 | 1 | 10.97 | 1.40 | 0.52 | 0.01 | ae1bd69 |
| M2 Ultra | NEON BLAS METAL | base | 1 | 18.55 | 2.02 | 0.77 | 0.02 | ae1bd69 |
| M2 Ultra | NEON BLAS METAL | base-q5_0 | 1 | 21.15 | 1.98 | 0.82 | 0.02 | ae1bd69 |
| M2 Ultra | NEON BLAS METAL | base-q5_1 | 1 | 20.59 | 1.96 | 0.82 | 0.02 | ae1bd69 |
| M2 Ultra | NEON BLAS METAL | small | 1 | 51.49 | 4.01 | 1.73 | 0.05 | ae1bd69 |
| M2 Ultra | NEON BLAS METAL | small-q5_0 | 1 | 56.95 | 4.12 | 1.88 | 0.06 | ae1bd69 |
| M2 Ultra | NEON BLAS METAL | small-q5_1 | 1 | 56.95 | 4.12 | 1.85 | 0.06 | ae1bd69 |
| M2 Ultra | NEON BLAS METAL | medium | 1 | 140.89 | 8.46 | 3.96 | 0.12 | ae1bd69 |
| M2 Ultra | NEON BLAS METAL | medium-q5_0 | 1 | 160.02 | 8.29 | 4.12 | 0.14 | ae1bd69 |
| M2 Ultra | NEON BLAS METAL | medium-q5_1 | 1 | 159.64 | 8.42 | 4.15 | 0.14 | ae1bd69 |
| M2 Ultra | NEON BLAS METAL | large | 1 | 247.73 | 12.00 | 5.94 | 0.22 | ae1bd69 |
| M2 Ultra | NEON BLAS METAL | large-q5_0 | 1 | 286.62 | 12.03 | 6.59 | 0.26 | ae1bd69 |
| M2 Ultra | NEON BLAS METAL | large-q5_1 | 1 | 286.16 | 11.88 | 6.36 | 0.26 | ae1bd69 |
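Reading the tables: assuming the Dec and Bch5 columns report milliseconds per token (consistent with the new time/tok display in the bench tool), throughput follows directly. For example, on the V100 with the large model:

```python
def tok_per_s(ms_per_tok: float) -> float:
    """Convert a time-per-token figure (ms) into tokens per second."""
    return 1000.0 / ms_per_tok

# Values from the V100 `large` row above
print(f"single decoder: {tok_per_s(15.81):.0f} tok/s")
print(f"batched (5 beams): {tok_per_s(2.70):.0f} tok/s per decoded token")
```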

@ggerganov ggerganov marked this pull request as ready for review November 14, 2023 17:12
@ggerganov
Owner Author

This should be ready to merge - will do so tomorrow

@bobqianic
Collaborator

bobqianic commented Nov 14, 2023

This should be ready to merge - will do so tomorrow

During testing, I encountered an error that reproduces 100% of the time:
whisper_full_with_state: failed to decode

Audio: L'Océan et l'Humanité, destins liés ! _ Lamya Essemlali _ TEDxOrléans (320 kbps).zip

root@imperial-88694897d9-ee131614:~/whisper.cpp-ae1bd690419032c95406940c8533a905cb1ae026# ./main -m ./ggml-large-v2.bin -f "../L’Océan et l’Humanité, destins liés ! _ Lamya Essemlali _ TEDxOrléans (320 kbps).wav" -bs 5 -l auto
whisper_init_from_file_with_params_no_state: loading model from './ggml-large-v2.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head  = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 1280
whisper_model_load: n_text_head   = 20
whisper_model_load: n_text_layer  = 32
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 5 (large)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs       = 99
ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
  Device 0: Tesla V100-PCIE-32GB, compute capability 7.0
whisper_backend_init: using CUDA backend
whisper_model_load:     CUDA buffer size =  2973.08 MB
whisper_model_load: model size    = 2972.62 MB
whisper_backend_init: using CUDA backend
whisper_init_state: kv self size  =  140.00 MB
whisper_init_state: kv cross size =  234.38 MB
whisper_init_state: compute buffer (conv)   =   29.49 MB
whisper_init_state: compute buffer (encode) =  202.52 MB
whisper_init_state: compute buffer (cross)  =    8.89 MB
whisper_init_state: compute buffer (decode) =   94.57 MB

system_info: n_threads = 4 / 52 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 1 | COREML = 0 | OPENVINO = 0 | 

main: processing '../L’Océan et l’Humanité, destins liés ! _ Lamya Essemlali _ TEDxOrléans (320 kbps).wav' (12786022 samples, 799.1 sec), 4 threads, 1 processors, lang = auto, task = transcribe, timestamps = 1 ...

whisper_full_with_state: auto-detected language: fr (p = 0.989031)

[00:00:00.000 --> 00:00:13.820]   [battements de coeur]
[00:00:13.820 --> 00:00:18.200]   Bonsoir.
[00:00:18.200 --> 00:00:24.480]   Notre planète est recouverte à 70 % d'océans,
[00:00:24.480 --> 00:00:28.400]   et pourtant, étrangement, on a choisi de l'appeler la Terre.
[00:00:29.600 --> 00:00:32.400]   Le poète Edcott Williams a une vision bien plus objective
[00:00:32.400 --> 00:00:34.160]   et moins anthropocentrique,
[00:00:34.160 --> 00:00:36.880]   quand il dit que vu de l'espace, la planète est bleue,
[00:00:36.880 --> 00:00:41.640]   vu de l'espace, elle est le territoire non pas des hommes, mais des baleines.
[00:00:41.640 --> 00:00:47.040]   Et pourtant, on vient tous de l'océan, c'est le berceau de la vie,
[00:00:47.040 --> 00:00:48.960]   même si on l'a oublié.
[00:00:48.960 --> 00:00:51.920]   L'océan est partout, dans les glaciers, dans les rivières,
[00:00:51.920 --> 00:00:55.520]   dans les nappes phréatiques, dans les cellules des êtres vivants,
[00:00:55.520 --> 00:00:57.160]   et dans nos veines.
whisper_full_with_state: failed to decode
./main: failed to process audio
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla V100-PCIE-32GB           On  | 00000000:65:01.0 Off |                  Off |
| N/A   31C    P0              27W / 250W |      0MiB / 32768MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

@ggerganov
Owner Author

ggerganov commented Nov 15, 2023

@bobqianic Thanks for the feedback. If you increase the factor to 5 here, does it fix the issue?

whisper.cpp/whisper.cpp

Lines 3043 to 3046 in 6c8a003

// at this point, we don't know yet how many decoders will be used, so we overallocate 3x ctx
// in theory, there can be a case where this is not enough, but in practice it should always be enough
const int factor = 3;

Edit: I just noticed you weren't using the latest version of this branch, so you had factor = 2 instead of factor = 3.
With commit 6c8a003, the issue should be fixed
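For context on why the factor matters: the self-attention KV cache scales linearly with it. A rough size estimate (assuming f16 KV data and the large model's hyperparameters from the load log above, and ignoring allocator overhead) reproduces the 140 MB `kv self size` that the older factor = 2 build reported, along with the cross-attention cache size:

```python
def kv_cache_mb(n_layer: int, n_ctx: int, n_state: int, factor: int = 1) -> float:
    """Approximate KV cache size in MiB: one f16 K and V tensor per layer."""
    bytes_per_elem = 2   # f16
    n_tensors = 2        # K and V
    return n_tensors * n_layer * n_ctx * n_state * bytes_per_elem * factor / 1024**2

# large-v2: n_text_layer=32, n_text_ctx=448, n_text_state=1280
print(f"kv self  = {kv_cache_mb(32, 448, 1280, factor=2):.2f} MB")
# cross-attention cache spans the audio context (n_audio_ctx=1500), no factor
print(f"kv cross = {kv_cache_mb(32, 1500, 1280):.2f} MB")
```

With factor = 3 the same formula gives 210 MB for the self cache, which is the extra headroom the fix buys.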

@bobqianic
Collaborator

@bobqianic Thanks for the feedback. If you increase the factor to 5 here, does it fix the issue?

whisper.cpp/whisper.cpp

Lines 3043 to 3046 in 6c8a003

// at this point, we don't know yet how many decoders will be used, so we overallocate 3x ctx
// in theory, there can be a case where this is not enough, but in practice it should always be enough
const int factor = 3;

Edit: I just noticed you weren't using the latest version of this branch, so you had factor = 2 instead of factor = 3. With commit 6c8a003, the issue should be fixed

Sorry for the late reply, I was busy with some lab work this morning. Just checked, and yeah, you're right: the test version I used had factor = 2. I've now tested 4c245ea on both Windows and Linux, and something is still not quite right. The transcription runs fine on Windows, but the output looks weird (see this è). On Linux it doesn't run at all: it just keeps looping and never produces any output.

whisper_full_with_state: auto-detected language: fr (p = 0.988806)

[00:00:00.000 --> 00:00:03.000]   (Tic tac de l'escalier)
[00:00:03.000 --> 00:00:06.200]   (Tic tac de l'escalier)
[00:00:06.200 --> 00:00:09.400]   (Tic tac de l'escalier)
[00:00:09.400 --> 00:00:12.600]   (Tic tac de l'escalier)
[00:00:12.600 --> 00:00:15.800]   (Tic tac de l'escalier)
[00:00:15.800 --> 00:00:18.200]   Bonsoir.
[00:00:18.200 --> 00:00:24.600]   Notre planète est recouverte à 70 % d'océans,
[00:00:24.600 --> 00:00:28.600]   et pourtant, étrangement, on a choisi de l'appeler la Terre.
[00:00:29.600 --> 00:00:32.400]   Le poète Hedcote Williams a une vision bien plus objective
[00:00:32.400 --> 00:00:34.200]   et moins anthropocentrique,
[00:00:34.200 --> 00:00:36.800]   quand il dit que, vu de l'espace, la planète est bleue,
[00:00:36.800 --> 00:00:40.200]   vu de l'espace, elle est le territoire non pas des hommes,
[00:00:40.200 --> 00:00:41.600]   mais des baleines.
[00:00:41.600 --> 00:00:45.600]   Et pourtant, on vient tous de l'océan,
[00:00:45.600 --> 00:00:48.800]   c'est le berceau de la vie, même si on l'a oublié.
[00:00:48.800 --> 00:00:51.800]   L'océan est partout, dans les glaciers, dans les rivières,
[00:00:51.800 --> 00:00:55.400]   dans les nappes phréatiques, dans les cellules des êtres vivants,
[00:00:55.400 --> 00:00:57.000]   et dans nos veines.

@ggerganov
Owner Author

Are you using large v2?

@bobqianic
Collaborator

Are you using large v2?

Yes

@ggerganov
Owner Author

ggerganov commented Nov 15, 2023

Which command are you using? It works on my end:

WHISPER_CUBLAS=1 make -j && ./main -m ./models-mnt/ggml-large-v2.bin -f ./samples/fr0.wav -bs 5 -t 6 -l auto
I whisper.cpp build info: 
I UNAME_S:  Linux
I UNAME_P:  x86_64
I UNAME_M:  x86_64
I CFLAGS:   -I.              -O3 -DNDEBUG -std=c11   -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -mavx2 -mfma -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -mavx2 -mfma -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include
I LDFLAGS:  -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64 -L/targets/x86_64-linux/lib
I CC:       cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
I CXX:      g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0

g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -mavx2 -mfma -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -c whisper.cpp -o whisper.o
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -mavx2 -mfma -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include examples/main/main.cpp examples/common.cpp examples/common-ggml.cpp ggml-cuda.o ggml.o ggml-alloc.o ggml-backend.o ggml-quants.o whisper.o -o main -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64 -L/targets/x86_64-linux/lib
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -mavx2 -mfma -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include examples/bench/bench.cpp ggml-cuda.o ggml.o ggml-alloc.o ggml-backend.o ggml-quants.o whisper.o -o bench -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64 -L/targets/x86_64-linux/lib
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -mavx2 -mfma -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include examples/quantize/quantize.cpp examples/common.cpp examples/common-ggml.cpp ggml-cuda.o ggml.o ggml-alloc.o ggml-backend.o ggml-quants.o whisper.o -o quantize -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64 -L/targets/x86_64-linux/lib
./main -h

usage: ./main [options] file0.wav file1.wav ...

options:
  -h,        --help              [default] show this help message and exit
  -t N,      --threads N         [4      ] number of threads to use during computation
  -p N,      --processors N      [1      ] number of processors to use during computation
  -ot N,     --offset-t N        [0      ] time offset in milliseconds
  -on N,     --offset-n N        [0      ] segment index offset
  -d  N,     --duration N        [0      ] duration of audio to process in milliseconds
  -mc N,     --max-context N     [-1     ] maximum number of text context tokens to store
  -ml N,     --max-len N         [0      ] maximum segment length in characters
  -sow,      --split-on-word     [false  ] split on word rather than on token
  -bo N,     --best-of N         [5      ] number of best candidates to keep
  -bs N,     --beam-size N       [5      ] beam size for beam search
  -wt N,     --word-thold N      [0.01   ] word timestamp probability threshold
  -et N,     --entropy-thold N   [2.40   ] entropy threshold for decoder fail
  -lpt N,    --logprob-thold N   [-1.00  ] log probability threshold for decoder fail
  -debug,    --debug-mode        [false  ] enable debug mode (eg. dump log_mel)
  -tr,       --translate         [false  ] translate from source language to english
  -di,       --diarize           [false  ] stereo audio diarization
  -tdrz,     --tinydiarize       [false  ] enable tinydiarize (requires a tdrz model)
  -nf,       --no-fallback       [false  ] do not use temperature fallback while decoding
  -otxt,     --output-txt        [false  ] output result in a text file
  -ovtt,     --output-vtt        [false  ] output result in a vtt file
  -osrt,     --output-srt        [false  ] output result in a srt file
  -olrc,     --output-lrc        [false  ] output result in a lrc file
  -owts,     --output-words      [false  ] output script for generating karaoke video
  -fp,       --font-path         [/System/Library/Fonts/Supplemental/Courier New Bold.ttf] path to a monospace font for karaoke video
  -ocsv,     --output-csv        [false  ] output result in a CSV file
  -oj,       --output-json       [false  ] output result in a JSON file
  -ojf,      --output-json-full  [false  ] include more information in the JSON file
  -of FNAME, --output-file FNAME [       ] output file path (without file extension)
  -ps,       --print-special     [false  ] print special tokens
  -pc,       --print-colors      [false  ] print colors
  -pp,       --print-progress    [false  ] print progress
  -nt,       --no-timestamps     [false  ] do not print timestamps
  -l LANG,   --language LANG     [en     ] spoken language ('auto' for auto-detect)
  -dl,       --detect-language   [false  ] exit after automatically detecting language
             --prompt PROMPT     [       ] initial prompt
  -m FNAME,  --model FNAME       [models/ggml-base.en.bin] model path
  -f FNAME,  --file FNAME        [       ] input WAV file path
  -oved D,   --ov-e-device DNAME [CPU    ] the OpenVINO device used for encode inference
  -ls,       --log-score         [false  ] log best decoder scores of tokens
  -ng,       --no-gpu            [false  ] disable GPU

whisper_init_from_file_with_params_no_state: loading model from './models-mnt/ggml-large-v2.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head  = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 1280
whisper_model_load: n_text_head   = 20
whisper_model_load: n_text_layer  = 32
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 5 (large)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs       = 99
ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
  Device 0: Tesla V100-PCIE-16GB, compute capability 7.0
whisper_backend_init: using CUDA backend
whisper_model_load:     CUDA buffer size =  2973.08 MB
whisper_model_load: model size    = 2972.62 MB
whisper_backend_init: using CUDA backend
whisper_init_state: kv self size  =  210.00 MB
whisper_init_state: kv cross size =  234.38 MB
whisper_init_state: compute buffer (conv)   =   29.49 MB
whisper_init_state: compute buffer (encode) =  202.52 MB
whisper_init_state: compute buffer (cross)  =    8.89 MB
whisper_init_state: compute buffer (decode) =   94.57 MB

system_info: n_threads = 6 / 6 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 1 | COREML = 0 | OPENVINO = 0 | 

main: processing './samples/fr0.wav' (12786022 samples, 799.1 sec), 6 threads, 1 processors, 5 beams + best of 5, lang = auto, task = transcribe, timestamps = 1 ...

whisper_full_with_state: auto-detected language: fr (p = 0.989031)

[00:00:00.000 --> 00:00:13.820]   [battements de coeur]
[00:00:13.820 --> 00:00:18.200]   Bonsoir.
[00:00:18.200 --> 00:00:24.480]   Notre planète est recouverte à 70 % d'océans,
[00:00:24.480 --> 00:00:28.400]   et pourtant, étrangement, on a choisi de l'appeler la Terre.
[00:00:29.600 --> 00:00:32.400]   Le poète Edcott Williams a une vision bien plus objective
[00:00:32.400 --> 00:00:34.160]   et moins anthropocentrique,
[00:00:34.160 --> 00:00:36.880]   quand il dit que vu de l'espace, la planète est bleue,
[00:00:36.880 --> 00:00:41.640]   vu de l'espace, elle est le territoire non pas des hommes, mais des baleines.
[00:00:41.640 --> 00:00:47.040]   Et pourtant, on vient tous de l'océan, c'est le berceau de la vie,
[00:00:47.040 --> 00:00:48.960]   même si on l'a oublié.
[00:00:48.960 --> 00:00:51.920]   L'océan est partout, dans les glaciers, dans les rivières,
[00:00:51.920 --> 00:00:55.520]   dans les nappes phréatiques, dans les cellules des êtres vivants,
[00:00:55.520 --> 00:00:57.160]   et dans nos veines.
[00:00:58.760 --> 00:01:01.520]   Etrangement, c'est John Fitzgerald Kennedy
[00:01:01.520 --> 00:01:04.440]   qui l'a assez bien illustré dans cette citation.
[00:01:04.440 --> 00:01:06.960]   "Il est un fait biologique intéressant que chacun d'entre nous
[00:01:06.960 --> 00:01:09.880]   "ait dans les veines un pourcentage identique de sel dans le sang
[00:01:09.880 --> 00:01:11.880]   "à celui qui existe dans les océans.
[00:01:11.880 --> 00:01:14.280]   "Nous avons donc tous du sel dans notre sang,
[00:01:14.280 --> 00:01:16.120]   "dans notre sueur, dans nos larmes.
[00:01:16.120 --> 00:01:19.080]   "Nous sommes liés à l'océan, et quand nous retournons à la mer,
[00:01:19.080 --> 00:01:22.240]   "que ce soit pour naviguer ou pour la regarder,
[00:01:22.240 --> 00:01:24.680]   "nous retournons d'où nous venons."
[00:01:26.200 --> 00:01:29.400]   Et pourtant, cet océan, on le connaît très, très mal.
[00:01:29.400 --> 00:01:33.560]   Ça reste un monde assez étrange et étranger,

@ggerganov
Owner Author

Try 270b1e4 and let me know if the issue is resolved.

@bobqianic
Collaborator

bobqianic commented Nov 15, 2023

Try 270b1e4 and let me know if the issue is resolved.

It works!

Edit: The strange characters mentioned earlier are likely caused by Windows not setting the terminal to UTF-8.

Linux

root@imperial-5fcb458f92-7e2f426a:~/whisper.cpp-270b1e48dbdcb68679b86ccf073455c506907809# ./main -m ../ggml-large-v2.bin -f ../testfrench.wav -bs 5 -l auto
whisper_init_from_file_with_params_no_state: loading model from '../ggml-large-v2.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head  = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 1280
whisper_model_load: n_text_head   = 20
whisper_model_load: n_text_layer  = 32
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 5 (large)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs       = 99
ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
  Device 0: Tesla V100-PCIE-32GB, compute capability 7.0
whisper_backend_init: using CUDA backend
whisper_model_load:     CUDA buffer size =  2973.08 MB
whisper_model_load: model size    = 2972.62 MB
whisper_backend_init: using CUDA backend
whisper_init_state: kv self size  =  210.00 MB
whisper_init_state: kv cross size =  234.38 MB
whisper_init_state: compute buffer (conv)   =   29.49 MB
whisper_init_state: compute buffer (encode) =  202.52 MB
whisper_init_state: compute buffer (cross)  =    8.89 MB
whisper_init_state: compute buffer (decode) =   94.57 MB

system_info: n_threads = 4 / 52 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 1 | COREML = 0 | OPENVINO = 0 | 

main: processing '../testfrench.wav' (12786022 samples, 799.1 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = auto, task = transcribe, timestamps = 1 ...

whisper_full_with_state: auto-detected language: fr (p = 0.989031)

[00:00:00.000 --> 00:00:13.820]   [battements de coeur]
[00:00:13.820 --> 00:00:18.200]   Bonsoir.
[00:00:18.200 --> 00:00:24.480]   Notre planète est recouverte à 70 % d'océans,
[00:00:24.480 --> 00:00:28.400]   et pourtant, étrangement, on a choisi de l'appeler la Terre.
[00:00:29.600 --> 00:00:32.400]   Le poète Edcott Williams a une vision bien plus objective
[00:00:32.400 --> 00:00:34.160]   et moins anthropocentrique,
[00:00:34.160 --> 00:00:36.880]   quand il dit que vu de l'espace, la planète est bleue,
[00:00:36.880 --> 00:00:41.640]   vu de l'espace, elle est le territoire non pas des hommes, mais des baleines.
[00:00:41.640 --> 00:00:47.040]   Et pourtant, on vient tous de l'océan, c'est le berceau de la vie,
[00:00:47.040 --> 00:00:48.960]   même si on l'a oublié.
[00:00:48.960 --> 00:00:51.920]   L'océan est partout, dans les glaciers, dans les rivières,
[00:00:51.920 --> 00:00:55.520]   dans les nappes phréatiques, dans les cellules des êtres vivants,
[00:00:55.520 --> 00:00:57.160]   et dans nos veines.
[00:00:58.760 --> 00:01:01.520]   Etrangement, c'est John Fitzgerald Kennedy
[00:01:01.520 --> 00:01:04.440]   qui l'a assez bien illustré dans cette citation.
[00:01:04.440 --> 00:01:06.960]   "Il est un fait biologique intéressant que chacun d'entre nous
[00:01:06.960 --> 00:01:09.880]   "ait dans les veines un pourcentage identique de sel dans le sang
[00:01:09.880 --> 00:01:11.880]   "à celui qui existe dans les océans.
[00:01:11.880 --> 00:01:14.280]   "Nous avons donc tous du sel dans notre sang,
[00:01:14.280 --> 00:01:16.120]   "dans notre sueur, dans nos larmes.
[00:01:16.120 --> 00:01:19.080]   "Nous sommes liés à l'océan, et quand nous retournons à la mer,
[00:01:19.080 --> 00:01:22.240]   "que ce soit pour naviguer ou pour la regarder,
[00:01:22.240 --> 00:01:24.680]   "nous retournons d'où nous venons."
[00:01:26.200 --> 00:01:29.400]   Et pourtant, cet océan, on le connaît très, très mal.
[00:01:29.400 --> 00:01:33.560]   Ça reste un monde assez étrange et étranger,
[00:01:33.560 --> 00:01:36.760]   et qui fait peur, parfois.
[00:01:36.760 --> 00:01:45.680]   Cela dit, en fait, c'est loin d'être une masse d'eau inerte,
[00:01:45.680 --> 00:01:51.640]   une étendue d'eau qui ne sert à rien, en fait.
[00:01:51.840 --> 00:01:56.400]   C'est la vie marine qui fait que cet océan
[00:01:56.400 --> 00:02:00.520]   est un principal régulateur du climat,
[00:02:00.520 --> 00:02:03.360]   le principal puits de carbone,
[00:02:03.360 --> 00:02:11.000]   et aussi le principal producteur d'oxygène.
[00:02:11.000 --> 00:02:15.440]   Entre 50 et 70 % de l'oxygène vient de l'océan.
[00:02:15.440 --> 00:02:19.480]   En fait, plus d'une inspiration sur deux que vous prenez,
[00:02:19.480 --> 00:02:21.600]   vous la devez à l'océan.
[00:02:21.800 --> 00:02:25.160]   Pas tout à fait à l'océan, vous la devez à la vie marine.
[00:02:25.160 --> 00:02:29.200]   La vie marine, c'est elle qui permet à cette masse d'eau, justement,
[00:02:29.200 --> 00:02:32.040]   de ne pas être juste une masse d'eau inerte,
[00:02:32.040 --> 00:02:35.080]   mais d'être cette machinerie qui nous permet de vivre.
[00:02:35.080 --> 00:02:38.120]   Dans la vie marine, ça part du phytoplankton,
[00:02:38.120 --> 00:02:41.720]   qui lui fournit l'oxygène, jusqu'aux grandes baleines,
[00:02:41.720 --> 00:02:47.080]   en passant par les grands prédateurs, les thons, les dauphins.

@ggerganov ggerganov merged commit b6c5f49 into master Nov 15, 2023
70 of 74 checks passed
felrock pushed a commit to felrock/whisper.cpp that referenced this pull request Nov 18, 2023
* whisper : add whisper_batch

* whisper : move kv_self to whisper_state

* whisper : full batched decoding support

* whisper : fix memory leak in whisper_batch

* whisper : fix mem leak again + remove obsolete function

* whisper : clear kv cache when using whisper_decode API

* whisper : speed-up sampling

* whisper : fix decoders initializer

* bench : add batch size 5 bench

* whisper : add comment about the KV cache size

* whisper : add check for max number of decoders

* whisper : avoid starting sampling threads with bs=1

* whisper : enable beam-search by default

* cuda : sync llama.cpp fixes
landtanin pushed a commit to landtanin/whisper.cpp that referenced this pull request Dec 16, 2023
josharian added a commit to josharian/whisper.cpp that referenced this pull request Mar 10, 2024
As of ggerganov#1486, whisper.cpp uses a unified KV cache with KQ masking.
As a result, depending on their location in the batch,
identical sequences in a batch can have slightly different outputs
due to floating point rounding errors during reduction.
See the discussion in ggerganov#1941 for more details.

The beam search code used "has identical sum of log probabilities"
as a shorthand for "is an identical token sequence". However, per above,
identical tokens do not necessarily result in identical probabilities.

Instead, explicitly compare on sequences.
This is linear in cost when they are identical,
but the lengths are always small and the comparisons are cheap.

This increases diversity during beam search.

This improves output quality for some short samples I've been working
with, at no detectable performance cost.
I haven't checked against larger corpuses.

Fixes ggerganov#1941
ggerganov pushed a commit that referenced this pull request Mar 10, 2024
jiahansu pushed a commit to WiseSync/whisper.cpp that referenced this pull request Apr 17, 2024
viktor-silakov pushed a commit to viktor-silakov/whisper_node_mic.cpp that referenced this pull request May 11, 2024
iThalay pushed a commit to iThalay/whisper.cpp that referenced this pull request Sep 23, 2024
iThalay pushed a commit to iThalay/whisper.cpp that referenced this pull request Sep 23, 2024
iThalay pushed a commit to iThalay/whisper.cpp that referenced this pull request Sep 23, 2024