
whisper : add batched decoding #1486

Merged
merged 14 commits into from
Nov 15, 2023
Conversation

ggerganov
Owner

@ggerganov ggerganov commented Nov 14, 2023

ref #1048

Description

This PR implements efficient batched decoding. With CUDA, decoding with 5 beams is as fast as decoding with 1 beam, so there is likely no reason to ever use 1 beam. With Metal, using more than 1 beam incurs some slowdown, since the Metal kernels do not scale as well with the batch size. Still, it is much faster than what was on master, and the improved transcription quality might be worth it.

Also, this PR:

  • enables beam search with 5 beams by default
  • allows GPU usage to be completely disabled for CUDA builds via CUDA_VISIBLE_DEVICES=-1
  • adds batched decoding speed to the bench tool
  • makes the bench tool display time/tok
  • makes sampling and logits processing multi-threaded
  • changes the temperature step from 0.4f to 0.2f
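The last bullet affects how many retry temperatures the decoder gets before giving up. As a rough sketch (assuming the fallback sweeps from 0 up to 1.0 in increments of the step; this is an illustration of the schedule, not whisper.cpp's exact code), the smaller step doubles the number of fallback attempts:

```python
def fallback_temperatures(step: float, t_max: float = 1.0) -> list[float]:
    """Enumerate retry temperatures from 0 up to t_max in increments of `step`."""
    temps = []
    t = 0.0
    while t <= t_max + 1e-9:  # epsilon guards against float accumulation drift
        temps.append(round(t, 2))
        t += step
    return temps

print(fallback_temperatures(0.2))  # new step: 6 attempts
print(fallback_temperatures(0.4))  # old step: 3 attempts
```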

Tests

V100 bs=1 `master`
WHISPER_CUBLAS=1 make -j && ./main -m ./models-mnt/ggml-large-v2.bin -f ./samples/gb0.wav -bs 1

system_info: n_threads = 4 / 6 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 1 | COREML = 0 | OPENVINO = 0 | 

main: processing './samples/gb0.wav' (2037760 samples, 127.4 sec), 4 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...


[00:00:00.000 --> 00:00:03.240]   Good morning. This Tuesday is election day.
[00:00:03.240 --> 00:00:06.000]   After months of spirited debate and vigorous campaigning,
[00:00:06.000 --> 00:00:08.640]   the time has come for Americans to make important decisions
[00:00:08.640 --> 00:00:10.120]   about our nation's future.
[00:00:10.120 --> 00:00:13.760]   I encourage all Americans to go to the polls and vote.
[00:00:13.760 --> 00:00:16.120]   Election season brings out the spirit of competition
[00:00:16.120 --> 00:00:18.080]   between our political parties,
[00:00:18.080 --> 00:00:20.280]   and that competition is an essential part
[00:00:20.280 --> 00:00:21.760]   of a healthy democracy.
[00:00:21.760 --> 00:00:23.520]   But as the campaigns come to a close,
[00:00:23.520 --> 00:00:26.000]   Republicans, Democrats, and Independents
[00:00:26.000 --> 00:00:29.140]   can find common ground on at least one point,
[00:00:29.140 --> 00:00:31.560]   our system of representative democracy
[00:00:31.560 --> 00:00:34.440]   is one of America's greatest strengths.
[00:00:34.440 --> 00:00:36.280]   The United States was founded on the belief
[00:00:36.280 --> 00:00:38.280]   that all men are created equal.
[00:00:38.280 --> 00:00:41.440]   Every election day, millions of Americans of all races,
[00:00:41.440 --> 00:00:43.920]   religions, and backgrounds step into voting booths
[00:00:43.920 --> 00:00:45.320]   throughout the nation.
[00:00:45.320 --> 00:00:47.780]   Whether they are rich or poor, old or young,
[00:00:47.780 --> 00:00:50.680]   each of them has an equal share in choosing the path
[00:00:50.680 --> 00:00:52.440]   that our country will take.
[00:00:52.440 --> 00:00:54.880]   And every ballot they cast is a reminder
[00:00:54.880 --> 00:00:58.300]   that our founding principles are alive and well.
[00:00:58.300 --> 00:00:59.760]   Voting is one of the great privileges
[00:00:59.760 --> 00:01:01.760]   of American citizenship,
[00:01:01.760 --> 00:01:04.520]   and it has always required brave defenders.
[00:01:04.520 --> 00:01:06.040]   As you head to the polls next week,
[00:01:06.040 --> 00:01:08.420]   remember the sacrifices that have been made
[00:01:08.420 --> 00:01:11.040]   by generations of Americans in uniform
[00:01:11.040 --> 00:01:13.000]   to preserve our way of life.
[00:01:13.000 --> 00:01:14.840]   From Bunker Hill to Baghdad,
[00:01:14.840 --> 00:01:16.720]   the men and women of American Armed Forces
[00:01:16.720 --> 00:01:19.960]   have been devoted guardians of our democracy.
[00:01:19.960 --> 00:01:21.800]   All of us owe them and their families
[00:01:21.800 --> 00:01:25.240]   a special debt of gratitude on Election Day.
[00:01:25.240 --> 00:01:27.560]   Americans should also remember the important example
[00:01:27.560 --> 00:01:30.080]   that our election set throughout the world.
[00:01:30.080 --> 00:01:32.100]   Young democracies from Georgia and Ukraine
[00:01:32.100 --> 00:01:34.560]   to Afghanistan and Iraq can look to the United States
[00:01:34.560 --> 00:01:37.560]   for proof that self-government can endure,
[00:01:37.560 --> 00:01:40.440]   and nations that still live under tyranny and oppression
[00:01:40.440 --> 00:01:44.120]   can find hope and inspiration in our commitment to liberty.
[00:01:44.120 --> 00:01:45.200]   For more than two centuries,
[00:01:45.200 --> 00:01:47.800]   Americans have demonstrated the ability of free people
[00:01:47.800 --> 00:01:49.640]   to choose their own leaders.
[00:01:49.640 --> 00:01:51.920]   Our nation has flourished because of its commitment
[00:01:51.920 --> 00:01:54.680]   to trusting the wisdom of our citizenry.
[00:01:54.680 --> 00:01:56.120]   In this year's election,
[00:01:56.120 --> 00:01:58.480]   we will see this tradition continue,
[00:01:58.480 --> 00:02:00.320]   and we will be reminded once again
[00:02:00.320 --> 00:02:02.680]   that we are blessed to live in a free nation
[00:02:02.680 --> 00:02:05.560]   guided by the will of the people.
[00:02:05.560 --> 00:02:06.760]   Thank you for listening.


whisper_print_timings:     load time =  1413.19 ms
whisper_print_timings:     fallbacks =   0 p /   0 h
whisper_print_timings:      mel time =   146.51 ms
whisper_print_timings:   sample time =   356.33 ms /   548 runs (    0.65 ms per run)
whisper_print_timings:   encode time =   892.13 ms /     5 runs (  178.43 ms per run)
whisper_print_timings:   decode time =  8836.33 ms /   543 runs (   16.27 ms per run)
whisper_print_timings:   prompt time =   266.43 ms /     5 runs (   53.29 ms per run)
whisper_print_timings:    total time = 11927.28 ms
V100 bs=5 `master`
WHISPER_CUBLAS=1 make -j && ./main -m ./models-mnt/ggml-large-v2.bin -f ./samples/gb0.wav -bs 5

system_info: n_threads = 4 / 6 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 1 | COREML = 0 | OPENVINO = 0 | 

main: processing './samples/gb0.wav' (2037760 samples, 127.4 sec), 4 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...


[00:00:00.000 --> 00:00:03.260]   Good morning. This Tuesday is election day.
[00:00:03.260 --> 00:00:06.020]   After months of spirited debate and vigorous campaigning,
[00:00:06.020 --> 00:00:08.640]   the time has come for Americans to make important decisions
[00:00:08.640 --> 00:00:10.120]   about our nation's future.
[00:00:10.120 --> 00:00:13.760]   I encourage all Americans to go to the polls and vote.
[00:00:13.760 --> 00:00:16.120]   Election season brings out the spirit of competition
[00:00:16.120 --> 00:00:18.080]   between our political parties.
[00:00:18.080 --> 00:00:20.260]   And that competition is an essential part
[00:00:20.260 --> 00:00:21.760]   of a healthy democracy.
[00:00:21.760 --> 00:00:23.520]   But as the campaigns come to a close,
[00:00:23.520 --> 00:00:26.000]   Republicans, Democrats, and Independents
[00:00:26.000 --> 00:00:29.140]   can find common ground on at least one point.
[00:00:29.140 --> 00:00:31.560]   Our system of representative democracy
[00:00:31.560 --> 00:00:34.440]   is one of America's greatest strengths.
[00:00:34.440 --> 00:00:36.280]   The United States was founded on the belief
[00:00:36.280 --> 00:00:38.260]   that all men are created equal.
[00:00:38.260 --> 00:00:40.760]   Every election day, millions of Americans
[00:00:40.760 --> 00:00:42.680]   of all races, religions, and backgrounds
[00:00:42.680 --> 00:00:45.280]   step into voting booths throughout the nation.
[00:00:45.280 --> 00:00:47.760]   Whether they are rich or poor, old or young,
[00:00:47.760 --> 00:00:50.720]   each of them has an equal share in choosing the path
[00:00:50.720 --> 00:00:52.440]   that our country will take.
[00:00:52.440 --> 00:00:54.920]   And every ballot they cast is a reminder
[00:00:54.920 --> 00:00:58.280]   that our founding principles are alive and well.
[00:00:58.280 --> 00:00:59.780]   Voting is one of the great privileges
[00:00:59.780 --> 00:01:01.760]   of American citizenship.
[00:01:01.760 --> 00:01:04.520]   And it has always required brave defenders.
[00:01:04.520 --> 00:01:06.040]   As you head to the polls next week,
[00:01:06.040 --> 00:01:08.440]   remember the sacrifices that have been made
[00:01:08.440 --> 00:01:11.020]   by generations of Americans in uniform
[00:01:11.020 --> 00:01:12.980]   to preserve our way of life.
[00:01:12.980 --> 00:01:14.820]   From Bunker Hill to Baghdad,
[00:01:14.820 --> 00:01:16.740]   the men and women of American Armed Forces
[00:01:16.740 --> 00:01:19.940]   have been devoted guardians of our democracy.
[00:01:19.940 --> 00:01:21.780]   All of us owe them and their families
[00:01:21.780 --> 00:01:24.260]   a special debt of gratitude on election day.
[00:01:24.260 --> 00:01:27.540]   Americans should also remember the important example
[00:01:27.540 --> 00:01:30.060]   that our election set throughout the world.
[00:01:30.060 --> 00:01:32.100]   Young democracies from Georgia and Ukraine
[00:01:32.100 --> 00:01:34.580]   to Afghanistan and Iraq can look to the United States
[00:01:34.580 --> 00:01:37.520]   for proof that self-government can endure.
[00:01:37.520 --> 00:01:40.400]   And nations that still live under tyranny and oppression
[00:01:40.400 --> 00:01:44.080]   can find hope and inspiration in our commitment to liberty.
[00:01:44.080 --> 00:01:45.200]   For more than two centuries,
[00:01:45.200 --> 00:01:47.820]   Americans have demonstrated the ability of free people
[00:01:47.820 --> 00:01:49.600]   to choose their own leaders.
[00:01:49.600 --> 00:01:51.900]   Our nation has flourished because of its commitment
[00:01:51.900 --> 00:01:54.620]   to trusting the wisdom of our citizenry.
[00:01:54.620 --> 00:01:56.060]   In this year's election,
[00:01:56.060 --> 00:01:58.440]   we will see this tradition continue.
[00:01:58.440 --> 00:02:00.260]   And we will be reminded once again
[00:02:00.260 --> 00:02:02.620]   that we are blessed to live in a free nation
[00:02:02.620 --> 00:02:05.500]   guided by the will of the people.
[00:02:05.500 --> 00:02:06.700]   Thank you for listening.


whisper_print_timings:     load time =  1480.84 ms
whisper_print_timings:     fallbacks =   0 p /   0 h
whisper_print_timings:      mel time =   141.40 ms
whisper_print_timings:   sample time = 13516.33 ms /  2744 runs (    4.93 ms per run)
whisper_print_timings:   encode time =   898.57 ms /     5 runs (  179.71 ms per run)
whisper_print_timings:   decode time = 43944.78 ms /  2719 runs (   16.16 ms per run)
whisper_print_timings:   prompt time =   265.78 ms /     5 runs (   53.16 ms per run)
whisper_print_timings:    total time = 60280.66 ms
V100 bs=5 `batched`
WHISPER_CUBLAS=1 make -j && ./main -m ./models-mnt/ggml-large-v2.bin -f ./samples/gb0.wav -bs 5 -t 6

system_info: n_threads = 6 / 6 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 1 | COREML = 0 | OPENVINO = 0 | 

main: processing './samples/gb0.wav' (2037760 samples, 127.4 sec), 6 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...


[00:00:00.000 --> 00:00:03.240]   Good morning. This Tuesday is election day.
[00:00:03.240 --> 00:00:06.000]   After months of spirited debate and vigorous campaigning,
[00:00:06.000 --> 00:00:08.640]   the time has come for Americans to make important decisions
[00:00:08.640 --> 00:00:10.120]   about our nation's future.
[00:00:10.120 --> 00:00:13.740]   I encourage all Americans to go to the polls and vote.
[00:00:13.740 --> 00:00:16.120]   Election season brings out the spirit of competition
[00:00:16.120 --> 00:00:18.080]   between our political parties.
[00:00:18.080 --> 00:00:20.260]   And that competition is an essential part
[00:00:20.260 --> 00:00:21.760]   of a healthy democracy.
[00:00:21.760 --> 00:00:23.520]   But as the campaigns come to a close,
[00:00:23.520 --> 00:00:26.000]   Republicans, Democrats, and Independents
[00:00:26.000 --> 00:00:29.140]   can find common ground on at least one point.
[00:00:29.140 --> 00:00:31.560]   Our system of representative democracy
[00:00:31.560 --> 00:00:34.440]   is one of America's greatest strengths.
[00:00:34.440 --> 00:00:36.240]   The United States was founded on the belief
[00:00:36.240 --> 00:00:38.280]   that all men are created equal.
[00:00:38.280 --> 00:00:41.440]   Every election day, millions of Americans of all races,
[00:00:41.440 --> 00:00:43.920]   religions, and backgrounds step into voting booths
[00:00:43.920 --> 00:00:45.320]   throughout the nation.
[00:00:45.320 --> 00:00:47.780]   Whether they are rich or poor, old or young,
[00:00:47.780 --> 00:00:50.680]   each of them has an equal share in choosing the path
[00:00:50.680 --> 00:00:52.420]   that our country will take.
[00:00:52.420 --> 00:00:54.900]   And every ballot they cast is a reminder
[00:00:54.900 --> 00:00:58.300]   that our founding principles are alive and well.
[00:00:58.300 --> 00:00:59.760]   Voting is one of the great privileges
[00:00:59.760 --> 00:01:01.760]   of American citizenship.
[00:01:01.760 --> 00:01:04.540]   And it has always required brave defenders.
[00:01:04.540 --> 00:01:06.040]   As you head to the polls next week,
[00:01:06.040 --> 00:01:08.400]   remember the sacrifices that have been made
[00:01:08.400 --> 00:01:11.040]   by generations of Americans in uniform
[00:01:11.040 --> 00:01:13.000]   to preserve our way of life.
[00:01:13.000 --> 00:01:14.840]   From Bunker Hill to Baghdad,
[00:01:14.840 --> 00:01:16.720]   the men and women of American Armed Forces
[00:01:16.720 --> 00:01:19.940]   have been devoted guardians of our democracy.
[00:01:19.940 --> 00:01:21.820]   All of us owe them and their families
[00:01:21.820 --> 00:01:25.260]   a special debt of gratitude on Election Day.
[00:01:25.260 --> 00:01:27.560]   Americans should also remember the important example
[00:01:27.560 --> 00:01:30.060]   that our election set throughout the world.
[00:01:30.060 --> 00:01:32.100]   Young democracies from Georgia and Ukraine
[00:01:32.100 --> 00:01:34.560]   to Afghanistan and Iraq can look to the United States
[00:01:34.560 --> 00:01:37.540]   for proof that self-government can endure.
[00:01:37.540 --> 00:01:40.400]   And nations that still live under tyranny and oppression
[00:01:40.400 --> 00:01:44.100]   can find hope and inspiration in our commitment to liberty.
[00:01:44.100 --> 00:01:45.240]   For more than two centuries,
[00:01:45.240 --> 00:01:47.140]   Americans have demonstrated the ability
[00:01:47.140 --> 00:01:49.620]   of free people to choose their own leaders.
[00:01:49.620 --> 00:01:51.920]   Our nation has flourished because of its commitment
[00:01:51.920 --> 00:01:54.660]   to trusting the wisdom of our citizenry.
[00:01:54.660 --> 00:01:56.080]   In this year's election,
[00:01:56.080 --> 00:01:58.460]   we will see this tradition continue.
[00:01:58.460 --> 00:02:00.260]   And we will be reminded once again
[00:02:00.260 --> 00:02:02.620]   that we are blessed to live in a free nation
[00:02:02.620 --> 00:02:05.500]   guided by the will of the people.
[00:02:05.500 --> 00:02:06.760]   Thank you for listening.


whisper_print_timings:     load time =  1402.55 ms
whisper_print_timings:     fallbacks =   0 p /   0 h
whisper_print_timings:      mel time =   114.69 ms
whisper_print_timings:   sample time =  1066.75 ms /  2739 runs (    0.39 ms per run)
whisper_print_timings:   encode time =   892.24 ms /     5 runs (  178.45 ms per run)
whisper_print_timings:   decode time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper_print_timings:   prompt time =  8071.69 ms /   548 runs (   14.73 ms per run)
whisper_print_timings:    total time = 11570.88 ms

In this case, we observe that beam search with 5 beams runs at the same speed as 1 beam.
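That observation can be checked directly against the timing logs above (all numbers copied from the `whisper_print_timings` output):

```python
# Total times from the three runs above (ms)
master_bs1  = 11927.28   # V100 bs=1 master
master_bs5  = 60280.66   # V100 bs=5 master
batched_bs5 = 11570.88   # V100 bs=5 batched

# Batched beam search is ~5.2x faster than master at bs=5 ...
print(f"bs=5 speedup over master: {master_bs5 / batched_bs5:.2f}x")

# ... and effectively matches the bs=1 run, as claimed in the description.
print(f"batched bs=5 vs master bs=1: {batched_bs5 / master_bs1:.2f}x")

# Per-token decode cost in the batched run (prompt time / runs)
print(f"time/tok: {8071.69 / 548:.2f} ms")
```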


Benches

| GPU | Config | Model | Th | Enc. | Dec. | Bch5 | PP | Commit |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| V100 | AVX2 BLAS CUDA | tiny | 1 | 9.00 | 1.80 | 0.36 | 0.02 | ae1bd69 |
| V100 | AVX2 BLAS CUDA | tiny-q5_0 | 1 | 8.53 | 1.51 | 0.37 | 0.02 | ae1bd69 |
| V100 | AVX2 BLAS CUDA | tiny-q5_1 | 1 | 8.52 | 1.38 | 0.31 | 0.02 | ae1bd69 |
| V100 | AVX2 BLAS CUDA | base | 1 | 14.89 | 2.56 | 0.53 | 0.03 | ae1bd69 |
| V100 | AVX2 BLAS CUDA | base-q5_0 | 1 | 15.16 | 1.81 | 0.46 | 0.03 | ae1bd69 |
| V100 | AVX2 BLAS CUDA | base-q5_1 | 1 | 15.18 | 1.85 | 0.42 | 0.03 | ae1bd69 |
| V100 | AVX2 BLAS CUDA | small | 1 | 40.56 | 5.00 | 1.01 | 0.05 | ae1bd69 |
| V100 | AVX2 BLAS CUDA | small-q5_0 | 1 | 41.49 | 3.47 | 0.99 | 0.05 | ae1bd69 |
| V100 | AVX2 BLAS CUDA | small-q5_1 | 1 | 41.32 | 3.39 | 0.86 | 0.05 | ae1bd69 |
| V100 | AVX2 BLAS CUDA | medium | 1 | 105.38 | 10.39 | 1.88 | 0.11 | ae1bd69 |
| V100 | AVX2 BLAS CUDA | medium-q5_0 | 1 | 107.73 | 6.56 | 2.17 | 0.12 | ae1bd69 |
| V100 | AVX2 BLAS CUDA | medium-q5_1 | 1 | 107.83 | 6.58 | 1.85 | 0.12 | ae1bd69 |
| V100 | AVX2 BLAS CUDA | large | 1 | 172.64 | 15.81 | 2.70 | 0.17 | ae1bd69 |
| V100 | AVX2 BLAS CUDA | large-q5_0 | 1 | 177.72 | 9.38 | 3.25 | 0.19 | ae1bd69 |
| V100 | AVX2 BLAS CUDA | large-q5_1 | 1 | 177.50 | 8.95 | 2.66 | 0.19 | ae1bd69 |
| GPU | Config | Model | Th | Enc. | Dec. | Bch5 | PP | Commit |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| M2 Ultra | NEON BLAS METAL | tiny | 1 | 12.66 | 1.43 | 0.50 | 0.01 | ae1bd69 |
| M2 Ultra | NEON BLAS METAL | tiny-q5_0 | 1 | 10.87 | 1.39 | 0.53 | 0.01 | ae1bd69 |
| M2 Ultra | NEON BLAS METAL | tiny-q5_1 | 1 | 10.97 | 1.40 | 0.52 | 0.01 | ae1bd69 |
| M2 Ultra | NEON BLAS METAL | base | 1 | 18.55 | 2.02 | 0.77 | 0.02 | ae1bd69 |
| M2 Ultra | NEON BLAS METAL | base-q5_0 | 1 | 21.15 | 1.98 | 0.82 | 0.02 | ae1bd69 |
| M2 Ultra | NEON BLAS METAL | base-q5_1 | 1 | 20.59 | 1.96 | 0.82 | 0.02 | ae1bd69 |
| M2 Ultra | NEON BLAS METAL | small | 1 | 51.49 | 4.01 | 1.73 | 0.05 | ae1bd69 |
| M2 Ultra | NEON BLAS METAL | small-q5_0 | 1 | 56.95 | 4.12 | 1.88 | 0.06 | ae1bd69 |
| M2 Ultra | NEON BLAS METAL | small-q5_1 | 1 | 56.95 | 4.12 | 1.85 | 0.06 | ae1bd69 |
| M2 Ultra | NEON BLAS METAL | medium | 1 | 140.89 | 8.46 | 3.96 | 0.12 | ae1bd69 |
| M2 Ultra | NEON BLAS METAL | medium-q5_0 | 1 | 160.02 | 8.29 | 4.12 | 0.14 | ae1bd69 |
| M2 Ultra | NEON BLAS METAL | medium-q5_1 | 1 | 159.64 | 8.42 | 4.15 | 0.14 | ae1bd69 |
| M2 Ultra | NEON BLAS METAL | large | 1 | 247.73 | 12.00 | 5.94 | 0.22 | ae1bd69 |
| M2 Ultra | NEON BLAS METAL | large-q5_0 | 1 | 286.62 | 12.03 | 6.59 | 0.26 | ae1bd69 |
| M2 Ultra | NEON BLAS METAL | large-q5_1 | 1 | 286.16 | 11.88 | 6.36 | 0.26 | ae1bd69 |
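Reading the tables: assuming the Dec and Bch5 columns report milliseconds per token (consistent with the new time/tok display in the bench tool), throughput follows directly. For example, on the V100 with the large model:

```python
def tok_per_s(ms_per_tok: float) -> float:
    """Convert a time-per-token figure (ms) into tokens per second."""
    return 1000.0 / ms_per_tok

# Values from the V100 `large` row above
print(f"single decoder: {tok_per_s(15.81):.0f} tok/s")
print(f"batched (5 beams): {tok_per_s(2.70):.0f} tok/s per decoded token")
```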

@ggerganov ggerganov marked this pull request as ready for review November 14, 2023 17:12
@ggerganov
Owner Author

This should be ready to merge - will do so tomorrow

@bobqianic
Collaborator

bobqianic commented Nov 14, 2023

This should be ready to merge - will do so tomorrow

During testing, I encountered an error that reproduces 100% of the time:
whisper_full_with_state: failed to decode

Audio: L'Océan et l'Humanité, destins liés ! _ Lamya Essemlali _ TEDxOrléans (320 kbps).zip

root@imperial-88694897d9-ee131614:~/whisper.cpp-ae1bd690419032c95406940c8533a905cb1ae026# ./main -m ./ggml-large-v2.bin -f "../L’Océan et l’Humanité, destins liés ! _ Lamya Essemlali _ TEDxOrléans (320 kbps).wav" -bs 5 -l auto
whisper_init_from_file_with_params_no_state: loading model from './ggml-large-v2.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head  = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 1280
whisper_model_load: n_text_head   = 20
whisper_model_load: n_text_layer  = 32
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 5 (large)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs       = 99
ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
  Device 0: Tesla V100-PCIE-32GB, compute capability 7.0
whisper_backend_init: using CUDA backend
whisper_model_load:     CUDA buffer size =  2973.08 MB
whisper_model_load: model size    = 2972.62 MB
whisper_backend_init: using CUDA backend
whisper_init_state: kv self size  =  140.00 MB
whisper_init_state: kv cross size =  234.38 MB
whisper_init_state: compute buffer (conv)   =   29.49 MB
whisper_init_state: compute buffer (encode) =  202.52 MB
whisper_init_state: compute buffer (cross)  =    8.89 MB
whisper_init_state: compute buffer (decode) =   94.57 MB

system_info: n_threads = 4 / 52 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 1 | COREML = 0 | OPENVINO = 0 | 

main: processing '../L’Océan et l’Humanité, destins liés ! _ Lamya Essemlali _ TEDxOrléans (320 kbps).wav' (12786022 samples, 799.1 sec), 4 threads, 1 processors, lang = auto, task = transcribe, timestamps = 1 ...

whisper_full_with_state: auto-detected language: fr (p = 0.989031)

[00:00:00.000 --> 00:00:13.820]   [battements de coeur]
[00:00:13.820 --> 00:00:18.200]   Bonsoir.
[00:00:18.200 --> 00:00:24.480]   Notre planète est recouverte à 70 % d'océans,
[00:00:24.480 --> 00:00:28.400]   et pourtant, étrangement, on a choisi de l'appeler la Terre.
[00:00:29.600 --> 00:00:32.400]   Le poète Edcott Williams a une vision bien plus objective
[00:00:32.400 --> 00:00:34.160]   et moins anthropocentrique,
[00:00:34.160 --> 00:00:36.880]   quand il dit que vu de l'espace, la planète est bleue,
[00:00:36.880 --> 00:00:41.640]   vu de l'espace, elle est le territoire non pas des hommes, mais des baleines.
[00:00:41.640 --> 00:00:47.040]   Et pourtant, on vient tous de l'océan, c'est le berceau de la vie,
[00:00:47.040 --> 00:00:48.960]   même si on l'a oublié.
[00:00:48.960 --> 00:00:51.920]   L'océan est partout, dans les glaciers, dans les rivières,
[00:00:51.920 --> 00:00:55.520]   dans les nappes phréatiques, dans les cellules des êtres vivants,
[00:00:55.520 --> 00:00:57.160]   et dans nos veines.
whisper_full_with_state: failed to decode
./main: failed to process audio
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla V100-PCIE-32GB           On  | 00000000:65:01.0 Off |                  Off |
| N/A   31C    P0              27W / 250W |      0MiB / 32768MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

@ggerganov
Owner Author

ggerganov commented Nov 15, 2023

@bobqianic Thanks for the feedback. If you increase the factor to 5 here, does it fix the issue?

whisper.cpp/whisper.cpp

Lines 3043 to 3046 in 6c8a003

// at this point, we don't know yet how many decoders will be used, so we overallocate 3x ctx
// in theory, there can be a case where this is not enough, but in practice it should always be enough
const int factor = 3;

Edit: I just noticed you weren't using the latest version of this branch, so you had factor = 2 instead of factor = 3.
With commit 6c8a003, the issue should be fixed
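For context on why the factor matters: the self-attention KV cache scales linearly with it. A rough size estimate (assuming f16 KV data and the large model's hyperparameters from the load log above, and ignoring allocator overhead) reproduces the 140 MB `kv self size` that the older factor = 2 build reported, along with the cross-attention cache size:

```python
def kv_cache_mb(n_layer: int, n_ctx: int, n_state: int, factor: int = 1) -> float:
    """Approximate KV cache size in MiB: one f16 K and V tensor per layer."""
    bytes_per_elem = 2   # f16
    n_tensors = 2        # K and V
    return n_tensors * n_layer * n_ctx * n_state * bytes_per_elem * factor / 1024**2

# large-v2: n_text_layer=32, n_text_ctx=448, n_text_state=1280
print(f"kv self  = {kv_cache_mb(32, 448, 1280, factor=2):.2f} MB")
# cross-attention cache spans the audio context (n_audio_ctx=1500), no factor
print(f"kv cross = {kv_cache_mb(32, 1500, 1280):.2f} MB")
```

With factor = 3 the same formula gives 210 MB for the self cache, which is the extra headroom the fix buys.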

@bobqianic
Collaborator

@bobqianic Thanks for the feedback. If you increase the factor to 5 here, does it fix the issue?

whisper.cpp/whisper.cpp

Lines 3043 to 3046 in 6c8a003

// at this point, we don't know yet how many decoders will be used, so we overallocate 3x ctx
// in theory, there can be a case where this is not enough, but in practice it should always be enough
const int factor = 3;

Edit: I just noticed you weren't using the latest version of this branch, so you had factor = 2 instead of factor = 3. With commit 6c8a003, the issue should be fixed

Sorry for the late reply, I was busy with some lab work this morning. Just checked, and yeah, you're right: the test version I used had factor = 2. I've now tested 4c245ea on both Windows and Linux, and something is still not quite right. The transcription runs fine on Windows, but the output looks weird (see this è). On Linux it doesn't run at all: it just keeps looping and never produces any output.

whisper_full_with_state: auto-detected language: fr (p = 0.988806)

[00:00:00.000 --> 00:00:03.000]   (Tic tac de l'escalier)
[00:00:03.000 --> 00:00:06.200]   (Tic tac de l'escalier)
[00:00:06.200 --> 00:00:09.400]   (Tic tac de l'escalier)
[00:00:09.400 --> 00:00:12.600]   (Tic tac de l'escalier)
[00:00:12.600 --> 00:00:15.800]   (Tic tac de l'escalier)
[00:00:15.800 --> 00:00:18.200]   Bonsoir.
[00:00:18.200 --> 00:00:24.600]   Notre planète est recouverte à 70 % d'océans,
[00:00:24.600 --> 00:00:28.600]   et pourtant, étrangement, on a choisi de l'appeler la Terre.
[00:00:29.600 --> 00:00:32.400]   Le poète Hedcote Williams a une vision bien plus objective
[00:00:32.400 --> 00:00:34.200]   et moins anthropocentrique,
[00:00:34.200 --> 00:00:36.800]   quand il dit que, vu de l'espace, la planète est bleue,
[00:00:36.800 --> 00:00:40.200]   vu de l'espace, elle est le territoire non pas des hommes,
[00:00:40.200 --> 00:00:41.600]   mais des baleines.
[00:00:41.600 --> 00:00:45.600]   Et pourtant, on vient tous de l'océan,
[00:00:45.600 --> 00:00:48.800]   c'est le berceau de la vie, même si on l'a oublié.
[00:00:48.800 --> 00:00:51.800]   L'océan est partout, dans les glaciers, dans les rivières,
[00:00:51.800 --> 00:00:55.400]   dans les nappes phréatiques, dans les cellules des êtres vivants,
[00:00:55.400 --> 00:00:57.000]   et dans nos veines.

@ggerganov
Owner Author

Are you using large v2?

@bobqianic
Collaborator

Are you using large v2?

Yes

@ggerganov
Owner Author

ggerganov commented Nov 15, 2023

Which command are you using? It works on my end:

WHISPER_CUBLAS=1 make -j && ./main -m ./models-mnt/ggml-large-v2.bin -f ./samples/fr0.wav -bs 5 -t 6 -l auto
I whisper.cpp build info: 
I UNAME_S:  Linux
I UNAME_P:  x86_64
I UNAME_M:  x86_64
I CFLAGS:   -I.              -O3 -DNDEBUG -std=c11   -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -mavx2 -mfma -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -mavx2 -mfma -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include
I LDFLAGS:  -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64 -L/targets/x86_64-linux/lib
I CC:       cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
I CXX:      g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0

g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -mavx2 -mfma -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -c whisper.cpp -o whisper.o
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -mavx2 -mfma -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include examples/main/main.cpp examples/common.cpp examples/common-ggml.cpp ggml-cuda.o ggml.o ggml-alloc.o ggml-backend.o ggml-quants.o whisper.o -o main -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64 -L/targets/x86_64-linux/lib
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -mavx2 -mfma -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include examples/bench/bench.cpp ggml-cuda.o ggml.o ggml-alloc.o ggml-backend.o ggml-quants.o whisper.o -o bench -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64 -L/targets/x86_64-linux/lib
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -mavx2 -mfma -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include examples/quantize/quantize.cpp examples/common.cpp examples/common-ggml.cpp ggml-cuda.o ggml.o ggml-alloc.o ggml-backend.o ggml-quants.o whisper.o -o quantize -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64 -L/targets/x86_64-linux/lib
./main -h

usage: ./main [options] file0.wav file1.wav ...

options:
  -h,        --help              [default] show this help message and exit
  -t N,      --threads N         [4      ] number of threads to use during computation
  -p N,      --processors N      [1      ] number of processors to use during computation
  -ot N,     --offset-t N        [0      ] time offset in milliseconds
  -on N,     --offset-n N        [0      ] segment index offset
  -d  N,     --duration N        [0      ] duration of audio to process in milliseconds
  -mc N,     --max-context N     [-1     ] maximum number of text context tokens to store
  -ml N,     --max-len N         [0      ] maximum segment length in characters
  -sow,      --split-on-word     [false  ] split on word rather than on token
  -bo N,     --best-of N         [5      ] number of best candidates to keep
  -bs N,     --beam-size N       [5      ] beam size for beam search
  -wt N,     --word-thold N      [0.01   ] word timestamp probability threshold
  -et N,     --entropy-thold N   [2.40   ] entropy threshold for decoder fail
  -lpt N,    --logprob-thold N   [-1.00  ] log probability threshold for decoder fail
  -debug,    --debug-mode        [false  ] enable debug mode (eg. dump log_mel)
  -tr,       --translate         [false  ] translate from source language to english
  -di,       --diarize           [false  ] stereo audio diarization
  -tdrz,     --tinydiarize       [false  ] enable tinydiarize (requires a tdrz model)
  -nf,       --no-fallback       [false  ] do not use temperature fallback while decoding
  -otxt,     --output-txt        [false  ] output result in a text file
  -ovtt,     --output-vtt        [false  ] output result in a vtt file
  -osrt,     --output-srt        [false  ] output result in a srt file
  -olrc,     --output-lrc        [false  ] output result in a lrc file
  -owts,     --output-words      [false  ] output script for generating karaoke video
  -fp,       --font-path         [/System/Library/Fonts/Supplemental/Courier New Bold.ttf] path to a monospace font for karaoke video
  -ocsv,     --output-csv        [false  ] output result in a CSV file
  -oj,       --output-json       [false  ] output result in a JSON file
  -ojf,      --output-json-full  [false  ] include more information in the JSON file
  -of FNAME, --output-file FNAME [       ] output file path (without file extension)
  -ps,       --print-special     [false  ] print special tokens
  -pc,       --print-colors      [false  ] print colors
  -pp,       --print-progress    [false  ] print progress
  -nt,       --no-timestamps     [false  ] do not print timestamps
  -l LANG,   --language LANG     [en     ] spoken language ('auto' for auto-detect)
  -dl,       --detect-language   [false  ] exit after automatically detecting language
             --prompt PROMPT     [       ] initial prompt
  -m FNAME,  --model FNAME       [models/ggml-base.en.bin] model path
  -f FNAME,  --file FNAME        [       ] input WAV file path
  -oved D,   --ov-e-device DNAME [CPU    ] the OpenVINO device used for encode inference
  -ls,       --log-score         [false  ] log best decoder scores of tokens
  -ng,       --no-gpu            [false  ] disable GPU

whisper_init_from_file_with_params_no_state: loading model from './models-mnt/ggml-large-v2.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head  = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 1280
whisper_model_load: n_text_head   = 20
whisper_model_load: n_text_layer  = 32
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 5 (large)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs       = 99
ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
  Device 0: Tesla V100-PCIE-16GB, compute capability 7.0
whisper_backend_init: using CUDA backend
whisper_model_load:     CUDA buffer size =  2973.08 MB
whisper_model_load: model size    = 2972.62 MB
whisper_backend_init: using CUDA backend
whisper_init_state: kv self size  =  210.00 MB
whisper_init_state: kv cross size =  234.38 MB
whisper_init_state: compute buffer (conv)   =   29.49 MB
whisper_init_state: compute buffer (encode) =  202.52 MB
whisper_init_state: compute buffer (cross)  =    8.89 MB
whisper_init_state: compute buffer (decode) =   94.57 MB

system_info: n_threads = 6 / 6 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 1 | COREML = 0 | OPENVINO = 0 | 

main: processing './samples/fr0.wav' (12786022 samples, 799.1 sec), 6 threads, 1 processors, 5 beams + best of 5, lang = auto, task = transcribe, timestamps = 1 ...

whisper_full_with_state: auto-detected language: fr (p = 0.989031)

[00:00:00.000 --> 00:00:13.820]   [battements de coeur]
[00:00:13.820 --> 00:00:18.200]   Bonsoir.
[00:00:18.200 --> 00:00:24.480]   Notre planète est recouverte à 70 % d'océans,
[00:00:24.480 --> 00:00:28.400]   et pourtant, étrangement, on a choisi de l'appeler la Terre.
[00:00:29.600 --> 00:00:32.400]   Le poète Edcott Williams a une vision bien plus objective
[00:00:32.400 --> 00:00:34.160]   et moins anthropocentrique,
[00:00:34.160 --> 00:00:36.880]   quand il dit que vu de l'espace, la planète est bleue,
[00:00:36.880 --> 00:00:41.640]   vu de l'espace, elle est le territoire non pas des hommes, mais des baleines.
[00:00:41.640 --> 00:00:47.040]   Et pourtant, on vient tous de l'océan, c'est le berceau de la vie,
[00:00:47.040 --> 00:00:48.960]   même si on l'a oublié.
[00:00:48.960 --> 00:00:51.920]   L'océan est partout, dans les glaciers, dans les rivières,
[00:00:51.920 --> 00:00:55.520]   dans les nappes phréatiques, dans les cellules des êtres vivants,
[00:00:55.520 --> 00:00:57.160]   et dans nos veines.
[00:00:58.760 --> 00:01:01.520]   Etrangement, c'est John Fitzgerald Kennedy
[00:01:01.520 --> 00:01:04.440]   qui l'a assez bien illustré dans cette citation.
[00:01:04.440 --> 00:01:06.960]   "Il est un fait biologique intéressant que chacun d'entre nous
[00:01:06.960 --> 00:01:09.880]   "ait dans les veines un pourcentage identique de sel dans le sang
[00:01:09.880 --> 00:01:11.880]   "à celui qui existe dans les océans.
[00:01:11.880 --> 00:01:14.280]   "Nous avons donc tous du sel dans notre sang,
[00:01:14.280 --> 00:01:16.120]   "dans notre sueur, dans nos larmes.
[00:01:16.120 --> 00:01:19.080]   "Nous sommes liés à l'océan, et quand nous retournons à la mer,
[00:01:19.080 --> 00:01:22.240]   "que ce soit pour naviguer ou pour la regarder,
[00:01:22.240 --> 00:01:24.680]   "nous retournons d'où nous venons."
[00:01:26.200 --> 00:01:29.400]   Et pourtant, cet océan, on le connaît très, très mal.
[00:01:29.400 --> 00:01:33.560]   Ça reste un monde assez étrange et étranger,

@ggerganov
Owner Author

Try 270b1e4 and let me know if the issue is resolved.

@bobqianic
Collaborator

bobqianic commented Nov 15, 2023

Try 270b1e4 and let me know if the issue is resolved.

It works!

Edit: The strange characters mentioned earlier are likely caused by Windows not setting the terminal to UTF-8.

Linux

root@imperial-5fcb458f92-7e2f426a:~/whisper.cpp-270b1e48dbdcb68679b86ccf073455c506907809# ./main -m ../ggml-large-v2.bin -f ../testfrench.wav -bs 5 -l auto
whisper_init_from_file_with_params_no_state: loading model from '../ggml-large-v2.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head  = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 1280
whisper_model_load: n_text_head   = 20
whisper_model_load: n_text_layer  = 32
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 5 (large)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs       = 99
ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
  Device 0: Tesla V100-PCIE-32GB, compute capability 7.0
whisper_backend_init: using CUDA backend
whisper_model_load:     CUDA buffer size =  2973.08 MB
whisper_model_load: model size    = 2972.62 MB
whisper_backend_init: using CUDA backend
whisper_init_state: kv self size  =  210.00 MB
whisper_init_state: kv cross size =  234.38 MB
whisper_init_state: compute buffer (conv)   =   29.49 MB
whisper_init_state: compute buffer (encode) =  202.52 MB
whisper_init_state: compute buffer (cross)  =    8.89 MB
whisper_init_state: compute buffer (decode) =   94.57 MB

system_info: n_threads = 4 / 52 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 1 | COREML = 0 | OPENVINO = 0 | 

main: processing '../testfrench.wav' (12786022 samples, 799.1 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = auto, task = transcribe, timestamps = 1 ...

whisper_full_with_state: auto-detected language: fr (p = 0.989031)

[00:00:00.000 --> 00:00:13.820]   [battements de coeur]
[00:00:13.820 --> 00:00:18.200]   Bonsoir.
[00:00:18.200 --> 00:00:24.480]   Notre planète est recouverte à 70 % d'océans,
[00:00:24.480 --> 00:00:28.400]   et pourtant, étrangement, on a choisi de l'appeler la Terre.
[00:00:29.600 --> 00:00:32.400]   Le poète Edcott Williams a une vision bien plus objective
[00:00:32.400 --> 00:00:34.160]   et moins anthropocentrique,
[00:00:34.160 --> 00:00:36.880]   quand il dit que vu de l'espace, la planète est bleue,
[00:00:36.880 --> 00:00:41.640]   vu de l'espace, elle est le territoire non pas des hommes, mais des baleines.
[00:00:41.640 --> 00:00:47.040]   Et pourtant, on vient tous de l'océan, c'est le berceau de la vie,
[00:00:47.040 --> 00:00:48.960]   même si on l'a oublié.
[00:00:48.960 --> 00:00:51.920]   L'océan est partout, dans les glaciers, dans les rivières,
[00:00:51.920 --> 00:00:55.520]   dans les nappes phréatiques, dans les cellules des êtres vivants,
[00:00:55.520 --> 00:00:57.160]   et dans nos veines.
[00:00:58.760 --> 00:01:01.520]   Etrangement, c'est John Fitzgerald Kennedy
[00:01:01.520 --> 00:01:04.440]   qui l'a assez bien illustré dans cette citation.
[00:01:04.440 --> 00:01:06.960]   "Il est un fait biologique intéressant que chacun d'entre nous
[00:01:06.960 --> 00:01:09.880]   "ait dans les veines un pourcentage identique de sel dans le sang
[00:01:09.880 --> 00:01:11.880]   "à celui qui existe dans les océans.
[00:01:11.880 --> 00:01:14.280]   "Nous avons donc tous du sel dans notre sang,
[00:01:14.280 --> 00:01:16.120]   "dans notre sueur, dans nos larmes.
[00:01:16.120 --> 00:01:19.080]   "Nous sommes liés à l'océan, et quand nous retournons à la mer,
[00:01:19.080 --> 00:01:22.240]   "que ce soit pour naviguer ou pour la regarder,
[00:01:22.240 --> 00:01:24.680]   "nous retournons d'où nous venons."
[00:01:26.200 --> 00:01:29.400]   Et pourtant, cet océan, on le connaît très, très mal.
[00:01:29.400 --> 00:01:33.560]   Ça reste un monde assez étrange et étranger,
[00:01:33.560 --> 00:01:36.760]   et qui fait peur, parfois.
[00:01:36.760 --> 00:01:45.680]   Cela dit, en fait, c'est loin d'être une masse d'eau inerte,
[00:01:45.680 --> 00:01:51.640]   une étendue d'eau qui ne sert à rien, en fait.
[00:01:51.840 --> 00:01:56.400]   C'est la vie marine qui fait que cet océan
[00:01:56.400 --> 00:02:00.520]   est un principal régulateur du climat,
[00:02:00.520 --> 00:02:03.360]   le principal puits de carbone,
[00:02:03.360 --> 00:02:11.000]   et aussi le principal producteur d'oxygène.
[00:02:11.000 --> 00:02:15.440]   Entre 50 et 70 % de l'oxygène vient de l'océan.
[00:02:15.440 --> 00:02:19.480]   En fait, plus d'une inspiration sur deux que vous prenez,
[00:02:19.480 --> 00:02:21.600]   vous la devez à l'océan.
[00:02:21.800 --> 00:02:25.160]   Pas tout à fait à l'océan, vous la devez à la vie marine.
[00:02:25.160 --> 00:02:29.200]   La vie marine, c'est elle qui permet à cette masse d'eau, justement,
[00:02:29.200 --> 00:02:32.040]   de ne pas être juste une masse d'eau inerte,
[00:02:32.040 --> 00:02:35.080]   mais d'être cette machinerie qui nous permet de vivre.
[00:02:35.080 --> 00:02:38.120]   Dans la vie marine, ça part du phytoplankton,
[00:02:38.120 --> 00:02:41.720]   qui lui fournit l'oxygène, jusqu'aux grandes baleines,
[00:02:41.720 --> 00:02:47.080]   en passant par les grands prédateurs, les thons, les dauphins.

@ggerganov ggerganov merged commit b6c5f49 into master Nov 15, 2023
70 of 74 checks passed
felrock pushed a commit to felrock/whisper.cpp that referenced this pull request Nov 18, 2023
* whisper : add whisper_batch

* whisper : move kv_self to whisper_state

* whisper : full batched decoding support

* whisper : fix memory leak in whisper_batch

* whisper : fix mem leak again + remove obsolete function

* whisper : clear kv cache when using whisper_decode API

* whisper : speed-up sampling

* whisper : fix decoders initializer

* bench : add batch size 5 bench

* whisper : add comment about the KV cache size

* whisper : add check for max number of decoders

* whisper : avoid starting sampling threads with bs=1

* whisper : enable beam-search by default

* cuda : sync llama.cpp fixes
landtanin pushed a commit to landtanin/whisper.cpp that referenced this pull request Dec 16, 2023
josharian added a commit to josharian/whisper.cpp that referenced this pull request Mar 10, 2024
As of ggerganov#1486, whisper.cpp uses a unified KV cache with KQ masking.
As a result, depending on their location in the batch,
identical sequences in a batch can have slightly different outputs
due to floating point rounding errors during reduction.
See the discussion in ggerganov#1941 for more details.

The beam search code used "has identical sum of log probabilities"
as a shorthand for "is an identical token sequence". However, per above,
identical tokens do not necessarily result in identical probabilities.

Instead, explicitly compare on sequences.
This is linear in cost when they are identical,
but the lengths are always small and the comparisons are cheap.

This increases diversity during beam search.

This improves output quality for some short samples I've been working
with, at no detectable performance cost.
I haven't checked against larger corpuses.

Fixes ggerganov#1941
ggerganov pushed a commit that referenced this pull request Mar 10, 2024
jiahansu pushed a commit to WiseSync/whisper.cpp that referenced this pull request Apr 17, 2024
viktor-silakov pushed a commit to viktor-silakov/whisper_node_mic.cpp that referenced this pull request May 11, 2024
iThalay pushed a commit to iThalay/whisper.cpp that referenced this pull request Sep 23, 2024
iThalay pushed a commit to iThalay/whisper.cpp that referenced this pull request Sep 23, 2024
iThalay pushed a commit to iThalay/whisper.cpp that referenced this pull request Sep 23, 2024