
V3 turbo aborts with error (00000005:kIOGPUCommandBufferCallbackErrorInnocentVictim) #2470

Open
chnbr opened this issue Oct 10, 2024 · 5 comments

@chnbr commented Oct 10, 2024

I am experimenting with the new v3-turbo model (Metal implementation):

On an iPhone 12 mini, after a few segments I get the following error(s):

Execution of the command buffer was aborted due to an error during execution. Discarded (victim of GPU error/recovery) (00000005:kIOGPUCommandBufferCallbackErrorInnocentVictim)
ggml_metal_graph_compute: command buffer 0 failed with status 5
error: Discarded (victim of GPU error/recovery) (00000005:kIOGPUCommandBufferCallbackErrorInnocentVictim)

Any ideas what could cause the problem? Is this a bug?
BTW: on macOS and on an M1 iPad the model runs fine!
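
(A quick way to confirm the failure is specific to the Metal path is to load the same model with the GPU disabled; this is a minimal sketch against the standard whisper.cpp C API, with a placeholder model path:)

#include "whisper.h"

int main(void) {
    struct whisper_context_params cparams = whisper_context_default_params();
    cparams.use_gpu = false; // skip the Metal backend entirely and run on the CPU

    // placeholder path for illustration
    struct whisper_context * ctx = whisper_init_from_file_with_params(
        "ggml-large-v3-turbo-q5_0.bin", cparams);
    if (ctx == NULL) {
        return 1;
    }
    // ... run whisper_full() on some audio as usual ...
    whisper_free(ctx);
    return 0;
}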

@chnbr commented Oct 10, 2024

Here is the full output from whisper before the error occurs:

whisper_init_from_file_with_params_no_state: loading model from '/var/mobile/Containers/.../ggml-large-v3-turbo-q5_0.bin'
whisper_init_with_params_no_state: use gpu    = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw        = 0
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51866
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head  = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 1280
whisper_model_load: n_text_head   = 20
whisper_model_load: n_text_layer  = 4
whisper_model_load: n_mels        = 128
whisper_model_load: ftype         = 8
whisper_model_load: qntvr         = 2
whisper_model_load: type          = 5 (large v3)
whisper_model_load: adding 1609 extra tokens
whisper_model_load: n_langs       = 100
whisper_model_load:    Metal total size =   573.40 MB
whisper_model_load: model size    =  573.40 MB
whisper_backend_init_gpu: using Metal backend
ggml_metal_init: allocating
ggml_metal_init: picking default device: Apple A14 GPU
ggml_metal_init: loading '/var/containers/Bundle/Application/.../default.metallib'
ggml_metal_init: GPU name:   Apple A14 GPU
ggml_metal_init: GPU family: MTLGPUFamilyApple7  (1007)
ggml_metal_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_init: GPU family: MTLGPUFamilyMetal3  (5001)
ggml_metal_init: simdgroup reduction support   = true
ggml_metal_init: simdgroup matrix mul. support = true
ggml_metal_init: hasUnifiedMemory              = true
ggml_metal_init: recommendedMaxWorkingSetSize  =  2863.32 MB
whisper_init_state: kv self size  =   10.49 MB
whisper_init_state: kv cross size =   31.46 MB
whisper_init_state: kv pad  size  =    7.86 MB
whisper_init_state: compute buffer (conv)   =   37.67 MB
whisper_init_state: compute buffer (encode) =  212.29 MB
whisper_init_state: compute buffer (cross)  =    9.25 MB
whisper_init_state: compute buffer (decode) =  100.03 MB

@chnbr commented Oct 11, 2024

I can add some further observations on the v3-turbo problem described above:

  • It runs reliably on M1 machines (tested on an M1 Mac and an M1 iPad).
  • It runs reliably on older iPhones, which fall back to the CPU.
  • It fails on the iPhone 12 mini (where it decides to use the GPU) after a few segments; sometimes it survives 30 segments, sometimes it fails on the first one. The failures are non-deterministic, but sooner or later they always occur.

I don't think it is a memory problem, because until the error occurs there is more than 1.6 GB of free RAM available according to the debugger.

Are there any tests I could perform to provide more information about this? Please let me know, thanks!
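
(One low-effort test, sketched here under the assumption that the iOS build exposes the standard whisper.cpp logging hook: install a log callback via whisper_log_set() so the ggml/Metal messages are captured to a file instead of only the Xcode console:)

#include "whisper.h"
#include <stdio.h>

// Forward every backend log line (including the Metal errors above) to a file.
static void file_log_cb(enum ggml_log_level level, const char * text, void * user_data) {
    (void) level; // level is available here if filtering is wanted
    FILE * f = (FILE *) user_data;
    fputs(text, f);
    fflush(f);
}

void install_file_logger(const char * path) { // path is app-specific, e.g. inside the sandbox
    FILE * f = fopen(path, "a");
    if (f != NULL) {
        whisper_log_set(file_log_cb, f);
    }
}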

@chnbr commented Oct 14, 2024

Just before the encoder crash I get the following messages in debug mode:

ggml_gallocr_needs_realloc: src 0 (KQ_mask) of node KQ_mask (view) is not valid
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 0)
ggml_gallocr_needs_realloc: src 0 (KQ_mask) of node KQ_mask (view) is not valid
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 0)
ggml_gallocr_needs_realloc: src 0 (KQ_mask) of node KQ_mask (view) is not valid
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 0)
ggml_gallocr_needs_realloc: src 0 (KQ_mask) of node KQ_mask (view) is not valid
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 0)
Execution of the command buffer was aborted due to an error during execution. Discarded (victim of GPU error/recovery) (00000005:kIOGPUCommandBufferCallbackErrorInnocentVictim)
ggml_metal_graph_compute: command buffer 0 failed with status 5
error: Discarded (victim of GPU error/recovery) (00000005:kIOGPUCommandBufferCallbackErrorInnocentVictim)
whisper_full_with_state: failed to encode
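
(Since whisper_full_with_state reports the failure through its return value, an app-side mitigation, not a fix for the underlying Metal issue, would be to detect the error and retry on the CPU backend; a hedged sketch against the standard whisper.cpp C API:)

#include "whisper.h"

// If the GPU context fails to encode, rebuild the context with
// use_gpu = false and run the same audio again on the CPU backend.
int transcribe_with_fallback(const char * model_path, const float * pcm, int n_samples) {
    struct whisper_full_params wparams = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);

    struct whisper_context_params cparams = whisper_context_default_params();
    struct whisper_context * ctx = whisper_init_from_file_with_params(model_path, cparams);
    if (ctx != NULL && whisper_full(ctx, wparams, pcm, n_samples) == 0) {
        whisper_free(ctx); // success on the default (Metal) backend
        return 0;
    }
    if (ctx != NULL) {
        whisper_free(ctx); // GPU path failed (e.g. command buffer status 5)
    }

    cparams.use_gpu = false; // retry on the CPU backend
    ctx = whisper_init_from_file_with_params(model_path, cparams);
    if (ctx == NULL) {
        return 1;
    }
    const int ret = whisper_full(ctx, wparams, pcm, n_samples);
    whisper_free(ctx);
    return ret;
}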

@ggerganov (owner) commented:

Hm, I am not sure what the issue could be here. I would need to take a deeper look, but I cannot give you any timeframe. I'm hoping someone else digs into this as well and shares some more information.

Btw, can your iPhone 12 mini run the small model, for example? It is of similar size to v3-turbo.

@chnbr commented Oct 15, 2024

> Btw, can your iPhone 12 mini run the small model, for example? It is of similar size to v3-turbo.

Yes, of course; it can even run the medium model without problems. Only the new turbo model shows these abort problems.
