Fix MKL build issue by correctly finding and linking MKL libraries #2272

Open, wants to merge 1 commit into base: master

green-dalii

Fix MKL Build Issue

Description

This pull request fixes an issue with building whisper.cpp using Intel MKL. The changes ensure that CMake correctly finds and links the MKL libraries.

Changes Made

  • Added find_package(MKL REQUIRED) to locate MKL.
  • Set MKL_INCLUDE_DIRS and MKL_LIBRARIES to ensure proper include and link paths.
  • Included MKL headers and libraries only if WHISPER_MKL is enabled.

Testing

Successfully built whisper.cpp with the MKL option enabled on an Intel Xeon CPU without GPU.

Related Issues

Fixes # (if applicable)

@lukaskwkw

I was able to build this either this way (#2295 (comment)) or by following yours with a slight change, like:

if (WHISPER_MKL)
    find_package(MKL REQUIRED)
    set(MKL_INCLUDE_DIRS "${MKLROOT}/include")
    set(MKL_LIBRARIES "${MKLROOT}/lib")
endif()

if (WHISPER_MKL)
    include_directories(${MKL_INCLUDE_DIRS})
    link_directories(${MKL_LIBRARIES})
    # Placeholder: replace with the actual path to mkl_rt.lib on your machine
    target_link_libraries(whisper PRIVATE "D:/actual/path/to/mkl_rt.lib")
endif()

on Windows 10 with an i5-10400F CPU. Even with a successful build, I haven't noticed any difference in speed. Have you?

@green-dalii
Author

I tested this on Linux, so I don't know the difference on Windows. You can check the output of whisper.cpp.
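
One way to check the output, as a rough sketch: whisper.cpp prints a `system_info:` line at startup, and its `BLAS` field indicates whether the binary was compiled with BLAS support. The helper name `blas_flag` below is made up for illustration and is not part of whisper.cpp.

```shell
#!/bin/sh
# Hypothetical helper (not part of whisper.cpp): read log text on stdin and
# print the value of the BLAS field from the system_info line.
# Prints nothing if no such field is found.
blas_flag() {
    sed -n 's/.*| BLAS = \([01]\) |.*/\1/p'
}

# Example with an abridged system_info line
echo 'system_info: n_threads = 4 / 12 | AVX = 1 | BLAS = 0 | SSE3 = 1 |' | blas_flag
```

With a real build this could be used as `./main -m <model> -f <audio> 2>&1 | blas_flag`, where `2>&1` is there in case the line is written to stderr. A value of 0 would suggest the BLAS code path was not compiled in, regardless of which libraries were linked.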

@thewh1teagle
Contributor

Works for me too on Windows 11, but it's slow: OpenCL is 10 times faster.

Remove-Item -Recurse -Force build
cmake -B build . -DWHISPER_MKL=ON
cmake --build build -j8
Copy-Item "C:\Program Files (x86)\Intel\oneAPI\mkl\2024.2\bin\*.dll" -Destination "build\bin\Debug"
wget.exe -nc "https://github.com/thewh1teagle/vibe/raw/main/samples/multi.wav"
.\build\bin\Debug\main.exe -m "c:\Users\User\AppData\Local\github.com.thewh1teagle.vibe\ggml-medium.bin" -f multi.wav

@Just-Explode

Just-Explode commented Aug 17, 2024

I tried this modification, but it doesn't change much (30 s); my other test is here. So far, only OpenBLAS gives a significant speedup. Am I doing anything wrong?

Also, here's my Linux modification:

if (WHISPER_MKL)
    find_package(MKL REQUIRED)
    set(MKL_INCLUDE_DIRS "${MKLROOT}/include")
    set(MKL_LIBRARIES "${MKLROOT}/lib")
endif()

if (WHISPER_MKL)
    include_directories(${MKL_INCLUDE_DIRS})
    link_directories(${MKL_LIBRARIES})
    target_link_libraries(whisper PRIVATE  "/opt/intel/oneapi/mkl/2024.2/lib/libmkl_rt.so")
endif()

$ ./main -m /mnt/Deb-Data/ggml-medium.bin -f /mnt/Deb-Data/whisper.cpp/samples/jfk.wav
whisper_init_from_file_with_params_no_state: loading model from '/mnt/Deb-Data/ggml-medium.bin'
whisper_init_with_params_no_state: use gpu    = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw        = 0
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 1024
whisper_model_load: n_audio_head  = 16
whisper_model_load: n_audio_layer = 24
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 1024
whisper_model_load: n_text_head   = 16
whisper_model_load: n_text_layer  = 24
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 4 (medium)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs       = 99
whisper_model_load:      CPU total size =  1533.14 MB
whisper_model_load: model size    = 1533.14 MB
whisper_init_state: kv self size  =  150.99 MB
whisper_init_state: kv cross size =  150.99 MB
whisper_init_state: kv pad  size  =    6.29 MB
whisper_init_state: compute buffer (conv)   =   28.55 MB
whisper_init_state: compute buffer (encode) =  594.09 MB
whisper_init_state: compute buffer (cross)  =    7.72 MB
whisper_init_state: compute buffer (decode) =  141.96 MB

system_info: n_threads = 4 / 12 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 0 | COREML = 0 | OPENVINO = 0 | CANN = 0

main: processing '/mnt/Deb-Data/whisper.cpp/samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...


[00:00:00.000 --> 00:00:11.000]   And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.


whisper_print_timings:     load time =  1064.36 ms
whisper_print_timings:     fallbacks =   0 p /   0 h
whisper_print_timings:      mel time =    30.41 ms
whisper_print_timings:   sample time =    98.56 ms /   140 runs (    0.70 ms per run)
whisper_print_timings:   encode time = 23189.34 ms /     1 runs (23189.34 ms per run)
whisper_print_timings:   decode time =   136.07 ms /     2 runs (   68.04 ms per run)
whisper_print_timings:   batchd time =  4325.21 ms /   136 runs (   31.80 ms per run)
whisper_print_timings:   prompt time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper_print_timings:    total time = 29034.84 ms

$ ./main -m /mnt/Deb-Data/ggml-medium.bin -f /mnt/Deb-Data/whisper.cpp/samples/jfk.wav
whisper_init_from_file_with_params_no_state: loading model from '/mnt/Deb-Data/ggml-medium.bin'
whisper_init_with_params_no_state: use gpu    = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw        = 0
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 1024
whisper_model_load: n_audio_head  = 16
whisper_model_load: n_audio_layer = 24
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 1024
whisper_model_load: n_text_head   = 16
whisper_model_load: n_text_layer  = 24
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 4 (medium)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs       = 99
whisper_model_load:      CPU total size =  1533.14 MB
whisper_model_load: model size    = 1533.14 MB
whisper_init_state: kv self size  =  150.99 MB
whisper_init_state: kv cross size =  150.99 MB
whisper_init_state: kv pad  size  =    6.29 MB
whisper_init_state: compute buffer (conv)   =   28.55 MB
whisper_init_state: compute buffer (encode) =  594.09 MB
whisper_init_state: compute buffer (cross)  =    7.72 MB
whisper_init_state: compute buffer (decode) =  141.96 MB

system_info: n_threads = 4 / 12 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 0 | COREML = 0 | OPENVINO = 0 | CANN = 0

main: processing '/mnt/Deb-Data/whisper.cpp/samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...


[00:00:00.000 --> 00:00:11.000]   And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.


whisper_print_timings:     load time =  1072.18 ms
whisper_print_timings:     fallbacks =   0 p /   0 h
whisper_print_timings:      mel time =    20.21 ms
whisper_print_timings:   sample time =  3702.51 ms /   140 runs (   26.45 ms per run)
whisper_print_timings:   encode time = 28507.66 ms /     1 runs (28507.66 ms per run)
whisper_print_timings:   decode time =   148.92 ms /     2 runs (   74.46 ms per run)
whisper_print_timings:   batchd time =  4012.05 ms /   136 runs (   29.50 ms per run)
whisper_print_timings:   prompt time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper_print_timings:    total time = 37604.63 ms
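
To compare the two runs above side by side, the timing lines can be pulled out of the logs. This is a minimal sketch: the helper name `timing_ms` is hypothetical, and the sample lines are copied from the two runs above. It makes no assumption about which run used which build.

```shell
#!/bin/sh
# Hypothetical helper: print the milliseconds reported for one timing label
# (for example "encode" or "total") from whisper_print_timings output on stdin.
timing_ms() {
    awk -v label="$1" '
        $1 == "whisper_print_timings:" && $2 == label && $3 == "time" {
            print $5   # fields: prefix, label, "time", "=", value, "ms"
            exit
        }'
}

# Encode and total lines copied from the first and second run above
run1='whisper_print_timings:   encode time = 23189.34 ms /     1 runs (23189.34 ms per run)
whisper_print_timings:    total time = 29034.84 ms'
run2='whisper_print_timings:   encode time = 28507.66 ms /     1 runs (28507.66 ms per run)
whisper_print_timings:    total time = 37604.63 ms'

for label in encode total; do
    a=$(printf '%s\n' "$run1" | timing_ms "$label")
    b=$(printf '%s\n' "$run2" | timing_ms "$label")
    awk -v l="$label" -v a="$a" -v b="$b" \
        'BEGIN { printf "%s: %.2f ms vs %.2f ms (ratio %.2f)\n", l, a, b, b / a }'
done
```

Note that both runs report `BLAS = 0` in their `system_info` line, so the difference between them may not reflect MKL at all.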
