Transcription error related to BatchedInferencePipeline and numpy #1102

Closed
shkstar opened this issue Oct 29, 2024 · 3 comments

@shkstar

shkstar commented Oct 29, 2024

I am using faster-whisper's BatchedInferencePipeline in Google Colab, installed with:

! pip install --force-reinstall "faster-whisper @ https://github.com/SYSTRAN/faster-whisper/archive/refs/heads/master.tar.gz"
! pip install ctranslate2==4.4.0

Today, running the transcription produced the following error:

/usr/local/lib/python3.10/dist-packages/faster_whisper/transcribe.py in transcribe(self, audio, language, task, beam_size, best_of, patience, length_penalty, repetition_penalty, no_repeat_ngram_size, temperature, compression_ratio_threshold, log_prob_threshold, log_prob_low_threshold, no_speech_threshold, condition_on_previous_text, prompt_reset_on_temperature, initial_prompt, prefix, suppress_blank, suppress_tokens, without_timestamps, max_initial_timestamp, word_timestamps, prepend_punctuations, append_punctuations, multilingual, output_language, vad_filter, vad_parameters, max_new_tokens, chunk_length, clip_timestamps, hallucination_silence_threshold, hotwords, language_detection_threshold, language_detection_segments)
    758             audio = torch.from_numpy(audio)
    759         elif not isinstance(audio, torch.Tensor):
--> 760             audio = decode_audio(audio, sampling_rate=sampling_rate)
    761 
    762         duration = audio.shape[0] / sampling_rate

/usr/local/lib/python3.10/dist-packages/faster_whisper/audio.py in decode_audio(input_file, sampling_rate, split_stereo)
     75         return torch.from_numpy(left_channel), torch.from_numpy(right_channel)
     76 
---> 77     return torch.from_numpy(audio)
     78 
     79 

TypeError: expected np.ndarray (got numpy.ndarray)

What is causing this, and how can I fix it? It is odd, because the same setup used to work without problems.
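
A note on the message itself: np.ndarray and numpy.ndarray are the same class, so the error is not about the audio data being the wrong type. One commonly reported cause, which this thread does not confirm, is that a forced reinstall changed the NumPy version on disk while the running session (or the installed torch build) still expects the old one. The sketch below is only a way to test that assumption. The helper names loaded_version, installed_version, and needs_restart are hypothetical and not part of faster-whisper.

```python
import sys
from importlib import metadata


def loaded_version(module_name):
    """Version of a module already imported in this process, or None."""
    module = sys.modules.get(module_name)
    return getattr(module, "__version__", None) if module is not None else None


def installed_version(dist_name):
    """Version of a distribution installed on disk, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def needs_restart(module_name, dist_name=None):
    """True when the imported module and the installed package disagree.

    Assumption: such a mismatch means pip replaced the package after it
    was imported, so the runtime should be restarted.
    """
    loaded = loaded_version(module_name)
    installed = installed_version(dist_name or module_name)
    return loaded is not None and installed is not None and loaded != installed


if __name__ == "__main__":
    for name in ("numpy", "torch"):
        print(name, loaded_version(name), installed_version(name), needs_restart(name))
```

If needs_restart("numpy") reports True in the Colab session, restarting the runtime after the pip install cells is a cheap thing to try before digging further.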

@MahmoudAshraf97
Collaborator

Please use a debugger to check the value of audio, or upload the audio file here.
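
For readers without a debugger at hand in Colab, a minimal sketch of the same check is to print what audio actually is just before calling transcribe. The helper describe_value is hypothetical and uses only the standard library. It reports the fully qualified type, plus dtype and shape when the object has them (as NumPy arrays and torch tensors do).

```python
def describe_value(value):
    """Summarize an object's type and, if present, its dtype and shape."""
    value_type = type(value)
    dtype = getattr(value, "dtype", None)
    return {
        # Fully qualified name, e.g. "builtins.str" or "numpy.ndarray".
        "type": f"{value_type.__module__}.{value_type.__qualname__}",
        "dtype": None if dtype is None else str(dtype),
        "shape": getattr(value, "shape", None),
    }


if __name__ == "__main__":
    # A file path, as passed in this issue, is a plain string.
    print(describe_value("9ez8lm9I26Y.wav"))
```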

@shkstar
Author

shkstar commented Oct 30, 2024

I convert the YouTube video to WAV using ! pip install yt-

Type of video_path_local: <class 'str'>
File exists: True
File size: 22384718
Error processing 9ez8lm9I26Y.wav: expected np.ndarray (got numpy.ndarray)

https://app.box.com/s/okmln29g34hdkbsn5r8no7gbg0orb8ny

@MahmoudAshraf97
Collaborator

This should be solved after #1106.
