[BUG] Not recording speaker in Windows #87

Open
Slepetys opened this issue Sep 18, 2024 · 1 comment
Labels
bug Something isn't working


Slepetys commented Sep 18, 2024

Speech Translate is not recording the speaker in Windows 11
Speech Translate records, transcribes, and translates correctly when the input is set to Microphone, but when I switch to Speaker it fails with the error "-9999 Unanticipated host error" and does not start recording.

To Reproduce
Host API tried:

  • MME
  • Windows Direct Sound
  • Windows WASAPI

Speaker setting (all combinations):

  • ID 0,4 Microsoft Mapper output
  • ID 0,5 Echo cancelling speakerphone (Jabra device)

Screenshots
image

Log
2024-09-18 12:06:42.920 | ERROR | record.py:944 [Thread-50 (record_session)] - [Errno -9999] Unanticipated host error
Traceback (most recent call last):

File "D:\Codes\_Projects\Python\Speech-Translate\speech_translate\utils\audio\record.py", line 651, in record_session

File "D:\Codes\_Projects\Python\Speech-Translate\.venv\Lib\site-packages\pyaudiowpatch\__init__.py", line 801, in open

File "D:\Codes\_Projects\Python\Speech-Translate\.venv\Lib\site-packages\pyaudiowpatch\__init__.py", line 467, in __init__

OSError: [Errno -9999] Unanticipated host error
2024-09-18 12:06:42.920 | ERROR | record.py:945 [Thread-50 (record_session)] - Error in record session
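
The traceback ends inside the stream constructor, so the failure happens while the device is being opened, before any transcription takes place. For comparison, here is a minimal sketch of how stream parameters might be derived from a PyAudioWPatch device-info dictionary. The helper name `loopback_open_kwargs` and the sample device entry are hypothetical, and this is not necessarily how Speech Translate builds its stream; only the dictionary keys (`index`, `name`, `maxInputChannels`, `defaultSampleRate`, `isLoopbackDevice`) come from PyAudioWPatch.

```python
def loopback_open_kwargs(device, frames_per_buffer=1024):
    """Build keyword arguments for PyAudio.open() from a device-info dict.

    Hypothetical helper: it refuses anything that is not flagged as a
    loopback device, since a plain output device typically reports no
    input channels to read from.
    """
    if not device.get("isLoopbackDevice", False):
        raise ValueError(f"{device.get('name')!r} is not a loopback device")
    return {
        "input": True,
        "input_device_index": int(device["index"]),
        "channels": int(device["maxInputChannels"]),
        "rate": int(device["defaultSampleRate"]),
        "frames_per_buffer": frames_per_buffer,
    }


# Made-up device entry, for illustration only.
example_device = {
    "index": 7,
    "name": "Speakers (Example Device) [Loopback]",
    "maxInputChannels": 2,
    "defaultSampleRate": 48000.0,
    "isLoopbackDevice": True,
}
kwargs = loopback_open_kwargs(example_device)
print(kwargs)
# With PyAudioWPatch installed, the stream would then be opened roughly as:
#   stream = p.open(format=pyaudio.paInt16, **kwargs)
```

Taking the channel count and sample rate from the device entry itself, instead of using fixed values, avoids asking the host API for a format the device does not report.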

Desktop

  • OS: Windows 10
  • App Installation version: prebuilt CUDA version 1.3.10
  • App / Python version: 3.11
@Slepetys Slepetys added the bug Something isn't working label Sep 18, 2024

Slepetys commented Sep 18, 2024

Additional info:

When I set the Host API to MME and the Speaker to the default speaker, the log shows a different error that appears to have a different origin. Most likely the audio stream is captured but cannot be processed:

log

2024-09-18 14:11:11.824 | INFO    | log.py:150 [MainThread] - Log cleared
2024-09-18 14:11:18.544 | DEBUG   | record.py:383 [Thread-94 (set_meter)] - Opening Speaker meter
2024-09-18 14:11:18.591 | DEBUG   | record.py:504 [Dummy-95] - Checking if webrtcvad is possible to use. You can ignore the error log if it fails!
2024-09-18 14:11:18.591 | DEBUG   | record.py:506 [Dummy-95] - Checking if silero is possible to use. You can ignore the error log if it fails!
2024-09-18 14:11:18.592 | ERROR   | record.py:518 [Dummy-95] - Input audio chunk is too short
Traceback (most recent call last):


  File "D:\Codes\_Projects\Python\Speech-Translate\speech_translate\ui\frame\setting\record.py", line 507, in stream_cb

  File "C:\Users\USERNAME\AppData\Local\Programs\Speech Translate\lib\speech_translate\assets\silero-vad\utils_vad.py", line 56, in __call__
    x, sr = self._validate_input(x, sr)
    │       │    │               │  └ 16000
    │       │    │               └ tensor([ 0.0921,  0.1457,  0.1715,  0.2127,  0.2518,  0.2576,  0.2480,  0.2433,
    │       │    │                          0.2564,  0.2626,  0.2543,  0.2321,  ...
    │       │    └ <function OnnxWrapper._validate_input at 0x000001F5EBECC220>
    │       └ <utils_vad.OnnxWrapper object at 0x000001F5EBA59150>
    └ tensor([ 0.0921,  0.1457,  0.1715,  0.2127,  0.2518,  0.2576,  0.2480,  0.2433,
               0.2564,  0.2626,  0.2543,  0.2321,  ...

  File "C:\Users\USERNAME\AppData\Local\Programs\Speech Translate\lib\speech_translate\assets\silero-vad\utils_vad.py", line 44, in _validate_input
    raise ValueError("Input audio chunk is too short")

ValueError: Input audio chunk is too short
2024-09-18 14:11:18.596 | ERROR   | record.py:533 [Dummy-95] - SileroVAD Error!
2024-09-18 14:11:18.596 | WARNING | record.py:535 [Dummy-95] - Not possible to use Silero VAD with the current device config! So it is now disabled
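
The ValueError above is raised by `_validate_input` in the bundled Silero VAD `utils_vad.py`. In upstream Silero VAD, that check rejects a chunk when the sample rate divided by the number of samples exceeds 31.25. The sketch below restates that rule as a minimum chunk size. The threshold is an assumption taken from upstream and may differ in the copy shipped with Speech Translate; the function names are hypothetical.

```python
import math

# Assumed threshold from upstream Silero VAD's _validate_input:
# a chunk is rejected when sample_rate / num_samples exceeds this ratio.
MAX_RATE_TO_LENGTH_RATIO = 31.25


def min_vad_samples(sample_rate):
    """Smallest chunk length (in samples) that passes the assumed check."""
    return math.ceil(sample_rate / MAX_RATE_TO_LENGTH_RATIO)


def is_chunk_long_enough(num_samples, sample_rate):
    """Return True if a chunk of num_samples would pass the assumed check."""
    if num_samples <= 0:
        return False
    return sample_rate / num_samples <= MAX_RATE_TO_LENGTH_RATIO


print(min_vad_samples(16000))            # 512
print(is_chunk_long_enough(511, 16000))  # False
print(is_chunk_long_enough(512, 16000))  # True
```

Under this assumption, a 16000 Hz stream needs at least 512 samples (32 ms) per callback, so a device configuration that delivers shorter buffers would trip the check.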

Settings
image

Using pyaudiowpatch to find the loopback speaker
I ran the code below to find the default loopback speaker, which does not match any of the options detected by Speech Translate.

import pyaudiowpatch as pyaudio
# Find default Microphone and Speakers:
p = pyaudio.PyAudio()
wasapi_info = p.get_host_api_info_by_type(pyaudio.paWASAPI)
default_speakers   = p.get_device_info_by_index(wasapi_info["defaultOutputDevice"])
default_microphone = p.get_device_info_by_index(wasapi_info["defaultInputDevice"])
if not default_speakers["isLoopbackDevice"]:
    for loopback in p.get_loopback_device_info_generator():
        # Try to find a loopback device with the same name (plus the
        # "[Loopback]" suffix). Unfortunately, this is the most adequate
        # way at the moment.
        if default_speakers["name"] in loopback["name"]:
            default_speakers = loopback
            break
    else:
        print("Default loopback output device not found.\n\nRun `python -m pyaudiowpatch` to check available devices.\nExiting...\n")
        raise SystemExit(1)
        
print(f"""
Input Microphone  : {default_microphone['name']}
Index             : {default_microphone['index']}
Input Channels    : {default_microphone['maxInputChannels']}
Input Latency     : {default_microphone['defaultLowInputLatency']} s
Input Latency(max): {default_microphone['defaultHighInputLatency']} s
Sample Rate       : {default_microphone['defaultSampleRate']} Hz
""")    

print(f"""
Loopback Speakers : {default_speakers['name']}
Index             : {default_speakers['index']}
Channels          : {default_speakers['maxInputChannels']}
Latency           : {default_speakers['defaultLowInputLatency']} s
Latency(max)      : {default_speakers['defaultHighInputLatency']} s
Sample Rate       : {default_speakers['defaultSampleRate']} Hz
""")

resulting in:

Input Microphone  : Echo Cancelling Speakerphone (Jabra SPEAK 510 USB)
Index             : 17
Input Channels    : 1
Input Latency     : 0.003 s
Input Latency(max): 0.01 s
Sample Rate       : 16000.0 Hz


Loopback Speakers : Echo Cancelling Speakerphone (Jabra SPEAK 510 USB) [Loopback]
Index             : 20
Channels          : 2
Latency           : 0.003 s
Latency(max)      : 0.01 s
Sample Rate       : 48000.0 Hz
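
The loopback device reports 2 channels at 48000 Hz, while the Silero VAD call in the log above runs at 16000 Hz. If that mismatch matters, the captured audio would need to be downmixed and resampled before it reaches the VAD. Below is a rough sketch using only the standard library, assuming interleaved 16-bit PCM frames in native byte order. The function name is hypothetical, and block averaging is a crude stand-in for a proper resampler.

```python
from array import array


def stereo48k_to_mono16k(pcm_bytes, channels=2, factor=3):
    """Downmix interleaved int16 PCM to mono and decimate by `factor`."""
    samples = array("h")
    samples.frombytes(pcm_bytes)
    # Average the channels of each frame to get a mono signal.
    mono = [
        sum(samples[i:i + channels]) // channels
        for i in range(0, len(samples) - channels + 1, channels)
    ]
    # Average each group of `factor` mono samples (48000 / 3 = 16000).
    out = array("h", (
        sum(mono[i:i + factor]) // factor
        for i in range(0, len(mono) - factor + 1, factor)
    ))
    return out.tobytes()


# Six stereo frames with left=100 and right=300 collapse to two mono samples.
frames = array("h", [100, 300] * 6)
result = array("h")
result.frombytes(stereo48k_to_mono16k(frames.tobytes()))
print(list(result))  # [200, 200]
```

Incomplete trailing frames are dropped, so a real capture loop would need to carry leftover samples over to the next buffer.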
