
ValueError: The input waveform must be two dimensional, but has 102528 dimension(s) instead. #510

Open
erkaink opened this issue Sep 6, 2024 · 0 comments


erkaink commented Sep 6, 2024

Hello, I am getting the error below and can't find a solution. Does anyone have an idea of what I should do? I asked ChatGPT and tried converting the input sound file to stereo and to mono, among other things, but it still didn't work. Thanks in advance.

----@---- seamless_communication % m4t_predict input/speech.mp3 --task S2ST --tgt_lang FRA --output_path /Users/username/seamless_communication/output/compl.mp3
2024-09-07 01:34:47,221 INFO -- seamless_communication.cli.m4t.predict.predict: Running inference on device=device(type='cpu') with dtype=torch.float32.
Using the cached checkpoint of seamlessM4T_v2_large. Set force to True to download again.
Using the cached tokenizer of seamlessM4T_v2_large. Set force to True to download again.
Using the cached tokenizer of seamlessM4T_v2_large. Set force to True to download again.
Using the cached tokenizer of seamlessM4T_v2_large. Set force to True to download again.
Using the cached checkpoint of vocoder_v2. Set force to True to download again.
/opt/homebrew/lib/python3.11/site-packages/torch/nn/utils/weight_norm.py:134: FutureWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
  WeightNorm.apply(module, name, dim)
2024-09-07 01:35:09,103 INFO -- seamless_communication.cli.m4t.predict.predict: text_generation_opts=SequenceGeneratorOptions(beam_size=5, soft_max_seq_len=(1, 200), hard_max_seq_len=1024, step_processor=None, unk_penalty=0.0, len_penalty=1.0)
2024-09-07 01:35:09,105 INFO -- seamless_communication.cli.m4t.predict.predict: unit_generation_opts=SequenceGeneratorOptions(beam_size=5, soft_max_seq_len=(25, 50), hard_max_seq_len=1024, step_processor=None, unk_penalty=0.0, len_penalty=1.0)
2024-09-07 01:35:09,105 INFO -- seamless_communication.cli.m4t.predict.predict: unit_generation_ngram_filtering=False
2024-09-07 01:35:09,141 WARNING -- seamless_communication.inference.translator: Transposing audio tensor from (bsz, seq_len) -> (seq_len, bsz).
Traceback (most recent call last):
  File "/opt/homebrew/bin/m4t_predict", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/seamless_communication/cli/m4t/predict/predict.py", line 235, in main
    text_output, speech_output = translator.predict(
                                 ^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/seamless_communication/inference/translator.py", line 293, in predict
    src = self.collate(self.convert_to_fbank(decoded_audio))["fbank"]
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: The input waveform must be two dimensional, but has 102528 dimension(s) instead.
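A small shape-only sketch may help in reading the error. It models tensors as plain Python shape tuples, so no torch is required, and it mirrors two things visible in the log: the `Transposing audio tensor from (bsz, seq_len) -> (seq_len, bsz)` warning and the two-dimension check named in the `ValueError`. The helper names `transpose_if_batch_first` and `check_two_dimensional` are hypothetical. The input shape `(1, 102528)` is an assumption that the file decodes to one channel of 102528 samples. Neither the helpers nor the shape is taken from the seamless_communication code.

```python
from typing import Tuple

# Hypothetical shape-only model of an audio tensor: a tuple of axis sizes.
Shape = Tuple[int, ...]


def transpose_if_batch_first(shape: Shape) -> Shape:
    """Swap a 2-D (bsz, seq_len) shape to (seq_len, bsz), as the log warning describes.

    Hypothetical helper; not the seamless_communication implementation.
    """
    if len(shape) == 2:
        bsz, seq_len = shape
        return (seq_len, bsz)
    return shape


def check_two_dimensional(shape: Shape) -> None:
    """Raise a ValueError worded like the traceback if the rank is not 2.

    Hypothetical helper; in the traceback the real error is raised under convert_to_fbank.
    """
    if len(shape) != 2:
        raise ValueError(
            "The input waveform must be two dimensional, "
            f"but has {len(shape)} dimension(s) instead."
        )


# Assumption: the file decodes to one channel of 102528 samples.
mono = (1, 102528)
converted = transpose_if_batch_first(mono)
print(converted)  # (102528, 1)
check_two_dimensional(converted)  # passes: the rank is 2

# Only a shape with 102528 axes reproduces the reported message.
try:
    check_two_dimensional((1,) * 102528)
except ValueError as err:
    print(err)
```

Under these assumptions, the transposed waveform has exactly two dimensions and passes the check. The reported message can only be reproduced by giving the check a rank of 102528. That suggests the number in the error is a length being read as a rank. If so, the problem would more likely lie in the environment, such as installed package versions, than in whether the input file is mono or stereo. This is an inference from the log, not a confirmed diagnosis.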
