Help required in Deepgram Streaming Inference on Audio File #959
Hi, we are already using Deepgram and have it integrated in our app. Great product, by the way. We want to transcribe an audio file in a streaming fashion using Deepgram, but we are unsure what the proper way to do that is. We found this file in a repo linked from your documentation (https://github.com/deepgram/live-streaming-starter-kit/blob/main/test_suite.py), which streams audio and returns the transcription as a stream. We also notice differences in the transcription (and hence the WER) depending on whether we pass the complete MP3, pass 16 kHz mono WAV audio, use your stream file example, use the live-streaming starter kit example, or get the transcription through the website or the API. Basically, we want to send chunks of the audio file to Deepgram and keep receiving the output in a streaming fashion. What is the correct and recommended way to do that? Thank you.
Replies: 1 comment 4 replies
Hi @rkchamp25, we offer two transcription modes: streaming and pre-recorded. You will want streaming transcription, which uses a websocket connection rather than a REST API. That is what the first example you linked (the live-streaming starter kit) uses. The second example has a confusing name, but it is pre-recorded transcription: it transcribes the whole file in one go, as you noted. Here is our doc for getting started with streaming transcription. You can also look at the websocket directory within our code examples. The live-streaming starter kit is also a great resource.
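To make the websocket flow concrete, here is a minimal sketch of sending a file in chunks and printing transcripts as they arrive. It is not taken from the starter kit. It assumes the third-party `websockets` package, the `wss://api.deepgram.com/v1/listen` endpoint with an `Authorization: Token ...` header, and a containerized file (such as WAV or MP3) whose format can be detected from its header; raw audio would likely need `encoding` and `sample_rate` query parameters. The names `iter_chunks` and `stream_file`, the chunk size, and the delay are illustrative choices, not Deepgram recommendations.

```python
import asyncio
import json

# Assumed streaming endpoint; query parameters (model, language, ...) can be appended.
DEEPGRAM_URL = "wss://api.deepgram.com/v1/listen"


def iter_chunks(data: bytes, chunk_size: int):
    """Yield successive slices of `data`; the last one may be shorter."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


async def stream_file(path: str, api_key: str,
                      chunk_size: int = 8000, delay: float = 0.25) -> None:
    """Send a file to the streaming endpoint in chunks and print transcripts."""
    # Imported here so iter_chunks stays usable without the dependency.
    import websockets  # pip install websockets

    with open(path, "rb") as f:
        audio = f.read()

    headers = {"Authorization": f"Token {api_key}"}
    # Assumption: recent `websockets` releases call this keyword
    # `additional_headers`; older releases call it `extra_headers`.
    async with websockets.connect(DEEPGRAM_URL, additional_headers=headers) as ws:

        async def sender():
            for chunk in iter_chunks(audio, chunk_size):
                await ws.send(chunk)          # binary frame of audio bytes
                await asyncio.sleep(delay)    # pace the upload like a live stream
            # Tell the server no more audio is coming so it can flush results.
            await ws.send(json.dumps({"type": "CloseStream"}))

        async def receiver():
            async for message in ws:
                result = json.loads(message)
                alternatives = result.get("channel", {}).get("alternatives", [])
                if alternatives and alternatives[0].get("transcript"):
                    print(alternatives[0]["transcript"])

        await asyncio.gather(sender(), receiver())
```

You would run it with something like `asyncio.run(stream_file("audio.wav", "YOUR_API_KEY"))`. The sender and receiver run concurrently so that transcripts print while later chunks are still being uploaded.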
@rkchamp25, no, our streaming and pre-recorded endpoints are served by different models and may give slightly different results. Pre-recorded transcription tends to have a word error rate (WER) about 2 percentage points lower, since it has more context to work with. For instance, we've benchmarked our English transcription at 8.4% WER for pre-recorded audio and 10.7% WER for streaming audio.
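If you want to compare the two modes on your own audio, a rough sketch of a WER calculation is below. It uses the usual definition (word-level edit distance divided by the number of reference words) and assumes both transcripts have already been normalized the same way (casing, punctuation); the function name `word_error_rate` is illustrative and this is not how our internal benchmarks are computed.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("the cat sat on the mat", "the cat sit on mat")` counts one substitution and one deletion against six reference words, giving roughly 0.33.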