Real-time Speech To Text - Always Send Finalize Response #1035

QuinnDamerell · 2024-12-28T17:28:27Z

QuinnDamerell
Dec 28, 2024

Hey!

I have a feature request for the real-time speech-to-text WebSocket API. I'm working on a system that streams audio from a user and converts it to text. Sometimes the audio messages are long (30 seconds to 1 minute), but sometimes they are as short as one word. I stream the audio in real time, so I get the transcription back in chunks. When the audio clip is done, I send the Finalize message to ensure the audio stream is flushed.

My problem is that sending the Finalize message doesn't always return a finalized response. From looking at the docs, this looks like it's expected. If the Finalize message results in more transcript text coming back, the from_finalize flag is set on that response.

In my system, I want to return the text as fast as possible. However, since I don't always get a response to the Finalize message, I don't know if the text I already have is all of it or if a finalized response is coming with more text. So right now, I either have to return what I have or wait an arbitrary amount of time to ensure there's no final text coming back.

In summary, it would be fantastic if the server always acknowledged the Finalize message so the client can know for sure that all audio has been flushed and the final text has been returned.

Thanks!

Answered by deepgram-community[bot]

Dec 31, 2024

2024-12-28T17:28:28Z

deepgram-community[bot]
bot Dec 28, 2024

Thanks for asking your question. Please be sure to reply with as much detail as possible so the community can assist you efficiently.
Consider joining our Discord community for more opportunity to engage with your fellow Deepgram users. You can earn points which can be redeemed for cool stuff by being active in our communities!

2024-12-28T17:28:39Z

deepgram-community[bot]
bot Dec 28, 2024

Hey there! It looks like you haven't connected your GitHub account to your Deepgram account. You can do this at https://community.deepgram.com - being verified through this process will allow our team to help you in a much more streamlined fashion.

QuinnDamerell · 2024-12-28T17:28:40Z

deepgram-community[bot]
bot Dec 28, 2024

It looks like we're missing some important information to help debug your issue. Would you mind providing us with the following details in a reply?

The programming language you are working in (e.g. JavaScript, Python).
A request ID that triggered your error or issue.

1 reply

QuinnDamerell Dec 28, 2024
Author

I'm using the C# SDK; there's no one request ID in particular that this applies to.

2024-12-31T00:25:28Z

deepgram-community[bot]
bot Dec 31, 2024

Thanks for submitting this. I'm not sure if we'd implement this in our product, but I can leave it open to see what others think.

I think the key thing to remember here is what we call out in our Docs:

Upon receiving the Finalize message, the server will process all remaining audio data and return the final results. You may receive a response with the from_finalize attribute set to true, indicating that the finalization process is complete. This response typically occurs when there is a noticeable amount of audio buffered in the server.

So you won't get a response if there is NOT a noticeable amount of audio buffered in the server.

In Simpler Terms

When the server gets the "Finalize" message, it wraps up any remaining audio processing and sends the final transcription or results back. The server lets the client know that this is complete by sending a response with the from_finalize flag set to true. This often happens when there is a backlog of audio data that needs to be processed.
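For anyone who wants to check for this flag on the client, here is a minimal Python sketch. The from_finalize attribute comes from the docs quoted above; the rest of the message shape (a "type" of "Results" and a transcript under channel.alternatives) is an assumption about the streaming response format, and read_result is a hypothetical helper name, not part of any SDK.

```python
import json
from typing import Tuple


def read_result(raw: str) -> Tuple[str, bool]:
    """Return (transcript, from_finalize) for one server message.

    Assumes a JSON message with type "Results" and the transcript under
    channel.alternatives[0].transcript; adjust if your payload differs.
    """
    msg = json.loads(raw)
    if msg.get("type") != "Results":
        # Not a transcription result (for example a metadata message).
        return "", False
    alternatives = msg.get("channel", {}).get("alternatives", [])
    transcript = alternatives[0].get("transcript", "") if alternatives else ""
    # A missing from_finalize key is treated the same as false.
    return transcript, bool(msg.get("from_finalize", False))
```

Keep in mind that, per the docs quoted above, a message with from_finalize set to true is not guaranteed to arrive, so a client cannot block on it indefinitely.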

https://developers.deepgram.com/docs/finalize#what-is-the-finalize-message

This message was sent by John Vajda from Deepgram, via our community automation.

QuinnDamerell · 2024-12-31T01:57:15Z

QuinnDamerell
Dec 31, 2024
Author

Thanks! Yeah, so my problem is that I'm streaming audio up and getting text back at unpredictable times, and there's no relationship I can observe between the amount of audio I stream up and the text I get back. So I have no way of knowing how much unprocessed audio is on the server, which means that when I send the Finalize message, I don't know whether I will get a message back.

My goal is to return the full-text transcript as fast as possible when the audio is done streaming, but I have a problem: I don't know whether I need to wait for the Finalize message to flush out the remaining audio and return text, or whether I already have all of the text and the Finalize message will do nothing.

Thus, there's no 100% correct way to implement my system. The safe thing to do is send the Finalize message after the audio stream is done and then wait an arbitrary amount of time, like 100ms. That at least allows me some time to get the final text from the Finalize if it's coming back. But it seems that 80% of the time, the Finalize message doesn't return any text for me, so 80% of the time, I'm wasting 100ms. I'm using Deepgram over Google, Amazon, etc. because of its latency, so 100ms is a lot for me!
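To make the workaround concrete, here is a rough Python sketch of the "send Finalize, then wait up to 100ms" approach. It is only an illustration: wait_for_finalize is a hypothetical helper, the queue stands in for whatever delivers raw WebSocket messages in your client, and the message fields other than from_finalize are assumptions about the response format. It also assumes interim results are turned off, so every result's text can simply be appended.

```python
import asyncio
import json
from typing import List, Tuple


async def wait_for_finalize(messages: "asyncio.Queue[str]",
                            timeout_s: float = 0.1) -> Tuple[str, bool]:
    """Gather text until a from_finalize result arrives or the timeout expires.

    Returns (text, acked). acked is False when the wait timed out, which is
    the case where the time spent waiting was wasted.
    """
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout_s
    parts: List[str] = []
    while True:
        remaining = deadline - loop.time()
        if remaining <= 0:
            return " ".join(parts), False
        try:
            raw = await asyncio.wait_for(messages.get(), timeout=remaining)
        except asyncio.TimeoutError:
            return " ".join(parts), False
        msg = json.loads(raw)
        if msg.get("type") != "Results":
            continue  # ignore anything that is not a transcription result
        text = msg["channel"]["alternatives"][0]["transcript"]
        if text:
            parts.append(text)
        if msg.get("from_finalize"):
            return " ".join(parts), True
```

The second return value makes the cost visible: every call that comes back with acked set to False paid the full timeout for nothing.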

Anyways, it would be great if this gets fixed, but no problem if not. I just wanted to clarify my issue and explain why it's impossible to implement my scenario well with the current system.

Thanks!

2024-12-31T17:34:29Z

deepgram-community[bot]
bot Dec 31, 2024

Thanks for the details. One thing you can try is using CloseStream instead of Finalize to close your audio stream. You'll get back a confirmation message every time you do this. Finalize is only for mid-stream finalization, for example if you have your own client-side VAD.

Check out our Docs on this topic and let me know if this helps your situation:
https://developers.deepgram.com/docs/close-stream

This message was sent by John Vajda from Deepgram, via our community automation.
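As a rough illustration of the CloseStream suggestion, the Python sketch below sends the close message and reads until the confirmation arrives, so no arbitrary timeout is needed. Everything about the wire format here is an assumption to verify against the linked docs: that CloseStream is sent as the JSON text {"type": "CloseStream"}, that the confirmation is a message with type "Metadata", and that results carry is_final and channel.alternatives. The send callable and incoming iterator are hypothetical stand-ins for your WebSocket client, and close_and_collect is not an SDK function.

```python
import asyncio
import json
from typing import AsyncIterator, Awaitable, Callable, List


async def close_and_collect(send: Callable[[str], Awaitable[None]],
                            incoming: AsyncIterator[str]) -> str:
    """Send CloseStream, then gather final text until the confirmation arrives."""
    # Assumed message shape; check the close-stream docs for the exact payload.
    await send(json.dumps({"type": "CloseStream"}))
    parts: List[str] = []
    async for raw in incoming:
        msg = json.loads(raw)
        if msg.get("type") == "Results" and msg.get("is_final"):
            text = msg["channel"]["alternatives"][0]["transcript"]
            if text:
                parts.append(text)
        elif msg.get("type") == "Metadata":
            # Assumed confirmation message: nothing further is expected.
            break
    return " ".join(parts)
```

The trade-off is that the connection is closed afterwards, so each new utterance needs a fresh WebSocket.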

QuinnDamerell · 2024-12-31T17:47:22Z

QuinnDamerell
Dec 31, 2024
Author

Oh, that’s a great tip! Originally I was going to keep the WebSockets open for multiple requests, but my request rate isn't very high, so for now I'm closing them. My main concern was the connection time, but the WebSocket connects fast enough at the start of a stream that the connection is ready well before the audio ends.

Thanks for the help!

2024-12-31T18:04:29Z

deepgram-community[bot]
bot Dec 31, 2024

Great to hear!

This message was sent by John Vajda from Deepgram, via our community automation.
