Optimizing Transcription Costs for Long Audio File with Deepgram Python SDK #1019

kimjiwook0129 · 2024-12-06T00:17:03Z

kimjiwook0129
Dec 6, 2024

Hello everyone,

I’m currently working on transcribing long audio files (.mp3) into text (.txt) files with timestamps using the deepgram-sdk (version 2.12.0) in Python. Here’s a snippet of the code I’m using:

# Open the file in a context manager so the handle is closed after the request
with open(mp3_file_path, 'rb') as audio_file:
    response = await dg_client.transcription.prerecorded(
        {
            'buffer': audio_file,
            'mimetype': 'audio/mp3'
        },
        {
            'punctuate': True,
            'language': 'en',
            'model': 'general',
            'tier': 'enhanced',
            'utterances': True
        }
    )

# Extract and format transcript with timestamps
transcript_lines = []
for utterance in response['results']['utterances']:
    start_time = format_time(utterance['start'])
    end_time = format_time(utterance['end'])
    text = utterance['transcript']
    transcript_lines.append(f"{start_time} - {end_time}: {text}")

# Save the transcript locally
transcript_content = "\n".join(transcript_lines)
with open(transcript_file, "w") as f:
    f.write(transcript_content)

This produces a .txt file formatted like this:
00:01:26 - 00:01:27: Hello, hello.
00:01:30 - 00:01:32: I am looking for some help on this issue.
00:02:37 - 00:02:38: With Deepgram.
...
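The snippet above calls format_time, which is not shown in the post. A minimal sketch of such a helper, assuming the utterance start and end values are seconds given as floats and that fractions of a second can simply be dropped, might look like this (the implementation is an assumption, not the author's actual code):

```python
def format_time(seconds: float) -> str:
    """Format a duration in seconds as HH:MM:SS (fractions are truncated)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


print(format_time(86.4))
```

Hours are not capped at 24, so files longer than a day would still format sensibly.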

However, I am dealing with long audio files (up to 10-20 hours each), where approximately 60% of the content is idle time (no sound or speech). Transcribing an entire file as-is is costly because the idle periods are billed along with the speech.

So my question is:
Is there a parameter or method in Deepgram's SDK that skips idle audio segments during transcription? My goal is to reduce the transcription cost by avoiding the processing of silent portions.

Answered by deepgram-community[bot]

Dec 6, 2024


2024-12-06T00:17:19Z

deepgram-community[bot]
bot Dec 6, 2024

Hey there! It looks like you haven't connected your GitHub account to your Deepgram account. You can do this at https://community.deepgram.com - being verified through this process will allow our team to help you in a much more streamlined fashion.

0 replies

kimjiwook0129 · 2024-12-06T00:17:20Z

deepgram-community[bot]
bot Dec 6, 2024

It looks like we're missing some important information to help debug your issue. Would you mind providing us with the following details in a reply?

A request ID that triggered your error or issue.

1 reply

kimjiwook0129 Dec 6, 2024
Author

Request ID not applicable

2024-12-06T00:23:11Z

deepgram-community[bot]
bot Dec 6, 2024

I'm sorry, but as an AI, I don't have the ability to directly access or manipulate any databases, including Deepgram's SDK. However, I can suggest a potential solution based on the information you've provided.

Deepgram's SDK does not currently have a built-in feature to skip silent portions of audio files. However, you can implement this functionality yourself by preprocessing the audio files to remove silence before sending them to Deepgram for transcription. There are several libraries available in Python that can help with this, such as pydub or librosa.

Here's a simple example of how you might do this using pydub:

from pydub import AudioSegment
from pydub.silence import split_on_silence

# Load your audio file
sound = AudioSegment.from_mp3(mp3_file_path)

# Split on silence
chunks = split_on_silence(sound, min_silence_len=1000, silence_thresh=-40)

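# Concatenate the chunks back together, skipping the silence
# (starting from an empty segment also covers the case where no chunks are found)
non_silent_audio = sum(chunks, AudioSegment.empty())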

# Save the result
non_silent_audio.export("non_silent.mp3", format="mp3")

In this example, min_silence_len is the minimum duration (in milliseconds) a quiet stretch must last to be treated as silence, and silence_thresh is the volume threshold (in dBFS): anything quieter than this is considered silence.

Please note that this is a simple example and might not work perfectly for your use case. You might need to adjust the parameters or use a more sophisticated method depending on the characteristics of your audio files.
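One way to check whether a given pair of parameters is reasonable is to compare how much audio would be kept against the original duration. The sketch below is pure Python and assumes you already have a list of [start_ms, end_ms] non-silent ranges (for example from pydub's detect_nonsilent) and the total length in milliseconds; the function name retained_fraction is hypothetical.

```python
def retained_fraction(nonsilent_ranges_ms, total_ms):
    """Return the share of the original audio that the kept ranges cover."""
    if total_ms <= 0:
        raise ValueError("total_ms must be positive")
    kept_ms = sum(end - start for start, end in nonsilent_ranges_ms)
    return kept_ms / total_ms


# Hypothetical ranges: 8 seconds of speech in a 20 second file
ranges = [[0, 4000], [6000, 10000]]
print(f"{retained_fraction(ranges, 20000):.0%} of the audio would be kept")
```

If the kept share is far from what you expect (roughly 40% for a file that is about 60% idle), the threshold is probably too strict or too lenient for that recording.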

Once you have removed the silence, you can then proceed to transcribe the non-silent audio using Deepgram's SDK as you are currently doing.
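One caveat: after silence is removed, the timestamps in the transcript refer to the trimmed audio, not the original recording. If the original timestamps matter, the offsets have to be mapped back. The following is a rough sketch, assuming the trimmed file was built by concatenating exactly the given [start_ms, end_ms] ranges with no padding (for example by slicing the AudioSegment with ranges from pydub's detect_nonsilent; split_on_silence may keep some silence around each chunk, which would skew the mapping). The helper names are hypothetical.

```python
from bisect import bisect_right


def build_offset_map(nonsilent_ranges_ms):
    """For each kept range, record where it starts in the trimmed and original audio (ms)."""
    trimmed_starts, original_starts = [], []
    cursor = 0  # running position in the trimmed audio
    for start, end in nonsilent_ranges_ms:
        trimmed_starts.append(cursor)
        original_starts.append(start)
        cursor += end - start
    return trimmed_starts, original_starts


def to_original_seconds(trimmed_seconds, trimmed_starts, original_starts):
    """Map a timestamp in the trimmed audio back to the original recording."""
    if not trimmed_starts:
        return trimmed_seconds  # nothing was kept, so there is nothing to remap
    t_ms = trimmed_seconds * 1000
    # Find the last kept range that starts at or before this timestamp
    i = max(bisect_right(trimmed_starts, t_ms) - 1, 0)
    return (original_starts[i] + (t_ms - trimmed_starts[i])) / 1000


# Hypothetical ranges: speech at 86-87 s and 90-92 s of the original file
trimmed, original = build_offset_map([[86000, 87000], [90000, 92000]])
print(to_original_seconds(1.5, trimmed, original))
```

The utterance start and end values returned for the trimmed file could be passed through to_original_seconds before formatting them.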

John Vajda

0 replies
