Optimizing Transcription Costs for Long Audio Files with Deepgram Python SDK #1019
-
Hello everyone, I'm currently working on transcribing long audio files (.mp3) into text (.txt) files with timestamps using the deepgram-sdk (version 2.12.0) in Python. Here's a snippet of the code I'm using:

```python
# dg_client is a Deepgram client from deepgram-sdk 2.x; format_time,
# mp3_file_path and transcript_file are defined elsewhere in my script.
with open(mp3_file_path, 'rb') as audio:
    response = await dg_client.transcription.prerecorded(
        {
            'buffer': audio,
            'mimetype': 'audio/mp3'
        },
        {
            'punctuate': True,
            'language': 'en',
            'model': 'general',
            'tier': 'enhanced',
            'utterances': True
        }
    )

# Extract and format transcript with timestamps
transcript_lines = []
for utterance in response['results']['utterances']:
    start_time = format_time(utterance['start'])
    end_time = format_time(utterance['end'])
    text = utterance['transcript']
    transcript_lines.append(f"{start_time} - {end_time}: {text}")

# Save the transcript locally
transcript_content = "\n".join(transcript_lines)
with open(transcript_file, "w", encoding="utf-8") as f:
    f.write(transcript_content)
```

This writes a .txt file with one line per utterance, in the form `start - end: text`. However, I am dealing with long audio files (10 to 20 hours), where approximately 60% of the content is idle time (no sound or speech). Transcribing the entire file as-is is costly because of those idle periods. So my question is:
-
Hey there! It looks like you haven't connected your GitHub account to your Deepgram account. You can do this at https://community.deepgram.com. Being verified through this process will allow our team to help you in a much more streamlined fashion.
-
It looks like we're missing some important information to help debug your issue. Would you mind providing us with the following details in a reply?
-
I'm sorry, but as an AI, I can't directly access Deepgram's systems. However, I can suggest a potential solution based on the information you've provided. Deepgram's SDK does not currently have a built-in feature to skip silent portions of audio files. You can, however, implement this yourself by preprocessing the audio files to remove silence before sending them to Deepgram for transcription. Several Python libraries can help with this, such as pydub or librosa. Here's a simple example of how you might do this using pydub:

```python
from pydub import AudioSegment
from pydub.silence import split_on_silence

# Load your audio file (MP3 decoding requires ffmpeg to be installed)
sound = AudioSegment.from_mp3(mp3_file_path)

# Split on silence: any stretch of at least 1000 ms below -40 dBFS
chunks = split_on_silence(sound, min_silence_len=1000, silence_thresh=-40)
if not chunks:
    raise SystemExit("No non-silent audio found; try a lower silence_thresh.")

# Concatenate the chunks back together, skipping the silence
# (sum() works because pydub lets 0 + AudioSegment return the segment)
non_silent_audio = sum(chunks)

# Save the result
non_silent_audio.export("non_silent.mp3", format="mp3")
```

In this example, `split_on_silence` treats any stretch of at least 1000 ms (`min_silence_len=1000`) that is quieter than -40 dBFS (`silence_thresh=-40`) as silence, and returns the audio between those stretches as separate chunks. Please note that this is a simple example and might not work perfectly for your use case. You might need to adjust the parameters or use a more sophisticated method depending on the characteristics of your audio files. Once you have removed the silence, you can transcribe the non-silent audio using Deepgram's SDK as you are currently doing. Keep in mind that the timestamps Deepgram returns will then refer to the shortened audio, not to the original recording.
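Because Deepgram only sees the condensed audio, its utterance times no longer line up with the original recording. One way to recover them is to remember which ranges were kept and map each time back. The sketch below is a hedged suggestion, not part of Deepgram's SDK: it uses `pydub.silence.detect_nonsilent` (instead of `split_on_silence`, whose default `keep_silence` padding makes exact offsets harder to track) and assumes utterances are dicts with `start`/`end` in seconds, as in the question's code. The helper names `condense`, `condensed_offsets`, `to_original_seconds` and `remap_utterances` are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def condense(path, min_silence_len=1000, silence_thresh=-40):
    """Drop silence from the file at `path` (hypothetical helper).

    Returns (condensed AudioSegment, kept ranges as [start_ms, end_ms] pairs).
    """
    # Imported here so the pure-Python helpers below work without pydub.
    from pydub import AudioSegment
    from pydub.silence import detect_nonsilent

    sound = AudioSegment.from_file(path)
    ranges_ms = detect_nonsilent(
        sound, min_silence_len=min_silence_len, silence_thresh=silence_thresh
    )
    if not ranges_ms:
        raise ValueError("no non-silent audio found; try a lower silence_thresh")
    # sum() works because pydub lets 0 + AudioSegment return the segment.
    condensed = sum(sound[start:end] for start, end in ranges_ms)
    return condensed, ranges_ms


def condensed_offsets(ranges_ms):
    """Start (ms) of each kept range inside the condensed audio.

    The final element is the total length of the condensed audio.
    """
    return list(accumulate((end - start for start, end in ranges_ms), initial=0))


def to_original_seconds(t_sec, ranges_ms, offsets):
    """Map a time (seconds) in the condensed audio to the original recording.

    A time exactly on a seam maps to the start of the next kept range.
    """
    t_ms = t_sec * 1000
    # Find the kept range containing t_ms, clamped to a valid index.
    i = bisect_right(offsets, t_ms) - 1
    i = max(0, min(i, len(ranges_ms) - 1))
    return (ranges_ms[i][0] + (t_ms - offsets[i])) / 1000


def remap_utterances(utterances, ranges_ms):
    """Return copies of utterance dicts with 'start'/'end' in original time."""
    offsets = condensed_offsets(ranges_ms)
    return [
        {
            **u,
            "start": to_original_seconds(u["start"], ranges_ms, offsets),
            "end": to_original_seconds(u["end"], ranges_ms, offsets),
        }
        for u in utterances
    ]
```

In use, you would call `condense(mp3_file_path)`, export and transcribe the condensed audio as before, then pass `response['results']['utterances']` and the returned ranges to `remap_utterances` before calling `format_time`. Two caveats: exporting the condensed audio as MP3 may add encoder padding that shifts times slightly (WAV avoids this), and pydub decodes the whole file into memory as raw PCM (about 635 MB per hour at 44.1 kHz, 16-bit stereo), so 10 to 20 hour files may need to be processed in pieces.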