This project handles voice transcription. It defines the Transcription Pipeline class, which transcribes an audio file into text.
The prerequisites are listed in the requirements.txt file. You can install them with:
pip install -r requirements.txt
You can also run the project in a container using the Dockerfile. To build the image, run:
docker build -t transcription-pipeline .
To run the container with a volume, run:
docker run -it -v /path/to/audio:/audio transcription-pipeline
To run the container with a volume and GPU, run:
docker run -it --gpus all -v /path/to/audio:/audio transcription-pipeline
To use the Transcription Pipeline, you can run the following command:
python transcription_pipeline.py --audio_path /path/to/audio --engine whisper
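For readers who want to see how a command line like this one could be parsed, here is a minimal sketch using argparse. Only the two flag names, --audio_path and --engine, come from the command above. The function name build_parser, the help texts, and the choice to make both flags required are assumptions for illustration, not the actual contents of transcription_pipeline.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: only the two flag names are taken from the README command.
    parser = argparse.ArgumentParser(description="Transcribe an audio file into text.")
    parser.add_argument("--audio_path", required=True, help="Path to the audio file to transcribe.")
    # Making --engine required is an assumption; the real script may define a default.
    parser.add_argument("--engine", required=True, help="Transcription engine, for example whisper.")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["--audio_path", "/path/to/audio", "--engine", "whisper"])
print(args.audio_path, args.engine)
```

Passing an explicit list to parse_args keeps the example independent of how the script is launched. A real entry point would call parse_args() with no arguments.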
The output is saved in the same directory as the audio file, under the same base name but with a .txt extension.
The output file is formatted as follows:
{
    "audio_path": "/path/to/audio",
    "engine": "whisper",
    "language": "en",
    "transcription": "This is the transcription of the audio file"
}
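The naming rule and the output format described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the project's actual code: the helper names output_path_for and save_transcription are hypothetical, and the four keys are copied from the sample output.

```python
import json
from pathlib import Path


def output_path_for(audio_path: str) -> Path:
    # Same directory and base name as the audio file, with a .txt extension.
    return Path(audio_path).with_suffix(".txt")


def save_transcription(audio_path: str, engine: str, language: str, transcription: str) -> Path:
    # Hypothetical helper: builds a record with the four keys shown in the sample output.
    record = {
        "audio_path": audio_path,
        "engine": engine,
        "language": language,
        "transcription": transcription,
    }
    out = output_path_for(audio_path)
    # ensure_ascii=False keeps non-English transcriptions readable in the file.
    out.write_text(json.dumps(record, indent=4, ensure_ascii=False), encoding="utf-8")
    return out
```

Although the file carries a .txt extension, its contents are JSON, so it can be read back with json.loads(path.read_text(encoding="utf-8")).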
- Define Transcription Pipeline class
- Implement with Whisper
- Implement with Google Cloud Speech-to-Text API
- Implement with OpenAI API
- Add tests
- Add support for other languages
Sebastián Ignacio Bórquez González
This project is licensed under the MIT License. See the LICENSE.md file for details.