Releases · Dadangdut33/Speech-Translate

08 Nov 14:03

1.3.1

190c257

1.3.1 - Bug Fixes & Large-v3 Whisper Support

This release fixes some bugs and adds large-v3 support for stable-whisper. Unfortunately, large-v3 is not yet supported by faster-whisper, so I will add it later once support is available.

What's Changed

Fix #49
Fix #50

Full Changelog: 1.3.0...1.3.1

Notes

Before downloading or installing, please take a look at the wiki and read the getting started section.
If you previously installed Speech Translate as a module, you can update it by running pip install -U git+https://github.com/Dadangdut33/Speech-Translate.git --upgrade --force-reinstall
If you installed from the installer, you can download and launch the installer below to update.
If you have any suggestions or find any bugs, please feel free to open a discussion or an issue.

Requirements

Compatible OS:

OS	Prebuilt binary	As a module
Windows	✔️	✔️
MacOS	❌	✔️
Linux	❌	✔️

* Python 3.8 or later (3.11 is recommended) for installation as a module.

Speaker input only works on Windows 8 and above.
Internet connection (for translation with API)
FFmpeg is required to be installed and added to the PATH environment variable. You can do it when prompted in the app, or you can download it here and add it to your path manually. Alternatively, you can also download and add it to path automatically by using the following commands:

# on Windows using powershell (Also included in the release page, and can be run by right clicking and selecting "Run with PowerShell")
# Must be run in an elevated PowerShell prompt (Run as administrator)
Set-ExecutionPolicy RemoteSigned -Scope CurrentUser # Optional: Needed to run a remote script the first time
& ([scriptblock]::Create(
     (New-Object System.Net.WebClient).DownloadString('https://raw.githubusercontent.com/Dadangdut33/Speech-Translate/master/install_ffmpeg.ps1')
  )) -webdl

# on Windows using Winget (Default package manager for Windows 10 and above)
winget install --id=Gyan.FFmpeg  -e

# on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg

# on Windows using Scoop (https://scoop.sh/)
scoop install ffmpeg

# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg

# on Arch Linux
sudo pacman -S ffmpeg

# on MacOS using Homebrew (https://brew.sh/)
brew install ffmpeg
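
After running one of the install commands above, it can help to confirm that FFmpeg is actually reachable through PATH. The following is a minimal sketch using only the Python standard library; the function names `find_ffmpeg` and `ffmpeg_version_line` are hypothetical helpers for illustration and are not part of Speech Translate.

```python
import shutil
import subprocess
from typing import Optional


def find_ffmpeg(executable: str = "ffmpeg") -> Optional[str]:
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(executable)


def ffmpeg_version_line(path: str) -> str:
    """Run `<path> -version` and return the first line of its output."""
    result = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=True
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    path = find_ffmpeg()
    if path is None:
        print("FFmpeg was not found on PATH; install it and reopen the terminal.")
    else:
        print(f"Found FFmpeg at {path}")
        print(ffmpeg_version_line(path))
```

If the check still fails right after installing, opening a new terminal usually helps, since PATH changes are only picked up by newly started processes.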

A capable CUDA-compatible GPU is recommended for running the models (the prebuilt version uses CUDA 11.8). Each whisper model has different requirements; for more information, check the whisper repository directly.

Size	Parameters	English-only model	Multilingual model	Required VRAM	Relative speed
tiny	39 M	`tiny.en`	`tiny`	~1 GB	~32x
base	74 M	`base.en`	`base`	~1 GB	~16x
small	244 M	`small.en`	`small`	~2 GB	~6x
medium	769 M	`medium.en`	`medium`	~5 GB	~2x
large	1550 M	N/A	`large`	~10 GB	1x

* This information is also available in the app (hover over the model selection and a tooltip will show the model info). Also note that faster-whisper runs significantly faster and needs less VRAM, depending on usage; for more information, please visit the faster-whisper repository.
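
As a rough illustration of how to read the table above, the sketch below picks the largest model whose approximate VRAM requirement fits a given budget. The VRAM figures are copied from the table; the names `MODEL_VRAM_GB` and `largest_model_that_fits` are hypothetical and do not come from the app. Because the figures are approximate (and lower with faster-whisper), treat the result as a starting point only.

```python
from typing import Optional

# (model name, approximate required VRAM in GB), copied from the table above
# and ordered from smallest to largest.
MODEL_VRAM_GB = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("large", 10),
]


def largest_model_that_fits(available_vram_gb: float) -> Optional[str]:
    """Return the largest model whose approximate VRAM need fits, or None."""
    best = None
    for name, required_gb in MODEL_VRAM_GB:
        if required_gb <= available_vram_gb:
            best = name  # later entries are larger, so keep overwriting
    return best


if __name__ == "__main__":
    for vram in (0.5, 2, 6, 12):
        print(vram, "GB ->", largest_model_that_fits(vram))
```

For example, a card with about 6 GB of free VRAM maps to `medium`, while anything under 1 GB maps to no model at all under these figures.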

07 Nov 18:04

Dadangdut33

1.3.0

ccd7929

1.3.0 - UI Overhaul and New Backend

The 1.3.0 release is finally ready. This release fixes lots of bugs and improves the whole app considerably. With this release, the backend now uses stable-whisper and should generate more stable and improved results. The whole user interface has also been changed and improved, so the user experience should be much better.

For this release I also provide an app installer instead of a 7-Zip self-extracting .exe.

Before downloading or installing, please take a look at the wiki and read the getting started section.

What's Changed

1.3.0 by @Dadangdut33 in #47, thanks to everyone who submitted bug reports and feature requests
Added word level transcription #10 thanks @MaxHaller91 for the request
Added file process indicator
Added color coding for accuracy
Added faster whisper
Added character limit #44 thanks @LearningJer for the request
Added ways to install ffmpeg inside the app
Added customizable output format #42 thanks @joebinglab for the request
Added refinement, alignment, and translation of result
Added ability to export a record session with file-like output
Added keyboard support for combobox
Added VAD option to record session
Added audiometer for record session indicator
Added ability to use either ndarray or temp file for record session
Added multiple whisper models to the translation engine combobox
Added copy to clipboard button #36 thanks @MirkoPMC for the request
Changed backend to stable whisper #27 thanks @k566o for the report
Changed vanilla logger to loguru
Changed subtitle window to use tkhtml label
Fixed wrong language code #34 thanks @SugarQuiet for the report
Fixed crash that might happen in record session #31 #40 thanks @yslion @FerriteGiant for the report
Fixed clearing on record session
Fixed subtitle windows dragging
Fixed device query #41 thanks @IcarusAegis for the report
Fixed filename mix-up #32 thanks @Corvalan for the report

Full Changelog: 1.2.3...1.3.0

Requirements

Compatible OS:

OS	Prebuilt binary	As a module
Windows	✔️	✔️
MacOS	❌	✔️
Linux	❌	✔️

* Python 3.8 or later (3.11 is recommended) for installation as a module.

Speaker input only works on Windows 8 and above.
Internet connection (for translation with API)
FFmpeg is required to be installed and added to the PATH environment variable. You can do it when prompted in the app, or you can download it here and add it to your path manually. Alternatively, you can also download and add it to path automatically by using the following commands:

# on Windows using powershell (Also included in the release page, and can be run by right clicking and selecting "Run with PowerShell")
# Must be run in an elevated PowerShell prompt (Run as administrator)
Set-ExecutionPolicy RemoteSigned -Scope CurrentUser # Optional: Needed to run a remote script the first time
& ([scriptblock]::Create(
     (New-Object System.Net.WebClient).DownloadString('https://raw.githubusercontent.com/Dadangdut33/Speech-Translate/master/install_ffmpeg.ps1')
  )) -webdl

# on Windows using Winget (Default package manager for Windows 10 and above)
winget install --id=Gyan.FFmpeg  -e

# on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg

# on Windows using Scoop (https://scoop.sh/)
scoop install ffmpeg

# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg

# on Arch Linux
sudo pacman -S ffmpeg

# on MacOS using Homebrew (https://brew.sh/)
brew install ffmpeg

A capable CUDA-compatible GPU is recommended for running the models (the prebuilt version uses CUDA 11.8). Each whisper model has different requirements; for more information, check the whisper repository directly.

Size	Parameters	English-only model	Multilingual model	Required VRAM	Relative speed
tiny	39 M	`tiny.en`	`tiny`	~1 GB	~32x
base	74 M	`base.en`	`base`	~1 GB	~16x
small	244 M	`small.en`	`small`	~2 GB	~6x
medium	769 M	`medium.en`	`medium`	~5 GB	~2x
large	1550 M	N/A	`large`	~10 GB	1x

Contributors

yslion, FerriteGiant, and 9 other contributors

13 Apr 17:58

Dadangdut33

1.2.3

6437e18

1.2.3 - Bug Fix

Some quick bug fixes. Post an issue or discussion if you have any questions or find any bugs.

Requirements

FFmpeg is required to be installed and added to the PATH environment variable. You can download it here and add it to your path manually OR you can do it automatically using the following commands:

# on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg

# on Windows using Scoop (https://scoop.sh/)
scoop install ffmpeg

# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg

# on Arch Linux
sudo pacman -S ffmpeg

# on MacOS using Homebrew (https://brew.sh/)
brew install ffmpeg

Whisper uses the GPU (VRAM) to process audio, so a CUDA-compatible GPU is recommended. If there is no compatible GPU, the application will use the CPU to process the audio, which might be slower. For each model's requirements, check the whisper repository directly or hover over the model selection in the app (a tooltip will show the model info).
Speaker input only works on Windows 8 and above.
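
The GPU-or-CPU fallback described above can be sketched as follows. This is only an illustration that assumes PyTorch is the library reporting CUDA availability; `pick_device` is a hypothetical helper, and the app's actual device-selection code is not shown in these notes.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable CUDA GPU, otherwise "cpu"."""
    try:
        import torch  # treated as optional in this sketch
    except ImportError:
        # Without PyTorch there is no way to use CUDA here, so fall back.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print("Audio would be processed on:", pick_device())
```

The returned string matches the device names PyTorch itself uses, so it could be passed along wherever a device argument is expected.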

What's Changed

Fix error if no input device is present #30
Added checkbutton for open export folder setting in file process window

Full Changelog: 1.2.2...1.2.3

11 Apr 15:42

Dadangdut33

1.2.2

d1154f8

1.2.2 - Progress indicator and bug fixes

Fixed some bugs and added some improvements. I will focus on adding new features such as #27 #10 #19 for the next release.

Requirements

FFmpeg is required to be installed and added to the PATH environment variable. You can download it here and add it to your path manually OR you can do it automatically using the following commands:

# on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg

# on Windows using Scoop (https://scoop.sh/)
scoop install ffmpeg

# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg

# on Arch Linux
sudo pacman -S ffmpeg

# on MacOS using Homebrew (https://brew.sh/)
brew install ffmpeg

Whisper uses the GPU (VRAM) to process audio, so a CUDA-compatible GPU is recommended. If there is no compatible GPU, the application will use the CPU to process the audio, which might be slower. For each model's requirements, check the whisper repository directly or hover over the model selection in the app (a tooltip will show the model info).
Speaker input only works on Windows 8 and above.

What's Changed

Fix large model #29 #22 with help of @ryanhe312 in #26
Fix wrong variable used #25
Added selection for large-v1 and large-v2
Added window indicating the progress of each process #28
Added pause and cancel confirmation to model downloading
The main window and settings window now retain their previous size

Full Changelog: 1.2.1...1.2.2

Contributors

ryanhe312

21 Mar 17:31

Dadangdut33

1.2.1

436e152

1.2.1 - More whisper settings, download window, dark theme

This release fixes the previous 1.2.0 release and adds some extra features.

Thanks to everyone who has submitted an issue or posted a discussion. Feel free to let me know if you have any requests or find any bugs. I'm also sorry if updates are slow; I'm a little busy, but I will keep improving this fun project.

Bugs found in this release

Requirements

FFmpeg is required to be installed and added to the PATH environment variable. You can download it here and add it to your path manually OR you can do it automatically using the following commands:

# on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg

# on Windows using Scoop (https://scoop.sh/)
scoop install ffmpeg

# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg

# on Arch Linux
sudo pacman -S ffmpeg

# on MacOS using Homebrew (https://brew.sh/)
brew install ffmpeg

Whisper uses the GPU (VRAM) to process audio, so a CUDA-compatible GPU is recommended. If there is no compatible GPU, the application will use the CPU to process the audio, which might be slower. For each model's requirements, check the whisper repository directly or hover over the model selection in the app (a tooltip will show the model info).
Speaker input only works on Windows 8 and above.

What's Changed

Fix translation bugs that occurred in 1.2.0
Fix Chinese words lost in translation with timestamps, thanks to @ryanhe312 in #20
Fix Arabic words not rendering properly by using arabic-reshaper
Better default value for detached (now subtitle) window
Added ways to use more whisper settings #9 #8 thanks to @galaxea for the suggestion
Added ways to toggle the terminal/console window (might not work if windows terminal is set as default terminal)
Added model download window with progression and cancel button
Added countdown window for setting auto threshold input in setting
Added sv ttk theme with dark and light mode available a944b8d
Cleaner app folder using the new build script c11bc00

New Contributors

@ryanhe312 made their first contribution in #20. Thanks for the PR!

Full Changelog: 1.2.0...1.2.1

Edit: the CPU release file was replaced because I uploaded the wrong version

Contributors

ryanhe312 and galaxea

21 Feb 12:16

Dadangdut33

1.2.0

14230b4

1.2.0 - More whisper settings and dark theme

Pre-release

There is a fatal translation bug (8c13dc1) that was missed in this release, so I'll move this release to pre-release and upload a fixed one ASAP.

This release enables the use of more of the available whisper options. Please let me know if it's not working. I have also decided to distribute the release files as 7z self-extracting executables from now on.

For the next (1.3.0) release, I'll be working on implementing #10.

Thanks to everyone who has submitted an issue or posted a discussion. Feel free to let me know if you have any requests or find any bugs.

Requirements:

FFmpeg is required to be installed and added to the PATH environment variable. You can download it here and add it to your path manually OR you can do it automatically using the following commands:

# on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg

# on Windows using Scoop (https://scoop.sh/)
scoop install ffmpeg

# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg

# on Arch Linux
sudo pacman -S ffmpeg

# on MacOS using Homebrew (https://brew.sh/)
brew install ffmpeg

Whisper uses the GPU (VRAM) to process audio, so a CUDA-compatible GPU is recommended. If there is no compatible GPU, the application will use the CPU to process the audio, which might be slower. For each model's requirements, check the whisper repository directly or hover over the model selection in the app (a tooltip will show the model info).
Speaker input only works on Windows 8 and above.

Changelog:

Provided fix instruction for bug #12
Added ways to use more whisper settings #9 #8
Added setting and about window to tray app ce88bdc
sv ttk theme with dark and light mode available, module updates a944b8d
Cleaner app folder using the new build script c11bc00

Full Changelog: 1.1.0...1.2.0

09 Jan 11:05

Dadangdut33

1.1.0

c0fcddc

1.1.0 - Real time

This release brings lots of new features, customization options, and bug fixes.

As usual, I can only release for Windows for now, but the code itself should work on other operating systems. I might update the release files if I manage to build them on other operating systems. You can also clone the repository and build it yourself, or just run the main code directly after setting up the project as instructed in the readme.

Requirements:

FFmpeg is required to be installed and added to the PATH environment variable. You can download it here and add it to your path manually OR you can do it automatically using the following commands:

# on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg

# on Windows using Scoop (https://scoop.sh/)
scoop install ffmpeg

# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg

# on Arch Linux
sudo pacman -S ffmpeg

# on MacOS using Homebrew (https://brew.sh/)
brew install ffmpeg

Whisper uses the GPU (VRAM) to process audio, so a CUDA-compatible GPU is recommended. If there is no compatible GPU, the application will use the CPU to process the audio, which might be slower. For each model's requirements, check the whisper repository directly or hover over the model selection in the app (a tooltip will show the model info).
Speaker input only works on Windows 8 and above.

Changelog:

Real time transcription and translation #1
Input threshold setting
Move textbox setting preview to setting window
File batch import #4
SRT timestamp #2
More customization for the detached window (it can now be made transparent to show only the words, just like captions on YouTube) #5
Add right click menu to detached window

Full Changelog: 1.0.0...1.1.0

Preview of the new detached window

Testing real-time transcription on an English-language YouTube video

14 Dec 09:16

Dadangdut33

1.0.0

874d517

1.0.0 - First Release

This is the first release of Speech Translate. The files are large, so I compressed them using 7-Zip.

There are two releases: a CPU version and a GPU version. The CPU version is smaller but can only use the CPU, while the GPU version will use the GPU if you have a compatible one. Both are portable, so you can move the installation anywhere you want.

You can report any bugs or request a feature by posting an issue. If you are not interested in running the .exe or you want to build the .exe yourself, please take a look at the development section in the readme.

Requirements:

Whisper uses the GPU (VRAM) to process audio, so a CUDA-compatible GPU is recommended. If there is no compatible GPU, the application will use the CPU to process the audio, which might be slower. For each model's requirements, check the whisper repository directly or hover over the model selection in the app (a tooltip will show the model info).

Future stuff that would be added if possible:

real time transcription #1
more customisation
