Create `outetts` JS library for text-to-speech in the browser, Node.js, Deno, Bun, etc. #42

xenova · 2024-12-03T12:33:13Z

This PR creates a JavaScript version of outetts, allowing the models to be run 100% locally in the browser using Transformers.js. It also supports running in other JavaScript environments like Node.js, Deno, Bun, etc.

I've tried to keep the API as similar to the Python library as possible.

Example code:

import { HFModelConfig_v1, InterfaceHF } from "outetts";

// Configure the model
const model_config = new HFModelConfig_v1({
    model_path: "onnx-community/OuteTTS-0.2-500M",
    language: "en", // Supported languages in v0.2: en, zh, ja, ko
    dtype: "fp32", // Supported dtypes: fp32, q8, q4
});

// Initialize the interface
const tts_interface = await InterfaceHF({ model_version: "0.2", cfg: model_config });

// Print available default speakers
tts_interface.print_default_speakers();

// Load a default speaker
const speaker = tts_interface.load_default_speaker("male_1");

// Generate speech
const output = await tts_interface.generate({
    text: "Speech synthesis is the artificial production of human speech.",
    temperature: 0.1, // Lower temperature values may result in a more stable tone
    repetition_penalty: 1.1,
    max_length: 4096,

    // Optional: Use a speaker profile for consistent voice characteristics
    // Without a speaker profile, the model will generate a voice with random characteristics
    speaker,
});

// Save the synthesized speech to a file
output.save("output.wav");

xenova · 2024-12-03T16:58:56Z

v1 for the web demo:

output.mp4

xenova · 2024-12-03T22:57:13Z

After going down a long rabbit hole of number -> word conversion (and finding bugs in jaraco/inflect#226), I've got it working.

One interesting thing I found while testing is that, even with the Python version, years like 2016 get read as "twenty sixteen", even though the number is spelled out as "two thousand and sixteen". Also, numbers on their own aren't spoken very well. Might be an improvement for v3.

Example: "Hugging Face was founded in 2016."

2016.mp4
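
For illustration, here is a minimal sketch of spelling out a four-digit year the way it is usually spoken in English, so that the text given to the model matches the "twenty sixteen" reading. `yearToWords` and `twoDigits` are hypothetical helpers, not part of outetts or inflect, and the rules for round years and "oh" are assumptions about one common reading style.

```javascript
const ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
    "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
    "seventeen", "eighteen", "nineteen",
];
const TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"];

// Spell out an integer in the range 0..99 (hypothetical helper).
function twoDigits(n) {
    if (n < 20) return ONES[n];
    const tens = TENS[Math.floor(n / 10)];
    const ones = n % 10;
    return ones === 0 ? tens : `${tens}-${ONES[ones]}`;
}

// Spell out a four-digit year as two pairs of digits (hypothetical helper).
function yearToWords(year) {
    if (!Number.isInteger(year) || year < 1000 || year > 9999) {
        throw new RangeError("expected a four-digit year");
    }
    const high = Math.floor(year / 100); // first two digits, 10..99
    const low = year % 100;              // last two digits, 0..99
    if (low === 0) {
        // 2000 -> "two thousand", 1900 -> "nineteen hundred"
        return high % 10 === 0 ? `${ONES[high / 10]} thousand` : `${twoDigits(high)} hundred`;
    }
    if (low < 10) {
        // 2005 -> "two thousand five", 1905 -> "nineteen oh five"
        return high % 10 === 0
            ? `${ONES[high / 10]} thousand ${ONES[low]}`
            : `${twoDigits(high)} oh ${ONES[low]}`;
    }
    return `${twoDigits(high)} ${twoDigits(low)}`;
}

console.log(yearToWords(2016)); // "twenty sixteen"
```

Whether a given four-digit number is actually a year still has to be decided from context, which this sketch does not attempt.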

edwko · 2024-12-04T10:19:11Z

> After going down a long rabbit hole of number -> word conversion (and finding bugs in jaraco/inflect#226), I've got it working.
>
> One interesting thing I found while testing is that, even with the Python version, years like 2016 get read as "twenty sixteen", even though the number is spelled out as "two thousand and sixteen". Also, numbers on their own aren't spoken very well. Might be an improvement for v3.
>
> Example: "Hugging Face was founded in 2016."
> 2016.mp4

Yeah, the numbers definitely need some improvement, and the next version should address this as well.
As for an alternative to inflect, num2words looks like a solid option. It supports modes such as cardinal (default), ordinal, ordinal_num, year, and currency. Not sure how I’d integrate them automatically, though.

It also does not seem to have the issue you mentioned. For example:

from num2words import num2words
for _ in range(10000):
    if num2words(0.000001) != 'zero point zero zero zero zero zero one':
        print("Failed")
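
On picking a mode automatically, one rough option would be a pattern-based guess per token. The sketch below is only an assumption about how that could look: `guessNumberMode` is a hypothetical function, the returned strings mirror the num2words mode names mentioned above, and the year range (1000 to 2099) is an arbitrary cut-off.

```javascript
// Hypothetical heuristic: guess which num2words-style mode fits a numeric token.
function guessNumberMode(token) {
    // A leading currency symbol followed by a digit, e.g. "$5.99"
    if (/^[$€£]\d/.test(token)) return "currency";
    // Digits followed by an English ordinal suffix, e.g. "3rd"
    if (/^\d+(st|nd|rd|th)$/i.test(token)) return "ordinal";
    // A bare four-digit number in 1000..2099 is treated as a year (assumption)
    if (/^(1\d{3}|20\d{2})$/.test(token)) return "year";
    // Everything else is read as a plain number
    return "cardinal";
}

for (const token of ["$5.99", "3rd", "2016", "42"]) {
    console.log(token, "->", guessNumberMode(token));
}
```

A bare "2016" can also be a count rather than a year, so a real integration would probably need to look at the surrounding words as well.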

edwko · 2024-12-04T10:44:51Z

Ran into an issue:

import en_female_2 from "./default_speakers/en_female_2.json" assert { type: "json" };  
                                                              ^^^^^^

SyntaxError: Unexpected identifier 'assert'
    at compileSourceTextModule (node:internal/modules/esm/utils:338:16)
    at ModuleLoader.moduleStrategy (node:internal/modules/esm/translators:103:18)
    at #translate (node:internal/modules/esm/loader:433:12)
    at ModuleLoader.loadAndTranslate (node:internal/modules/esm/loader:480:27)

Node.js v23.1.0

Not very knowledgeable in JS, but switching from `assert` to `with` fixed the issue. You might want to look into it:

import en_female_1 from "./default_speakers/en_female_1.json" with { type: "json" };
import en_female_2 from "./default_speakers/en_female_2.json" with { type: "json" };
import en_male_1 from "./default_speakers/en_male_1.json" with { type: "json" };
import en_male_2 from "./default_speakers/en_male_2.json" with { type: "json" };
import en_male_3 from "./default_speakers/en_male_3.json" with { type: "json" };
import en_male_4 from "./default_speakers/en_male_4.json" with { type: "json" };
import ja_female_1 from "./default_speakers/ja_female_1.json" with { type: "json" };
import ja_female_2 from "./default_speakers/ja_female_2.json" with { type: "json" };
import ja_female_3 from "./default_speakers/ja_female_3.json" with { type: "json" };
import ja_male_1 from "./default_speakers/ja_male_1.json" with { type: "json" };
import ko_female_1 from "./default_speakers/ko_female_1.json" with { type: "json" };
import ko_female_2 from "./default_speakers/ko_female_2.json" with { type: "json" };
import ko_male_1 from "./default_speakers/ko_male_1.json" with { type: "json" };
import ko_male_2 from "./default_speakers/ko_male_2.json" with { type: "json" };
import zh_female_1 from "./default_speakers/zh_female_1.json" with { type: "json" };
import zh_male_1 from "./default_speakers/zh_male_1.json" with { type: "json" };
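
Since some older Node.js releases only accept `assert` and newer ones only accept `with`, a syntax-independent alternative is to read and parse the JSON file at runtime. This is a minimal sketch assuming a Node.js-style runtime; `loadJson` is a hypothetical helper, not something in outetts.js.

```javascript
// Hypothetical helper: load a JSON file without relying on import attributes.
async function loadJson(filePath) {
    // Dynamic import works from both ES modules and CommonJS scripts.
    const { readFile } = await import("node:fs/promises");
    const text = await readFile(filePath, "utf8");
    return JSON.parse(text);
}

// Example usage inside an ES module (path shown for illustration only):
// const speaker = await loadJson(new URL("./default_speakers/en_male_1.json", import.meta.url));
```

The trade-off is that bundlers can no longer inline the JSON at build time, so the static `with { type: "json" }` imports are probably still the better fit for the browser build.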

xenova · 2024-12-04T10:54:58Z

Good point! I was using the older syntax still. Will update 👍

edwko · 2024-12-04T11:43:47Z

Published the current dev code to the npm package:
https://www.npmjs.com/package/outetts

npm i outetts

I ran the example code, and everything seems to be working.

xenova · 2024-12-05T15:28:49Z

I'll move the PR to "ready to review" soon. I'm just waiting for an upstream bugfix in onnxruntime-web that fixes the WebGPU implementation of the audio decoder. This will be released in Transformers.js v3.1.2 👍

xenova · 2024-12-07T22:00:55Z

https://www.npmjs.com/package/@huggingface/transformers/v/3.1.2 is out. Updating package now 👍

xenova · 2024-12-08T00:26:11Z

Online demo: https://huggingface.co/spaces/webml-community/text-to-speech-webgpu

Also, marking as ready (even though the audio decoder still runs on CPU). That can be added in an update :)

PS: https://www.npmjs.com/package/outetts seems to be private/down, so I've installed from source for now.
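
The split described above (main model on WebGPU when available, audio decoder on CPU) can be sketched as a small feature check. `pickDevices` and the `model` / `audio_decoder` keys are hypothetical names used only for illustration; `"webgpu"` and `"wasm"` are the device strings Transformers.js uses.

```javascript
// Hypothetical helper: choose a device per component.
// `nav` defaults to the global navigator, which is undefined in some runtimes.
function pickDevices(nav = globalThis.navigator) {
    // WebGPU is exposed as `navigator.gpu` in browsers that support it.
    const hasWebGPU = Boolean(nav && nav.gpu);
    return {
        model: hasWebGPU ? "webgpu" : "wasm",
        // Kept on WASM/CPU for now, matching the current state of this PR.
        audio_decoder: "wasm",
    };
}

console.log(pickDevices());
```

Note that the presence of `navigator.gpu` does not guarantee that an adapter can actually be obtained, so a production check would likely also call `navigator.gpu.requestAdapter()`.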

edwko · 2024-12-08T19:42:50Z

Looks good, merged it! Thanks for putting this together :)

> PS: https://www.npmjs.com/package/outetts seems to be private/down, so I've installed from source for now.

Hmm, seems to work fine, tried installing on a few different devices with no issues. Is it not loading for you?

xenova · 2024-12-08T19:47:20Z

Amazing! 🥳 It was an issue on my side - all good now!

xenova added 7 commits December 3, 2024 01:35

Implement basic outetts.js version

66df2f8

Formatting

db3837f

Add package.json

f0cdb37

Add example/demo code

7d95fd4

Create .gitignore

a9b93f8

Remove debug log

64818a8

Merge branch 'edwko:main' into main

1338b25

edwko reviewed Dec 3, 2024

outetts.js/version/v1/prompt_processor.js (outdated, resolved)

outetts.js/version/v1/prompt_processor.js (outdated, resolved)

xenova commented Dec 3, 2024

outetts.js/version/v1/interface.js (outdated, resolved)

xenova commented Dec 3, 2024

outetts.js/version/v1/interface.js (outdated, resolved)

This was referenced Dec 3, 2024

Could I run this in the browser? #30

Closed

Create Text-to-Speech WebGPU demo huggingface/transformers.js-examples#17

Merged

xenova added 17 commits December 3, 2024 17:50

Disable webgpu for wavtokenizer

c860872

unpack kwargs into generate function

05722b7

Add number_to_words functionality

694bb60

Add number_to_words unit tests

6895545

Use vitest

bf9cc63

Throw error if non-english language

3952e4d

Temporarily disable v0.1 model

728e139

Update unit tests

2328565

Update unit tests

1fa3714

Improvements

8173c6f

Update unit tests

d5db3d2

Improvements

b9c4e0e

No need for a list

8b5de93

Trim result, just in case

a14005b

Add coverage tests

5a516b3

Fix prompt processor

51cce8f

Add another mixed digit unit test

912da12

xenova commented Dec 4, 2024

outetts.js/version/v1/default_speakers.js (outdated, resolved)

assert -> with

6bb003d

xenova added 3 commits December 7, 2024 22:07

Enable webgpu for audio_codec

4b36659

Bump transformers.js version

c19659b

Use WASM/CPU for audio decoder

7323516

xenova marked this pull request as ready for review December 8, 2024 00:26

xenova changed the title ~~[WIP] Create outetts JS library for text-to-speech in the browser, Node.js, Deno, Bun, etc.~~ Create outetts JS library for text-to-speech in the browser, Node.js, Deno, Bun, etc. Dec 8, 2024

edwko merged commit ddcd51b into edwko:main Dec 8, 2024
