Create outetts JS library for text-to-speech in the browser, Node.js, Deno, Bun, etc. #42

Merged on Dec 8, 2024 (28 commits)

Conversation

Contributor

xenova commented Dec 3, 2024

This PR creates a JavaScript version of outetts, allowing the models to be run 100% locally in the browser using Transformers.js. It also supports running in other JavaScript environments like Node.js, Deno, Bun, etc.

I've tried to keep the API as similar to the Python library as possible.

Example code:

import { HFModelConfig_v1, InterfaceHF } from "outetts";

// Configure the model
const model_config = new HFModelConfig_v1({
    model_path: "onnx-community/OuteTTS-0.2-500M",
    language: "en", // Supported languages in v0.2: en, zh, ja, ko
    dtype: "fp32", // Supported dtypes: fp32, q8, q4
});

// Initialize the interface
const tts_interface = await InterfaceHF({ model_version: "0.2", cfg: model_config });

// Print available default speakers
tts_interface.print_default_speakers();

// Load a default speaker
const speaker = tts_interface.load_default_speaker("male_1");

// Generate speech
const output = await tts_interface.generate({
    text: "Speech synthesis is the artificial production of human speech.",
    temperature: 0.1, // Lower temperature values may result in a more stable tone
    repetition_penalty: 1.1,
    max_length: 4096,

    // Optional: Use a speaker profile for consistent voice characteristics
    // Without a speaker profile, the model will generate a voice with random characteristics
    speaker,
});

// Save the synthesized speech to a file
output.save("output.wav");

Contributor Author

xenova commented Dec 3, 2024

v1 for the web demo:

output.mp4

Contributor Author

xenova commented Dec 3, 2024

After going down a long rabbit hole of number -> word conversion (and finding bugs in jaraco/inflect#226), I've got it working.

One interesting thing I found while testing is that, even with the Python version, years like 2016 get read as "twenty sixteen", even though the number is spelled out as "two thousand and sixteen" before it reaches the model. Also, numbers on their own aren't spoken very well. Might be an improvement for v3.

Example: "Hugging Face was founded in 2016."

2016.mp4
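
For readers unfamiliar with the number -> word step, here is a minimal sketch of a cardinal conversion that produces the "two thousand and sixteen" form mentioned above. It is not the code from this PR or from inflect; the names `ONES`, `TENS`, `belowHundred`, `belowThousand`, and `cardinal` are hypothetical, and the range is limited to 0..999999 to keep it short.

```javascript
// Illustrative only: hypothetical helpers, not part of outetts or inflect.
const ONES = [
  "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
  "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
  "seventeen", "eighteen", "nineteen",
];
const TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"];

// Spell out 0..99.
function belowHundred(n) {
  if (n < 20) return ONES[n];
  const tens = TENS[Math.floor(n / 10)];
  return n % 10 === 0 ? tens : `${tens}-${ONES[n % 10]}`;
}

// Spell out 0..999, joining the hundreds and the remainder with "and".
function belowThousand(n) {
  if (n < 100) return belowHundred(n);
  const head = `${ONES[Math.floor(n / 100)]} hundred`;
  return n % 100 === 0 ? head : `${head} and ${belowHundred(n % 100)}`;
}

// Spell out an integer in 0..999999 as a cardinal number.
function cardinal(n) {
  if (!Number.isInteger(n) || n < 0 || n > 999999) {
    throw new RangeError("cardinal() only handles integers from 0 to 999999");
  }
  if (n < 1000) return belowThousand(n);
  const head = `${belowThousand(Math.floor(n / 1000))} thousand`;
  const rest = n % 1000;
  if (rest === 0) return head;
  // A remainder below one hundred is joined with "and", as in the quoted 2016 example.
  return rest < 100 ? `${head} and ${belowHundred(rest)}` : `${head} ${belowThousand(rest)}`;
}

console.log(cardinal(2016)); // "two thousand and sixteen"
```

A cardinal reading like this treats 2016 as a quantity, which is why it differs from the "twenty sixteen" the model actually speaks.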

Owner

edwko commented Dec 4, 2024

> After going down a long rabbit hole of number -> word conversion (and finding bugs in jaraco/inflect#226), I've got it working.
>
> One interesting thing I found while testing is that - even with the python version - dates, like 2016 get read as "twenty sixteen", even though the number is split as "two thousand and sixteen". Also, numbers on their own aren't spoken very well. Might be an improvement for v3.
>
> Example: "Hugging Face was founded in 2016."
> 2016.mp4

Yeah, the numbers definitely need some improvement, and the next version should address this as well.
As for an alternative to inflect, num2words looks like a solid option. It supports modes such as cardinal (default), ordinal, ordinal_num, year, and currency. Not sure how I’d integrate them automatically, though.

It also does not seem to have the issue you mentioned. For example:

from num2words import num2words
for _ in range(10000):
    if num2words(0.000001) != 'zero point zero zero zero zero zero one':
        print("Failed")
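
As a rough illustration of how a year reading differs from a cardinal one, here is a small JavaScript sketch of a year-style conversion. It is an assumption-laden toy, not num2words' implementation and not part of outetts: the names `SMALL`, `DECADES`, `twoDigits`, and `yearToWords` are hypothetical, only four-digit years are handled, and the "oh" convention for years like 1905 is one of several possible readings.

```javascript
// Illustrative only: hypothetical helpers, not num2words and not part of outetts.
const SMALL = [
  "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
  "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
  "seventeen", "eighteen", "nineteen",
];
const DECADES = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"];

// Spell out 0..99.
function twoDigits(n) {
  if (n < 20) return SMALL[n];
  const tens = DECADES[Math.floor(n / 10)];
  return n % 10 === 0 ? tens : `${tens}-${SMALL[n % 10]}`;
}

// Read a four-digit year as two pairs of digits, e.g. 2016 as "twenty" + "sixteen".
function yearToWords(year) {
  if (!Number.isInteger(year) || year < 1000 || year > 9999) {
    throw new RangeError("yearToWords() only handles four-digit years");
  }
  const high = Math.floor(year / 100); // first two digits
  const low = year % 100;              // last two digits
  if (low === 0) {
    // 2000 reads as "two thousand", 1900 as "nineteen hundred".
    return high % 10 === 0 ? `${SMALL[high / 10]} thousand` : `${twoDigits(high)} hundred`;
  }
  // Assumed convention: a single trailing digit is read with "oh", e.g. 1905.
  if (low < 10) return `${twoDigits(high)} oh ${SMALL[low]}`;
  return `${twoDigits(high)} ${twoDigits(low)}`;
}

console.log(yearToWords(2016)); // "twenty sixteen"
```

The open question from the comment above remains: deciding automatically whether a given "2016" in the input text is a year or a quantity is not something this sketch attempts.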

Owner

edwko commented Dec 4, 2024

Ran into an issue:

import en_female_2 from "./default_speakers/en_female_2.json" assert { type: "json" };  
                                                              ^^^^^^

SyntaxError: Unexpected identifier 'assert'
    at compileSourceTextModule (node:internal/modules/esm/utils:338:16)
    at ModuleLoader.moduleStrategy (node:internal/modules/esm/translators:103:18)
    at #translate (node:internal/modules/esm/loader:433:12)
    at ModuleLoader.loadAndTranslate (node:internal/modules/esm/loader:480:27)

Node.js v23.1.0

I'm not very knowledgeable in JS, but switching the import attribute keyword from assert to with fixed the issue. You might want to look into it:

import en_female_1 from "./default_speakers/en_female_1.json" with { type: "json" };
import en_female_2 from "./default_speakers/en_female_2.json" with { type: "json" };
import en_male_1 from "./default_speakers/en_male_1.json" with { type: "json" };
import en_male_2 from "./default_speakers/en_male_2.json" with { type: "json" };
import en_male_3 from "./default_speakers/en_male_3.json" with { type: "json" };
import en_male_4 from "./default_speakers/en_male_4.json" with { type: "json" };
import ja_female_1 from "./default_speakers/ja_female_1.json" with { type: "json" };
import ja_female_2 from "./default_speakers/ja_female_2.json" with { type: "json" };
import ja_female_3 from "./default_speakers/ja_female_3.json" with { type: "json" };
import ja_male_1 from "./default_speakers/ja_male_1.json" with { type: "json" };
import ko_female_1 from "./default_speakers/ko_female_1.json" with { type: "json" };
import ko_female_2 from "./default_speakers/ko_female_2.json" with { type: "json" };
import ko_male_1 from "./default_speakers/ko_male_1.json" with { type: "json" };
import ko_male_2 from "./default_speakers/ko_male_2.json" with { type: "json" };
import zh_female_1 from "./default_speakers/zh_female_1.json" with { type: "json" };
import zh_male_1 from "./default_speakers/zh_male_1.json" with { type: "json" };
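
If the attribute syntax ever needs to be avoided entirely, one option is to read and parse the JSON at runtime instead of importing it. The following is a Node.js-only sketch under that assumption; `loadJson` is a hypothetical helper and not something this PR adds, and a browser build would need `fetch` or a bundler instead.

```javascript
// Illustrative only: `loadJson` is a hypothetical helper, not part of outetts.
// Reading the file at runtime sidesteps the `assert` vs `with` syntax difference,
// because no import attribute appears in the source at all.
async function loadJson(fileUrlOrPath) {
  // Dynamic import keeps this snippet valid in both ES modules and CommonJS.
  const { readFile } = await import("node:fs/promises");
  const text = await readFile(fileUrlOrPath, "utf8"); // accepts a path string or a file: URL
  return JSON.parse(text);
}

// Usage inside an ES module, resolving the path relative to the current file:
//   const speaker = await loadJson(
//     new URL("./default_speakers/en_female_2.json", import.meta.url)
//   );
```

The trade-off is that the speaker data is loaded asynchronously rather than being available as a static import, so callers have to await it.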

Contributor Author

xenova commented Dec 4, 2024

Good point! I was using the older syntax still. Will update 👍

Owner

edwko commented Dec 4, 2024

Published the current dev code to the npm package:
https://www.npmjs.com/package/outetts

npm i outetts

I ran the example code, and everything seems to be working.

Contributor Author

xenova commented Dec 5, 2024

I'll move the PR to "ready to review" soon. I'm just waiting for an upstream bugfix in onnxruntime-web that fixes the WebGPU implementation of the audio decoder. This will be released in Transformers.js v3.1.2 👍

Contributor Author

xenova commented Dec 7, 2024

https://www.npmjs.com/package/@huggingface/transformers/v/3.1.2 is out. Updating package now 👍

Contributor Author

xenova commented Dec 8, 2024

Online demo: https://huggingface.co/spaces/webml-community/text-to-speech-webgpu

Also, marking as ready (even though the audio decoder still runs on CPU). WebGPU support for the decoder can be added in an update :)

PS: https://www.npmjs.com/package/outetts seems to be private/down, so I've installed from source for now.

xenova marked this pull request as ready for review on Dec 8, 2024, 00:26
xenova changed the title from "[WIP] Create outetts JS library for text-to-speech in the browser, Node.js, Deno, Bun, etc." to "Create outetts JS library for text-to-speech in the browser, Node.js, Deno, Bun, etc." on Dec 8, 2024
edwko merged commit ddcd51b into edwko:main on Dec 8, 2024
Owner

edwko commented Dec 8, 2024

Looks good, merged it! Thanks for putting this together :)

> PS: https://www.npmjs.com/package/outetts seems to be private/down, so I've installed from source for now.

Hmm, seems to work fine, tried installing on a few different devices with no issues. Is it not loading for you?

Contributor Author

xenova commented Dec 8, 2024

Amazing! 🥳 It was an issue on my side - all good now!
