Implement streaming-mode decode (precompute_render and render) #854

Merged
merged 23 commits into from
Oct 29, 2024
4 changes: 2 additions & 2 deletions crates/voicevox_core/src/blocking.rs
@@ -2,8 +2,8 @@

 pub use crate::{
     engine::open_jtalk::blocking::OpenJtalk, infer::runtimes::onnxruntime::blocking::Onnxruntime,
-    synthesizer::blocking::Synthesizer, user_dict::dict::blocking::UserDict,
-    voice_model::blocking::VoiceModelFile,
+    synthesizer::blocking::AudioFeature, synthesizer::blocking::Synthesizer,
+    user_dict::dict::blocking::UserDict, voice_model::blocking::VoiceModelFile,
 };

pub mod onnxruntime {
32 changes: 32 additions & 0 deletions crates/voicevox_core/src/engine/audio_file.rs
@@ -0,0 +1,32 @@
+use std::io::{Cursor, Write as _};
+
+/// Prepends a header to 16-bit PCM data to produce a binary in WAV format.
+pub fn wav_from_s16le(pcm: &[u8], sampling_rate: u32, is_stereo: bool) -> Vec<u8> {
+    let num_channels: u16 = if is_stereo { 2 } else { 1 };
+    let bit_depth: u16 = 16;
+    let block_size: u16 = bit_depth * num_channels / 8;
+
+    let bytes_size = pcm.len() as u32;
+    let wave_size = bytes_size + 44;
+
+    let buf: Vec<u8> = Vec::with_capacity(wave_size as usize);
+    let mut cur = Cursor::new(buf);
+
+    cur.write_all("RIFF".as_bytes()).unwrap();
+    cur.write_all(&(wave_size - 8).to_le_bytes()).unwrap();
+    cur.write_all("WAVEfmt ".as_bytes()).unwrap();
+    cur.write_all(&16_u32.to_le_bytes()).unwrap(); // fmt header length
+    cur.write_all(&1_u16.to_le_bytes()).unwrap(); // linear PCM
+    cur.write_all(&num_channels.to_le_bytes()).unwrap();
+    cur.write_all(&sampling_rate.to_le_bytes()).unwrap();
+
+    let block_rate = sampling_rate * block_size as u32;
+
+    cur.write_all(&block_rate.to_le_bytes()).unwrap();
+    cur.write_all(&block_size.to_le_bytes()).unwrap();
+    cur.write_all(&bit_depth.to_le_bytes()).unwrap();
+    cur.write_all("data".as_bytes()).unwrap();
+    cur.write_all(&bytes_size.to_le_bytes()).unwrap();
+    cur.write_all(pcm).unwrap();
+    cur.into_inner()
+}
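To show what `wav_from_s16le` produces, here is a minimal standalone sketch. It carries a local copy of the function from the diff above so it compiles without the `voicevox_core` crate. The helpers `s16le_bytes`, `read_u32_le`, and `example_wav` are hypothetical names introduced only for this illustration and are not part of the PR. The 24 kHz sampling rate is likewise an arbitrary example value.

```rust
use std::io::{Cursor, Write as _};

/// Local copy of `wav_from_s16le` from the diff, so the sketch is self-contained.
pub fn wav_from_s16le(pcm: &[u8], sampling_rate: u32, is_stereo: bool) -> Vec<u8> {
    let num_channels: u16 = if is_stereo { 2 } else { 1 };
    let bit_depth: u16 = 16;
    let block_size: u16 = bit_depth * num_channels / 8;

    let bytes_size = pcm.len() as u32;
    let wave_size = bytes_size + 44; // 44-byte canonical header + PCM payload

    let buf: Vec<u8> = Vec::with_capacity(wave_size as usize);
    let mut cur = Cursor::new(buf);

    cur.write_all("RIFF".as_bytes()).unwrap();
    cur.write_all(&(wave_size - 8).to_le_bytes()).unwrap();
    cur.write_all("WAVEfmt ".as_bytes()).unwrap();
    cur.write_all(&16_u32.to_le_bytes()).unwrap(); // fmt header length
    cur.write_all(&1_u16.to_le_bytes()).unwrap(); // linear PCM
    cur.write_all(&num_channels.to_le_bytes()).unwrap();
    cur.write_all(&sampling_rate.to_le_bytes()).unwrap();

    let block_rate = sampling_rate * block_size as u32;

    cur.write_all(&block_rate.to_le_bytes()).unwrap();
    cur.write_all(&block_size.to_le_bytes()).unwrap();
    cur.write_all(&bit_depth.to_le_bytes()).unwrap();
    cur.write_all("data".as_bytes()).unwrap();
    cur.write_all(&bytes_size.to_le_bytes()).unwrap();
    cur.write_all(pcm).unwrap();
    cur.into_inner()
}

/// Hypothetical helper (not in the PR): serialize i16 samples as little-endian bytes.
fn s16le_bytes(samples: &[i16]) -> Vec<u8> {
    samples.iter().flat_map(|s| s.to_le_bytes()).collect()
}

/// Hypothetical helper (not in the PR): read a little-endian u32 at `offset`.
fn read_u32_le(buf: &[u8], offset: usize) -> u32 {
    u32::from_le_bytes([buf[offset], buf[offset + 1], buf[offset + 2], buf[offset + 3]])
}

/// Hypothetical example: wrap four mono samples at an assumed 24 kHz rate.
fn example_wav() -> Vec<u8> {
    let pcm = s16le_bytes(&[0, 1, -1, i16::MAX]);
    wav_from_s16le(&pcm, 24_000, false)
}
```

With four mono samples the payload is 8 bytes, so the result is 52 bytes long: the 44-byte header followed by the PCM data unchanged. The RIFF chunk size at offset 4 is the total length minus 8, and the byte rate at offset 28 is the sampling rate times the block size (2 bytes for mono 16-bit).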
2 changes: 2 additions & 0 deletions crates/voicevox_core/src/engine/mod.rs
@@ -1,11 +1,13 @@
 mod acoustic_feature_extractor;
+mod audio_file;
 mod full_context_label;
 mod kana_parser;
 mod model;
 mod mora_list;
 pub(crate) mod open_jtalk;

 pub(crate) use self::acoustic_feature_extractor::OjtPhoneme;
+pub use self::audio_file::wav_from_s16le;
 pub(crate) use self::full_context_label::{
     extract_full_context_label, mora_to_text, FullContextLabelError,
 };
2 changes: 1 addition & 1 deletion crates/voicevox_core/src/lib.rs
@@ -83,7 +83,7 @@ use rstest_reuse;

 pub use self::{
     devices::SupportedDevices,
-    engine::{AccentPhrase, AudioQuery, FullcontextExtractor, Mora},
+    engine::{wav_from_s16le, AccentPhrase, AudioQuery, FullcontextExtractor, Mora},
     error::{Error, ErrorKind},
     metas::{
         RawStyleId, RawStyleVersion, SpeakerMeta, StyleId, StyleMeta, StyleType, StyleVersion,