最適な話者ピッチ範囲を送れるように #1121

Patchethium · 2024-03-16T23:29:02Z

ここの続きとして、metas.jsonに追加されたrangeをAPIで送れるようにしました。supported_featuresと同じく、speaker_infosで取得できます。

後ほどGUIの方にもPR出します。

Patchethium · 2024-03-17T14:18:42Z

なぜかtestに通れないですがコード自体はもう大丈夫だと思います。

Hiroshiba

大変お待たせしました🙇

エディタ向けにピッチ範囲をエンジンから送るのはとても良いアイデアだと思っています！
ぜひ実装したいです。

いくつか相談がしたいのでコメントします。
後で議論を追いやすいように適当なファイルにコメントを書きました。

english

Sorry to keep you waiting 🙇

I think it's a great idea to send the pitch range to the editor from the engine!
I definitely want to implement it.

I have some points to discuss, so I'll leave comments. To make the discussion easier to follow later, I've written the comments in an appropriate file.

Hiroshiba · 2024-04-04T04:21:40Z

speaker_info/7ffcb7ce-00ec-4bdc-82cd-45a8889e43ff/metas.json

+{
+    "supported_features": {
+        "permitted_synthesis_morphing": "SELF_ONLY"
+    },
+    "range": [
+        {
+            "style_id": 2,
+            "low": 5.16,
+            "high": 6.2
+        },
+        {
+            "style_id": 0,
+            "low": 5.21,
+            "high": 6.13
+        }
+    ]
+}


metas.jsonのデータ構造について

エンジンのmetas.jsonの下にピッチ範囲の情報を書くのは良いアイデアだと思っています。
将来的にはピッチだけでなく、ソング機能用にボリュームも範囲や基準値も情報に加えたいです。
なのでこうするのはどうでしょう。（スタイルの追加情報だということがわかるようにしてみました）

{ "supported_features": { ... }, "additional_style_infos": [ { "style_id": int, "style_info": { "pitch": { "low": float, "high": float # 将来的に "reference" も追加 } # 将来的に以下も追加 # "volume": { # "low": float, # "high": float, # "reference": float # } } } ] }

english

About the data structure of metas.json

I think it's a good idea to write the pitch range information under the engine's metas.json.
In the future, in addition to the pitch, I would like to include information on volume ranges and reference values for the song function.
So, how about this? (I've tried to make it clear that it's additional information related to style)

Hiroshiba · 2024-04-04T04:25:59Z

run.py

+        tags=["その他"],
+        summary="指定したスタイルに対して最適なピッチ範囲を得る",
+    )
+    def optimal_pitch(style_id: StyleId) -> StylePitchRange:


APIとデータ構造について

ピッチ範囲を取得できる/optimal_pitchAPIを実装していますが、これはAPIを足すのではなく、スタイル関連情報がまとまっているところに足す形だとより扱いやすいかもしれません。
例えばSpeakerStyle内などです。
ただそれをするにはコードを読んでみた感じかなり改革が必要そうでした。でも良いAPI設計のためにやるべきだと感じています。
改革はプルリクエスト範囲から外れているので興味なければ苦痛かもしれません。もし興味あれば以下の議論に参加して頂ければ。

@y-chan さんと @tarepan さんに相談です。
たぶんピッチ範囲を格納するのは、StyleType等が保存されてるSpeakerStyle内が最適だと思うがどうでしょうか。
ただ、もし仮にSpeakerStyleにピッチ範囲を含める場合は問題があります。
SpeakerStyleはCoreSpeaker内にあるのですが、CoreSpeakerはコアの読み込みから利用されているため、変更できなくなっています。
そもそもAPIと無関係であるべきなCoreSpeakerモデルがここに定義されてることが変で、コア用のデータ構造はコア読み込みの部分だけで閉じているべきに感じました。
なのでmetas/Metas.py内からCoreSpeakerの定義をやめ、Speaker直下に.nameや.speaker_uuidや.supported_featuresを定義するのとかどうでしょうか。
たぶんAPIの返り値は一切変えずにCoreSpeaker定義を失くせると思っています。

この健全化作業を @tarepan さんにお任せしても良いでしょうか🙇
その変更のあと、 @Patchethium さんがこのプルリクエスト内で今のSpeakerStyle内に.pitch_rangeを追加するというのはどうでしょうか。

english

About the API and data structure

We are implementing an /optimal_pitch API that can retrieve the pitch range, but instead of adding this as a separate API, it might be more manageable to add it to a place where style-related information is aggregated, such as inside SpeakerStyle.
However, from what I've seen reading the code, this might require quite a bit of reform. But I feel it's worth doing for good API design.
This reform might be outside the scope of a pull request, so it could be a pain if you're not interested. But if you are interested, I would appreciate your participation in the following discussion.

@y-chan and @tarepan
Perhaps the best place to store the pitch range would be inside SpeakerStyle, where StyleType, etc., are stored, but there are issues with that.
SpeakerStyle is inside CoreSpeaker, but CoreSpeaker is being used from the core load, making it unchangeable.
It's odd that the CoreSpeaker model, which should be unrelated to the API, is defined here. I feel that the data structure for the core should be confined to just the core loading part.
So, how about stopping the definition of CoreSpeaker inside metas/Metas.py and defining .name, .speaker_uuid, .supported_features, etc., directly under Speaker?
I believe we can remove the definition of CoreSpeaker without changing any API responses at all.

Could we leave this rationalization task to @tarepan 🙇
After that change, how about @Patchethium adding .pitch_range to the current SpeakerStyle within this pull request?

ピッチ範囲を格納するのは ... SpeakerStyle 内が最適

👍️
同意です。
意味論からすると SpeakerStyle 内に配置するのが妥当と考えます。

APIと無関係であるべきなCoreSpeakerモデルがここに定義されてることが変で、コア用のデータ構造はコア読み込みの部分だけで閉じているべき

👍️
同意です。
現状は、list[Speaker] が VV API に露出する型であり、CoreSpeaker が Core API を包む型となっています。
Speaker の定義時に CoreSpeaker 継承を用いた結果 VV API と Core API が相互依存化し、CoreAPI が不変だからピッチ情報追加 VV API 変更ができない、という状況に陥っています。

CoreSpeakerの定義をやめ、Speaker直下に.nameや ... を定義

👍️
同意です。
class Speaker(CoreSpeaker, EngineSpeaker): による VV API - Core API 相互依存がこれで解けます。

健全化作業を @ tarepan さんにお任せしても良いでしょうか

👍️
承りました。複数 steps / PRs で構成されるリファクタリングをおこないます。
影響範囲が複数ファイルにまたがるため、いま健全化すると既存 PRs とコンフリクト祭りします。他 PRs が review & merge され着手可能になった段階で、@Hiroshiba さんから @ tarepan へ改めてメンションをお願いします。着手可能メンションを受け次第、実装に着手します。

私はコアに関する知識が一切ないのでなんとも言えないですが、スタイル情報を持っているコア側から取るのは一番自然なのは同意です。もし @tarepan さん）が実装してくればこれで一番良いと思います。
それでは一応ここをcloseにして、コア側の健全化作業を待つことにします。

遅くなりました！！ 🙇

@tarepan ありがとうございます！！

他 PRs が review & merge され着手可能になった段階で、@Hiroshiba さんから @ tarepan へ改めてメンションをお願いします。

承知しました！と言いたいところなのですが、ちょっとどのタイミングならいけやすいかPR全部把握しているわけではないのでわからないというのが正直なところです。。 🙇
今ぱっとPRのタイトルを見て回った感じ、API Routerの変更とテストの追加が主だと感じました。
このタスクはWebAPI周り・コア周りのSpeakerデータ構造のタスクなので、意外と今衝突していないかもしれません。

@Patchethium さんもありがとうございます！！

@Patchethium
お待たせしました。#1202 がマージされ、エンジンのデータ構造が整理されました。
ピッチ範囲をエンジンの metas.json に配置するのであれば実装可能かと思います（コアの metas.json に置くかどうかはメンテナ判断かと思います）。

Hiroshiba · 2024-04-04T04:28:10Z

speaker_info/35b2c544-660e-401e-b503-0e14c635303a/metas.json

 {
-  "supported_features": { "permitted_synthesis_morphing": "NOTHING" }
-}
+    "supported_features": {},
+    "range": [
+        {
+            "style_id": 8,
+            "low": 5.07,
+            "high": 6.5
+        }
+    ]
+}


データを配置する場所について

これはエンジンのmetas.jsonに書いてるけど、制作の都合上コアのmetas.jsonに書く方がやりやすいかもしれません。
データセットからピッチ範囲を計算する方針の場合は特に。
あとコアに書けばコアライブラリを使っている人も使えて便利かもしれません。
どうするかわからないけど、将来的にそういう設計に変えるかも、という共有です。

english

About where to place the data

This is written in the engine's metas.json, but it might be easier to handle if written in the core's metas.json, especially if we're calculating the pitch range from the dataset.
Also, if it's written in the core, it might be useful for people using the core library.
I'm not sure how to go about it, but it's a comment that we might change to such a design in the future.

同意です。ENGINE の metas.json ではなく CORE から取得すべきだと考えます。
なぜならピッチ範囲情報と並列関係にある SpeakerStyle の属性値（name / id / type）が CORE から取得する情報だからです。

add range into mock files, add range field in EngineSpeaker

eeae9e9

Patchethium requested a review from a team as a code owner March 16, 2024 23:29

Patchethium requested review from y-chan and removed request for a team March 16, 2024 23:29

Patchethium marked this pull request as draft March 16, 2024 23:38

add api /optimal_pitch that fetches pitch range by style id

c20f264

Patchethium marked this pull request as ready for review March 17, 2024 14:12

tarepan requested a review from Hiroshiba March 17, 2024 15:24

Patchethium mentioned this pull request Mar 19, 2024

話者の最適なピッチ範囲に限定できるように VOICEVOX/voicevox#1942

Open

Hiroshiba reviewed Apr 4, 2024

View reviewed changes

tarepan mentioned this pull request May 1, 2024

整理: Speaker と内部型の分離 #1202

Merged

tarepan mentioned this pull request May 16, 2024

refactor: API と音声ライブラリを分離 #1249

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

最適な話者ピッチ範囲を送れるように #1121

最適な話者ピッチ範囲を送れるように #1121

Patchethium commented Mar 16, 2024 •

edited

Loading

Patchethium commented Mar 17, 2024

Hiroshiba left a comment

Hiroshiba Apr 4, 2024

Hiroshiba Apr 4, 2024

tarepan Apr 4, 2024 •

edited

Loading

Patchethium Apr 9, 2024

Hiroshiba Apr 30, 2024 •

edited

Loading

tarepan May 6, 2024

Hiroshiba Apr 4, 2024

tarepan Apr 4, 2024 •

edited

Loading

最適な話者ピッチ範囲を送れるように #1121

Are you sure you want to change the base?

最適な話者ピッチ範囲を送れるように #1121

Conversation

Patchethium commented Mar 16, 2024 • edited Loading

Patchethium commented Mar 17, 2024

Hiroshiba left a comment

Choose a reason for hiding this comment

Hiroshiba Apr 4, 2024

Choose a reason for hiding this comment

metas.jsonのデータ構造について

About the data structure of metas.json

Hiroshiba Apr 4, 2024

Choose a reason for hiding this comment

APIとデータ構造について

About the API and data structure

tarepan Apr 4, 2024 • edited Loading

Choose a reason for hiding this comment

Patchethium Apr 9, 2024

Choose a reason for hiding this comment

Hiroshiba Apr 30, 2024 • edited Loading

Choose a reason for hiding this comment

tarepan May 6, 2024

Choose a reason for hiding this comment

Hiroshiba Apr 4, 2024

Choose a reason for hiding this comment

データを配置する場所について

About where to place the data

tarepan Apr 4, 2024 • edited Loading

Choose a reason for hiding this comment

Patchethium commented Mar 16, 2024 •

edited

Loading

tarepan Apr 4, 2024 •

edited

Loading

Hiroshiba Apr 30, 2024 •

edited

Loading

tarepan Apr 4, 2024 •

edited

Loading