Piper voices licensing question #271
-
Hi, I was looking at the MODEL_CARD files of many different voices and noticed that many of them have a "CC BY 4.0" license, but were trained starting from another voice, such as the lessac voice, which has a "blizzard" license. That license is unclear to me and looks intimidating, so I'm not sure whether I can use those voices (such as libritts_r or "Joe", for example) commercially. My questions are:
I'd appreciate an answer to each of my questions if you know it. Thanks!
-
Partially quoting myself from a discussion elsewhere: I'm not a lawyer, so I can't offer any legal advice, unfortunately. The Piper project is intended for text-to-speech research, and does not impose any additional licenses on the checkpoints or voice models. I've done my best to include links to the original licenses and training history, so I personally believe it is the responsibility of the end user to make the ultimate judgement. Sorry for the vague answer, but the truth is I have no idea how licenses are supposed to apply in the context of machine learning 🤷‍♂️
-
I've been reading and rereading all those licenses. (I'm also not a lawyer, so, you know, this is not legal advice.) Here are my thoughts on your questions:

(1) I believe that the blizzard license is restrictive and will not allow any derivative to be used commercially. It's the hardest license to read, but it restricts all usage of the dataset (which it calls "the Materials") to "Research Purposes". It defines "Research Purposes" as follows (I'm cutting out irrelevant pieces to hopefully add clarity):

"Research Purposes" means only those purposes associated with research and exploration ... using, incorporating or based upon the Materials (in whole or in part) ... , and ... excludes ... copying, ..., developing, adapting, amending or otherwise using the Materials for any commercial purpose, including the development, ... commercialisation, ... or licencing of voice synthesis ... products or services including, ... voice synthesis products or services... and other speech technology products and services.

(2) The license on the libritts dataset used for that voice is "CC BY", which does permit commercial usage as long as you give proper attribution.

(3) Yes, all voices are covered. The dataset as a whole has the same license. That dataset was prepared from librivox recordings, which are all in the public domain.

(4) Again, yes, as long as you follow the attribution requirement.

The only big, ready-to-go public-domain dataset I know of is the LJSpeech dataset (https://keithito.com/LJ-Speech-Dataset/). I trained a high quality voice based on that for 2000 steps, which I also released to the public domain. I'm not ecstatic about the results on some of the test sentences, but for regular usage it is fine (ymmv). I have no idea what could be done to train that voice better (maybe train with a larger batch size? Does that affect anything?).
I have an RTX 3060 in a spare machine, and it took almost a month of constant thinking for that little guy to knock it out, so it's not practical to do too many tests. If I can ever figure out runpod.io well enough to cheaply use better hardware, I may experiment with it again. That said, I'd love to have a good multispeaker voice that I never have to worry about licensing for. There are some good speakers in the libritts voice, but most sound tinny or noisy; I spent hours and hours going over the 900 samples looking for ones I like. I have considered just starting a new training run on the libritts-r set from scratch. I just need to be prepared to lose that machine for another month.

I have also experimented with various automated tools to prepare my own similar dataset, using only good-sounding (to my ear) recordings from librivox, with more samples from a smaller set of speakers. Prepping a dataset by hand from good librivox recordings takes forever. So far, many of my experiments end up with the same occasional weird pronunciations I got with the LJSpeech voice. I don't know if it's a problem with phoneme usage in the dataset, or something else I'm doing wrong with training.