Commit

update
yaoxunxu committed Dec 26, 2023
0 parents commit d9c4ec9
Showing 22 changed files with 322 additions and 0 deletions.
116 changes: 116 additions & 0 deletions LICENSE
@@ -0,0 +1,116 @@
CC0 1.0 Universal

Statement of Purpose

The laws of most jurisdictions throughout the world automatically confer
exclusive Copyright and Related Rights (defined below) upon the creator and
subsequent owner(s) (each and all, an "owner") of an original work of
authorship and/or a database (each, a "Work").

Certain owners wish to permanently relinquish those rights to a Work for the
purpose of contributing to a commons of creative, cultural and scientific
works ("Commons") that the public can reliably and without fear of later
claims of infringement build upon, modify, incorporate in other works, reuse
and redistribute as freely as possible in any form whatsoever and for any
purposes, including without limitation commercial purposes. These owners may
contribute to the Commons to promote the ideal of a free culture and the
further production of creative, cultural and scientific works, or to gain
reputation or greater distribution for their Work in part through the use and
efforts of others.

For these and/or other purposes and motivations, and without any expectation
of additional consideration or compensation, the person associating CC0 with a
Work (the "Affirmer"), to the extent that he or she is an owner of Copyright
and Related Rights in the Work, voluntarily elects to apply CC0 to the Work
and publicly distribute the Work under its terms, with knowledge of his or her
Copyright and Related Rights in the Work and the meaning and intended legal
effect of CC0 on those rights.

1. Copyright and Related Rights. A Work made available under CC0 may be
protected by copyright and related or neighboring rights ("Copyright and
Related Rights"). Copyright and Related Rights include, but are not limited
to, the following:

i. the right to reproduce, adapt, distribute, perform, display, communicate,
and translate a Work;

ii. moral rights retained by the original author(s) and/or performer(s);

iii. publicity and privacy rights pertaining to a person's image or likeness
depicted in a Work;

iv. rights protecting against unfair competition in regards to a Work,
subject to the limitations in paragraph 4(a), below;

v. rights protecting the extraction, dissemination, use and reuse of data in
a Work;

vi. database rights (such as those arising under Directive 96/9/EC of the
European Parliament and of the Council of 11 March 1996 on the legal
protection of databases, and under any national implementation thereof,
including any amended or successor version of such directive); and

vii. other similar, equivalent or corresponding rights throughout the world
based on applicable law or treaty, and any national implementations thereof.

2. Waiver. To the greatest extent permitted by, but not in contravention of,
applicable law, Affirmer hereby overtly, fully, permanently, irrevocably and
unconditionally waives, abandons, and surrenders all of Affirmer's Copyright
and Related Rights and associated claims and causes of action, whether now
known or unknown (including existing as well as future claims and causes of
action), in the Work (i) in all territories worldwide, (ii) for the maximum
duration provided by applicable law or treaty (including future time
extensions), (iii) in any current or future medium and for any number of
copies, and (iv) for any purpose whatsoever, including without limitation
commercial, advertising or promotional purposes (the "Waiver"). Affirmer makes
the Waiver for the benefit of each member of the public at large and to the
detriment of Affirmer's heirs and successors, fully intending that such Waiver
shall not be subject to revocation, rescission, cancellation, termination, or
any other legal or equitable action to disrupt the quiet enjoyment of the Work
by the public as contemplated by Affirmer's express Statement of Purpose.

3. Public License Fallback. Should any part of the Waiver for any reason be
judged legally invalid or ineffective under applicable law, then the Waiver
shall be preserved to the maximum extent permitted taking into account
Affirmer's express Statement of Purpose. In addition, to the extent the Waiver
is so judged Affirmer hereby grants to each affected person a royalty-free,
non transferable, non sublicensable, non exclusive, irrevocable and
unconditional license to exercise Affirmer's Copyright and Related Rights in
the Work (i) in all territories worldwide, (ii) for the maximum duration
provided by applicable law or treaty (including future time extensions), (iii)
in any current or future medium and for any number of copies, and (iv) for any
purpose whatsoever, including without limitation commercial, advertising or
promotional purposes (the "License"). The License shall be deemed effective as
of the date CC0 was applied by Affirmer to the Work. Should any part of the
License for any reason be judged legally invalid or ineffective under
applicable law, such partial invalidity or ineffectiveness shall not
invalidate the remainder of the License, and in such case Affirmer hereby
affirms that he or she will not (i) exercise any of his or her remaining
Copyright and Related Rights in the Work or (ii) assert any associated claims
and causes of action with respect to the Work, in either case contrary to
Affirmer's express Statement of Purpose.

4. Limitations and Disclaimers.

a. No trademark or patent rights held by Affirmer are waived, abandoned,
surrendered, licensed or otherwise affected by this document.

b. Affirmer offers the Work as-is and makes no representations or warranties
of any kind concerning the Work, express, implied, statutory or otherwise,
including without limitation warranties of title, merchantability, fitness
for a particular purpose, non infringement, or the absence of latent or
other defects, accuracy, or the present or absence of errors, whether or not
discoverable, all to the greatest extent permissible under applicable law.

c. Affirmer disclaims responsibility for clearing rights of other persons
that may apply to the Work or any use thereof, including without limitation
any person's Copyright and Related Rights in the Work. Further, Affirmer
disclaims responsibility for obtaining any necessary consents, permissions
or other rights required for any use of the Work.

d. Affirmer understands and acknowledges that Creative Commons is not a
party to this document and has no duty or obligation with respect to this
CC0 or use of the Work.

For more information, please see
<http://creativecommons.org/publicdomain/zero/1.0/>
2 changes: 2 additions & 0 deletions README.md
@@ -0,0 +1,2 @@
# automatic-dubbing
Please visit https://thuhcsi.github.io/StyleDub/
6 changes: 6 additions & 0 deletions _config.yml
@@ -0,0 +1,6 @@
remote_theme: thuhcsi/slate
title: "SECap: Speech Emotion Captioning with Large Language Model"
description: Accepted by AAAI 2024.
show_downloads: true
google_analytics:
theme: jekyll-theme-cayman
8 changes: 8 additions & 0 deletions assets/css/style.scss
@@ -0,0 +1,8 @@
---
---

@import "jekyll-theme-slate";

audio {
width: 150px;
}
Binary file added assets/images/bg_hr.png
Binary file added assets/images/blacktocat.png
Binary file added assets/images/icon_download.png
9 changes: 9 additions & 0 deletions assets/images/logo.svg
Binary file added assets/images/sprite_download.png
Binary file added imgs/model.png
110 changes: 110 additions & 0 deletions index.md
@@ -0,0 +1,110 @@
---
layout: default
---
<!--
# Joint Multi-scale Cross-lingual Speaking Style Transfer with Bidirectional Attention Mechanism for Automatic Dubbing
# Abstract
Automatic dubbing, which generates a corresponding version of the input speech in another language, could be widely used in many real-world scenarios such as video and game localization. In addition to synthesizing the translated scripts, automatic dubbing needs to transfer the speaking style of the original language to the dubbed speech
to give audiences the impression that the characters are speaking in their native tongue.
However, state-of-the-art automatic dubbing systems model only the transfer of duration and speaking rate, neglecting other aspects of speaking style, such as emotion, intonation, and emphasis, which are also crucial for fully portraying the characters and for speech understanding.
In this paper, we propose a joint multi-scale cross-lingual speaking style transfer framework that simultaneously models the bidirectional speaking style transfer between languages at both the global (i.e., utterance-level) and local (i.e., word-level) scales.
The global and local speaking styles in each language are extracted and used to predict the global and local speaking styles in the other language, using an encoder-decoder framework for each direction and a bidirectional attention mechanism shared by both directions.
A FastSpeech 2 model enhanced with multi-scale speaking styles is then used to synthesize speech from the predicted global and local speaking styles for each language. Experimental results demonstrate the effectiveness of the proposed framework, which outperforms a duration-transfer-only baseline in both objective and subjective evaluations.
<center>
<img src="./imgs/model.png" width="20%" height="20%">
<br>
<div style="color:orange; border-bottom: 1px solid #d9d9d9;
display: inline-block;
color: #999;
padding: 2px;"> Fig. 1: The proposed joint multi-scale cross-lingual speaking style transfer model. </div>
</center>
-->
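The abstract says each language's styles are predicted from the other's with a bidirectional attention mechanism shared by both directions, but the page does not define that mechanism. The following is only a minimal sketch under our own assumption: "shared" is taken to mean a single scaled dot-product score matrix between source and target word vectors, normalized over source words for one direction and over target words for the other. The function names, vector sizes, and toy inputs are hypothetical and are not taken from the authors' implementation.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights: List[float], vectors: Matrix) -> List[float]:
    # Combine row vectors using the given attention weights.
    dim = len(vectors[0])
    return [sum(w * v[k] for w, v in zip(weights, vectors)) for k in range(dim)]


def shared_bidirectional_attention(src: Matrix, tgt: Matrix) -> Tuple[Matrix, Matrix]:
    """Hypothetical sketch: one score matrix, normalized in two directions.

    src: word-level vectors for the source-language utterance (m x d)
    tgt: word-level vectors for the target-language utterance (n x d)
    Returns (src_to_tgt, tgt_to_src):
      src_to_tgt[j] is the source information gathered for target word j (n x d)
      tgt_to_src[i] is the target information gathered for source word i (m x d)
    """
    d = len(src[0])
    scale = 1.0 / math.sqrt(d)
    # Shared scores: scores[i][j] compares source word i with target word j.
    scores = [[scale * sum(a * b for a, b in zip(s, t)) for t in tgt] for s in src]

    # Direction 1: each target word attends over all source words (softmax down column j).
    src_to_tgt = []
    for j in range(len(tgt)):
        w = softmax([scores[i][j] for i in range(len(src))])
        src_to_tgt.append(weighted_sum(w, src))

    # Direction 2: each source word attends over all target words (softmax along row i).
    tgt_to_src = []
    for i in range(len(src)):
        w = softmax(scores[i])
        tgt_to_src.append(weighted_sum(w, tgt))

    return src_to_tgt, tgt_to_src


if __name__ == "__main__":
    # Toy 2-D vectors: 3 source words, 2 target words.
    src = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    tgt = [[1.0, 0.0], [0.0, 1.0]]
    s2t, t2s = shared_bidirectional_attention(src, tgt)
    print(len(s2t), len(t2s))  # prints "2 3"
```

Computing the score matrix once and changing only the softmax axis is what makes the two directions share parameters in this sketch; a trained model would typically add learned projections, masking, and batching on top of it.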


<!-- # Subjective Evaluation -->

To demonstrate that our proposed model can transfer cross-lingual speaking styles from the source speech to the synthesized speech at both the global and local scales, we provide some samples for comparison. **Source Speech** is the source speech in the original language, reconstructed by a vocoder. **FastSpeech 2** is an open-source implementation of FastSpeech 2 without speaking style transfer. **Duration Transfer** is the duration transfer model, which predicts the duration of every word in the target speech. **Joint Style Transfer** is the proposed model, which predicts the joint multi-scale cross-lingual speaking styles of the target speech. A well-trained HiFi-GAN is used as the vocoder to generate the waveforms.

## Dataset samples

<table>
<tr>
<th>Wav</th>
<th>Speech Transcription</th>
<th>Human-Labeled Speech Emotion Label</th>
<th>Human-Labeled Speech Emotion Caption</th>
</tr>
<tr>
<td rowspan="3"><audio controls><source src="./wavs/tx_emotion_00201000008.wav" type="audio/wav">Your browser does not support the audio element.</audio></td>
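<td rowspan="3">是不是女人对爱总是一往情深,是不是女人都太过天真。 (Is it that women are always so devoted in love? Is it that women are all too naive?)</td>
<td rowspan="3">伤心 (sad)</td>
<td>心里悲伤痛苦,酸楚且愤怒 (grieving and in pain at heart, bitter and angry)</td>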
</tr>
<tr>
<td>后悔难过,大声哭泣,心中充满悔恨和自责 (regretful and sad, crying loudly, heart full of remorse and self-reproach)</td>
</tr>
<tr>
<td>悲痛到无法自拔 (so grief-stricken that one cannot recover)</td>
</tr>
<tr>
<td rowspan="5"><audio controls><source src="./wavs/tx_xiao_0200103000507.wav" type="audio/wav">Your browser does not support the audio element.</audio></td>
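<td rowspan="5">我们一家三口上二楼,发现不排队,欣喜! (The three of us went up to the second floor and found there was no queue. What a delight!)</td>
<td rowspan="5">开心 (happy)</td>
<td>语气十分欣喜,有些惊讶,带着喜出望外之感 (a very delighted tone, somewhat surprised, with a sense of unexpected joy)</td>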
</tr>
<tr>
<td>表明内心感到无比的惊喜和幸运,带有些许的得意之感。 (shows immense pleasant surprise and a feeling of good luck, with a hint of smugness)</td>
</tr>
<tr>
<td>语气中透露自信、欢乐,带有些些自豪。 (the tone conveys confidence and joy, with a touch of pride)</td>
</tr>
<tr>
<td>语速较快,语调逐渐上扬,显示出内心的惊喜。 (a fairly fast speaking rate with gradually rising intonation, showing inner delight)</td>
</tr>
<tr>
<td>语气激动,非常愉悦和得意。 (an excited tone, very pleased and smug)</td>
</tr>
<tr>
<td rowspan="5"><audio controls><source src="./wavs/tx_xiao_0100104000109.wav" type="audio/wav">Your browser does not support the audio element.</audio></td>
<td rowspan="5">你们看她那脏兮兮的衣服还穿反了。 (Look at her filthy clothes, and she's even wearing them backwards.)</td>
<td rowspan="5">生气 (angry)</td>
<td>声嘶力竭的吼,音调高,情绪激动愤怒。 (shouting at the top of one's lungs, high-pitched, agitated and angry)</td>
</tr>
<tr>
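<td>语速快,声调越来越高,声音昂扬,形容很想发怒。 (a fast speaking rate, pitch rising higher and higher, a loud and forceful voice, sounding close to an outburst of anger)</td>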
</tr>
<tr>
<td>语气带着嘲讽,情绪里满是轻蔑。 (a mocking tone, full of contempt)</td>
</tr>
<tr>
<td>语气中饱含着不满与嫌弃厌恶之感,还有一丝轻蔑。 (the tone is full of dissatisfaction, dislike, and disgust, with a trace of contempt)</td>
</tr>
<tr>
<td>语言急促,声音严厉,音量大。 (hurried speech, a stern voice, high volume)</td>
</tr>
</table>

## Test samples

| Wav | Human-Labeled Speech Emotion Label | Human-Labeled Speech Emotion Caption | SECap-Generated Speech Emotion Caption |
| :---- | :---- | :---- | :---- |
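| <audio controls><source src="./wavs/tx_emotion_00303000260.wav" type="audio/wav">Your browser does not support the audio element.</audio> | 快乐 (happy) | 心情快乐舒畅 (happy and at ease) | 快乐而愉悦,心情舒畅 (happy and joyful, in a relaxed mood) |
| <audio controls><source src="./wavs/tx_emulate_02_231_0008_000034.wav" type="audio/wav">Your browser does not support the audio element.</audio> | 愤怒 (angry) | 语调高扬,反对的语气,突出内心的不满和愤怒 (high, rising intonation and an objecting tone, highlighting inner dissatisfaction and anger) | 语速较快,声音较高,情绪中带着一丝不满 (a fairly fast speaking rate and a fairly high voice, with a hint of dissatisfaction) |
| <audio controls><source src="./wavs/tx_xiao_0200103000102.wav" type="audio/wav">Your browser does not support the audio element.</audio> | 激动 (excited) | 又惊又喜,语气中充满喜悦和惊讶 (both surprised and delighted; the tone is full of joy and surprise) | 满怀期待的语气,情绪中透着激动和兴奋 (a tone full of anticipation, conveying agitation and excitement) |
| <audio controls><source src="./wavs/tx_xiao_0200107000936.wav" type="audio/wav">Your browser does not support the audio element.</audio> | 悲伤 (sad) | 表明内心充满委屈和不满,流露出极度的悲伤 (shows a heart full of grievance and dissatisfaction, revealing extreme sadness) | 情绪波动较大,语气中充满了委屈和难过 (fairly large emotional swings; the tone is full of grievance and sorrow) |
| <audio controls><source src="./wavs/tx_emulate_02_245_0001_000034.wav" type="audio/wav">Your browser does not support the audio element.</audio> | 激动 (excited) | 声音高亢尖利,语气里带着不悦,情绪激动,非常恼怒的情绪 (a loud, shrill voice with a displeased tone; agitated and very annoyed) | 语气激昂,内心十分不开心,而且特别的愤怒 (an impassioned tone, very unhappy inside and especially angry) |
| <audio controls><source src="./wavs/tx_emulate_00_109_0004_000071.wav" type="audio/wav">Your browser does not support the audio element.</audio> | 激动 (excited) | 语速极快,情绪中非常不爽,声音很尖锐,后面又流露放缓了态度 (an extremely fast speaking rate, very irritated, a very sharp voice, with the attitude softening toward the end) | 语气中带有一种不满和不耐烦的情绪,还有一点抱怨 (the tone carries dissatisfaction and impatience, with a bit of complaining) |
| <audio controls><source src="./wavs/tx_emulate_01_003_0003_000018.wav" type="audio/wav">Your browser does not support the audio element.</audio> | 快乐 (happy) | 非常高兴,话语中透露着喜欢和愉悦 (very happy; the words convey fondness and delight) | 语气中透露着一种愉悦的心情,还有一点的期待 (the tone conveys a joyful mood, with a little anticipation) |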

