- Input speech: the original audio
- Voice conversion: converts the speaker's voice while preserving the accent (pronunciation and prosody)
- Accent conversion: converts the accent (American to Indian, or Indian to American) while preserving the speaker's voice
Text | Input | Voice conversion: FragmentVC | Voice conversion: VQMIVC | Accent conversion: LSTM-based + VQMIVC synthetic data | Accent conversion: Wav2vec-based + FragmentVC synthetic data | Accent conversion: Wav2vec-based + VQMIVC synthetic data
Author of the danger trail Philip Steels etc | ||||||
Not at this particular case Tom apologized Whittemore | ||||||
For the twentieth time that evening the two men shook hands | ||||||
Will we ever forget it | ||||||