SoftGAN: Learning generative models efficiently with application to CycleGAN Voice Conversion - Sorbonne Université
Preprint, Working Paper. Year: 2019

Abstract

Voice conversion (VC) with deep neural networks has become extremely popular over the last few years, bringing improvements over earlier VC architectures. In particular, GAN architectures such as the CycleGAN and the VAEGAN have made it possible to learn voice conversion from non-parallel databases. However, GAN-based methods are highly unstable, often require careful tuning of hyper-parameters, and can lead to poor voice identity conversion and a substantially degraded converted speech signal. This paper discusses and tackles the stability issues of the GAN in the context of voice conversion. The proposed SoftGAN method aims to reduce the impact of the generator on the discriminator, and vice versa, during training, so that both can learn more gradually and efficiently and, in particular, so that the two networks do not fall out of step with each other. A subjective experiment conducted on a voice conversion task using the Voice Conversion Challenge 2018 dataset shows that the proposed SoftGAN significantly improves the quality of the voice conversion while preserving the naturalness of the converted speech.

Dates and versions

hal-02457060, version 1 (27-01-2020)

Cite

Rafael Ferro, Nicolas Obin, Axel Roebel. SoftGAN: Learning generative models efficiently with application to CycleGAN Voice Conversion. 2019. ⟨hal-02457060⟩