Bonnici, RS;
Benning, M;
Saitis, C;
(2022)
Timbre Transfer with Variational Auto Encoding and Cycle-Consistent Adversarial Networks.
In:
Proceedings of the 2022 International Joint Conference on Neural Networks (IJCNN).
(pp. 1-8).
Institute of Electrical and Electronics Engineers (IEEE)
Abstract
This work investigates the application of deep learning to timbre transfer. The adopted approach combines Variational Autoencoders with Generative Adversarial Networks to construct meaningful representations of the source audio and produce realistic generations of the target audio. It is applied to the Flickr 8k Audio dataset to transfer vocal timbre between speakers and to the URMP dataset to transfer musical timbre between instruments. Variations of the adopted approach were trained, and their performance was compared using the metrics SSIM (Structural Similarity Index) and FAD (Fréchet Audio Distance). A many-to-many approach was found to outperform a one-to-one approach in terms of reconstructive capabilities, while the one-to-one approach showed better results in terms of adversarial translation. A basic residual block design proved more suitable than a bottleneck design for enriching content information in the latent space, and whether the cyclic loss takes a variational autoencoder or a vanilla autoencoder approach had no significant impact on the reconstructive and adversarial translation aspects of the model.
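For readers unfamiliar with the two metrics named in the abstract, the following is a minimal sketch of the formulas behind them, not the evaluation code used in the paper. It makes several simplifying assumptions: SSIM is computed over a single global window on two equal-length sequences of values (for example, flattened spectrogram magnitudes) rather than over local windows, and the Fréchet distance is shown for univariate Gaussians fitted to raw samples, whereas FAD fits multivariate Gaussians to audio embeddings. The function names `global_ssim` and `frechet_distance_1d` are hypothetical.

```python
import math
from statistics import fmean


def _cov(a, b):
    """Population covariance of two equal-length sequences."""
    ma, mb = fmean(a), fmean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


def global_ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """Single-window SSIM between two equal-length sequences.

    Assumption: one global window; standard SSIM averages this
    quantity over many local windows.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Stabilising constants from the usual SSIM definition.
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    mx, my = fmean(x), fmean(y)
    vx, vy, cxy = _cov(x, x), _cov(y, y), _cov(x, y)
    numerator = (2 * mx * my + c1) * (2 * cxy + c2)
    denominator = (mx * mx + my * my + c1) * (vx + vy + c2)
    return numerator / denominator


def frechet_distance_1d(a, b):
    """Squared Fréchet distance between Gaussians fitted to a and b.

    Assumption: univariate case, where the distance reduces to
    (mean difference)^2 + (standard deviation difference)^2.
    """
    ma, mb = fmean(a), fmean(b)
    sa, sb = math.sqrt(_cov(a, a)), math.sqrt(_cov(b, b))
    return (ma - mb) ** 2 + (sa - sb) ** 2
```

Under these definitions, identical inputs give an SSIM of 1 and a Fréchet distance of 0 (up to floating-point rounding); higher SSIM indicates closer structure, while a lower Fréchet distance indicates closer distributions.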
Type: | Proceedings paper |
---|---|
Title: | Timbre Transfer with Variational Auto Encoding and Cycle-Consistent Adversarial Networks |
Event: | 2022 International Joint Conference on Neural Networks (IJCNN) |
Location: | Padua, Italy |
Dates: | 18th-23rd July 2022 |
ISBN-13: | 978-1-7281-8671-9 |
Open access status: | An open access version is available from UCL Discovery |
DOI: | 10.1109/IJCNN55064.2022.9892107 |
Publisher version: | http://dx.doi.org/10.1109/ijcnn55064.2022.9892107 |
Language: | English |
Additional information: | This version is the author accepted manuscript. For information on re-use, please refer to the publisher's terms and conditions. |
Keywords: | music, speech, generative adversarial networks, cyclic consistency, variational autoencoders, voice conversion, timbre transfer, style transfer |
UCL classification: | UCL; UCL > Provost and Vice Provost Offices > UCL BEAMS; UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science; UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science |
URI: | https://discovery.ucl.ac.uk/id/eprint/10185182 |



