Timbre Transfer with Variational Auto Encoding and Cycle-Consistent Adversarial Networks

Bonnici, RS; Benning, M; Saitis, C (2022) Timbre Transfer with Variational Auto Encoding and Cycle-Consistent Adversarial Networks. In: Proceedings of the 2022 International Joint Conference on Neural Networks (IJCNN). (pp. 1-8). Institute of Electrical and Electronics Engineers (IEEE). Green open access.

Abstract

This work investigates the application of deep learning to timbre transfer. The adopted approach combines Variational Autoencoders with Generative Adversarial Networks to construct meaningful representations of the source audio and produce realistic generations of the target audio. It is applied to the Flickr 8k Audio dataset for transferring vocal timbre between speakers and to the URMP dataset for transferring musical timbre between instruments. Variations of the adopted approach were trained, and their performance was compared using the metrics SSIM (Structural Similarity Index) and FAD (Fréchet Audio Distance). A many-to-many approach was found to outperform a one-to-one approach in reconstructive capability, while the one-to-one approach showed better results in adversarial translation. A basic residual block design proved more suitable than a bottleneck design for enriching content information in the latent space, and whether the cyclic loss takes a variational autoencoder or vanilla autoencoder form had no significant impact on the reconstructive and adversarial translation aspects of the model.
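
For readers unfamiliar with the SSIM metric named in the abstract, the following is a minimal sketch of the structural similarity formula evaluated over a single global window. It is an illustration only and not the authors' evaluation code: the function name `global_ssim`, the flattened-list input, and the default constants (`k1=0.01`, `k2=0.03`, `data_range=1.0`) are assumptions, and practical SSIM implementations average the same expression over local sliding windows of a spectrogram or image.

```python
from statistics import fmean


def global_ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """Single-window SSIM between two equally sized, flattened signals.

    Hypothetical helper for illustration: real SSIM averages this
    expression over many local windows rather than the whole input.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    # Stabilising constants, scaled by the dynamic range of the data.
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2

    # Means, (population) variances, and covariance of the two signals.
    mu_x, mu_y = fmean(x), fmean(y)
    var_x = fmean((a - mu_x) * (a - mu_x) for a in x)
    var_y = fmean((b - mu_y) * (b - mu_y) for b in y)
    cov_xy = fmean((a - mu_x) * (b - mu_y) for a, b in zip(x, y))

    # Luminance term times the combined contrast/structure term.
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x * mu_x + mu_y * mu_y + c1) * (var_x + var_y + c2)
    return numerator / denominator
```

Under these assumptions, identical inputs score 1, and the score falls as the means, variances, or correlation of the two inputs diverge.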

Type: Proceedings paper
Title: Timbre Transfer with Variational Auto Encoding and Cycle-Consistent Adversarial Networks
Event: 2022 International Joint Conference on Neural Networks (IJCNN)
Location: Padua, Italy
Dates: 18th-23rd July 2022
ISBN-13: 978-1-7281-8671-9
Open access status: An open access version is available from UCL Discovery
DOI: 10.1109/IJCNN55064.2022.9892107
Publisher version: http://dx.doi.org/10.1109/ijcnn55064.2022.9892107
Language: English
Additional information: This version is the author accepted manuscript. For information on re-use, please refer to the publisher's terms and conditions.
Keywords: music, speech, generative adversarial networks, cyclic consistency, variational autoencoders, voice conversion, timbre transfer, style transfer
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI: https://discovery.ucl.ac.uk/id/eprint/10185182