UCL Discovery

BDDM: Bilateral Denoising Diffusion Models for Fast and High-Quality Speech Synthesis

Lam, MWY; Wang, J; Su, D; Yu, D; (2022) BDDM: Bilateral Denoising Diffusion Models for Fast and High-Quality Speech Synthesis. In: ICLR 2022 - 10th International Conference on Learning Representations. ICLR. Green open access

1626_bddm_bilateral_denoising_diffu.pdf - Published Version

Abstract

Diffusion probabilistic models (DPMs) and their extensions have emerged as competitive generative models, yet they face challenges in efficient sampling. We propose a new bilateral denoising diffusion model (BDDM) that parameterizes both the forward and reverse processes with a schedule network and a score network, which can be trained with a novel bilateral modeling objective. We show that the new surrogate objective can achieve a tighter lower bound on the log marginal likelihood than a conventional surrogate. We also find that BDDM allows inheriting pre-trained score network parameters from any DPM and consequently enables fast and stable learning of the schedule network and optimization of a noise schedule for sampling. Our experiments demonstrate that BDDMs can generate high-fidelity audio samples with as few as three sampling steps. Moreover, compared to other state-of-the-art diffusion-based neural vocoders, BDDMs produce samples of comparable or higher quality that are indistinguishable from human speech, notably with only seven sampling steps (143x faster than WaveGrad and 28.6x faster than DiffWave). We release our code at https://github.com/tencent-ailab/bddm.
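The abstract measures speed in "sampling steps": each step is one call to the score (noise-prediction) network in the reverse process, driven by a noise schedule of length N. As a rough illustration of what a short schedule means in practice, here is a minimal sketch of the standard DDPM ancestral sampling update run over an arbitrary short schedule. It is not the BDDM schedule network, its training objective, or the released code; the function name `ddpm_sample`, the callback `eps_model`, and the example `betas` values are hypothetical placeholders.

```python
import math
import random
from typing import Callable, List


def ddpm_sample(
    eps_model: Callable[[List[float], int], List[float]],
    betas: List[float],
    dim: int,
    seed: int = 0,
) -> List[float]:
    """Standard DDPM ancestral sampling over a (hypothetical) short noise schedule.

    eps_model(x, t) is an assumed noise-prediction network; it is called once
    per entry in `betas`, so len(betas) is the number of sampling steps.
    """
    rng = random.Random(seed)
    alphas = [1.0 - b for b in betas]

    # Cumulative products: alpha_bar_t = prod_{s<=t} alpha_s.
    alpha_bars: List[float] = []
    prod = 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)

    # Start from pure Gaussian noise x_T ~ N(0, I).
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]

    for t in reversed(range(len(betas))):
        eps = eps_model(x, t)
        # Posterior mean: (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t).
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(xi - coef * ei) / math.sqrt(alphas[t]) for xi, ei in zip(x, eps)]
        if t > 0:
            # Posterior variance choice: beta_t * (1 - alpha_bar_{t-1}) / (1 - alpha_bar_t).
            sigma = math.sqrt(betas[t] * (1.0 - alpha_bars[t - 1]) / (1.0 - alpha_bars[t]))
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            # Final step returns the mean without adding noise.
            x = mean
    return x


# Usage with a dummy model and a hypothetical 7-step schedule.
if __name__ == "__main__":
    zero_model = lambda x, t: [0.0] * len(x)
    short_betas = [1e-4, 1e-3, 1e-2, 0.05, 0.2, 0.5, 0.7]
    audio = ddpm_sample(zero_model, short_betas, dim=16, seed=1)
    print(len(audio))
```

With a 7-entry schedule the network is evaluated 7 times, versus hundreds or more for a long schedule; the open problem the abstract describes is choosing those few `betas` values so quality does not collapse, which is what the BDDM schedule network is trained to do.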

Type: Proceedings paper
Title: BDDM: Bilateral Denoising Diffusion Models for Fast and High-Quality Speech Synthesis
Event: ICLR 2022 - 10th International Conference on Learning Representations
Open access status: An open access version is available from UCL Discovery
Publisher version: https://openreview.net/forum?id=L7wzpQttNO
Language: English
Additional information: This version is the version of record. For information on re-use, please refer to the publisher’s terms and conditions.
Keywords: Speech Synthesis, Vocoder, Generative Model, Diffusion Model
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI: https://discovery.ucl.ac.uk/id/eprint/10168070