UCL Discovery

Effective Low-Cost Time-Domain Audio Separation Using Globally Attentive Locally Recurrent Networks

Lam, MWY; Wang, J; Su, D; Yu, D; (2021) Effective Low-Cost Time-Domain Audio Separation Using Globally Attentive Locally Recurrent Networks. In: Proceedings of the 2021 IEEE Spoken Language Technology Workshop (SLT). (pp. 801-808). IEEE. Green open access

Text: 2101.05014.pdf - Accepted Version (934kB)

Abstract

Recent research on time-domain audio separation networks (TasNets) has brought great success to speech separation. Nevertheless, conventional TasNets struggle to satisfy the memory and latency constraints of industrial applications. In this regard, we design a low-cost, high-performance architecture, namely, the globally attentive locally recurrent (GALR) network. Like the dual-path RNN (DPRNN), we first split a feature sequence into 2D segments and then process the sequence along both the intra- and inter-segment dimensions. Our main innovation is that, on top of features recurrently processed along the intra-segment dimension, GALR applies a self-attention mechanism to the sequence along the inter-segment dimension, which aggregates context-aware information and also enables parallelization. Our experiments suggest that GALR is a notably more effective network than prior work. On one hand, with only 1.5M parameters, it achieves comparable separation performance at a much lower cost, with 36.1% less runtime memory and 49.4% fewer computational operations relative to DPRNN. On the other hand, at a model size comparable to DPRNN's, GALR consistently outperforms DPRNN on three datasets, in particular with a substantial margin of 2.4 dB absolute SI-SNRi improvement on the benchmark WSJ0-2mix task.
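The abstract's core idea — recurrence within each segment, self-attention across segments — can be sketched as a single processing block. The code below is a minimal illustration, not the authors' implementation: layer sizes, normalization choices, and the residual connections are assumptions for the sketch, and the input is taken to be already split into 2D segments of shape (batch, channels, segment_length, num_segments).

```python
import torch
import torch.nn as nn

class GALRBlock(nn.Module):
    """Illustrative globally attentive locally recurrent (GALR) block.

    A bidirectional LSTM runs along the intra-segment (local) dimension;
    multi-head self-attention runs along the inter-segment (global)
    dimension, which is what allows parallelization across segments.
    Hyper-parameters are illustrative, not the paper's exact values.
    """

    def __init__(self, channels: int = 64, hidden: int = 128, heads: int = 8):
        super().__init__()
        # Locally recurrent path: BiLSTM over positions within each segment.
        self.intra_rnn = nn.LSTM(channels, hidden, batch_first=True,
                                 bidirectional=True)
        self.intra_proj = nn.Linear(2 * hidden, channels)
        self.intra_norm = nn.GroupNorm(1, channels)
        # Globally attentive path: self-attention across segments.
        self.inter_attn = nn.MultiheadAttention(channels, heads,
                                                batch_first=True)
        self.inter_norm = nn.GroupNorm(1, channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, k, s = x.shape  # batch, channels, segment_len, num_segments
        # --- intra-segment recurrence (local modeling) ---
        local = x.permute(0, 3, 2, 1).reshape(b * s, k, c)   # (B*S, K, C)
        local, _ = self.intra_rnn(local)
        local = self.intra_proj(local)
        local = local.reshape(b, s, k, c).permute(0, 3, 2, 1)  # (B, C, K, S)
        x = x + self.intra_norm(local)  # residual connection (an assumption)
        # --- inter-segment self-attention (global modeling) ---
        glob = x.permute(0, 2, 3, 1).reshape(b * k, s, c)    # (B*K, S, C)
        glob, _ = self.inter_attn(glob, glob, glob)
        glob = glob.reshape(b, k, s, c).permute(0, 3, 1, 2)  # (B, C, K, S)
        return x + self.inter_norm(glob)

block = GALRBlock(channels=64)
x = torch.randn(2, 64, 100, 25)  # 2 utterances, 25 segments of length 100
y = block(x)
print(y.shape)  # shape is preserved, so blocks can be stacked
```

Because the attention path has no recurrence along the inter-segment axis, all segments attend to each other in parallel, which is the source of the runtime savings the abstract reports relative to DPRNN's second RNN pass.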

Type: Proceedings paper
Title: Effective Low-Cost Time-Domain Audio Separation Using Globally Attentive Locally Recurrent Networks
Event: 2021 IEEE Spoken Language Technology Workshop (SLT)
Location: Shenzhen, China
Dates: 19th-22nd Jan 2021
ISBN-13: 9781728170664
Open access status: An open access version is available from UCL Discovery
DOI: 10.1109/SLT48900.2021.9383464
Publisher version: https://doi.org/10.1109/SLT48900.2021.9383464
Language: English
Additional information: This version is the author accepted manuscript. For information on re-use, please refer to the publisher's terms and conditions.
Keywords: speech separation, TasNet, low-cost, multi-head attention
UCL classification: UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL
URI: https://discovery.ucl.ac.uk/id/eprint/10154105
Downloads since deposit: 32
