eprintid: 10154105
rev_number: 13
eprint_status: archive
userid: 699
dir: disk0/10/15/41/05
datestamp: 2022-08-23 11:04:17
lastmod: 2022-08-23 11:04:17
status_changed: 2022-08-23 11:04:17
type: proceedings_section
metadata_visibility: show
sword_depositor: 699
creators_name: Lam, MWY
creators_name: Wang, J
creators_name: Su, D
creators_name: Yu, D
title: Effective Low-Cost Time-Domain Audio Separation Using Globally Attentive Locally Recurrent Networks
ispublished: pub
divisions: C05
divisions: F48
divisions: B04
divisions: UCL
keywords: speech separation, TasNet, low-cost, multi-head attention
note: This version is the author accepted manuscript. For information on re-use, please refer to the publisher's terms and conditions.
abstract: Recent research on time-domain audio separation networks (TasNets) has brought great success to speech separation. Nevertheless, conventional TasNets struggle to satisfy the memory and latency constraints of industrial applications. In this regard, we design a low-cost, high-performance architecture, namely, the globally attentive locally recurrent (GALR) network. Like the dual-path RNN (DPRNN), we first split a feature sequence into 2D segments and then process the sequence along both the intra- and inter-segment dimensions. Our main innovation lies in that, on top of features recurrently processed along the intra-segment dimension, GALR applies a self-attention mechanism to the sequence along the inter-segment dimension, which aggregates context-aware information and also enables parallelization. Our experiments suggest that GALR is a notably more effective network than the prior work. On one hand, with only 1.5M parameters, it has achieved comparable separation performance at a much lower cost, with 36.1% less runtime memory and 49.4% fewer computational operations relative to DPRNN. On the other hand, at a model size comparable to DPRNN, GALR has consistently outperformed DPRNN on three datasets, in particular with a substantial margin of 2.4 dB absolute improvement in SI-SNRi on the benchmark WSJ0-2mix task.
date: 2021
date_type: published
publisher: IEEE
official_url: https://doi.org/10.1109/SLT48900.2021.9383464
oa_status: green
full_text_type: other
language: eng
primo: open
primo_central: open_green
verified: verified_manual
elements_id: 1971132
doi: 10.1109/SLT48900.2021.9383464
isbn_13: 9781728170664
lyricists_name: Wang, Jun
lyricists_id: JWANG00
actors_name: Flynn, Bernadette
actors_id: BFFLY94
actors_role: owner
full_text_status: public
pres_type: paper
publication: 2021 IEEE Spoken Language Technology Workshop, SLT 2021 - Proceedings
pagerange: 801-808
event_title: 2021 IEEE Spoken Language Technology Workshop (SLT)
event_location: Shenzhen, China
event_dates: 19th-22nd Jan 2021
book_title: Proceedings of the 2021 IEEE Spoken Language Technology Workshop (SLT)
citation: Lam, MWY; Wang, J; Su, D; Yu, D; (2021) Effective Low-Cost Time-Domain Audio Separation Using Globally Attentive Locally Recurrent Networks. In: Proceedings of the 2021 IEEE Spoken Language Technology Workshop (SLT). (pp. 801-808). IEEE. Green open access
document_url: https://discovery.ucl.ac.uk/id/eprint/10154105/1/2101.05014.pdf