Hardware Efficient Speech Enhancement With Noise Aware Multi-Target Deep Learning

Abdullah, Salinna; Zamani, Majid; Demosthenous, Andreas; (2024) Hardware Efficient Speech Enhancement With Noise Aware Multi-Target Deep Learning. IEEE Open Journal of Circuits and Systems, 5, pp. 141-152. 10.1109/ojcas.2024.3389100. Green open access

Hardware_Efficient_Speech_Enhancement_With_Noise_Aware_Multi-Target_Deep_Learning.pdf - Published Version


Abstract

This paper describes a supervised speech enhancement (SE) method utilising a noise-aware four-layer deep neural network and training target switching. For optimal speech denoising, the SE system, trained with multiple-target joint learning, switches among mapping-based, masking-based, and complementary processing, depending on the level of noise contamination detected. Optimisation techniques, including ternary quantisation, structural pruning, efficient sparse matrix representation and cost-effective approximations for complex computations, were implemented to reduce area, memory, and power requirements. Up to 19.1× compression was obtained, and all weights could be stored in on-chip memory. When processing NOISEX-92 noises, the system achieved average short-time objective intelligibility (STOI) and perceptual evaluation of speech quality (PESQ) scores of 0.81 and 1.62, respectively, outperforming SE algorithms trained with only a single learning target. The proposed SE processor was implemented on a field programmable gate array (FPGA) for proof of concept. Mapping the design to a 65-nm CMOS process led to a chip core area of 3.88 mm² and a power consumption of 1.91 mW when operating at a 10 MHz clock frequency.
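
For readers unfamiliar with ternary quantisation, the following is a minimal illustrative sketch in Python, not the authors' implementation. The abstract does not say how the quantisation threshold or scale is chosen, so the sketch assumes a common heuristic: weights whose magnitude is at most 0.7 times the mean absolute weight become 0, and the rest become +1 or -1 with one shared scale factor. The function names `ternarize` and `dequantize` and the `threshold_ratio` parameter are hypothetical.

```python
def ternarize(weights, threshold_ratio=0.7):
    """Quantise a list of floats to codes in {-1, 0, +1} plus one shared scale.

    Assumption: the threshold is threshold_ratio * mean(|w|); the paper's
    actual rule is not given in the abstract.
    """
    if not weights:
        return [], 0.0
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    threshold = threshold_ratio * mean_abs
    # Small weights are zeroed; the rest keep only their sign.
    codes = [0 if abs(w) <= threshold else (1 if w > 0 else -1) for w in weights]
    # The shared scale is the mean magnitude of the weights that survived.
    kept = [abs(w) for w, c in zip(weights, codes) if c != 0]
    scale = sum(kept) / len(kept) if kept else 0.0
    return codes, scale


def dequantize(codes, scale):
    """Reconstruct approximate weights from ternary codes and the scale."""
    return [c * scale for c in codes]


weights = [0.9, -0.05, 0.4, -0.7, 0.02, -0.3]
codes, scale = ternarize(weights)
approx = dequantize(codes, scale)
```

Each weight then needs only about two bits of storage plus one scale per group, and the zero codes are what make a sparse matrix representation worthwhile.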

Type: Article
Title: Hardware Efficient Speech Enhancement With Noise Aware Multi-Target Deep Learning
Open access status: An open access version is available from UCL Discovery
DOI: 10.1109/ojcas.2024.3389100
Publisher version: https://doi.org/10.1109/ojcas.2024.3389100
Language: English
Additional information: Copyright © 2024 The Authors. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.
Keywords: Deep neural network, digital circuits, field programmable gate array (FPGA), mapping, masking, multi-target learning, speech enhancement, structured pruning, ternary quantisation
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Electronic and Electrical Eng
URI: https://discovery.ucl.ac.uk/id/eprint/10191994