UCL Discovery

Towards More Efficient DNN-Based Speech Enhancement Using Quantized Correlation Mask

Abdullah, S; Zamani, M; Demosthenous, A; (2021) Towards More Efficient DNN-Based Speech Enhancement Using Quantized Correlation Mask. IEEE Access 10.1109/access.2021.3056711. Green open access

Text: 09345671.pdf - Published Version (2MB)

Abstract

Many studies on deep learning-based speech enhancement (SE) that use the computational auditory scene analysis method employ the ideal binary mask or the ideal ratio mask to reconstruct the enhanced speech signal. However, many SE applications in real scenarios demand a balance between denoising capability and computational cost. In this study, first, an improvement over the ideal ratio mask is proposed to attain better SE performance by introducing an efficient adaptive correlation-based factor for adjusting the ratio mask. The proposed method exploits the correlation coefficients among the noisy speech, noise and clean speech to effectively redistribute the power ratio of the speech and noise during the ratio mask construction phase. Second, to make the supervised SE system more computationally efficient, quantization techniques are considered to reduce the number of bits needed to represent floating-point numbers, leading to a more compact SE model. The proposed quantized correlation mask is utilized in conjunction with a 4-layer deep neural network (DNN-QCM) comprising dropout regulation, pre-training and noise-aware training to derive a robust and high-order mapping in enhancement, and to improve generalization capability in unseen conditions. Results show that the quantized correlation mask outperforms the conventional ratio mask representation and the other SE algorithms used for comparison. When compared to a DNN with the ideal ratio mask as its learning target, the DNN-QCM provided an improvement of approximately 6.5% in the short-time objective intelligibility score and 11.0% in the perceptual evaluation of speech quality score. The introduction of the quantization method can reduce the neural network weights from a 32-bit to a 5-bit representation, while effectively suppressing stationary and non-stationary noise.
Timing analyses also show that with the techniques incorporated in the proposed DNN-QCM system to increase its compac...
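
Illustrative sketch (not taken from the paper): the abstract refers to two building blocks, the conventional ideal ratio mask and the reduction of 32-bit weights to a 5-bit representation. The Python below shows one common textbook form of each, under stated assumptions. The function names, the exponent beta = 0.5, and the symmetric uniform quantizer with a single shared scale are assumptions chosen for illustration; the abstract does not specify the authors' quantization scheme, and the correlation-based adjustment factor is not reproduced here because the abstract does not define it.

```python
def ideal_ratio_mask(speech_power, noise_power, beta=0.5):
    """Conventional ideal ratio mask for one time-frequency unit.

    Assumption: the common form (S / (S + N)) ** beta with beta = 0.5.
    This is the baseline mask, not the paper's correlation mask.
    """
    total = speech_power + noise_power
    if total <= 0.0:
        return 0.0  # silent unit: nothing to pass through
    return (speech_power / total) ** beta


def quantize_uniform(weights, num_bits=5):
    """Map floats to signed integer codes with one shared scale.

    Assumption: symmetric uniform quantization. With num_bits = 5 the
    codes lie in [-15, 15], which fits in a 5-bit signed integer.
    """
    if num_bits < 2:
        raise ValueError("num_bits must be at least 2")
    if not weights:
        return [], 1.0
    q_max = 2 ** (num_bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / q_max if max_abs > 0.0 else 1.0
    # Round to the nearest code, then clamp to the representable range.
    codes = [max(-q_max, min(q_max, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from integer codes."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    example_weights = [0.8, -0.31, 0.05, -0.8, 0.0]  # hypothetical values
    codes, scale = quantize_uniform(example_weights, num_bits=5)
    print(codes, scale)
    print(dequantize(codes, scale))
    print(ideal_ratio_mask(3.0, 1.0))
```

In this sketch each weight is stored as a small integer code plus one shared floating-point scale, so the per-weight rounding error is at most half a quantization step. Whether the paper uses a uniform, per-layer, or other scheme should be checked against the full text.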

Type: Article
Title: Towards More Efficient DNN-Based Speech Enhancement Using Quantized Correlation Mask
Open access status: An open access version is available from UCL Discovery
DOI: 10.1109/access.2021.3056711
Publisher version: https://doi.org/10.1109/ACCESS.2021.3056711
Language: English
Additional information: This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Electronic and Electrical Eng
URI: https://discovery.ucl.ac.uk/id/eprint/10121574
Downloads since deposit: 273
