UCL Discovery

Efficient Deep-Learning Speech Enhancement Algorithms for Hearing Devices

Abdullah, Salinna; (2023) Efficient Deep-Learning Speech Enhancement Algorithms for Hearing Devices. Doctoral thesis (Ph.D), UCL (University College London). Green open access

Abdullah_ID_thesis.pdf - Download (17MB)

Abstract

Speech, essential for human communication, often occurs alongside acoustic interference such as background noise and room reverberation in natural, real-world environments. Perceiving speech in the presence of noise is challenging, particularly at low signal-to-noise ratios (SNRs), where background noise significantly interferes with speech perception, and especially so for hearing-impaired listeners. This creates a need for improved speech processing in hearing-assistive devices such as hearing aids and cochlear implants to enhance speech intelligibility and quality for hearing-impaired users. Speech enhancement (SE) is the task of segregating speech from background noise interference and is employed in applications ranging from automatic speech recognition to telecommunication, in addition to hearing devices.

This thesis details the development of several deep-learning-based monaural SE algorithms and presents an extensive study of their efficacy compared with conventional and state-of-the-art SE algorithms. Since the algorithms are intended for hearing devices, which are space-, memory- and power-constrained, optimisation and compression techniques were explored to achieve compact and energy-efficient SE models. SE can be treated as a supervised learning problem in which enhanced speech is estimated from the noisy speech input. This thesis describes two variations of a feedforward deep neural network-based (DNN-based) SE algorithm and one convolutional neural network-based (CNN-based) SE algorithm developed as part of the PhD work. The proposed supervised DNN SE systems comprise three main components: acoustic features, training targets, and learning machines (in this case, DNNs). Acoustic feature extraction methods inspired by the human auditory model were explored and evaluated for enhanced speech estimation. A complementary combination of acoustic features covering broad spectro-temporal contexts was found to be beneficial for improved denoising performance. Gammatone-domain features were used frequently, as they were originally designed to model human cochlear filtering and have demonstrated good noise robustness and high feature separability. Training target generation involves estimating time-frequency (T-F) masks or mapping procedures that produce an enhanced speech output when applied to the noisy speech. Two training target generation methods were formulated: (1) an ideal ratio mask (IRM) tuned with correlation information about the noise and clean speech relative to the noisy speech, and (2) automatic switching between mapping-based and masking-based training target estimation, conditioned on the noisy speech features. The CNN-based SE algorithm addresses the challenge of designing convolutional filter kernels that are optimal for SE applications. To reduce the computational complexity and memory requirements of the deep-learning SE models, novel quantisation and structured pruning approaches were proposed.

Systematic evaluations show that the proposed systems achieve better objective speech intelligibility and quality than conventional supervised SE (e.g., a DNN trained with IRMs) and unsupervised SE (e.g., Wiener filtering). Furthermore, these systems outperform many other SE algorithms proposed in the literature whilst being more computationally efficient. The proposed algorithms also showed good generalisation when presented with untrained noise types and SNRs.
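As a point of reference for the masking-based training targets described above, the following sketch (in Python with NumPy, an assumed toolchain) computes the conventional ideal ratio mask from clean-speech and noise spectrograms and applies it to a noisy spectrogram. It illustrates only the standard IRM baseline that the thesis compares against, not the correlation-tuned mask the thesis proposes.

# Minimal sketch of a masking-based training target, assuming NumPy and an
# existing STFT front end; this is the conventional IRM, not the tuned
# variant proposed in the thesis.
import numpy as np

def ideal_ratio_mask(clean_spec, noise_spec, beta=0.5):
    """IRM(t, f) = (|S|^2 / (|S|^2 + |N|^2)) ** beta."""
    clean_power = np.abs(clean_spec) ** 2
    noise_power = np.abs(noise_spec) ** 2
    return (clean_power / (clean_power + noise_power + 1e-12)) ** beta

def apply_mask(noisy_spec, mask):
    """Enhanced magnitude estimate: mask applied to the noisy spectrogram."""
    return mask * np.abs(noisy_spec)

In supervised SE of this kind, a mask such as this serves as the regression target that the network learns to predict from the noisy-speech features.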
A DNN-based SE algorithm was implemented on a field programmable gate array (FPGA) as a proof of concept. Mapping the design onto a 65-nm complementary metal–oxide–semiconductor (CMOS) process led to a chip core area of 3.88 mm² and a power consumption of 1.91 mW when operating at a 10 MHz clock frequency.
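The abstract mentions quantisation and structured pruning for reducing model complexity ahead of deployment on constrained hardware. The following is a minimal, generic sketch of magnitude-based structured pruning of a feedforward layer, assuming PyTorch; it does not reproduce the thesis's specific quantisation or pruning schemes.

# Generic sketch of magnitude-based structured pruning, assuming PyTorch;
# not the thesis's proposed scheme.
import torch
import torch.nn as nn

def prune_neurons(layer: nn.Linear, next_layer: nn.Linear, keep_ratio: float = 0.5):
    # L1 norm of each output neuron's weight row; larger norm = kept
    norms = layer.weight.abs().sum(dim=1)
    n_keep = max(1, int(keep_ratio * norms.numel()))
    keep = torch.topk(norms, n_keep).indices.sort().values

    # Rebuild the layer with only the kept output neurons
    pruned = nn.Linear(layer.in_features, n_keep)
    pruned.weight.data = layer.weight.data[keep].clone()
    pruned.bias.data = layer.bias.data[keep].clone()

    # Drop the matching input columns of the following layer
    shrunk_next = nn.Linear(n_keep, next_layer.out_features)
    shrunk_next.weight.data = next_layer.weight.data[:, keep].clone()
    shrunk_next.bias.data = next_layer.bias.data.clone()
    return pruned, shrunk_next

Removing whole neurons (rather than individual weights) keeps the resulting matrices dense and small, which is what makes structured pruning attractive for memory- and power-constrained hearing devices.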

Type: Thesis (Doctoral)
Qualification: Ph.D
Title: Efficient Deep-Learning Speech Enhancement Algorithms for Hearing Devices
Open access status: An open access version is available from UCL Discovery
Language: English
Additional information: Copyright © The Author 2023. Original content in this thesis is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) Licence (https://creativecommons.org/licenses/by-nc/4.0/). Any third-party copyright material present remains the property of its respective owner(s) and is licensed under its existing terms. Access may initially be restricted at the author’s request.
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Electronic and Electrical Eng
URI: https://discovery.ucl.ac.uk/id/eprint/10180180
