A Compact CNN-Based Speech Enhancement With Adaptive Filter Design Using Gabor Function And Region-Aware Convolution

Abdullah, Salinna; Zamani, Majid; Demosthenous, Andreas; (2022) A Compact CNN-Based Speech Enhancement With Adaptive Filter Design Using Gabor Function And Region-Aware Convolution. IEEE Access , 10 pp. 130657-130671. 10.1109/access.2022.3228744. Green open access

Abstract

Speech enhancement (SE) is used in many applications, such as hearing devices, to improve speech intelligibility and quality. Convolutional neural network-based (CNN-based) SE algorithms in the literature often employ generic convolutional filters that are not optimized for SE applications. This paper presents a CNN-based SE algorithm with an adaptive filter design (named ‘CNN-AFD’) using Gabor functions and region-aware convolution. The proposed algorithm incorporates fixed Gabor functions into convolutional filters to model human auditory processing for improved denoising performance. The feature maps obtained from the Gabor-incorporated convolutional layers serve as learnable guided masks (tuned during backpropagation) for generating adaptive custom region-aware filters. The custom filters extract features from speech regions (i.e., ‘region-aware’) while maintaining translation invariance. To reduce the high inference cost of the CNN, skip convolution and activation analysis-wise pruning are explored. Employing skip convolution reduced the training time per epoch by close to 40%. Pruning neurons with high numbers of zero activations complements skip convolution and reduces the number of model parameters by more than 30%. The proposed CNN-AFD outperformed all four CNN-based SE baseline algorithms (i.e., a CNN-based SE employing generic filters, a CNN-based SE without region-aware convolution, a CNN-based SE trained with complex spectrograms and a CNN-based SE processing in the time domain), achieving average short-time objective intelligibility (STOI), perceptual evaluation of speech quality (PESQ) and logarithmic spectral distance (LSD) scores of 0.95, 1.82 and 0.82, respectively, when tasked to denoise speech contaminated with NOISEX-92 noises at −5, 0 and 5 dB signal-to-noise ratios (SNRs).
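
As an illustration of the idea of incorporating a fixed Gabor function into a convolutional filter, the following is a minimal sketch only. It assumes the standard real-valued 2D Gabor form (a Gaussian envelope multiplied by a cosine carrier) and an element-wise product with a learnable weight matrix. The abstract does not give the paper's exact formulation, so the function names (gabor_kernel, modulate) and all parameters here are hypothetical and are not taken from the paper.

```python
import math


def gabor_kernel(size, wavelength, theta, sigma, gamma=1.0, psi=0.0):
    """Build a size x size real Gabor kernel as a list of rows.

    Assumed textbook form (not necessarily the paper's):
        g(x, y) = exp(-(x'^2 + gamma^2 * y'^2) / (2 * sigma^2))
                  * cos(2 * pi * x' / wavelength + psi)
    where (x', y') are the coordinates rotated by theta.
    """
    if size % 2 == 0:
        raise ValueError("size must be odd so the kernel has a centre tap")
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate the sampling grid by theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * xr / wavelength + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def modulate(weights, gabor):
    """Element-wise product of a learnable filter with a fixed Gabor kernel.

    This is one plausible way to 'incorporate' a Gabor function into a
    filter; it is an assumption made for illustration.
    """
    return [[w * g for w, g in zip(w_row, g_row)]
            for w_row, g_row in zip(weights, gabor)]


# Hypothetical usage: a 3x3 learnable filter initialised to ones.
learnable = [[1.0] * 3 for _ in range(3)]
fixed = gabor_kernel(size=3, wavelength=4.0, theta=0.0, sigma=1.0)
effective_filter = modulate(learnable, fixed)
```

In a sketch like this, only the learnable weights would be updated during training, while the Gabor kernel stays fixed and shapes the filter's frequency and orientation selectivity.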

Type: Article
Title: A Compact CNN-Based Speech Enhancement With Adaptive Filter Design Using Gabor Function And Region-Aware Convolution
Open access status: An open access version is available from UCL Discovery
DOI: 10.1109/access.2022.3228744
Publisher version: https://doi.org/10.1109/access.2022.3228744
Language: English
Additional information: This is an Open Access article published under a Creative Commons Attribution 4.0 International (CC BY 4.0) Licence (https://creativecommons.org/licenses/by/4.0/).
Keywords: Adaptive filter design, activation analysis, convolutional neural network, Gabor filter, pruning, skip convolution, speech enhancement
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Electronic and Electrical Eng
URI: https://discovery.ucl.ac.uk/id/eprint/10162098