Abdullah, Salinna; Zamani, Majid; Demosthenous, Andreas (2022) A Compact CNN-Based Speech Enhancement With Adaptive Filter Design Using Gabor Function And Region-Aware Convolution. IEEE Access, 10, pp. 130657-130671. doi:10.1109/access.2022.3228744.
Abstract
Speech enhancement (SE) is used in many applications, such as hearing devices, to improve speech intelligibility and quality. Convolutional neural network-based (CNN-based) SE algorithms in the literature often employ generic convolutional filters that are not optimized for SE applications. This paper presents a CNN-based SE algorithm with an adaptive filter design (named ‘CNN-AFD’) using Gabor functions and region-aware convolution. The proposed algorithm incorporates fixed Gabor functions into convolutional filters to model human auditory processing for improved denoising performance. The feature maps obtained from the Gabor-incorporated convolutional layers serve as learnable guided masks (tuned during backpropagation) for generating adaptive custom region-aware filters. The custom filters extract features from speech regions (i.e., ‘region-aware’) while maintaining translation-invariance. To reduce the high inference cost of the CNN, skip convolution and pruning based on activation analysis are explored. Skip convolution reduced the training time per epoch by close to 40%. Pruning neurons with high numbers of zero activations complements skip convolution and reduces the number of model parameters by more than 30%. The proposed CNN-AFD outperformed all four CNN-based SE baseline algorithms (i.e., a CNN-based SE employing generic filters, a CNN-based SE without region-aware convolution, a CNN-based SE trained with complex spectrograms and a CNN-based SE processing in the time-domain), with average short-time objective intelligibility (STOI), perceptual evaluation of speech quality (PESQ) and logarithmic spectral distance (LSD) scores of 0.95, 1.82 and 0.82, respectively, when tasked to denoise speech contaminated with NOISEX-92 noises at −5, 0 and 5 dB signal-to-noise ratios (SNRs).
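Two ideas named in the abstract, a fixed Gabor function used as a convolutional filter and pruning driven by zero activations, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not state the kernel size, the Gabor parameters, how the Gabor function is combined with the learned weights, or the pruning threshold, so the function names, default values and the `threshold` parameter below are all assumptions chosen for illustration.

```python
import numpy as np


def gabor_kernel(size=7, sigma=2.0, theta=0.0, wavelength=4.0, gamma=0.5, psi=0.0):
    """Real part of a 2-D Gabor function sampled on a size x size grid.

    All default parameter values are illustrative assumptions.
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate the coordinate system by theta.
    x_r = x * np.cos(theta) + y * np.sin(theta)
    y_r = -x * np.sin(theta) + y * np.cos(theta)
    # Gaussian envelope multiplied by a cosine carrier.
    envelope = np.exp(-(x_r ** 2 + (gamma * y_r) ** 2) / (2.0 * sigma ** 2))
    carrier = np.cos(2.0 * np.pi * x_r / wavelength + psi)
    return envelope * carrier


def zero_activation_fraction(activations):
    """Fraction of exactly-zero outputs per channel.

    `activations` is assumed to have shape (batch, channels, height, width),
    e.g. post-ReLU feature maps collected on a validation set.
    """
    return np.mean(activations == 0, axis=(0, 2, 3))


def channels_to_prune(activations, threshold=0.9):
    """Indices of channels whose zero-activation fraction exceeds `threshold`.

    The threshold value is a hypothetical choice, not taken from the paper.
    """
    return np.flatnonzero(zero_activation_fraction(activations) > threshold)
```

With `psi=0` the kernel peaks at its centre, where both the envelope and the carrier equal 1. Channels that are zero for most inputs contribute little to the output, which is the intuition behind selecting them as pruning candidates.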
Type: | Article |
---|---|
Title: | A Compact CNN-Based Speech Enhancement With Adaptive Filter Design Using Gabor Function And Region-Aware Convolution |
Open access status: | An open access version is available from UCL Discovery |
DOI: | 10.1109/access.2022.3228744 |
Publisher version: | https://doi.org/10.1109/access.2022.3228744 |
Language: | English |
Additional information: | This is an Open Access article published under a Creative Commons Attribution 4.0 International (CC BY 4.0) Licence (https://creativecommons.org/licenses/by/4.0/). |
Keywords: | Adaptive filter design, activation analysis, convolutional neural network, Gabor filter, pruning, skip convolution, speech enhancement |
UCL classification: | UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Electronic and Electrical Eng |
URI: | https://discovery.ucl.ac.uk/id/eprint/10162098 |



