UCL Discovery

Mic2Mic: using cycle-consistent generative adversarial networks to overcome microphone variability in speech systems

Mathur, A; Isopoussu, A; Kawsar, F; Berthouze, N; Lane, ND; (2019) Mic2Mic: using cycle-consistent generative adversarial networks to overcome microphone variability in speech systems. In: Eskicioglu, R and Mottola, L and Priyantha, B, (eds.) IPSN '19: Proceedings of the 18th International Conference on Information Processing in Sensor Networks. (pp. 169-180). Association for Computing Machinery (ACM): New York, NY, USA.
Green open access

Text
mathur_ipsn19.pdf - Published Version

Abstract

Mobile and embedded devices are increasingly using microphones and audio-based computational models to infer user context. A major challenge in building systems that combine audio models with commodity microphones is to guarantee their accuracy and robustness in the real world. Besides many environmental dynamics, a primary factor that impacts the robustness of audio models is microphone variability. In this work, we propose Mic2Mic, a machine-learned system component that resides in the inference pipeline of audio models and reduces, in real time, the variability in audio data caused by microphone-specific factors. Two key considerations for the design of Mic2Mic were: a) to decouple the problem of microphone variability from the audio task, and b) to put minimal burden on end-users to provide training data. With these in mind, we apply the principles of cycle-consistent generative adversarial networks (CycleGANs) to learn Mic2Mic using unlabeled and unpaired data collected from different microphones. Our experiments show that Mic2Mic can recover between 66% and 89% of the accuracy lost due to microphone variability for two common audio tasks.
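
The cycle-consistency idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the feature representation, the function names (`cycle_consistency_loss`, `to_mic_b`, `to_mic_a`) and the toy "gain of 2" microphone model are all assumptions made for illustration. It shows only the general CycleGAN cycle term, in which translating a sample to the other microphone's domain and back should reproduce the original, computed here as a mean absolute (L1-style) error over two unpaired batches. A full CycleGAN also trains adversarial discriminators, which are omitted.

```python
from typing import Callable, List, Sequence

Frame = List[float]  # hypothetical feature vector for one audio frame


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(
    batch_a: Sequence[Frame],
    batch_b: Sequence[Frame],
    g_ab: Callable[[Frame], Frame],
    g_ba: Callable[[Frame], Frame],
) -> float:
    """Cycle loss: A -> B -> A and B -> A -> B should reproduce the input.

    The two batches are unpaired: they may differ in size and need not
    contain recordings of the same sounds.
    """
    forward = sum(l1_distance(g_ba(g_ab(x)), x) for x in batch_a) / len(batch_a)
    backward = sum(l1_distance(g_ab(g_ba(y)), y) for y in batch_b) / len(batch_b)
    return forward + backward


# Toy stand-ins for learned generators (assumption: microphone B doubles the gain).
def to_mic_b(frame: Frame) -> Frame:
    return [2.0 * v for v in frame]


def to_mic_a(frame: Frame) -> Frame:
    return [0.5 * v for v in frame]


def bad_to_mic_a(frame: Frame) -> Frame:
    # A poorly fitted inverse mapping, used to show a non-zero loss.
    return [0.4 * v for v in frame]


mic_a = [[1.0, 2.0], [3.0, 4.0], [0.5, 0.25]]  # unpaired samples, microphone A
mic_b = [[2.0, 8.0], [6.0, 4.0]]               # unpaired samples, microphone B

perfect = cycle_consistency_loss(mic_a, mic_b, to_mic_b, to_mic_a)
imperfect = cycle_consistency_loss(mic_a, mic_b, to_mic_b, bad_to_mic_a)
```

In this toy setting the exactly inverse pair of mappings gives a cycle loss of zero, while the mismatched pair gives a positive loss. That difference is the signal a training procedure would minimise, without needing labels or paired recordings.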

Type: Proceedings paper
Title: Mic2Mic: using cycle-consistent generative adversarial networks to overcome microphone variability in speech systems
Event: 18th International Conference on Information Processing in Sensor Networks (IPSN '19)
Location: Montreal, Canada
Dates: 16 April 2019 - 18 April 2019
ISBN-13: 978-1-4503-6284-9
Open access status: An open access version is available from UCL Discovery
DOI: 10.1145/3302506.3310398
Publisher version: https://doi.org/10.1145/3302506.3310398
Language: English
Additional information: This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
Keywords: GAN, speech models, microphone variability, robustness
UCL classification: UCL
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Brain Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Brain Sciences > Div of Psychology and Lang Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Brain Sciences > Div of Psychology and Lang Sciences > UCL Interaction Centre
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI: https://discovery.ucl.ac.uk/id/eprint/10075037