UCL Discovery

Improving machine translation systems via isotopic replacement

Sun, Zeyu; Zhang, Jie M.; Xiong, Yingfei; Harman, Mark; Papadakis, Mike; Zhang, Lu; (2022) Improving machine translation systems via isotopic replacement. In: Proceedings of the 44th International Conference on Software Engineering (ICSE '22). (pp. 1181-1192). ACM (Association for Computing Machinery): Pittsburgh, PA, USA. Green open access

Abstract

Machine translation plays an essential role in people’s daily international communication. However, machine translation systems are far from perfect. To tackle this problem, researchers have proposed several approaches to testing machine translation. A promising trend among these approaches is to use word replacement, where only one word in the original sentence is replaced with another word to form a sentence pair. However, precise control of the impact of word replacement remains an outstanding issue in these approaches. To address this issue, we propose CAT, a novel word-replacement-based approach, whose basic idea is to identify word replacement with controlled impact (referred to as isotopic replacement). To achieve this purpose, we use a neural-based language model to encode the sentence context, and design a neural-network-based algorithm to evaluate context-aware semantic similarity between two words. Furthermore, similar to TransRepair, a state-of-the-art word-replacement-based approach, CAT also provides automatic fixing of revealed bugs without model retraining. Our evaluation on Google Translate and Transformer indicates that CAT achieves significant improvements over TransRepair. In particular, 1) CAT detects seven more types of bugs than TransRepair; 2) CAT detects 129% more translation bugs than TransRepair; 3) CAT repairs twice more bugs than TransRepair, many of which may bring serious consequences if left unfixed; and 4) CAT has better efficiency than TransRepair in input generation (0.01s vs. 0.41s) and comparable efficiency with TransRepair in bug repair (1.92s vs. 1.34s).
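
The word-replacement idea described in the abstract can be illustrated with a minimal sketch. This is not CAT's implementation: the helper names (`replace_word`, `consistency`, `find_suspicious_pairs`), the toy translator with a planted bug, and the 0.7 threshold are all assumptions made for illustration, and a plain token-overlap ratio stands in for the context-aware semantic similarity the paper proposes.

```python
from difflib import SequenceMatcher


def replace_word(sentence, index, new_word):
    """Return a copy of `sentence` with the token at `index` swapped for `new_word`."""
    tokens = sentence.split()
    tokens[index] = new_word
    return " ".join(tokens)


def consistency(translation_a, translation_b):
    """Token-level similarity in [0, 1] between two translations (a simple stand-in metric)."""
    return SequenceMatcher(None, translation_a.split(), translation_b.split()).ratio()


def find_suspicious_pairs(sentence, index, candidates, translate, threshold=0.7):
    """Translate the original and each one-word mutant; flag mutants whose translation diverges."""
    base = translate(sentence)
    flagged = []
    for word in candidates:
        mutant = replace_word(sentence, index, word)
        score = consistency(base, translate(mutant))
        if score < threshold:  # threshold is an illustrative assumption
            flagged.append((mutant, score))
    return flagged


def toy_translate(sentence):
    """Hypothetical translator: upper-cases tokens, but truncates after 'women' (planted bug)."""
    out = []
    for token in sentence.split():
        out.append(token.upper())
        if token == "women":
            break
    return " ".join(out)


flagged = find_suspicious_pairs(
    "men like good books", 0, ["women", "boys"], toy_translate
)
print(flagged)
```

Here the replacement "boys" changes only one output token and passes, while "women" triggers the planted truncation and is flagged. Choosing replacements whose impact on the sentence is controlled is the part this sketch leaves out, and it is the problem the paper targets.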

Type: Proceedings paper
Title: Improving machine translation systems via isotopic replacement
Event: The 44th International Conference on Software Engineering (ICSE '22)
Dates: 21st-29th May 2022
ISBN-13: 978-1-4503-9221-1
Open access status: An open access version is available from UCL Discovery
DOI: 10.1145/3510003.3510206
Publisher version: https://doi.org/10.1145/3510003.3510206
Language: English
Additional information: This version is the author accepted manuscript. For information on re-use, please refer to the publisher's terms and conditions.
Keywords: machine translation, testing and repair, machine learning testing, neural networks
UCL classification: UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL
URI: https://discovery.ucl.ac.uk/id/eprint/10149498
Downloads since deposit: 211
