TY  - INPR
N1  - © The Authors. Open Access under CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/deed.en)
IS  - 1
TI  - Data Note: Alternative Name Encodings - Using Jyutping or Pinyin as tonal representations of Chinese names for data linkage
AV  - public
VL  - 6
Y1  - 2025/03/14/
JF  - International Journal of Population Data Science
KW  - data linkage; romanisation; linkage errors; data equity
A1  - Lam, Joseph
A1  - Cortina Borja, Mario
A1  - Aldridge, Robert
A1  - Blackburn, Ruth
A1  - Harron, Katie
ID  - discovery10205804
N2  - Accurate data linkage across large administrative databases is crucial for addressing
complex research and policy questions, yet linkage errors, stemming from inconsistent name
representations, can introduce biases, predominantly for names not given in English. This data note
examines the impact of romanisation on linkage accuracy, focusing on Chinese names and comparing
standardised systems (Jyutping and Pinyin) with the non-standardised Hong Kong Government
Cantonese Romanisation (HKG-romanisation). We identify three primary issues: language-specific
variations in romanisation, the loss of tonal information inherent to tonal languages, and discrepancies
in name order conventions. Using a dataset of 771 Hong Kong student names, our analysis
reveals that standardised romanisation systems enhance the uniqueness and consistency of name
representations, thereby improving linkage precision and recall compared to HKG-romanisation.
Specifically, Jyutping and Pinyin achieved over 95% recall in blocking strategies, whereas HKG-romanisation reached only 68.8%. Incorporating tonal information further improved recall. These
findings underscore the necessity of adopting standardised, tone-sensitive romanisation systems and
flexible database designs to reduce linkage errors and promote data equity for under-represented
groups. We advocate for the implementation of phonetic encodings in databases, alongside
language-specific pre-processing protocols, to ensure more inclusive and accurate data linkage
processes.
SN  - 2399-4908
UR  - https://doi.org/10.23889/ijpds.v6i1.2935
PB  - Swansea University
ER  -