UCL Discovery
UCL home » Library Services » Electronic resources » UCL Discovery

Towards automated clinical coding

Catling, F; Spithourakis, GP; Riedel, S; (2018) Towards automated clinical coding. International Journal of Medical Informatics , 120 pp. 50-61. 10.1016/j.ijmedinf.2018.09.021. Green open access

[thumbnail of Catling_Towards automated clinical coding_AAM.pdf]
Catling_Towards automated clinical coding_AAM.pdf - Accepted Version

Download (806kB) | Preview


BACKGROUND: Patients’ encounters with healthcare services must undergo clinical coding. These codes are typically derived from free-text notes. Manual clinical coding is expensive, time-consuming and prone to error. Automated clinical coding systems have great potential to save resources, and realtime availability of codes would improve oversight of patient care and accelerate research. Automated coding is made challenging by the idiosyncrasies of clinical text, the large number of disease codes and their unbalanced distribution. METHODS: We explore methods for representing clinical text and the labels in hierarchical clinical coding ontologies. Text is represented as term frequency-inverse document frequency counts and then as word embeddings, which we use as input to recurrent neural networks. Labels are represented atomically, and then by learning representations of each node in a coding ontology and composing a representation for each label from its respective node path. We consider different strategies for initialisation of the node representations. We evaluate our methods using the publicly-available Medical Information Mart for Intensive Care III dataset: we extract the history of presenting illness section from each discharge summary in the dataset, then predicting the International Classification of Diseases, ninth revision, Clinical Modification codes associated with these. RESULTS: Composing the label representations from the clinical-coding-ontology nodes increased weighted F1 for prediction of the 17,561 disease labels to 0.264–0.281 from 0.232–0.249 for atomic representations. Recurrent neural network text representation improved weighted F1 for prediction of the 19 disease-category labels to 0.682–0.701 from 0.662–0.682 using term frequency-inverse document frequency. However, term frequency-inverse document frequency outperformed recurrent neural networks for prediction of the 17,561 disease labels. CONCLUSIONS: This study demonstrates that hierarchically-structured medical knowledge can be incorporated into statistical models, and produces improved performance during automated clinical coding. This performance improvement results primarily from improved representation of rarer diseases. We also show that recurrent neural networks improve representation of medical text in some settings. Learning good representations of the very rare diseases in clinical coding ontologies from data alone remains challenging, and alternative means of representing these diseases will form a major focus of future work on automated clinical coding.

Type: Article
Title: Towards automated clinical coding
Open access status: An open access version is available from UCL Discovery
DOI: 10.1016/j.ijmedinf.2018.09.021
Publisher version: https://doi.org/10.1016/j.ijmedinf.2018.09.021
Language: English
Additional information: This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
Keywords: Clinical coding, Recurrent neural networks, Hierarchical representation learning, Knowledge representation, Natural language processing, Machine learning
UCL classification: UCL
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Medical Sciences
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI: https://discovery.ucl.ac.uk/id/eprint/10061782
Downloads since deposit
Download activity - last month
Download activity - last 12 months
Downloads by country - last 12 months

Archive Staff Only

View Item View Item