Translating and evaluating historic phenotyping algorithms using SNOMED CT

Elkheder, Musaab; Gonzalez-Izquierdo, Arturo; Qummer Ul Arfeen, Muhammad; Kuan, Valerie; Lumbers, R Thomas; Denaxas, Spiros; Shah, Anoop D (2022) Translating and evaluating historic phenotyping algorithms using SNOMED CT. Journal of the American Medical Informatics Association. DOI: 10.1093/jamia/ocac158. (In press). Green open access.

ocac158.pdf - Published Version

Abstract

OBJECTIVE: Patient phenotype definitions based on terminologies are required for the computational use of electronic health records. Within UK primary care research databases, such definitions have typically been represented as flat lists of Read terms, but Systematized Nomenclature of Medicine-Clinical Terms (SNOMED CT) (a widely employed international reference terminology) enables the use of relationships between concepts, which could facilitate the phenotyping process. We implemented SNOMED CT-based phenotyping approaches and investigated their performance in the CPRD Aurum primary care database.

MATERIALS AND METHODS: We developed SNOMED CT phenotype definitions for 3 exemplar diseases: diabetes mellitus, asthma, and heart failure, using 3 methods: "primary" (primary concept and its descendants), "extended" (primary concept, descendants, and additional relations), and "value set" (based on text searches of term descriptions). We also derived SNOMED CT codelists in a semiautomated manner for 276 disease phenotypes used in a study of health across the lifecourse. Cohorts selected using each codelist were compared to "gold standard" manually curated Read codelists in a sample of 500 000 patients from CPRD Aurum.

RESULTS: SNOMED CT codelists selected a similar set of patients to Read, with F1 scores exceeding 0.93, and age and sex distributions were similar. The "value set" and "extended" codelists had slightly greater recall but lower precision than "primary" codelists. We were able to represent 257 of the 276 phenotypes by a single concept hierarchy, and for 135 phenotypes, the F1 score was greater than 0.9.

CONCLUSIONS: SNOMED CT provides an efficient way to define disease phenotypes, resulting in similar patient populations to manually curated codelists.
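
The "primary" method and the cohort comparison described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes a toy "is a" hierarchy, toy patient records, and a toy gold-standard cohort; the concept identifiers ("A", "B", ...) and the function names are hypothetical placeholders, not real SNOMED CT or Read codes.

```python
from collections import defaultdict


def descendants(root, is_a_pairs):
    """Return the root concept plus every concept below it in the hierarchy.

    is_a_pairs is an iterable of (child, parent) tuples (toy data, assumed).
    """
    children = defaultdict(set)
    for child, parent in is_a_pairs:
        children[parent].add(child)
    seen, stack = {root}, [root]
    while stack:  # depth-first walk down the hierarchy
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def select_cohort(records, codelist):
    """Patients with at least one recorded code that is in the codelist."""
    return {patient for patient, code in records if code in codelist}


def precision_recall_f1(selected, gold):
    """Compare a selected cohort against a gold-standard cohort."""
    true_positives = len(selected & gold)
    precision = true_positives / len(selected) if selected else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical toy hierarchy: B and C are children of A, D is a child of B.
IS_A = [("B", "A"), ("C", "A"), ("D", "B"), ("X", "Y")]
# Hypothetical (patient_id, recorded_code) pairs.
RECORDS = [(1, "A"), (2, "D"), (3, "X"), (4, "C"), (5, "Z")]
# Hypothetical gold-standard cohort (e.g. from a manually curated codelist).
GOLD = {1, 2, 4, 5}

codelist = descendants("A", IS_A)          # "primary" style codelist
cohort = select_cohort(RECORDS, codelist)  # patients matched by the codelist
precision, recall, f1 = precision_recall_f1(cohort, GOLD)
print(sorted(codelist), sorted(cohort), precision, recall, round(f1, 3))
```

In this toy setup the hierarchy-based codelist misses patient 5, whose code sits outside the chosen hierarchy, so recall falls below precision. The "extended" and "value set" methods in the abstract add further concepts to the codelist, which is consistent with the reported trade of slightly higher recall for lower precision.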

Type: Article
Title: Translating and evaluating historic phenotyping algorithms using SNOMED CT
Location: England
Open access status: An open access version is available from UCL Discovery
DOI: 10.1093/jamia/ocac158
Publisher version: https://doi.org/10.1093/jamia/ocac158
Language: English
Additional information: © The Author(s) 2022. Published by Oxford University Press on behalf of the American Medical Informatics Association. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/).
Keywords: terminology, phenotype, electronic health records, ontology, SNOMED CT
UCL classification: UCL
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Population Health Sciences > Institute of Health Informatics
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Population Health Sciences > Institute of Health Informatics > Clinical Epidemiology
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Population Health Sciences > Institute of Health Informatics > Infectious Disease Informatics
URI: https://discovery.ucl.ac.uk/id/eprint/10155637
Downloads since deposit: 198