UCL Discovery

Data mining information from electronic health records produced high yield and accuracy for current smoking status

Groenhof, TKJ; Koers, LR; Blasse, E; de Groot, M; Grobbee, DE; Bots, ML; Asselbergs, FW; ... UCC-CVRM Study Groups (2020) Data mining information from electronic health records produced high yield and accuracy for current smoking status. Journal of Clinical Epidemiology, 118, pp. 100-106. 10.1016/j.jclinepi.2019.11.006. Green open access

Text: 1-s2.0-S0895435619304846-main.pdf - Published Version (622kB)

Abstract

OBJECTIVES: Researchers are increasingly using routine clinical data for care evaluations and feedback to patients and clinicians. The quality of these evaluations depends on the quality and completeness of the input data. STUDY DESIGN AND SETTING: We assessed the performance of an electronic health record (EHR)-based data mining algorithm, using the example of smoking status in a cardiovascular population. As a reference standard, we used the questionnaire from the Utrecht Cardiovascular Cohort (UCC). To assess diagnostic accuracy, we calculated sensitivity, specificity, negative predictive value (NPV), and positive predictive value (PPV). RESULTS: We analyzed 1,661 patients included in the UCC up to January 18, 2019. Of those, 14% (n = 238) had missing information on smoking status in the UCC questionnaire. Data mining provided information on smoking status in 99% of the 1,661 participants. Diagnostic accuracy for current smoking was sensitivity 88%, specificity 92%, NPV 98%, and PPV 63%. Of the false positives, 85% reported that they had quit smoking by the time of the UCC. CONCLUSION: Data mining showed great potential in retrieving information on smoking (a near-complete yield). Its diagnostic performance is good for negative smoking statuses. The implications of misclassification with data mining depend on the application of the data.
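
The four accuracy measures named in the abstract all derive from a 2x2 table that compares the data-mined smoking status with the questionnaire reference standard. The following is a minimal Python sketch of that calculation, not the authors' code. The names `ConfusionCounts` and `diagnostic_accuracy` are hypothetical, and the counts in the usage example are purely illustrative and are not taken from the study.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """2x2 table: data-mined status versus reference standard (hypothetical type)."""
    tp: int  # mined "current smoker", reference "current smoker"
    fp: int  # mined "current smoker", reference "not current smoker"
    fn: int  # mined "not current smoker", reference "current smoker"
    tn: int  # mined "not current smoker", reference "not current smoker"


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN instead of raising when a row or column of the table is empty.
    return numerator / denominator if denominator else math.nan


def diagnostic_accuracy(c: ConfusionCounts) -> dict[str, float]:
    """Compute sensitivity, specificity, PPV and NPV from a 2x2 table."""
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),  # true smokers detected
        "specificity": _ratio(c.tn, c.tn + c.fp),  # non-smokers correctly cleared
        "ppv": _ratio(c.tp, c.tp + c.fp),          # mined "smoker" that is correct
        "npv": _ratio(c.tn, c.tn + c.fn),          # mined "non-smoker" that is correct
    }


if __name__ == "__main__":
    # Illustrative counts only; NOT the study's data.
    example = ConfusionCounts(tp=40, fp=10, fn=10, tn=140)
    for name, value in diagnostic_accuracy(example).items():
        print(f"{name}: {value:.3f}")
```

Unlike sensitivity and specificity, PPV and NPV depend on how common current smoking is in the population, so a high sensitivity and specificity can coexist with a modest PPV when current smokers are a minority.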

Type: Article
Title: Data mining information from electronic health records produced high yield and accuracy for current smoking status
Location: United States
Open access status: An open access version is available from UCL Discovery
DOI: 10.1016/j.jclinepi.2019.11.006
Publisher version: https://doi.org/10.1016/j.jclinepi.2019.11.006
Language: English
Additional information: © 2019 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Keywords: Data mining, Data quality, Electronic health records, Learning healthcare system, Routine clinical data, Text mining
UCL classification: UCL
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Population Health Sciences > Institute of Health Informatics
URI: https://discovery.ucl.ac.uk/id/eprint/10089072