Pinto, C;
Pita, R;
Barbosa, G;
Araujo, B;
Bertoldo, J;
Sena, S;
Reis, S;
... Denaxas, S;
(2017)
Probabilistic integration of large Brazilian socioeconomic and clinical databases.
In: Bamidis, PD; Konstantinidis, ST; Rodrigues, PP (eds.)
Proceedings of the 30th IEEE International Symposium on Computer-Based Medical Systems (CBMS).
(pp. 515-520).
IEEE: New York, USA.
Abstract
The integration of large, heterogeneous socioeconomic and clinical databases is considered essential for capturing and modelling the longitudinal and social aspects of diseases. However, such integration is challenging: the databases are stored in different locations, use different identifiers, vary in data quality, record information in bespoke, purpose-specific formats and carry different levels of metadata. Novel computational methods are required to integrate them and to enable their statistical analysis for epidemiological research. In this paper, we describe a probabilistic approach for constructing a very large population-based cohort of 114 million individuals by linking clinical databases from the National Health System with administrative databases from governmental social programmes. We present our data integration model for creating data marts (epidemiological datasets) and discuss our evaluation results in controlled and uncontrolled scenarios, which show that our model and tools achieve high accuracy (at least 91%) across different probabilistic data integration scenarios.
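The abstract does not spell out how candidate record pairs are scored. As a hedged illustration only, the sketch below shows one classic formulation of probabilistic linkage, the Fellegi-Sunter model, in which each compared field adds a log-likelihood-ratio weight and two thresholds split pairs into links, non-links and pairs for clerical review. This is not claimed to be the authors' method. The field names (`name`, `mother_name`, `birth_date`, `municipality`), the m/u probabilities, the thresholds and the example records are all hypothetical values chosen for illustration, not parameters or data from the paper.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldParams:
    m: float  # P(field agrees | pair is a true match); hypothetical value
    u: float  # P(field agrees | pair is a non-match); hypothetical value


def field_weight(agrees: bool, p: FieldParams) -> float:
    """Fellegi-Sunter log2 likelihood-ratio weight for one compared field."""
    if agrees:
        return math.log2(p.m / p.u)
    return math.log2((1 - p.m) / (1 - p.u))


def match_score(rec_a: dict, rec_b: dict, params: dict) -> float:
    """Sum the field weights; exact equality stands in for a real comparator."""
    total = 0.0
    for field, p in params.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # missing value: contributes no evidence either way
        total += field_weight(a == b, p)
    return total


def classify(score: float, upper: float, lower: float) -> str:
    """Two thresholds split pairs into links, non-links and clerical review."""
    if score >= upper:
        return "match"
    if score <= lower:
        return "non-match"
    return "review"


# Hypothetical parameters for illustration only (not taken from the paper).
PARAMS = {
    "name": FieldParams(m=0.95, u=0.01),
    "mother_name": FieldParams(m=0.90, u=0.01),
    "birth_date": FieldParams(m=0.90, u=0.003),
    "municipality": FieldParams(m=0.90, u=0.05),
}

if __name__ == "__main__":
    # Hypothetical records: same person, recorded in a different municipality.
    a = {"name": "MARIA SILVA", "mother_name": "ANA SILVA",
         "birth_date": "1990-05-01", "municipality": "2927408"}
    b = dict(a, municipality="2304400")
    s = match_score(a, b, PARAMS)
    print(round(s, 2), classify(s, upper=10.0, lower=0.0))
```

In practice the m and u probabilities are usually estimated from the data (for example with the EM algorithm), and fields are compared with approximate string similarity rather than exact equality; both refinements are left out here to keep the sketch short.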
Type: | Proceedings paper |
---|---|
Title: | Probabilistic integration of large Brazilian socioeconomic and clinical databases |
Event: | 30th IEEE International Symposium on Computer-Based Medical Systems (CBMS), 22-24 June 2017, Thessaloniki, Greece |
Location: | Aristotle University of Thessaloniki, Thessaloniki, Greece |
Dates: | 22 June 2017 - 24 June 2017 |
ISBN-13: | 978-1-5386-1711-3 |
Open access status: | An open access version is available from UCL Discovery |
DOI: | 10.1109/CBMS.2017.64 |
Language: | English |
Additional information: | This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions. |
Keywords: | Data integration; Probabilistic linkage; Health and social care data; Accuracy assessment |
UCL classification: | UCL; UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences; UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Population Health Sciences > Institute of Health Informatics; UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Population Health Sciences > Institute of Health Informatics > Clinical Epidemiology |
URI: | https://discovery.ucl.ac.uk/id/eprint/10067325 |