UCL Discovery

Disease Surveillance using User-generated Content

Zou, Bin; (2019) Disease Surveillance using User-generated Content. Doctoral thesis (Ph.D), UCL (University College London). Green open access

bin_zou_phd_thesis.pdf - Accepted Version


Abstract

Disease surveillance plays a crucial role in detecting or anticipating infectious disease outbreaks. It tracks health-related data from a population to identify and monitor early outbreaks of a disease. Traditional disease surveillance requires a widespread network of sentinel sites to track infections throughout the population. These networks are time- and labour-intensive to build and maintain, which creates an opportunity for utilizing online user-generated content. Compared to traditional data sources, online user-generated content is fast and cheap to obtain. It covers a larger population and provides data on topics with little coverage from traditional sources, so it can complement traditional disease surveillance systems. In this thesis, we focus on improving disease surveillance using online user-generated content, through machine learning and natural language processing techniques. Our contributions are threefold. First, a feature selection method, which consists of a time series similarity filter and a topic filter, is proposed. The similarity filter ensures that the selected features are good predictors, while the topic filter eliminates features that may be highly correlated with disease rates but do not refer to the target disease. Second, a multi-task learning framework for disease surveillance is proposed, where several disease surveillance models are jointly trained. Multi-task elastic net and multi-task Gaussian Processes are used for regression. The framework improves the generalization of a model by taking advantage of shared structures in the data. Third, a transfer learning framework is proposed for delivering accurate disease rate models for a target location where no ground truth information exists. The framework consists of three steps: (1) learn a regularized regression model for a source country, (2) map the source queries to target ones using semantic and temporal similarity metrics, and (3) re-adjust the weights of the target queries. To support the theoretical derivations, extensive and repeatable experiments are carried out based on large-scale real-world data. Experimental results demonstrate substantial improvements of the proposed solutions over strong baselines. In addition, we publish a website that reports real-time flu rate estimation in England (https://fludetector.cs.ucl.ac.uk/).
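The two-stage feature selection described in the abstract can be illustrated with a minimal sketch. It is not taken from the thesis: the use of Pearson correlation for the time series similarity filter, cosine similarity over query embeddings for the topic filter, the thresholds, the function names and the toy data are all assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_features(series, embeddings, disease_rates, topic_vector,
                    min_corr=0.5, min_topic_sim=0.5):
    """Keep features that pass both filters (thresholds are assumed values)."""
    selected = []
    for name, freq in series.items():
        # Time series similarity filter: the feature must track disease rates.
        if pearson(freq, disease_rates) < min_corr:
            continue
        # Topic filter: the feature must also be about the target disease.
        if cosine(embeddings[name], topic_vector) < min_topic_sim:
            continue
        selected.append(name)
    return selected


# Toy data, invented for illustration only.
rates = [1.0, 2.0, 4.0, 3.0, 1.5]
series = {
    "flu symptoms": [10, 19, 42, 31, 14],     # correlated and on topic
    "christmas gifts": [5, 11, 20, 16, 8],    # correlated but off topic
    "car insurance": [30, 31, 29, 30, 31],    # not correlated
}
embeddings = {
    "flu symptoms": [0.9, 0.1],
    "christmas gifts": [0.1, 0.9],
    "car insurance": [0.0, 1.0],
}
topic = [1.0, 0.0]  # hypothetical embedding of the target disease concept

selected = select_features(series, embeddings, rates, topic)
print(selected)
```

In this toy example the seasonal query passes the correlation filter but is rejected by the topic filter, which is the failure case the abstract says the topic filter is meant to handle.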

Type: Thesis (Doctoral)
Qualification: Ph.D
Title: Disease Surveillance using User-generated Content
Event: UCL (University College London)
Open access status: An open access version is available from UCL Discovery
Language: English
Additional information: Copyright © The Author 2019. Original content in this thesis is licensed under the terms of the Creative Commons Attribution 4.0 International (CC BY 4.0) Licence (https://creativecommons.org/licenses/by/4.0/). Any third-party copyright material present remains the property of its respective owner(s) and is licensed under its existing terms. Access may initially be restricted at the author’s request.
UCL classification: UCL
UCL > Provost and Vice Provost Offices
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science
URI: https://discovery.ucl.ac.uk/id/eprint/10067579