Enhancing Feature Selection Using Word Embeddings: The Case of Flu Surveillance

Lampos, V; Zou, B; Cox, IJ; (2017) Enhancing Feature Selection Using Word Embeddings: The Case of Flu Surveillance. In: Proceedings of the 26th International Conference on World Wide Web. (pp. 695-704). ACM: New York, NY, USA. Green open access

Abstract

Health surveillance systems based on online user-generated content often rely on the identification of textual markers that are related to a target disease. Given the high volume of available data, these systems benefit from an automatic feature selection process. This is accomplished either by applying statistical learning techniques, which do not consider the semantic relationship between the selected features and the inference task, or by developing labour-intensive text classifiers. In this paper, we use neural word embeddings, trained on social media content from Twitter, to determine, in an unsupervised manner, how strongly textual features are semantically linked to an underlying health concept. We then refine conventional feature selection methods by a priori operating on textual variables that are sufficiently close to a target concept. Our experiments focus on the supervised learning problem of estimating influenza-like illness rates from Google search queries. A "flu infection" concept is formulated and used to reduce spurious and potentially confounding features that were selected by previously applied approaches. In this way, we also address forms of scepticism regarding the appropriateness of the feature space, alleviating potential cases of overfitting. Ultimately, the proposed hybrid feature selection method creates a more reliable model that, according to our empirical analysis, improves the inference performance (Mean Absolute Error) of linear and nonlinear regressors by 12% and 28.7%, respectively.
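The concept-based filtering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how closeness to the concept is measured, so cosine similarity over averaged word vectors is assumed here, and the toy 3-dimensional embeddings, the threshold value, and the helper names (`cosine`, `embed`, `filter_by_concept`) are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def embed(text, embeddings):
    """Average the vectors of the known words in `text`; None if no word is known."""
    vectors = [embeddings[w] for w in text.lower().split() if w in embeddings]
    if not vectors:
        return None
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def filter_by_concept(queries, concept_terms, embeddings, threshold):
    """Keep only the queries whose embedding is close enough to the concept.

    Assumption: the concept is represented by the average vector of its terms,
    and closeness is cosine similarity compared against a fixed threshold.
    """
    concept = embed(" ".join(concept_terms), embeddings)
    kept = []
    for query in queries:
        vector = embed(query, embeddings)
        if vector is not None and cosine(vector, concept) >= threshold:
            kept.append(query)
    return kept


# Toy embeddings invented for illustration; real ones would be trained on a
# large corpus (the paper uses Twitter content).
TOY_EMBEDDINGS = {
    "flu": [0.9, 0.1, 0.0],
    "infection": [0.85, 0.15, 0.05],
    "symptoms": [0.75, 0.25, 0.1],
    "fever": [0.8, 0.2, 0.1],
    "cough": [0.7, 0.3, 0.0],
    "nba": [0.0, 0.1, 0.9],
    "scores": [0.1, 0.0, 0.8],
}

queries = ["flu symptoms", "fever cough", "nba scores"]
selected = filter_by_concept(
    queries, ["flu", "infection"], TOY_EMBEDDINGS, threshold=0.8
)
print(selected)  # ['flu symptoms', 'fever cough']
```

In the hybrid scheme the abstract outlines, only the surviving queries would then be passed to a conventional feature selection method and regressor, so that a query such as "nba scores" cannot be picked up merely because its frequency happens to correlate with flu rates.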

Type: Proceedings paper
Title: Enhancing Feature Selection Using Word Embeddings: The Case of Flu Surveillance
Event: WWW '17 26th International Conference on World Wide Web
Location: Perth, Australia
ISBN-13: 978-1-4503-4913-0
Open access status: An open access version is available from UCL Discovery
DOI: 10.1145/3038912.3052622
Publisher version: http://dx.doi.org/10.1145/3038912.3052622
Language: English
Additional information: © 2017 International World Wide Web Conference Committee (IW3C2), published under Creative Commons CC BY 4.0 License.
Keywords: Computational Health, Influenza-like Illness, User-Generated Content, Search Query Logs, Feature Selection, Word Embeddings, Regularised Regression, Gaussian Processes
UCL classification: UCL > Provost and Vice Provost Offices
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI: http://discovery.ucl.ac.uk/id/eprint/1549538