A Principal Component Analysis (PCA)-based framework for automated variable selection in geodemographic classification

Liu, Y; Singleton, A; Arribas-Bel, D; (2019) A Principal Component Analysis (PCA)-based framework for automated variable selection in geodemographic classification. Geo-Spatial Information Science, 22 (4), pp. 251-264. 10.1080/10095020.2019.1621549. Green open access

A Principal Component Analysis PCA based framework for automated variable selection in geodemographic classification.pdf - Published Version

Abstract

A geodemographic classification aims to describe the most salient characteristics of a small area zonal geography. However, such representations are influenced by the methodological choices made during their construction. A particular point of debate is the choice and specification of input variables, where the objective is to identify inputs that add value while keeping the model parsimonious. Within this context, our paper introduces a principal component analysis (PCA)-based automated variable selection methodology that has the objective of identifying candidate inputs to a geodemographic classification from a collection of variables. The proposed methodology is exemplified in the context of variables from the UK 2011 Census, and its output is compared to the Office for National Statistics 2011 Output Area Classification (2011 OAC). Through the implementation of the proposed methodology, the quality of the cluster assignment was improved relative to 2011 OAC, as shown by a lower total within-cluster sum of squares score. Across the UK, more than 70.2% of the Output Areas (OAs) occupied by the newly created classification (i.e. AVS-OAC) outperform the 2011 OAC, with particularly strong performance within Scotland and Wales.
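
The abstract judges cluster quality by the total within-cluster sum of squares. As a minimal sketch of that metric only (not the paper's code or its variable selection procedure), the snippet below sums the squared Euclidean distance from each point to the centroid of its assigned cluster. The function name `total_wcss` and the toy points and labels are assumptions made for illustration; they are not census data and do not come from the paper.

```python
from collections import defaultdict


def total_wcss(points, labels):
    """Total within-cluster sum of squares for a given cluster assignment.

    For every cluster, compute the centroid (per-dimension mean) and add up
    the squared Euclidean distance of each member to that centroid.
    """
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)

    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        centroid = [sum(p[j] for p in members) / len(members) for j in range(dim)]
        total += sum((p[j] - centroid[j]) ** 2 for p in members for j in range(dim))
    return total


# Hypothetical toy data: four areas described by two standardised variables.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]

# Two competing assignments of the same areas to two clusters.
compact_labels = [0, 0, 1, 1]    # groups nearby points together
scattered_labels = [0, 1, 0, 1]  # groups distant points together

wcss_compact = total_wcss(points, compact_labels)
wcss_scattered = total_wcss(points, scattered_labels)

# The assignment with the lower score has tighter clusters.
print(wcss_compact, wcss_scattered)
```

A lower score means each area sits closer to the centre of its cluster. Scores are only comparable when computed on the same observations and in the same variable space, so how the comparison between two classifications with different input variables is set up is a detail to take from the paper itself.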

Type: Article
Title: A Principal Component Analysis (PCA)-based framework for automated variable selection in geodemographic classification
Open access status: An open access version is available from UCL Discovery
DOI: 10.1080/10095020.2019.1621549
Publisher version: https://doi.org/10.1080/10095020.2019.1621549
Language: English
Additional information: This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Keywords: Geodemographics; variable selection; UK census; spatial data mining; principal component analysis
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Civil, Environ and Geomatic Eng
URI: https://discovery.ucl.ac.uk/id/eprint/10115061