UCL Discovery

Predicting Customer Preferences in Non-experimental Retail Settings

Chan, SMH; (2015) Predicting Customer Preferences in Non-experimental Retail Settings. Doctoral thesis, UCL (University College London).

Full text not available from this repository.


This thesis investigates the application of computational statistics and Machine Learning in consumer preference prediction, with specific reference to the challenges imposed by real-world operational retail environments. Some retailers base their competitiveness on Machine Learning. For instance, Dunnhumby analyzes more than 400 million online consumer records for retailers, such as Tesco, to optimise business decisions. The experiments in this thesis investigate three main challenges, commonly encountered in operational scenarios, that hinder the application of Machine Learning in retail environments:

1. The measurement of correlation of feature factors for Machine Learning in a noisy setting;
2. The exploration and exploitation balance for predicting purchase preferences on new products;
3. The model adaptability to the changing dynamics over time.

A design of a distributed Machine Learning framework for building practical applications of consumer preference prediction is also presented.

Experiment 1: Correlation between Contextual Information and Purchase Behaviours under a Non-experimental Retail Environment

The first experiment applies statistical methodologies, namely the odds ratio and the Mantel-Haenszel method, to analyze contextual information in a retail business. More specifically, it investigates the correlation between customers' recent online browsing behaviours on Boots.com and their in-store purchase behaviours at Boots' retail stores nationally in a non-experimental noisy setting. Methodologies such as stratified analysis with K-means clustering are proposed to detect and eliminate confounding factors that affect the evaluation of the correlation. The data for this experiment, provided by Boots UK, are the first year of a 2-year anonymised dataset of real in-store and online purchase records. The dataset contains profiles of 10,217,972 unique consumers who are Advantage Card holders and 2,939 unique selected products under 10 different brands.
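As an illustration of the stratified analysis described for Experiment 1, the following minimal sketch computes a crude odds ratio and a Mantel-Haenszel pooled odds ratio over strata. It is not taken from the thesis: the function names, the 2x2 table layout, and all counts are hypothetical, and the two strata simply stand in for customer segments such as those a clustering step might produce.

```python
from typing import Iterable, Tuple

# One 2x2 table per stratum, as (a, b, c, d) (layout is an assumption):
#   a = browsed online and purchased      b = browsed online, no purchase
#   c = did not browse, but purchased     d = did not browse, no purchase
Table = Tuple[int, int, int, int]


def odds_ratio(table: Table) -> float:
    """Odds ratio of a single 2x2 table: (a * d) / (b * c)."""
    a, b, c, d = table
    return (a * d) / (b * c)


def mantel_haenszel_or(strata: Iterable[Table]) -> float:
    """Mantel-Haenszel pooled odds ratio across strata.

    Each stratum contributes a*d/n to the numerator and b*c/n to the
    denominator, where n is the stratum total.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


# Hypothetical counts for two customer segments (not data from the thesis).
strata = [
    (80, 20, 16, 4),   # frequent shoppers: odds ratio within stratum is 1.0
    (4, 16, 20, 80),   # infrequent shoppers: odds ratio within stratum is 1.0
]

# Crude analysis: ignore the strata and pool all counts into one table.
pooled = tuple(sum(column) for column in zip(*strata))
crude_or = odds_ratio(pooled)

# Stratified analysis: combine the per-stratum tables.
adjusted_or = mantel_haenszel_or(strata)

print(f"crude OR = {crude_or:.2f}, Mantel-Haenszel OR = {adjusted_or:.2f}")
```

In this made-up example the crude odds ratio is well above 1 while the pooled stratified estimate is 1, which is the kind of confounding effect that stratification is meant to expose. A table with a zero in cell b or c would need separate handling.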
Experiment 2: Resources Allocation of Exploration and Exploitation for New Products under Retail Constraints

The second experiment provides a two-stage batch solution, based on matrix factorization and binary integer programming, to optimise the customer response rate to new products in a simulated group-buying system. This experiment investigates how the balance between the exploration of new products and the exploitation of the existing known model affects overall business gains through purchase prediction and recommendation. In this experiment, the products are new with no prior profile, and the number of new products a retailer can recommend to each customer is limited. The effectiveness of one of the traditional experimental design techniques in improving the learning efficiency during the exploration process is evaluated.

Experiment 3: Continuous Model Selection for a Changing Retail Environment

The third experiment investigates, using root-mean-square error and mean average precision measures, the adaptability of data models for consumer purchase prediction in a non-static retail environment. In particular, it analyzes the prediction accuracy of data models with static parameters over time. A continuous model selection approach that uses an automatic hyperparameter tuning technique, namely random search, is proposed and evaluated. The results challenge the traditional assumption that a one-off initial model selection is sufficient. The dataset for this experiment is a 2-year anonymised set of real in-store and online purchase records provided by Boots UK.

System Design: A Distributed Machine Learning Framework with Automated Modeling

This system design outlines the concept and system architecture. It also demonstrates scenarios of a distributed Machine Learning framework for (i) evaluating, comparing and deploying scalable learning algorithms, (ii) tuning hyperparameters of algorithms manually or automatically and (iii) evaluating model training status.
The design has become the foundation of a popular open-source software project, PredictionIO. The project is followed by over 5000 data scientists and practitioners on GitHub.

Contributions to Science

The major contribution of this thesis is to offer robust research-based methodologies to handle prediction challenges in real-world operational environments for retail businesses. Computational statistics and Machine Learning methodologies are proposed to 1) identify contextual factors that are relevant to consumer preference in a noisy non-experimental setting; 2) determine the importance of exploration and exploitation for new products under real-world constraints; 3) adjust the data model continuously to adapt to changes in retail environments.

This thesis contributes to the existing literature in a number of ways. First, this research proposes a novel statistical method to isolate the influence of confounding factors in correlation analysis for consumer preference prediction, a topic that has received little attention in the empirical literature. Second, this research proves the existence of the correlation between consumer online browsing and in-store purchase behaviours in a real retail dataset. This is a significant finding that the retail industry can use to improve prediction accuracy in the future. Third, this research examines the influence of the balance between exploration and exploitation of new product profiles on maximising business gains. Fourth, this research proves that random selection surprisingly outperforms D-optimal experimental design in some retail cases. Fifth, this research challenges the existing assumption that model selection is needed only once at the initial stage. It proves that prediction accuracy can be improved significantly by continuous model selection. Sixth, this thesis presents the implementation of a continuous model selection approach that uses automatic hyperparameter tuning techniques. Finally, this thesis presents a design of a distributed system that can be used for building predictive retail applications.
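The contrast between a one-off model selection and the continuous model selection described for Experiment 3 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis implementation: a simple exponential-smoothing forecaster with a single hyperparameter `alpha` stands in for the purchase-prediction models, and the function names, the search range, and the example windows are all hypothetical. Only random search and root-mean-square error come from the abstract.

```python
import math
import random
from typing import List, Sequence, Tuple


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-square error between two equally long sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def one_step_forecasts(series: Sequence[float], alpha: float) -> List[float]:
    """Exponential smoothing (stand-in model): forecast each point from earlier ones."""
    level = series[0]
    forecasts = []
    for value in series[1:]:
        forecasts.append(level)
        level = alpha * value + (1 - alpha) * level
    return forecasts


def score(series: Sequence[float], alpha: float) -> float:
    """RMSE of the one-step forecasts on a window of data."""
    return rmse(series[1:], one_step_forecasts(series, alpha))


def random_search(series: Sequence[float], n_trials: int, rng: random.Random) -> float:
    """Sample n_trials values of alpha at random and keep the best-scoring one."""
    candidates = [rng.uniform(0.01, 1.0) for _ in range(n_trials)]
    return min(candidates, key=lambda alpha: score(series, alpha))


def compare(windows: Sequence[Sequence[float]], n_trials: int = 20,
            seed: int = 0) -> Tuple[List[float], List[float]]:
    """Per-period RMSE for a static alpha versus an alpha re-tuned every period."""
    rng = random.Random(seed)
    static_alpha = random_search(windows[0], n_trials, rng)  # one-off selection
    current_alpha = static_alpha
    static_errors, continuous_errors = [], []
    for window in windows[1:]:
        static_errors.append(score(window, static_alpha))
        continuous_errors.append(score(window, current_alpha))
        # Continuous selection: re-run the search on the newest window
        # before predicting the next period.
        current_alpha = random_search(window, n_trials, rng)
    return static_errors, continuous_errors


# Hypothetical weekly sales windows whose dynamics change over time.
windows = [[1, 2, 3, 4, 5], [5, 5, 5, 5, 5], [1, 9, 1, 9, 1], [2, 8, 2, 8, 2]]
static_errors, continuous_errors = compare(windows)
print(static_errors, continuous_errors)
```

The two error lists agree for the first evaluated period, since both use the initially selected `alpha`, and can diverge afterwards. Whether re-tuning helps on a given dataset is an empirical question; the sketch makes no claim either way.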

Type: Thesis (Doctoral)
Title: Predicting Customer Preferences in Non-experimental Retail Settings
Language: English
UCL classification: UCL
UCL > Provost and Vice Provost Offices
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science
URI: https://discovery.ucl.ac.uk/id/eprint/1468426