UCL Discovery

Investigating machine learning methods in recommender systems

Michailidis, Marios; (2017) Investigating machine learning methods in recommender systems. Doctoral thesis (Ph.D), UCL (University College London). Green open access

Abstract

This thesis investigates the use of machine learning in improving predictions of the top K* product purchases at a particular retailer. The data used for this research is a freely available (for research) sample of the retailer’s transactional data, spanning a period of 102 weeks and consisting of several million observations. The thesis consists of four key experiments:

1. Univariate Analysis of the Dataset: The first experiment, a univariate analysis of the dataset, sets the background for the following chapters. It provides explanatory insight into the customers’ shopping behaviour and identifies the drivers that connect customers and products. Using various behavioural, descriptive and aggregated features, the training dataset for a group of customers is created to map their future purchasing actions for one specific week. The test dataset is then constructed to predict the purchasing actions for the forthcoming week. The chapter also introduces the features included in the subsequent algorithmic processes.

2. Meta-modelling to predict top K products: The second experiment investigates the improvement in predicting the top K products, in terms of precision at K (or precision@K) and Area Under Curve (AUC), through meta-modelling. It compares a combination of common supervised machine learning algorithms within a meta-modelling framework (where each generated model is an input to a secondary model) against any single model involved, a field benchmark, or a simple model combination method.

3. Hybrid method to predict repeated, promotion-driven product purchases in an irregular testing environment: The third experiment demonstrates a hybrid methodology of cross validation, modelling and optimization for improving the accuracy of predicting the products that a retailer’s customers will buy after having bought them at least once with a promotional coupon. This methodology is applied in a train and test environment with limited overlap: the test data includes different coupons, different customers and different time periods. Additionally, this chapter provides a real-life application and a stress test of the feature engineering findings from experiment 1. It also borrows ideas from ensemble (or meta) modelling as detailed in experiment 2.

4. The StackNet model: The fourth experiment proposes a framework in the form of a scalable version of stacked generalization [Wolpert 1992], extended through cross validation methods to many levels. Its structure resembles a fully connected feedforward neural network in which the hidden nodes represent complex functions in the form of machine learning models of any nature. The implementation of the model is made available in the Java programming language.

The research contribution of this thesis is to improve the recommendation science used in the grocery and Fast Moving Consumer Goods (FMCG) markets. It seeks to identify methods of increasing the accuracy of predicting what customers are going to buy in the future, both by leveraging up-to-date innovations in machine learning and by improving current processes in the areas of feature engineering, data pre-processing and ensemble modelling. For the general scientific community, this thesis can be used to better understand the type of data available in the grocery market and to gain insights into how to structure similar machine learning and analytical projects. The extensive computational and algorithmic framework that accompanies this thesis is also available for general use as a prototype to solve similar data challenges.

References:

Wolpert, D. H. (1992). Stacked generalization. Neural Networks, 5(2), 241-259.

Yang, X., Steck, H., Guo, Y., & Liu, Y. (2012). On top-k recommendation using social networks. In Proceedings of the sixth ACM conference on Recommender systems (pp. 67-74). ACM.
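
As an illustration of the precision@K metric named in the second experiment, the following is a minimal Python sketch, not code from the thesis (whose implementation is in Java). The function name `precision_at_k`, the product names and the scores are all hypothetical, and the sketch assumes the common definition: the fraction of the K highest-scored products that the customer actually bought.

```python
def precision_at_k(scores, purchased, k):
    """Fraction of the k highest-scored products that were actually purchased.

    scores    -- dict mapping product id to a predicted purchase score
    purchased -- set of product ids the customer actually bought
    k         -- number of top-ranked products to evaluate
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    # Rank products by descending score; ties are broken by product id
    # so that the ranking is deterministic.
    ranked = sorted(scores, key=lambda product: (-scores[product], product))
    top_k = ranked[:k]
    hits = sum(1 for product in top_k if product in purchased)
    # Dividing by k (not len(top_k)) penalises lists shorter than k.
    return hits / k


# Hypothetical scores for one customer and the products they went on to buy.
example_scores = {"milk": 0.9, "bread": 0.8, "eggs": 0.4, "tea": 0.3, "jam": 0.1}
example_purchased = {"milk", "eggs", "jam"}

if __name__ == "__main__":
    for k in (1, 3, 5):
        print(k, precision_at_k(example_scores, example_purchased, k))
```

In a setting like the one the abstract describes, a value of this kind would typically be computed per customer and then averaged, though the exact aggregation used in the thesis is not stated here.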

Type: Thesis (Doctoral)
Qualification: Ph.D
Title: Investigating machine learning methods in recommender systems
Event: UCL (University College London)
Open access status: An open access version is available from UCL Discovery
Language: English
Keywords: Machine Learning, Recommenders, Ensemble, Stacked Generalization, StackNet
UCL classification: UCL
UCL > Provost and Vice Provost Offices
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science
URI: https://discovery.ucl.ac.uk/id/eprint/10031000
Downloads since deposit: 3,015