UCL Discovery

Posterior summaries of grocery retail topic models: Evaluation, interpretability and credibility

Vega Carrasco, Mariflor; Manolopoulou, Ioanna; O'Sullivan, Jason; Prior, Rosie; Musolesi, Mirco; (2022) Posterior summaries of grocery retail topic models: Evaluation, interpretability and credibility. Journal of the Royal Statistical Society: Series C (Applied Statistics) 10.1111/rssc.12546. (In press). Green open access

Text: Published Version (PDF, 3 MB)

Abstract

Understanding the shopping motivations behind market baskets has significant commercial value for the grocery retail industry. The analysis of shopping transactions demands techniques that can cope with the volume and dimensionality of grocery transactional data while delivering interpretable outcomes. Latent Dirichlet allocation (LDA) makes it possible to process grocery transactions and discover customer behaviours. Interpretations of topic models typically rely on individual posterior samples, overlooking the uncertainty of single topics. Moreover, training LDA multiple times reveals topics with large uncertainty, that is, topics that (dis)appear in some but not all posterior samples, a finding that concurs with various authors in the field. In response, we introduce a clustering methodology that post-processes posterior LDA draws to summarise topic distributions as recurrent topics. Our approach identifies clusters of topics that belong to different samples and provides associated measures of uncertainty for each group. Our proposed methodology allows the identification of an unconstrained number of customer behaviours presented as recurrent topics. We also establish a more holistic framework for model evaluation, which assesses topic models based not only on their predictive likelihood but also on quality aspects such as the coherence and distinctiveness of single topics and the credibility of a set of topics. Using the outcomes of a tailored survey, we set thresholds that aid in interpreting these quality aspects in grocery retail data. We demonstrate that selecting recurrent topics not only improves predictive likelihood but also improves interpretability and credibility. We illustrate our methods with an example from a large British supermarket chain.
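The abstract describes post-processing posterior LDA draws by clustering topics that come from different samples and attaching an uncertainty measure to each cluster. The paper's actual clustering procedure, distance and thresholds are not stated here, so the following is only a minimal sketch under assumptions of our own: each posterior sample is a list of topics (probability vectors over the same product vocabulary), topics are assigned greedily to the nearest cluster centroid by Hellinger distance, and the hypothetical `recurrence` score (the share of posterior samples contributing at least one topic to a cluster) stands in for a credibility measure. The names `hellinger`, `recurrent_topics` and `threshold` are illustrative, not taken from the paper.

```python
import math
from typing import Dict, List

Topic = List[float]  # a topic: probability distribution over products


def hellinger(p: Topic, q: Topic) -> float:
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    s = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(s / 2.0)


def recurrent_topics(samples: List[List[Topic]], threshold: float = 0.3) -> List[Dict]:
    """Greedily cluster topics pooled from several posterior samples (illustrative only).

    Each topic joins the nearest cluster whose centroid lies within `threshold`
    (Hellinger distance); otherwise it starts a new cluster. Each cluster reports
    its averaged distribution, its size, and the fraction of posterior samples
    that contributed at least one topic to it.
    """
    clusters = []  # each: {"members": [(sample_idx, topic), ...], "centroid": Topic}
    for s_idx, topics in enumerate(samples):
        for topic in topics:
            best, best_d = None, threshold
            for c in clusters:
                d = hellinger(topic, c["centroid"])
                if d <= best_d:
                    best, best_d = c, d
            if best is None:
                clusters.append({"members": [(s_idx, topic)], "centroid": list(topic)})
            else:
                best["members"].append((s_idx, topic))
                n = len(best["members"])
                # Recompute the centroid as the mean of all member topics.
                best["centroid"] = [
                    sum(t[v] for _, t in best["members"]) / n for v in range(len(topic))
                ]
    n_samples = len(samples)
    summary = []
    for c in clusters:
        contributing = {s for s, _ in c["members"]}
        summary.append({
            "centroid": c["centroid"],
            "size": len(c["members"]),
            "recurrence": len(contributing) / n_samples,
        })
    # Most recurrent (least uncertain) clusters first.
    return sorted(summary, key=lambda r: r["recurrence"], reverse=True)


if __name__ == "__main__":
    # Toy data over 4 products: B and B2 are near-identical "breakfast-like" topics,
    # C is a stable topic, R appears in only one posterior sample.
    B, B2 = [0.5, 0.5, 0.0, 0.0], [0.6, 0.4, 0.0, 0.0]
    C, R = [0.0, 0.0, 0.5, 0.5], [0.0, 0.0, 0.0, 1.0]
    samples = [[B, C], [B2, C], [B, R]]
    for r in recurrent_topics(samples):
        print(r["size"], round(r["recurrence"], 2))
```

On the toy data this prints `3 1.0`, `2 0.67` and `1 0.33`: the topic recovered in every sample is the most credible, while the topic seen in a single sample carries the most uncertainty. A greedy pass is order-dependent; a proper implementation might prefer hierarchical or constrained clustering, which is a design choice the sketch does not attempt to settle.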

Type: Article
Title: Posterior summaries of grocery retail topic models: Evaluation, interpretability and credibility
Open access status: An open access version is available from UCL Discovery
DOI: 10.1111/rssc.12546
Publisher version: https://doi.org/10.1111/rssc.12546
Language: English
Additional information: This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third-party material in this article are included in the Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
UCL classification: UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Maths and Physical Sciences
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Maths and Physical Sciences > Dept of Statistical Science
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL
URI: https://discovery.ucl.ac.uk/id/eprint/10147095
Downloads since deposit: 197
