UCL Discovery

A Machine Learning view of Distribution Estimation: Efficient Computation of Empirical Proper Losses, Mixed in/out-of-sample Asymptotics of Empirical Proper Losses, a Unified Machine Learning Interface for Density Estimation, and a Systematic Benchmarking Experiment

Toha, Nurul Ain binti; (2022) A Machine Learning view of Distribution Estimation: Efficient Computation of Empirical Proper Losses, Mixed in/out-of-sample Asymptotics of Empirical Proper Losses, a Unified Machine Learning Interface for Density Estimation, and a Systematic Benchmarking Experiment. Doctoral thesis (Ph.D), UCL (University College London). Green open access

Abstract

Probability distributions are fundamental to statistics: they describe the behaviour of a dataset. Distribution estimation is the task of estimating the distribution that generated a dataset. In machine learning, distribution estimation has been viewed as an unsupervised task because it uses unpaired datasets. One focus of this thesis is to frame, explore and investigate distribution estimation as a supervised learning task (Chapter 3). The goal is to learn, from an unpaired dataset, a function that predicts the distribution of the dataset. Loss functions are used to evaluate the accuracy of a prediction with respect to the true value. In the supervised distribution estimation task, the form of a loss function depends on the type of estimator, because it compares each input data point with its predicted distribution. Hence, we present an efficient method to derive analytic expressions of three probabilistic loss functions for evaluating standard kernel and kernel mixture distributions at an observation point (Chapter 5). The method uses the properties of kernel functions and elementary integration. Loss functions are also used for parameter tuning. We investigate how the in-sample and out-of-sample empirical versions of two loss functions, (1) log-loss and (2) probabilistic squared loss (PSL), differ in behaviour for a Gaussian kernel PDF estimator as the bandwidth goes to 0 and to infinity (Chapter 6). To support consistent training, prediction and evaluation steps for distribution estimation in R, we investigate and implement a unified interface for distribution estimation and integrate it into the package mlr3proba (Chapter 7). Lastly, we conduct a benchmarking experiment that compares multiple distribution learners on multiple datasets and evaluates them using log-loss, probabilistic squared loss (PSL) and integrated Brier loss (IBL) (Chapter 8). The learner with the minimum out-of-sample empirical loss is selected as the best, and all learners are ranked using the evaluation results.
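
As a rough illustration of the ideas in the abstract, the following Python sketch evaluates log-loss and PSL for a Gaussian kernel density estimator in closed form and compares in-sample with out-of-sample empirical loss as the bandwidth shrinks. It is not taken from the thesis or from mlr3proba. The function names, the toy data, and the PSL convention L(f, x) = -2 f(x) + ∫ f(y)² dy are assumptions made for this example; the only mathematical fact used is that the product of two Gaussian kernels with standard deviation h integrates to a Gaussian density with standard deviation h√2 evaluated at the difference of their centres.

```python
import math

def gauss_pdf(u, sd):
    """Density of N(0, sd^2) evaluated at u."""
    return math.exp(-0.5 * (u / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def kde(x, train, h):
    """Gaussian kernel density estimate at x with bandwidth h."""
    return sum(gauss_pdf(x - xi, h) for xi in train) / len(train)

def squared_norm(train, h):
    """Closed form of the integral of kde(y)^2 over the real line.

    Each pair of kernels contributes a Gaussian with sd h*sqrt(2),
    evaluated at the distance between the two kernel centres.
    """
    n = len(train)
    s = h * math.sqrt(2.0)
    return sum(gauss_pdf(xi - xj, s) for xi in train for xj in train) / n ** 2

def log_loss(x, train, h):
    """Log-loss of the estimated density at observation x."""
    return -math.log(kde(x, train, h))

def psl(x, train, h):
    """Probabilistic squared loss under the convention assumed above."""
    return -2.0 * kde(x, train, h) + squared_norm(train, h)

def mean_loss(loss, points, train, h):
    """Empirical loss: average of the pointwise loss over `points`."""
    return sum(loss(x, train, h) for x in points) / len(points)

# Hypothetical toy data, chosen only for illustration.
train = [-1.3, -0.4, 0.1, 0.6, 1.9]
test = [-0.8, 0.3, 1.1]

for h in (1.0, 0.3, 0.05):
    in_sample = mean_loss(log_loss, train, train, h)   # evaluated on training points
    out_sample = mean_loss(log_loss, test, train, h)   # evaluated on held-out points
    print(f"h={h}: in-sample log-loss={in_sample:.3f}, out-of-sample={out_sample:.3f}")
```

On this toy data the in-sample log-loss keeps decreasing as h shrinks, because every training point sits under its own kernel, while the out-of-sample log-loss grows. This is the kind of in-sample versus out-of-sample discrepancy that makes the choice of evaluation set matter when tuning the bandwidth.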

Type: Thesis (Doctoral)
Qualification: Ph.D
Title: A Machine Learning view of Distribution Estimation: Efficient Computation of Empirical Proper Losses, Mixed in/out-of-sample Asymptotics of Empirical Proper Losses, a Unified Machine Learning Interface for Density Estimation, and a Systematic Benchmarking Experiment
Open access status: An open access version is available from UCL Discovery
Language: English
Additional information: Copyright © The Author 2022. Original content in this thesis is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) Licence (https://creativecommons.org/licenses/by-nc/4.0/). Any third-party copyright material present remains the property of its respective owner(s) and is licensed under its existing terms. Access may initially be restricted at the author’s request.
UCL classification: UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Maths and Physical Sciences
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Maths and Physical Sciences > Dept of Statistical Science
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL
URI: https://discovery.ucl.ac.uk/id/eprint/10144288
