UCL Discovery
UCL home » Library Services » Electronic resources » UCL Discovery

Machine Learning Developments in Dependency Modelling and Feature Extraction

Toczydlowska, Dorota; (2020) Machine Learning Developments in Dependency Modelling and Feature Extraction. Doctoral thesis (Ph.D), UCL (University College London). Green open access

[thumbnail of Toczydlowska_000_Thesis.pdf]
Preview
Text
Toczydlowska_000_Thesis.pdf

Download (21MB) | Preview

Abstract

Three complementary feature extraction approaches are developed in this thesis which addresses the challenge of dimensionality reduction in the presence of multivariate heavy-tailed and asymmetric distributions. First, we demonstrate how to improve the robustness of the standard Probabilistic Principal Component Analysis by adapting the concept of robust mean and covariance estimation within the standard framework. We then introduce feature extraction methods that extend the standard Principal Component Analysis by exploring distribution-based robustification. This is achieved via Probabilistic Principal Component Analysis (PPCA), in which new, statistically robust variants are derived, also treating missing data. We propose a novel generalisation to the t-Student Probabilistic Principal Component methodology which (1) accounts for asymmetric distribution of the observation data, (2) is a framework for grouped and generalised multiple-degree-of-freedom structures, which provides a more flexible framework to model groups of marginal tail dependence in the observation data, and (3) separates the tail effect of the error terms and factors. The new feature extraction methods are derived in an incomplete data setting to efficiently handle the presence of missing values in the observation vector. We discuss statistical properties of their robustness. In the next part of this thesis, we demonstrate the applicability of feature extraction methods to the statistical analysis of multidimensional dynamics. We introduce the class of Hybrid Factor models that combines classical state-space model formulations with incorporation of exogenous factors. We show how to utilize the information obtained from features extracted using introduced robust PPCA in a modelling framework in a meaningful and parsimonious manner. In the first application study, we show the applicability of robust feature extraction methods in the real data environment of financial markets and combine the obtained results with a stochastic multi-factor panel regression-based state-space model in order to model the dynamic of yield curves, whilst incorporating regression factors. We embed the rank-reduced feature extractions into a stochastic representation of state-space models for yield curve dynamics and compare the results to classical multi-factor dynamic Nelson-Siegel state-space models. This leads to important new representations of yield curve models that can have practical importance for addressing questions of financial stress testing and monetary policy interventions which can efficiently incorporate financial big data. We illustrate our results on various financial and macroeconomic data sets from the Euro Zone and international markets. In the second study, we develop a multi-factor extension of the family of Lee-Carter stochastic mortality models. We build upon the time, period and cohort stochastic model structure to include exogenous observable demographic features that can be used as additional factors to improve model fit and forecasting accuracy. We develop a framework in which (a) we employ projection-based techniques of dimensionality reduction that are amenable to different structures of demographic data; (b) we analyse demographic data sets from the patterns of missingness and the impact of such missingness on the feature extraction; (c) we introduce a class of multi-factor stochastic mortality models incorporating time, period, cohort and demographic features, which we develop within a Bayesian state-space estimation framework. Finally (d) we develop an efficient combined Markov chain and filtering framework for sampling the posterior and forecasting. We undertake a detailed case study on the Human Mortality Database demographic data from European countries and we use the extracted features to better explain the term structure of mortality in the UK over time for male and female populations. This is compared to a pure Lee-Carter stochastic mortality model, demonstrating that our feature extraction framework and consequent multi-factor mortality model improves both in-sample fit and, importantly, out-of-sample mortality forecasts by a non-trivial gain in performance.

Type: Thesis (Doctoral)
Qualification: Ph.D
Title: Machine Learning Developments in Dependency Modelling and Feature Extraction
Event: UCL (University College London)
Open access status: An open access version is available from UCL Discovery
Language: English
Additional information: Copyright © The Author 2020. Original content in this thesis is licensed under the terms of the Creative Commons Attribution 4.0 International (CC BY 4.0) Licence (https://creativecommons.org/licenses/by/4.0/). Any third-party copyright material present remains the property of its respective owner(s) and is licensed under its existing terms. Access may initially be restricted at the author’s request.
UCL classification: UCL
UCL > Provost and Vice Provost Offices
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Maths and Physical Sciences
URI: https://discovery.ucl.ac.uk/id/eprint/10090828
Downloads since deposit
224Downloads
Download activity - last month
Download activity - last 12 months
Downloads by country - last 12 months

Archive Staff Only

View Item View Item