Koukorinis, Andreas;
(2024)
Kernel-based Machine Learning Methods for High-Frequency Financial Time Series Analysis.
Doctoral thesis (Ph.D), UCL (University College London).
Text: Koukorinis_10197136_Thesis.pdf (18MB). Access restricted to UCL open access staff until 1 October 2025.
Abstract
This thesis investigates machine learning and statistical methodologies for modelling time series in order to classify and examine market dynamics and transitions. The research is motivated by the impact of algorithmic trading as observed in trade-and-quote data (henceforth, TAQ), which introduces challenges in modelling non-stationarities with irregular information flow. Across four chapters, innovative applications for time series modelling are discussed. The focus is on short-term market regime classification and on statistical comparison testing of multifractal detrended fluctuation analysis (MFDFA) statistics, emphasizing the development of adaptive and interpretable frameworks.
1. Methodology
Chapter 3 establishes the methodological background of the research topics. We begin with the representation problem, then move to kernel methods and learning paradigms, and finally to hidden Markov models. We also discuss algorithms for constructing generative feature spaces, multiple kernel learning approaches, and kernel target alignment. Maximum mean discrepancy is introduced for measuring distances between distributions in a reproducing kernel Hilbert space (RKHS). Finally, we discuss long-memory and persistence concepts, including appropriate statistical methods for time series data. This theoretical framework supports the models developed in the case studies of Chapters 5 and 6. Appendix C provides a technical treatment of kernels and related topics.
2. Data and statistical transformations
This chapter introduces TAQ data and techniques to analyze such datasets. It presents the information clock, an alternative to calendar-based sampling that samples data based on market activity variables. An algorithm is introduced to create appropriate class labels for classifying imbalanced data. Visualization techniques such as t-SNE and RadViz are used to demonstrate how kernelization improves classification accuracy.
This chapter lays the foundation for the subsequent data-driven approaches, highlighting their adaptability to evolving market dynamics and the robustness of our methodology.
3. Generative-discriminative machine learning models for regime classification
Chapter 5 presents a novel hybrid learning framework. The proposed HMM-SVM-MKL (hidden Markov kernel machine) methodology uses model-based generative feature embeddings to extract informative features from TAQ data, capturing market dynamics and improving predictive accuracy. This is demonstrated by classifying six intraday regimes using data from 40 FTSE100 stocks.
4. Statistical investigation of memory and persistence in interest rate futures contracts
Chapter 6 examines the memory and persistence properties of limit order book-related quantities (from TAQ data) of interest rate futures contracts. Through the use of information clocks, multifractal techniques, and distributional testing, we investigate the statistical properties of these quantities, revealing significant variations in memory exponents, multifractal spectra, and power-law tails.
Contributions to science
This research contributes to high-frequency financial data modelling and its practical applications, encompassing algorithmic trading strategies, risk management frameworks, and the analysis of microstructure phenomena. We also present the methodology for the data and its motivating literature in Chapter 4.
1. Model-based generative feature embeddings
Model-based generative feature embeddings have been developed to extract features from TAQ time series data. These embeddings are designed to capture the features of the data that reflect the patterns and behaviors of market microstructure. This approach improves the interpretability and effectiveness of machine learning models applied to financial time series, enabling more accurate predictions and better insights into market dynamics.
2. Hybrid machine learning approach
A hybrid machine learning approach, HMM-SVM-MKL, is proposed, integrating hidden Markov models and kernel learning (single or multiple). This combination leverages the strengths of generative models in capturing temporal dynamics and the discriminative power of kernel-based methods. The framework captures complex market behaviors and improves predictive accuracy, as demonstrated empirically by classifying six distinct intraday regimes using high-frequency data from 40 FTSE100 stocks.
3. Framework for statistical investigation of long memory and persistence
Chapter 6 examines the memory and persistence properties of microstructure-linked variables in interest rate futures contracts. A combination of multifractal detrended fluctuation analysis and kernel two-sample testing is used to test the distributions of statistical characteristics of limit order book variables. The results provide insights into long memory and persistence, enabling Bayesian inferences about the dominant mechanisms governing asset dynamics across trading timeframes. The use of information clocks ensures that the true dynamics of market behavior are captured. This contribution advances the field by offering a structured approach to analyzing order book-linked MFDFA data.
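Kernel target alignment, listed in the abstract among the Chapter 3 methods, scores how closely a Gram matrix K agrees with the "ideal" kernel yy^T built from labels y_i in {-1, +1}: A(K, yy^T) = <K, yy^T>_F / (||K||_F * ||yy^T||_F). The sketch below is a generic illustration under stated assumptions, not the thesis's code: the Gaussian (RBF) kernel, its `gamma` parameter and the function names `rbf_kernel` and `kernel_target_alignment` are hypothetical choices.

```python
import numpy as np


def rbf_kernel(X, gamma):
    """Gaussian (RBF) Gram matrix k(x, x') = exp(-gamma * ||x - x'||^2)."""
    X = np.asarray(X, dtype=float)
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    # Clip tiny negative distances caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def kernel_target_alignment(K, y):
    """Alignment between Gram matrix K and the target kernel y y^T (labels in {-1, +1})."""
    y = np.asarray(y, dtype=float)
    target = np.outer(y, y)
    # np.linalg.norm on a 2-D array defaults to the Frobenius norm.
    return np.sum(K * target) / (np.linalg.norm(K) * np.linalg.norm(target))


if __name__ == "__main__":
    X = np.array([[0.0], [0.1], [5.0], [5.1]])   # two well-separated toy clusters
    y = np.array([1, 1, -1, -1])
    for gamma in (0.001, 1.0, 1000.0):
        print(gamma, kernel_target_alignment(rbf_kernel(X, gamma), y))
```

Scanning `gamma` and keeping the most aligned value is one simple way to pre-select a kernel before training an SVM. In this toy case the intermediate `gamma` scores highest: a very small value makes every point look similar, and a very large one makes every point look unrelated.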
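The abstract uses maximum mean discrepancy (Chapter 3) and kernel two-sample testing (Chapter 6) without giving the statistic. For a kernel k, the population quantity is MMD^2(P, Q) = E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)], with x, x' drawn from P and y, y' drawn from Q; it is zero when P = Q for characteristic kernels such as the Gaussian. The following is a minimal sketch of the generic unbiased estimator with a permutation p-value. The Gaussian kernel, the fixed `bandwidth=1.0` (the median pairwise distance is a common alternative), `n_perm` and the function names are assumptions, not the thesis's implementation.

```python
import numpy as np


def gaussian_gram(A, B, bandwidth):
    """Gaussian kernel matrix k(a, b) = exp(-||a - b||^2 / (2 * bandwidth^2))."""
    sq_dists = (np.sum(A ** 2, axis=1)[:, None]
                + np.sum(B ** 2, axis=1)[None, :]
                - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth ** 2))


def mmd2_unbiased(X, Y, bandwidth):
    """Unbiased estimate of MMD^2 between samples X (m x d) and Y (n x d)."""
    m, n = len(X), len(Y)
    Kxx = gaussian_gram(X, X, bandwidth)
    Kyy = gaussian_gram(Y, Y, bandwidth)
    Kxy = gaussian_gram(X, Y, bandwidth)
    # Exclude the diagonal k(x_i, x_i) terms from the within-sample averages.
    within_x = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    within_y = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return within_x + within_y - 2.0 * Kxy.mean()


def mmd_permutation_test(X, Y, bandwidth=1.0, n_perm=500, seed=0):
    """Test P == Q; the null distribution comes from randomly relabelling the pooled sample."""
    X = np.asarray(X, dtype=float).reshape(len(X), -1)
    Y = np.asarray(Y, dtype=float).reshape(len(Y), -1)
    observed = mmd2_unbiased(X, Y, bandwidth)
    pooled = np.vstack([X, Y])
    m = len(X)
    rng = np.random.default_rng(seed)
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        if mmd2_unbiased(pooled[idx[:m]], pooled[idx[m:]], bandwidth) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    a = rng.normal(0.0, 1.0, 200)
    b = rng.standard_t(df=3, size=200)   # heavier tails, same centre
    print(mmd_permutation_test(a, b))
```

The permutation approach avoids relying on an asymptotic null distribution, at the cost of recomputing the statistic `n_perm` times.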
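The abstract describes the information clock only as sampling "based on market activity variables" and does not say which variable is used. As a minimal sketch, assuming a volume clock (one observation each time a further `bucket_volume` units of the asset have traded), the resampling step could look like this. The helper `volume_clock`, its parameters and the toy data are hypothetical.

```python
import numpy as np


def volume_clock(timestamps, prices, volumes, bucket_volume):
    """Return the trades at which cumulative volume crosses a new multiple of bucket_volume.

    Hypothetical helper: a volume clock is one possible information clock;
    other activity variables (trade count, dollar volume) work the same way.
    """
    timestamps = np.asarray(timestamps)
    prices = np.asarray(prices, dtype=float)
    cum_volume = np.cumsum(np.asarray(volumes, dtype=float))
    # Number of completely filled volume buckets after each trade.
    bucket = np.floor(cum_volume / bucket_volume).astype(int)
    previous = np.concatenate(([0], bucket[:-1]))
    # A trade ticks the clock when it completes at least one new bucket;
    # a single very large trade that fills several buckets yields one tick.
    ticks = np.flatnonzero(bucket > previous)
    return timestamps[ticks], prices[ticks]


if __name__ == "__main__":
    t = [0.0, 1.5, 2.0, 7.2, 7.3, 9.9]          # toy trade times in seconds
    p = [100.0, 100.2, 100.1, 100.4, 100.3, 100.6]
    v = [300, 300, 500, 100, 800, 200]
    clock_t, clock_p = volume_clock(t, p, v, bucket_volume=1000)
    log_returns = np.diff(np.log(clock_p))       # returns measured in volume time
    print(clock_t, log_returns)
```

Sampling in activity time rather than calendar time means quiet periods contribute few observations and busy periods many, which is the motivation the abstract gives for irregular information flow.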
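The abstract relies on MFDFA for memory exponents and multifractal spectra but does not spell out the procedure. The standard recipe builds the profile Y(i) = sum_{k<=i} (x_k - mean(x)), splits it into N_s = floor(N/s) segments of length s, removes a polynomial trend of order m from each segment, and averages the residual variances F^2(v, s) as F_q(s) = [ (1/N_s) sum_v F^2(v, s)^(q/2) ]^(1/q), using a logarithmic average when q = 0. The generalized Hurst exponent h(q) is the slope of log F_q(s) against log s: h(2) near 0.5 indicates no long memory, h(2) > 0.5 indicates persistence, and an h(q) that varies with q signals multifractality. The sketch below is a compact, assumption-laden version: it uses only segments counted from the start of the series (the usual method also counts from the end), and the function name, scales, q values and detrending order are illustrative.

```python
import numpy as np


def mfdfa(x, scales, qs, order=1):
    """Generalized Hurst exponents h(q) via multifractal detrended fluctuation analysis."""
    x = np.asarray(x, dtype=float)
    # Step 1: profile (cumulative sum of the mean-removed series).
    profile = np.cumsum(x - x.mean())
    log_F = np.empty((len(qs), len(scales)))
    for j, s in enumerate(scales):
        # Step 2: non-overlapping segments of length s, taken from the start only.
        n_seg = len(profile) // s
        segments = profile[: n_seg * s].reshape(n_seg, s).T   # shape (s, n_seg)
        t = np.arange(s)
        # Step 3: fit a polynomial of degree `order` in every segment at once
        # and keep the mean squared residual F^2(v, s) of each segment.
        coeffs = np.polyfit(t, segments, order)               # shape (order + 1, n_seg)
        trend = np.vander(t, order + 1) @ coeffs              # shape (s, n_seg)
        f2 = np.mean((segments - trend) ** 2, axis=0)
        # Step 4: q-th order fluctuation function (log-average when q == 0).
        for i, q in enumerate(qs):
            if q == 0:
                log_F[i, j] = 0.5 * np.mean(np.log(f2))
            else:
                log_F[i, j] = np.log(np.mean(f2 ** (q / 2.0))) / q
    # Step 5: h(q) is the slope of log F_q(s) against log s.
    log_s = np.log(np.asarray(scales, dtype=float))
    return np.array([np.polyfit(log_s, row, 1)[0] for row in log_F])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noise = rng.standard_normal(2 ** 14)   # uncorrelated increments: h(2) should be near 0.5
    print(mfdfa(noise, scales=[16, 32, 64, 128, 256, 512], qs=[-4, -2, 0, 2, 4]))
```

In a pipeline like the one the abstract describes, `x` would be an order-book quantity already resampled on an information clock; the resulting h(q) curves are the kind of statistics that a kernel two-sample test could then compare across contracts or timeframes.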
Type: | Thesis (Doctoral) |
---|---|
Qualification: | Ph.D |
Title: | Kernel-based Machine Learning Methods for High-Frequency Financial Time Series Analysis |
Language: | English |
Additional information: | Copyright © The Author 2024. Original content in this thesis is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) Licence (https://creativecommons.org/licenses/by-nc/4.0/). Any third-party copyright material present remains the property of its respective owner(s) and is licensed under its existing terms. Access may initially be restricted at the author’s request. |
Keywords: | Kernels, Multifractal Detrended Fluctuation Analysis, Limit Order Book, Algorithmic Trading |
UCL classification: | UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science |
URI: | https://discovery.ucl.ac.uk/id/eprint/10197136 |



