Perusquia, José A; Griffin, Jim E; Villa, Cristiano (2025) Beta-CoRM: A Bayesian Approach for n-gram Profiles Analysis. Computational Statistics and Data Analysis (In press).
Text: main.pdf - Accepted Version. Access restricted to UCL open access staff until 14 March 2025.
Abstract
n-gram profiles have been widely and successfully used to analyse long sequences of potentially differing lengths for clustering or classification. Machine learning algorithms have mainly been used for this purpose but, despite their predictive performance, these methods cannot discover hidden structures or provide a full probabilistic representation of the data. To address this, a novel class of Bayesian generative models is proposed for n-gram profiles used as binary attributes. The flexibility of the proposed modelling allows a straightforward approach to feature selection within the generative model. Furthermore, a slice sampling algorithm is derived for fast inference; applying it to synthetic and real data scenarios shows that feature selection can improve classification accuracy.
Type: | Article |
---|---|
Title: | Beta-CoRM: A Bayesian Approach for n-gram Profiles Analysis |
Publisher version: | https://www.sciencedirect.com/journal/computationa... |
Language: | English |
Additional information: | This version is the author-accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions. |
Keywords: | Bayesian statistics, feature selection, labeled data, n-grams, supervised learning |
UCL classification: | UCL; UCL > Provost and Vice Provost Offices > UCL BEAMS; UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Maths and Physical Sciences; UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Maths and Physical Sciences > Dept of Statistical Science |
URI: | https://discovery.ucl.ac.uk/id/eprint/10196456 |