eprintid: 10045161
rev_number: 22
eprint_status: archive
userid: 608
dir: disk0/10/04/51/61
datestamp: 2018-03-13 12:38:53
lastmod: 2021-12-27 23:03:22
status_changed: 2020-03-06 10:20:01
type: article
metadata_visibility: show
creators_name: Wan, C
creators_name: Freitas, AA
title: An empirical evaluation of hierarchical feature selection methods for classification in bioinformatics datasets with gene ontology-based features
ispublished: pub
divisions: UCL
divisions: B04
divisions: C05
keywords: Hierarchical feature selection; Classification; Machine learning; Data mining; Bayesian classifiers; K-Nearest Neighbors; Biology of aging
note: © Springer Science+Business Media Dordrecht 2017. This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
abstract: Hierarchical feature selection is a new research area in machine learning/data mining, which consists of performing feature selection by exploiting dependency relationships among hierarchically structured features. This paper evaluates four hierarchical feature selection methods, i.e., HIP, MR, SHSEL and GTD, used together with four types of lazy learning-based classifiers, i.e., Naïve Bayes, Tree Augmented Naïve Bayes, Bayesian Network Augmented Naïve Bayes and k-Nearest Neighbors classifiers. These four hierarchical feature selection methods are compared with each other and with a well-known “flat” feature selection method, i.e., Correlation-based Feature Selection. The adopted bioinformatics datasets consist of aging-related genes used as instances and Gene Ontology terms used as hierarchical features. The experimental results reveal that the HIP (Select Hierarchical Information Preserving Features) method performs best overall, in terms of predictive accuracy and robustness when coping with data where the instances’ classes have a substantially imbalanced distribution. This paper also reports a list of the Gene Ontology terms that were most often selected by the HIP method.
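The abstract describes hierarchical feature selection as exploiting dependency relationships among hierarchically structured features such as Gene Ontology terms. The sketch below illustrates one such relationship, hierarchical redundancy. It is loosely modelled on the idea behind HIP but is not the authors' implementation. It assumes binary GO-term features that obey the true-path rule: a gene annotated with a term is also annotated with all of that term's ancestors. The toy hierarchy, the term names, and the function names `ancestors` and `hip_like_select` are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical toy GO-like hierarchy (a DAG): each term maps to its direct parents.
PARENTS: Dict[str, List[str]] = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "C": ["A"],
    "D": ["A", "B"],  # a term may have several parents in a DAG
    "E": ["C"],
}


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return all proper ancestors of `term` by walking parent links."""
    seen: Set[str] = set()
    stack = list(parents[term])
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents[p])
    return seen


def hip_like_select(values: Dict[str, int], parents: Dict[str, List[str]]) -> Set[str]:
    """Keep the features of one instance that are not hierarchically redundant.

    Assumption: `values` obeys the true-path rule, so a value-1 term implies
    all its ancestors are 1, and a value-0 term implies all descendants are 0.
    - A value-1 term is redundant if some descendant also has value 1.
    - A value-0 term is redundant if some ancestor also has value 0.
    """
    anc = {t: ancestors(t, parents) for t in parents}
    selected: Set[str] = set()
    for t, v in values.items():
        if v == 1:
            # t is redundant if another positive term lists t as an ancestor.
            if not any(values[d] == 1 and t in anc[d] for d in values):
                selected.add(t)
        else:
            # t is redundant if one of its ancestors is already negative.
            if not any(values[a] == 0 for a in anc[t]):
                selected.add(t)
    return selected


# A gene annotated with E (hence also C, A and root), but not with B or D.
gene = {"root": 1, "A": 1, "B": 0, "C": 1, "D": 0, "E": 1}
print(sorted(hip_like_select(gene, PARENTS)))
```

For this toy gene the sketch keeps only `E`, the most specific positive term, and `B`, the most general negative term. The remaining values can be inferred from these two through the hierarchy. Because the selection depends on each instance's own feature values, methods of this kind are naturally paired with lazy learning classifiers, which matches the pairing the abstract describes.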
date: 2018-08
date_type: published
official_url: http://doi.org/10.1007/s10462-017-9541-y
oa_status: green
full_text_type: other
language: eng
primo: open
primo_central: open_green
article_type_text: Article
verified: verified_manual
elements_id: 1457370
doi: 10.1007/s10462-017-9541-y
lyricists_name: Wan, Cen
lyricists_id: CWANX32
actors_name: Wan, Cen
actors_id: CWANX32
actors_role: owner
full_text_status: public
publication: Artificial Intelligence Review
volume: 50
pagerange: 201-240
citation: Wan, C; Freitas, AA; (2018) An empirical evaluation of hierarchical feature selection methods for classification in bioinformatics datasets with gene ontology-based features. Artificial Intelligence Review, 50 pp. 201-240. 10.1007/s10462-017-9541-y <https://doi.org/10.1007/s10462-017-9541-y>. Green open access
 
document_url: https://discovery.ucl.ac.uk/id/eprint/10045161/1/AI-Review-Wan.pdf