Giollo, M; Jones, DT; Carraro, M; Leonardi, E; Ferrari, C; Tosatto, SC; (2017) Crohn Disease Risk Prediction-Best Practices and Pitfalls with Exome Data. Human Mutation, 38 (9) pp. 1193-1200. 10.1002/humu.23177.
Text: humu23177.pdf - Accepted Version (1MB)
Abstract
The Critical Assessment of Genome Interpretation (CAGI) experiment is the first attempt to evaluate the state of the art in genetic data interpretation. Among the proposed challenges, Crohn disease (CD) risk prediction has become the most classic problem, spanning three editions. The scientific question is very hard: can the risk of developing CD be assessed from exome data alone? This is one of the ultimate goals of genetic analysis, which motivated most CAGI participants to look for powerful new methods. In the 2016 CD challenge we implemented all the best methods proposed in the past editions. This resulted in 10 algorithms, which were evaluated fairly by CAGI organizers. We also used all the data available from CAGI 11 and 13 to maximize the number of training samples. The most effective algorithms used genes known from the literature to be associated with CD. No method could effectively evaluate the importance of unannotated variants by using heuristics. As a downside, all CD datasets were strongly affected by sample stratification, which in turn affected the performance reported by assessors. Therefore, we expect that future datasets will be normalized in order to remove population effects. This will improve the comparison of methods and promote algorithms focused on causal variant discovery.
Type: | Article |
---|---|
Title: | Crohn Disease Risk Prediction-Best Practices and Pitfalls with Exome Data |
Location: | United States |
Open access status: | An open access version is available from UCL Discovery |
DOI: | 10.1002/humu.23177 |
Publisher version: | http://dx.doi.org/10.1002/humu.23177 |
Language: | English |
Additional information: | This is the peer reviewed version of the following article: Giollo, M., Jones, D. T., Carraro, M., Leonardi, E., Ferrari, C. and Tosatto, S. C.E. (2017), Crohn Disease Risk Prediction–Best Practices and Pitfalls with Exome Data. Human Mutation. Accepted Author Manuscript. doi:10.1002/humu.23177, which has been published in final form at http://dx.doi.org/10.1002/humu.23177. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Self-Archiving. |
Keywords: | Crohn Disease, Disease Risk Prediction, Exome Data, Genetic Analysis, Linear Models, Machine Learning, Methods Comparison, Next-Generation Sequencing, SNV evaluation, Variants prioritization |
UCL classification: | UCL; UCL > Provost and Vice Provost Offices > UCL BEAMS; UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science; UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science |
URI: | https://discovery.ucl.ac.uk/id/eprint/1537232 |