eprintid: 10091486
rev_number: 19
eprint_status: archive
userid: 608
dir: disk0/10/09/14/86
datestamp: 2020-02-18 15:34:10
lastmod: 2021-11-08 00:04:38
status_changed: 2020-02-18 15:34:10
type: proceedings_section
metadata_visibility: show
creators_name: Garcia, MS
creators_name: Agarwal, B
creators_name: Mookerjee, RP
creators_name: Jalan, R
creators_name: Doyle, G
creators_name: Ranco, G
creators_name: Arroyo, V
creators_name: Pavesi, M
creators_name: Garcia, E
creators_name: Saliba, F
creators_name: Banares, R
creators_name: Fernandez, J
title: An Accurate Data Preparation Approach for the Prediction of Mortality in ACLF Patients using the CANONIC Dataset
ispublished: pub
divisions: UCL
divisions: B02
divisions: C10
divisions: D17
divisions: G91
note: This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
abstract: The incidence of chronic liver disease has increased in Europe and can lead to Acute on Chronic Liver Failure (ACLF) which is associated with high levels of mortality due to multisystem organ failure. The characteristics of the ACLF patients can change very rapidly within a short period of time. Continuous assessment of their recovery status is critical for clinicians to adjust and deliver effective treatment. The aim of this paper is to validate the usefulness of a data preparation approach by combining different criteria to replace missing values, balance target-class variables, select useful patient characteristics and optimise hyperparameters of machine learning models for the prediction of ACLF associated mortality rates. A key step in the data preparation is a feature selection Mutual Information (MI) based multivariate approach to build smaller, and yet equally and in some cases more informative, subsets of patient characteristics than those frequently proposed for the prediction of mortality, from patients with ACLF in the CANONIC dataset.
The usefulness of the data preparation approach proposed to predict mortality was evaluated by training the XGBoost and Logistic Regression models with the prepared data. Evaluations of the models trained using a test set provided evidence of an overall high accuracy in the prediction of the mortality rates of patients for days after their diagnosis, and in some cases even higher when reduced and more informative subsets of patient characteristics were found.
date: 2019-10-07
date_type: published
publisher: IEEE
official_url: https://doi.org/10.1109/EMBC.2019.8857239
oa_status: green
full_text_type: other
language: eng
primo: open
primo_central: open_green
verified: verified_manual
elements_id: 1744787
doi: 10.1109/EMBC.2019.8857239
lyricists_name: Agarwal, Banwari
lyricists_name: Jalan, Rajiv
lyricists_name: Mookerjee, Rajeshwar
lyricists_id: BAGAR28
lyricists_id: RJALA78
lyricists_id: RPMOO69
actors_name: Stacey, Thomas
actors_id: TSSTA20
actors_role: owner
full_text_status: public
publication: Conf Proc IEEE Eng Med Biol Soc
volume: 2019
place_of_pub: Berlin, Germany
pagerange: 1371-1377
event_title: 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)
event_location: United States
citation: Garcia, MS; Agarwal, B; Mookerjee, RP; Jalan, R; Doyle, G; Ranco, G; Arroyo, V; Pavesi, M; Garcia, E; Saliba, F; Banares, R; Fernandez, J; (2019) An Accurate Data Preparation Approach for the Prediction of Mortality in ACLF Patients using the CANONIC Dataset. In: (Proceedings) 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). (pp. 1371-1377). IEEE: Berlin, Germany. Green open access
document_url: https://discovery.ucl.ac.uk/id/eprint/10091486/2/Jalan_root.pdf