eprintid: 10123829
rev_number: 13
eprint_status: archive
userid: 608
dir: disk0/10/12/38/29
datestamp: 2021-03-12 14:59:51
lastmod: 2022-06-19 06:10:19
status_changed: 2021-03-12 14:59:51
type: article
metadata_visibility: show
creators_name: Gkioulekas, I
creators_name: Papageorgiou, LG
title: Tree regression models using statistical testing and mixed integer programming
ispublished: pub
divisions: UCL
divisions: A01
divisions: B04
divisions: C05
divisions: F43
keywords: Mathematical programming; Regression analysis; Decision trees; Subset selection; Optimisation
note: This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
abstract: Regression analysis is a statistical procedure that fits a mathematical function to a set of data in order to capture the relationship between dependent and independent variables. In tree regression, tree structures are constructed by repeated splits of the input space into two subsets, creating if-then-else rules. Such models are popular in the literature due to their ability to be computed quickly and their simple interpretations. This work introduces a tree regression algorithm that exploits an optimisation model of an existing literature method called Mathematical Programming Tree (MPtree) to optimally split nodes into subsets and applies a statistical test to assess the quality of the partitioning. Additionally, an approach of splitting nodes using multivariate decision rules is explored in this work and compared in terms of performance and computational efficiency. Finally, a novel mathematical model is introduced that performs subset selection on each node in order to select an optimal set of variables to be considered for splitting, which improves the computational performance of the proposed algorithm.
date: 2021-03-18
date_type: published
official_url: https://doi.org/10.1016/j.cie.2020.107059
oa_status: green
full_text_type: other
language: eng
primo: open
primo_central: open_green
verified: verified_manual
elements_id: 1845242
doi: 10.1016/j.cie.2020.107059
lyricists_name: Papageorgiou, Lazaros
lyricists_id: LPAPA33
actors_name: Papageorgiou, Lazaros
actors_id: LPAPA33
actors_role: owner
full_text_status: public
publication: Computers and Industrial Engineering
volume: 153
article_number: 107059
issn: 0360-8352
citation: Gkioulekas, I; Papageorgiou, LG; (2021) Tree regression models using statistical testing and mixed integer programming. Computers and Industrial Engineering, 153, Article 107059. 10.1016/j.cie.2020.107059 <https://doi.org/10.1016/j.cie.2020.107059>. Green open access
document_url: https://discovery.ucl.ac.uk/id/eprint/10123829/1/CAIE_Tree_Regression_Accepted.pdf