eprintid: 10065139 rev_number: 17 eprint_status: archive userid: 608 dir: disk0/10/06/51/39 datestamp: 2019-01-09 13:10:13 lastmod: 2021-12-06 00:39:53 status_changed: 2019-01-09 13:10:13 type: article metadata_visibility: show creators_name: Ferrucci, F creators_name: Salza, P creators_name: Sarro, F title: Using Hadoop MapReduce for parallel genetic algorithms: A comparison of the global, grid and island models ispublished: pub divisions: UCL divisions: B04 divisions: C05 divisions: F48 keywords: Genetic algorithms, parallel genetic algorithms, Hadoop MapReduce, global model, grid model, island model, fault prediction. note: This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions. abstract: The need to improve the scalability of Genetic Algorithms (GAs) has motivated the research on Parallel Genetic Algorithms (PGAs), and different technologies and approaches have been used. Hadoop MapReduce represents one of the most mature technologies to develop parallel algorithms. Based on the fact that parallel algorithms introduce communication overhead, the aim of the present work is to understand if, and possibly when, the parallel GAs solutions using Hadoop MapReduce show better performance than sequential versions in terms of execution time. Moreover, we are interested in understanding which PGA model can be most effective among the global, grid, and island models. We empirically assessed the performance of these three parallel models with respect to a sequential GA on a software engineering problem, evaluating the execution time and the achieved speedup. We also analysed the behaviour of the parallel models in relation to the overhead produced by the use of Hadoop MapReduce and the GAs’ computational effort, which gives a more machine-independent measure of these algorithms. We exploited three problem instances to differentiate the computation load and three cluster configurations based on 2, 4, and 8 parallel nodes. Moreover, we estimated the costs of the execution of the experimentation on a potential cloud infrastructure, based on the pricing of the major commercial cloud providers. The empirical study revealed that the use of PGA based on the island model outperforms the other parallel models and the sequential GA for all the considered instances and clusters. Using 2, 4, and 8 nodes, the island model achieves an average speedup over the three datasets of 1.8, 3.4, and 7.0 times, respectively. Hadoop MapReduce has a set of different constraints that need to be considered during the design and the implementation of parallel algorithms. The overhead of data store (i.e., HDFS) accesses, communication, and latency requires solutions that reduce data store operations. For this reason, the island model is more suitable for PGAs than the global and grid model, also in terms of costs when executed on a commercial cloud provider. date: 2018-12 date_type: published official_url: https://doi.org/10.1162/evco_a_00213 oa_status: green full_text_type: other language: eng primo: open primo_central: open_green verified: verified_manual elements_id: 1614713 doi: 10.1162/EVCO_a_00213 lyricists_name: Sarro, Federica lyricists_id: FSSAR72 actors_name: Sarro, Federica actors_id: FSSAR72 actors_role: owner full_text_status: public publication: Evolutionary Computation volume: 26 number: 4 pagerange: 535-567 issn: 1063-6560 citation: Ferrucci, F; Salza, P; Sarro, F; (2018) Using Hadoop MapReduce for parallel genetic algorithms: A comparison of the global, grid and island models. Evolutionary Computation , 26 (4) pp. 535-567. 10.1162/EVCO_a_00213 <https://doi.org/10.1162/EVCO_a_00213>. Green open access document_url: https://discovery.ucl.ac.uk/id/eprint/10065139/1/ECJ2017.pdf