eprintid: 10065139
rev_number: 17
eprint_status: archive
userid: 608
dir: disk0/10/06/51/39
datestamp: 2019-01-09 13:10:13
lastmod: 2021-12-06 00:39:53
status_changed: 2019-01-09 13:10:13
type: article
metadata_visibility: show
creators_name: Ferrucci, F
creators_name: Salza, P
creators_name: Sarro, F
title: Using Hadoop MapReduce for parallel genetic algorithms: A comparison of the global, grid and island models
ispublished: pub
divisions: UCL
divisions: B04
divisions: C05
divisions: F48
keywords: Genetic algorithms, parallel genetic algorithms, Hadoop MapReduce, global model, grid model, island model, fault prediction.
note: This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
abstract: The need to improve the scalability of Genetic Algorithms (GAs) has motivated research on Parallel Genetic Algorithms (PGAs), and different technologies and approaches have been used. Hadoop MapReduce represents one of the most mature technologies to develop parallel algorithms. Because parallel algorithms introduce communication overhead, the aim of the present work is to understand if, and possibly when, parallel GA solutions using Hadoop MapReduce show better performance than sequential versions in terms of execution time. Moreover, we are interested in understanding which PGA model can be most effective among the global, grid, and island models. We empirically assessed the performance of these three parallel models with respect to a sequential GA on a software engineering problem, evaluating the execution time and the achieved speedup. We also analysed the behaviour of the parallel models in relation to the overhead produced by the use of Hadoop MapReduce and the GAs’ computational effort, which gives a more machine-independent measure of these algorithms. We exploited three problem instances to differentiate the computation load and three cluster configurations based on 2, 4, and 8 parallel nodes. Moreover, we estimated the costs of the execution of the experimentation on a potential cloud infrastructure, based on the pricing of the major commercial cloud providers. The empirical study revealed that the PGA based on the island model outperforms the other parallel models and the sequential GA for all the considered instances and clusters. Using 2, 4, and 8 nodes, the island model achieves an average speedup over the three datasets of 1.8, 3.4, and 7.0 times, respectively. Hadoop MapReduce has a set of different constraints that need to be considered during the design and the implementation of parallel algorithms. The overhead of data store (i.e., HDFS) accesses, communication, and latency requires solutions that reduce data store operations. For this reason, the island model is more suitable for PGAs than the global and grid models, also in terms of costs when executed on a commercial cloud provider.
date: 2018-12
date_type: published
official_url: https://doi.org/10.1162/evco_a_00213
oa_status: green
full_text_type: other
language: eng
primo: open
primo_central: open_green
verified: verified_manual
elements_id: 1614713
doi: 10.1162/EVCO_a_00213
lyricists_name: Sarro, Federica
lyricists_id: FSSAR72
actors_name: Sarro, Federica
actors_id: FSSAR72
actors_role: owner
full_text_status: public
publication: Evolutionary Computation
volume: 26
number: 4
pagerange: 535-567
issn: 1063-6560
citation: Ferrucci, F; Salza, P; Sarro, F; (2018) Using Hadoop MapReduce for parallel genetic algorithms: A comparison of the global, grid and island models. Evolutionary Computation, 26 (4) pp. 535-567. 10.1162/EVCO_a_00213 <https://doi.org/10.1162/EVCO_a_00213>. Green open access

document_url: https://discovery.ucl.ac.uk/id/eprint/10065139/1/ECJ2017.pdf