UCL Discovery

Using Hadoop MapReduce for parallel genetic algorithms: A comparison of the global, grid and island models

Ferrucci, F; Salza, P; Sarro, F; (2018) Using Hadoop MapReduce for parallel genetic algorithms: A comparison of the global, grid and island models. Evolutionary Computation, 26 (4) pp. 535-567. 10.1162/EVCO_a_00213.

Text: ECJ2017.pdf (Accepted version, 1MB)

Abstract

The need to improve the scalability of Genetic Algorithms (GAs) has motivated research on Parallel Genetic Algorithms (PGAs), and different technologies and approaches have been used. Hadoop MapReduce is one of the most mature technologies for developing parallel algorithms. Since parallel algorithms introduce communication overhead, the aim of the present work is to understand whether, and possibly when, parallel GA solutions using Hadoop MapReduce perform better than sequential versions in terms of execution time. Moreover, we are interested in understanding which PGA model is most effective among the global, grid, and island models. We empirically assessed the performance of these three parallel models against a sequential GA on a software engineering problem, evaluating the execution time and the achieved speedup. We also analysed the behaviour of the parallel models in relation to the overhead produced by the use of Hadoop MapReduce and to the GAs’ computational effort, which gives a more machine-independent measure of these algorithms. We used three problem instances to vary the computational load and three cluster configurations based on 2, 4, and 8 parallel nodes. In addition, we estimated the cost of running the experiments on a potential cloud infrastructure, based on the pricing of the major commercial cloud providers. The empirical study revealed that the PGA based on the island model outperforms the other parallel models and the sequential GA for all the considered instances and clusters. Using 2, 4, and 8 nodes, the island model achieves an average speedup over the three datasets of 1.8, 3.4, and 7.0 times, respectively. Hadoop MapReduce imposes a set of constraints that need to be considered during the design and implementation of parallel algorithms. The overhead of data store (i.e., HDFS) accesses, communication, and latency requires solutions that reduce data store operations. For this reason, the island model is more suitable for PGAs than the global and grid models, also in terms of cost when executed on a commercial cloud provider.
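The reported average speedups can be compared with ideal linear scaling through the parallel efficiency E = S / p, where S = T_seq / T_par is the speedup and p is the number of nodes. The sketch below uses only the three averages quoted in the abstract. The dictionary and function names are illustrative and are not taken from the paper. Because p is fixed within each cluster configuration, dividing the average speedup by p gives the average efficiency over the three datasets.

```python
# Average island-model speedups quoted in the abstract, keyed by node count.
REPORTED_SPEEDUP = {2: 1.8, 4: 3.4, 8: 7.0}


def parallel_efficiency(speedup: float, nodes: int) -> float:
    """Efficiency E = S / p: the fraction of ideal linear speedup achieved."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return speedup / nodes


def efficiency_table(speedups: dict) -> dict:
    """Map each node count to its efficiency, rounded to three decimals."""
    return {p: round(parallel_efficiency(s, p), 3) for p, s in sorted(speedups.items())}


if __name__ == "__main__":
    for p, e in efficiency_table(REPORTED_SPEEDUP).items():
        print(f"{p} nodes: speedup {REPORTED_SPEEDUP[p]}, efficiency {e:.1%}")
```

Read this way, the island model keeps roughly 85 to 90% of ideal linear scaling at every cluster size: 0.9 on 2 nodes, 0.85 on 4, and 0.875 on 8.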
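The abstract attributes the island model's advantage to its need for fewer data-store operations. In the island model, separate subpopulations evolve independently and exchange individuals only occasionally. The global model instead distributes fitness evaluation of a single population in every generation. The sketch below is a generic in-memory illustration of the island idea. It is not the authors' Hadoop MapReduce implementation. The OneMax fitness, tournament selection, one-point crossover, ring migration topology, and all parameter names and values are hypothetical choices. For reproducibility, the sketch runs the islands sequentially with a shared RNG. In a MapReduce deployment, one might assume that each island maps to its own task, so that coordination is needed only at migration points.

```python
import random
from typing import List, Tuple

Genome = List[int]
Population = List[Genome]


def fitness(genome: Genome) -> int:
    """Hypothetical toy objective (OneMax): the number of 1-bits."""
    return sum(genome)


def tournament(pop: Population, rng: random.Random, k: int = 3) -> Genome:
    """Pick the fittest of k randomly sampled individuals."""
    return max(rng.sample(pop, k), key=fitness)


def evolve_one_generation(pop: Population, rng: random.Random,
                          mutation_rate: float) -> Population:
    """One generation of a plain GA, run locally inside a single island."""
    next_pop = [max(pop, key=fitness)]  # elitism: carry the best over unchanged
    while len(next_pop) < len(pop):
        a, b = tournament(pop, rng), tournament(pop, rng)
        cut = rng.randrange(1, len(a))  # one-point crossover
        child = a[:cut] + b[cut:]
        # Bit-flip mutation applied gene by gene.
        child = [1 - g if rng.random() < mutation_rate else g for g in child]
        next_pop.append(child)
    return next_pop


def migrate(islands: List[Population], n_migrants: int) -> List[Population]:
    """Ring migration: the best n_migrants of island i-1 replace the worst of island i."""
    # Snapshot emigrants first so every island sends its pre-migration best.
    emigrants = [sorted(isl, key=fitness, reverse=True)[:n_migrants] for isl in islands]
    for i, isl in enumerate(islands):
        incoming = emigrants[(i - 1) % len(islands)]
        isl.sort(key=fitness)  # worst individuals first
        isl[:n_migrants] = [list(g) for g in incoming]
    return islands


def island_ga(n_islands: int = 4, pop_size: int = 20, genome_len: int = 32,
              generations: int = 40, migration_interval: int = 10,
              n_migrants: int = 2, mutation_rate: float = 0.02,
              seed: int = 0) -> Tuple[Genome, int]:
    """Return the best genome found and the number of migration rounds."""
    rng = random.Random(seed)
    islands = [[[rng.randint(0, 1) for _ in range(genome_len)]
                for _ in range(pop_size)] for _ in range(n_islands)]
    migrations = 0
    for gen in range(1, generations + 1):
        # Islands evolve independently; here they run one after another with a
        # shared RNG, but in a parallel deployment each could run on its own node.
        islands = [evolve_one_generation(isl, rng, mutation_rate) for isl in islands]
        # Coordination happens only here, once every migration_interval generations.
        if gen % migration_interval == 0:
            islands = migrate(islands, n_migrants)
            migrations += 1
    best = max((g for isl in islands for g in isl), key=fitness)
    return best, migrations


if __name__ == "__main__":
    best, rounds = island_ga()
    print(f"best fitness {fitness(best)}/{len(best)} after {rounds} migrations")
```

With the defaults, 40 generations and a migration interval of 10 give 4 coordination points instead of 40. This is the kind of reduction in synchronisation that the abstract links to lower data-store overhead. The actual costs reported in the paper depend on its Hadoop design, which this sketch does not model.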

Type: Article
Title: Using Hadoop MapReduce for parallel genetic algorithms: A comparison of the global, grid and island models
Open access status: An open access version is available from UCL Discovery
DOI: 10.1162/EVCO_a_00213
Publisher version: https://doi.org/10.1162/evco_a_00213
Language: English
Additional information: This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
Keywords: Genetic algorithms, parallel genetic algorithms, Hadoop MapReduce, global model, grid model, island model, fault prediction.
UCL classification: UCL > Provost and Vice Provost Offices
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI: http://discovery.ucl.ac.uk/id/eprint/10065139