Anarado, I; Andreopoulos, Y; (2016) Core Failure Mitigation in Integer Sum-of-Product Computations on Cloud Computing Systems. IEEE Transactions on Multimedia, 18 (4) pp. 789-801. 10.1109/TMM.2016.2532603.
Text: TMM-Jan-16-6556_2col_1space.pdf - Accepted Version (911kB)
Abstract
The decreasing mean-time-to-failure estimates in cloud computing systems indicate that multimedia applications running on such environments should be able to mitigate an increasing number of core failures at runtime. We propose a new roll-forward failure-mitigation approach for integer sum-of-product computations, with emphasis on generic matrix multiplication (GEMM) and convolution/cross-correlation (CONV) routines. Our approach is based on the production of redundant results within the numerical representation of the outputs via the use of numerical packing. This differs from all existing roll-forward solutions, which require a separate set of checksum (or duplicate) results. Our proposal imposes a 37.5% reduction in the maximum output bitwidth supported in comparison to integer sum-of-product realizations performed on 32-bit integer representations, which is comparable to the bitwidth requirement of checksum methods for multiple core failure mitigation. Experiments with state-of-the-art GEMM and CONV routines running on a c4.8xlarge compute-optimized instance of Amazon Web Services Elastic Compute Cloud (AWS EC2) demonstrate that the proposed approach is able to mitigate up to one quad-core failure while achieving processing throughput that is: 1) comparable to that of the conventional, failure-intolerant, integer GEMM and CONV routines, and 2) substantially superior to that of the equivalent roll-forward failure-mitigation method based on checksum streams. Furthermore, when used within an image retrieval framework deployed over a cluster of AWS EC2 spot (i.e., low-cost albeit terminatable) instances, our proposal leads to: 1) 16%-23% cost reduction against the equivalent checksum-based method and 2) more than 70% cost reduction against conventional failure-intolerant processing on AWS EC2 on-demand (i.e., higher-cost albeit guaranteed) instances.
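
The abstract does not spell out the packing scheme, so the following is only a generic sketch of the idea behind numerical packing, not the authors' algorithm. It assumes non-negative integer inputs and packs two input vectors into one, so that a single sum-of-products yields two results inside one output word. The names `pack`, `unpack`, `dot` and the constant `K` are hypothetical. The value `K = 20` is simply 32 × (1 − 0.375), the output bitwidth implied by the 37.5% figure; how the paper actually lays out its bits is not stated here.

```python
# Illustrative sketch only: generic numerical packing, NOT the paper's method.
# Assumption: all inputs are non-negative and each individual result fits in K bits.
K = 20  # hypothetical bits per packed result: 32 * (1 - 0.375) = 20


def pack(a, b, k=K):
    """Pack two equal-length integer vectors into one: a[i] + b[i] * 2**k."""
    return [x + (y << k) for x, y in zip(a, b)]


def dot(u, v):
    """Plain integer sum-of-products (inner product)."""
    return sum(x * y for x, y in zip(u, v))


def unpack(r, k=K):
    """Split a packed result into its low and high parts."""
    mask = (1 << k) - 1
    return r & mask, r >> k


a = [3, 1, 4, 1]
b = [2, 7, 1, 8]
w = [5, 9, 2, 6]

# One sum-of-products over the packed input carries both results,
# because dot(a + b * 2**k, w) == dot(a, w) + dot(b, w) * 2**k.
low, high = unpack(dot(pack(a, b), w))
print(low, high)  # 38 123
```

Python integers have arbitrary precision, so this sketch never overflows; a fixed-width implementation would have to guarantee that the packed result fits in the machine word, which is where a reduced usable output bitwidth comes from.
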
Type: | Article |
---|---|
Title: | Core Failure Mitigation in Integer Sum-of-Product Computations on Cloud Computing Systems |
Open access status: | An open access version is available from UCL Discovery |
DOI: | 10.1109/TMM.2016.2532603 |
Publisher version: | http://dx.doi.org/10.1109/TMM.2016.2532603 |
Language: | English |
Additional information: | Copyright © 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. |
Keywords: | integer matrix products, convolution, core failures, multimedia cloud computing |
UCL classification: | UCL; UCL > Provost and Vice Provost Offices; UCL > Provost and Vice Provost Offices > UCL BEAMS; UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science; UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Electronic and Electrical Eng |
URI: | https://discovery.ucl.ac.uk/id/eprint/1505955 |