Liu, Shuanglong; Fan, Hongxiang; Ferianc, Martin; Niu, Xinyu; Shi, Huifeng; Luk, Wayne (2021) Toward Full-Stack Acceleration of Deep Convolutional Neural Networks on FPGAs. IEEE Transactions on Neural Networks and Learning Systems. 10.1109/tnnls.2021.3055240. (In press).
Abstract
Due to the huge success and rapid development of convolutional neural networks (CNNs), there is a growing demand for hardware accelerators that accommodate a variety of CNNs to improve their inference latency and energy efficiency, in order to enable their deployment in real-time applications. Among popular platforms, field-programmable gate arrays (FPGAs) have been widely adopted for CNN acceleration because of their capability to provide superior energy efficiency and low-latency processing, while supporting high reconfigurability, making them favorable for accelerating rapidly evolving CNN algorithms. This article introduces a highly customized streaming hardware architecture that focuses on improving the compute efficiency for streaming applications by providing full-stack acceleration of CNNs on FPGAs. The proposed accelerator maps most computational functions, that is, convolutional and deconvolutional layers into a singular unified module, and implements the residual and concatenative connections between the functions with high efficiency, to support the inference of mainstream CNNs with different topologies. This architecture is further optimized through exploiting different levels of parallelism, layer fusion, and fully leveraging digital signal processing blocks (DSPs). The proposed accelerator has been implemented on Intel's Arria 10 GX1150 hardware and evaluated with a wide range of benchmark models. The results demonstrate a high performance of over 1.3 TOP/s of throughput, up to 97% of compute [multiply-accumulate (MAC)] efficiency, which outperforms the state-of-the-art FPGA accelerators.
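As a rough illustration of the compute (MAC) efficiency figure quoted in the abstract, the sketch below assumes the common definition: achieved throughput divided by the peak throughput of the MAC array, with each MAC counted as two operations. The function name, its parameters, and the numbers in the usage line are hypothetical and are not taken from the paper, which may define or measure efficiency differently.

```python
def mac_efficiency(achieved_ops_per_s, num_macs, clock_hz):
    """Return achieved throughput as a fraction of peak MAC-array throughput.

    Assumption: one MAC counts as two operations (a multiply and an add),
    and every MAC unit can issue one MAC per clock cycle at peak.
    """
    peak_ops_per_s = 2 * num_macs * clock_hz
    return achieved_ops_per_s / peak_ops_per_s


# Hypothetical figures for illustration only, NOT the paper's configuration.
efficiency = mac_efficiency(achieved_ops_per_s=0.9e12, num_macs=2048, clock_hz=250e6)
print(f"MAC efficiency: {efficiency:.1%}")
```

Under this definition, an efficiency close to 100% means the MAC units are busy in almost every cycle, which is the property the abstract highlights.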
Type: | Article |
---|---|
Title: | Toward Full-Stack Acceleration of Deep Convolutional Neural Networks on FPGAs |
Open access status: | An open access version is available from UCL Discovery |
DOI: | 10.1109/tnnls.2021.3055240 |
Publisher version: | https://doi.org/10.1109/TNNLS.2021.3055240 |
Language: | English |
Additional information: | This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions. |
Keywords: | Hardware, Field programmable gate arrays, Acceleration, Convolution, Computational modeling, Computer architecture, Shape |
UCL classification: | UCL > Provost and Vice Provost Offices > UCL SLASH > Faculty of Arts and Humanities; UCL > Provost and Vice Provost Offices > UCL SLASH > Faculty of Arts and Humanities > Dept of Information Studies; UCL > Provost and Vice Provost Offices > UCL SLASH; UCL |
URI: | https://discovery.ucl.ac.uk/id/eprint/10150065 |