Toward Full-Stack Acceleration of Deep Convolutional Neural Networks on FPGAs

Liu, Shuanglong; Fan, Hongxiang; Ferianc, Martin; Niu, Xinyu; Shi, Huifeng; Luk, Wayne; (2021) Toward Full-Stack Acceleration of Deep Convolutional Neural Networks on FPGAs. IEEE Transactions on Neural Networks and Learning Systems 10.1109/tnnls.2021.3055240. (In press). Green open access

Abstract

Due to the huge success and rapid development of convolutional neural networks (CNNs), there is a growing demand for hardware accelerators that accommodate a variety of CNNs and improve their inference latency and energy efficiency, enabling their deployment in real-time applications. Among popular platforms, field-programmable gate arrays (FPGAs) have been widely adopted for CNN acceleration because they provide superior energy efficiency and low-latency processing while supporting high reconfigurability, which makes them favorable for accelerating rapidly evolving CNN algorithms. This article introduces a highly customized streaming hardware architecture that focuses on improving the compute efficiency for streaming applications by providing full-stack acceleration of CNNs on FPGAs. The proposed accelerator maps most computational functions, that is, convolutional and deconvolutional layers, into a single unified module, and implements the residual and concatenative connections between the functions with high efficiency, to support the inference of mainstream CNNs with different topologies. This architecture is further optimized by exploiting different levels of parallelism, applying layer fusion, and fully leveraging digital signal processing blocks (DSPs). The proposed accelerator has been implemented on Intel's Arria 10 GX1150 hardware and evaluated with a wide range of benchmark models. The results demonstrate a throughput of over 1.3 TOP/s and a compute [multiply-accumulate (MAC)] efficiency of up to 97%, outperforming state-of-the-art FPGA accelerators.

Type:	Article
Title:	Toward Full-Stack Acceleration of Deep Convolutional Neural Networks on FPGAs
Open access status:	An open access version is available from UCL Discovery
DOI:	10.1109/tnnls.2021.3055240
Publisher version:	https://doi.org/10.1109/TNNLS.2021.3055240
Language:	English
Additional information:	This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
Keywords:	Hardware, Field programmable gate arrays, Acceleration, Convolution, Computational modeling, Computer architecture, Shape
UCL classification:	UCL > Provost and Vice Provost Offices > UCL SLASH > Faculty of Arts and Humanities; UCL > Provost and Vice Provost Offices > UCL SLASH > Faculty of Arts and Humanities > Dept of Information Studies; UCL > Provost and Vice Provost Offices > UCL SLASH; UCL
URI:	https://discovery.ucl.ac.uk/id/eprint/10150065
