Fast RNN Inference on an FPGA
ORAL
Abstract
In this work, we present implementation templates for two types of recurrent neural network layers in the HLS4ML library: the Long Short-Term Memory (LSTM) and the Gated Recurrent Unit (GRU). These templates provide the low-level hardware implementation for neural network models built from these layers, allowing them to be mapped onto an FPGA. Through the HLS4ML library, the latency per inference and the resource utilization can be tuned to the targeted FPGA and the application requirements. Several particle physics problems are used to characterize the templates and to test their efficiency after high-level synthesis. A design space exploration was performed across resource utilization, latency, model performance, and fixed-point precision. As an example, LSTM- and GRU-based models for the task of jet identification on simulated proton-proton collisions at the Large Hadron Collider were considered. The implementation templates were also evaluated against varying numbers of model parameters, and synthesized for larger LSTM-based neural network models with external recursion, targeting jet flavor classification in high-energy collisions on an FPGA.
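As a rough illustration of the fixed-point precision axis of the design space exploration mentioned above, the sketch below rounds a value to the nearest representable number of a signed fixed-point type in the `ap_fixed<W, I>` convention used by high-level synthesis tools (W total bits, I integer bits including the sign). This is a plain-Python illustration, not code from the HLS4ML library; the function name and defaults are chosen for the example only.

```python
def quantize_fixed(x, total_bits=16, int_bits=6):
    """Round x to the nearest value representable by a signed fixed-point
    type with `total_bits` bits, of which `int_bits` (sign bit included)
    sit before the binary point -- the ap_fixed<W, I> convention.
    Values outside the representable range saturate."""
    frac_bits = total_bits - int_bits
    step = 2.0 ** -frac_bits          # smallest representable increment
    max_val = 2.0 ** (int_bits - 1) - step
    min_val = -(2.0 ** (int_bits - 1))
    q = round(x / step) * step        # round to nearest multiple of step
    return min(max(q, min_val), max_val)


# A 16-bit type with 10 fractional bits resolves steps of 2**-10:
print(quantize_fixed(0.3, 16, 6))     # 0.2998046875 (= 307 / 1024)
# Out-of-range weights saturate rather than wrap:
print(quantize_fixed(100.0, 16, 6))   # 31.9990234375
```

Sweeping `total_bits` and `int_bits` over a model's weights and activations, and measuring the resulting accuracy loss, is the kind of precision scan that trades FPGA resource usage against model performance.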
Authors
-
Chaitanya Paikara
University of Washington
-
Philip Harris
Massachusetts Institute of Technology
-
Scott Hauck
University of Washington
-
Shih-Chieh Hsu
Department of Physics, University of Washington, Seattle, Washington 98195, USA
-
Richa Rao
University of Washington
-
Sioni Summers
European Organization for Nuclear Research (CERN)