Accelerator integration in a tile-based SoC: lessons learned with a hardware floating point compression engine

Published in SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, 2023

Heterogeneous Intellectual Property (IP) hardware acceleration engines have emerged as a viable path to improving performance as Moore’s Law and Dennard scaling wane. In this study, we design, prototype, and evaluate the HPC-specialized ZHW floating point compression accelerator as a resource on a System on Chip (SoC). Our full hardware/software implementation and evaluation reveal system-level inefficiencies that significantly throttle the potential speedup of the ZHW accelerator. By optimizing data movement between CPU, memory, and accelerator, a speedup of 6.9X is possible over a RISC-V64 core, and 2.9X over a Mac M1 ARM core.