Portfolio item number 1
Short description of portfolio item number 1
Short description of portfolio item number 2
Published in Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS), 2020
Proposes a speculative TLB and nearest-entry TLB translation prediction, both of which exploit contiguity in physical address allocation.
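As a rough illustration of the idea of predicting a translation from a nearby TLB entry, here is a minimal Python sketch. It is not the paper's design: the dictionary-based TLB model, the function name `predict_ppn`, and the `max_distance` cutoff are all assumptions made for this example. It assumes that if virtual pages are close together, their physical pages were allocated contiguously, so the same offset applies. Any such prediction is speculative and would still have to be checked against the real page-table walk.

```python
from typing import Dict, Optional


def predict_ppn(tlb: Dict[int, int], vpn: int, max_distance: int = 8) -> Optional[int]:
    """Guess the physical page number (PPN) for a virtual page number (VPN).

    `tlb` maps cached VPNs to PPNs. `max_distance` is a hypothetical cutoff:
    beyond it, the contiguity assumption is treated as too weak to predict.
    """
    if vpn in tlb:
        return tlb[vpn]  # ordinary hit, no prediction needed
    if not tlb:
        return None  # nothing to extrapolate from

    # Pick the cached VPN closest to the requested one (ties go to the lower VPN).
    nearest = min(tlb, key=lambda v: (abs(v - vpn), v))
    delta = vpn - nearest
    if abs(delta) > max_distance:
        return None

    # Assume contiguous allocation: the same offset holds in physical space.
    predicted = tlb[nearest] + delta
    return predicted if predicted >= 0 else None
```

For example, with `tlb = {0x100: 0x500}`, a miss on VPN `0x102` would be predicted as PPN `0x502`; a miss far from every cached entry returns `None` and falls back to the normal page walk.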
Published in ISLPED: Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design, 2022
Proposed a unified architecture for FECs and implemented it in an ASIC.
Published in IEEE Computer Architecture Letters, 2023
Recent CPU microarchitectural attacks exploit contention on the network-on-chip (NoC) to build covert and side channels on multicore CPUs and leak information from victim applications. We propose NoIR, a dynamic last-level cache (LLC) slice selection mechanism that uses slice remapping to obfuscate interconnect contention patterns. NoIR reduces contention variance by 92.18%, and the mean IPC degradation due to cache invalidation is limited to 7.38% for SPEC CPU 2017 benchmarks at a 1000-access threshold. While previous defenses focused on redesigning the NoC and its routing algorithms, we show that a top-down, system-level approach can significantly raise the bar for exploiting a NoC security vulnerability with minimal modifications to the NoC hardware.
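To make the notion of threshold-driven slice remapping concrete, the following Python sketch models one possible reading of it. It is an illustration under stated assumptions, not NoIR's actual hardware: the class `SliceRemapper`, the modulo-based stand-in for the slice hash, the 64-byte line size, and the interpretation of the threshold as "accesses between remaps" are all hypothetical choices made here.

```python
import random


class SliceRemapper:
    """Toy model: permute the address-to-slice mapping every `threshold` accesses."""

    def __init__(self, num_slices: int = 8, threshold: int = 1000, seed: int = 0):
        self.num_slices = num_slices
        self.threshold = threshold          # assumed: accesses between remaps
        self.rng = random.Random(seed)      # seeded so runs are repeatable
        self.table = list(range(num_slices))  # starts as the identity mapping
        self.accesses = 0
        self.remaps = 0

    def base_slice(self, addr: int) -> int:
        # Stand-in for the real (undocumented) slice hash: line address modulo
        # the slice count, assuming 64-byte cache lines.
        return (addr >> 6) % self.num_slices

    def select(self, addr: int) -> int:
        if self.accesses >= self.threshold:
            # Change the mapping so contention no longer follows a fixed pattern.
            # In hardware, lines cached under the old mapping would need to be
            # invalidated at this point, which is where the IPC cost comes from.
            self.rng.shuffle(self.table)
            self.accesses = 0
            self.remaps += 1
        self.accesses += 1
        return self.table[self.base_slice(addr)]
```

In this model, a smaller `threshold` changes the mapping more often, which hides contention patterns better but triggers more invalidations; the trade-off between the two is what the reported variance and IPC figures quantify.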
Published in SC-W '23: Proceedings of the SC '23 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, 2023
Heterogeneous Intellectual Property (IP) hardware acceleration engines have emerged as a viable path to improving performance as Moore’s Law and Dennard scaling wane. In this study, we design, prototype, and evaluate the HPC-specialized ZHW floating-point compression accelerator as a resource on a System on Chip (SoC). Our full hardware/software implementation and evaluation reveal system-level inefficiencies that significantly throttle the potential speedup of the ZHW accelerator. By optimizing data movement between the CPU, memory, and accelerator, a speedup of 6.9X is possible over a RISC-V64 core, and 2.9X over a Mac M1 ARM core.