Optimizing Across the Stack: From Software to Near-Memory Acceleration (PhD Final Presentation)
14 July 2025, 12:00, CSIT Level 2 - Systems Area
Speaker: Zixian Cai (ANU)
Abstract
In this talk, I share my journey optimizing low-level systems, leading to three major contributions across the computational stack, including an on-DIMM accelerator. I use garbage collection (GC) as a case study.
My first contribution transforms how we attribute the hard-to-measure overheads of critical functionalities. Attributing overheads can be very complex due to factors such as optimizing compilers and microarchitectural side effects. The lower bound overhead (LBO) methodology I developed estimates overheads by approximating an ideal baseline. Applied to GC, LBO reveals, for the first time, the substantial costs of software GC, reaffirming the potential of specialized architectural support.
During accelerator development, I found that conventional debugging methods, such as sampling and logging, are inadequate. Thus, the second part of my talk explores how kernel-based tracing provides the rapid feedback needed to generate and iterate on new optimization ideas.
Lastly, I discuss the accelerator itself. Most near-memory accelerators target numerical workloads and lack support for essential software tasks like memory management. A key challenge is that the memory is sharded across DRAM components. To address this, I designed a distributed accelerator and implemented it in RTL. FPGA-accelerated simulation shows that my design achieves up to a 5.5× speedup in GC heap traversal on an 8-rank system.
These results demonstrate the power of a cross-stack approach, encouraging the community to rethink system design, evaluation, and optimization as we face emerging systems challenges, such as hyperscale ML systems, that may benefit from a similar approach.
Speaker Bio
Zixian Cai is a PhD student at the Australian National University, advised by Steve Blackburn (Google DeepMind, ANU), Michael Bond (Ohio State), and Martin Maas (Google DeepMind). His research interests lie at the intersection of programming languages and computer architecture. He graduated with a Bachelor of Philosophy from the Australian National University with the University Medal.