I am a graduate student in Computer Engineering at the School of Computing, Informatics, and Decision Systems Engineering (CIDSE), Arizona State University (ASU), where I began the doctoral program in Spring 2017. Prior to joining the doctoral program, I earned my master's degree in Computer Engineering at ASU in 2016. My research interest lies in accelerating compute-intensive applications on hardware accelerators in a resource-efficient manner, through optimizations that span the hardware, software, and application layers.
My research at the Compiler Micro-architecture Lab (CML) at ASU is advised by Prof. Aviral Shrivastava and focuses on "Coarse-Grained Reconfigurable Accelerators (CGRAs) for Domain-customized and General-purpose Computing". CGRAs are popular energy-efficient dataflow accelerators that can speed up the performance-critical loops of imaging and media processing applications, machine learning models, and embedded systems, including loop kernels that are not vectorizable. My current research efforts focus on developing tools and techniques for coarse-grained programmable dataflow accelerators for deep learning, including mapping optimizations, architectural design-space exploration, and efficient hardware/software/model co-design. My prior industry experience includes compiler optimizations and code generation for embedded systems, as well as RTL design and verification for ASIC and FPGA platforms.
• Programmable Hardware Accelerators • Computer Architecture • Deep Learning • Compiler Design and Optimizations • High-Performance Computing • Hardware/Software/Algorithm Co-Design • Embedded Systems • DNN Model Compression
• Hardware Acceleration of Sparse and Irregular Tensor Computations of Machine Learning Models: A Comprehensive Survey and Insights (https://arxiv.org/abs/2007.00864). Key topics: Sources of sparsity in tensors; implications of irregular and structured sparsity for hardware acceleration; analysis of sparsity-encoding schemes and their storage requirements; impact of varying sparsity and tensor shapes of different DNN operations on data reuse; techniques for data extraction and load balancing of effectual computations; sparsity-aware dataflows; leveraging value similarity in tensors; and trends and directions for hardware/model co-design for DNNs.
• dMazeRunner: Executing Perfectly Nested Loops on Dataflow Accelerators, ACM Transactions on Embedded Computing Systems, 2019 (Special Issue on CODES+ISSS 2019). Key topics: Defining a comprehensive hardware/software design space for loop nests; Accelerator cost model for evaluating execution metrics under variations in hardware architecture, dataflows, and model layers; Search-space reduction techniques for obtaining efficient mappings within a few seconds; Generic algorithms for determining all unique data-reuse scenarios across loop orderings. (Code for optimizing dataflow mappings of DNNs and design exploration of hardware accelerators: https://github.com/MPSLab-ASU/dMazeRunner; a toy sketch of this kind of mapping search follows the list below.)
• RAMP: Resource-Aware Mapping for CGRAs, in the 55th Annual Design Automation Conference (DAC), 2018 (mapping optimizations for accelerating loops of general-purpose computing).
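To give a flavor of the kind of mapping search that dMazeRunner automates, below is a minimal, hypothetical sketch that enumerates tile sizes for a GEMM-like loop nest and scores each feasible tiling with a toy cost model. The parameter names (NUM_PES, SPM_BYTES, BYTES_PER_ELEM) and the reuse/cycle estimates are simplified assumptions for illustration only; this is not the dMazeRunner cost model or API, which additionally explores loop orderings and multi-level memory hierarchies and applies search-space reduction heuristics.

```python
# Minimal, hypothetical sketch of a dataflow-mapping search for a GEMM-like
# loop nest (C[i][j] += A[i][k] * B[k][j]) on an accelerator with a scratchpad
# and an array of processing elements (PEs). NOT the dMazeRunner API; the cost
# model and the parameters NUM_PES, SPM_BYTES, BYTES_PER_ELEM are simplified
# assumptions for illustration.
from itertools import product

M, N, K = 256, 256, 256                         # loop bounds of the nest
NUM_PES, SPM_BYTES, BYTES_PER_ELEM = 64, 64 * 1024, 2

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def spm_footprint(tm, tn, tk):
    # Tiles of A (tm x tk), B (tk x tn), and C (tm x tn) resident in the scratchpad.
    return (tm * tk + tk * tn + tm * tn) * BYTES_PER_ELEM

def dram_accesses(tm, tn, tk):
    # Crude reuse model: each operand is re-streamed once per tile of the
    # output dimension it does not index; C is written once per output tile.
    a = M * K * (N // tn)
    b = K * N * (M // tm)
    c = M * N
    return a + b + c

def compute_cycles(tm, tn, tk):
    # Assume the MACs of each tile are spread evenly across the PEs.
    macs_per_tile = tm * tn * tk
    tiles = (M // tm) * (N // tn) * (K // tk)
    return tiles * -(-macs_per_tile // NUM_PES)  # ceiling division

best = None
for tm, tn, tk in product(divisors(M), divisors(N), divisors(K)):
    if spm_footprint(tm, tn, tk) > SPM_BYTES:
        continue                                 # prune infeasible tilings
    cost = dram_accesses(tm, tn, tk) + compute_cycles(tm, tn, tk)
    if best is None or cost < best[0]:
        best = (cost, (tm, tn, tk))

print("best tiling (tm, tn, tk):", best[1], "estimated cost:", best[0])
```

Even this toy version shows the core trade-off the real tool navigates: larger tiles improve data reuse and reduce off-chip accesses but must fit in on-chip storage and keep the PE array well utilized, which is why analytical cost models and search-space pruning matter for finding good mappings quickly.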