Date: 26th March 2023
Venue: Vancouver, BC, Canada
Important: ASPLOS requires attendees to specify their intention to attend the tutorial while registering for the conference.
Francisco Muñoz-Martínez (Universidad de Murcia)
Raveesh Garg (Georgia Institute of Technology)
Tushar Krishna (Georgia Institute of Technology)
José L. Abellán (Universidad de Murcia)
Manuel E. Acacio (Universidad de Murcia)
The design of specialized architectures for accelerating the inference procedure of Deep Learning (DL) is a booming area of research nowadays. While first-generation systolic-based accelerator proposals used simple fixed dataflows tailored to dense Deep Neural Network (DNN) applications, more recent architectures like MAERI or SIGMA have argued for flexibility to efficiently support a wide variety of layer types, dimensions, and sparsity. In addition, the recent appearance of Graph Neural Network (GNN) applications has resulted in multi-phase accelerators that combine the execution of multiple kernels in a pipelined manner, making the architectures much more complex.
As the complexity of these accelerators grows, the analytical models currently being used for design-space exploration are unable to capture execution-time subtleties, leading to inexact results in many cases. This opens up a need for cycle-level simulation tools that allow fast and accurate design-space exploration of DL accelerators, and rapid quantification of the efficacy of architectural enhancements during the early stages of a design. To this end, STONNE (Simulation Tool for Neural Network Engines) is a cycle-level microarchitectural simulation framework that can plug into any high-level DL framework as an accelerator device and perform full-model evaluation (i.e., it can simulate real, complete, unmodified DNN models) of state-of-the-art systolic and flexible DNN accelerators, both with and without sparsity support. STONNE is developed by the University of Murcia and the Georgia Institute of Technology and is open-sourced under the terms of the MIT license.
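To see why cycle-level simulation matters, note that analytical models typically estimate runtime as total work divided by peak throughput, which is precisely the reasoning that misses stalls. The toy C++ sketch below contrasts that analytical estimate with a cycle-by-cycle loop; the workload size, PE count, and stall model are hypothetical illustrations, not STONNE internals:

```cpp
// Illustrative only: contrasts an analytical cycle estimate with a toy
// cycle-level simulation of the same layer. Workload size, PE count, and
// the stall model are hypothetical, not STONNE internals.
#include <cstdint>
#include <iostream>

int main() {
    const std::uint64_t macs = 1'000'000; // multiply-accumulates in the layer
    const std::uint64_t pes  = 256;       // processing elements

    // Analytical model: perfect utilization, no stalls.
    const std::uint64_t analytical = (macs + pes - 1) / pes;

    // Cycle-level model: advance cycle by cycle; every 8th cycle the
    // distribution network stalls (a stand-in for bank conflicts, edge
    // tiles, and drain latency that analytical models gloss over).
    std::uint64_t done = 0, cycles = 0;
    while (done < macs) {
        ++cycles;
        if (cycles % 8 == 0) continue; // stall: no useful work this cycle
        done += pes;
    }

    std::cout << "analytical:  " << analytical << " cycles\n"
              << "cycle-level: " << cycles << " cycles\n";
}
```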
In this tutorial we demonstrate how STONNE enables research on DNN accelerators by means of several use cases, ranging from the microarchitectural networks-on-chip present in DNN accelerators to the scheduling strategies that can be utilized to improve energy efficiency in sparse accelerators. Further, we present OMEGA, another framework built on top of STONNE that enables the exploration of dataflows in accelerators for multi-phase GNN applications, which are gaining popularity in the AI and HPC communities.
The figure above shows a high-level view of STONNE with its three major modules for full-model simulation flows. The input module feeds the simulator with the values to be computed, along with the DNN dataflows and the hardware configuration. Then, the simulation engine performs the cycle-level simulation using its internal simulated microarchitectural building blocks. Finally, the output module returns the statistics of the simulation. For more details, please refer to the features of STONNE.
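As a rough illustration of that input → engine → output flow, here is a minimal C++ sketch. The names (`HardwareConfig`, `num_ms`, `dn_bw`, `rn_bw`, `Stats`, `simulate`) echo STONNE's configuration vocabulary but are simplified stand-ins, not its real interfaces:

```cpp
// A minimal sketch of the three-module flow: input module -> simulation
// engine -> output module. Types and fields are hypothetical stand-ins.
#include <cstdint>
#include <iostream>

struct HardwareConfig {   // input module: hardware parameters
    std::uint32_t num_ms; // number of multiplier switches
    std::uint32_t dn_bw;  // distribution network bandwidth
    std::uint32_t rn_bw;  // reduction network bandwidth
};

struct Stats {            // output module: simulation statistics
    std::uint64_t cycles = 0;
    std::uint64_t buffer_reads = 0;
};

Stats simulate(const HardwareConfig& cfg, std::uint64_t macs) {
    Stats s;
    // Simulation engine: tick the simulated building blocks once per cycle.
    for (std::uint64_t done = 0; done < macs; done += cfg.num_ms) {
        ++s.cycles;
        s.buffer_reads += cfg.dn_bw; // operands fetched this cycle
    }
    return s;
}

int main() {
    HardwareConfig cfg{64, 8, 8};
    Stats s = simulate(cfg, 1 << 20);
    std::cout << s.cycles << " cycles, " << s.buffer_reads << " reads\n";
}
```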
The figure above shows a brief overview of the OMEGA framework, built on top of STONNE. OMEGA computes GNNs, which consist of an SpMM phase followed by a GEMM phase. The STONNE simulator accurately computes the timestamps and buffer accesses for the individual phases, and these are fed into the inter-phase cost model, which computes the final metrics considering the inter-phase (between the phases) dataflow/pipelining strategies. Please refer to the GNN Dataflows page for more details.
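To make the inter-phase idea concrete, the hedged sketch below combines two per-phase cycle counts (of the kind STONNE would report) under sequential and pipelined inter-phase dataflows. The overlap and fill-latency model is a deliberate simplification, not OMEGA's actual cost model:

```cpp
// A hedged sketch of the inter-phase cost-model idea: given per-phase cycle
// counts from cycle-level simulation, combine them under different
// inter-phase dataflows. All numbers are illustrative assumptions.
#include <algorithm>
#include <cstdint>
#include <iostream>

int main() {
    const std::uint64_t spmm_cycles = 120'000; // phase 1 (aggregation)
    const std::uint64_t gemm_cycles = 90'000;  // phase 2 (combination)

    // Sequential dataflow: phase 2 starts only after phase 1 fully drains.
    const std::uint64_t sequential = spmm_cycles + gemm_cycles;

    // Pipelined dataflow: phase 2 consumes phase-1 tiles as they are
    // produced, so the total is bounded by the slower phase plus the fill
    // latency of the first tile (here 1/16th of phase 1, an assumption).
    const std::uint64_t fill = spmm_cycles / 16;
    const std::uint64_t pipelined = fill + std::max(spmm_cycles, gemm_cycles);

    std::cout << "sequential: " << sequential << " cycles\n"
              << "pipelined:  " << pipelined << " cycles\n";
}
```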
The figure above shows an overview of STONNE's connection to the memory hierarchy. Requests go through the cache hierarchy instead of a scratchpad, which makes it possible to simulate shared-memory heterogeneous systems.
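The idea behind this integration can be sketched as an abstract memory interface: the accelerator issues the same request stream whether it is backed by a private scratchpad or a shared cache hierarchy. The class names and latency numbers below are hypothetical, not the SST or STONNE APIs:

```cpp
// Sketch: the accelerator issues loads through an abstract interface, so the
// backing store (scratchpad vs. cache hierarchy) can be swapped. Names and
// latencies are hypothetical assumptions.
#include <cstdint>
#include <iostream>
#include <memory>

struct MemoryInterface {
    virtual std::uint32_t load_latency(std::uint64_t addr) = 0; // cycles
    virtual ~MemoryInterface() = default;
};

struct Scratchpad : MemoryInterface {
    std::uint32_t load_latency(std::uint64_t) override { return 1; } // fixed
};

struct CacheHierarchy : MemoryInterface {
    std::uint32_t load_latency(std::uint64_t addr) override {
        // Toy model: every 64th cache line misses and pays memory latency.
        return (addr / 64) % 64 == 0 ? 100 : 4;
    }
};

int main() {
    std::unique_ptr<MemoryInterface> mem = std::make_unique<CacheHierarchy>();
    std::uint64_t cycles = 0;
    for (std::uint64_t addr = 0; addr < 4096; addr += 64)
        cycles += mem->load_latency(addr); // accelerator-side request stream
    std::cout << "memory cycles: " << cycles << "\n";
}
```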
Links to the timestamps of the individual sections are included in the table below.
| Time PT (pm) | Agenda | Presenter | Slides | Video (timestamps) |
|---|---|---|---|---|
| 1:40-2:00 | A Communication-Centric Approach to Flexible Accelerator Design | Jose Luis | Welcome and PART1 | Link to timestamp |
| 2:00-2:20 | Cycle-accurate simulation motivation and overview of STONNE | Jose Luis | PART2 | Link to timestamp |
| 2:20-3:20 | (Hands-on) STONNE | Francisco | PART3 | Link to timestamp |
| 3:20-3:40 | Coffee Break | | | |
| 3:40-4:10 | (Hands-on) SST-STONNE | Francisco | Included in PART3 | Link to timestamp |
| 4:10-4:40 | GNN Dataflow Taxonomy and (Demo) OMEGA framework | Raveesh | PART4 | Link to timestamp |
| 4:40-5:00 | Roadmap for future research/development | Tushar | PART5 | Link to timestamp |
```bash
docker run -it stonnesimulator/stonne-simulators
# inside the docker container:
git clone https://github.com/stonne-simulator/tutorials
```
- STONNE: https://github.com/stonne-simulator/stonne
- OMEGA: https://github.com/stonne-simulator/omega
- SST-STONNE: https://github.com/stonne-simulator/sst-elements-with-stonne
Francisco Muñoz-Martínez, José L. Abellán, Manuel E. Acacio, and Tushar Krishna. STONNE: Enabling Cycle-Level Microarchitectural Simulation for DNN Inference Accelerators. In 2021 IEEE International Symposium on Workload Characterization (IISWC), 2021. (pdf)
Raveesh Garg, Eric Qin, Francisco Muñoz-Martínez, Robert Guirado, Akshay Jain, Sergi Abadal, José L. Abellán, Manuel E. Acacio, Eduard Alarcón, Sivasankaran Rajamanickam, and Tushar Krishna. Understanding the Design-Space of Sparse/Dense Multiphase GNN Dataflows on Spatial Accelerators. In 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2022. (pdf)
Francisco Muñoz-Martínez, Raveesh Garg, Michael Pellauer, José L. Abellán, Manuel E. Acacio, and Tushar Krishna. Flexagon: A Multi-dataflow Sparse-Sparse Matrix Multiplication Accelerator for Efficient DNN Processing. In Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3 (ASPLOS 2023). Association for Computing Machinery, New York, NY, USA, 252–265. https://doi.org/10.1145/3582016.3582069 (pdf)
If you use the STONNE or OMEGA frameworks in your research, or if you run the Flexagon accelerator model in SST-STONNE, please cite:
@INPROCEEDINGS{STONNE21,
author = {Francisco Mu{\~n}oz-Mart{\'i}nez and Jos{\'e} L. Abell{\'a}n and Manuel E. Acacio and Tushar Krishna},
title = {STONNE: Enabling Cycle-Level Microarchitectural Simulation for DNN Inference Accelerators},
booktitle = {2021 IEEE International Symposium on Workload Characterization (IISWC)},
year = {2021},
}
@inproceedings{garg2021understanding,
title={Understanding the Design-Space of Sparse/Dense Multiphase GNN dataflows on Spatial Accelerators},
author={Garg, Raveesh and Qin, Eric and Mu{\~n}oz-Mart{\'\i}nez, Francisco and Guirado, Robert and Jain, Akshay and Abadal, Sergi and Abell{\'a}n, Jos{\'e} L and Acacio, Manuel E and Alarc{\'o}n, Eduard and Rajamanickam, Sivasankaran and Krishna, Tushar},
booktitle={2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS)},
year={2022}
}
@inproceedings{munoz2023flexagon,
title={Flexagon: A Multi-Dataflow Sparse-Sparse Matrix Multiplication Accelerator for Efficient DNN Processing},
author={Mu{\~n}oz-Mart{\'\i}nez, Francisco and Garg, Raveesh and Pellauer, Michael and Abell{\'a}n, Jos{\'e} L and Acacio, Manuel E and Krishna, Tushar},
booktitle={Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3},
pages={252--265},
year={2023}
}
Francisco Muñoz-Martínez - francisco.munoz2@um.es
This work was supported by NEC Laboratories Europe; by project grant PID2020-112827GB-I00 funded by MCIN/AEI/10.13039/501100011033; by grant RTI2018-098156-B-C53 (MCIU/AEI/FEDER, UE); by NSF OAC 1909900; and by the US Department of Energy ARIAA co-design center. The work was also supported by grant RYC2021-031966-I funded by MCIN/AEI/10.13039/501100011033 and by the “European Union NextGenerationEU/PRTR”. F. Muñoz-Martínez was supported by grant 20749/FPI/18 from Fundación Séneca.