Data-driven, automatic design space exploration of neural accelerator architecture is desirable for
specialization and productivity. Previous frameworks focus on sizing the numerical architectural
hyper-parameters but neglect searching the PE connectivities and compiler mappings. We push beyond
searching only hardware hyper-parameters and propose the Neural Accelerator Architecture Search (NAAS),
which fully exploits the hardware design space and compiler mapping strategies at the same time. Unlike prior work
which formulates the hardware parameter search as a pure sizing optimization, NAAS models the co-search as a two-level
optimization problem, where each level is a combination of indexing, ordering, and sizing optimization.
To tackle these challenges, we propose an encoding method that converts non-numerical parameters,
such as the loop order and the choice of parallel dimensions, into numerical parameters for optimization. Thanks to the low search cost,
NAAS can be easily integrated with hardware-aware NAS algorithms by adding another optimization level, enabling
joint search over the neural network architecture, accelerator architecture, and compiler mapping. NAAS thus composes highly
matched architectures together with efficient mapping. As a data-driven approach, NAAS rivals
the human design Eyeriss with a 4.4x EDP reduction and a 2.7% accuracy improvement on ImageNet under the same
computation resources, and offers 1.4x to 3.5x EDP reduction over sizing only the architectural hyper-parameters.
The overall design space can be categorized into three classes: the accelerator space, the compiler space, and the neural-net design space. Some of these parameters are numerical, such as array size; others are non-numerical, such as loop order and dataflow.
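The three design spaces can be pictured as one nested configuration. The sketch below is illustrative only: the parameter names and candidate values are assumptions for demonstration, not the exact search space used by NAAS.

```python
# Illustrative sketch of the three design spaces (parameter names and
# value ranges are assumptions, not NAAS's exact search space).
design_space = {
    "accelerator": {
        "pe_array_height": [8, 16, 32],         # numerical
        "pe_array_width": [8, 16, 32],          # numerical
        "l1_buffer_kb": [16, 32, 64],           # numerical
        "parallel_dims": ["C", "K", "H", "W"],  # non-numerical: connectivity
    },
    "compiler": {
        "loop_order": ["N", "C", "K", "H", "W", "R", "S"],  # non-numerical
        "tiling_factors": [1, 2, 4, 8],         # numerical
    },
    "neural_net": {
        "channel_width": [32, 64, 128],         # numerical
        "kernel_size": [3, 5, 7],               # numerical
    },
}
```

The non-numerical entries (`parallel_dims`, `loop_order`) are exactly the ones that prior sizing-only frameworks leave out, and the ones NAAS's encoding makes searchable.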
Challenge: Correlation between Design Spaces varies from accelerator to accelerator.
This table shows that the correlation between neural and accelerator architecture is complicated and varies from accelerator to accelerator. A perfectly matched pair of architectures improves the utilization of the compute array and on-chip memories, maximizing efficiency and performance.
(N is NVDLA and E is Eyeriss)
NAAS: Neural Accelerator Architecture Search
To achieve holistic optimization, we propose the Neural Accelerator Architecture Search, NAAS. For a specific workload, accelerator architecture search and NAS are conducted in one optimization loop to obtain tailored hardware together with a tailored neural network.
Key 1. Embedding vectors of the accelerator design and compiler mappings.
The convolution computation loop nest can be divided into two parts: temporal mapping and spatial parallelism. Loop tiling and loop ordering are reflected in the temporal mapping, while the hardware design can be inferred from the parallelism.
Therefore, the PE connectivity can be modeled as the choice of parallel dimensions. For example, two parallel dimensions indicate a 2D array; parallelism in input channels (C) implies reduction connections for the partial-sum accumulation registers inside each PE, while parallelism in output channels implies broadcast connections to the input-feature registers inside each PE.
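The dimension-to-wiring rules above can be made concrete with a small lookup. This is a minimal sketch assuming only the two rules stated in the text; the function name and the fallback wiring are hypothetical.

```python
# Sketch: infer PE-array topology and wiring from chosen parallel dimensions.
# Only the C and K rules come from the text; the rest is an assumption.
CONNECTIVITY = {
    "C": "reduction of partial-sum accumulation registers",  # input channels
    "K": "broadcast to input-feature registers",             # output channels
}

def infer_connectivity(parallel_dims):
    # the number of parallel dimensions gives the array dimensionality,
    # e.g. two parallel dimensions -> a 2D PE array
    topology = f"{len(parallel_dims)}D array"
    wiring = [CONNECTIVITY.get(d, "point-to-point") for d in parallel_dims]
    return topology, wiring
```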
The hardware encoding vector contains two parts, architecture sizing and connectivity parameters; the mapping encoding vector contains multiple parts, including the loop orders at the PE level and the loop tiling for each array-dimension level.
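One way to picture this is flattening both vectors into a single numerical vector that a black-box optimizer can mutate. The field layout below is a hypothetical sketch, not NAAS's exact format; it uses per-dimension importance values (introduced next) for the non-numerical fields.

```python
# Hypothetical sketch: flatten hardware and mapping choices into one
# numerical vector. The field layout is an assumption for illustration.
def encode(hw, mapping):
    vec = []
    # architecture sizing (numerical: array dimensions, buffer sizes)
    vec += [hw["array_h"], hw["array_w"], hw["l1_kb"], hw["l2_kb"]]
    # connectivity: one importance value per candidate parallel dimension
    vec += [hw["parallel_importance"][d] for d in "CKHW"]
    # mapping: loop-order importance per conv dimension, then tiling sizes
    vec += [mapping["order_importance"][d] for d in "NCKHWRS"]
    vec += mapping["tiling"]
    return vec

hw = {"array_h": 16, "array_w": 16, "l1_kb": 32, "l2_kb": 256,
      "parallel_importance": {"C": 0.8, "K": 0.6, "H": 0.1, "W": 0.2}}
mapping = {"order_importance": {d: 0.5 for d in "NCKHWRS"},
           "tiling": [4, 2, 8]}
vec = encode(hw, mapping)
```

Because every field is a number, standard evolutionary or gradient-free optimizers can search the whole vector at once.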
Key 2. Encoding non-numerical design parameters into numerical values for optimization.
During the experiments, we found that the straightforward method of using indices to encode the selection of parallel dimensions and loop order does not work well, because incrementing or decrementing an index does not convey any physical information. To solve this problem, we propose the "importance-based" encoding method as follows.
This strategy is interpretable, since the importance value represents the data locality of the dimension: the dimension labeled most important has the best data locality because it becomes the outermost loop, while the dimension labeled least important has the poorest data locality and therefore becomes the innermost loop.
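The decoding step described above can be sketched in a few lines: assign each loop dimension a continuous importance value, then sort dimensions by importance to recover a loop order. The function name and the specific importance values are illustrative.

```python
def decode_loop_order(importance, dims=("N", "C", "K", "H", "W", "R", "S")):
    """Sort dimensions by descending importance: the most important
    dimension becomes the outermost loop (best data locality), the
    least important becomes the innermost loop."""
    ranked = sorted(zip(dims, importance), key=lambda p: -p[1])
    return [d for d, _ in ranked]

# e.g. C has the highest importance value, so it becomes the outermost loop
order = decode_loop_order([0.2, 0.9, 0.5, 0.1, 0.4, 0.3, 0.0])
```

Because the importance values are continuous, small perturbations by the optimizer translate into meaningful reorderings of nearby loops, unlike raw index encoding where a +1 step can jump to an arbitrary permutation.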
Consistent improvement on diverse networks and hardware resources.
Joint optimization with Neural Architecture provides better specialization.