Data streaming applications, such as signal processing, multimedia applications, often require high computing capacity, yet also have stringent power constraints, especially in portable devices. General purpose processors can no longer meet these requirements due to their sequential software execution. Although fixed logic ASICs are usually able to achieve the best performance and energy efficiency, ASIC solutions are expensive to design and their lack of flexibility makes them unable to accommodate functional changes or new system requirements. Reconfigurable systems have long been proposed to bridge the gap between the flexibility of software processors and performance of hardware circuits. Unfortunately, mainstream reconfigurable FPGA designs suffer from high cost of area, power consumption and speed due to the routing area overhead and timing penalty of their bit-level fine granularity. In this dissertation, we present an architecture design, application mapping and performance evaluation of a novel coarse-grained reconfigurable architecture, named SmartCell, for data streaming applications. The system tiles a large number of computing cell units in a 2D mesh structure, with four coarse-grained processing elements developed inside each cell to form a quad structure. Based on this structure, a hierarchical reconfigurable network is developed to provide flexible on-chip communication among computing resources: including fully connected crossbar, nearest neighbor connection and clustered mesh network. SmartCell can be configured to operate in various computing modes, including SIMD, MIMD and systolic array styles to fit for different application requirements. The coarse-grained SmartCell has the potential to improve the power and energy efficiency compared with fine-grained FPGAs. It is also able to provide high performance comparable to the fixed function ASICs through deep pipelining and large amount of computing parallelism. Dynamic reconfiguration is also addressed in this dissertation. To evaluate its performance, a set of benchmark applications has been successfully mapped onto the SmartCell system, ranging from signal processing, multimedia applications to scientific computing and data encryption. A 4 by 4 SmartCell prototype system was initially designed in CMOS standard cell ASIC with 130 nm process. The chip occupies 8.2 mm square and dissipates 1.6 mW/MHz under fully operation. The results show that the SmartCell can bridge the performance and flexibility gap between logic specific ASICs and reconfigurable FPGAs. SmartCell is also about 8% and 69% more energy efficient and achieves 4x and 2x throughput gains compared with Montium and RaPiD CGRAs. Based on our first SmartCell prototype experiences, an improved SmartCell-II architecture was developed, which includes distributed data memory, segmented instruction format and improved dynamic configuration schemes. A novel parallel FFT algorithm with balanced workloads and optimized data flow was also proposed and successfully mapped onto SmartCell-II for performance evaluations. A 4 by 4 SmartCell-II prototype was then synthesized into standard cell ASICs with 90 nm process. The results show that SmartCell-II consists of 2.0 million gates and is fully functional at up to 295 MHz with 3.1 mW/MHz power consumption. SmartCell-II is about 3.6 and 28.9 times more energy efficient than Xilinx FPGA and TI's high performance DSPs, respectively. It is concluded that the SmartCell is able to provide a promising solution to achieve high performance and energy efficiency for future data streaming applications.
Worcester Polytechnic Institute
Electrical & Computer Engineering
All authors have granted to WPI a nonexclusive royalty-free license to distribute copies of the work. Copyright is held by the author or authors, with all rights reserved, unless otherwise noted. If you have any questions, please contact firstname.lastname@example.org.
Liang, C. (2009). SmartCell: An Energy Efficient Reconfigurable Architecture for Stream Processing. Retrieved from https://digitalcommons.wpi.edu/etd-dissertations/263
FPGA, energy efficient, CGRA, ASIC, DSP, stream processing