Title: Programmable and Customizable Hardware Accelerators for Self-adaptive Virtual Processors in FPGA
Status: Defended on 10.04.2014
Thesis text: [pdf download] (v15, 239 pages of A4 paper)
Ph.D. Statement: [pdf download] (v12, 34 pages of A5 paper)
Abstract: The roots of all evil are latencies that are statically unpredictable. A dynamic schedule of operations, constructed on the fly in data-driven machines, is needed to overcome them. Microthreading is a unified data-driven and dynamically scheduled model for efficient programming of many-core general-purpose processors. It overcomes unpredictable latencies in off-chip memories (DRAMs) and in the on-chip shared interconnect. As silicon chips became power-limited, causing the shift from frequency scaling to many-core scaling, previous work envisioned large-scale homogeneous many-core chips because it assumed that low-clock-frequency silicon is easily scalable in space. However, contemporary and future power constraints will favour heterogeneous (specialized) rather than homogeneous (general-purpose) many-cores, because the thermal design power of a chip could be so low that not all cores may be powered up simultaneously.
Besides the power issues, the other negative side-effect of silicon scaling is an increase in the latency of interconnect (metal wires) relative to that of gates: new designs are becoming limited by interconnect delays. As the interconnect delays depend on details of the physical placement of modules in a chip or in a reconfigurable array, they are difficult to predict accurately early in the design process. Consequently, future hardware will be special-purpose and customized due to the power issues, and it will be data-driven to overcome on-chip interconnect latencies.
This dissertation explores dataflow latency-tolerant techniques with a focus on customized hardware design using reconfigurable hardware arrays. Dataflow is studied at the gate and chip levels: gate-level dataflow overcomes on-chip interconnect delays, and chip-level dataflow allows for the composition of scalable heterogeneous many-cores.
The first contribution is an analysis of a contemporary statically scheduled instruction-driven architecture for customized computing realized in an FPGA. Contrary to the original design basis of the architecture, it is shown here that high-frequency instruction issue is needed even in an architecture with batch (vector-based) data processing. The second contribution is a method to achieve high-frequency instruction issue by using dictionary tables of instruction fragments.
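As a rough illustration of the general idea behind dictionary tables of instruction fragments, the sketch below splits each wide instruction word into fields, stores each distinct fragment once in a per-field table, and issues only short table indices. The abstract does not describe the actual encoding used in the thesis, so the field layout, the function names `build_dictionaries` and `expand`, and the sample fragments are all hypothetical.

```python
def build_dictionaries(program):
    """Compress a program into per-field dictionary tables plus index tuples.

    `program` is a list of instruction words, each a tuple of fragments
    (one fragment per field). This layout is an assumption for illustration.
    """
    n_fields = len(program[0])
    tables = [[] for _ in range(n_fields)]   # distinct fragments, per field
    lookup = [{} for _ in range(n_fields)]   # fragment -> index, per field
    encoded = []
    for word in program:
        codes = []
        for field, fragment in enumerate(word):
            if fragment not in lookup[field]:
                lookup[field][fragment] = len(tables[field])
                tables[field].append(fragment)
            codes.append(lookup[field][fragment])
        encoded.append(tuple(codes))
    return tables, encoded


def expand(tables, codes):
    """Rebuild one full instruction word from its short index tuple."""
    return tuple(tables[field][code] for field, code in enumerate(codes))
```

In hardware, the lookup in `expand` would correspond to small on-chip memories read in parallel, so that a narrow stream of indices can drive a wide control word every cycle; whether the thesis organizes its tables this way is not stated in the abstract.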
Statically scheduled data paths used to be preferred because all latencies (including interconnect) were assumed to be fully known early at design time. The third contribution is a new structured and extensible approach for the synthesis of hardware controllers from synchronous Petri nets. The fourth contribution is a new technique for dataflow hardware synthesis from Petri nets. The technique is based on augmented synchronous Petri nets with optimal throughput.
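For readers unfamiliar with synchronous Petri nets, the following minimal sketch shows one common reading of their semantics: on every clock tick, all transitions enabled by the current marking fire together. It assumes a conflict-free net (no two simultaneously enabled transitions share an input place) and does not model the augmentations or the synthesis method of the thesis; the function `step` and the dictionary-based net encoding are hypothetical.

```python
def step(marking, transitions):
    """Advance a conflict-free synchronous Petri net by one clock tick.

    `marking` maps place names to token counts. Each transition is a dict
    with keys "name", "inputs" and "outputs" (lists of place names).
    Returns the next marking and the names of the transitions that fired.
    """
    # Enabling is decided on the marking at the start of the tick.
    enabled = [t for t in transitions
               if all(marking[p] > 0 for p in t["inputs"])]
    nxt = dict(marking)
    for t in enabled:
        for p in t["inputs"]:
            nxt[p] -= 1   # consume one token from each input place
        for p in t["outputs"]:
            nxt[p] += 1   # produce one token in each output place
    return nxt, [t["name"] for t in enabled]
```

A two-place controller with transitions `start` (from `idle` to `busy`) and `done` (from `busy` to `idle`) alternates between the two places on successive calls to `step`, which mirrors how a place can map to a state flip-flop in a synthesized controller.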
The fifth contribution is a technique that combines the data-driven microthreaded procedural computation model with the special-purpose data-driven hardware in structurally programmed reconfigurable arrays. Adaptive transparent migration of microthreads between the general-purpose and special-purpose hardware is demonstrated.