Architecture Paradigms and Programming Languages for Efficient programming of multiple CORES (Apple-CORE)

http://www.apple-core.info/, Funded by the EU, FP7-ICT-215215.
My involvement: August 2009 – January 2011.

When I started my Ph.D. studies in 2009, the first project I worked on was Apple-CORE. Our research group at UTIA was tasked with developing an FPGA prototype of a microthreaded (roughly, multithreaded) processor based on a specification provided by the project partners. The processor we developed is called UTLEON3. It is based on LEON3 from Aeroflex Gaisler, a commercial open-source implementation of the SPARCv8 RISC architecture. UTLEON3 was released as open source under the GPL license as a supplement to our book about its implementation.

My research work in the project involved defining and developing an interface between the multithreaded processor and custom hardware accelerators. The subsystem is called Hardware Families of Threads (HWFAM). The scheme enables transparent migration of thread families created in the processor to accelerators. The work was presented as a paper at DSD’2011.

The video below shows a demo I put together for the final project review. The demo setup consists of an ML509 Virtex-5 FPGA development board connected to a monitor via its DVI/VGA output. The FPGA is loaded with a system consisting of two processors: UTLEON3 (our microthreaded CPU) and LEON3 (the original, classical CPU). Both processors run a program that computes the Mandelbrot set, and both draw pixels on the screen. The trick is that UTLEON3 computes and draws the top half of the screen while LEON3 does the bottom half. In UTLEON3, each pixel is computed and drawn in a separate microthread. The beautiful thing about this particular demo is that you can directly see how microthreads are executed out of order in the processor! The pixels appear on screen in blocks of 16 because that is the cache-line length, in words, of the write-back D-cache: a pixel becomes visible only when the dirty cache line holding it is written back to the frame buffer.
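
A minimal C sketch of the per-pixel work in the demo, assuming an arbitrary resolution, viewport and iteration limit. `WIDTH`, `HEIGHT`, `MAX_ITER`, `mandel_point`, `mandel_pixel` and `render_rows` are hypothetical names, not taken from the demo's actual source. In UTLEON3 each `mandel_pixel` call would be the body of one microthread; here `render_rows` simply loops over the pixels, and the frame is split into a top half and a bottom half as in the demo.

```c
#include <assert.h>

/* Hypothetical frame size and iteration limit; the demo's real values are not given. */
#define WIDTH    64
#define HEIGHT   32
#define MAX_ITER 64

/* Escape-time iteration count for the point c = cr + ci*i:
   iterate z <- z^2 + c from z = 0 until |z| > 2 or MAX_ITER is reached. */
static int mandel_point(double cr, double ci)
{
    double zr = 0.0, zi = 0.0;
    int n = 0;
    while (n < MAX_ITER && zr * zr + zi * zi <= 4.0) {
        double t = zr * zr - zi * zi + cr;
        zi = 2.0 * zr * zi + ci;
        zr = t;
        n++;
    }
    return n;
}

/* One pixel = one independent unit of work (one microthread body in UTLEON3).
   Maps pixel (x, y) onto the region [-2, 1] x [-1.2, 1.2] of the complex plane. */
static int mandel_pixel(int x, int y)
{
    double cr = -2.0 + 3.0 * x / WIDTH;
    double ci = -1.2 + 2.4 * y / HEIGHT;
    return mandel_point(cr, ci);
}

/* Render rows [y0, y1). In the demo, the microthreaded CPU takes the
   top half of the screen and the classical LEON3 the bottom half. */
static void render_rows(int frame[HEIGHT][WIDTH], int y0, int y1)
{
    assert(0 <= y0 && y0 <= y1 && y1 <= HEIGHT);
    for (int y = y0; y < y1; y++)
        for (int x = 0; x < WIDTH; x++)
            frame[y][x] = mandel_pixel(x, y);
}
```

For example, `render_rows(frame, 0, HEIGHT / 2)` covers the top half and `render_rows(frame, HEIGHT / 2, HEIGHT)` the bottom half. Because every pixel is independent, the order in which pixels finish does not change the final image, which is why out-of-order microthread execution is visible on screen but harmless.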

More info about microthreading


The Self-adaptive Virtual Processor (SVP) is a general-purpose concurrency model used to design and program multithreaded multicore systems. The SVP expresses fine-grained concurrency through the composition of microthreads. A microthread comprises only a few processor instructions, which typically implement the body of a loop, and shares only a small portion of the processor register file. A family of threads is an ordered set of threads, all created by a single processor instruction called create. By default, microthreads may execute in parallel. The only dependencies allowed are explicit and unidirectional: between successive threads within one family, and between a parent thread and its child family. A family of threads is thus akin to a parallel for-loop.
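
To make the family and its one-way dependency chain concrete, here is a sequential emulation in plain C. It does not use the SVP's own language syntax; `family_desc`, `thread_body_fn`, `create_and_sync` and `sum_squares_body` are hypothetical names, and the sketch only handles positive steps. Running the threads in index order is always a legal schedule because dependencies only point forward; real hardware may overlap the independent part of each thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical thread body: receives its index and the "shared" value
   produced by its predecessor, and returns the value for its successor.
   This models the one-way dependency from thread i to thread i+1. */
typedef long (*thread_body_fn)(long index, long shared_in, void *arg);

/* Hypothetical family descriptor: the index range [start, limit)
   walked with a positive step, as in a parallel for-loop. */
typedef struct {
    long start;
    long limit;
    long step;
} family_desc;

/* Sequential emulation of "create the family, then wait for it".
   The parent passes the initial shared value in and reads the final
   one back after all threads complete (parent-child dependency). */
static long create_and_sync(family_desc f, thread_body_fn body,
                            long shared_init, void *arg)
{
    assert(f.step > 0);
    long shared = shared_init;
    for (long i = f.start; i < f.limit; i += f.step)
        shared = body(i, shared, arg);
    return shared;
}

/* Example body: each thread squares its index (independent work)
   and adds it to the running sum passed along the shared chain. */
static long sum_squares_body(long index, long shared_in, void *arg)
{
    (void)arg;
    long local = index * index;   /* independent part, could overlap with other threads */
    return shared_in + local;     /* dependent part, needs the predecessor's value */
}
```

For instance, a family described by `{0, 10, 1}` with `sum_squares_body` and an initial shared value of 0 returns 285, the sum of the squares of 0 to 9.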

By creating families of potentially many fine-grained threads in a single event, the SVP allows implementations to amortize the overheads associated with managing individual threads. A family of threads represents a batch of coarse-grained work to be scheduled in the multicore system. Individual threads are executed independently, and no restriction is placed on conditional branches.
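
The amortization argument can be stated with a simple illustrative cost model. This is an assumption made for explanation, not a model from the SVP specification: let O_f be the fixed overhead of one create event, o_t the per-thread management overhead, and N the number of threads.

```latex
% Illustrative cost model (assumed): O_f = fixed overhead per create event,
% o_t = per-thread overhead, N = number of threads
T_{\text{individual}} = N\,(O_f + o_t), \qquad
T_{\text{family}} = O_f + N\,o_t, \qquad
\frac{T_{\text{family}}}{N} = \frac{O_f}{N} + o_t \;\xrightarrow{\;N \to \infty\;}\; o_t
```

With hypothetical values O_f = 100 cycles, o_t = 2 cycles and N = 1000, creating one family costs 2,100 cycles of overhead, against 102,000 cycles if every thread were created individually.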
