!!Title
SaC: Off-the-shelf support for data parallelism on multicores

!!Authors
Clemens Grelck - University of Lübeck
Sven-Bodo Scholz - University of Hertfordshire

!!Conference or Journal where the paper was published
Proceedings of the 2007 workshop on Declarative aspects of multicore programming

!!Summary
!!!Motivation/Problem
Multicore processors have become commonplace. Packing a number of von Neumann cores into a shared-memory architecture is THE WAY by which manufacturers hope to make programs run faster, but programming these systems is still a big challenge. The programmer usually must:
* Spread the problem among the processing elements;
* Choose the granularity of subtasks;
* Control memory access to avoid bottlenecks.
Some help is available, however, in application areas where parallel patterns prevail and compilers can automatically identify parallelism and map it onto multiple processing elements. A promising approach is data parallelism, and this paper presents a functional data-parallel language called SaC (Single-assignment C), along with its optimising compiler.

Relevance for my research: there are reported results describing compilation of SaC code directly into FPGA fabric.
!!!Solution
* SaC is a FUNCTIONAL language that deals with ARRAYS
** Every SaC expression evaluates to an array
** Two sources of parallelism:
*** The order of evaluation of expressions is irrelevant
*** Operations over arrays are uniform
** Mainly the latter form of parallelism is exploited
* Standard C syntax is inherited, but semantics differ
* The ATOMIC language construct enabling parallelism is the WITH-loop

!!!Implementation
* How parallel execution is realized
** Each (index,value) pair is totally independent, and a microthread evaluates its expression
** Microthreads are scheduled onto OS threads, balancing workload and taking data locality into account
** A master thread executes the sequential part of the program and spawns worker threads to process with-loops as needed
:: {img fileId="291" width="650" rel="box[g]"} ::
* Optimisations
** The optimisations target two unwanted side effects: microthreads whose code is too small, and too many intermediate arrays
** Loop folding: condenses pipelines of array operations
:: {img fileId="292" width="400" rel="box[g]"} ::
:: {img fileId="293" width="400" rel="box[g]"} ::
** Loop fusion: combines neighbouring loops
** Loop scalarisation, a.k.a. deforestation: flattens tree-like structures (nested arrays) (picture)

!!!Results
The main contribution of the SaC project was to combine a somewhat C-like syntax with the data-parallel approach, brought mainly from traditional functional programming languages. Moreover, integrating SaC code with standard C is very easy. Case studies published in other papers by the same authors report that the SaC compilation framework is very competitive: performance very similar to that of hand-optimized FORTRAN code has been observed.
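
The execution model and the folding optimisation noted under Implementation can be illustrated with a small sketch. This is Python, not SaC, and it is not the compiler's actual runtime: the names with_loop, unfused and folded are hypothetical, the array is one-dimensional, and the scheduling is a plain static block split. In CPython the threads only show the structure (master thread, workers, contiguous index blocks), not a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def with_loop(length, expr, workers=4):
    """Build a list of `length` elements where element i is expr(i).

    Hypothetical stand-in for a SaC with-loop: every index is independent,
    so contiguous index blocks can be handed to separate worker threads.
    """
    # Split the index space into at most `workers` contiguous blocks,
    # so that each worker touches neighbouring elements (data locality).
    block = max(1, -(-length // workers))  # ceiling division
    ranges = [(lo, min(lo + block, length)) for lo in range(0, length, block)]

    def run_block(bounds):
        lo, hi = bounds
        return [expr(i) for i in range(lo, hi)]

    # The calling ("master") thread spawns workers and waits for them.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(run_block, ranges))  # map keeps block order
    return [value for part in parts for value in part]


def unfused(a):
    # Two with-loops in a pipeline: the first one materialises
    # an intermediate array b that is read only by the second one.
    b = with_loop(len(a), lambda i: a[i] + 1)
    return with_loop(len(b), lambda i: b[i] * 2)


def folded(a):
    # After folding: a single with-loop, no intermediate array,
    # and more work per index for each microthread.
    return with_loop(len(a), lambda i: (a[i] + 1) * 2)
```

Both versions compute the same array; the folded one makes a single pass over the index space, which is the effect the folding optimisation is after.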