Organizer: Artur Mariano
Exploiting the combined computing power of the diverse resources available in heterogeneous systems is essential but very challenging. The diversity of architectures, execution models and programming tools, together with disjoint address spaces and different computing capabilities, raises a number of challenges that severely impact application performance and programming productivity. This problem is further compounded in the presence of data parallel irregular applications.
We present a framework that addresses the development and execution of data parallel irregular applications in heterogeneous systems. A unified task-based programming and execution model is proposed, together with inter- and intra-device scheduling. Coupled with a data management system, these aim to achieve performance scalability across multiple devices while maintaining high programming productivity. Intra-device scheduling on wide SIMD/SIMT architectures resorts to consumer-producer kernels, which balance irregular workloads and increase resource utilization by allowing dynamic generation and rescheduling of new work units. Comparisons with StarPU, an alternative framework that targets regular workloads, consistently demonstrate significant speedups.
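The consumer-producer idea can be illustrated with a small CPU-only sketch. It is not the framework's actual API: the names `run_consumer_producer`, `split_interval`, `lanes` and `grain` are hypothetical, and the fixed-size batch only loosely stands in for one wide SIMD/SIMT step. Each work unit is either consumed (it yields a result) or produces new work units that are placed back on the queue for rescheduling.

```python
from collections import deque


def run_consumer_producer(initial_units, process, lanes=4):
    """Drain a work queue in batches of up to `lanes` units.

    `process(unit)` returns (finished, produced): results that are done,
    and new work units to reschedule. All names here are illustrative.
    """
    queue = deque(initial_units)
    results = []
    rounds = 0
    while queue:
        # Take up to `lanes` units, loosely mimicking one wide SIMD/SIMT step.
        batch = [queue.popleft() for _ in range(min(lanes, len(queue)))]
        for unit in batch:
            finished, produced = process(unit)
            results.extend(finished)
            # Dynamically generated work is rescheduled instead of
            # being processed recursively by the unit that created it.
            queue.extend(produced)
        rounds += 1
    return results, rounds


def split_interval(unit, grain=1):
    """Toy irregular workload: split an integer interval until it is small."""
    lo, hi = unit
    if hi - lo <= grain:
        return [unit], []          # consumed: nothing new to schedule
    mid = (lo + hi) // 2
    return [], [(lo, mid), (mid, hi)]  # produced: two new work units


if __name__ == "__main__":
    # Two initial units of very different size: an irregular workload.
    leaves, rounds = run_consumer_producer([(0, 5), (10, 11)], split_interval)
    print(sorted(leaves), rounds)
```

Because large units are broken up and re-enqueued, later batches tend to stay full even when the initial units differ widely in cost, which is the load-balancing effect the abstract attributes to consumer-producer kernels.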