Sampling Policies for Near-Optimal Device Choice in Parallel Simulations on CPU/GPU Platforms

Philipp Andelfinger, Alessandro Pellegrini, and Romolo Marotta


Published in: Proceedings of the 28th IEEE/ACM International Symposium on Distributed Simulation and Real Time Applications

Abstract:
Heterogeneous hardware platforms comprised of CPUs, GPUs, and other accelerators offer the opportunity to choose the best-suited device for executing a given scientific simulation in order to minimize execution time and energy consumption. To this end, the recently proposed “Follow the Leader” approach dynamically selects a suitable device based on runtime performance measurements during speculative discrete-event simulations. A currently active “leader” device is periodically challenged by a “follower” device in order to negotiate the new leader. The optimality of the device choices and the associated overhead depend critically on the challenge frequency and timing. Here, we explore policies to schedule challenges with the goal of attaining Pareto-optimal combinations of execution time and energy consumption. Several heuristics are first evaluated in an abstract fashion using a “meta-simulation” by mimicking the progress and energy consumption of an idealized co-execution. In this setting, we optimize the heuristics’ tuning parameters to assess their relative merits in near-optimal configurations when compared to challenge timings based on perfect knowledge. We find that under challenging stochastic workloads based on a class of mean-reverting random walks, the best heuristics can closely approximate the execution time and energy consumption achievable under an optimal device choice. Empirical support for this observation is given by measurements of a CPU/GPU co-execution of the Time Warp algorithm on physical hardware.
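
Illustrative sketch (not taken from the paper): the trade-off described in the abstract can be mimicked with a toy meta-simulation. Everything here is an assumption made for illustration: the function names, the fixed-period challenge policy, the per-step energy model, and the discretized mean-reverting walk used for the device speeds. The paper's actual heuristics, workload model, and cost model may differ.

```python
import random

def mean_reverting_walk(steps, mean, theta, sigma, rng):
    """Hypothetical workload model: a discretized mean-reverting random
    walk, clamped so the per-step progress rate stays positive."""
    x, out = mean, []
    for _ in range(steps):
        x += theta * (mean - x) + sigma * rng.gauss(0.0, 1.0)
        x = max(x, 0.05)
        out.append(x)
    return out

def simulate(rates, power, period):
    """Fixed-period challenge policy (an assumed, simplistic heuristic).

    rates[d][t] is the progress device d would make in step t and
    power[d] is its energy cost per step. On a challenge step both
    devices run, so both consume energy, and the faster one becomes
    the leader. period=0 disables challenges entirely.
    Returns (total progress, total energy)."""
    leader, progress, energy = 0, 0.0, 0.0
    for t in range(len(rates[0])):
        if period and t % period == 0:
            energy += power[0] + power[1]      # challenge overhead
            leader = 0 if rates[0][t] >= rates[1][t] else 1
        else:
            energy += power[leader]            # only the leader runs
        progress += rates[leader][t]
    return progress, energy

def oracle(rates, power):
    """Perfect-knowledge baseline: always run only the faster device."""
    progress = energy = 0.0
    for r0, r1 in zip(rates[0], rates[1]):
        best = 0 if r0 >= r1 else 1
        progress += (r0, r1)[best]
        energy += power[best]
    return progress, energy

rng = random.Random(42)
rates = [mean_reverting_walk(1000, 1.0, 0.05, 0.1, rng) for _ in range(2)]
power = [1.0, 2.0]  # assumed energy per step for device 0 and device 1
print("oracle", oracle(rates, power))
for period in (1, 10, 100, 0):
    print("period", period, simulate(rates, power, period))
```

In this toy model, challenging on every step matches the oracle's progress but pays for both devices all the time, while never challenging is cheapest on overhead but can stay on a slow device. Intermediate periods fall between these extremes, which is the trade-off the challenge-scheduling policies aim to balance.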

BibTeX Entry:

@inproceedings{And24,
author = {Andelfinger, Philipp and Pellegrini, Alessandro and Marotta, Romolo},
title = {Sampling Policies for Near-Optimal Device Choice in Parallel Simulations on CPU/GPU Platforms},
booktitle = {Proceedings of the 28th IEEE/ACM International Symposium on Distributed Simulation and Real Time Applications},
year = {2024},
month = oct,
publisher = {IEEE},
series = {DS-RT},
location = {Urbino, Italy}
}