Incremental Checkpointing of Large State Simulation Models with Write-Intensive Events via Memory Update Correlation on Buddy Pages

Romolo Marotta, Federica Montesano, Alessandro Pellegrini, and Francesco Quaglia


Published in: Proceedings of the 27th IEEE/ACM International Symposium on Distributed Simulation and Real Time Applications

Abstract:
Checkpointing techniques for speculative parallel simulation of discrete event models have been widely studied in the literature. However, there have been only marginal attempts to exploit operating system page-protection services, which have instead been widely used in the context of checkpointing for fault tolerance. In this article, we discuss how these services can effectively manage simulation models with large states and write-intensive events on zones in the state layout. In particular, we present a solution where the correlation of write operations on buddy pages in the state layout can be exploited to achieve effective incremental checkpointing support, which allows scaling down the costs of operating system services. Our solution does not require any instrumentation of the simulation application code and is usable on any POSIX-compliant operating system. We also discuss its integration within the USE (Ultimate-Share-Everything) open-source speculative simulation package and report some experimental data for its assessment.
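
To make the general idea concrete, the following is a minimal sketch of page-protection-based dirty tracking. It is not the authors' implementation and is not taken from USE. All names (`ckpt_init`, `ckpt_arm`, `ckpt_write`, and so on) are hypothetical, and the buddy rule used here (a write fault on one page of an aligned two-page pair unprotects both pages) is an illustrative assumption about how write correlation could reduce the number of faults. The sketch also assumes a Linux-like system where `MAP_ANONYMOUS` is available and where calling `mprotect` from a signal handler works in practice, which POSIX does not guarantee.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAX_PAGES 1024              /* hypothetical upper bound for this sketch */

static unsigned char *region;       /* simulated "state" area */
static size_t page_size, num_pages;
static volatile unsigned char dirty[MAX_PAGES];
static volatile sig_atomic_t fault_count;

/* Write-fault handler: mark the faulting page and its buddy as dirty,
   then make both writable with a single mprotect call. */
static void on_write_fault(int sig, siginfo_t *info, void *ctx)
{
    (void)ctx;
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)region;
    if (addr < base || addr >= base + num_pages * page_size) {
        signal(sig, SIG_DFL);       /* not our region: let the default action run */
        return;
    }
    size_t first = ((addr - base) / page_size) & ~(size_t)1; /* aligned pair start */
    size_t count = (first + 1 < num_pages) ? 2 : 1;
    for (size_t i = 0; i < count; i++)
        dirty[first + i] = 1;
    mprotect(region + first * page_size, count * page_size,
             PROT_READ | PROT_WRITE);
    fault_count++;
}

/* Map the state area and install the fault handler. Returns 0 on success. */
int ckpt_init(size_t pages)
{
    if (pages == 0 || pages > MAX_PAGES) return -1;
    page_size = (size_t)sysconf(_SC_PAGESIZE);
    num_pages = pages;
    void *p = mmap(NULL, pages * page_size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return -1;
    region = p;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_write_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0) return -1;
    if (sigaction(SIGBUS, &sa, NULL) != 0) return -1;  /* some systems raise SIGBUS */
    return 0;
}

/* Start a new checkpoint interval: clear dirty flags, write-protect everything. */
int ckpt_arm(void)
{
    for (size_t i = 0; i < num_pages; i++) dirty[i] = 0;
    fault_count = 0;
    return mprotect(region, num_pages * page_size, PROT_READ);
}

/* Store one byte into the state area through a volatile pointer. */
void ckpt_write(size_t offset, unsigned char value)
{
    assert(offset < num_pages * page_size);
    ((volatile unsigned char *)region)[offset] = value;
}

size_t ckpt_page_size(void) { return page_size; }
int ckpt_faults(void) { return (int)fault_count; }

/* Number of pages an incremental checkpoint would have to copy. */
size_t ckpt_dirty_count(void)
{
    size_t n = 0;
    for (size_t i = 0; i < num_pages; i++) n += dirty[i];
    return n;
}
```

In this sketch, a caller would invoke `ckpt_init` once, call `ckpt_arm` at the start of each checkpoint interval, and then copy only the pages flagged in `dirty` when the interval ends. A write to the second page of a pair after the first has faulted proceeds without a further fault, at the price of possibly logging a page that was never modified.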

BibTeX Entry:

@inproceedings{Mar23,
author = {Marotta, Romolo and Montesano, Federica and Pellegrini, Alessandro and Quaglia, Francesco},
title = {Incremental Checkpointing of Large State Simulation Models with Write-Intensive Events via Memory Update Correlation on Buddy Pages},
booktitle = {Proceedings of the 27th IEEE/ACM International Symposium on Distributed Simulation and Real Time Applications},
year = {2023},
month = oct,
publisher = {IEEE},
series = {DS-RT},
location = {Singapore},
note = {Shortlisted for the Best Paper Award}
}