Authors: Blaise Omer Yenke, Jean-François Mehaut, Jean Michel Nlong and Rodrigue Chakode
This paper presents the integration of deadline-constrained checkpoint scheduling into a batch scheduler for dynamic environments such as virtual clusters. The checkpointing scheduler targets the parallel checkpointing, on a single server, of long-running independent applications in a virtual cluster built from intranet resources that remain idle for long periods, assuming that these resources must be released within a delay T. Because parallel checkpointing on a single server can be limited by bandwidth, the scheduler uses a function that gives the aggregated bandwidth available for the parallel checkpointing of m applications of aggregated size V, and uses it to solve the checkpointing problem within the deadline T. Specifically, we present the integration of the checkpointing scheduler into the batch scheduler OAR. The implementation uses data from the OAR database for checkpoint scheduling. It is portable and can easily be modified to interact with any other batch scheduler, provided that the structure of its database is known and that an estimator of the system bandwidth suitable for parallel checkpointing is available. Experimental results obtained on a virtual cluster built on GRID 5000 show that the checkpointing scheduler does not induce a significant overhead on checkpointing mechanisms. This work thus aims to provide HPC platforms with a tool to enhance the quality of service offered to end users.
Keywords: Scheduling, Checkpointing, Batch Scheduler and Dynamic Environments
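
As an illustration of the deadline constraint described in the abstract, the following minimal sketch checks whether m applications of aggregated size V can be checkpointed in parallel within T, that is, whether V / B(m) <= T, where B(m) is the aggregated bandwidth. The bandwidth model (a per-stream rate capped by a server limit), its numeric defaults, the function names, and the smallest-first selection are all assumptions made for illustration; they are not the estimator or the algorithm used by the authors.

```python
from itertools import accumulate


def aggregated_bandwidth(m, per_stream_mb_s=40.0, server_cap_mb_s=100.0):
    """Hypothetical estimator B(m) in MB/s (an assumption, not the paper's).

    Each of the m parallel streams contributes per_stream_mb_s until the
    checkpoint server saturates at server_cap_mb_s.
    """
    if m <= 0:
        return 0.0
    return min(m * per_stream_mb_s, server_cap_mb_s)


def fits_deadline(sizes_mb, deadline_s, bandwidth=aggregated_bandwidth):
    """Return True if all applications can be checkpointed within deadline_s.

    Implements the check V / B(m) <= T, with V the aggregated size in MB.
    """
    m = len(sizes_mb)
    if m == 0:
        return True
    return sum(sizes_mb) / bandwidth(m) <= deadline_s


def select_checkpointable(sizes_mb, deadline_s, bandwidth=aggregated_bandwidth):
    """Illustrative selection: the largest smallest-first prefix that fits.

    This greedy choice is an assumption of this sketch; it simply favours
    saving as many applications as possible before the deadline.
    """
    ordered = sorted(sizes_mb)
    best = 0
    for k, volume in enumerate(accumulate(ordered), start=1):
        if volume / bandwidth(k) <= deadline_s:
            best = k
    return ordered[:best]
```

For example, with checkpoint sizes of 500, 200, 800 and 100 MB and T = 10 s, the four applications together do not fit under this assumed model, while the three smallest do.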