WfBench: Workflow Benchmarks

WfBench is a generator of realistic workflow benchmark specifications that can be translated into benchmark code to be executed with current workflow systems. It generates workflow tasks with arbitrary performance characteristics (CPU, memory, and I/O usage) and with realistic task dependency structures based on those seen in production workflows.

The generation of workflow benchmarks is a two-step process. First, a realistic workflow benchmark specification is generated in WfFormat. Then, this specification is translated into benchmark code to be executed with a workflow system.

Generating Workflow Benchmark Specifications

The WorkflowBenchmark class uses recipes of workflows (as described in Generating Workflow Recipes) for generating workflow benchmarks with an arbitrary number of tasks:

import pathlib

from wfcommons import BlastRecipe
from wfcommons.wfbench import WorkflowBenchmark

# create a workflow benchmark object to generate specifications based on a recipe
benchmark = WorkflowBenchmark(recipe=BlastRecipe, num_tasks=500)
# generate a specification based on performance characteristics
path = benchmark.create_benchmark(pathlib.Path("/tmp/"), cpu_work=100, data=10, percent_cpu=0.6)

In the example above, the workflow benchmark generator first invokes the WfChef recipe to generate a task graph. Once the task graph has been generated, each task is set to be an instance of the workflow task benchmark. For each task, the following values for the parameters of the workflow task benchmark can be specified:

  • cpu_work: CPU work per workflow task. The cpu-benchmark executable (compiled C++) calculates an increasingly precise value of π up until the specified total amount of computation (cpu_work) has been performed.

  • data: Individual data volumes for each task in a way that is coherent with respect to task data dependencies (in the form of a dictionary of input size files per workflow task type). Alternatively, a total data footprint (in MB) can be defined, i.e., the sum of the sizes of all data files read/written by workflow tasks, in which case uniform I/O volumes are computed for each workflow task benchmark.

  • percent_cpu: The fraction of the computation’s instructions that correspond to non-memory operations.
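To give a sense of what the CPU benchmark does, the following is a simplified, pure-Python sketch of the idea behind cpu-benchmark (the real executable is compiled C++ and accounts for work differently): the estimate of π becomes increasingly precise as more work units are spent.

```python
# Simplified illustration (NOT the actual cpu-benchmark code): refine an
# estimate of pi until a fixed number of terms ("work units") is consumed.
def estimate_pi(work_units: int) -> float:
    """Approximate pi with the Leibniz series using `work_units` terms."""
    total = 0.0
    for k in range(work_units):
        total += (-1) ** k / (2 * k + 1)  # 1 - 1/3 + 1/5 - 1/7 + ...
    return 4 * total

# More work units yield a more precise value of pi.
print(estimate_pi(100_000))
```

The same principle applies to the real benchmark: the cpu_work parameter bounds the total computation, so tasks with larger cpu_work values run proportionally longer.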

Translating Specifications into Benchmark Codes

WfCommons provides a collection of translators for executing the benchmarks as actual workflow applications. Below, we provide illustrative examples of how to generate workflow benchmarks for the currently supported workflow systems.

The Translator class is the foundation for all translator classes. It takes as input either a Workflow object or a path to a workflow benchmark description in WfFormat.

Warning

WfBench leverages stress-ng (https://github.com/ColinIanKing/stress-ng) to execute memory-intensive threads. Therefore, it is crucial to ensure that stress-ng is installed on all worker nodes.
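One way to fail fast is to check for the executable before submitting a benchmark. The helper below is a hypothetical convenience (stress_ng_available is not part of WfCommons); it only checks the PATH of the node it runs on, so it must be run on each worker node.

```python
# Check that the stress-ng executable is available before running benchmarks.
import shutil

def stress_ng_available() -> bool:
    """Return True if the stress-ng executable is found on PATH."""
    return shutil.which("stress-ng") is not None

if not stress_ng_available():
    print("stress-ng not found; install it on every worker node first")
```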

Pegasus

Pegasus orchestrates the execution of complex scientific workflows by providing a platform to define, organize, and automate computational tasks and data dependencies. Pegasus handles the complexity of large-scale workflows by automatically mapping tasks onto distributed computing resources, such as clusters, grids, or clouds. Below, we provide an example of how to generate a workflow benchmark for execution with Pegasus:

import pathlib

from wfcommons import BlastRecipe
from wfcommons.wfbench import WorkflowBenchmark, PegasusTranslator

# create a workflow benchmark object to generate specifications based on a recipe
benchmark = WorkflowBenchmark(recipe=BlastRecipe, num_tasks=500)

# generate a specification based on performance characteristics
benchmark.create_benchmark(pathlib.Path("/tmp/"), cpu_work=100, data=10, percent_cpu=0.6)

# generate a Pegasus workflow
translator = PegasusTranslator(benchmark.workflow)
translator.translate(output_file_name=pathlib.Path("/tmp/benchmark-workflow.py"))

Warning

Pegasus utilizes the HTCondor framework to orchestrate the execution of workflow tasks. By default, HTCondor does not implement CPU affinity for program threads. However, WfBench offers an extra capability to enforce CPU affinity during benchmark execution. To enable this feature, you need to specify the lock_files_folder parameter when using create_benchmark().

Swift/T

Swift/T is an advanced workflow system designed specifically for high-performance computing (HPC) environments. It dynamically manages task dependencies and resource allocation, enabling efficient utilization of HPC systems. It provides a seamless interface to diverse tools, libraries, and scientific applications, making it easy to integrate existing codes into workflows. Below, we provide an example of how to generate a workflow benchmark for execution with Swift/T:

import pathlib

from wfcommons import BlastRecipe
from wfcommons.wfbench import WorkflowBenchmark, SwiftTTranslator

# create a workflow benchmark object to generate specifications based on a recipe
benchmark = WorkflowBenchmark(recipe=BlastRecipe, num_tasks=500)

# generate a specification based on performance characteristics
benchmark.create_benchmark(pathlib.Path("/tmp/"), cpu_work=100, data=10, percent_cpu=0.6)

# generate a Swift/T workflow
translator = SwiftTTranslator(benchmark.workflow)
translator.translate(output_file_name=pathlib.Path("/tmp/benchmark-workflow.swift"))