WfBench: Workflow Benchmarks

WfBench is a generator of realistic workflow benchmark specifications that can be translated into benchmark code to be executed with current workflow systems. It generates workflow tasks with arbitrary performance characteristics (CPU, memory, and I/O usage) and with realistic task dependency structures based on those seen in production workflows.

The generation of workflow benchmarks is a two-step process. First, a realistic workflow benchmark specification is generated in WfFormat. Then, this specification is translated into benchmark code to be executed with a workflow system.

Generating Workflow Benchmark Specifications

The WorkflowBenchmark class uses recipes of workflows (as described in Generating Workflow Recipes) for generating workflow benchmarks with an arbitrary number of tasks:

import pathlib

from wfcommons import BlastRecipe
from wfcommons.wfbench import WorkflowBenchmark

# create a workflow benchmark object to generate specifications based on a recipe
benchmark = WorkflowBenchmark(recipe=BlastRecipe, num_tasks=500)

# generate a specification based on performance characteristics
path = benchmark.create_benchmark(pathlib.Path("/tmp/"), cpu_work=100, data=10, percent_cpu=0.6)

In the example above, the workflow benchmark generator first invokes the WfChef recipe to generate a task graph. Once the task graph has been generated, each task is set to be an instance of the workflow task benchmark. For each task, the following workflow task benchmark parameters can be specified:

  • cpu_work: CPU work per workflow task. The cpu-benchmark executable (compiled C++) calculates an increasingly precise value of π up until the specified total amount of computation (cpu_work) has been performed.

  • data: Individual data volumes for each task in a way that is coherent with respect to task data dependencies (in the form of a dictionary of input file sizes per workflow task type; see the sketch after this list). Alternatively, a total data footprint (in MB) can be defined, i.e., the sum of the sizes of all data files read/written by workflow tasks, in which case uniform I/O volumes are computed for each workflow task benchmark.

  • percent_cpu: The fraction of the computation’s instructions that correspond to non-memory operations.
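For instance, the data parameter can be given as a per-task-type dictionary instead of a single total footprint. The sketch below is illustrative only: the task type names and volumes are assumptions and must be replaced with the task types actually defined by the recipe you use.

import pathlib

from wfcommons import BlastRecipe
from wfcommons.wfbench import WorkflowBenchmark

benchmark = WorkflowBenchmark(recipe=BlastRecipe, num_tasks=500)

# hypothetical per-task-type data volumes (in MB); replace the keys with
# the task types actually present in the chosen recipe
path = benchmark.create_benchmark(
    pathlib.Path("/tmp/"),
    cpu_work=100,
    data={"split_fasta": 20, "blastall": 50, "cat_blastall": 10},
    percent_cpu=0.6,
)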

Generating Benchmarks from Synthetic Workflow Instances

WfCommons also allows you to convert synthetic workflow instances into benchmarks directly. The generated benchmark will have exactly the same structure as the synthetic workflow instance:

import pathlib

from wfcommons import BlastRecipe
from wfcommons.wfbench import WorkflowBenchmark

# create a synthetic workflow instance with 500 tasks or use one that you already have
workflow = BlastRecipe.from_num_tasks(500).build_workflow()
# create a workflow benchmark object to generate specifications based on a recipe
benchmark = WorkflowBenchmark(recipe=BlastRecipe, num_tasks=500)
# generate a specification based on performance characteristics and the structure of the synthetic workflow instance
path = benchmark.create_benchmark_from_synthetic_workflow(pathlib.Path("/tmp/"), workflow, cpu_work=100, percent_cpu=0.6)

This is useful when you want to generate a benchmark with a specific structure, or when you want benchmarks that retain the more detailed structure produced by WfChef workflow generation.

Translating Specifications into Benchmark Codes

WfCommons provides a collection of translators for executing the benchmarks as actual workflow applications. Below, we provide illustrative examples of how to generate workflow benchmarks for the currently supported workflow systems.

The Translator class is the foundation of each system-specific translator. It takes as input either a Workflow object or a path to a workflow benchmark description in WfFormat.
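For example, a translator can be constructed directly from a previously saved specification file rather than a Workflow object. The sketch below assumes a WfFormat JSON file already exists at the path shown; the file name is hypothetical.

import pathlib

from wfcommons.wfbench import PegasusTranslator

# build a translator from a WfFormat benchmark specification on disk
# (the JSON file name below is hypothetical)
translator = PegasusTranslator(pathlib.Path("/tmp/blast-benchmark-500.json"))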

Warning

WfBench leverages stress-ng (https://github.com/ColinIanKing/stress-ng) to execute memory-intensive threads. Therefore, it is crucial to ensure that stress-ng is installed on all worker nodes.
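As a quick sanity check (a minimal sketch, not part of WfCommons), you can verify from Python that the binary is on a node's PATH before launching a benchmark:

import shutil

# fail early if the stress-ng binary is not available on this node
if shutil.which("stress-ng") is None:
    raise RuntimeError("stress-ng not found; install it on all worker nodes")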

Nextflow

Nextflow is a workflow management system that enables the development of portable and reproducible workflows. It supports deploying workflows on a variety of execution platforms including local, HPC schedulers, and cloud-based and container-based environments. Below, we provide an example of how to generate a workflow benchmark for running with Nextflow:

import pathlib

from wfcommons import BlastRecipe
from wfcommons.wfbench import WorkflowBenchmark, NextflowTranslator

# create a workflow benchmark object to generate specifications based on a recipe
benchmark = WorkflowBenchmark(recipe=BlastRecipe, num_tasks=500)

# generate a specification based on performance characteristics
benchmark.create_benchmark(pathlib.Path("/tmp/"), cpu_work=100, data=10, percent_cpu=0.6)

# generate a Nextflow workflow
translator = NextflowTranslator(benchmark.workflow)
translator.translate(output_file_name=pathlib.Path("/tmp/benchmark-workflow.nf"))
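The generated file can then be executed with Nextflow's command-line tool, e.g., nextflow run /tmp/benchmark-workflow.nf (assuming Nextflow is installed and stress-ng is available on the worker nodes).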

Warning

Nextflow’s way of defining workflows does not support tasks with iterations, i.e., tasks that depend on another instance of the same abstract task. Thus, the translator fails when given a workflow that contains iterations.

Pegasus

Pegasus orchestrates the execution of complex scientific workflows by providing a platform to define, organize, and automate computational tasks and data dependencies. Pegasus handles the complexity of large-scale workflows by automatically mapping tasks onto distributed computing resources, such as clusters, grids, or clouds. Below, we provide an example of how to generate a workflow benchmark for running with Pegasus:

import pathlib

from wfcommons import BlastRecipe
from wfcommons.wfbench import WorkflowBenchmark, PegasusTranslator

# create a workflow benchmark object to generate specifications based on a recipe
benchmark = WorkflowBenchmark(recipe=BlastRecipe, num_tasks=500)

# generate a specification based on performance characteristics
benchmark.create_benchmark(pathlib.Path("/tmp/"), cpu_work=100, data=10, percent_cpu=0.6)

# generate a Pegasus workflow
translator = PegasusTranslator(benchmark.workflow)
translator.translate(output_file_name=pathlib.Path("/tmp/benchmark-workflow.py"))

Warning

Pegasus utilizes the HTCondor framework to orchestrate the execution of workflow tasks. By default, HTCondor does not implement CPU affinity for program threads. However, WfBench offers an extra capability to enforce CPU affinity during benchmark execution. To enable this feature, specify the lock_files_folder parameter when calling create_benchmark().
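For example, CPU affinity enforcement can be requested at specification time (a minimal sketch; the lock folder path below is illustrative):

import pathlib

from wfcommons import BlastRecipe
from wfcommons.wfbench import WorkflowBenchmark

benchmark = WorkflowBenchmark(recipe=BlastRecipe, num_tasks=500)

# pass lock_files_folder to enforce CPU affinity during execution
# (the folder path below is illustrative)
benchmark.create_benchmark(
    pathlib.Path("/tmp/"),
    cpu_work=100,
    data=10,
    percent_cpu=0.6,
    lock_files_folder=pathlib.Path("/tmp/cpu-locks"),
)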

Swift/T

Swift/T is an advanced workflow system designed specifically for high-performance computing (HPC) environments. It dynamically manages task dependencies and resource allocation, enabling efficient utilization of HPC systems. It provides a seamless interface to diverse tools, libraries, and scientific applications, making it easy to integrate existing codes into workflows. Below, we provide an example of how to generate a workflow benchmark for running with Swift/T:

import pathlib

from wfcommons import BlastRecipe
from wfcommons.wfbench import WorkflowBenchmark, SwiftTTranslator

# create a workflow benchmark object to generate specifications based on a recipe
benchmark = WorkflowBenchmark(recipe=BlastRecipe, num_tasks=500)

# generate a specification based on performance characteristics
benchmark.create_benchmark(pathlib.Path("/tmp/"), cpu_work=100, data=10, percent_cpu=0.6)

# generate a Swift/T workflow
translator = SwiftTTranslator(benchmark.workflow)
translator.translate(output_file_name=pathlib.Path("/tmp/benchmark-workflow.swift"))
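The resulting script can then be launched with Swift/T's swift-t command, e.g., swift-t /tmp/benchmark-workflow.swift, on a system where Swift/T is installed.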