WfGen: Generating Workflows

WfGen is a component of the WfCommons project that generates realistic synthetic workflow instances with a variety of characteristics. The WorkflowGenerator class uses workflow recipes (as described in Generating Workflow Recipes) to create these synthetic instances. The resulting workflows are represented in WfFormat, which is already supported by simulation frameworks such as WRENCH.

WfCommons Workflow Recipes

This Python package provides several workflow recipes for generating realistic synthetic workflow instances. The current list of available workflow recipes includes:

  • BlastRecipe: from wfcommons.wfchef.recipes import BlastRecipe

  • BwaRecipe: from wfcommons.wfchef.recipes import BwaRecipe

  • CyclesRecipe: from wfcommons.wfchef.recipes import CyclesRecipe

  • EpigenomicsRecipe: from wfcommons.wfchef.recipes import EpigenomicsRecipe

  • GenomeRecipe: from wfcommons.wfchef.recipes import GenomeRecipe

  • MontageRecipe: from wfcommons.wfchef.recipes import MontageRecipe

  • SeismologyRecipe: from wfcommons.wfchef.recipes import SeismologyRecipe

  • SoykbRecipe: from wfcommons.wfchef.recipes import SoykbRecipe

  • SrasearchRecipe: from wfcommons.wfchef.recipes import SrasearchRecipe
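The recipe class names listed above follow a common pattern (the application name, capitalized, followed by Recipe), so a recipe can also be selected by name at run time. The following is a minimal sketch and not part of the WfCommons API: recipe_class_name and load_recipe are hypothetical helper names introduced here for illustration, and load_recipe assumes that the wfcommons package is installed.

```python
import importlib

# Application names taken from the recipe list above.
KNOWN_APPLICATIONS = (
    "blast", "bwa", "cycles", "epigenomics", "genome",
    "montage", "seismology", "soykb", "srasearch",
)


def recipe_class_name(application: str) -> str:
    """Map an application name such as 'blast' to 'BlastRecipe' (hypothetical helper)."""
    name = application.strip().lower()
    if name not in KNOWN_APPLICATIONS:
        raise ValueError(f"unknown application: {application!r}")
    return f"{name.capitalize()}Recipe"


def load_recipe(application: str):
    """Import and return the recipe class by name (requires wfcommons to be installed)."""
    module = importlib.import_module("wfcommons.wfchef.recipes")
    return getattr(module, recipe_class_name(application))
```

Under these assumptions, load_recipe("montage") should return the same class as the import statement shown for MontageRecipe.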

The Workflow Instances Generator

Synthetic workflow instances are generated using the WorkflowGenerator class. This class takes a WorkflowRecipe object as input (see Generating Workflow Recipes) and provides two methods for generating synthetic workflow instances:

  • build_workflow(): generates a single synthetic workflow instance based on the workflow recipe used to instantiate the generator.

  • build_workflows(): generates a number of synthetic workflow instances based on the workflow recipe used to instantiate the generator.

The build methods use the workflow recipe to generate realistic synthetic workflow instances: the workflow structure follows the composition rules defined in the recipe, while task runtimes and input and output data sizes are drawn from distributions obtained from actual workflow execution instances (see WfInstances: Workflow Instances).

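Each generated instance is represented as a Workflow object (which is itself an extension of the NetworkX DiGraph class). The Workflow class provides two methods for writing the generated workflow instance into files:

  • write_dot(): writes a DOT file of the workflow instance.

  • write_json(): writes a JSON file of the workflow instance.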

All workflow recipes provide a common method, from_num_tasks, that defines the lower bound for the total number of tasks in the generated synthetic workflow.
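Because a generated workflow is a NetworkX DiGraph subclass, the standard DiGraph methods can be used to check a generated instance against the requested lower bound. The sketch below is illustrative only: summarize_workflow is a hypothetical helper, and it assumes that each graph node corresponds to one task and each edge to one dependency.

```python
def summarize_workflow(workflow, requested_tasks: int) -> dict:
    """Summarize a DiGraph-like workflow (hypothetical helper).

    Only number_of_nodes() and number_of_edges() are used, both of which
    are standard NetworkX DiGraph methods.
    """
    num_tasks = workflow.number_of_nodes()
    return {
        "tasks": num_tasks,
        "dependencies": workflow.number_of_edges(),
        # from_num_tasks sets a lower bound, so the generated workflow
        # may contain more tasks than requested, but not fewer.
        "meets_lower_bound": num_tasks >= requested_tasks,
    }
```

For a workflow built from a recipe created with from_num_tasks(250), one would call summarize_workflow(workflow, 250) and expect meets_lower_bound to be True.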

Increasing/Reducing Runtime and File Sizes

Workflow recipes also allow the generation of synthetic workflows with increased or reduced runtimes and/or file sizes, determined by factors provided by the user:

  • runtime_factor: The factor by which task runtimes are increased/decreased.

  • input_file_size_factor: The factor by which the sizes of task input files are increased/decreased.

  • output_file_size_factor: The factor by which the sizes of task output files are increased/decreased.

The following example shows how to create a Seismology workflow recipe in which task runtimes are increased by 10%, input file sizes are increased by 50%, and output file sizes are reduced by 20%:

from wfcommons.wfchef.recipes import SeismologyRecipe

# creating a Seismology workflow recipe with increased/decreased runtime and file sizes
recipe = SeismologyRecipe.from_num_tasks(num_tasks=100, runtime_factor=1.1, input_file_size_factor=1.5, output_file_size_factor=0.8)
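The relationship between a factor and the percentage change it represents can be made explicit with a little arithmetic. The sketch below assumes the factors act as plain multipliers on the generated values, which is what the percentages quoted above imply; percent_change and factor_from_percent are hypothetical helper names, not WfCommons functions.

```python
def percent_change(factor: float) -> float:
    """Convert a multiplicative factor into a percentage change (1.1 -> 10.0)."""
    # Rounding hides floating-point noise such as 10.000000000000009.
    return round((factor - 1.0) * 100.0, 6)


def factor_from_percent(percent: float) -> float:
    """Convert a percentage change into a multiplicative factor (-20 -> 0.8)."""
    return round(1.0 + percent / 100.0, 6)


print(percent_change(1.1))       # 10.0  (runtime increased by 10%)
print(percent_change(1.5))       # 50.0  (input files increased by 50%)
print(percent_change(0.8))       # -20.0 (output files reduced by 20%)
print(factor_from_percent(-20))  # 0.8
```

A factor of 1.0 therefore leaves the generated values unchanged, values above 1.0 increase them, and values between 0 and 1.0 reduce them.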

Examples

The following example creates a Seismology recipe with a lower bound of 250 tasks, builds a synthetic workflow instance, and writes it to a JSON file.

import pathlib
from wfcommons.wfchef.recipes import SeismologyRecipe
from wfcommons import WorkflowGenerator

generator = WorkflowGenerator(SeismologyRecipe.from_num_tasks(250))
workflow = generator.build_workflow()
workflow.write_json(pathlib.Path('seismology-workflow.json'))
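Once the JSON file has been written, it can be inspected with the Python standard library alone. The following minimal sketch lists the top-level keys of the file without assuming any particular key names, since these depend on the WfFormat version in use; top_level_keys is a hypothetical helper introduced here.

```python
import json
import pathlib


def top_level_keys(json_path: pathlib.Path) -> list:
    """Return the sorted top-level keys of a JSON document (hypothetical helper)."""
    with json_path.open() as handle:
        document = json.load(handle)
    if not isinstance(document, dict):
        raise ValueError("expected a JSON object at the top level")
    return sorted(document)
```

For the file written above, the call would be top_level_keys(pathlib.Path('seismology-workflow.json')).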

The example below generates 10 Blast synthetic workflow instances for every size defined in the list num_tasks:

import pathlib
from wfcommons.wfchef.recipes import BlastRecipe
from wfcommons import WorkflowGenerator

num_tasks = [100, 250, 370, 800]

for task in num_tasks:
    generator = WorkflowGenerator(BlastRecipe.from_num_tasks(task))
    workflows = generator.build_workflows(10)

    for i, workflow in enumerate(workflows):
        workflow.write_json(pathlib.Path(f'blast-workflow-{task}-{i}.json'))
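With four sizes and 10 instances per size, the loop above writes 40 files. As a small sanity check, the sketch below reproduces the file naming scheme on its own, without generating any workflows; expected_filenames is a hypothetical helper that mirrors the f-string used in the loop.

```python
def expected_filenames(sizes, instances_per_size, prefix="blast-workflow"):
    """List the file names the loop above would write (hypothetical helper)."""
    return [
        f"{prefix}-{size}-{index}.json"
        for size in sizes
        for index in range(instances_per_size)  # enumerate() starts at 0
    ]


names = expected_filenames([100, 250, 370, 800], 10)
print(len(names))  # 40
print(names[0])    # blast-workflow-100-0.json
print(names[-1])   # blast-workflow-800-9.json
```

Including both the size and the instance index in the name keeps instances of different sizes from overwriting one another.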

The following example creates an Epigenomics recipe with a lower bound of 1000 tasks, builds 10 synthetic workflow instances, and writes each instance to its own JSON file.

import pathlib
from wfcommons.wfchef.recipes import EpigenomicsRecipe
from wfcommons import WorkflowGenerator

generator = WorkflowGenerator(EpigenomicsRecipe.from_num_tasks(1000))
for i, workflow in enumerate(generator.build_workflows(10)):
    workflow.write_json(pathlib.Path(f'epigenomics-workflow-{i}.json'))

The example below creates a Cycles (agroecosystem) recipe with a lower bound of 250 tasks, builds a synthetic workflow instance, and writes it to a JSON file.

import pathlib
from wfcommons.wfchef.recipes import CyclesRecipe
from wfcommons import WorkflowGenerator

generator = WorkflowGenerator(CyclesRecipe.from_num_tasks(250))
workflow = generator.build_workflow()
workflow.write_json(pathlib.Path('cycles-workflow.json'))