Generating Workflows

The second axis of the WfCommons project targets the generation of realistic synthetic workflow instances with a variety of characteristics. The WorkflowGenerator class uses workflow recipes (as described in Analyzing Instances) to create many different synthetic workflows based on distributions of task runtimes and of input and output file sizes. The resulting workflows are represented in the WfCommons JSON format, which is already supported by simulation frameworks such as WRENCH.

Workflow Recipes

The WfCommons package provides a number of workflow recipes for generating realistic synthetic workflow instances. Each recipe may provide its own methods for instantiating a WorkflowRecipe object, depending on the properties that define the structure of the actual workflow. For instance, the code snippet below shows how to instantiate recipes for the Epigenomics and 1000Genome workflows:

from wfcommons.generator import EpigenomicsRecipe, GenomeRecipe

# creating an Epigenomics workflow recipe
epigenomics_recipe = EpigenomicsRecipe.from_sequences(num_sequence_files=2, num_lines=100, bin_size=10)

# creating a 1000Genome workflow recipe
genome_recipe = GenomeRecipe.from_num_chromosomes(num_chromosomes=3, num_sequences=10000, num_populations=1)

All workflow recipes also provide a common method (from_num_tasks) for instantiating a WorkflowRecipe object as follows:

from wfcommons.generator import EpigenomicsRecipe, GenomeRecipe

# creating an Epigenomics workflow recipe
epigenomics_recipe = EpigenomicsRecipe.from_num_tasks(num_tasks=9)

# creating a 1000Genome workflow recipe
genome_recipe = GenomeRecipe.from_num_tasks(num_tasks=5)

Note that num_tasks defines an upper bound on the total number of tasks in the workflow, and that each workflow recipe may define its own lower bound on num_tasks so that the workflow structure is preserved. Please refer to the documentation of each workflow recipe for its lower bound.
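To see why num_tasks acts as an upper bound while each recipe also needs a minimum, consider a workflow made of a few tasks that always exist plus repeated groups of parallel tasks. The sketch below is purely hypothetical: tasks_within_bound, fixed_tasks, and group_size are illustrative names, not part of the WfCommons API, and real recipes have their own, more elaborate composition rules.

```python
def tasks_within_bound(num_tasks: int, fixed_tasks: int, group_size: int) -> int:
    """Hypothetical recipe structure: `fixed_tasks` tasks that always exist
    plus repeated groups of `group_size` tasks. Returns how many tasks such a
    workflow would contain without exceeding the upper bound `num_tasks`."""
    # At least one full group is needed for the structure to be valid,
    # which is what gives the recipe its lower bound.
    min_tasks = fixed_tasks + group_size
    if num_tasks < min_tasks:
        raise ValueError(f"num_tasks must be at least {min_tasks}")
    # Add as many complete groups as fit under the upper bound.
    num_groups = (num_tasks - fixed_tasks) // group_size
    return fixed_tasks + num_groups * group_size


# Upper bound of 10 with 3 fixed tasks and groups of 2:
print(tasks_within_bound(10, fixed_tasks=3, group_size=2))  # 9 (3 fixed + 3 groups of 2)
```

In this toy model the generated workflow may have fewer tasks than requested (9 instead of 10), because only complete groups preserve the structure.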

The current list of available workflow recipes includes:

Increasing/Reducing Runtime and File Sizes

Workflow recipes also support generating synthetic workflows whose runtimes and/or file sizes are increased or reduced by a user-provided factor:

  • runtime_factor: The factor by which task runtimes are increased or decreased.

  • input_file_size_factor: The factor by which task input file sizes are increased or decreased.

  • output_file_size_factor: The factor by which task output file sizes are increased or decreased.

The following example shows how to create a Seismology workflow recipe in which task runtimes are increased by 10%, input file sizes by 50%, and output file sizes are reduced by 20%:

from wfcommons.generator import SeismologyRecipe

# creating a Seismology workflow recipe with increased/decreased runtime and file sizes
recipe = SeismologyRecipe.from_num_tasks(num_tasks=100, runtime_factor=1.1, input_file_size_factor=1.5, output_file_size_factor=0.8)
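A factor of 1.1 corresponds to a 10% increase and 0.8 to a 20% reduction, i.e., each value is multiplied by its factor. The arithmetic can be sketched as follows; TaskSample and apply_factors are hypothetical names used only for illustration and do not reproduce WfCommons' internal implementation.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TaskSample:
    """Hypothetical sampled values for one task (not a WfCommons class)."""
    runtime: float       # seconds
    input_bytes: int
    output_bytes: int


def apply_factors(task: TaskSample,
                  runtime_factor: float = 1.0,
                  input_file_size_factor: float = 1.0,
                  output_file_size_factor: float = 1.0) -> TaskSample:
    """Return a copy of `task` with runtime and file sizes scaled multiplicatively."""
    return replace(
        task,
        runtime=task.runtime * runtime_factor,
        input_bytes=round(task.input_bytes * input_file_size_factor),
        output_bytes=round(task.output_bytes * output_file_size_factor),
    )


# Same factors as the Seismology example above.
base = TaskSample(runtime=100.0, input_bytes=2_000, output_bytes=1_000)
scaled = apply_factors(base, runtime_factor=1.1,
                       input_file_size_factor=1.5,
                       output_file_size_factor=0.8)
# scaled: runtime ~110 s, input_bytes 3000, output_bytes 800
```

Factors of 1.0 (the defaults in this sketch) leave a task unchanged.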

The Workflow Generator

Synthetic workflow instances are generated using the WorkflowGenerator class. This class takes as input a WorkflowRecipe object (see above), and provides two methods for generating synthetic workflow instances:

  • build_workflow(): generates a single synthetic workflow instance based on the workflow recipe used to instantiate the generator.

  • build_workflows(): generates a number of synthetic workflow instances (set via num_workflows) based on the workflow recipe used to instantiate the generator.

The build methods use the workflow recipe to generate realistic synthetic workflow instances: the workflow structure follows the composition rules defined in the workflow recipe, and task runtimes and input and output data sizes are generated according to distributions obtained from actual workflow execution instances (see Analyzing Instances).
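The idea of keeping the structure fixed while drawing task values from distributions can be sketched with Python's standard random module. Everything here is an assumption for illustration: the task types, the choice of lognormal runtimes and exponential output sizes, and their parameters are invented, whereas WfCommons derives its distributions from execution instances. build_one and build_many are hypothetical helpers that only mirror the single/multiple split of build_workflow() and build_workflows().

```python
import random
from typing import Dict, List

# Hypothetical distribution parameters per task type (invented values).
# runtime ~ lognormal(mu, sigma) in seconds; output size ~ exponential(mean).
TASK_PARAMS: Dict[str, Dict[str, float]] = {
    "align": {"runtime_mu": 3.0, "runtime_sigma": 0.5, "output_mean": 5e6},
    "merge": {"runtime_mu": 1.5, "runtime_sigma": 0.3, "output_mean": 2e7},
}


def sample_task(rng: random.Random, task_type: str) -> Dict[str, float]:
    """Draw one task's runtime and output size from its type's distributions."""
    p = TASK_PARAMS[task_type]
    return {
        "runtime": rng.lognormvariate(p["runtime_mu"], p["runtime_sigma"]),
        "output_bytes": rng.expovariate(1.0 / p["output_mean"]),
    }


def build_one(rng: random.Random, structure: List[str]) -> List[dict]:
    """Keep the (hypothetical) structure fixed; only the sampled values vary."""
    return [{"type": t, **sample_task(rng, t)} for t in structure]


def build_many(seed: int, structure: List[str], num_workflows: int) -> List[List[dict]]:
    """Generate several instances from one seeded generator, so runs are reproducible."""
    rng = random.Random(seed)
    return [build_one(rng, structure) for _ in range(num_workflows)]


instances = build_many(seed=42, structure=["align", "align", "merge"], num_workflows=3)
```

Every instance shares the same task types in the same order, while runtimes and sizes differ between instances; seeding the generator makes a batch reproducible.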

Each generated instance is represented as a Workflow object (which itself extends the NetworkX DiGraph class). The Workflow class provides two methods for writing the generated workflow instance to files:

Examples

The following example creates a Seismology workflow recipe based on the number of pairs of signals used to estimate earthquake STFs (num_pairs), builds a synthetic workflow instance, and writes it to a JSON file.

from wfcommons import WorkflowGenerator
from wfcommons.generator import SeismologyRecipe

# creating a Seismology workflow recipe based on the number
# of pairs of signals to estimate earthquake STFs
recipe = SeismologyRecipe.from_num_pairs(num_pairs=10)

# creating an instance of the workflow generator with the
# Seismology workflow recipe
generator = WorkflowGenerator(recipe)

# generating a synthetic workflow instance of the Seismology workflow
workflow = generator.build_workflow()

# writing the synthetic workflow instance into a JSON file
workflow.write_json('seismology-workflow.json')
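Because each generated Workflow is a directed graph of tasks, its dependencies must form a directed acyclic graph (DAG). The sketch below illustrates this with the standard library's graphlib instead of NetworkX, so it runs without extra dependencies; the task names and the simplified JSON layout are hypothetical and do not follow the WfCommons JSON schema.

```python
import json
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List

# Hypothetical 4-task workflow: task -> list of parent tasks it depends on.
parents: Dict[str, List[str]] = {
    "split": [],
    "process_1": ["split"],
    "process_2": ["split"],
    "merge": ["process_1", "process_2"],
}


def execution_order(deps: Dict[str, List[str]]) -> List[str]:
    """Return one valid execution order; raises CycleError if deps is not a DAG."""
    return list(TopologicalSorter(deps).static_order())


def is_dag(deps: Dict[str, List[str]]) -> bool:
    """True if the dependency graph has no cycles."""
    try:
        execution_order(deps)
        return True
    except CycleError:
        return False


def to_simple_json(deps: Dict[str, List[str]]) -> str:
    """Serialize tasks with their parents and derived children (simplified layout)."""
    children: Dict[str, List[str]] = {task: [] for task in deps}
    for task, task_parents in deps.items():
        for parent in task_parents:
            children[parent].append(task)
    tasks = [{"name": t, "parents": deps[t], "children": children[t]} for t in deps]
    return json.dumps({"tasks": tasks}, indent=2)
```

For the generated instances, the actual graph and JSON layout come from WfCommons itself; this sketch only shows the structural property they share.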

The example below generates a number of Cycles (agroecosystem) synthetic workflow instances based on an upper bound on the number of tasks per workflow.

from wfcommons import WorkflowGenerator
from wfcommons.generator import CyclesRecipe

# creating a Cycles workflow recipe based on the number of tasks per workflow
recipe = CyclesRecipe.from_num_tasks(num_tasks=1000)

# creating an instance of the workflow generator with the
# Cycles workflow recipe
generator = WorkflowGenerator(recipe)

# generating 10 synthetic workflow instances of the Cycles workflow
workflows_list = generator.build_workflows(num_workflows=10)

# writing each synthetic workflow instance into a JSON file
for count, workflow in enumerate(workflows_list, start=1):
    workflow.write_json(f'cycles-workflow-{count:02}.json')
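When generating many instances, it can help to write them into a dedicated directory and keep track of the resulting paths. The helper below is a hypothetical sketch: write_all and SupportsWriteJson are illustrative names, and it assumes only that each workflow object has a write_json method that accepts a pathlib.Path (if your WfCommons version expects a string, pass str(path) instead).

```python
from pathlib import Path
from typing import Iterable, List, Protocol


class SupportsWriteJson(Protocol):
    """Anything with a write_json(path) method, such as the generated workflows."""

    def write_json(self, json_file_path) -> None: ...


def write_all(workflows: Iterable[SupportsWriteJson], out_dir: Path,
              prefix: str = "cycles-workflow") -> List[Path]:
    """Write each workflow to out_dir/<prefix>-NN.json and return the written paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths: List[Path] = []
    for index, workflow in enumerate(workflows, start=1):
        path = out_dir / f"{prefix}-{index:02}.json"
        workflow.write_json(path)
        paths.append(path)
    return paths
```

With the workflows generated above, a call such as write_all(workflows_list, Path("cycles-instances")) would produce cycles-workflow-01.json through cycles-workflow-10.json inside that directory.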