WfGen: Generating Workflows
WfGen is a component of WfCommons project that targets the generation of realistic
synthetic workflow instances with a variety of characteristics. The
WorkflowGenerator
class uses recipes
of workflows (as described in Generating Workflow Recipes)
for creating the realistic synthetic instances. The resulting workflows are represented in the
WfFormat, which is already supported by simulation frameworks such as
WRENCH.
WfCommons Workflows Recipes
This Python package provides several workflow recipes for generating realistic synthetic workflow instances. The current list of available workflow recipes include:
BlastRecipe
:from wfcommons.wfchef.recipes import BlastRecipe
BwaRecipe
:from wfcommons.wfchef.recipes import BwaRecipe
CyclesRecipe
:from wfcommons.wfchef.recipes import CyclesRecipe
EpigenomicsRecipe
:from wfcommons.wfchef.recipes import EpigenomicsRecipe
GenomeRecipe
:from wfcommons.wfchef.recipes import GenomeRecipe
MontageRecipe
:from wfcommons.wfchef.recipes import MontageRecipe
SeismologyRecipe
:from wfcommons.wfchef.recipes import SeismologyRecipe
SoykbRecipe
:from wfcommons.wfchef.recipes import SoykbRecipe
SrasearchRecipe
:from wfcommons.wfchef.recipes import SrasearchRecipe
The Workflow Instances Generator
Synthetic workflow instances are generated using the
WorkflowGenerator
class. This class takes
as input a WorkflowRecipe
object (see in Generating Workflow Recipes), and provides two methods
for generating synthetic workflow instances:
build_workflow()
: generates a single synthetic workflow instance based on the workflow recipe used to instantiate the generator.build_workflows()
: generates a number of synthetic workflow instances based on the workflow recipe used to instantiate the generator.
The build methods use the workflow recipe for generating realistic synthetic workflow instances, in which the workflow structure follows workflow composition rules defined in the workflow recipe, and tasks runtime, and input and output data sizes are generated according to distributions obtained from actual workflow execution instances (see WfInstances: Workflow Instances).
Each generated instance is represented as a Workflow
object (which in itself is an extension of the
NetworkX DiGraph
class). The Workflow
class provides two
methods for writing the generated workflow instance into files:
write_dot()
: write a DOT file of a workflow instance.write_json()
: write a JSON file of a workflow instance.
All workflow recipes provide a common method, from_num_tasks
, that defines the lower
bound for the total number of tasks in the generated synthetic workflow.
Increasing/Reducing Runtime and File Sizes
Workflow recipes also allow the generation of synthetic workflows with increased/reduced runtimes and/or files sizes determined by a factor provided by the user:
runtime_factor
: The factor of which tasks runtime will be increased/decreased.input_file_size_factor
: The factor of which tasks input files size will be increased/decreased.output_file_size_factor
: The factor of which tasks output files size will be increased/decreased.
The following example shows how to create a Seismology workflow recipe in which task runtime is increased by 10%, input files by 50%, and output files reduced by 20%:
from wfcommons.wfchef.recipes import SeismologyRecipe
# creating a Seismology workflow recipe with increased/decreased runtime and file sizes
recipe = SeismologyRecipe.from_num_tasks(num_tasks=100, runtime_factor=1.1, input_file_size_factor=1.5, output_file_size_factor=0.8)
Examples
The following example generates a Seismology synthetic workflow instance os 300 tasks, builds a synthetic workflow instance, and writes the synthetic instance to a JSON file.
import pathlib
from wfcommons.wfchef.recipes import SeismologyRecipe
from wfcommons import WorkflowGenerator
generator = WorkflowGenerator(SeismologyRecipe.from_num_tasks(250))
workflow = generator.build_workflow()
workflow.write_json(pathlib.Path('seismology-workflow.json'))
The example below generates a number of 10 Blast synthetic
workflow instances for every size defined in the array num_tasks
:
import pathlib
from wfcommons.wfchef.recipes import BlastRecipe
from wfcommons import WorkflowGenerator
num_tasks = [100, 250, 370, 800]
for task in num_tasks:
generator = WorkflowGenerator(BlastRecipe.from_num_tasks(task))
workflows = generator.build_workflows(10)
for i, workflow in enumerate(workflows):
workflow.write_json(pathlib.Path(f'blast-workflow-{task}-{i}.json'))
The following example generates 10 Epigenomics synthetic workflow instances based on the number of tasks entered by the user (1000), builds the synthetic workflow instances, and writes the synthetic instances to JSON files.
import pathlib
from wfcommons.wfchef.recipes import EpigenomicsRecipe
from wfcommons import WorkflowGenerator
generator = WorkflowGenerator(EpigenomicsRecipe.from_num_tasks(1000))
for i, workflow in enumerate(generator.build_workflows(10)):
workflow.write_json(pathlib.Path(f'epigenomics-workflow-{i}.json'))
The example below generates a Cycles (agroecosystem) synthetic workflow instance based on the number of tasks entered by the user (250), builds the synthetic workflow instance, and writes the synthetic instance to a JSON file.
import pathlib
from wfcommons.wfchef.recipes import CyclesRecipe
from wfcommons import WorkflowGenerator
generator = WorkflowGenerator(CyclesRecipe.from_num_tasks(250))
workflow = generator.build_workflow()
workflow.write_json(pathlib.Path('cycles-workflow.json'))