WfChef: Workflows Recipes

WfChef is the WfCommons component that automates the construction of synthetic workflow generators for any given workflow application. The input to this component is a set of real workflow instances described in the WfFormat (e.g., instances available in WfInstances). WfChef automatically analyzes a set of real workflow instances for two purposes. First, it discovers workflow subgraphs that represent fundamental task dependency patterns. Second, it derives statistical models of the workflow tasks’ performance characteristics (see WfInstances: Workflow Instances). WfChef then outputs a recipe that will be used by WfGen (see WfGen: Generating Workflows) to generate realistic synthetic workflow instances with any arbitrary number of tasks.

Workflow Recipes

A workflow recipe is a data structure that encodes the discovered pattern occurrences as well as the statistical models of workflow task characteristics. More precisely, a recipe embodies results from statistical analysis and distribution fitting performed for each workflow task type so as to characterize task runtime and input/output data sizes. The recipes also incorporates information regarding the graph structure of the workflows (tasks dependencies and frequency of occurrences), which are automatically derived from the analysis of the workflow instances.

This Python package provides several workflow recipes (see WfCommons Workflows Recipes) for generating realistic synthetic workflow instances.

Generating Workflow Recipes

To create a recipe, WfChef analyzes the real workflow graphs in order to identify subgraphs that represent fundamental task dependency patterns. Based on the identified subgraphs and on measured task type frequencies in the real workflows, WfChef outputs a generator that can generate realistic synthetic workflow instances with an arbitrary numbers of tasks (see WfGen: Generating Workflows).

The code snippet below shows an example of how to create a recipe for the Epigenomics application:

$ wfchef create /path/to/real/instances -o ./epigenomics -v --name Epigenomics

The following flags can be used with this command:

  • -o or --out is a required flag that stands for the name of the directory to be created that is going to contain the recipe.

  • -n or --name is a required flag that stands for the name of the recipe. Tipically, the format used is ApplicationName.

  • -v or --verbose if set, activates status messages.

  • --no-install if set, does not install the recipe automatically.

  • -c or --cutoff takes a number of tasks in the samples the user wants to consider to create the recipe.

Example: --cutoff 4000, it means that all real world instances that will be consider for the creation of the recipe will have 4000 or less tasks. This is a useful flag to use when there is trust that all possible patterns present in this application can be already found in the smaller instances.

Workflow recipes are automatically installed and can be used throughout the system. WfCommons creates a Python package in the directory specified by the flag --out in which the setup.py and recipe.py files are stored. If the flag --no-install is set when creating a package for a specific application, the user will need to manually install the package before using it. The code bellow is an example of how to install/uninstall a package for an application in WfCommons:

# installing the package
$ pip install /path/to/the/package

# uninstalling a package
$ pip uninstall wfcommons.wfchef.recipes.appication_name_workflow

The snippet below shows an example of how to import the recipes:

from wfcommons.wfchef.recipes import EpigenomicsRecipe

To check which recipes are installed in a system and how to import them use:

$ wfchef ls