wfcommons.wfchef

wfcommons.wfchef.chef

wfcommons.wfchef.chef.analyzer_summary(path_to_instances: pathlib.Path) Dict

Creates a dataframe with the Root Mean Square Error (RMSE) of the synthetic instances, each compared against the real-world sample with the corresponding number of tasks. The real-world samples are those available in WfCommons WfInstances, from the Pegasus WMS GitHub repository <https://github.com/wfcommons/pegasus-instances> and from the Makeflow WMS GitHub repository.

Parameters

path_to_instances (pathlib.Path) –

Returns

Return type

Dict

wfcommons.wfchef.chef.compare_rmse(synth_graph: networkx.classes.digraph.DiGraph, real_graph: networkx.classes.digraph.DiGraph) float

Calculates the Root Mean Square Error (RMSE) between a synthetic instance and the real-world sample with the corresponding number of tasks.

Parameters
  • synth_graph (networkX.DiGraph) – a synthetic instance created by WfCommons.

  • real_graph (networkX.DiGraph) – the real-world workflow instance with the corresponding number of tasks.

Returns

The RMSE between the synthetic instance and the real instance.

Return type

float
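
As an illustration of the metric itself, the following is a minimal sketch of a root mean square error over two equal-length sequences of numbers. It does not reproduce how compare_rmse derives values from the two graphs, which is not described here. The helper name rmse and the sample values are hypothetical.

```python
import math
from typing import Sequence


def rmse(synthetic: Sequence[float], real: Sequence[float]) -> float:
    """Root mean square error between two equal-length sequences.

    Hypothetical helper for illustration; not part of WfCommons.
    """
    if len(synthetic) != len(real):
        raise ValueError("sequences must have the same length")
    if not synthetic:
        raise ValueError("sequences must not be empty")
    # Square each pairwise difference, average, then take the square root.
    squared_errors = [(s - r) ** 2 for s, r in zip(synthetic, real)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up values standing in for a synthetic and a real instance.
error = rmse([3.0, 5.0, 2.0], [3.0, 1.0, 2.0])
```

Identical inputs give an error of 0.0, and the value grows as the two sequences diverge.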

wfcommons.wfchef.chef.create_recipe(path_to_instances: Union[str, pathlib.Path], savedir: pathlib.Path, wf_name: str, cutoff: int = 4000, verbose: bool = False, runs: int = 1)

Creates a recipe for a workflow application by automatically replacing custom information from the recipe skeleton.

Parameters
  • path_to_instances (str or pathlib.Path) – name (for samples available in WfCommons) or path to the real workflow instances.

  • savedir (pathlib.Path) – path to save the recipe.

  • wf_name (str) – name of the workflow application.

  • cutoff (int) – when set, only instances whose size is less than or equal to this value are considered.

  • verbose (bool) – when set, prints status messages.

  • runs (int) – number of times to repeat the error calculation process (due to randomization).

wfcommons.wfchef.chef.find_err(workflow: pathlib.Path, err_savepath: Optional[pathlib.Path] = None, always_update: Optional[bool] = False, runs: Optional[int] = 1) pandas.core.frame.DataFrame

Creates a dataframe with the Root Mean Square Error (RMSE) of the synthetic instances, each compared against the real-world sample with the corresponding number of tasks. The real-world samples are those available in WfCommons WfInstances, from the Pegasus WMS GitHub repository <https://github.com/wfcommons/pegasus-instances> and from the Makeflow WMS GitHub repository.

Parameters
  • workflow (pathlib.Path) – name (for samples available in WfCommons) or path to the real workflow instances.

  • err_savepath (Optional[pathlib.Path]) – path to the CSV file in which to save the error (RMSE) of all available instances.

  • always_update (Optional[bool]) – whether the error needs to be updated (True if new instances were added, False otherwise).

  • runs (Optional[int]) – number of times to repeat the error calculation process (due to randomization).

Returns

dataframe with RMSE of all available instances.

Return type

pd.DataFrame

wfcommons.wfchef.chef.get_parser() argparse.ArgumentParser
wfcommons.wfchef.chef.get_recipe(recipe: str) Module
wfcommons.wfchef.chef.get_recipes() pandas.core.frame.DataFrame
wfcommons.wfchef.chef.ls_recipe()

Inspired by the UNIX ls command, lists the recipes already installed on the system and shows how to import them.

wfcommons.wfchef.chef.main()
wfcommons.wfchef.chef.uninstall_recipe(module_name: str)

Uninstalls a recipe installed in the system.

wfcommons.wfchef.duplicate

exception wfcommons.wfchef.duplicate.NoMicrostructuresError

Bases: Exception

wfcommons.wfchef.duplicate.duplicate(path: pathlib.Path, base: Union[str, pathlib.Path], num_nodes: int) networkx.classes.digraph.DiGraph

Attaches replicated nodes to base graph.

Parameters
  • path (pathlib.Path) – path to the summary JSON file.

  • base (str or pathlib.Path) – name (for samples available in WfCommons) or path to the specific graph to be used as base; if not set, WfChef chooses the best-fitting one.

  • num_nodes (int) – total number of nodes desired in the synthetic instance.

Returns

graph with the desired number of tasks.

Return type

networkX DiGraph.

wfcommons.wfchef.duplicate.duplicate_nodes(graph: networkx.classes.digraph.DiGraph, nodes: Set[str]) Dict

Replicates nodes of a graph.

Parameters
  • graph (networkX DiGraph) – graph used to replicate and attach new nodes.

  • nodes (Set[str]) – nodes to be replicated.

Returns

the new nodes replicated.

Return type

Dict[str].

wfcommons.wfchef.find_microstructures

exception wfcommons.wfchef.find_microstructures.ImbalancedMicrostructureError

Bases: Exception

wfcommons.wfchef.find_microstructures.comb(n: int, k: int) int

Calculates the combination of two integers.

Parameters
  • n (int) – number.

  • k (int) – number.

Returns

combination of two integers.

Return type

int.
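
The docstring does not say which "combination" is meant. If comb follows the usual meaning (the binomial coefficient, "n choose k"), the same quantity is available in the standard library as math.comb (Python 3.8+). The following is a small sketch under that assumption; n_choose_k is a hypothetical name, not the WfChef implementation.

```python
import math


def n_choose_k(n: int, k: int) -> int:
    """Number of ways to choose k items out of n, via the factorial formula.

    Hypothetical helper; assumes "combination" means the binomial coefficient.
    """
    if k < 0 or k > n:
        return 0  # no way to choose more items than exist
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))


# The factorial formula agrees with the standard-library implementation.
pairs = n_choose_k(5, 2)
assert pairs == math.comb(5, 2)
```

Integer division keeps the result an exact int, which avoids the precision loss of floating-point division for large n.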

wfcommons.wfchef.find_microstructures.find_microstructure(graph: networkx.classes.digraph.DiGraph, n1: str, n2: str)

Detects a pattern (microstructure).

Parameters
  • graph (networkX DiGraph) – graph.

  • n1 (str) – a node in graph.

  • n2 (str) – a different node in graph.

Returns

sets of the nodes related to n1, the nodes related to n2, the nodes common to n1 and n2, and all the nodes involved in the process.

Return type

Set[str], Set[str], Set[str], Set[str].

wfcommons.wfchef.find_microstructures.find_microstructures(graph: networkx.classes.digraph.DiGraph, verbose: bool = False)

Detects the patterns (microstructures) that are used for replication and graph expansion.

Parameters
  • graph (networkX DiGraph) – graph.

  • verbose (bool) – if set, prints status messages.

Returns

patterns (microstructures)

Return type

Set[str].

wfcommons.wfchef.find_microstructures.get_children(graph: networkx.classes.digraph.DiGraph, node: str) List[str]

Gets the children of a node.

Parameters
  • graph (networkX DiGraph) – graph that contains the node.

  • node (str) – a node.

Returns

list of the node’s children.

Return type

List[str].

wfcommons.wfchef.find_microstructures.get_parents(graph: networkx.classes.digraph.DiGraph, node: str) List[str]

Gets the parents of a node.

Parameters
  • graph (networkX DiGraph) – graph that contains the node.

  • node (str) – a node.

Returns

list of the node’s parents.

Return type

List[str].

wfcommons.wfchef.find_microstructures.get_relatives(graph: networkx.classes.digraph.DiGraph, node: str) Set[str]

Gets all node’s relatives (children and parents).

Parameters
  • graph (networkX DiGraph) – graph that contains the node.

  • node (str) – a node.

Returns

set of the node’s relatives.

Return type

Set[str].
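
The three lookups above (children, parents, relatives) can be sketched without any dependency by storing the directed graph as a list of (parent, child) edges. The sketch assumes that "children" and "parents" mean direct successors and predecessors, and that "relatives" is their union. Whether WfChef also follows indirect (transitive) relations is not stated here. All function names and the sample graph are hypothetical.

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str]  # (parent, child)


def children_of(edges: List[Edge], node: str) -> List[str]:
    """Direct successors of node (assumed meaning of "children")."""
    return [child for parent, child in edges if parent == node]


def parents_of(edges: List[Edge], node: str) -> List[str]:
    """Direct predecessors of node (assumed meaning of "parents")."""
    return [parent for parent, child in edges if child == node]


def relatives_of(edges: List[Edge], node: str) -> Set[str]:
    """Union of the node's direct children and direct parents."""
    return set(children_of(edges, node)) | set(parents_of(edges, node))


# Diamond-shaped workflow: split -> (task_a, task_b) -> merge
edges = [
    ("split", "task_a"),
    ("split", "task_b"),
    ("task_a", "merge"),
    ("task_b", "merge"),
]
```

With a networkX DiGraph, the direct lookups correspond to graph.successors(node) and graph.predecessors(node).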

wfcommons.wfchef.find_microstructures.save_microstructures(workflow_path: pathlib.Path, savedir: pathlib.Path, verbose: bool = False, img_type: Optional[str] = 'png', cutoff: int = 4000, highlight_all_instances: bool = False) List[networkx.classes.digraph.DiGraph]
wfcommons.wfchef.find_microstructures.sort_graphs(workflow_path: pathlib.Path, verbose: bool = False) List[networkx.classes.digraph.DiGraph]

Sorts graphs in ascending order of number of tasks.

Parameters
  • workflow_path (pathlib.Path) – path to the JSON instances.

  • verbose (bool) – if set, prints status messages.

Returns

sorted graphs

Return type

List[networkX.DiGraph].

wfcommons.wfchef.utils

wfcommons.wfchef.utils.annotate(g: networkx.classes.digraph.DiGraph) None

Annotates a networkX DiGraph with metadata such as each task’s top-down type hash, bottom-up type hash, and type hash.

Parameters

g (networkX DiGraph) – graph to be annotated.

Returns

annotated graph.

Return type

networkX DiGraph.

wfcommons.wfchef.utils.combine_hashes(*hashes: str) str
wfcommons.wfchef.utils.create_graph(path: pathlib.Path) networkx.classes.digraph.DiGraph

Creates a networkX DiGraph from a JSON file in the WfFormat.

Parameters

path (pathlib.Path) – name (for samples available in WfCommons) or path to the graph’s JSON file.

Returns

graph.

Return type

networkX DiGraph.

wfcommons.wfchef.utils.draw(g: networkx.classes.digraph.DiGraph, extension: Optional[str] = 'png', with_labels: bool = False, ax: Optional[matplotlib.axes._axes.Axes] = None, show: bool = False, save: Optional[Union[pathlib.Path, str]] = None, close: bool = False, legend: bool = False, node_size: int = 1000, linewidths: int = 5, subgraph: Set[str] = {}) Tuple[matplotlib.figure.Figure, matplotlib.axes._axes.Axes]

Plots a networkX DiGraph.

Parameters
  • g (networkX DiGraph) – graph to be plotted.

  • extension (str) – extension of the output file.

  • with_labels (bool) – if set, prints the task types over their nodes.

  • ax (plt.Axes) – plot axes.

  • show (bool) – if set, displays the plot on screen.

  • save (pathlib.Path) – path to directory to save the plot.

  • close (bool) – if set, automatically closes the window that displays the plot.

  • legend (bool) – if set, displays the legend of the plot.

  • node_size (int) – size of the nodes (circles) in the plot.

  • linewidths (int) – thickness of the edges in the plot.

  • subgraph (Set[str]) – nodes that were added by replication and will be colored green.

Returns

the figure and the axis used.

Return type

Tuple[plt.Figure, plt.Axes].

wfcommons.wfchef.utils.string_hash(obj: Hashable) str
wfcommons.wfchef.utils.type_hash(_type: str, parent_types: Iterable[str]) str

wfcommons.wfchef.skeletons.recipe

class wfcommons.wfchef.skeletons.recipe.SkeletonRecipe(data_footprint: Optional[int] = 0, num_tasks: Optional[int] = 3, exclude_graphs: Set[str] = {}, runtime_factor: Optional[float] = 1.0, input_file_size_factor: Optional[float] = 1.0, output_file_size_factor: Optional[float] = 1.0, logger: Optional[logging.Logger] = None, base_method: wfcommons.wfchef.wfchef_abstract_recipe.BaseMethod = BaseMethod.ERROR_TABLE, **kwargs)

Bases: wfcommons.wfchef.wfchef_abstract_recipe.WfChefWorkflowRecipe

A Skeleton workflow recipe class for creating synthetic workflow instances.

Parameters
  • data_footprint (int) – The upper bound for the workflow total data footprint (in bytes).

  • num_tasks (int) – The upper bound for the total number of tasks in the workflow.

  • exclude_graphs (Set) –

  • runtime_factor (float) – The factor by which task runtimes are increased or decreased.

  • input_file_size_factor (float) – The factor by which the sizes of task input files are increased or decreased.

  • output_file_size_factor (float) – The factor by which the sizes of task output files are increased or decreased.

  • logger (Logger) – The logger used to log information, warnings, or errors (optional).

_abc_impl = <_abc_data object>