wfcommons.wfchef

wfcommons.wfchef.chef

wfcommons.wfchef.chef.analyzer_summary(path_to_instances: Path) Dict

Creates a dataframe with the Root Mean Square Error (RMSE) of the synthetic instances relative to the corresponding (in number of tasks) real-world samples available in WfCommons WfInstances, from the Pegasus WMS GitHub repository <https://github.com/wfcommons/pegasus-instances> and the Makeflow WMS GitHub repositories.

Parameters:

path_to_instances (pathlib.Path)

Returns:

Return type:

Dict

wfcommons.wfchef.chef.compare_rmse(synth_graph: DiGraph, real_graph: DiGraph) float

Calculates the Root Mean Square Error (RMSE) between a synthetic instance and the corresponding (in number of tasks) real-world sample.

Parameters:
  • synth_graph (networkX.DiGraph) – a synthetic instance created by WfCommons.

  • real_graph (networkX.DiGraph) – the corresponding (in number of tasks) real-world workflow instance.

Returns:

The RMSE between the synthetic instance and the real instance.

Return type:

float
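For reference, the RMSE formula itself can be sketched as follows. This is a minimal illustration, not the WfChef implementation: the helper rmse and the idea of comparing two equal-length vectors of per-task-type counts are assumptions made for the example.

```python
import math
from typing import Sequence


def rmse(synthetic: Sequence[float], real: Sequence[float]) -> float:
    """Root Mean Square Error between two equal-length numeric vectors."""
    if len(synthetic) != len(real):
        raise ValueError("both vectors must have the same length")
    if not synthetic:
        raise ValueError("vectors must not be empty")
    squared_errors = [(s - r) ** 2 for s, r in zip(synthetic, real)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical per-task-type counts for a synthetic and a real instance.
synthetic_counts = [10, 4, 1]
real_counts = [8, 4, 3]
error = rmse(synthetic_counts, real_counts)
```

A value of 0.0 means the two vectors are identical; larger values mean the synthetic instance deviates more from the real one.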

wfcommons.wfchef.chef.create_recipe(path_to_instances: str | Path, savedir: Path, wf_name: str, cutoff: int = 4000, verbose: bool = False, runs: int = 1)

Creates a recipe for a workflow application by automatically replacing custom information from the recipe skeleton.

Parameters:
  • path_to_instances (str or pathlib.Path) – name (for samples available in WfCommons) or path to the real workflow instances.

  • savedir (pathlib.Path) – path to save the recipe.

  • wf_name (str) – name of the workflow application.

  • cutoff (int) – when set, only instances whose size is smaller than or equal to this value are considered.

  • verbose (bool) – when set, prints status messages.

  • runs (int) – number of times to repeat the error (RMSE) calculation process (due to randomization).

wfcommons.wfchef.chef.find_err(workflow: Path, err_savepath: Path | None = None, always_update: bool | None = False, runs: int | None = 1) DataFrame

Creates a dataframe with the Root Mean Square Error (RMSE) of the synthetic instances relative to the corresponding (in number of tasks) real-world samples available in WfCommons WfInstances, from the Pegasus WMS GitHub repository <https://github.com/wfcommons/pegasus-instances> and the Makeflow WMS GitHub repositories.

Parameters:
  • workflow (pathlib.Path) – name (for samples available in WfCommons) or path to the real workflow instances.

  • err_savepath (Optional[pathlib.Path]) – path to the CSV file where the error (RMSE) of all available instances is saved.

  • always_update (Optional[bool]) – whether the error needs to be recomputed (True if new instances were added, False otherwise).

  • runs (Optional[int]) – number of times to repeat the error calculation process (due to randomization).

Returns:

dataframe with RMSE of all available instances.

Return type:

pd.DataFrame

wfcommons.wfchef.chef.get_parser() ArgumentParser
wfcommons.wfchef.chef.get_recipe(recipe: str) Module
wfcommons.wfchef.chef.get_recipes() DataFrame
wfcommons.wfchef.chef.ls_recipe()

Inspired by the UNIX ls command, lists the recipes already installed in the system and shows how to import them.

wfcommons.wfchef.chef.main()
wfcommons.wfchef.chef.uninstall_recipe(module_name: str, savedir: Path = PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/wfcommons/checkouts/latest/wfcommons/wfchef/recipes'))

Uninstalls a recipe installed in the system.

wfcommons.wfchef.wfchef_abstract_recipe

class wfcommons.wfchef.wfchef_abstract_recipe.BaseMethod(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: Enum

BIGGEST = 2
ERROR_TABLE = 0
RANDOM = 3
SMALLEST = 1
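The members above can be looked up by name or by value like any Python Enum. The sketch below uses a local mirror of the documented values so that it runs without WfCommons installed. The helper pick_base is hypothetical, and reading SMALLEST and BIGGEST as "choose the base graph with the fewest or most tasks" is an assumption: the listing above gives only names and values.

```python
from enum import Enum
from typing import Dict


class BaseMethod(Enum):
    """Local mirror of the documented enum values (not imported from WfCommons)."""
    ERROR_TABLE = 0
    SMALLEST = 1
    BIGGEST = 2
    RANDOM = 3


def pick_base(sizes: Dict[str, int], method: BaseMethod) -> str:
    """Hypothetical helper: choose a base graph name from {name: number of tasks}."""
    if method is BaseMethod.SMALLEST:
        return min(sizes, key=sizes.get)
    if method is BaseMethod.BIGGEST:
        return max(sizes, key=sizes.get)
    # ERROR_TABLE and RANDOM are intentionally left out of this sketch.
    raise NotImplementedError(f"{method.name} is not covered by this sketch")


# Hypothetical base graphs and their task counts.
sizes = {"graph_a": 10, "graph_b": 50}
smallest = pick_base(sizes, BaseMethod.SMALLEST)
biggest = pick_base(sizes, BaseMethod["BIGGEST"])
```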
class wfcommons.wfchef.wfchef_abstract_recipe.WfChefWorkflowRecipe(name: str, data_footprint: int | None, num_tasks: int | None, exclude_graphs: Set[str] = {}, runtime_factor: float | None = 1.0, input_file_size_factor: float | None = 1.0, output_file_size_factor: float | None = 1.0, logger: Logger | None = None, this_dir: str | Path = None, base_method: Enum | None = BaseMethod.ERROR_TABLE)

Bases: WorkflowRecipe

An abstract class of workflow recipes for creating synthetic workflow instances.

Parameters:
  • name (str) – The workflow recipe name.

  • data_footprint (int) – The upper bound for the workflow total data footprint (in bytes).

  • num_tasks (int) – The upper bound for the total number of tasks in the workflow.

  • runtime_factor (float) – The factor by which task runtimes will be increased/decreased.

  • input_file_size_factor (float) – The factor by which task input file sizes will be increased/decreased.

  • output_file_size_factor (float) – The factor by which task output file sizes will be increased/decreased.

  • logger (Logger) – The logger used to log information, warnings, or errors (optional).

_abc_impl = <_abc._abc_data object>
_load_base_graph() DiGraph
_load_microstructures() Dict
_workflow_recipe() Dict[str, Any]

Recipe for generating synthetic instances for a workflow. Recipes can be generated by using the InstanceAnalyzer.

Returns:

A recipe in the form of a dictionary in which keys are task prefixes.

Return type:

Dict[str, Any]

build_workflow(workflow_name: str | None = None) Workflow

Generate a synthetic workflow instance.

Parameters:

workflow_name (str) – The workflow name.

Returns:

A synthetic workflow instance object.

Return type:

Workflow

classmethod from_num_tasks(num_tasks: int, exclude_graphs: Set[str] = {}, runtime_factor: float | None = 1.0, input_file_size_factor: float | None = 1.0, output_file_size_factor: float | None = 1.0) WfChefWorkflowRecipe

Instantiate a workflow recipe that will generate synthetic workflows up to the total number of tasks provided.

Parameters:
  • num_tasks (int) – The upper bound for the total number of tasks in the workflow.

  • exclude_graphs (Set)

  • runtime_factor (float) – The factor by which task runtimes will be increased/decreased.

  • input_file_size_factor (float) – The factor by which task input file sizes will be increased/decreased.

  • output_file_size_factor (float) – The factor by which task output file sizes will be increased/decreased.

Returns:

A workflow recipe object that will generate synthetic workflows up to the total number of tasks provided.

Return type:

WfChefWorkflowRecipe

generate_nx_graph() DiGraph

wfcommons.wfchef.duplicate

exception wfcommons.wfchef.duplicate.NoMicrostructuresError

Bases: Exception

wfcommons.wfchef.duplicate.duplicate(path: Path, base: str | Path, num_nodes: int) DiGraph

Attaches replicated nodes to base graph.

Parameters:
  • path (pathlib.Path) – path to the summary JSON file.

  • base (str or pathlib.Path) – name (for samples available in WfCommons) or path to the specific graph to be used as base (if not set, WfChef chooses the best-fitting one).

  • num_nodes (int) – total number of nodes desired in the synthetic instance.

Returns:

graph with the desired number of tasks.

Return type:

networkX DiGraph.

wfcommons.wfchef.duplicate.duplicate_nodes(graph: DiGraph, nodes: Set[str]) Dict

Replicates nodes of a graph.

Parameters:
  • graph (networkX DiGraph) – graph used to replicate and attach new nodes.

  • nodes (Set[str]) – nodes to be replicated.

Returns:

the new nodes replicated.

Return type:

Dict[str].
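The idea of replicating nodes can be illustrated on a plain edge set. This is only a sketch under stated assumptions: the helper replicate, the "_copy" naming, and the rule that each copy inherits the edges of its original are all hypothetical and are not taken from the WfChef source, which operates on a networkX DiGraph.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]


def replicate(edges: Set[Edge], nodes: Set[str], suffix: str = "_copy") -> Dict[str, str]:
    """Copy `nodes` into `edges` in place and return an {original: copy} mapping.

    Each copy keeps the edges of its original; an endpoint that was also
    replicated is redirected to its own copy.
    """
    mapping = {node: node + suffix for node in nodes}
    new_edges = set()
    for parent, child in edges:
        if parent in mapping or child in mapping:
            new_edges.add((mapping.get(parent, parent), mapping.get(child, child)))
    # Add the new edges only after the loop, so the set is not mutated while iterating.
    edges |= new_edges
    return mapping


# Hypothetical fork-join graph: split -> work -> merge.
edges = {("split", "work"), ("work", "merge")}
copies = replicate(edges, {"work"})
```

After the call, the graph contains a second worker attached to the same parent and child as the original, and the returned mapping records which new node corresponds to which original.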

wfcommons.wfchef.find_microstructures

exception wfcommons.wfchef.find_microstructures.ImbalancedMicrostructureError

Bases: Exception

wfcommons.wfchef.find_microstructures.comb(n: int, k: int) int

Calculates the number of combinations of n items taken k at a time.

Parameters:
  • n (int) – total number of items.

  • k (int) – number of items chosen.

Returns:

number of combinations (n choose k).

Return type:

int.
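Assuming "combination" here means the binomial coefficient, the computation can be sketched with the standard library alone. The function below is an illustration rather than the WfChef source; math.comb (Python 3.8+) gives the same result.

```python
import math


def comb(n: int, k: int) -> int:
    """Binomial coefficient n! / (k! * (n - k)!), computed with integer division."""
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))


# Number of distinct unordered node pairs among 5 nodes.
pairs = comb(5, 2)
```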

wfcommons.wfchef.find_microstructures.find_microstructure(graph: DiGraph, n1: str, n2: str)

Detects a pattern (microstructure).

Parameters:
  • graph (networkX DiGraph) – graph.

  • n1 – a node in graph.

  • n2 – a different node in graph.

Returns:

sets of n1 related nodes, n2 related nodes, the nodes in common between n1 and n2 and all the nodes involved in the process.

Return type:

Set[str], Set[str], Set[str], Set[str].

wfcommons.wfchef.find_microstructures.find_microstructures(graph: DiGraph, verbose: bool = False)

Detects the patterns (microstructures) that are used for replication and graph expansion.

Parameters:
  • graph (networkX DiGraph) – graph.

  • verbose (bool) – if set, prints status messages.

Returns:

patterns (microstructures)

Return type:

Set[str].

wfcommons.wfchef.find_microstructures.get_children(graph: DiGraph, node: str) List[str]

Gets the children of a node.

Parameters:
  • graph (networkX DiGraph) – graph that contains the node.

  • node (str) – a node.

Returns:

list of the node’s children.

Return type:

List[str].

wfcommons.wfchef.find_microstructures.get_parents(graph: DiGraph, node: str) List[str]

Gets the parents of a node.

Parameters:
  • graph (networkX DiGraph) – graph that contains the node.

  • node (str) – a node.

Returns:

list of the node’s parents.

Return type:

List[str].

wfcommons.wfchef.find_microstructures.get_relatives(graph: DiGraph, node: str) Set[str]

Gets all relatives (children and parents) of a node.

Parameters:
  • graph (networkX DiGraph) – graph that contains the node.

  • node (str) – a node.

Returns:

set of the node’s relatives.

Return type:

Set[str].
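The three lookups above (children, parents, relatives) can be sketched on a plain adjacency mapping. This is a standard-library illustration, not the WfChef code: the graph literal is hypothetical, and the functions reuse the documented names only for readability. On a networkX DiGraph, the corresponding lookups are DiGraph.successors and DiGraph.predecessors.

```python
from typing import Dict, List, Set

# Hypothetical workflow: node -> list of children (direct successors).
graph: Dict[str, List[str]] = {
    "split": ["work_1", "work_2"],
    "work_1": ["merge"],
    "work_2": ["merge"],
    "merge": [],
}


def get_children(graph: Dict[str, List[str]], node: str) -> List[str]:
    """Nodes reached by an edge leaving `node`."""
    return list(graph[node])


def get_parents(graph: Dict[str, List[str]], node: str) -> List[str]:
    """Nodes that have an edge pointing to `node`."""
    return [parent for parent, children in graph.items() if node in children]


def get_relatives(graph: Dict[str, List[str]], node: str) -> Set[str]:
    """Union of the node's children and parents."""
    return set(get_children(graph, node)) | set(get_parents(graph, node))
```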

wfcommons.wfchef.find_microstructures.save_microstructures(workflow_path: Path, savedir: Path, verbose: bool = False, img_type: str | None = 'png', cutoff: int = 4000, highlight_all_instances: bool = False) List[DiGraph]
wfcommons.wfchef.find_microstructures.sort_graphs(workflow_path: Path, verbose: bool = False) List[DiGraph]

Sorts graphs in ascending order of number of tasks.

Parameters:
  • workflow_path (pathlib.Path) – path to the JSON instances.

  • verbose (bool) – if set, prints status messages.

Returns:

sorted graphs

Return type:

List[networkX.DiGraph].
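Ordering instances by number of tasks amounts to a sort keyed on instance size. The sketch below is a minimal illustration that uses plain lists of task names in place of graphs; the instance names and the helper sort_by_num_tasks are hypothetical.

```python
from typing import Dict, List

# Hypothetical instances: instance name -> list of task names.
instances: Dict[str, List[str]] = {
    "large": ["t1", "t2", "t3", "t4", "t5"],
    "small": ["t1", "t2"],
    "medium": ["t1", "t2", "t3"],
}


def sort_by_num_tasks(instances: Dict[str, List[str]]) -> List[str]:
    """Return instance names ordered from fewest to most tasks."""
    return sorted(instances, key=lambda name: len(instances[name]))


ordered = sort_by_num_tasks(instances)
```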

wfcommons.wfchef.utils

wfcommons.wfchef.utils.annotate(g: DiGraph) None

Annotates a networkX DiGraph with metadata such as each task’s top-down type hash, bottom-up type hash, and type hash.

Parameters:

g (networkX DiGraph) – graph to be annotated.

Returns:

annotated graph.

Return type:

networkX DiGraph.

wfcommons.wfchef.utils.combine_hashes(*hashes: str) str
wfcommons.wfchef.utils.create_graph(path: Path) DiGraph

Creates a networkX DiGraph from a JSON file in the WfFormat.

Parameters:

path (pathlib.Path) – name (for samples available in WfCommons) or the path to graphs JSON.

Returns:

graph.

Return type:

networkX DiGraph.

wfcommons.wfchef.utils.draw(g: DiGraph, extension: str | None = 'png', with_labels: bool = False, ax: Axes | None = None, show: bool = False, save: Path | str | None = None, close: bool = False, legend: bool = False, node_size: int = 1000, linewidths: int = 5, subgraph: Set[str] = {}) Tuple[Figure, Axes]

Plots a networkX DiGraph.

Parameters:
  • g (networkX DiGraph) – graph to be plotted.

  • extension (str) – extension of the output file.

  • with_labels (bool) – if set, prints the task types over their nodes.

  • ax (plt.Axes) – plot axes.

  • show (bool) – if set, displays the plot on screen.

  • save (pathlib.Path) – path to the directory where the plot is saved.

  • close (bool) – if set, automatically closes the window that displays the plot.

  • legend (bool) – if set, displays the legend of the plot.

  • node_size (int) – size of the nodes (circles) in the plot.

  • linewidths (int) – thickness of the edges in the plot.

  • subgraph (Set[str]) – nodes that were added by replication and will be colored green.

Returns:

the figure and the axis used.

Return type:

Tuple[plt.Figure, plt.Axes].

wfcommons.wfchef.utils.string_hash(obj: Hashable) str
wfcommons.wfchef.utils.type_hash(_type: str, parent_types: Iterable[str]) str

wfcommons.wfchef.skeletons.recipe

class wfcommons.wfchef.skeletons.recipe.SkeletonRecipe(data_footprint: int | None = 0, num_tasks: int | None = 3, exclude_graphs: Set[str] = {}, runtime_factor: float | None = 1.0, input_file_size_factor: float | None = 1.0, output_file_size_factor: float | None = 1.0, logger: Logger | None = None, base_method: BaseMethod = BaseMethod.ERROR_TABLE, **kwargs)

Bases: WfChefWorkflowRecipe

A Skeleton workflow recipe class for creating synthetic workflow instances.

Parameters:
  • data_footprint (int) – The upper bound for the workflow total data footprint (in bytes).

  • num_tasks (int) – The upper bound for the total number of tasks in the workflow.

  • exclude_graphs (Set)

  • runtime_factor (float) – The factor by which task runtimes will be increased/decreased.

  • input_file_size_factor (float) – The factor by which task input file sizes will be increased/decreased.

  • output_file_size_factor (float) – The factor by which task output file sizes will be increased/decreased.

  • logger (Logger) – The logger used to log information, warnings, or errors (optional).

_abc_impl = <_abc._abc_data object>