wfcommons.wfchef
wfcommons.wfchef.chef
- wfcommons.wfchef.chef.analyzer_summary(path_to_instances: pathlib.Path) Dict
Creates a dataframe with the Root Mean Square Error (RMSE) of the synthetic instances created from the corresponding (in number of tasks) real-world samples. The samples are available in WfCommons WfInstances, in the Pegasus WMS GitHub repository <https://github.com/wfcommons/pegasus-instances> and the Makeflow WMS GitHub repository.
- Parameters
path_to_instances (pathlib.Path) –
- Returns
- Return type
Dict
- wfcommons.wfchef.chef.compare_rmse(synth_graph: networkx.classes.digraph.DiGraph, real_graph: networkx.classes.digraph.DiGraph) float
Calculates the Root Mean Square Error (RMSE) of a synthetic instance with respect to the corresponding (in number of tasks) real-world sample.
- Parameters
synth_graph (networkX.DiGraph) – a synthetic instance created by WfCommons.
real_graph (networkX.DiGraph) – the corresponding (in number of tasks) real-world workflow instance.
- Returns
The RMSE between the synthetic instance and the real instance.
- Return type
float
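As a rough illustration of the metric, the sketch below computes an RMSE between two instances that have been reduced to per-task-type counts. The reduction to counts, the helper name `rmse_of_counts`, and the sample task types are assumptions made for this example; the features that compare_rmse actually compares are not described in this reference.

```python
import math
from collections import Counter
from typing import Iterable


def rmse_of_counts(synth_types: Iterable[str], real_types: Iterable[str]) -> float:
    """RMSE between per-type task counts of two instances (illustrative only)."""
    synth = Counter(synth_types)
    real = Counter(real_types)
    # Compare over every task type that appears in either instance.
    categories = set(synth) | set(real)
    if not categories:
        return 0.0
    squared_error = sum((synth[c] - real[c]) ** 2 for c in categories)
    return math.sqrt(squared_error / len(categories))


# Hypothetical task types of a synthetic and a real instance.
synthetic = ["split", "work", "work", "work", "merge"]
real = ["split", "work", "work", "merge"]
print(rmse_of_counts(synthetic, real))
```

An RMSE of 0 means the two count vectors are identical; larger values indicate that the synthetic instance deviates more from the real one.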
- wfcommons.wfchef.chef.create_recipe(path_to_instances: Union[str, pathlib.Path], savedir: pathlib.Path, wf_name: str, cutoff: int = 4000, verbose: bool = False, runs: int = 1)
Creates a recipe for a workflow application by automatically replacing custom information from the recipe skeleton.
- Parameters
path_to_instances (str or pathlib.Path) – name (for samples available in WfCommons) or path to the real workflow instances.
savedir (pathlib.Path) – path to save the recipe.
wf_name (str) – name of the workflow application.
cutoff (int) – when set, only consider instances of smaller or equal sizes.
verbose (bool) – when set, prints status messages.
runs (int) – number of times to repeat the err calculation process (due to randomization).
- wfcommons.wfchef.chef.find_err(workflow: pathlib.Path, err_savepath: Optional[pathlib.Path] = None, always_update: Optional[bool] = False, runs: Optional[int] = 1) pandas.core.frame.DataFrame
Creates a dataframe with the Root Mean Square Error (RMSE) of the synthetic instances created from the corresponding (in number of tasks) real-world samples. The samples are available in WfCommons WfInstances, in the Pegasus WMS GitHub repository <https://github.com/wfcommons/pegasus-instances> and the Makeflow WMS GitHub repository.
- Parameters
workflow (pathlib.Path) – name (for samples available in WfCommons) or path to the real workflow instances.
err_savepath (Optional[pathlib.Path]) – path where the err (RMSE) of all available instances is saved as a CSV.
always_update (Optional[bool]) – whether the err needs to be updated (True: if new instances are added, False: otherwise).
runs (Optional[int]) – number of times to repeat the err calculation process (due to randomization).
- Returns
dataframe with RMSE of all available instances.
- Return type
pd.DataFrame
- wfcommons.wfchef.chef.get_parser() argparse.ArgumentParser
- wfcommons.wfchef.chef.get_recipe(recipe: str) Module
- wfcommons.wfchef.chef.get_recipes() pandas.core.frame.DataFrame
- wfcommons.wfchef.chef.ls_recipe()
Inspired by the UNIX ls command, lists the recipes already installed in the system and shows how to import them.
- wfcommons.wfchef.chef.main()
- wfcommons.wfchef.chef.uninstall_recipe(module_name: str)
Uninstalls a recipe installed in the system.
wfcommons.wfchef.duplicate
- exception wfcommons.wfchef.duplicate.NoMicrostructuresError
Bases:
Exception
- wfcommons.wfchef.duplicate.duplicate(path: pathlib.Path, base: Union[str, pathlib.Path], num_nodes: int) networkx.classes.digraph.DiGraph
Attaches replicated nodes to base graph.
- Parameters
path (pathlib.Path) – path to the summary JSON file.
base (str or pathlib.Path) – name (for samples available in WfCommons) or path to the specific graph to be used as base (if not set, WfChef chooses the best-fitting one).
num_nodes (int) – total number of nodes desired in the synthetic instance.
- Returns
graph with the desired number of tasks.
- Return type
networkX DiGraph.
- wfcommons.wfchef.duplicate.duplicate_nodes(graph: networkx.classes.digraph.DiGraph, nodes: Set[str]) Dict
Replicates nodes of a graph.
- Parameters
graph (networkX DiGraph) – graph used to replicate and attach new nodes.
nodes (Set[str]) – nodes to be replicated.
- Returns
the new nodes replicated.
- Return type
Dict[str].
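To make the idea of node replication concrete, here is a minimal sketch that works on a plain set of edges rather than a networkX DiGraph. The helper name `replicate_nodes`, the `_copy` naming scheme, and the rule that copies reuse the neighbors of their originals are assumptions for illustration, not the behavior of duplicate_nodes.

```python
from typing import Dict, Set, Tuple

Edges = Set[Tuple[str, str]]  # each edge is (parent, child)


def replicate_nodes(edges: Edges, nodes: Set[str],
                    suffix: str = "_copy") -> Tuple[Edges, Dict[str, str]]:
    """Return the expanded edge set and a mapping original -> replica."""
    # Hypothetical naming scheme: append a suffix to each replicated node.
    new_names = {node: f"{node}{suffix}" for node in nodes}
    new_edges = set(edges)
    for src, dst in edges:
        if src in nodes or dst in nodes:
            # Endpoints inside the replicated set are redirected to their
            # copies; endpoints outside it are shared with the originals.
            new_edges.add((new_names.get(src, src), new_names.get(dst, dst)))
    return new_edges, new_names


edges = {("root", "a"), ("a", "b"), ("b", "sink")}
expanded, mapping = replicate_nodes(edges, {"a", "b"})
print(sorted(expanded))
print(mapping)
```

In this sketch the replicated chain `a_copy -> b_copy` hangs between the same `root` and `sink` as the original chain, which is one simple way of growing a graph while preserving its shape.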
wfcommons.wfchef.find_microstructures
- exception wfcommons.wfchef.find_microstructures.ImbalancedMicrostructureError
Bases:
Exception
- wfcommons.wfchef.find_microstructures.comb(n: int, k: int) int
Calculates the combination of two integers.
- Parameters
n (int) – number.
k (int) – number.
- Returns
combination of two integers.
- Return type
int.
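Assuming that "combination" here means the binomial coefficient (the number of ways to choose k items out of n), the computation can be sketched as follows. The helper name `n_choose_k` is hypothetical, and the standard library's `math.comb` (Python 3.8+) gives the same result.

```python
import math


def n_choose_k(n: int, k: int) -> int:
    """Number of ways to choose k items from n, ignoring order."""
    if k < 0 or k > n:
        return 0  # no valid selection exists
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))


# For example, the number of unordered node pairs among 5 nodes.
print(n_choose_k(5, 2))
print(math.comb(5, 2))
```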
- wfcommons.wfchef.find_microstructures.find_microstructure(graph: networkx.classes.digraph.DiGraph, n1: str, n2: str)
Detects a pattern (microstructure).
- Parameters
graph (networkX DiGraph) – graph.
n1 – a node in graph.
n2 – a different node in graph.
- Returns
sets of the nodes related to n1, the nodes related to n2, the nodes common to n1 and n2, and all the nodes involved in the process.
- Return type
Set[str], Set[str], Set[str], Set[str].
- wfcommons.wfchef.find_microstructures.find_microstructures(graph: networkx.classes.digraph.DiGraph, verbose: bool = False)
Detects the patterns (microstructures) that are used for replication and graph expansion.
- Parameters
graph (networkX DiGraph) – graph.
verbose (bool) – if set, prints status messages.
- Returns
patterns (microstructures)
- Return type
Set[str].
- wfcommons.wfchef.find_microstructures.get_children(graph: networkx.classes.digraph.DiGraph, node: str) List[str]
Gets the children of a node.
- Parameters
graph (networkX DiGraph) – graph that contains the node.
node (str) – a node.
- Returns
list of the node’s children.
- Return type
List[str].
- wfcommons.wfchef.find_microstructures.get_parents(graph: networkx.classes.digraph.DiGraph, node: str) List[str]
Gets the parents of a node.
- Parameters
graph (networkX DiGraph) – graph that contains the node.
node (str) – a node.
- Returns
list of the node’s parents.
- Return type
List[str].
- wfcommons.wfchef.find_microstructures.get_relatives(graph: networkx.classes.digraph.DiGraph, node: str) Set[str]
Gets all relatives (children and parents) of a node.
- Parameters
graph (networkX DiGraph) – graph that contains the node.
node (str) – a node.
- Returns
set of the node’s relatives.
- Return type
Set[str].
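The three lookups above (children, parents, relatives) can be sketched on a plain adjacency mapping. The helper names and the sample graph are hypothetical; with a networkX DiGraph, the analogous lookups are `graph.successors(node)` and `graph.predecessors(node)`.

```python
from typing import Dict, List, Set

Adjacency = Dict[str, List[str]]  # node -> list of its children


def children_of(adj: Adjacency, node: str) -> List[str]:
    """Nodes reached by an edge leaving `node`."""
    return list(adj.get(node, []))


def parents_of(adj: Adjacency, node: str) -> List[str]:
    """Nodes that have an edge pointing to `node`."""
    return [parent for parent, kids in adj.items() if node in kids]


def relatives_of(adj: Adjacency, node: str) -> Set[str]:
    """Union of the node's children and parents."""
    return set(children_of(adj, node)) | set(parents_of(adj, node))


# A small fork-join workflow used only for illustration.
adj = {"root": ["a", "b"], "a": ["sink"], "b": ["sink"], "sink": []}
print(children_of(adj, "root"))
print(parents_of(adj, "sink"))
print(relatives_of(adj, "a"))
```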
- wfcommons.wfchef.find_microstructures.save_microstructures(workflow_path: pathlib.Path, savedir: pathlib.Path, verbose: bool = False, img_type: Optional[str] = 'png', cutoff: int = 4000, highlight_all_instances: bool = False) List[networkx.classes.digraph.DiGraph]
- wfcommons.wfchef.find_microstructures.sort_graphs(workflow_path: pathlib.Path, verbose: bool = False) List[networkx.classes.digraph.DiGraph]
Sorts graphs in ascending order of number of tasks.
- Parameters
workflow_path (pathlib.Path) – path to the JSON instances.
verbose (bool) – if set, prints status messages.
- Returns
sorted graphs
- Return type
List[networkX.DiGraph].
wfcommons.wfchef.utils
- wfcommons.wfchef.utils.annotate(g: networkx.classes.digraph.DiGraph) None
Annotates a networkX DiGraph with metadata such as each task's top-down type hash, bottom-up type hash, and type hash.
- Parameters
g (networkX DiGraph) – graph to be annotated.
- Returns
None; the graph is annotated in place.
- Return type
None.
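One plausible reading of a "top-down type hash" is a hash that combines a task's type with the hashes of its parents, so that tasks of the same type in the same structural position receive the same value. The sketch below follows that reading; the helper names, the use of SHA-256, and the way the hashes are combined are all assumptions and do not reflect the actual implementation in wfcommons.wfchef.utils.

```python
import hashlib
from typing import Dict, List


def sketch_hash(text: str) -> str:
    """Stable hexadecimal digest of a string (SHA-256 chosen arbitrarily)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def top_down_hashes(types: Dict[str, str],
                    parents: Dict[str, List[str]]) -> Dict[str, str]:
    """Hash each node from its own type and its parents' hashes."""
    hashes: Dict[str, str] = {}

    def visit(node: str) -> str:
        if node not in hashes:
            # Sort the parent hashes so the result ignores parent order.
            parent_hashes = sorted(visit(p) for p in parents.get(node, []))
            hashes[node] = sketch_hash(types[node] + "|" + ",".join(parent_hashes))
        return hashes[node]

    for node in types:
        visit(node)
    return hashes


# Hypothetical fork-join workflow: two "work" tasks share the same parent.
types = {"root": "split", "a": "work", "b": "work", "c": "merge"}
parents = {"a": ["root"], "b": ["root"], "c": ["a", "b"]}
hashes = top_down_hashes(types, parents)
print(hashes["a"] == hashes["b"])
```

A bottom-up variant would follow the same pattern with children in place of parents.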
- wfcommons.wfchef.utils.combine_hashes(*hashes: str) str
- wfcommons.wfchef.utils.create_graph(path: pathlib.Path) networkx.classes.digraph.DiGraph
Creates a networkX DiGraph from a JSON file in the WfFormat.
- Parameters
path (pathlib.Path) – name (for samples available in WfCommons) or the path to graphs JSON.
- Returns
graph.
- Return type
networkX DiGraph.
- wfcommons.wfchef.utils.draw(g: networkx.classes.digraph.DiGraph, extension: Optional[str] = 'png', with_labels: bool = False, ax: Optional[matplotlib.axes._axes.Axes] = None, show: bool = False, save: Optional[Union[pathlib.Path, str]] = None, close: bool = False, legend: bool = False, node_size: int = 1000, linewidths: int = 5, subgraph: Set[str] = {}) Tuple[matplotlib.figure.Figure, matplotlib.axes._axes.Axes]
Plots a networkX DiGraph.
- Parameters
g (networkX DiGraph) – graph to be plotted.
extension (str) – extension of the output file.
with_labels (bool) – if set, it prints the task types over their nodes.
ax (plt.Axes) – plot axes.
show (bool) – if set, displays the plot on screen.
save (pathlib.Path) – path to directory to save the plot.
close (bool) – if set, automatically closes the window that displays the plot.
legend (bool) – if set, displays the legend of the plot.
node_size (int) – size of the nodes (circles) in the plot.
linewidths (int) – thickness of the edges in the plot.
subgraph (Set[str]) – nodes that were added by replication and will be colored green.
- Returns
the figure and the axis used.
- Return type
Tuple[plt.Figure, plt.Axes].
- wfcommons.wfchef.utils.string_hash(obj: Hashable) str
- wfcommons.wfchef.utils.type_hash(_type: str, parent_types: Iterable[str]) str
wfcommons.wfchef.skeletons.recipe
- class wfcommons.wfchef.skeletons.recipe.SkeletonRecipe(data_footprint: Optional[int] = 0, num_tasks: Optional[int] = 3, exclude_graphs: Set[str] = {}, runtime_factor: Optional[float] = 1.0, input_file_size_factor: Optional[float] = 1.0, output_file_size_factor: Optional[float] = 1.0, logger: Optional[logging.Logger] = None, base_method: wfcommons.wfchef.wfchef_abstract_recipe.BaseMethod = BaseMethod.ERROR_TABLE, **kwargs)
Bases:
wfcommons.wfchef.wfchef_abstract_recipe.WfChefWorkflowRecipe
A Skeleton workflow recipe class for creating synthetic workflow instances.
- Parameters
data_footprint (int) – The upper bound for the workflow total data footprint (in bytes).
num_tasks (int) – The upper bound for the total number of tasks in the workflow.
exclude_graphs (Set) –
runtime_factor (float) – The factor by which task runtimes will be increased/decreased.
input_file_size_factor (float) – The factor by which the sizes of task input files will be increased/decreased.
output_file_size_factor (float) – The factor by which the sizes of task output files will be increased/decreased.
logger (Logger) – The logger used to record information, warnings, or errors (optional).