wfcommons.wfchef
wfcommons.wfchef.chef
- wfcommons.wfchef.chef.analyzer_summary(path_to_instances: Path) Dict
Creates a dataframe with the Root Mean Square Error (RMSE) of the synthetic instances generated from the corresponding real-world samples (w.r.t. number of tasks) available in WfCommons WfInstances, from the Pegasus WMS GitHub <https://github.com/wfcommons/pegasus-instances> and Makeflow WMS GitHub repositories.
- Parameters:
path_to_instances (pathlib.Path) – path to the real workflow instances.
- Return type:
Dict
- wfcommons.wfchef.chef.compare_rmse(synth_graph: DiGraph, real_graph: DiGraph) float
Calculates the Root Mean Square Error (RMSE) of a synthetic instance with respect to the real-world sample it corresponds to (in number of tasks).
- Parameters:
synth_graph (networkX.DiGraph) – a synthetic instance created by WfCommons.
real_graph (networkX.DiGraph) – the corresponding (in number of tasks) real-world workflow instance.
- Returns:
The RMSE between the synthetic instance and the real instance.
- Return type:
float
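The entry above does not state which quantities the RMSE is computed over. As a hedged illustration only, the sketch below assumes the comparison is made on per-task-type counts: each graph is reduced to a list of task types, and the RMSE is taken over the union of types, with a type missing from one graph counting as zero. The function name `rmse_by_type_counts` and the choice of feature are assumptions, not the WfChef implementation.

```python
import math
from collections import Counter
from typing import Dict, List


def rmse_by_type_counts(synth_types: List[str], real_types: List[str]) -> float:
    """Hypothetical RMSE between two workflows, each given as a list of task types.

    Assumption: the compared feature is the number of tasks of each type.
    """
    synth_counts: Dict[str, int] = Counter(synth_types)
    real_counts: Dict[str, int] = Counter(real_types)
    # A type that appears in only one graph counts as zero tasks in the other.
    all_types = set(synth_counts) | set(real_counts)
    if not all_types:
        return 0.0
    squared_errors = [
        (synth_counts.get(t, 0) - real_counts.get(t, 0)) ** 2 for t in all_types
    ]
    return math.sqrt(sum(squared_errors) / len(all_types))
```

For example, a synthetic graph with two `a` tasks and one `b` task, compared with a real graph with one `a` and two `b`, differs by one task per type, giving an RMSE of 1.0.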
- wfcommons.wfchef.chef.create_recipe(path_to_instances: str | Path, savedir: Path, wf_name: str, cutoff: int = 4000, verbose: bool = False, runs: int = 1)
Creates a recipe for a workflow application by automatically replacing the placeholder information in the recipe skeleton with information specific to that application.
- Parameters:
path_to_instances (str or pathlib.Path) – name (for samples available in WfCommons) or path to the real workflow instances.
savedir (pathlib.Path) – path to save the recipe.
wf_name (str) – name of the workflow application.
cutoff (int) – when set, only instances with a number of tasks smaller than or equal to this value are considered.
verbose (bool) – when set, prints status messages.
runs (int) – number of times to repeat the err calculation process (due to randomization).
- wfcommons.wfchef.chef.find_err(workflow: Path, err_savepath: Path | None = None, always_update: bool | None = False, runs: int | None = 1) DataFrame
Creates a dataframe with the Root Mean Square Error (RMSE) of the synthetic instances generated from the corresponding real-world samples (w.r.t. number of tasks) available in WfCommons WfInstances, from the Pegasus WMS GitHub <https://github.com/wfcommons/pegasus-instances> and Makeflow WMS GitHub repositories.
- Parameters:
workflow (pathlib.Path) – name (for samples available in WfCommons) or path to the real workflow instances.
err_savepath (pathlib.Path, optional) – path to the CSV file in which the err (RMSE) of all available instances is saved.
always_update (bool, optional) – whether the err needs to be recomputed (True: if new instances were added, False: otherwise).
runs (Optional[int]) – number of times to repeat the err calculation process (due to randomization).
- Returns:
dataframe with RMSE of all available instances.
- Return type:
pd.DataFrame
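The `err_savepath`, `always_update`, and `runs` parameters together describe a cache-or-recompute pattern with averaging over randomized runs. The sketch below illustrates that pattern under stated assumptions: it is not WfChef code, the names `averaged_errors`, `compute_error`, and `seed` are hypothetical, and it returns a plain `dict` written with the standard `csv` module instead of a pandas DataFrame so that it stays dependency-free.

```python
import csv
import random
import statistics
from pathlib import Path
from typing import Callable, Dict, Optional


def averaged_errors(
    instance_sizes: Dict[str, int],
    compute_error: Callable[[int, random.Random], float],
    err_savepath: Optional[Path] = None,
    always_update: bool = False,
    runs: int = 1,
    seed: int = 0,
) -> Dict[str, float]:
    """Hypothetical sketch: mean error per instance over `runs` randomized runs.

    If `err_savepath` already exists and `always_update` is False, the cached
    CSV is reused instead of recomputing anything.
    """
    if err_savepath is not None and err_savepath.exists() and not always_update:
        with err_savepath.open(newline="") as fp:
            return {row["instance"]: float(row["error"]) for row in csv.DictReader(fp)}

    rng = random.Random(seed)  # seeded so repeated calls are reproducible
    errors = {
        name: statistics.mean(compute_error(size, rng) for _ in range(runs))
        for name, size in instance_sizes.items()
    }

    if err_savepath is not None:
        with err_savepath.open("w", newline="") as fp:
            writer = csv.DictWriter(fp, fieldnames=["instance", "error"])
            writer.writeheader()
            for name, err in errors.items():
                writer.writerow({"instance": name, "error": err})
    return errors
```

Averaging over several runs smooths out the randomness of synthetic generation; the cache avoids recomputing the table unless new instances make it stale.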
- wfcommons.wfchef.chef.get_parser() ArgumentParser
- wfcommons.wfchef.chef.get_recipe(recipe: str) Module
- wfcommons.wfchef.chef.get_recipes() DataFrame
- wfcommons.wfchef.chef.ls_recipe()
Inspired by the UNIX ls command, lists the recipes already installed in the system and shows how to import each of them.
- wfcommons.wfchef.chef.main()
- wfcommons.wfchef.chef.uninstall_recipe(module_name: str, savedir: Path = <wfcommons installation directory>/wfchef/recipes)
Uninstalls a recipe installed in the system.
wfcommons.wfchef.wfchef_abstract_recipe
- class wfcommons.wfchef.wfchef_abstract_recipe.BaseMethod(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)
Bases:
Enum
- BIGGEST = 2
- ERROR_TABLE = 0
- RANDOM = 3
- SMALLEST = 1
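The members' names suggest how a base graph is chosen, but the reference does not describe the selection. The sketch below is a hypothetical helper, assuming that only base graphs with at most `num_tasks` tasks are eligible, that `SMALLEST`/`BIGGEST`/`RANDOM` pick by size or at random, and that `ERROR_TABLE` picks the base with the lowest precomputed error. The enum reproduces the members and values listed above; `choose_base` and `error_table` are illustrative names, not WfChef API.

```python
import random
from enum import Enum
from typing import Dict, Optional


class BaseMethod(Enum):
    # Same members and values as in the reference above.
    ERROR_TABLE = 0
    SMALLEST = 1
    BIGGEST = 2
    RANDOM = 3


def choose_base(
    base_sizes: Dict[str, int],
    num_tasks: int,
    method: BaseMethod,
    error_table: Optional[Dict[str, float]] = None,
    rng: Optional[random.Random] = None,
) -> str:
    """Hypothetical helper: pick a base graph whose size does not exceed num_tasks."""
    candidates = {name: size for name, size in base_sizes.items() if size <= num_tasks}
    if not candidates:
        raise ValueError(f"no base graph has at most {num_tasks} tasks")
    if method is BaseMethod.SMALLEST:
        return min(candidates, key=candidates.get)
    if method is BaseMethod.BIGGEST:
        return max(candidates, key=candidates.get)
    if method is BaseMethod.RANDOM:
        return (rng or random.Random()).choice(sorted(candidates))
    # ERROR_TABLE (assumed): lowest precomputed error wins; unknown bases rank last.
    if error_table is None:
        raise ValueError("ERROR_TABLE requires an error table")
    return min(candidates, key=lambda name: error_table.get(name, float("inf")))
```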
- class wfcommons.wfchef.wfchef_abstract_recipe.WfChefWorkflowRecipe(name: str, data_footprint: int | None, num_tasks: int | None, exclude_graphs: Set[str] = {}, runtime_factor: float | None = 1.0, input_file_size_factor: float | None = 1.0, output_file_size_factor: float | None = 1.0, logger: Logger | None = None, this_dir: str | Path = None, base_method: Enum | None = BaseMethod.ERROR_TABLE)
Bases:
WorkflowRecipe
An abstract class of workflow recipes for creating synthetic workflow instances.
- Parameters:
name (str) – The workflow recipe name.
data_footprint (int) – The upper bound for the workflow’s total data footprint (in bytes).
num_tasks (int) – The upper bound for the total number of tasks in the workflow.
runtime_factor (float) – The factor by which task runtimes will be increased/decreased.
input_file_size_factor (float) – The factor by which task input file sizes will be increased/decreased.
output_file_size_factor (float) – The factor by which task output file sizes will be increased/decreased.
logger (Logger) – The logger used to log information, warnings, or errors (optional).
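The three factor parameters are read here as plain multipliers (a factor of 2.0 doubles, 0.5 halves). The sketch below shows that interpretation on a hypothetical `TaskSpec` record; it is an assumption about the semantics, not how `WfChefWorkflowRecipe` applies the factors internally.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class TaskSpec:
    """Hypothetical, minimal stand-in for a task: runtime and file sizes only."""
    name: str
    runtime: float      # seconds
    input_bytes: int
    output_bytes: int


def apply_factors(
    tasks: List[TaskSpec],
    runtime_factor: float = 1.0,
    input_file_size_factor: float = 1.0,
    output_file_size_factor: float = 1.0,
) -> List[TaskSpec]:
    """Return scaled copies of the tasks (factors assumed to be multiplicative)."""
    return [
        replace(
            t,
            runtime=t.runtime * runtime_factor,
            input_bytes=round(t.input_bytes * input_file_size_factor),
            output_bytes=round(t.output_bytes * output_file_size_factor),
        )
        for t in tasks
    ]
```

Returning new frozen records leaves the originals untouched, so the same base tasks can be rescaled with different factors.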
- _load_base_graph() DiGraph
- _load_microstructures() Dict
- _workflow_recipe() Dict[str, Any]
Recipe for generating synthetic instances for a workflow. Recipes can be generated by using the InstanceAnalyzer.
- Returns:
A recipe in the form of a dictionary in which keys are task prefixes.
- Return type:
Dict[str, Any]
- build_workflow(workflow_name: str | None = None) Workflow
Generate a synthetic workflow instance.
- Parameters:
workflow_name (str) – The workflow name (optional).
- Returns:
A synthetic workflow instance object.
- Return type:
Workflow
- classmethod from_num_tasks(num_tasks: int, exclude_graphs: Set[str] = {}, runtime_factor: float | None = 1.0, input_file_size_factor: float | None = 1.0, output_file_size_factor: float | None = 1.0) WfChefWorkflowRecipe
Instantiate a workflow recipe that will generate synthetic workflows up to the total number of tasks provided.
- Parameters:
num_tasks (int) – The upper bound for the total number of tasks in the workflow.
exclude_graphs (Set)
runtime_factor (float) – The factor by which task runtimes will be increased/decreased.
input_file_size_factor (float) – The factor by which task input file sizes will be increased/decreased.
output_file_size_factor (float) – The factor by which task output file sizes will be increased/decreased.
- Returns:
A workflow recipe object that will generate synthetic workflows up to the total number of tasks provided.
- Return type:
WfChefWorkflowRecipe
- generate_nx_graph() DiGraph
wfcommons.wfchef.duplicate
- exception wfcommons.wfchef.duplicate.NoMicrostructuresError
Bases:
Exception
- wfcommons.wfchef.duplicate.duplicate(path: Path, base: str | Path, num_nodes: int) DiGraph
Attaches replicated nodes to a base graph.
- Parameters:
path (pathlib.Path) – path to the summary JSON file.
base (str or pathlib.Path) – name (for samples available in WfCommons) or path to the specific graph to be used as base (if not set, WfChef chooses the best-fitting one).
num_nodes (int) – total number of nodes desired in the synthetic instance.
- Returns:
graph with the desired number of tasks.
- Return type:
networkX DiGraph
- wfcommons.wfchef.duplicate.duplicate_nodes(graph: DiGraph, nodes: Set[str]) Dict
Replicates nodes of a graph.
- Parameters:
graph (networkX DiGraph) – graph in which nodes are replicated and to which the new nodes are attached.
nodes (Set[str]) – nodes to be replicated.
- Returns:
the new (replicated) nodes.
- Return type:
Dict
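To make "replicate and attach" concrete, the sketch below copies a set of nodes in a DAG stored as a plain dict of successor sets (not a networkX graph). It assumes that edges inside the replicated set are copied between the replicas, and that edges crossing the boundary attach the replicas to the same outside parents and children; the function name, the `_copyN` naming scheme, and the returned original-to-replica mapping are hypothetical choices, not the WfChef implementation.

```python
from itertools import count
from typing import Dict, Set

Graph = Dict[str, Set[str]]  # node -> set of successors (children)


def duplicate_nodes_sketch(graph: Graph, nodes: Set[str]) -> Dict[str, str]:
    """Hypothetical sketch: copy `nodes` into `graph` (in place) and wire the copies up.

    Returns a mapping from each original node to its replica.
    """
    ids = count()
    new_names: Dict[str, str] = {}
    for node in sorted(nodes):
        # Pick a fresh name that is not already used in the graph.
        while True:
            candidate = f"{node}_copy{next(ids)}"
            if candidate not in graph:
                break
        new_names[node] = candidate
        graph[candidate] = set()

    # Snapshot the edges first; the replicas have no edges yet.
    edges = [(u, v) for u, children in graph.items() for v in children]
    for u, v in edges:
        if u in nodes and v in nodes:
            graph[new_names[u]].add(new_names[v])  # edge inside the replicated set
        elif u in nodes:
            graph[new_names[u]].add(v)  # replica -> outside child
        elif v in nodes:
            graph[u].add(new_names[v])  # outside parent -> replica
    return new_names
```

Replicating the chain `a -> b` in `src -> a -> b -> sink` yields a parallel chain `src -> a_copy0 -> b_copy1 -> sink`, which widens the graph while preserving its entry and exit points.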
wfcommons.wfchef.find_microstructures
- exception wfcommons.wfchef.find_microstructures.ImbalancedMicrostructureError
Bases:
Exception
- wfcommons.wfchef.find_microstructures.comb(n: int, k: int) int
Calculates the number of combinations of n items taken k at a time.
- Parameters:
n (int) – number of items.
k (int) – number of items to choose.
- Returns:
the number of combinations of k items out of n.
- Return type:
int
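Assuming "combination" here means the usual binomial coefficient C(n, k) = n! / (k! (n - k)!), a minimal exact-integer version looks like the sketch below. The name `comb_factorial` is hypothetical; the standard library's `math.comb` (Python 3.8+) computes the same value.

```python
import math


def comb_factorial(n: int, k: int) -> int:
    """Binomial coefficient C(n, k) = n! / (k! (n - k)!), using exact integer division."""
    if not 0 <= k <= n:
        return 0
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))
```

For instance, `comb_factorial(n, 2)` is the number of unordered node pairs among n nodes, which is how many (n1, n2) pairs there are if every pair of nodes is examined.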
- wfcommons.wfchef.find_microstructures.find_microstructure(graph: DiGraph, n1: str, n2: str)
Detects a pattern (microstructure).
- Parameters:
graph (networkX DiGraph) – graph.
n1 (str) – a node in graph.
n2 (str) – a different node in graph.
- Returns:
the set of nodes related to n1, the set of nodes related to n2, the nodes common to n1 and n2, and all the nodes involved in the process.
- Return type:
Set[str], Set[str], Set[str], Set[str]
- wfcommons.wfchef.find_microstructures.find_microstructures(graph: DiGraph, verbose: bool = False)
Detects the patterns (microstructures) that are used for replication and graph expansion.
- Parameters:
graph (networkX DiGraph) – graph.
verbose (bool) – if set, prints status messages.
- Returns:
patterns (microstructures).
- Return type:
Set[str]
- wfcommons.wfchef.find_microstructures.get_children(graph: DiGraph, node: str) List[str]
Gets the children of a node.
- Parameters:
graph (networkX DiGraph) – graph that contains the node.
node (str) – a node.
- Returns:
list of the node’s children.
- Return type:
List[str]
- wfcommons.wfchef.find_microstructures.get_parents(graph: DiGraph, node: str) List[str]
Gets the parents of a node.
- Parameters:
graph (networkX DiGraph) – graph that contains the node.
node (str) – a node.
- Returns:
list of the node’s parents.
- Return type:
List[str]
- wfcommons.wfchef.find_microstructures.get_relatives(graph: DiGraph, node: str) Set[str]
Gets all of a node’s relatives (children and parents).
- Parameters:
graph (networkX DiGraph) – graph that contains the node.
node (str) – a node.
- Returns:
set of the node’s relatives.
- Return type:
Set[str]
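In networkX terms, children and parents are a node's successors and predecessors (`DiGraph.successors` and `DiGraph.predecessors`). The dependency-free sketch below expresses the three lookups on a DAG stored as a dict of successor sets; the names `children_of`, `parents_of`, and `relatives_of` are hypothetical stand-ins, not the WfChef functions, and the lists are sorted only to make the output deterministic.

```python
from typing import Dict, List, Set

Graph = Dict[str, Set[str]]  # node -> set of successors (children)


def children_of(graph: Graph, node: str) -> List[str]:
    """Children are the direct successors of `node`."""
    return sorted(graph.get(node, set()))


def parents_of(graph: Graph, node: str) -> List[str]:
    """Parents are the nodes that list `node` among their successors."""
    return sorted(u for u, children in graph.items() if node in children)


def relatives_of(graph: Graph, node: str) -> Set[str]:
    """Relatives are the union of parents and children."""
    return set(children_of(graph, node)) | set(parents_of(graph, node))
```

With only successor sets stored, finding parents requires a scan over all nodes; a networkX `DiGraph` keeps predecessor links as well, so both directions are cheap there.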
- wfcommons.wfchef.find_microstructures.save_microstructures(workflow_path: Path, savedir: Path, verbose: bool = False, img_type: str | None = 'png', cutoff: int = 4000, highlight_all_instances: bool = False) List[DiGraph]
- wfcommons.wfchef.find_microstructures.sort_graphs(workflow_path: Path, verbose: bool = False) List[DiGraph]
Sorts graphs in ascending order of number of tasks.
- Parameters:
workflow_path (pathlib.Path) – path to the JSON instances.
verbose (bool) – if set, prints status messages.
- Returns:
sorted graphs.
- Return type:
List[networkX.DiGraph]
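Sorting by number of tasks pairs naturally with the `cutoff` parameter used by `create_recipe` and `save_microstructures` (only instances with at most `cutoff` tasks are considered). The sketch below combines the two on dict-based graphs; `sort_and_cut` is a hypothetical name, and applying the cutoff before sorting is an assumption made for illustration.

```python
from typing import Dict, List, Set

Graph = Dict[str, Set[str]]  # node -> set of successors; len(graph) == number of tasks


def sort_and_cut(graphs: List[Graph], cutoff: int = 4000) -> List[Graph]:
    """Hypothetical sketch: keep graphs with at most `cutoff` tasks, smallest first."""
    # sorted() is stable, so graphs of equal size keep their input order.
    return sorted((g for g in graphs if len(g) <= cutoff), key=len)
```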
wfcommons.wfchef.utils
- wfcommons.wfchef.utils.annotate(g: DiGraph) None
Annotates a networkX DiGraph with metadata such as each task’s top-down type hash, bottom-up type hash, and type hash.
- Parameters:
g (networkX DiGraph) – graph to be annotated.
- Returns:
None; the annotations are added to g in place.
- Return type:
None
- wfcommons.wfchef.utils.combine_hashes(*hashes: str) str
- wfcommons.wfchef.utils.create_graph(path: Path) DiGraph
Creates a networkX DiGraph from a JSON file in the WfFormat.
- Parameters:
path (pathlib.Path) – name (for samples available in WfCommons) or the path to the graph’s JSON file.
- Returns:
graph.
- Return type:
networkX DiGraph
- wfcommons.wfchef.utils.draw(g: DiGraph, extension: str | None = 'png', with_labels: bool = False, ax: Axes | None = None, show: bool = False, save: Path | str | None = None, close: bool = False, legend: bool = False, node_size: int = 1000, linewidths: int = 5, subgraph: Set[str] = {}) Tuple[Figure, Axes]
Plots a networkX DiGraph.
- Parameters:
g (networkX DiGraph) – graph to be plotted.
extension (str) – extension of the output file.
with_labels (bool) – if set, prints the task types over their nodes.
ax (plt.Axes) – plot axes.
show (bool) – if set, displays the plot on screen.
save (pathlib.Path or str) – path to the directory in which to save the plot.
close (bool) – if set, automatically closes the window that displays the plot.
legend (bool) – if set, displays the plot legend.
node_size (int) – size of the nodes (circles) in the plot.
linewidths (int) – thickness of the edges in the plot.
subgraph (Set[str]) – nodes that were added by replication; they are colored green.
- Returns:
the figure and the axes used.
- Return type:
Tuple[plt.Figure, plt.Axes]
- wfcommons.wfchef.utils.string_hash(obj: Hashable) str
- wfcommons.wfchef.utils.type_hash(_type: str, parent_types: Iterable[str]) str
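The reference names the hashing helpers and says `annotate` stores top-down and bottom-up type hashes, but does not define them. The sketch below shows one plausible scheme, stated as an assumption: a node's hash combines its task type with the sorted hashes of its parents (top-down) or children (bottom-up), so structurally equivalent nodes end up with equal hashes. `stable_hash`, `node_type_hash`, and `layered_hashes` are hypothetical names, and SHA-256 is an arbitrary choice of digest.

```python
import hashlib
from graphlib import TopologicalSorter
from typing import Dict, Hashable, Iterable, Set


def stable_hash(obj: Hashable) -> str:
    """Deterministic digest of str(obj); the built-in hash() of str is salted per process."""
    return hashlib.sha256(str(obj).encode("utf-8")).hexdigest()


def node_type_hash(task_type: str, neighbor_hashes: Iterable[str]) -> str:
    """Combine a node's type with its neighbors' hashes; sorting makes order irrelevant."""
    return stable_hash((task_type, tuple(sorted(neighbor_hashes))))


def layered_hashes(deps: Dict[str, Set[str]], types: Dict[str, str]) -> Dict[str, str]:
    """Hash every node after all of its `deps` have been hashed.

    Pass node -> parents for a top-down hash, or node -> children for a bottom-up hash.
    The graph must be acyclic (TopologicalSorter raises CycleError otherwise).
    """
    hashes: Dict[str, str] = {}
    for node in TopologicalSorter(deps).static_order():
        hashes[node] = node_type_hash(types[node], (hashes[d] for d in deps.get(node, ())))
    return hashes
```

In a fork-join graph `a -> {b1, b2} -> c` where `b1` and `b2` share a type, the two middle tasks receive identical top-down and bottom-up hashes; equal hashes like these are what make tasks interchangeable candidates for replication.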
wfcommons.wfchef.skeletons.recipe
- class wfcommons.wfchef.skeletons.recipe.SkeletonRecipe(data_footprint: int | None = 0, num_tasks: int | None = 3, exclude_graphs: Set[str] = {}, runtime_factor: float | None = 1.0, input_file_size_factor: float | None = 1.0, output_file_size_factor: float | None = 1.0, logger: Logger | None = None, base_method: BaseMethod = BaseMethod.ERROR_TABLE, **kwargs)
Bases:
WfChefWorkflowRecipe
A skeleton workflow recipe class for creating synthetic workflow instances.
- Parameters:
data_footprint (int) – The upper bound for the workflow’s total data footprint (in bytes).
num_tasks (int) – The upper bound for the total number of tasks in the workflow.
exclude_graphs (Set)
runtime_factor (float) – The factor by which task runtimes will be increased/decreased.
input_file_size_factor (float) – The factor by which task input file sizes will be increased/decreased.
output_file_size_factor (float) – The factor by which task output file sizes will be increased/decreased.
logger (Logger) – The logger used to log information, warnings, or errors (optional).