Workflow steps can always be run sequentially, one step after another; we run steps in parallel to reduce wall clock time. Parallel execution of workflow steps makes the workflow much more complex and should be avoided.
So far in my series on parallel execution of workflows I have described parallel execution of entire workflows and relatively simple parallel dependencies between workflow steps in my job scheduler AWAP (Advanced Workflow Administrations Processor), where workflows are called schedules and steps are called jobs.
Before you read this post you should enjoy the previous post. In this post I will extend parallel execution a little by introducing nested jobs, which allow strictly sequential execution within jobs that run in parallel, and submit schedule, which starts other workflows synchronously or asynchronously.
You can achieve the correct result with purely sequential execution or by setting up prerequisites for individual jobs, but it is much easier (and cleaner) to nest jobs. This is the representation of Workflow 3 in my job scheduler:
If you compare this schedule with example 1 in part 1, you may notice I stripped away some default directives like parallel='no' on jobs C, D and E. So within job A, jobs C and D are executed sequentially, and job A is not done until C and D have finished.
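To make the nesting semantics concrete, here is a minimal Python sketch. It is not AWAP code and does not use the AWAP schedule syntax; the function names and log strings are made up for illustration. It only mimics the rules described above: A and B run in parallel, C and D run in order inside A, A ends after D, and E waits for both A and B.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_schedule():
    """Run the example schedule and return the ordered event log."""
    log = []
    lock = threading.Lock()

    def record(event):
        with lock:                      # A and B write to the log concurrently
            log.append(event)

    def leaf(name):
        record(f"start {name}")
        record(f"end {name}")

    def job_a():
        record("start A")
        leaf("C")                       # nested jobs run strictly in order
        leaf("D")
        record("end A")                 # A is done only after C and D

    def job_b():
        leaf("B")

    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(job_a), pool.submit(job_b)]
        for future in futures:
            future.result()             # wait for both; re-raise any failure
    leaf("E")                           # E starts after A and B have finished
    return log


if __name__ == "__main__":
    print("\n".join(run_schedule()))
```

Where B's events land relative to A, C and D varies from run to run, but C always ends before D starts, and E always comes last.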
This is a graphical representation of the schedule:
This picture was created by an ad hoc PHP program I wrote for fun; it generates dot code for Graphviz. I had planned to publish the code here as an example of bad PHP code, but when I looked at it, it was some 500 lines, and that is a little too much.
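For readers who wonder what generating dot code involves, here is a tiny sketch in Python. It is not the 500-line PHP program, and the function name, the data layout, and the schedule name "workflow3" are assumptions for illustration. Nested jobs are drawn as a Graphviz cluster, and each dependency becomes an edge.

```python
def schedule_to_dot(name, nested, edges):
    """Return DOT source for a schedule.

    nested: dict mapping a parent job to its child jobs (hypothetical layout).
    edges:  list of (predecessor, successor) pairs between jobs.
    """
    lines = [f'digraph "{name}" {{']
    for parent, children in nested.items():
        # Graphviz draws a box around subgraphs whose name starts with "cluster"
        lines.append(f'  subgraph "cluster_{parent}" {{')
        lines.append(f'    label="{parent}";')
        for child in children:
            lines.append(f'    "{child}";')
        lines.append("  }")
    for src, dst in edges:
        lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # A contains C and D; E waits for D (the end of A) and for B.
    print(schedule_to_dot("workflow3",
                          {"A": ["C", "D"]},
                          [("C", "D"), ("D", "E"), ("B", "E")]))
```

Saving the output to a file and running Graphviz's `dot -Tpng` on it renders the picture.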
This is the log extract from running this schedule:
If you look at the log and follow the execution of the jobs, you see that jobs A and B start in parallel, then C and D execute sequentially, and after D has finished, A finishes. When B is finished, the external schedule examplePP1 is submitted for execution and job E can start, since its predecessors A and B have successfully finished.
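The difference between submitting a schedule synchronously and asynchronously can be sketched like this in Python. This is an illustration, not how AWAP implements it: it assumes that "synchronous" means the submitter waits for the submitted schedule to finish, and that "asynchronous" means it carries on at once. The names `submit_schedule` and `demo` are made up, and the event exists only to make the demo's ordering repeatable.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def submit_schedule(pool, schedule, synchronous):
    """Submit `schedule` (a callable) to `pool`.

    synchronous=True:  block until the schedule has finished.
    synchronous=False: return immediately; the schedule runs on its own.
    """
    future = pool.submit(schedule)
    if synchronous:
        future.result()
    return future


def demo(synchronous):
    log = []
    release = threading.Event()         # demo-only: controls when examplePP1 ends

    def example_pp1():
        if not synchronous:
            release.wait()              # pretend the external schedule is slow
        log.append("examplePP1 finished")

    with ThreadPoolExecutor(max_workers=1) as pool:
        log.append("B finished")
        future = submit_schedule(pool, example_pp1, synchronous)
        log.append("E started")
        release.set()
        future.result()                 # make sure the schedule is done before returning
    return log


if __name__ == "__main__":
    print(demo(synchronous=True))
    print(demo(synchronous=False))
```

In the synchronous run, E starts only after examplePP1 has finished; in the asynchronous run, E starts while examplePP1 is still running.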
By combining the parallel techniques in these ‘parallel’ posts, you can create pretty much any type of parallel workflow pattern imaginable. But why then should parallel (or complex) workflows be avoided, as stated above? If you have ever tried to mend a crashed complex JCL mainframe workflow at 03:30 in the morning, or a complex SAP workflow at a standstill, you would not ask the question. But real life is complex and sometimes those complexities cannot be avoided, and then it’s nice to express those complex execution dependencies in simple yet powerful rules.
In the next exciting post on parallel processing of workflows, I will introduce iterators, which allow me to cut up a job that is too big into right-sized pieces and process those pieces in parallel. Expect map and reduce, and event-driven parallel execution with queue handling.