In the previous post I described how we cut the time of extracting projects data from SAP. In this post I will show the new improved workflow coded in Integration Tag Language where workflows are coded in schedules. The first part of the project's schedule looks like this:
This is just initialisation of constants and logic needed for the execution of the schedule. The second part:
Extracts the PROJ table from SAP and creates an iterator with all projects. The last of these jobs ‘generate_driver’ creates a PHP array called driver0 which is the iterator used to drive the BAPI jobs. The first set of BAPI jobs are the quick ones that runs on less than 10 minutes:
These jobs run in parallel, you only insert a parallel=’yes’ directive on the job statement. The iteratorn from the generate_driver job is declared by the <mydriver> tag and iterates the BAPI once for each project in the array0 iterator. The id of the project is transferred via the @PROJECT_DEFINITION variable, as you can see inside the <rfc> section. In the <sql> section the wanted tables are declared, default are all tables, BAPIs often creates a lot of ‘bogus’ tables so we explicitly state which tables we like to import to the Data Warehouse. The next job is a bit more complex it is the long running BAPI_PROJECTS_GETINFO job:
In part one we decided to distribute the workload over 9 workers, by doing so we need to truncate the tables upfront since we do not want our 9 workers to do 9 table truncations. First we create a dummy job as a container for our jobs, which we declare parallel=’yes’ so the job run in parallel with the preceding jobs. Inside the dummy job there is a table truncate job and subsequently the BAPI extraction job. Here the iterator array0 is defined with the <forevery> tag, the iterator is split up in 9 chunks which all will be executed in parallel. The rows in each chunk are transferred as before by the <mydriver> iterator which is given a chunk by the piggyback declaration. If you study this job carefully you will see there are some very complex processing going on, if you want a more detailed description I have written a series of posts on parallel execution. I am very happy about the piggyback coupling of the iterators, by joining the iterators a complex workflow is described both succinct and eloquent.
The 5th and last part of the schedule shows a job similar to the one just described, this time we only need to run the BAPI job in two workers:
If you take the time and study this ITL workflow you will find there are some advanced parallel processing in there reducing the run time from about one and an half hour to less than 10 minutes. But so far we have used brute force to decrease the run time, by applying some amount of cleverness we can reduce the time even further and make it more stable. I hope to do this another weekend and if I do I write a post about that too.