I am restructuring some long-running Data Warehouse extraction workflows. In total these workflows take some 4 hours today, which is ridiculous. One part of the restructuring is to use modern Integration Tag Language idioms so newcomers can understand the workflows; the present archaic syntax is a bit tricky. I have rewritten the second part in much the same way as the first.
So far I have cut execution time from 4 hours down to 30 minutes, achieved by parallelizing the jobs running the BAPIs. I have rewritten the last parts of the workflow much the same way I rewrote the first part.
The result is good, but not good enough. The runtime is still dependent on the number of objects defined in SAP; in a few years, when the number of projects has doubled, so will the runtime. I strongly advocate full load over delta load, since full load is much simpler to set up and is self-healing if something goes wrong. But this is a case where full load is not performant enough: 30 minutes and growing by the day. I will rewrite these workflows from full load into a hybrid delta load, where I aim at a more stable runtime below 10 minutes.
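To make the idea concrete, here is a minimal sketch of the decision behind a hybrid delta load, written in Python rather than Integration Tag Language; the change-date column, the weekly full-load day, and every identifier below are my assumptions for illustration, not the actual design:

```python
from datetime import date

def select_networks(cursor, last_run):
    """Hybrid delta load sketch (hypothetical names throughout):
    run a simple, self-healing full load once a week, and a cheap
    delta load of recently changed networks on all other days."""
    full_load = last_run is None or date.today().weekday() == 6  # Sunday
    if full_load:
        cursor.execute("SELECT network_id FROM NWDRVR")
    else:
        cursor.execute(
            "SELECT network_id FROM NWDRVR WHERE changed_at >= %s",
            (last_run,))
    return [row[0] for row in cursor.fetchall()]
```

The point is that the per-run volume stays roughly constant as the number of projects grows, while the weekly full load keeps the self-healing property.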
One job in the last rewrite is of interest. SAP information BAPIs are structured with one BAPI giving a list of keys to all objects, and then you have an army of BAPIs giving detailed information about individual objects. BAPI_NETWORK_GETINFO is a bit different: it takes an array of network identities and responds with detailed info on all those objects in one go. Here the € list operator comes to the rescue: it takes a PHP array and reformats it into an RFC import array.
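Outside the workflow engine, the same batched call can be sketched in Python with the pyrfc library; note that the import table name I_NETWORK_LIST and its NETWORK field are my assumptions about the BAPI signature, so verify them in your system before relying on this:

```python
from pyrfc import Connection  # SAP NetWeaver RFC SDK bindings

# Connection details are placeholders for the SAP system.
conn = Connection(ashost="sap-host", sysnr="00", client="100",
                  user="rfc_user", passwd="secret")

# Keys collected from the list BAPI; here just a hard-coded sample.
network_ids = ["4000001", "4000002", "4000003"]

# One call returns detail info for all networks in one go.
# The import table name and field are assumptions - check the BAPI in your system.
result = conn.call("BAPI_NETWORK_GETINFO",
                   I_NETWORK_LIST=[{"NETWORK": n} for n in network_ids])
```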
The BAPI_NETWORK_GETINFO is run once for all networks in sequence (a rough sketch of this flow follows the list):
- The <forevery> job iterator creates the iterator from the NWDRVR MySQL table.
- Then it runs the BAPI_NETWORK_GETINFO BAPI for each row in the SQL result table, one by one. (Addressed by @J_DIR/driver1)
- Lastly, it stores all results in the corresponding MySQL tables.
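For readers without Integration Tag Language, the sequential job corresponds roughly to this Python sketch; apart from NWDRVR and the BAPI name, all table, column, parameter, and connection names are placeholders I made up:

```python
import mysql.connector
from pyrfc import Connection

def run_sequential(sap, db):
    """Sequential variant: one worker walks the whole driver table.
    Everything except NWDRVR and the BAPI name is a made-up placeholder."""
    cur = db.cursor()
    cur.execute("SELECT network_id FROM NWDRVR")      # build the iterator
    for (network_id,) in cur.fetchall():              # one row at a time
        info = sap.call("BAPI_NETWORK_GETINFO",       # import table name assumed
                        I_NETWORK_LIST=[{"NETWORK": network_id}])
        for row in info.get("E_ACTIVITY", []):        # result table name assumed
            cur.execute("INSERT INTO network_activity (network, activity) "
                        "VALUES (%s, %s)", (network_id, row.get("ACTIVITY")))
    db.commit()                                       # store everything in MySQL

# Usage: placeholder credentials for both connections.
sap = Connection(ashost="sap-host", sysnr="00", client="100",
                 user="rfc_user", passwd="secret")
db = mysql.connector.connect(host="db-host", database="dw",
                             user="etl", password="secret")
run_sequential(sap, db)
```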
If the list of network objects is large enough, you have to split the array into chunks and execute them separately to overcome SAP session limits and performance problems. We have some 9000 projects, and that is too many in our environment to execute in one go.
A small rewrite of the job will split the SQL result into 8 chunks, distribute them over separate workers, and execute them in parallel:
Here BAPI_NETWORK_GETINFO is run in 8 parallel workers (a sketch of the parallel variant follows the list):
- The <forevery> iterator splits the SQL result into 8 chunks; each chunk is executed by a separate worker in parallel.
- Each worker then runs the BAPI_NETWORK_GETINFO BAPI for each row in its own part of the SQL result table, one by one. (Addressed by @R_DIR/driver1)
- Lastly, each worker stores all results in the corresponding MySQL tables.
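The same split-and-parallelize idea, sketched in Python with a thread pool instead of the actual <forevery> workers; the chunk layout, connection details, and the BAPI import table name are my assumptions, and each worker opens its own RFC and MySQL connection:

```python
from concurrent.futures import ThreadPoolExecutor
import mysql.connector
from pyrfc import Connection

WORKERS = 8

def chunks(ids, n):
    """Split the driver rows into n roughly equal (interleaved) chunks."""
    return [ids[i::n] for i in range(n)]

def worker(network_ids):
    """One worker: its own connections, processing only its own chunk.
    Credentials, table and parameter names are placeholders."""
    sap = Connection(ashost="sap-host", sysnr="00", client="100",
                     user="rfc_user", passwd="secret")
    db = mysql.connector.connect(host="db-host", database="dw",
                                 user="etl", password="secret")
    cur = db.cursor()
    for network_id in network_ids:
        info = sap.call("BAPI_NETWORK_GETINFO",       # import table name assumed
                        I_NETWORK_LIST=[{"NETWORK": network_id}])
        for row in info.get("E_ACTIVITY", []):        # result table name assumed
            cur.execute("INSERT INTO network_activity (network, activity) "
                        "VALUES (%s, %s)", (network_id, row.get("ACTIVITY")))
    db.commit()

def run_parallel(all_network_ids):
    """Distribute the 8 chunks over 8 parallel workers."""
    with ThreadPoolExecutor(max_workers=WORKERS) as pool:
        list(pool.map(worker, chunks(all_network_ids, WORKERS)))
```

With some 9000 projects, each worker ends up handling roughly 1100 networks, which is where the factor-of-8 cut in runtime comes from.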
With this slight modification, the runtime for the job is cut by a factor of 8. This is really parallel programming made easy. Compare this with the visual workflow programming so popular today; I think you will agree this is easier to set up.