2012-08-23

Using MySQL SET to squeeze out material reservations from SAP with RFC_READ_TABLE.

The RESB table in SAP contains order-dependent requirements of materials. The key is the reservation number (RSNUM). Over time this table grows very large, and doing a full table scan for active reservations is not a good idea. The reservation number is incremented for each new reservation, so if you want to fetch reservations created after reservation 10, you only need to fetch reservations >10. When a reservation is ‘allocated’ it will never go active again, so if you are looking for active reservations you can safely start from the earliest active reservation.
I used these facts to set up a performant extraction routine. We have a MySQL table containing the active RESB reservations.
0021006667 is the lowest active reservation, so I do not need to inspect lower RSNUMs. This is a good start, since it lets me bypass more than 21 million rows. But I still have more than 2 million (23053466 − 21006667) RSNUMs to look at. I do not want to inspect these 2+ million rows in one go; I want to chop the range into smaller intervals and process them in parallel. For this I created MySQL statements that extract the min and max RSNUM from the MySQL RESB table and chop this interval into smaller chunks, and then run all these intervals in parallel against SAP with RFC_READ_TABLE, scanning for active reservations and reloading the MySQL table with the results.
The first job above, ‘getHiLow’, retrieves the min and max RSNUM and calculates how many intervals are needed to cover the SAP RESB table from the low RSNUM. The second job, ‘createDriver’, creates a result table with the low and high RSNUM for each interval. Note the MySQL LIMIT option in the last select, which stops the cross join from going into an infinite loop. (These SQL statements are somewhat complex and could be more elegantly expressed in a stored procedure. But when I created these jobs I did not really understand the problem or how to solve it, so I started to fiddle around with MySQL SET until I had something that worked well together with RFC_READ_TABLE, and this is the result.)
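The chunking itself is simple arithmetic. Here is a minimal Python sketch of what getHiLow and createDriver compute together (the chunk size of 100000 is an illustrative assumption on my part; the real jobs do this with MySQL SET variables and a cross join):

```python
def create_driver(low, high, chunk=100000):
    """Split the RSNUM range [low, high] into intervals of at most
    `chunk` reservations, like the createDriver job does in SQL."""
    intervals = []
    start = low
    while start <= high:
        end = min(start + chunk - 1, high)
        intervals.append((start, end))
        start = end + 1
    return intervals

# Roughly the range from this post: lowest active RSNUM to the current max.
intervals = create_driver(21006667, 23053466)
```

Each (low, high) pair then becomes one row in the driver table and one parallel call against SAP.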
The next task is to feed these intervals into RFC_READ_TABLE.
The job ‘loadRESB’ takes the result table (driver0) from the createDriver job, runs each interval in parallel, extracts the requested demands and inserts the result rows into the MySQL RESB table. If you look at the selection (the OPTIONS array) you see what I mean by active reservations. (The only appropriate index I have for this query is RSNUM. If I had a suitable index, this whole exercise would be unnecessary.)
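For illustration, here is roughly how the selection for one interval could be built. RFC_READ_TABLE takes its WHERE clause as an OPTIONS table of short text lines, and RSNUM is a 10-character NUMC field, hence the zero padding. The XLOEK condition standing in for "active" is my assumption for the sketch; the post does not reproduce the real OPTIONS array:

```python
def build_options(low, high):
    """Build the OPTIONS rows (a WHERE clause split into text lines)
    for one RSNUM interval. RSNUM is NUMC(10), hence the zero padding.
    The XLOEK (deletion indicator) test is an illustrative guess at
    what 'active' means; the real job uses more conditions."""
    return [
        {"TEXT": "RSNUM GE '%010d'" % low},
        {"TEXT": "AND RSNUM LE '%010d'" % high},
        {"TEXT": "AND XLOEK EQ ' '"},  # reservation not deleted (assumed)
    ]

opts = build_options(21006667, 21106666)
# With a live SAP connection (e.g. via the pyrfc library) the call would
# look something like:
# result = conn.call('RFC_READ_TABLE', QUERY_TABLE='RESB', DELIMITER='|',
#                    OPTIONS=opts,
#                    FIELDS=[{'FIELDNAME': 'RSNUM'}, {'FIELDNAME': 'MATNR'}])
```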
The job takes about 50 minutes to run, which is too much. If we look at the log we see execution 58869 ended with a bad (0, zero, boolean false) result. This was due to a timeout condition for one of the intervals. (Tasks in SAP have an execution time limit.) To make this job go faster we can shrink the intervals, run more of them in parallel, and/or analyze the problem and solve it.
Anyway, 50 minutes is too long; something should be done to make this go faster. If you have a more performant way to extract these material requirements, please drop me a mail or add a comment.
UPDATE - A closer inspection of the log revealed the typical duration is below nine minutes, which is good enough. The extreme execution times probably indicate database locking problems.
The jobs above are executed by my PHP job scheduler.

2012-08-20

PHP, MySQL, Iterators, Jobs, Templates and Relation sets.

There are some simple integer ranges or sets that are important in programming; especially when databases are involved:
  1. Zero
  2. Zero or more
  3. One or Zero
  4. One
  5. One or more
Before you start coding you should establish the relations between entities. E.g. in a web trade system you should define the relations between Products and Sales orders. Normally a customer must add at least one Product to a Sales order: a Sales order contains one or more Products. A Product, on the other hand, may or may not have Sales orders: a Product can appear on zero or more Sales orders. We can describe the relation between Products and Sales orders with two of my aforementioned sets. Of course there is more to this relation than our two ‘relation sets’, but it is a very good start, and it is essential to sort out these set relations before you start modeling the database and logic in your system.
In my job scheduling system an iterator is a table (PHP array) that determines how many times a job should execute. I found it very convenient to validate an iterator against relation set definitions, i.e. check that the iterator’s rows match a relation set. For this my iterators have a ‘rowrule’, e.g. rowrule=‘1+’ means this iterator must have one or more rows, otherwise the iterator is invalid. (Since iterators are used to traverse objects, they describe a relation in some way; I think all iterators should implement rowrules, they are very handy.) In this hypothetical example, if the <forevery> iterator fails the rowrule test, the job fails.
Here you see how the rowrule is implemented in my PHP code:
As you see, apart from the relation set tests it is also possible to specify a boolean expression, which is useful, but it looks awful to specify rowrule=’(“$day” == ”Monday”)’, since this validation has nothing to do with the number of rows. I have thought about implementing an ‘onlyif’ alias for rowrule to make this look prettier. (Onlyif is the general versioning condition in my job scheduler.)
Note the very important GOTO statement at the end. If an iterator fails validation, subsequent iterator definitions are tried until one passes the test or we run out of iterators. This opens up a variety of possibilities for implementing logical conditions in jobs. If we take the first example and add an alternative <forevery> iterator, we come up with an even more hypothetical example: if the first SQL iterator returns zero rows, we proceed to the next iterator, which produces one row, so we run the job once with the query(s) found in ‘report_queries/NoQueries.SQL’.
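The rowrule check and the iterator fall-through can be sketched in a few lines of Python. Note that apart from ‘1+’, the rule tokens and the function names here are my own invention for illustration, not the scheduler’s actual syntax:

```python
def rowrule_ok(rowrule, rows):
    """Check an iterator's row count against a relation-set rowrule.
    The rules mirror the five sets above: zero, zero or more,
    one or zero, one, one or more."""
    n = len(rows)
    rules = {
        "0":  n == 0,
        "0+": True,       # zero or more: always valid
        "01": n <= 1,
        "1":  n == 1,
        "1+": n >= 1,
    }
    return rules[rowrule]

def pick_iterator(iterators):
    """Try iterator definitions in order until one passes its rowrule,
    like the GOTO fallback: run out of iterators and the job fails."""
    for rowrule, rows in iterators:
        if rowrule_ok(rowrule, rows):
            return rows
    return None

# The first iterator returns zero rows but requires one or more,
# so the fallback single-row iterator is chosen.
chosen = pick_iterator([
    ("1+", []),
    ("1+", [{"file": "report_queries/NoQueries.SQL"}]),
])
```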
Now we look at a real example. We have a MySQL database table with report names and schedule events (when these reports should run, like ‘DAY’, ‘MONTH’, etc.). The queries are stored in a query library under the same name as the report name in the database table.
We would like to schedule a job that runs all ACTIVE=’YES’ reports for a given EVENT, but we also want to be able to run a single report (in case a report fails and we must rerun it; of course this never happens, but just in case…). The result of each report should be logged.
Standard logging granularity is the job level, so we need to run each report in a separate job. To do this we must create a job per report on the fly, and for that we use a job template. The purpose of <template> is to generate jobs from a template with the help of a template iterator.
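A rough Python sketch of the template idea, assuming the template is multiplied once per iterator row with placeholder substitution. The placeholder syntax and the job XML here are made up for illustration; only the report names come from this example:

```python
import string

def expand_template(template, rows):
    """Multiply a job template once per iterator row, substituting
    ${...} placeholders with the row's column values."""
    return [string.Template(template).substitute(row) for row in rows]

# Hypothetical iterator rows, as if selected from the report table:
rows = [{"REPORTNAME": "CAMILLA_TEST_REPORT"},
        {"REPORTNAME": "CAMILLA_TEST_REPORT2"}]
jobs = expand_template(
    '<job name="${REPORTNAME}"><sql file="${REPORTNAME}.SQL"/></job>', rows)
```

Two iterator rows yield two concrete job definitions, which is what "the template job definition is multiplied" means in practice.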
There is a lot going on under the hood when this schedule is processed. There are two variants (run-time parameters); both have a default value and a validation list that the given parameter is tested against. The REPORTNAME variant’s validation list is created by an SQL query! After the variants are validated, the template iterators are created and validated according to their rowrules, the right iterator is selected, and the template job definition is multiplied accordingly. Executing this schedule with event=’TEST’ gives the following expanded schedule:
As you see, we found two TEST report queries, CAMILLA_TEST_REPORT and CAMILLA_TEST_REPORT2. All template definitions are replaced by comments marking the boundaries of the template. This expanded schedule is now ready for XML parsing:
This is the ‘syntax tree’ of the parsed schedule. The jobs (defined in the nested ‘job’ array) are executed one by one, the SQL queries defined by the ‘file’ entry are submitted to MySQL and the results are processed. At the end, the result of the execution is logged in MySQL database tables.
And the log entries:
If you have read my post about my ideas on return codes, you know ‘1’ means success. If you follow the execution of this schedule you see there is quite some logic and processing packed into this tiny XML script (here is the uncluttered version).
Now I will try to use the same technique in a more advanced application: a report application that sends Excel reports as email attachments. For this I need to build a more complex database with reports, recipients and scheduled events. If I succeed I will post about it.

2012-08-05

Amazon got recommendations for me

A few weeks ago I happened to hear a cool recording of Baby Please Don’t Go by Muddy Waters and The Rolling Stones: http://www.youtube.com/watch?v=Mbao_laqF8E

Inspired by the music, I spent some hours listening to these guys and other favourites like Lightnin’ Hopkins and Champion Jack Dupree, mostly via YouTube.

Then this week I got recommendations from Amazon:

Wow, CJD isn’t the most popular artist these days; Amazon must have found my interest in CJD via my looking for him on YouTube. (I’m pretty sure I have not searched for CJD on Amazon. I also wrote a mail where I mention CJD, but Amazon hopefully does not scan my mail; that would be a step too far :-)

Anyway, this recommendation probably comes from some Business Intelligence activity. Of course I was curious how Amazon found out about my interest in CJD, so I sent a query to Amazon:

08/01/12 22:19:46

Your Name: Lars Johansson

Comments: I'm curious about your recommendation to buy CDs by Champion Jack Dupree. I would appreciate an explanation of the process resulting in you recommending me to buy Champion Jack Dupree CDs.

And I got this response:

Hello,

I'm sorry to hear you're unhappy with some of the products that were recommended for you on Amazon. To prevent certain purchases from affecting your recommendations, please follow the instructions here:

http://www.amazon.com/gp/help/customer/display.html?nodeId=13316081#improving

To prevent certain items you've viewed recently from affecting your recommendations, please follow the instructions here:

http://www.amazon.com/gp/help/customer/display.html?nodeId=13316081#yourbrowsing

If you don't want to see our recommendations, you can log out of your Amazon.com account by moving your cursor to the "Your Account" menu at the top of the home page and selecting "Sign Out" from the expanded list of options. On the next page, leave the e-mail and password spaces blank and click the Amazon.com logo. Your recommendations won't appear on the website until the next time you sign into your Amazon.com account.

I hope this helps! We look forward to seeing you again soon.

Thank you for your inquiry. Did I solve your problem?

If yes, please click here:

http://www.amazon.com/gp/help/survey?p=A3BEERLL87AJC2&k=hy

If no, please click here:

http://www.amazon.com/gp/help/survey?p=A3BEERLL87AJC2&k=hn


But I was not unhappy about the recommendation; I was curious how they found my interest in CJD. So I once again asked Amazon how they did it. I’m still waiting for Amazon’s reply.


Working with Business Intelligence, I have analyzed our customers’ behavior to come up with Sales Leads. I’m not using a sophisticated web scanner; I only try to analyze existing customers’ purchases, e.g. has the customer bought spare parts but no service? When did they buy our products? Have they bought the new model? Etc. But we have plans to be more active and come up with better Sales Leads and targeted advertisements using our sales and marketing statistics.

Many years ago I worked as a DBA, administering the sales database of a TV shop. I got complaints from the TV shop that the database was down early Sunday mornings. I took the database down for housekeeping (reorgs etc.) at 04.00 Sunday morning, and it was down for an hour. To my surprise I was told there were many (lonely?) drunkards buying like crazy the night between Saturday and Sunday, and this was a respectable TV shop selling normal TV shop merchandise. So I had to reschedule the jobs to the following night, between Sunday and Monday, which was a low-activity night.

P.s.

If Amazon’s recommendation had come during the night while I was listening to CJD, I would probably have bought some music. Business Intelligence is very much a question of speed. The analysts need the reports now, not tomorrow.