2012-08-20

PHP, MySQL, Iterators, Jobs ,Templates and Relation sets.

There are some simple integer ranges or sets that are important in programming; especially when databases are involved:
  1. Zero
  2. Zero or more
  3. One or Zero
  4. One
  5. One or more
Before you start code you should establish relations between entities. E.g. in an web trade system you should define the relations between Products and Sales orders. Normally a customer must add at least one Product to a Sales order, a Sales order contains one or more  products. A product on the other hand may or may not have Sales orders, a product can appear on zero or more  Sales orders. We can describe the relation between products and Sales orders with two of my aforementioned sets. Of course there are more in this relation than our two  ‘relation-sets’, but it is a very good start and it is essential to sort out these set relations before you start modeling your database and logic in your system.
In my job scheduling system  an   iterator  is a table (PHP array) that determines how many times a job should execute. I found it very convenient to validate an iterator against relation set definitions, i.e.  the iterator’s rows matching a relation set. For this my iterators have a ‘rowrule’, e.g. ‘rowrule=1+’ means this iterator must have one or more rows otherwise the iterator is invalid. (Since iterators are used to traverse objects, they describes a relation in some way, I think all iterators should implement  rowrules, they are very handy.)  In this hypothetical example; If the <forevery>  iterator fails the rowrule test, the job fails.
  Here you see how the rowrule is implemented in my PHP code:
As you see apart from the relation set tests, it is also possible to specify a boolean expression which is useful but it looks awful to specify rowrule=’(“$day” ==”Monday”)’ , since this validation has nothing to do with number of rows. I have thought about implement a ‘ onlyif ’ alias to rowrule to make this look prettier. ( Onlyif  is the general versioning condition in my job scheduler.)
Note the very important GOTO statement at the end. If an iterator fails validation subsequent iterator definitions are tried until one pass the test or we run out of iterators. This opens up a varity of posibilities to implement logical conditions in jobs. If we take the first example and add an alternative <forevery> iterator we come up with an even more hypothetical example, if the first SQL iterator returns zero rows, we proceed to the next iterator which produce one row, so we run the job once with the query(s) found in ‘report_queries/NoQueries.SQL’.
Now we look at a real example, we have a mysql database table with report names and schedule event (when these reports should run like ‘DAY’, ‘MONTH’ etc.). The queries are stored in a query library with the same name as the reportname in the database table.
We like to schedule a job that run all ACTIVE=’YES’ reports for a given EVENT, but we also
like to run a single report (this is if a report fails and we must rerun the report, of course this never happens but just in case…).The result of each report should be logged.
Standard logging granularity is job level,  so we need to run each report in a separate job. To do this we must create a job per report on the fly,  for that we use  a job template . The purpose of <template>  is to generate jobs from a template with the help of a template iterator.
There is a lot under the hood when processing this schedule. There are two variants (run time parameters) both have a default value, and a validate list which the given parameter is tested against. The REPORTNAME  variant validation list is created by a SQL query! After the variants are validated, the template iterators are created and validated according to the rowrule and the right iterator is selected; now the template job definition is multiplied accordingly. Executing this schedule with event=’TEST’ gives the following expanded schedule:
As you see we found two TEST report queries CAMILLA_TEST_REPORT and CAMILLA_TEST_REPORT2.  All template definitions are replaced by comments telling the boundaries of the template. This expanded schedule is now ready for XML parsing :
This is the ‘syntax tree’ of the parsed schedule. The jobs (defined in the nested  ‘job’ array), are executed one by one and the SQL queries defined by the ‘file’ entry are submitted to MySQL and the result is processed. At the end the result of the execution is logged in a MySQL database tables.
And the log entries:
If you have read my post about my ideas about return codes  you know ‘1’ is success. If you follow the execution of this schedule you see there is quite some logic and processing packed in this tiny XML script (here is  the uncluttered version).
Now I will try to use the same technique on a more advanced application. A report application that sends Excel reports as email attachments, for this I need to build a more complex database with reports, recipients and scheduled events.  If I succeed I post about it.

No comments:

Post a Comment