2014-06-22

Xeno intellect and computer intelligence

After college I started to work and took some evening classes in maths at the university. I had a shift job and couldn't sleep during daytime; I was a sleep-deprived zombie. To do math you actually have to think, and to think you need sleep, so I turned to something simpler: computer science. After some boring classes about Hollerith code, the speed of drum memories etc. we were given a short introduction to Basic and access to computer terminals. The first exercise we were given was to ring the bell of the terminal and make it play 'Twinkle, Twinkle, Little Star'. I decided to do something else: write a program that could play the card game blackjack. I quit the computer science program, developed the blackjack program and became a programmer. The program became quite good at playing blackjack, but I never gave it credit for being intelligent.


If you read Alan Turing's writings from the early fifties, you will notice he was almost obsessed with computer-based artificial intelligence. He even devised a computer 'intelligence test': if a human communicating with a computer (program) cannot decide whether it is a computer or another human he/she communicates with, then that computer (program) holds true intelligence. To my knowledge no computer has been capable of that yet. Defeating humans in chess is child's play compared with conversing like a human. There are ambitious attempts to create humanoids, like IPsoft's self-learning Amelia, that may one day chitchat like humans. During the mid eighties I developed a program which scanned logs in an IBM mainframe and was able to interact with the operator console. It saved possible 'action items' in a database, which then could be associated with automatic operator interventions. I used this semi-self-learning pattern in some applications. A typical case: validation of transactions. Instead of developing a framework of rules, I intercepted all transactions; a human had to approve a transaction as valid, then all intercepted transactions of that type were approved. This 'validation pattern' can produce very flexible transaction systems, but also very complex programs behind them, e.g. how to handle transactions that are no longer valid. It is much simpler to validate transactions against a coded set of rules. And that is what ERP systems on the market do, and they are picky: transactions must follow the rules 100%, and the same thing goes for master data. E.g. if parts are not defined correctly, anything can happen when you run an MRP process; there is no fine-tuned black box of logic that thinks 'X must be a purchased part even though it is defined as manufactured'.
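Stripped to its bones, the validation pattern might look something like this in modern code (a sketch with invented names, nothing like the original mainframe implementation):

<?php
// Sketch of the 'validation pattern': intercept transactions of unknown
// types, let a human approve one instance, then auto-approve that type.
class TransactionValidator
{
    private $approvedTypes = array();   // types a human has approved
    private $intercepted   = array();   // transactions awaiting manual review

    public function handle(array $trx)
    {
        if (in_array($trx['type'], $this->approvedTypes, true)) {
            return 'processed';         // type already validated by a human
        }
        $this->intercepted[] = $trx;    // park it for manual review
        return 'intercepted';
    }

    public function approveType($type)
    {
        $this->approvedTypes[] = $type; // the human decision becomes a rule
        $released = array();
        foreach ($this->intercepted as $i => $trx) {
            if ($trx['type'] === $type) {
                $released[] = $trx;     // release previously parked trx
                unset($this->intercepted[$i]);
            }
        }
        return $released;
    }
}
?>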


Turing and many other great minds have speculated about what happens if and when computers become smarter than humans. Today the break-even moment when computers become smarter than man, the singularity, is supposed to arrive around 2040. The scenarios vary: they become our obedient servants, or we merge together into a new species, or the smarter computers take over and annihilate man. One question that arises in my mind: is it possible to be smarter than man? To me it seems we can figure out anything; it takes time, but together humans seem able to figure out anything that follows the laws of logic. How can you be smarter than that? Faster than us, yes, but smarter? It will not take long before computers and robots are better at warfare than us. Inventions come first to warfare; we humans seem to be very imaginative and creative when it comes to war, it's probably hard-wired into our genes. But still, warfare is just a more advanced game of chess; it is a long way from launching a missile to understanding quantum mechanics. "I think I can safely say nobody understands quantum mechanics", but one day we will. If it's logical we will; we have actually come a long way on that particular quest.


If I recall this right, Turing did not think computer intellect must mimic the intellect of man. I try to imagine what another type of intellect would look like, a superior intellect that makes us look like a chimpanzee unable to figure out how to put two sticks together to get the banana, but I can't. Man has the ultimate intellect. I think it's very hard to (dis)prove this statement; maybe time will tell. We invented the modern computer more than sixty years ago and we have speculated about computer-based intelligence ever since, but we have not yet produced real intelligence. During the fifties the computer guys thought they were close to achieving intelligence; they only needed a bit faster computers with larger memories. Today we know more: it is more than better hardware we need to create truly intelligent computers.
We will be able to mimic human intelligence and intellect within computers, of that I'm certain, but will that intellect be superior to ours? I doubt it.
If there are other logical types of intellect that are superior to ours, we will figure out how to construct them. Are those intellects still superior to our intellect once we have done so? I doubt it.

2014-06-15

pChart eye candy graphs from the Data Warehouse

Some weeks ago I implemented support for Python in the Data Warehouse for no reason at all. That is not entirely true: I had in mind to introduce graphical support in the Data Warehouse, and for some reason I thought there should be better alternatives with Python. I wanted to implement eye candy, a simple way to produce good-looking static graphs. A problem with graphs is that there are so many parameters you have to set to produce nice-looking graphs, and that is what eye candy is all about. The graphs I had in mind should be practical for the eye rather than the neocortex; for those graphs we use the heavy artillery: Qlikview. While searching for a free, easy-to-use graphics package I found pChart 2.0, written in PHP. You can do all kinds of advanced graphical stuff with pChart, much more than required for my eye candy. pChart is a great software product; the more I work with it, the more I like it.
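To give a feel for the PHP behind these graphs, here is a minimal pChart 2.x bar chart sketch (data, font paths and sizes are illustrative; this is not my pgraph04.php):

<?php
// Minimal pChart 2.x bar chart (illustrative data and paths)
include("class/pData.class.php");
include("class/pDraw.class.php");
include("class/pImage.class.php");

$data = new pData();
$data->addPoints(array(150, 220, 180, 250), "Jobs");          // bar series
$data->setAxisName(0, "Jobs");
$data->addPoints(array("Mar", "Apr", "May", "Jun"), "Month"); // x-axis labels
$data->setAbscissa("Month");

$chart = new pImage(700, 230, $data);
$chart->setFontProperties(array("FontName" => "fonts/verdana.ttf", "FontSize" => 11));
$chart->setGraphArea(60, 40, 670, 190);
$chart->drawScale();
$chart->drawBarChart(array("DisplayValues" => TRUE));         // values on top of bars
$chart->render("dwstat.png");
?>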

I decided to implement a bar chart with optional superimposed line graphs and make it pretty.
I also created a special SQL converter for pChart bar and line graphs, which pre-formats SQL results for pChart and feeds them into a PHP program tailored for pChart eye candy. This is the ITL script I used for my first attempt:
<job name='crtGraph01' type='script' pgm='pgraph04.php' >
   <image name='dwstat.png' lh='1500x345' type='bar01' sdwhight='56'  header='Data Warehouse statistics' footer='Data Warehouse statistics - @MONTH'>
   </image>
   <bardata columns='@J_crtData01/stem0'  axisname='Jobs' series='column.Month'>
     <labels array='@J_crtData/report1'/>
   </bardata>
   <linedata columns='@J_crtData01/stem1'/>
 </job>

pgraph04.php is my PHP script producing the graph. Bar and line data come from a preceding SQL job, producing job and MySQL statistics from the Data Warehouses. The other tags are parameters to pgraph04.php.
And this is the result:

Here you see the graph on twitter. I think the result is great, but the text is too small to read. After some tweaking I came up with this improved graph:


It is still hard to read the text in twitter format, but it is much better now. Then again, these graphs are not intended for such a small format; I had mails and other reports in mind for these eye candy graphs. And this is how the ITL script looks:
<job name='crtGraph01' type='script' pgm='pgraph04.php' >
   <image name='dwstat.png' lh='1500x345' type='bar01' sdwhight='56' fontsize='13' footer='Data Warehouse statistics - @MONTH'>
   </image>
   <bardata columns='@J_crtData01/stem0'  axisname='Jobs' series='column.Month'>
     <labels array='@J_crtData/report1'/>
   </bardata>
   <linedata columns='@J_crtData01/stem1'/>
   <arrowlabel center='1373,52' color='blue' label='@MONTH'/>
 </job>

If you take a close look at the upper right corner, you see the result of the <arrowlabel> tag. If you change the <bardata> tag to <stackedbardata>, the result will look like this:

The format is too wide for this stacked bar graph to look pretty. By changing the length/height ratio to 1300x800:
<job name='crtGraph01' type='script' pgm='pgraph04.php' >
   <image name='dwstat.png' lh='1300x800' type='bar01' sdwhight='56' fontsize='13' footer='Data Warehouse statistics - @MONTH'>
   </image>
   <stackedbardata columns='@J_crtData01/stem0'  axisname='Jobs' series='column.Month' displayvalues='yes'>
     <labels array='@J_crtData/report1'/>
   </stackedbardata>
   <linedata columns='@J_crtData01/stem1'/>
   <arrowlabel center='1175,52' color='blue' label='@MONTH'/>
 </job>

You get a graph like this:
[stacked bar graph: Stacked.png]

This result is at least prettier; you have to play a little with graph type and size depending on your data to get a nice-looking graph.
Now I can produce eye candy with very little effort. I had not done graphics programming before, and it took much time, many more hours than I anticipated, to write the code that produces these graphs. It has been a very tedious trial-and-error process, but I learned a lot, and it's nice to have a simple workflow for creating these graphs.
In the next post I'll present the SQL converter and the PHP script that create the eye candy graphs.

2014-06-06

Why are ETL procedures so complex?

Of course I think my way is better, otherwise I would have walked another way.


Over the last couple of years I have looked into some modern ETL products. One thing that strikes me is the complexity of these packages, or rather, setting up ETL processes is in my view overly complex. These systems have drag-and-drop, point-and-click graphical user interfaces. You build workflows by dragging job steps into a pane and connecting them with arrows, connectors etc.; I think you get the picture. This looks very appealing: with just a few mouse clicks you have created an ETL workflow, no dull programming at all. But this is just an empty skeleton, which you have to dress up with real functionality, often in a clunky scripting language specially crafted for the ETL product. Then you need to define metadata and map source data to the ETL datastore (which most often is a Data Warehouse). This mapping is typically done in stages, gradually refining the data into extended super-duper information cubes. These mapping steps are often a combination of drag and drop, click and insert clunky code, which can be surprisingly complex, and the clunky code is spread all over. SQL, which is still the lingua franca of data stores, is often hidden away from the workflow developers in favor of a clunky scripting language. A human analogy is learning some esoteric dialect of Klingon instead of English. The analogy limps, but I think you get the idea.
SQL is a precise, succinct and mostly logical language (it has its quirks, I know). Replacing SQL with a clunky language and some graphical symbols is not good for anyone, except those who thrive on the ETL tool.


One big problem for enterprises today is analysing the information in their ERP systems. The first hurdle is to grab the information and get it out of the source ERP systems, so it can be massaged and imported into a Data Warehouse. Having a clunky tool to extract data from the source system does not help much. I think those of you who work with this recognize that it takes days if not weeks to get a new simple piece of information from request until it is usable in the Data Warehouse. It's not only the complexity of the ETL tool that is to blame; the entire governance of these business and IT processes seldom supports agile and rapid development.
I'm not pointing fingers at any product or organization; it's more a dissatisfaction with how ETL tools and corporate organisations work in general.


This post emanated from a discussion I had with some attendees of a large international Business Intelligence user meeting, and a case of missing country information in our Data Warehouse.
I needed the country names for a report and to my stupefaction I found this info missing in the Data Warehouse. I do not like hard coding, so I looked around for country info. Via Google I found the info I needed in SAP table T005T. Then I searched for an existing Data Warehouse workflow that could act as a template and found this one:


<?xml version='1.0' encoding='UTF-8' standalone='yes'?>
<schedule notify='admfail.xml' logmsg='SAP Plant/Branch Info'>
  <variant name='sap' validate='acta_prod,acta_test' default='acta_prod'/>
  <tag name='DW_DB' value='LOGISTICSBI'/>
  <job name='PlantInfo' type='script' pgm='sap2.php'>
    <tag name='SAPTABLE' value='T001W'/>
    <sap>
      <rfc>
        <name>Z_LJ_READ_TABLE</name>
        <import>
          ('QUERY_TABLE','@SAPTABLE')
          ,('DELIMITER',';')
        </import>
      </rfc>
    </sap>
    <sql>
      <autoload>replace</autoload><database>@DW_DB</database><truncate>yes</truncate>
      <table>@SAPTABLE</table>
    </sql>
  </job>
</schedule>
This workflow doesn't look sexy, I know, you may even call it clunky, but to me it is just plain simple, and I find simplicity beautiful. I call it ITL, an XML-based language designed for computer backend or batch workflows. Two features of ITL: it is plain text, and a workflow is contained in one script, so it is easy to create new workflows by copying. We seldom create new ITL scripts from scratch; we copy old workflows to create new ones.


A bowl of spaghetti is a bowl of spaghetti, no matter if it is code or graphical connectors.
To be able to develop in ITL, you need a basic understanding of XML and you should be well versed in SQL. ITL per se is small and simple (it has its quirks though). As a workflow control language it is immensely powerful.
To understand the SAP connector used in this example you have to have a good understanding of SAP and SAP Remote Function Calls and how those functions are structured.
To be a successful ITL developer you need programming abilities, and this is very deliberate. Over the years I have found programmers superior at constructing logical and coherent workflows. This should come as no surprise; programmers are trained for that. When I search for Business Intelligence developers I search for business people with programming abilities, and this is probably the prime key factor for the success of the Data Warehouse.


Anyway, I copied and modified the workflow above like this:
<?xml version='1.0' encoding='UTF-8' standalone='yes'?>
<schedule notify='admfail.xml' logmsg='SAP Country Info'>
  <variant name='sap' validate='acta_prod,acta_test' default='acta_prod'/>
  <tag name='DW_DB' value='MASTERDATA'/>
  <job name='CountryInfo' type='script' pgm='sap2.php'>
    <tag name='SAPTABLE' value='T005T'/>
    <sap>
      <rfc>
        <name>Z_LJ_READ_TABLE</name>
        <import>
          ('QUERY_TABLE','@SAPTABLE')
          ,('DELIMITER',';')
        </import>
      </rfc>
    </sap>
    <sql>
      <autoload>replace</autoload><database>@DW_DB</database><truncate>yes</truncate>
      <table>@SAPTABLE</table>
    </sql>
  </job>
</schedule>


logmsg I changed to 'SAP Country Info'.
DW_DB I changed to MASTERDATA.
The job name I changed to CountryInfo.
SAPTABLE I changed to T005T.


Then I saved this new workflow as rfcReadT005T.xml.
Then I ran the new workflow manually:
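The manual run is just a scriptS.php invocation from the command line, essentially the same as the Cron entry shown further down:

./scriptS.php schedule=rfcReadT005T.xml logmode=warning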


And this is the result:


The entire development and execution of this data extraction from SAP took less than 15 minutes. It is a very simple workflow, yes, absolutely, but for most organisations 15 minutes is still fast; very, very fast. And this is extracting data from SAP, which most people consider closed and almost impossible to extract data from. SAP is my system of preference when I need data. SAP is very open.
If you look closely at the workflows you find a line:
<variant name='sap' validate='acta_prod,acta_test' default='acta_prod'/>
This is a runtime parameter definition (sap) which defaults to 'acta_prod', which points to a SAP production system. We do most Data Warehouse or Business Intelligence development in production environments; if you want to be agile, you do not have time to transport programs from test via quality to production. When we need a Data Warehouse test environment we create an ad hoc data mart for the purpose. We address the SAP test environment by adding 'sap=acta_test' when we start the workflow.
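So a test run of the new workflow would look something like this:

./scriptS.php schedule=rfcReadT005T.xml sap=acta_test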


You can also see that there are no regular SQL statements in this workflow; mapping source data into Data Warehouse SQL tables is a tiresome process we gladly avoid. But there is nothing stopping you from importing the data with regular SQL, and sometimes we do; there are always edge cases our automatic import does not support. This is important: regular SQL is always supported.


Finally I scheduled this new workflow rfcReadT005T.xml for monthly execution by inserting a new line in a monthly Cron shell script:


#!/bin/bash

echo "starting mngr_month_rfc"
date
pwd

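# new: monthly extract of SAP country info (table T005T)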
nohup ./scriptS.php schedule=rfcReadT005T.xml logmode=warning onscreen=no &

./scriptS.php schedule=Quota_month.xml logmode=warning onscreen=no
./scriptS.php schedule=control_month.xml logmode=warning onscreen=no
nohup ./scriptS.php schedule=jm2ordstockmonth.xml logmode=warning onscreen=no &
./scriptS.php schedule=PURCHASINGBI_MONTH.xml logmode=warning

#porder extra for Tierp factory monthly closing
./scriptS.php schedule=linux_db3prodorder.xml logmode=warning
./scriptS.php schedule=Siewertz_report_2.xml logmode=warning onscreen=no
./scriptS.php schedule=Siewertz_report.xml logmode=warning onscreen=no
./scriptS.php schedule=purchasing_paydays_update.xml logmode=warning
./scriptS.php schedule=purchasing_paydays_updateCB.xml logmode=warning
./scriptS.php schedule=mail_CZT_NOTSC_MONTH.xml logmode=warning
./scriptS.php schedule=mail_IYI_STAT.xml logmode=warning
./scriptS.php schedule=mail_NORDIC_MONTH.xml logmode=warning
./scriptS.php schedule=cb1artstatinventory_month.xml logmode=warning  
./scriptS.php schedule=db3artstatinventory_month.xml logmode=warning
./scriptS.php schedule=henriktest_op_registeraccmonth.xml logmode=warning
 
# Don't forget to set correct prereqs when adding a schedule here
echo "script completed"


This took me another 5 minutes to do since I’m a very slow typist.

This is a tiny bit of the month end closing activity; here you can see a regular day's night activity, here you can see some stats, and here you can follow the progression of the Data Warehouse.


p.s.
I will use the new country information for a new small app, which I will hopefully blog about later on. This new app is more interesting than the dull extraction workflow shown in this post.