2014-03-30

Web service as an integration tool

At the company we are setting up a Master Data Management system, and we intend to use web services to distribute (master) data from this MDM system. I like to test ideas and concepts before I deploy them, so I discussed this with a colleague, Petr Hutar, who runs the MDM system at Business Area MR. We decided currency rates would be a good entity to test web service data distribution with. Petr wrote and deployed the web service, and I set up a web service client importing the rates into the Data Warehouse:
The ITL workflow for importing currency rates into the Data Warehouse. The idea is to start this schedule and let it sit and wait until the new period's currency rates become available. When the rates are available, the job getRates fetches the rates and the job loadRates inserts them into a MySQL table.
Some of the tags are worth explaining:


<tag name='WSDL' value='http://IPaddress/IBS/IBS.ASMX?WSDL'/>
The WSDL tag points to the WSDL of the SOAP web service we use to extract the currency rates. As you can see, it's an ASP.NET web service, called from the Data Warehouse's Linux/PHP environment.


<prereqwait sleep='600' for='40' until='17:00:00'/>
<prereq type='pgm' name='SoapWsdlNewCurRates.php' parm='@WSDL'/>
The <prereqwait> will repeat checking the prereq(s) until they are satisfied. This prereqwait will check the prereq every 10 minutes (600 seconds), at most 40 times or until the clock strikes 17:00:00, whichever happens first.
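The stopping rule is simple enough to sketch as a function (hypothetical names; the actual ITL scheduler internals are not shown here):

```php
<?php
// Sketch of the prereqwait stopping rule: keep checking the prereq
// every sleep seconds while neither limit has been hit.
// $attempts  - checks already made
// $maxChecks - the for='40' attribute
// $now       - current time of day as 'H:i:s'
// $until     - the until='17:00:00' attribute
function keepPolling(int $attempts, int $maxChecks, string $now, string $until): bool
{
    // 'H:i:s' strings compare correctly with plain string comparison
    return $attempts < $maxChecks && $now < $until;
}

var_dump(keepPolling(39, 40, '16:50:00', '17:00:00')); // within both limits
var_dump(keepPolling(40, 40, '16:50:00', '17:00:00')); // 40 checks used up
var_dump(keepPolling(10, 40, '17:00:01', '17:00:00')); // past the deadline
```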
The <prereq> calls SoapWsdlNewCurRates.php:
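The script itself is not reproduced here, but a minimal sketch of what such a prereq check could look like is below. The method name getNewCurrRates and the shape of the result are my assumptions, not the real service:

```php
<?php
// Hypothetical sketch of a prereq script like SoapWsdlNewCurRates.php.
// Exit code 0 = the new period's rates are available, 1 = not yet.
// Method name and result shape are assumptions, not the real service.
function newRatesAvailable(string $wsdl): bool
{
    $client = new SoapClient($wsdl, ['exceptions' => true]);
    $result = $client->getNewCurrRates();   // assumed method name
    return !empty($result->rates);          // assumed result shape
}

if (!empty($argv[1])) {
    exit(newRatesAvailable($argv[1]) ? 0 : 1);
} else {
    fwrite(STDERR, "usage: php SoapWsdlNewCurRates.php <WSDL-url>\n");
}
```

The scheduler presumably treats exit code 0 as "prereq satisfied", so the prereqwait keeps retrying until the web service reports that the new period's rates exist.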




The job getRates calls getIBCurrRatesWebserv2.php:
Comments on the code: I had to write my own error and exception handlers to catch errors in the SOAP client. This should not be necessary; maybe I did something wrong.
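For what it's worth, a common pattern (a sketch of the idea, not necessarily what getIBCurrRatesWebserv2.php does) is to promote PHP errors to exceptions, so everything the SOAP client can throw at you is caught in one try/catch:

```php
<?php
// Promote PHP errors (warnings, notices) to ErrorException so that
// problems inside SoapClient surface as catchable exceptions.
set_error_handler(function (int $severity, string $message, string $file, int $line) {
    throw new ErrorException($message, 0, $severity, $file, $line);
});

try {
    // $client = new SoapClient($wsdl);            // real call would go here
    // $rates  = $client->getIBCurrRates($period); // assumed method name
    trigger_error('simulated SOAP failure', E_USER_WARNING); // stand-in for a failing call
} catch (Throwable $e) {
    $caught = $e->getMessage();
    echo "caught: {$caught}\n";
} finally {
    restore_error_handler();
}
```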

As you can see, it's dead simple to consume a SOAP web service: just point to the WSDL, set up the parameters and call the method you are interested in. This web service test was a success, and I will definitely explore web services further as an integration tool. I'm already experimenting with Node.js/Hapi, creating REST APIs for the Data Warehouse. I wouldn't be surprised if I start creating web pages one of these days:)

2014-03-16

Business Intelligence agility, system & organisation


If you follow the data warehouse tweets, you may have noticed there are often failed jobs in the job statistics. It is actually worse than what you see: jobs which never start due to failing prerequisites are not registered, e.g. when a necessary source system job ends unsuccessfully. Still, this is very seldom a problem, since we only have a production environment; most tests are also run in the production environment, and the majority of failed jobs are tests. You should turn off logging when you test, but who remembers that? And who cares?

In an ERP system landscape it makes sense to have separate test and production environments, for very obvious reasons. You never want to put a transaction or program of any kind into ERP production without proper testing, since ERP transactions often update the database, and it is very hard to test and debug ERP transactions in an ever-changing ERP environment. In a test environment you can set up your own scenarios and test the transactions without the risk of others updating your test data. There are also security aspects: no sane owner of an application registering financial transactions would grant 'free' access to developers in the production environment. Thus it makes sense to set up a permanent test environment for ERP systems, which is also a good playground for user training.

For BI or Data Warehouse applications the situation is different: most transactions are read-only (most updates are done in controlled and relatively static batches), and debugging is very hard without current production data. Since the data is stable, testing can preferably be done in the production environment. More importantly, agility is key in BI. Developing directly in production (together with the user) is much faster when you can see the result in real time than developing in a test environment. I question whether you can develop together with users in a test environment at all; in any case, agile it is not. When you need to test updating programs, you create a personal mart or take a snapshot - that is one of the key features of a Data Warehouse - and then you test in your mart or snapshot. You need plenty of disk space for this (and easily accessible backups, for fast restores when you screw up). What use are good data mart facilities if you cannot use them?

If you develop in a separate environment, if you do not develop together with users, if you as a developer cannot create data marts and/or snapshots on the fly, or if you do not have simple access to backups, then you do not have agile Business Intelligence, no matter what your BI vendor says.

There is yet another aspect that can hamper the agility of BI: the development methodology. In ERP, requirements and program specifications are good things, but in BI, specifications other than very high-level ones are evil.
If you develop according to the ERP specification model, it is very likely you do not have Business Intelligence at all; more likely you are doing standard reporting.


When I want a new analysis of stock movements since 2006, I want it in hours, not tomorrow. I do not necessarily care about 100% veracity, I care about velocity!
A Data Warehouse user.


It's not only the BI system; it is very much your BI organisation that determines to what degree you are agile.


The loophole:

The observant reader may have seen a flagrant loophole in this post: 'the security aspect of BI development'. Shouldn't the BI developer be kept out of the production environment?
I argue you cannot keep the BI developer away from production data, since debugging and testing are often very hard, and sometimes almost impossible, without access to production data. There are situations where you cannot allow developers access to production data, but then you will not have a very agile BI system. You have to trust the BI developer if you want speedy, agile BI systems, and accept the fact that the BI developer knows more about the business than they perhaps should.
Business Intelligence developers should sign a non-disclosure agreement.

2014-03-10

Have you rebooted your PC?


Last week we had two terrible days in the Data Warehouse: for no obvious reason at all, the database server slowed down to almost a standstill, as if someone had thrown glue into the server. I was on a business trip and could not attend to the problem from the start.
First I suspected the network interfaces, since we have a known problem with one of them, but that was a red herring. Then I looked at the memory utilization but could not see anything alarming. (I find interpreting memory utilization in Linux a bit of a challenge.) I could not see any sign of excessive paging or swapping; I was watching vmstat, but si and so were zero most of the time. I noticed free memory was very low, but I do not know what is useful and not harmful use of memory (as cache) in Linux, and I do not know what these figures look like when the system is OK.
Anyway, the server behaved very badly; MySQL actually crashed twice during this period. I stopped all unnecessary processes on the server and checked the network cables, but all in vain. Later I remembered the remedy for all problems when you sit at a help desk: 'Is the power cord attached?', 'Is the network cable connected?' and lastly the classic 'Have you rebooted your PC?'
I had already checked the power and network cables, but I had not rebooted the server. And sure enough, after a restart all was back to normal. But I still do not understand what the root cause of the problem was, and that is very unsatisfactory. But as I use to say - once is no time, twice is one time too many - so next time this happens, which God forbid, I should be better prepared, or at least reboot quicker.



While this malady was ongoing, I noticed MySQL queries were in state 'statistics' or something similar, which I interpreted as MySQL optimizing access paths. We only run dynamic queries, since we very seldom run the same query more than once in the same context, but there are some programs that run the same query over and over again; maybe we could gain something by preparing the query once and then executing it repeatedly. This is a reworked prepared-query program:
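The reworked program is not shown here, but the pattern is prepare once, execute many. A self-contained sketch of that pattern, using SQLite instead of MySQL so it runs anywhere (the production code would use the mysqli equivalents: prepare, bind_param, execute):

```php
<?php
// Prepare-once / execute-many, sketched with SQLite. mysqli works the
// same way: $stmt = $mysqli->prepare(...); $stmt->bind_param(...);
// $stmt->execute(); - the statement is parsed and optimized only once.
$db = new SQLite3(':memory:');
$db->exec('CREATE TABLE rates (curr TEXT, rate REAL)');

$stmt = $db->prepare('INSERT INTO rates (curr, rate) VALUES (:curr, :rate)');
foreach (['EUR' => 8.95, 'USD' => 6.48, 'GBP' => 10.79] as $curr => $rate) {
    $stmt->bindValue(':curr', $curr, SQLITE3_TEXT);
    $stmt->bindValue(':rate', $rate, SQLITE3_FLOAT);
    $stmt->execute();   // re-executes the already-prepared statement
    $stmt->reset();     // ready for the next set of bindings
}

$count = $db->querySingle('SELECT COUNT(*) FROM rates');
echo "inserted {$count} rates\n";
```

The sample currency values are made up for the illustration.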


Sadly, it only performs on par with the original dynamic query program (which runs a mysqli->query command). It was a good try though.

2014-03-04

At last my first usable Node.js / Hapi program

I have for a long time been thinking about writing an HTTP communication channel to the Data Warehouse, mainly to be able to write outbound web services. Today I actually made the first serious attempt. This is barely beyond 'Hello World' - a small step for mankind, but a giant step for me. Here is the complete web server:
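The server code is not reproduced here; a minimal sketch of a Hapi server of that era (circa Hapi 2.x; runSchedule and the route path are hypothetical stand-ins, not the actual program) might look like:

```javascript
var Hapi = require('hapi');

var server = new Hapi.Server('0.0.0.0', 8080);

server.route({
    method: 'GET',
    path: '/job/{schedule}',
    handler: function (request, reply) {
        // Switch effective user depending on which data warehouse is
        // being served (details omitted), then kick off the load job.
        // runSchedule is a hypothetical helper, not a Hapi API.
        runSchedule(request.params.schedule, function (err, result) {
            reply(err ? 'FAILED: ' + err : 'OK: ' + result);
        });
    }
});

server.start(function () {
    console.log('DW server running at', server.info.uri);
});
```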


It actually took quite some time to put this together. I'm not really on speaking terms with JavaScript; I find it a bit confusing. It's my inexperience (and probably my age) that is the problem. Unfortunately, you do not grow quicker with the years. I need to change user for each inbound request, depending on which data warehouse is being served, and it took a while to figure out how to do that.


What is this web server good for?

This server can be contacted from, e.g., VBA code in Excel sheets to trigger load jobs in the data warehouse. To be really useful it needs file upload capabilities, but that has to wait until next weekend.
Any comments about the code are most welcome. There is probably some ugly gronk in there; I have not really tested the code yet.