2012-05-12

Integration between the Business Intelligence storage and Qlikview

In this post I first give a background which is our  experience with the Qlikview program product, and then our strategy for Qlikview integration with our Business Intelligence storage.

Background

Our Business Intelligence data storage and viewers.

In our Business Intelligence application The Data Warehouse (DW), we isolate the data from viewer applications. The data is stored in a SQL database (MySQL), and viewers are any application that can connect to Mysql using standard interfaces  like ODBC or JDBC. Today the primary viewer is Microsoft’s Excel. Qlikview is the other viewer in use today, and it is spreading like a prairie fire among our users. The reason is the users like Qlikview and you can develop advanced analytical apps quickly.
Note!  All though this post is about  Qlikview integration, we do not exclude other viewers from our Data Warehouse. We invite users to use viewers of their  liking not ours .

Qlikview the good and the bad.

We use the program product Qlikview as a viewer into our BI storage. Qlikview as other apps has it’s (dis)advantages. From an   evaluation  some years ago done by some Belgian colleagues  we could learn Qlikview is strong on data visualization, it is a user friendly and flexible tool for analyzing data. But it lacks in enterprise integration  i.e.  workflows and ETL procedures tend to be complex.
It is cumbersome to  create, maintain and debug complex Workflows in Qlikview.  Nor is Qlikview a good data storage, it is a closed environment, once Qlikview has slurped up the data it is inaccessible for other applications.

Qlickview parallel processing.

Qlickview also have a problem with parallel processing, as a rule-of-thumb  you should not run more than 10 ETL processes in parallel [1] . Good parallel processing capabilities and good scalability is essential for large scale ETL. Now Qlikview consists of two servers, the Publisher which is the ETL engine and the Database server which handles user requests, by splitting these servers into separate physical computers you alleviate the problems with parallel processing, but still you should not run more than ten ETL processes in parallel [2] . Having separated the twain Qlikview server parts, loading data does no longer affect user interaction, but the loading itself is still a problem. There is probably more you can do but for us right now we are contented with Qlikview loading performance. (We try to avoid heavy ETL processing in Qlikview.)

Data views.

Data can be viewed from different angles, a production planner see the world from a different perspective than a logistic analyst  which have another view than a purchaser .   N ot to mention the business controllers which often  like to see different perspectives in the same view, the need for financial report structures is endless or so it seems.

What is in a warehouse storage bin?

The controller may want to know the value of the physical parts of the bin, while  the production planner wants to know if there are enough allocatable parts  to produce X products, while  the logistic analyst wants all parts including those being produced the next two weeks, while  the purchaser wants to include parts on the road from the vendor,  stock figures  actually are more complex but this will do as an example of different data views.
Business rules . Data views are determined by business rules, and you want the data views to be consistent, between viewers. In our case we have other  viewers apart from Qlickview. Data views are not static, they change all the time, influenced by things like the real world, type of production, management, layout of workshop, business model , the analysts themselves etc. etc. E.g. change in management often implies change in data views.
To create data views you often have to join data from different sources, calculate ratings like  ( 1/ price * 100 +  1 / delivery time * 10  + quality * 0.25) or classify observation by rules like if vendor  rating < 10 and  vendors = 1 then ‘critical’ else if… .

Data transformations for viewers.

Viewers need formatting, e.g. in one application you want material number  to be called product number . Aggregation is another transformation  that affect viewers, how detailed information is useful?  Transpose or cross tabulation or pivot tables is a powerful transformation very common in BI. The data transformations for views goes from the very simple as changing the default label, to the very complex, transpose a table is not easy [3] .
Viewers prefer data to be organized in specific ways, Qlikview often wants all data or most of the data for an application in one big table.
(Viewers often has an internal format, Qlikview have a proprietary compressed format, tailored for performat Qlikview application. This transformation is done internally when data is load into Qlikview.)  

Oceans of space .

Our BI system the DW runs on servers designed for BI, which (among other things) means the database servers have many large and low-cost  hard disks . I cannot stress enough the importance of economical  disk space for BI storage.  
We run Qlikview in a hosted  environment where dis k space is performant and expensive .  This is good for our Qlikview apps, but   oceans of space  is out of the questions for our Qlikview environment.  So we try to only store information in Qlikview necessary for the apps them self. Using Qlikview as a general data store will be costly, due to the environment we run Qlikview in.

Accessibility, quality and reuse of data (and definitions).

First the BI store must be available. If the BI store is down there is not much accessibility, quality or reuse.
There is a consensus among everyone 100% quality  of data is essential (all experienced BI professionals have ‘extrapolated’  one time or another, show me one that has not, and I ’ ll show you a liar). Data quality in BI is important if users find that the data is invalid they stop using the system, or pester you with ‘there is invalid figures in the Bi system’, whenever a discrepancy is found [4] .
The reuse  of BI data is grossly exaggerated (mostly by IT professionals). For rapid, lenient and agile application development it is often preferable to copy data to a new database. Copies must be replicated correctly, which means good workflows, monitoring and consolidating facilities to safe guard 100% quality.
The importance of BI data accessibility  is often neglected/diminished (mostly by  IT professionals). IT professional often only want to use one BI application (the one they know) . Most often this is a closed proprietary system, which hinders free use of application data. Ideally a BI data store respects de facto database interface standards like ODBC, JDBC and has good extendable APIs.

Strategy for Qlikview integration.

Above I have tried to depict the background and the prerequisites for Qlikview integration in our BI echo system. Our integration strategy is based on the conditions that applies to us here and now.  We try to use the best tool for the task .   F or Qlikview apps this means use Qlikview for visualization and the Data Warehouse (DW) for ETL and workflows. With our strategy we try to accomplish a cost effective environment for Qlikview development and maintenance.
Do not use Qlikview as a  ‘data funnel’.  It is much better to gather data from various sources in the DW where we have better workflows, traceability etc. Consolidate the data in the DW and send it in one go to Qlikview.  This way we also have a natural Qlikview backup.
Do not store more data in Qlikview than necessary . Qlikview is not a data store, it is better to store data in the DW.
Do not use Qlikview for defining data views .  It is much easier to create, maintain and reconcile the data views in the DW, where the data can be reused, shared and copied.
Try to do  transformations in the DW . This is most simpler transformations as changing the format or default labels of data, it is a question of transferring  similar code from Qlikview to the DW. By preparing data sent to Qlikview we can minimize time spent on Qlikview development and make Qlikview code cleaner. Do not overdo this, it’s not the end of the world if transformations are done in Qlikview.
Do aggregations in  Qlikview . Qlikview needs detail data and has excellent functions for aggregation and cross tabulating.
Try to organize Qlikview application data into a specific DW database . This way it is simple to isolate the Qlikview application and developer (often a remote consultant) from other data and still present all application data to him or her.
Do not try to reuse and share data between Qlikview applications . With  separated data you create cleaner Qlikview applications. Sooner or later the shared/reused data and definitions will deviate anyway. It is not worth the effort to overdo the sharing. Share and reuse is better done in the BI layer.
Examples.  

These pictures are included  just to give an idea what transformation code may look like in Qlickview and SQL.

Example of simple Qlikview  formatting that partly can be moved to the DW database layer:
Example of simple Qlikview formatting done in the DW database layer:

[1]  I have heard Qlikview consultants say it is a problem with Windows, but I think it’s more a Qlikview problem. Not being an expert I think  Windows Server 2008 R2 is  good at parallel processing.
[2]  When overloaded the Publisher seems to run into a stalemate, the computer runs at full speed but very little happens, I have not analyzed the situation but it looks like a classical overload situation.
[3]  If I get time I will write about a ‘poor man’s SQL transpose engine’ which I’m building in LUA on top of MySQL. I have done 60% of the work, so we see if I be able to finalize it.   This is based on work done by Gui sepp e  Maxia .
[4]   Every complaint about data quality should be fully investigated. Good BI administrators put lots of efforts in quality complaints. In about 98-99% of the cases it is not a problem in the BI. In most cases it is interpretation of figures that cause quality complaints.

No comments:

Post a Comment