Last week we had a project close for the creation of a Business Intelligence Data Warehouse. A great project close, a lunch at a nice restaurant from 13.00 to 21.00, The business manager and project sponsor started by saying "We maid it, on record time, ahead of an optimistic schedule far under budget. We now have a Data Warehouse exceeding even our most optimistic expectations." Very seldom you hear superlative like that from the business when closing an IT-project .
My contribution to this Data warehouse was rather small, but I helped the three major contributors actually to start their IT/BI careers, The BI manager, the Data Warehouse architect and the infrastructure architect.
The new Data Warehouse is cloud based and built from scratch on Amazon Web Services using Python for logic and Redshift for storage. The Data Warehouse was developed on Windows, Mac and Amazon WorkSpaces platforms. Together with the Data Warehouse a Tableau front end was developed by the BI manager and a Tableau developer. Together these four BI workers created the new BI solution and replaced the old one in eight months. Only the Data Warehouse architect/developer worked full time on the project.
This Data Warehouse is structured like a 'one dimensional snowflake scheme', which strictly speaking is not a snowflake, but a fact table with a number of directly connected dimensional tables, (so far) they have avoided a multidimensional snowflake by not adding dimensions to the dimensions. For those of you knowing me I'm very critical to elaborate snowflake models they are hard to create and maintain and more important hard to understand. I favor redundant data in 'business query sets', which is my fancy expression for independent super sets for reports. My model is simpler and easier for users to understand, more important though more robust than any snowflake model, if the creation of a business query set fails it only affects a limited number of reports, if the update of a snowflake fails it may affect the entire Data Warehouse. This is very important. With my model you do not need a test environment you can without risk of crashing the entire Data Warehouse develop directly in the production environment, which significantly reduce time to market probably the most important success factor of a Data Warehouse. Speed is often more important than than other factors like accuracy.
The one dimensional snowflake scheme is a good compromise, it is relatively simple to maintain and understand, and you can argue it is based on proven and widely accepted theories, this is important when you ask for money to develop a Data Warehouse and the critics say this is not Kimball or Inmon or any other BI guru's scheme, with the one dimensional snowflake you can fend off any such critic with 'This is X's Top Down or X's Bottom Up scheme" where X is any of those gurus.
However the one dimensional snowflake is as brittle as other snowflakes, you need a test environment for development, that is my main criticism against this model.
There are pros and cons with all BI models; all things considered the one dimensional snowflake is a pretty good structure and a sane compromise between my simple structured Data Warehouse and the more elaborate structured Data Warehouses you find on the market.
No comments:
Post a Comment