2012-07-10

Business Intelligence and Hardware

Hardware is often neglected in applications design.  In best cases you divide components of an application into separate servers, move the data to a separate SAN and add some extra RAM for performance and that’s it. The server infrastructure of applications is often well suited for transactional systems. Transactional systems do lot of random read/write of small chunks of data and very little processing on those small chunks.Business Intelligence systems on the other hand does not write very much, but reads a lot. Both reads and writes are mostly done in large chunks.  

 

Since the middle of the nineties I have been interested in BI applications hardware infrastructure. My interest for hardware started with the   spintronic  revolution, I realized that hard disks (HDD) and memory RAM was going to be larger, faster and cheaper in future. HDD was the first in the spintronic wave. HDD had been too small, too slow and too expensive to allow for modern BI; with the SATA HDD we had cheap, fast and large HDD. The seek time (find the data to read or write) on these cheap SATA HDD is not impressive, so if you need to do lots of small random read/write it is not a wise choice. So the traditional servers still use expensive and small HDD with good seek time.  These HDD are not very good for BI. BI are more interested in low transfer time (from disc into memory) than low seek time, and the SATA protocol can deliver more data than the discs can spin. More important SATA disks today are large, you can get 3 TB SATA disks for about 200€ and size matters for BI operations.  SATA HDD are better suited for BI than more expensive and smaller HDD  found in servers. Of course we will use Solid State Disks in a few years’ time, but still these disks are too expensive and too small for my liking .

Lots of RAM more than compensate for the slower  SATA disks (compared with SAS and SSD),  today you get 24GB SSD3 for about 300€. It is important to have enough of RAM to keep the active data set in memory to keep physical I/O low. Our BI database server needs 16GB to perform well. First day of the month I have noticed increased response times and some ‘peculiar’ MySQL behavior. I will install more memory and see if that helps. If not I have to find the root cause for the increased response times which most likely is missing or not good enough indexes. I still see recommendation about being careful with adding indexes. For BI systems this is wrong, wrong, wrong. You should sprinkle your database with indexes. All frequent queries should have optimized indexes. For the experienced DBA I recommend   Relational Database Index Design and the Optimizers’ by Tapio Lahdenmaki and Mike Leach . This is serious, heavy and good reading about indexes. But I first try to mend performance problems with hardware, it is simpler to add RAM, than to analyze bottlenecks.

Processors today are so powerful any modern multicore processor will do just fine. I use high quality workstation motherboards. For my last database server I used an ASUS P9X79 DELUXE X79 S-2011 ATX motherboard   for about 300€. It performs beautifully for our Business Intelligence system.

With such components it is an easy task to build high performance BI servers. I build the servers as simple as possible, this means I deliberately build them as single-point-of-failure. Simple server means few things that can crash and my servers are remarkably stable. The only things that have crashed so far are HDD raids and raid controllers (two times in eight years). Today disks are so big so we do not need to raid them anymore. Hardware development goes very fast; last year’s top notch hardware is ready for the scrap heap the next year. Using inexpensive servers give me the luxury to replace them more often. The normal lifetime for a server is three years and the life time cycle is test-production-backup-scrap heap. The database server I try to replace more often.

I’m aware of most experts do not approve of my ideas. But I created a working system after these ideas and it performs beautifully. I have put lots of efforts into my system; thinking, testing, measuring. More important than hardware is the database design. I have completely removed the traditional snowflake or whatever it is called design. The traditional BI database design patterns were conceived when hardware were expensive and disks were small. With today’s cheap hardware the old design patterns are a millstone around the BI system’s neck, we will see new simpler databases with more redundancies as in-memory computing becomes a reality. I should probably post about this, since I am an humble pioneer in this field.  

This is how my ‘tin cans’ look 2012-12-23.

No comments:

Post a Comment