In a previous post I wrote about replicating Business Intelligence information to Local Area Networks satellites. A BI satellite is a database server with replicated BI information from a master BI system. The BI satellite must be affordable and simple to operate, otherwise you are unlikely to get funds for the satellite. And you need a backup a plan B, if which God forbid the satellite breaks down.
Our Business Intelligence system The Data Warehouse is designed with this in mind. The BI data is stored in a MySQL database, day2day reporting and the legal financial reporting is done by ODBC/Excel, which all users have on their PCs. So all we have to do is set up a simple Mysql database server and replicate the database(s) from the Master Data Warehouse to the satellite system. The master system is also the backup, if the satellite breaks down, we reroute the users to the master via the DSN entry.
Note! all user definitions must be identically defined in the master for this to work. This is a manual process I do not have a good solution for yet.
The satellite server
The ‘server’ for this particular satellite is a Dell Optiplex 790 with an Intel I3 processor 4GB RAM and two 1TB SATA disks. On this computer I installed Ubuntu 12.04 LTS server with SSH and LAMP. I failed to install the second disk, my plan is to put /var on the second disk, this is something I still have not figured out to how to do. I also installed phpMyAdmin on the server. This was my first install of an Ubuntu Linux, I had some initial problems (as the second disk), but most problems are solved and I have a working and tested prototype satellite system.
Replication
The absolute simplest way to replicate data is Rsync, a brilliant piece of software. Our Business Intelligence information is updated in controlled nightly (whatever that means when you are global) batch jobs. This means we do not have to use more complicated database replication schemes, we can replicate in controlled batches. The MySQL storage engine MyISAM is perfect for such replication since it uses the directory structure on disk as index to databases and tables. Just stop mysql, replicate and start mysql again. We like to initiate the replication from our master data warehouse when time is due. The satellite system decide where the data should go. This can be achieved by some simple shell scripts (not done yet) and a Rsync Daemon in the satellite server. This replication not only simple it is also self-healing. If something goes wrong, just replicate again.
Setup
The Rsync daemon is configured by three files.
1 - The first file rsyncd.conf contains the actual configuration. With some embarrassment I have to admit I do not understand all parameters yet. (Go to everythinglinux.org for a detailed explanation).
Note! the path parameter points directly into the MySql database SATDW
rsyncd.conf
pid file = /var/run/rsync.pid
log file = /urs/local/etc/rsyncd.log
lock file = /var/run/rsyncd.lock
[satwd]
comment = Replica of the Data Warehouse
path = /var/lib/mysql/SATDW/
transfer logging = yes
hosts allow = nn.nn.nn.nn
auth users = userid
transfer logging = yes
read only = no
uid = mysql
gid = mysql
use chroot = yes
strict modes = no
secrets file = rsync.scrt
End-of-rsyncd.conf
2 -The next file rsync.motd just contains a greeting message.
rsyncd.motd
Welcome to the Data Warehouse server Rsync Daemon
End-of-rsyncd.motd
3 - The last file rsync.scrt contains userid and passwords. Note! these userids are internal to Rsync daemon.
rsync.scrt
userid:password
End-of-rsync.scrt
With these three files defined its just to start up the Rsync Daemon and start replicate.
Replication
My first attempt was with one table.
rsync --stats --archive --delete -H /db/mysql/DW/table* userid@SATDW::satdw
Number of files: 4 Number of files transferred: 4 Total file size: 49600686 bytes Total transferred file size: 49600686 bytes Literal data: 49600686 bytes Matched data: 0 bytes File list size: 89 File list generation time: 0.001 seconds File list transfer time: 0.000 seconds Total bytes sent: 49607129 Total bytes received: 84 sent 49607129 bytes received 84 bytes 62913.40 bytes/sec total size is 49600686 speedup is 1.00 |
This was kind of depressing, it took some 17 minutes to replicate this one file the speed 62913.40 bytes/sec was all too slow. I added --compress to my replication
Compression, need for speed
rsync --stats --archive --delete --compress -H /db/mysql/DW/table* userid@SATDW::satdw
This time I got:
Number of files: 4 Number of files transferred: 4 Total file size: 49600686 bytes Total transferred file size: 49600686 bytes Literal data: 49600686 bytes Matched data: 0 bytes File list size: 89 File list generation time: 0.001 seconds File list transfer time: 0.000 seconds Total bytes sent: 5438439 Total bytes received: 84 sent 5438439 bytes received 84 bytes 59437.41 bytes/sec total size is 49600686 speedup is 9.12 |
With compression the replication took about 90 seconds, much better, but not good enough. My Slave satellite concept will not work if I cannot double the transfer rate. It seems I only get about 60Kb/sec which is much less than expected. I will talk to the Network guys to see what they can do to speed things up. I suspect there is some kind of limitation enforced upon my replication. If I could get at least 100Kb/sec I think my replication will work. I expected +200Kb/sec so clearly this is a disappointment.
Next
The next thing to do is to talk with the network guys, I need more speed. And then I have to automate the replication with some shell scripts. Then I write another post , I really hope this will fly :-|
No comments:
Post a Comment