2012-06-09

Replicating MySQL with Rsync

In a previous post  I wrote about replicating Business Intelligence information to Local Area Networks satellites. A BI satellite is a database server with replicated BI information from a master BI system. The BI satellite must be affordable and simple to operate, otherwise you are unlikely to get funds for the satellite. And you need a backup a plan B, if which God forbid the satellite breaks down.

Our Business Intelligence system The Data Warehouse is designed with this in mind. The BI data is stored in a MySQL database, day2day reporting and the legal financial reporting is done by ODBC/Excel, which all users have on their PCs. So all we have to do is set up a simple Mysql database server and replicate the database(s) from the Master Data Warehouse to the satellite system. The master system is also the backup, if the satellite breaks down, we reroute the users to the master via the DSN entry.

Note! all user definitions must be identically defined in the master for this to work. This is a manual process I do not have a good solution for yet.

The satellite server

The ‘server’ for this particular satellite is a Dell Optiplex 790 with an Intel I3 processor 4GB RAM and  two 1TB SATA disks. On this computer I installed Ubuntu 12.04 LTS server with SSH and LAMP. I failed to install the second disk, my plan is to put /var  on the second disk, this is something I still have not figured out to how to do. I also installed phpMyAdmin on the server. This was my first install of an Ubuntu Linux, I had some initial problems (as the second disk), but most problems are solved and I have a working and tested prototype satellite system.

Replication

The absolute simplest way to replicate data is Rsync, a brilliant piece of software. Our Business Intelligence information is updated in controlled nightly (whatever that means when you are global) batch jobs. This means we do not have to use more complicated database replication schemes, we can replicate in controlled batches. The MySQL storage engine MyISAM is perfect for such replication since it uses the directory structure on disk as index to databases and tables. Just stop mysql, replicate and start mysql again. We like to initiate the replication from our master data warehouse when time is due. The satellite system decide where the data should go. This can be achieved by some simple shell scripts (not done yet) and a Rsync Daemon in the satellite server. This replication not only simple it is also self-healing. If something goes wrong, just replicate again.

Setup

The Rsync daemon is configured by three  files.

1  - The first file rsyncd.conf  contains the actual configuration. With some embarrassment I have to admit I do not understand all parameters yet.  (Go to everythinglinux.org  for a detailed explanation).

Note! the path parameter points directly into the MySql database SATDW

rsyncd.conf

pid file = /var/run/rsync.pid

log file = /urs/local/etc/rsyncd.log

lock file = /var/run/rsyncd.lock

[satwd]

            comment = Replica of the Data Warehouse

            path = /var/lib/mysql/SATDW/

            transfer logging = yes

            hosts allow =  nn.nn.nn.nn

            auth users = userid

            transfer logging = yes

            read only = no

            uid = mysql

            gid = mysql

            use chroot = yes

            strict modes = no

            secrets file = rsync.scrt

End-of-rsyncd.conf

2  -The next file rsync.motd  just contains a greeting message.

rsyncd.motd

Welcome to the Data Warehouse server Rsync Daemon    

End-of-rsyncd.motd

3  - The last file rsync.scrt  contains userid and passwords. Note! these userids are internal to Rsync daemon.

rsync.scrt

userid:password

End-of-rsync.scrt

With these three files defined its just to start up the Rsync Daemon and start replicate.

Replication

My first attempt was with one table.

rsync --stats --archive --delete -H /db/mysql/DW/table* userid@SATDW::satdw

Number of files: 4

Number of files transferred: 4

Total file size: 49600686 bytes

Total transferred file size: 49600686 bytes

Literal data: 49600686 bytes

Matched data: 0 bytes

File list size: 89

File list generation time: 0.001 seconds

File list transfer time: 0.000 seconds

Total bytes sent: 49607129

Total bytes received: 84

sent 49607129 bytes  received 84 bytes  62913.40 bytes/sec

total size is 49600686   speedup is 1.00

This was kind of depressing, it took some 17 minutes to replicate this one file the speed 62913.40 bytes/sec was all too slow. I added --compress to my replication

Compression, need for speed

rsync --stats --archive --delete --compress  -H /db/mysql/DW/table* userid@SATDW::satdw

This time I got:

Number of files: 4

Number of files transferred: 4

Total file size: 49600686 bytes

Total transferred file size: 49600686 bytes

Literal data: 49600686 bytes

Matched data: 0 bytes

File list size: 89

File list generation time: 0.001 seconds

File list transfer time: 0.000 seconds

Total bytes sent: 5438439

Total bytes received: 84

sent 5438439 bytes  received 84 bytes  59437.41 bytes/sec

total size is 49600686   speedup is 9.12

With compression the replication took about 90 seconds, much better, but not good enough. My Slave satellite concept will not work if I cannot double the transfer rate. It seems I only get about 60Kb/sec which is much less than expected. I will talk to the Network guys to see what they can do to speed things up. I suspect there is some kind of limitation enforced upon my replication. If I could get at least 100Kb/sec I think my replication will work. I expected +200Kb/sec so clearly this is a disappointment.  

Next

The next thing to do is to talk with the network guys, I need more speed. And then I have to automate the replication with some shell scripts. Then I write another post , I really hope this will fly :-|

 

No comments:

Post a Comment