OpenSolaris ZFS replication
I’ve had this goal for quite some time now, ever since my employer went with Sun X4540 storage systems to serve as our data storage for backup applications. The goal was to handle data replication at the ZFS file system level, removing the need for application-level awareness of the file system replication.
A couple of products from Sun seemingly accomplish this, one being ZFS via the ‘zfs send/recv’ function, and the other being the Sun Availability Suite (AVS).
From a technical standpoint, AVS is a very mature product, and has many features above and beyond ZFS send/recv. However, since AVS is largely file system agnostic, it poses some problems when used to replicate ZFS file systems. Namely, a ZFS resilver is a block-level operation, so AVS would see it as changed data and replicate the entire resilver over the network.
For my application, that is an undesirable situation, as we will be replicating the data off-site via a rather expensive private leased network.
This brings us to the native ZFS send/recv options. There are many resources online about how it works technically, so I would suggest reading up a bit, as I won’t explain it here.
I explored the ZFS Timeslider tools, which can execute an additional command (such as zfs send/recv via SSH) for every local snapshot you take. That worked for a while, but the suite was not designed to handle ZFS replication. When my snapshot sizes began to grow, the send/recv operation would take longer than the window before the next local snapshot was taken.
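For readers who want a starting point, a minimal sketch of one incremental send/recv cycle might look like the following. The function name and its arguments are made up for illustration and are not taken from replicate.ksh; it also assumes the previous snapshot already exists on both sides (the very first run needs a full `zfs send` without `-i`).

```shell
# Sketch only: function name and all arguments are placeholders.
# Usage: replicate_incremental SRC_FS PREV_SNAP NEW_SNAP DEST_FS REMOTE_HOST
replicate_incremental() {
    src_fs=$1 prev=$2 new=$3 dest_fs=$4 host=$5

    # Take the new local snapshot.
    zfs snapshot "${src_fs}@${new}" || return 1

    # Send only the changes between the two snapshots and receive
    # them on the remote host over SSH.
    zfs send -i "${src_fs}@${prev}" "${src_fs}@${new}" |
        ssh "$host" zfs recv "$dest_fs"
}
```

A call such as `replicate_incremental tank/data snap1 snap2 backup/data remotehost` would then snapshot tank/data@snap2 and ship the difference since snap1.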
This caused the service to always enter maintenance mode, as conflicting operations would happen.
Then I found a helpful blog post by Sun engineer Constantin Gonzalez (http://blogs.sun.com/constantin/entry/zfs_replicator_script_new_edition), where he described and made available a script that handles parts of ZFS replication, from the initial snapshot to sending it to a remote host over SSH. However, the same issues haunted me there: the send/recv operation would run past the scheduled window, and subsequent jobs would step on each other and cause issues.
Clearly, these tools accomplished a lot of great things, but some additional logic could be added to ensure jobs can run past their window, without risk of additional jobs trying to take snapshots while others are in progress.
Enter ZFS user properties: you can set arbitrary properties at a per-file-system or per-snapshot level. For example, you can “lock” a file system by setting a flag on it, so that your programs check whether the flag exists and, if so, quit gracefully and notify you.
Jobs running past their window will always happen, and a simple check to see if an existing job is running on your data set would avoid conflicting snapshots, failed jobs, etc.
Short of using an enterprise job scheduling program like Control-M, this functionality is simple to add to existing shell scripts!
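As a rough sketch of that locking idea, assuming a user property named `replicate:locked` (a name invented here, not necessarily the one replicate.ksh uses), the check could look like this. Note that the check and the set are two separate commands, so this is not an atomic lock; it only guards against jobs that start well apart in time.

```shell
# Hypothetical property name; user property names must contain a colon.
LOCK_PROP="replicate:locked"

# Returns 0 if the lock was taken, 1 if another job already holds it.
acquire_lock() {
    fs=$1
    # -H drops headers, -o value prints only the value ("-" when unset).
    state=$(zfs get -H -o value "$LOCK_PROP" "$fs") || return 2
    if [ "$state" = "true" ]; then
        echo "replication already running on $fs, exiting" >&2
        return 1
    fi
    zfs set "$LOCK_PROP=true" "$fs"
}

# Clear the flag once the job has finished.
release_lock() {
    zfs set "$LOCK_PROP=false" "$1"
}
```

A job would call `acquire_lock` before taking its snapshot, exit if it fails, and call `release_lock` when the send/recv has finished.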
But there’s more: why not use ZFS user properties to assign additional flags, such as marking a snapshot when all operations complete, recording that a snapshot depends on another for incremental sends, or noting that the local snapshot has been fully replicated?
I took examples from the tools previously created (see above), and added some of those checks and flags. I’ve posted the script on my site, in hopes others will find the additions helpful, and hopefully improve on some of the incorrect ways I’ve done things.
By no means am I great at writing scripts or programs, so if you see any bugs, or improvements you can make, please make them!
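One way to sketch the "fully replicated" flag is shown below. The property name `replicate:sent` and both function names are assumptions for illustration only. The idea is to tag each snapshot after a successful send, then look up the newest tagged snapshot to use as the base of the next incremental send.

```shell
# Hypothetical property name used to tag snapshots that were sent.
DONE_PROP="replicate:sent"

# $1 is a snapshot name such as pool/fs@name.
mark_replicated() {
    zfs set "$DONE_PROP=true" "$1"
}

# Print the newest snapshot of file system $1 that carries the flag.
last_replicated() {
    fs=$1
    # List snapshots oldest first, tab-separated: name, flag value.
    # The awk filter skips snapshots that belong to child file systems.
    zfs list -H -r -t snapshot -o name,"$DONE_PROP" -s creation "$fs" |
        awk -F '\t' -v fs="$fs" '
            index($1, fs "@") == 1 && $2 == "true" { name = $1 }
            END { if (name != "") print name }'
}
```

If `last_replicated` prints nothing, no snapshot has been sent yet and the job would need to fall back to a full send.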
Download: replicate.ksh
Again, suggest or make any improvements, and enjoy!
Posted: March 29th, 2009 under Guides, Software.
Tags: opensolaris zfs replication
Comments
Comment from Brent Jones
Time April 1, 2009 at 9:50 am
Sturban,
Correct, block level activity gets recorded by AVS, and sent to your secondary node.
Supposedly you can just do manual synchronizations periodically instead of real-time, but then what benefit does AVS have over ZFS snapshots…
It would just add a lot of complexity. We had a few Sun engineers come to our site, and they dissuaded us from AVS for those exact reasons (though Sun’s sales people said AVS was the best thing since sliced bread).
Comment from chris
Time May 29, 2009 at 4:05 pm
i just downloaded your script to play with and when i run
root@CAM-SAN-T21:~# ./replicate.ksh SAN/ISCSI/cam-vmfs-samba SAN/Replication/cam-vmfs-samba 10.66.227.200
: unknown option
Comment from chris
Time May 29, 2009 at 4:10 pm
ahh i figured it out i had to run the file through dos2unix
Comment from Maurice
Time November 6, 2009 at 6:14 am
Why not dispense with cron and have the script enter a never-ending loop to do the snapshot? Then the next snapshot always begins immediately after the last one finished.
Comment from sturban
Time April 1, 2009 at 3:28 am
Is that true about the AVS and ZFS resilvering?? -I was considering AVS but that has put me off considerably now.