Tuesday, 29 May 2012

Time Machine, Netatalk, ZFS and OpenIndiana

OK, so I probably spent too much time on this. Maybe it would have been better just to have bought a NAS thingy with supported Time Machine out of the box. But where's the challenge? Where's the learning opportunity in that?

My setup is now working, and I have a few observations.

Firstly, do not, under any circumstances, enable de-dupe in ZFS. It absolutely kills performance. I found it quicker to copy off the data I needed, delete the pool and re-create it than to just delete the directory tree.

Now, there probably are circumstances where you might want to enable it on a properly specced system, but please adopt a "do not enable" default position :) Believe me, it is a good move.
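
If you do find yourself needing to turn it off, it's only a couple of commands - the pool and filesystem names here are made up, and note that switching dedupe off only affects newly written data:

zfs get dedup tank/backups
pfexec zfs set dedup=off tank/backups
zpool list tank    # the DEDUP column shows the ratio for data already on disk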

Secondly, watch out for dodgy hardware. When you get an error you can't explain, that seems random and that no one else has experienced, check your hardware. I was getting weird AFP commands the server didn't understand and weird memcpy errors, and eventually, after complete silence from the mailing list, I decided to try another piece of hardware - and voila, it worked perfectly and has continued to work perfectly so far.

Now, ZFS is wonderful, and the main reason I'm using OpenIndiana on my TM backup box. I just know that the data on the disk is correct, hasn't succumbed to bit rot and will be recoverable should the rot set in. I was also able to move the disk sets from one machine to another with great ease - just a zpool export, unplug the disks, then a zpool import to start the pool up on the target machine. Annoyingly, hotplug doesn't seem to work on the hardware we have, so I had to restart the machine to be able to see the disks; however, given its nature as a backup machine, this shouldn't be too much of an issue in the future.
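
In other words, something like this (pool name made up):

pfexec zpool export tank
# physically move the disks to the target machine, then:
pfexec zpool import tank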

And I also used zfs send/recv to move a file system off one pool onto another, which worked very well.

I had the server backup - which isn't Time Machine, but BackupPC - running on the old OpenIndiana server, the one with the dodgy hardware, so I wanted to move that off. Initially I tried using zfs send, thus:


zfs send srcfs@snap1 | ssh id@host pfexec zfs recv -F dstfs


(courtesy of the surprisingly good Oracle docs)

However, this was taking too long, so I stopped the backup running, exported the pool, imported it into the new machine and ran the copy disk to disk, which was much quicker (as you'd expect!).
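
The disk to disk copy was the same send/recv pipeline, just without the ssh in the middle - something like this, with made-up pool and snapshot names:

pfexec zfs snapshot srcpool/backuppc@move
pfexec zfs send srcpool/backuppc@move | pfexec zfs recv -F dstpool/backuppc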

So now I have everything running quietly in the background, monitored and working nicely. I need to do a few tests to ensure that things are properly recoverable (since the only backup worth having is one which actually restores) but I'm reasonably sure this will be fine.

Tick!

Monday, 23 April 2012

Bad Unicorn!

Unicorn is what we use for our application serving, and overall, we're very pleased - it does no-downtime reloads and generally makes life nice and easy.

However, it was causing an issue whereby it was getting caught in a restart loop, which was getting logged, and by about lunchtime had filled up the whole server with around 170GB of log file. Not healthy.

Eventually I tracked it down to a problem with logrotate - I had configured it thus:

/var/log/unicorn/appname.*.log {
    daily
    missingok
    rotate 52
    compress
    delaycompress
    notifempty
    create 640 vault vault
    postrotate
        [ ! -f /var/run/unicorn/appname.pid ] || kill -USR1 `cat /var/run/unicorn/appname.pid`
    endscript
}

However, after seeing this post, I discovered that the postrotate part was at fault. Because of the wildcard, the script runs once for each matched log file: logrotate rotated the first file and sent the signal, which made Unicorn reopen all its file handles; before that could complete, it rotated the next file and sent the same signal, catching Unicorn in the middle of reopening its files and asking it to do it all over again. The upshot was that Unicorn kept restarting, failing and logging the fact.

So I've changed postrotate to lastaction, which only runs the script once, after all 3 log files have been rotated - the USR1 signal reopens all the logfiles anyway, so there's no need to send it more than once.
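
For reference, the stanza now looks like this - identical to the one above, apart from lastaction/endscript replacing postrotate/endscript:

/var/log/unicorn/appname.*.log {
    daily
    missingok
    rotate 52
    compress
    delaycompress
    notifempty
    create 640 vault vault
    lastaction
        [ ! -f /var/run/unicorn/appname.pid ] || kill -USR1 `cat /var/run/unicorn/appname.pid`
    endscript
}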

Should be all fixed now - with Chef updated so that all the right scripts are in place.

One nice side effect of this, though, is that I've also done some more work on using Graphite to display nice graphs. When the problem started again last week, I knew that a graph of rapidly rising disk usage would have alerted me to it, so it could have been fixed before space ran out. Nice in theory, though in practice, since it happened at the weekend, I wasn't looking. However, I now have Graphite graphs displayed on our monitoring view, which was powered by CactiView but is now powered by a mixture of CactiView and a hacked-up version which displays Graphite graphs as well. It needs some more work to make it nicer, but it's there and it works. Graphite is great - more on this in a later post.
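
Displaying a Graphite graph is essentially just pointing an image at its render API, along these lines (the hostname and metric path here are invented for illustration):

http://graphite.example.com/render?target=servers.web1.disk.root.used&from=-24hours&width=400&height=250&title=web1%20disk%20used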

Sunday, 18 March 2012

CouchDB Re-Index

On Friday we discovered that our CouchDB instances had inconsistent results for queries to their indexes, i.e. the same query on two different servers gave different results on the same data.

Checking the logs revealed that one of them had problems in its index.

Reindexing most of our databases takes a few minutes; however, one of them has about 5 million documents in it, and that one takes a good 12 hours to re-index completely.

So, to re-index a CouchDB database, there are 4 steps (a rough sketch follows the list):

1. Take it offline - move the work to another one in the cluster
2. Delete the current indexes - they live in a sub-directory of the CouchDB data directory named after the database, prefixed with a . and suffixed with _design. Renaming the directory is another option for the faint-hearted
3. Restart CouchDB, so it notices that the indexes have gone
4. Access one view for each design document - one is enough, since it will build all the views in that design document at once.
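
As a rough illustration of steps 2 to 4 (the data directory path, database name and design document name here are all invented - adjust to taste):

# 2. move the old index files out of the way (gentler than deleting them)
sudo mv /var/lib/couchdb/.databasename_design /var/lib/couchdb/.databasename_design.old
# 3. restart CouchDB so it notices the indexes have gone
sudo /etc/init.d/couchdb restart
# 4. hit one view per design document to kick off the rebuild
curl 'http://localhost:5984/databasename/_design/docs/_view/by_name?limit=1'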

I've written a script which automates step 4. One of the issues with CouchDB replication is that a replicated document doesn't update the view (unlike a directly saved document), so I have a script which pokes each view every minute or so to stop any staleness building up. We use CouchRest Model, so it uses those classes to do the accessing. This means the script is a little specific to us, but I hope you'll get the idea.


#!/usr/bin/env ruby
# poke all the classes in the database so their views stay fresh
require 'couchrest'
require 'couchrest_model'
require 'will_paginate'
require 'will_paginate_couchrest'

SERVER = CouchRest.new("http://localhost:5984")
DB     = SERVER.database!("databasename")
require '/opt/local/apps/couchrestclasses.rb'

while true do

  puts "waking .... Starting CouchrestClass"
  begin
    # any query against the class touches its view, which triggers the indexer
    blah = CouchrestClass.all(:limit => 10)
  rescue
    # errors are swallowed - the request has still poked the view
    puts "CouchrestClass Done"
  end

  # repeat the above once for each couchrest class

  puts "done... sleeping"
  sleep 20
end

Thursday, 15 March 2012

Lion Weirdness

We've got 2 Lion machines in the office - one is my laptop, and I discovered one of the newer iMacs is also running Lion.

I obviously started experimenting with backups on my laptop and used a named account (chris) to do the Time Machine backup, and it worked OK. Everywhere else uses a generic "tm" account to connect to the server, and this also works.

However, I thought I should standardise, so I changed the ownership of my backup on the server and connected with the "tm" user - and it errored, saying that the server didn't support some AFP features it needed. So I changed the ownership back to "chris" and it went back to working.

So, today, I tried to set up the other Lion machine in the office, and it gave the same error! Creating a new user account on the server and connecting with that made it work.

Lots of weirdness, since, as far as I can tell, there really isn't much of a difference.

But hey, it works :)

Only 6 more machines to set up...

Tuesday, 13 March 2012

Time Machine on OpenIndiana

We've been using a few ReadyNAS boxes for Time Machine for a while now, but they're not without their problems - they're slow, and a bit limited in capacity. They are getting on a bit - they're the 1000S model, so pretty much the original; they even pre-date Netgear's acquisition of Infrant! They also don't work very well with Lion for Time Machine.

So I've been playing with OpenSolaris, and now OpenIndiana, since I love ZFS - we use Sun/Oracle 7000 series storage on our production systems, so having something equivalent in the office is sensible.

So, to build a Time Machine server, the main component needed is an Apple Filing Protocol (AFP) server.

OpenIndiana has a good wiki for documentation, and http://wiki.openindiana.org/oi/Using+OpenIndiana+as+a+storage+server and http://wiki.openindiana.org/oi/Netatalk are very good guides for getting Netatalk up and running.

Netatalk works out of the box - merely create a tm user account, and add a line to
/usr/local/etc/netatalk/AppleVolumes.default
consisting of:
/space2/timemachine/ timemachine options:tm

where /space2/timemachine/ is a zfs filesystem specially created.
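
Creating the filesystem and the tm account is along these lines (the useradd options and ownership are assumptions - any local account will do):

pfexec zfs create space2/timemachine
pfexec useradd -d /export/home/tm -m tm
pfexec passwd tm
pfexec chown tm /space2/timemachine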

One final tweak, in /usr/local/etc/netatalk/afpd.conf, is to change the following line

- -tcp -noddp -uamlist uams_dhx.so,uams_dhx2.so -nosavepassword -setuplog "default log_debug"

to

- -tcp -noddp -uamlist uams_dhx.so,uams_dhx2_passwd.so -nosavepassword -setuplog "default log_debug"

This stops the daemon crashing at odd points.

Time Machine Monitoring

Time Machine is great - just set it up and forget about it, and it backs up all your files automatically.

In theory.

In practice it can just stop backing up, for no apparent reason. The backup disk can fill up. Someone can disable it. There can be errors. And my users don't necessarily report any of this, so I have no way of knowing.

So, we need monitoring.

I found this post:
http://smoove-operator.blogspot.com/2010/09/monitoring-timemachine-backups-with.html
which grepped the logs looking for the end of a backup and uploaded that information to the Nagios server, which monitored it.

Interesting, I thought, but not quite what I'm after. Most of our Macs are desk-bound, so they're on the office network. And they're generally switched off at night, so any periodic job is likely to fail to run at the right moment.

All the script is really doing is returning a timestamp of the last backup done, so how about using SNMP to return that to the monitoring server?

Easy enough to adapt the script:


#!/usr/bin/env ruby
# Get the last backup time we have, with no newline
last_backup = `/usr/bin/syslog -T sec -F '$Time - $Sender - $Message' | grep backupd | grep 'Backup completed' | tail -1`
last_backup.chomp!
# Make sure it exists - output 0 if not
if !last_backup.eql? ""
  # Get the unix timestamp out of the last message
  backup_stamp = (last_backup.split "-")[0]
  puts backup_stamp
else
  puts 0
end

First off, I've re-written it in Ruby - just a personal (and company) preference. It now just outputs the seconds since the epoch of the last successful backup.

Stick it into /usr/local/bin/tm_check and add this to /etc/snmp/snmpd.conf:

exec tm_check /usr/local/bin/tm_check
Then start up snmpd:

sudo launchctl load -w /System/Library/LaunchDaemons/org.net-snmp.snmpd.plist


and the client side is ready to go.
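
A quick way to test it from the monitoring server is an snmpget against the UCD extension table, where exec output ends up (the community string and hostname are placeholders):

snmpget -v 2c -c public some-mac.example.com 'UCD-SNMP-MIB::extOutput.1'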

We use Nagios for monitoring our servers etc. However, there are a few caveats which occurred to me when thinking about desktop machines:

1. I don't care if the machine is up or down. In fact, I really don't want to be in a position where that is recorded at all - it is too close to watching what the employee is doing, i.e. when they arrive in the morning and leave at night. Not my job! So I need to stop host checks.

2. Similarly, if the SNMP probe doesn't return, then the machine is probably off, so let's not worry. The check script records the last backup date in a file, and if the SNMP request times out then the cached date from the file is returned - this is valid since that date is the worst case.

3. Don't tell me by email, and especially not by SMS. We have a screen on the wall which shows current alerts (using NagLite) and any failures will be shown there. So my host template has:


  notification_options          n
  active_checks_enabled         0

included in it, and

  notification_options          n

in the service template.

Finally, the check script - I've just adapted one of the existing ones to give a framework and added in the snmpget to get the last date, so it's still in Perl :) Get it here.

The cache directory is /var/cache/nagios3/tm_cache/ - there's one file per host (to prevent file-updating race conditions).
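
The logic of the check boils down to something like the sketch below - this isn't the real (Perl) script, and the OID, community string and three-day threshold are all assumptions:

#!/bin/sh
# Sketch of the Time Machine check - snmpget, with a fallback to the cached date
HOST=$1
CACHE=/var/cache/nagios3/tm_cache/$HOST
STAMP=$(snmpget -v 2c -c public -Oqv "$HOST" 'UCD-SNMP-MIB::extOutput.1' 2>/dev/null)
if [ -n "$STAMP" ]; then
  echo "$STAMP" > "$CACHE"          # got an answer - refresh the cache
elif [ -f "$CACHE" ]; then
  STAMP=$(cat "$CACHE")             # no answer - machine probably off, use the cached date
else
  echo "TIMEMACHINE UNKNOWN - no data yet for $HOST"; exit 3
fi
AGE=$(( $(date +%s) - STAMP ))
if [ "$AGE" -gt 259200 ]; then      # warn after 3 days without a backup (made-up threshold)
  echo "TIMEMACHINE WARNING - last backup $AGE seconds ago"; exit 1
fi
echo "TIMEMACHINE OK - last backup $AGE seconds ago"; exit 0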

And that's about it!