However, it was causing an issue whereby it kept getting caught in a restart loop, and every failed restart was being logged - by about lunchtime it had filled up the whole server with around 170GB of log file. Not healthy.
Eventually tracked it down to a problem with logrotate - I had configured it thus:
/var/log/unicorn/appname.*.log {
    daily
    missingok
    rotate 52
    compress
    delaycompress
    notifempty
    create 640 vault vault
    postrotate
        [ ! -f /var/run/unicorn/appname.pid ] || kill -USR1 `cat /var/run/unicorn/appname.pid`
    endscript
}
However, after seeing this post, I discovered that the postrotate part was at fault - logrotate rotated the first log file, then sent the signal, which told unicorn to reopen all its file handles. But before that could complete, it rotated the next log file and sent the same signal, catching unicorn in the midst of reopening files and telling it to do the same again. The upshot was that it kept restarting, failing, and logging the fact.
So I've changed postrotate to lastaction, which only runs the script after all three log files have been rotated - a single USR1 signal makes unicorn reopen all of its log files, so there's no need to send it more than once.
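For reference, the stanza now looks something like this - the same config as above, just with postrotate swapped for lastaction (pid file path and the rest assumed unchanged):

/var/log/unicorn/appname.*.log {
    daily
    missingok
    rotate 52
    compress
    delaycompress
    notifempty
    create 640 vault vault
    lastaction
        [ ! -f /var/run/unicorn/appname.pid ] || kill -USR1 `cat /var/run/unicorn/appname.pid`
    endscript
}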
Should be all fixed now - with Chef updated so that all the right scripts are in place.
One nice side effect of this, though, is that I've also done some more work on using Graphite to display more nice graphs. When I started seeing this problem again last week, I knew that a graph of rapidly rising disk usage would have alerted me to it, and it could have been fixed before space ran out. Nice in theory, though in practice, since it happened at a weekend, I wasn't looking. However, I now have Graphite graphs displayed on our monitoring view, which used to be powered by CactiView but is now powered by a mixture of CactiView and a hacked-up version which displays Graphite graphs as well. It needs some more work to make it nicer, but it's there and it works. Graphite is great - more on this in a later post.
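As a rough idea of what that involves: a Graphite graph is just an image you can pull from its render API, so embedding one in a dashboard is a single URL - something along these lines (hostname and metric name here are made up):

http://graphite.example.com/render?target=servers.appserver1.diskspace.root.bytes_used&from=-24hours&width=500&height=200&title=disk%20usage

Tweak target, from, width and height and you can drop a graph into pretty much any monitoring page.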