I'm using Varnish to cache the frontend of a site that a client is busy promoting. It does a great job of reducing requests to my backend but is prone to random crashes. I normally get about two weeks of uptime on this particular server, which is significantly lower than other places that I've deployed Varnish.
I just don't have enough information to work with to try and solve why the random crash is occurring. The system log shows that a child process doesn't respond to CLI and so is killed. The child never seems to be able to be brought up again.
My /var/log/messages file looks like this:
Which doesn't give me a lot to work with. I couldn't find anything in the documentation about this sort of problem. I don't want to uninstall Varnish so I decided to rather look for a way to monitor the process.
I first tried Monit but after about two weeks my site was down. After sshing onto the box and restarting Varnish I checked the monit logs. Although it was able to recognize that Varnish had crashed, it was not able to successfully bring it back up.
My Monit log looked like this:
My problem sounded a lot like this one on ServerFault so I looked for another way to monitor the process other than using Monit.
Instead of using daemonize, supervisord, or another similar program I'm trying out a simple shell script that I found at http://blog.unixy.net/2010/05/dirty-varnish-monitoring-script/. The author says it's dirty, and I suppose it is, but it has the advantage of being dead simple and easy to control. I've set it up as a cron job to run every five minutes. Hopefully this will be a more effective way to make sure that Varnish doesn't stay dead for very long.
In case the source file goes down I saved a copy as a Gist:
I just don't have enough information to work with to try and solve why the random crash is occurring. The system log shows that a child process doesn't respond to CLI and so is killed. The child never seems to be able to be brought up again.
My /var/log/messages file looks like this:
08:31:45 varnishd[7888]: Child (16669) not responding to CLI, killing it.
08:31:45 varnishd[7888]: Child (16669) died signal=3
08:31:45 varnishd[7888]: child (25675) Started
08:31:45 varnishd[7888]: Child (25675) said Child starts
08:31:45 varnishd[7888]: Child (25675) said SMF.s0 mmap'ed 1073741824 bytes of 1073741824
08:32:19 varnishd[7888]: Child (25675) not responding to CLI, killing it.
08:32:21 varnishd[7888]: Child (25675) not responding to CLI, killing it.
08:32:21 varnishd[7888]: Child (25675) died signal=3
Which doesn't give me a lot to work with. I couldn't find anything in the documentation about this sort of problem. I don't want to uninstall Varnish so I decided to rather look for a way to monitor the process.
I first tried Monit but after about two weeks my site was down. After sshing onto the box and restarting Varnish I checked the monit logs. Although it was able to recognize that Varnish had crashed, it was not able to successfully bring it back up.
My Monit log looked like this:
[BST Apr 23 09:07:24] error : 'varnish' process is not running
[BST Apr 23 09:07:24] info : 'varnish' trying to restart
[BST Apr 23 09:07:24] info : 'varnish' start: /etc/init.d/varnish
[BST Apr 23 09:07:54] error : 'varnish' failed to start
[BST Apr 23 09:08:54] error : 'varnish' process is not running
[BST Apr 23 09:08:54] info : 'varnish' trying to restart
[BST Apr 23 09:08:54] info : 'varnish' start: /etc/init.d/varnish
[BST Apr 23 09:09:24] error : 'varnish' failed to start
[BST Apr 23 09:10:24] error : 'varnish' process is not running
[BST Apr 23 09:10:24] info : 'varnish' trying to restart
[BST Apr 23 09:10:24] info : 'varnish' start: /etc/init.d/varnish
[BST Apr 23 09:10:54] error : 'varnish' failed to start
[BST Apr 23 09:11:54] error : 'varnish' service restarted 3 times within 3 cycles(s) - unmonitor
My problem sounded a lot like this one on ServerFault so I looked for another way to monitor the process other than using Monit.
Instead of using daemonize, supervisord, or another similar program I'm trying out a simple shell script that I found at http://blog.unixy.net/2010/05/dirty-varnish-monitoring-script/. The author says it's dirty, and I suppose it is, but it has the advantage of being dead simple and easy to control. I've set it up as a cron job to run every five minutes. Hopefully this will be a more effective way to make sure that Varnish doesn't stay dead for very long.
In case the source file goes down I saved a copy as a Gist:
Comments
Post a Comment