Skip to main content

Monitoring Varnish for random crashes

I'm using Varnish to cache the frontend of a site that a client is busy promoting.  It does a great job of reducing requests to my backend but is prone to random crashes.  I normally get about two weeks of uptime on this particular server, which is significantly lower than other places that I've deployed Varnish.

I just don't have enough information to work with to try and solve why the random crash is occurring.  The system log shows that a child process doesn't respond to CLI and so is killed.  The child never seems to be able to be brought up again.

My /var/log/messages file looks like this:

 08:31:45 varnishd[7888]: Child (16669) not responding to CLI, killing it.  
 08:31:45 varnishd[7888]: Child (16669) died signal=3  
 08:31:45 varnishd[7888]: child (25675) Started  
 08:31:45 varnishd[7888]: Child (25675) said Child starts  
 08:31:45 varnishd[7888]: Child (25675) said SMF.s0 mmap'ed 1073741824 bytes of 1073741824  
 08:32:19 varnishd[7888]: Child (25675) not responding to CLI, killing it.  
 08:32:21 varnishd[7888]: Child (25675) not responding to CLI, killing it.  
 08:32:21 varnishd[7888]: Child (25675) died signal=3  

Which doesn't give me a lot to work with.  I couldn't find anything in the documentation about this sort of problem.  I don't want to uninstall Varnish so I decided to rather look for a way to monitor the process.

I first tried Monit but after about two weeks my site was down.  After sshing onto the box and restarting Varnish I checked the monit logs.  Although it was able to recognize that Varnish had crashed, it was not able to successfully bring it back up.

My Monit log looked like this:

 [BST Apr 23 09:07:24] error  : 'varnish' process is not running  
 [BST Apr 23 09:07:24] info   : 'varnish' trying to restart  
 [BST Apr 23 09:07:24] info   : 'varnish' start: /etc/init.d/varnish  
 [BST Apr 23 09:07:54] error  : 'varnish' failed to start  
 [BST Apr 23 09:08:54] error  : 'varnish' process is not running  
 [BST Apr 23 09:08:54] info   : 'varnish' trying to restart  
 [BST Apr 23 09:08:54] info   : 'varnish' start: /etc/init.d/varnish  
 [BST Apr 23 09:09:24] error  : 'varnish' failed to start  
 [BST Apr 23 09:10:24] error  : 'varnish' process is not running  
 [BST Apr 23 09:10:24] info   : 'varnish' trying to restart  
 [BST Apr 23 09:10:24] info   : 'varnish' start: /etc/init.d/varnish  
 [BST Apr 23 09:10:54] error  : 'varnish' failed to start  
 [BST Apr 23 09:11:54] error  : 'varnish' service restarted 3 times within 3 cycles(s) - unmonitor  

My problem sounded a lot like this one on ServerFault so I looked for another way to monitor the process other than using Monit.

Instead of using daemonize, supervisord, or another similar program I'm trying out a simple shell script that I found at http://blog.unixy.net/2010/05/dirty-varnish-monitoring-script/.  The author says it's dirty, and I suppose it is, but it has the advantage of being dead simple and easy to control.   I've set it up as a cron job to run every five minutes.  Hopefully this will be a more effective way to make sure that Varnish doesn't stay dead for very long.

In case the source file goes down I saved a copy as a Gist:

Comments

Popular posts from this blog

Separating business logic from persistence layer in Laravel

There are several reasons to separate business logic from your persistence layer.  Perhaps the biggest advantage is that the parts of your application which are unique are not coupled to how data are persisted.  This makes the code easier to port and maintain. I'm going to use Doctrine to replace the Eloquent ORM in Laravel.  A thorough comparison of the patterns is available  here . By using Doctrine I am also hoping to mitigate the risk of a major version upgrade on the underlying framework.  It can be expected for the ORM to change between major versions of a framework and upgrading to a new release can be quite costly. Another advantage to this approach is to limit the access that objects have to the database.  Unless a developer is aware of the business rules in place on an Eloquent model there is a chance they will mistakenly ignore them by calling the ActiveRecord save method directly. I'm not implementing the repository pattern in all its ...

"Word of the Day" PHP script (with word list)

I was looking around for a way to generate a word of the day on the web and didn't find anything. So I coded a quick and dirty script to do it. Just in case anybody does a Google search and manages to find my blog: here is my Word of the Day PHP script : Copy this code snippet into a wordoftheday.php file: $file = fopen("interesting_words.txt","r"); $raw_string = fread($file,filesize("interesting_words.txt")); fclose($file); $words_array = explode("|",$raw_string); echo $words_array[array_rand($words_array)]; Of course the real issue I had was finding a list of interesting words in the right format. Here is the list of interesting words that I used: Copy this into a file called interesting_words.txt : ubiquitous : being or seeming to be everywhere at the same time; omnipresent| ecdysiast : a striptease artist| eleemosynary : of, relating to, or dependent on charity| gregious : c...

Solving Doctrine - A new entity was found through the relationship

There are so many different problems that people have with the Doctrine error message: exception 'Doctrine\ORM\ORMInvalidArgumentException' with message 'A new entity was found through the relationship 'App\Lib\Domain\Datalayer\UnicodeLookups#lookupStatus' that was not configured to cascade persist operations for entity: Searching through the various online sources was a bit of a nightmare.  The best documentation I found was at  http://www.krueckeberg.org/  where there were a number of clearly explained examples of various associations. More useful information about association ownership was in the Doctrine manual , but I found a more succinct explanation in the answer to this question on StackOverflow . Now I understood better about associations and ownership and was able to identify exactly what sort I was using and the syntax that was required. I was implementing a uni-directional many to one relationship, which is supposedly one of the most simpl...