I'm using Varnish to cache a fairly busy property site. Varnish works like a bomb for normal users and has greatly improved our page load speed.
For bots that are scraping the site, presumably to add the property listings to their own sites, the cache is next to useless, since the bots trawl sequentially through the whole site.
I decided to use fail2ban to block IPs that hit the site too often.
The first step was to enable a disk-based access log for Varnish so that fail2ban has something to work with.
This means setting up varnishncsa. Add this to your /etc/rc.local file:
varnishncsa -a -w /var/log/varnish/access.log -D -P /var/run/varnishncsa.pid
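This starts varnishncsa in daemon mode (-D), appends (-a) each request to /var/log/varnish/access.log (-w), and writes its process ID to /var/run/varnishncsa.pid (-P), which the logrotate entry uses to signal the daemon.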
Now edit or create /etc/logrotate.d/varnish and make an entry to rotate this access log:
/var/log/varnish/*log {
create 640 http log
compress
postrotate
/bin/kill -USR1 `cat /var/run/varnishncsa.pid 2>/dev/null` 2> /dev/null || true
endscript
}
Install fail2ban. On Ubuntu you can run apt-get install fail2ban.
Edit /etc/fail2ban/jail.conf and add a block like this:
[http-get-dos]
enabled = true
port = http,https
filter = http-get-dos
logpath = /var/log/varnish/access.log
maxretry = 300
findtime = 300
# ban for 10 minutes
bantime = 600
action = iptables[name=HTTP, port=http, protocol=tcp]
This means that if an IP address makes 300 (maxretry) GET requests within 300 (findtime) seconds, it is banned for 600 (bantime) seconds, i.e. 10 minutes.
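To get a feel for how maxretry and findtime interact, here is a rough model of the counting in shell and awk. It is a simplification assumed for illustration, not fail2ban's actual implementation, and the function name would_ban is hypothetical. It reads the request times of a single IP address (epoch seconds, ascending, one per line) and reports whether any window of findtime seconds contains at least maxretry requests.

```shell
#!/bin/sh
# would_ban MAXRETRY FINDTIME
# Hypothetical helper: reads ascending epoch timestamps for ONE client on stdin
# and prints "ban" if MAXRETRY hits ever fall inside a FINDTIME-second window,
# otherwise "ok". A simplified model only.
would_ban() {
    awk -v maxretry="$1" -v findtime="$2" '
        {
            t[NR] = $1
            # Forget hits that have dropped out of the sliding window.
            while (t[NR] - t[start + 1] >= findtime) start++
            # NR - start is the number of hits still inside the window.
            if (NR - start >= maxretry) { banned = 1; exit }
        }
        END { print (banned ? "ban" : "ok") }
    '
}
```

With the jail above you would call it as would_ban 300 300. A client averaging one request per second or more for five minutes trips the ban, while a visitor who spreads the same number of requests over a longer period does not.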
Next, create the filter /etc/fail2ban/filter.d/http-get-dos.conf, which defines the pattern the jail matches against the log:
# Fail2Ban configuration file
#
# Author: http://www.go2linux.org
#
[Definition]
# Option: failregex
# Note: This regex will match any GET entry in your logs, so basically all valid and not valid entries are a match.
# You should set up in the jail.conf file, the maxretry and findtime carefully in order to avoid false positives.
failregex = ^<HOST>.*"GET
# Option: ignoreregex
# Notes.: regex to ignore. If this regex matches, the line is ignored.
# Values: TEXT
#
ignoreregex =
Now let's test the regex against the log file to check that it correctly picks up the IP addresses of the visitors:
fail2ban-regex /var/log/varnish/access.log /etc/fail2ban/filter.d/http-get-dos.conf
You should see a list of IP addresses and times followed by summary statistics.
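As a cross-check on those numbers, and to help choose a sensible maxretry, you can count GET requests per client straight from the log. This is a minimal sketch, assuming the log is in the default NCSA layout with the client address as the first whitespace-separated field. The function name top_clients is made up for illustration.

```shell
#!/bin/sh
# top_clients LOGFILE [N]
# Hypothetical helper: print the N busiest client addresses (default 10)
# followed by their GET request counts, busiest first.
# Assumes the client address is the first field of each log line.
top_clients() {
    logfile=$1
    limit=${2:-10}
    grep '"GET' "$logfile" \
        | awk '{ print $1 }' \
        | sort \
        | uniq -c \
        | sort -rn \
        | head -n "$limit" \
        | awk '{ print $2, $1 }'
}
```

For example, top_clients /var/log/varnish/access.log 20 lists the 20 busiest addresses. The counts cover the whole file rather than a single findtime window, so treat them as a rough guide only.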
When you restart fail2ban your scraper protection should be up and running.