Blocking web crawlers on lighttpd

Nutch ignored my robots.txt (I was unable to figure out why), so I had to find another way to keep the crawler out of those directories. I finally came up with a neat piece of config for lighty that throws an HTTP 403 when a request matches both our defined User-Agent and URL.
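
A rule of this kind might look like the following sketch. It assumes that mod_access is loaded, that the crawler identifies itself with "Nutch" somewhere in its User-Agent header, and that the directories to protect are /private/ and /tmp/. The two paths are hypothetical placeholders, so substitute your own.

```
# Sketch only: the User-Agent pattern and the paths are assumptions.
server.modules += ( "mod_access" )

# Outer condition: the request comes from the crawler.
$HTTP["useragent"] =~ "Nutch" {
    # Inner condition: the request targets one of the protected directories.
    $HTTP["url"] =~ "^/(private|tmp)/" {
        # An empty suffix matches every URL, so everything here is denied with a 403.
        url.access-deny = ( "" )
    }
}
```

Nesting the two conditions means the 403 is only sent when both match, so ordinary visitors and other paths are unaffected. Something like `curl -I -A "Nutch" http://localhost/private/` should then show the 403 status line.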