Note: The information contained in this post may be outdated!
Nutch ignored my robots.txt (I was never able to figure out why), so I had to find another way to keep the crawler out of those directories.
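For context, the rules Nutch was supposed to honor looked roughly like this (a sketch; the /one/, /two/ and /three/ paths are the same placeholders used in the config below):

User-agent: *
Disallow: /one/
Disallow: /two/
Disallow: /three/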
I finally came up with this neat piece of config for lighty:
$HTTP["useragent"] =~ "(Nutch|Google|FooBar)" { $HTTP["url"] =~ "^(/one/|/two/|/three/)" { url.access-deny = ( "" ) } }
– this returns an HTTP 403 whenever both the User-Agent and the URL match the patterns defined above.
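You can check that the rule works by faking the User-Agent with curl (example.com stands in for your own host):

curl -I -A "Nutch" http://example.com/one/
# expected: HTTP/1.1 403 Forbidden

A request with a normal browser User-Agent, or to a path outside the blocked directories, should still go through as usual.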