Note: The information contained in this post may be outdated! Here’s a small shell script that runs the recrawl process in Nutch. You might have to change certain lines because I made some customizations, but it should work for you too 🙂
To get our meta descriptions displayed in the results, we need to write a plugin that extends two different extension points.
I thought an ideal solution would be to tell Nutch to ignore specific sections of a page. A common practice for this kind of thing is to wrap those sections in HTML comment tags.
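To make the comment-tag idea concrete, here is a minimal sketch in Python of what "ignore everything between two comment markers" could look like. This is not Nutch code and not the plugin itself: the marker names `<!-- noindex -->` and `<!-- /noindex -->` and the function `strip_ignored_sections` are assumptions made up for illustration, and a real Nutch setup would do this inside a parse plugin rather than with a standalone regular expression.

```python
import re

# Hypothetical marker names (an assumption for this sketch, not a Nutch convention).
START_MARKER = "<!-- noindex -->"
END_MARKER = "<!-- /noindex -->"

# Non-greedy match so that each start marker pairs with the nearest end marker.
# re.DOTALL lets the ignored section span multiple lines.
_IGNORED_SECTION = re.compile(
    re.escape(START_MARKER) + r".*?" + re.escape(END_MARKER),
    re.DOTALL,
)


def strip_ignored_sections(html: str) -> str:
    """Return html with every marked section (markers included) removed.

    A start marker without a matching end marker is left untouched.
    """
    return _IGNORED_SECTION.sub("", html)


if __name__ == "__main__":
    page = (
        "<p>Article text</p>"
        "<!-- noindex --><ul><li>Navigation</li></ul><!-- /noindex -->"
        "<p>More text</p>"
    )
    print(strip_ignored_sections(page))
```

The idea is that the page template marks navigation, footers and similar blocks once, and the text handed to the indexer no longer contains them. Nested markers are not handled in this sketch.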