Recrawl script for nutch

Note: The information contained in this post may be outdated!

Here’s a small shell script that runs the recrawl process in Nutch. I customized it for my own setup, so you might have to change certain lines, but it should work for you too 🙂

recrawl.sh
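
The script itself is not reproduced on this page. As a rough illustration of what one recrawl round typically looks like, here is a minimal sketch (not the original recrawl.sh). It assumes the Nutch 1.x command names (generate, fetch, parse, updatedb, invertlinks); the crawl directory layout, the variable names, the function name recrawl_round and the default values are all made up for illustration. The fetcher is called with -noParsing here, which is why a separate parse step follows; without that flag the fetcher parses by default and the parse call can be dropped.

```shell
#!/bin/sh
# Hypothetical sketch of a single recrawl round for Nutch 1.x.
# All paths and defaults below are assumptions; adjust them to your install.
NUTCH="${NUTCH:-bin/nutch}"        # path to the nutch launcher (assumed)
CRAWL_DIR="${CRAWL_DIR:-crawl}"    # holds crawldb/, linkdb/ and segments/ (assumed layout)
TOPN="${TOPN:-1000}"               # max URLs to generate per round (assumed value)
ADDDAYS="${ADDDAYS:-30}"           # pretend this many days have passed so pages are due again

recrawl_round() {
    # 1. Generate a new fetch list from the crawl database.
    $NUTCH generate "$CRAWL_DIR/crawldb" "$CRAWL_DIR/segments" \
        -topN "$TOPN" -adddays "$ADDDAYS" || return 1

    # 2. Segment names are timestamps, so the last one in sort order is the newest.
    segment=$(ls -d "$CRAWL_DIR"/segments/* | tail -n 1)

    # 3. Fetch without parsing, then parse in a separate step.
    $NUTCH fetch "$segment" -noParsing || return 1
    $NUTCH parse "$segment" || return 1

    # 4. Merge the fetch results back into the crawl database.
    $NUTCH updatedb "$CRAWL_DIR/crawldb" "$segment" || return 1

    # 5. Rebuild the link database from all segments.
    $NUTCH invertlinks "$CRAWL_DIR/linkdb" -dir "$CRAWL_DIR/segments"
}
```

To use the sketch, call recrawl_round at the end of the file (once, or in a loop for several rounds) and re-index afterwards with whatever indexing step your setup uses.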

2 thoughts on “Recrawl script for nutch”

  1. It looks like you run nutch parse, but your fetcher isn’t called with -noParsing. That means the separate parse call isn’t needed, because the fetcher parses by default.

  2. I want to set up automatic crawling in nutch-1.0 using a timer.
    How can I use your recrawl script?
    Please give me a few hints.
