<configuration>
  <property>
    <name>http.agent.name</name>
    <value>nutch sgripon.net</value>
  </property>
  <property>
    <name>plugin.includes</name>
    <value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|urlnormalizer-(pass|regex|basic)|scoring-opic|indexer-elastic</value>
    <description>Regular expression naming plugin directory names to
    include. Any plugin not matching this expression is excluded. In any
    case you need to include at least the nutch-extensionpoints plugin. By
    default Nutch includes plugins for crawling just HTML and plain text
    via HTTP, plus basic indexing and search plugins. To use HTTPS, enable
    protocol-httpclient, but be aware of possible intermittent problems
    with the underlying commons-httpclient library.</description>
  </property>
  <property>
    <name>elastic.host</name>
    <value>localhost</value>
    <description>The hostname to send documents to using TransportClient.
    Either host and port must be defined, or cluster.</description>
  </property>
  <property>
    <name>elastic.port</name>
    <value>9300</value>
    <description>The port to connect to using TransportClient.</description>
  </property>
  <property>
    <name>elastic.cluster</name>
    <value>elasticsearch</value>
    <description>The cluster name to discover. Either host and port must
    be defined, or cluster.</description>
  </property>
  <property>
    <name>elastic.index</name>
    <value>nutch</value>
    <description>The name of the elasticsearch index. Will normally be
    autocreated if it doesn't exist.</description>
  </property>
  <property>
    <name>elastic.max.bulk.docs</name>
    <value>250</value>
    <description>The number of docs in the batch that will trigger a flush
    to elasticsearch.</description>
  </property>
  <property>
    <name>elastic.max.bulk.size</name>
    <value>2500500</value>
    <description>The total length of all indexed text in a batch that will
    trigger a flush to elasticsearch. The total is checked after every
    document to see whether it exceeds this amount.</description>
  </property>
</configuration>
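
The following is a minimal Python sketch of two behaviours the descriptions above mention: matching plugin directory names against `plugin.includes`, and flushing a batch once `elastic.max.bulk.docs` or `elastic.max.bulk.size` is reached. It is not Nutch code. `is_plugin_included` and `BulkBuffer` are hypothetical names, the sketch assumes the expression must match the whole directory name, and it assumes "length of all indexed text" can be approximated by character count.

```python
import re

# Value of plugin.includes, copied from the configuration above.
PLUGIN_INCLUDES = (
    "protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)"
    "|urlnormalizer-(pass|regex|basic)|scoring-opic|indexer-elastic"
)


def is_plugin_included(plugin_dir, pattern=PLUGIN_INCLUDES):
    """Return True if the whole directory name matches the expression.

    Assumption: a full match is required, not a substring match.
    """
    return re.fullmatch(pattern, plugin_dir) is not None


class BulkBuffer:
    """Hypothetical stand-in for the indexer's batching logic."""

    def __init__(self, max_docs=250, max_size=2500500):
        self.max_docs = max_docs    # elastic.max.bulk.docs
        self.max_size = max_size    # elastic.max.bulk.size
        self.docs = []
        self.size = 0
        self.flushes = 0

    def add(self, text):
        self.docs.append(text)
        self.size += len(text)      # assumption: size is a character count
        # Both limits are checked after every document is added.
        if len(self.docs) >= self.max_docs or self.size > self.max_size:
            self.flush()

    def flush(self):
        # A real indexer would send the batch to Elasticsearch here.
        self.docs = []
        self.size = 0
        self.flushes += 1
```

Under the full-match assumption, `is_plugin_included("parse-tika")` is true while `is_plugin_included("protocol-httpclient")` is false, which is consistent with the note that HTTPS support has to be enabled explicitly.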