
Issues I met when running NutchIndexing and How I fixed them

1. Background

Recently I set up a Big Data cluster (using Bigtop 1.3.0) with three Arm servers and tried to run HiBench on it. My goal is to make all the benchmarks in HiBench 7.0 run and pass on Arm servers. It all went well until I got to NutchIndexing.

NutchIndexing is a benchmark which "tests the indexing sub-system in Nutch, a popular open source (Apache project) search engine. The workload uses the automatically generated Web data whose hyperlinks and words both follow the Zipfian distribution with corresponding parameters."

This post lists all the issues I met when running NutchIndexing, and how I fixed them. For information about how to set up a cluster with Bigtop 1.3.0 and how to install and run HiBench 7.0, please see my other posts, listed here.

Overall, my test setup looks like this: 'Head Node' is the master, and 'Node 2' and 'Node 3' are the slaves. From here on, I will refer to them as node-001, node-002, and node-003.

Profile s