HiBench 7.0 Build and Run: A Step-by-Step Guide


This blog is a step-by-step guide to building and running HiBench 7.0 (https://github.com/intel-hadoop/HiBench) on a Big Data cluster.

The Big Data cluster used here was deployed with Bigtop 1.3.0. The components I installed are:
  • Hadoop: 2.8.4
  • Spark: 2.2.1

For how I deployed Bigtop 1.3.0 on multiple physical nodes, see my other blog post: https://collaborate.linaro.org/pages/viewpage.action?pageId=115311164

1. HiBench Build


This installs HiBench 7.0 on the Bigtop master node.

1.1. Install Maven

Maven 3.5.x is required for Bigtop. Download and install it:
# cd /usr/local/src
# tar -xf apache-maven-3.5.4-bin.tar.gz
# mv apache-maven-3.5.4/ apache-maven/
# cd /etc/profile.d/
# vim maven.sh
Put the following content into maven.sh:
    # Apache Maven Environment Variables
    # MAVEN_HOME for Maven 1 - M2_HOME for Maven 2
    export M2_HOME=/usr/local/src/apache-maven
    export PATH=${M2_HOME}/bin:${PATH}
# chmod +x maven.sh
# source /etc/profile.d/maven.sh
# mvn --version
Apache Maven 3.5.4 ...

1.2. Build HiBench 7.0

$ git clone https://github.com/intel-hadoop/HiBench.git
$ cd HiBench
$ git checkout -b working-hibench-7.0 HiBench-7.0
$ sudo yum -y install bc vim
$ mvn -Dspark=2.2 -Dscala=2.11 clean package
[INFO] BUILD SUCCESS
Note: the Spark version 2.2 comes from bigtop.bom.
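To sanity-check the build afterwards, you can list the jars Maven produced. This is only a rough check: the exact artifact names vary by module and version, but everything lands under the modules' target/ directories.

```shell
# List jars produced by the build (run from the HiBench root).
# Maven places build output under each module's target/ directory.
jars=$(find . -name "*.jar" -path "*/target/*" | head)
echo "$jars"
```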

2. HiBench Benchmarking


2.1. Configure hadoop.conf and spark.conf

$ cd HiBench

2.1.1. Hadoop.conf

$ cp conf/hadoop.conf.template conf/hadoop.conf
$ vi conf/hadoop.conf

Set the following keys (value used here listed first, description in parentheses):
  • hibench.hadoop.home: /usr/lib/hadoop (the Hadoop installation location)
  • hibench.hadoop.executable: ${hibench.hadoop.home}/bin/hadoop (the path of the hadoop executable; for Apache Hadoop it is /YOUR/HADOOP/HOME/bin/hadoop)
  • hibench.hadoop.configure.dir: ${hibench.hadoop.home}/etc/hadoop (the Hadoop configuration directory; for Apache Hadoop it is /YOUR/HADOOP/HOME/etc/hadoop)
  • hibench.hdfs.master: hdfs://d05-001.bigtop.deploy:8020 (the root HDFS path to store HiBench data, e.g. hdfs://localhost:8020/user/username)
  • hibench.hadoop.release: apache (the Hadoop release provider; supported values: apache, cdh5, hdp)
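Put together, conf/hadoop.conf for this cluster ends up looking like the fragment below (the HDFS host d05-001.bigtop.deploy is specific to my deployment; substitute your own NameNode):

```
hibench.hadoop.home           /usr/lib/hadoop
hibench.hadoop.executable     ${hibench.hadoop.home}/bin/hadoop
hibench.hadoop.configure.dir  ${hibench.hadoop.home}/etc/hadoop
hibench.hdfs.master           hdfs://d05-001.bigtop.deploy:8020
hibench.hadoop.release        apache
```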

2.1.2. Spark.conf

$ cp conf/spark.conf.template conf/spark.conf
$ vi conf/spark.conf
hibench.spark.home      /usr/lib/spark
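Besides hibench.spark.home, the template also carries a hibench.spark.master key. For a YARN-backed Bigtop cluster something like the following is typical; yarn-client is an assumption here, so check your template's default before changing it:

```
hibench.spark.home    /usr/lib/spark
hibench.spark.master  yarn-client
```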

2.2. Additional Steps to Fix Known Issues

Please go through each of these subsections and apply the fixes they describe; otherwise, problems may show up when running the benchmarks.

2.2.1. Set `hibench.hadoop.examples.test.jar`

Bigtop-deployed Hadoop lives in /usr/lib/hadoop, which puts the jar referenced by `hibench.hadoop.examples.test.jar` in a different location than HiBench expects, so the setting needs to be adjusted.
  • Solution 1:
Add the setting to conf/hibench.conf:
$ vi conf/hibench.conf
hibench.hadoop.examples.test.jar                 /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-2.8.4-tests.jar
  • Solution 2:
Modify bin/functions/load_config.py:
$ vi bin/functions/load_config.py
diff --git a/bin/functions/load_config.py b/bin/functions/load_config.py
index 61101dc..041e8e6 100755
--- a/bin/functions/load_config.py
+++ b/bin/functions/load_config.py
@@ -423,7 +423,7 @@ def probe_hadoop_examples_test_jars():
        examples_test_jars_candidate_hdp0 = HibenchConf[
            'hibench.hadoop.home'] + "/../hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient-tests.jar"
        examples_test_jars_candidate_hdp1 = HibenchConf[
-            'hibench.hadoop.home'] + "/../hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar"
+            'hibench.hadoop.home'] + "/../hadoop-mapreduce/hadoop-mapreduce-client-jobclient*-tests.jar"

        examples_test_jars_candidate_list = [
            examples_test_jars_candidate_apache0,
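Whichever solution you pick, it helps to first confirm where the jar actually is. The path below matches the Bigtop 1.3.0 layout used in Solution 1; adjust it to your install if it differs.

```shell
# Confirm the jobclient tests jar exists where Solution 1 expects it
# (the /usr/lib/hadoop-mapreduce path is the Bigtop layout; adjust as needed).
jar=$(ls /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar 2>/dev/null | head -n 1)
if [ -n "$jar" ]; then
    echo "found: $jar"
else
    echo "tests jar not found under /usr/lib/hadoop-mapreduce" >&2
fi
```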

2.2.2. JAVA_HOME not set

Need to set JAVA_HOME.
$ sudo vi /etc/profile.d/javahome.sh
export JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")
$ sudo chmod a+x /etc/profile.d/javahome.sh
$ . /etc/profile.d/javahome.sh

2.2.3. HDFS Permission Denied

A message like this appears when running wordcount's prepare.sh as user 'guodong':
org.apache.hadoop.security.AccessControlException: Permission denied: user=guodong, access=WRITE, inode="/user":hdfs:hadoop:drwxr-xr-x
It means user 'guodong' wants to WRITE into the HDFS folder '/user', but '/user' is owned by user 'hdfs' and group 'hadoop' with permission drwxr-xr-x, and 'guodong' does not belong to group 'hadoop'. To fix the issue, make '/user' writable by group and others (run this as a user allowed to change it, e.g. the HDFS superuser):
$ hadoop fs -chmod 777 hdfs://d05-001.bigtop.deploy:8020/user
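An alternative, narrower fix is to give the user a home directory they own instead of loosening '/user' itself. This sketch assumes the standard 'hdfs' superuser account that a stock Bigtop deployment creates:

```shell
# Create a per-user HDFS home directory as the HDFS superuser
# ('hdfs' is the superuser account in a stock Bigtop deployment).
sudo -u hdfs hadoop fs -mkdir -p /user/guodong
sudo -u hdfs hadoop fs -chown guodong:guodong /user/guodong
```

With this in place, HiBench's data lands under /user/guodong and '/user' keeps its restrictive permissions.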

2.2.4. Spark ClassNotFoundException

When running ./bin/workloads/micro/wordcount/spark/run.sh, 'spark-submit' fails with:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream
To fix it, SPARK_DIST_CLASSPATH must include Hadoop's jars. In the HiBench/Bigtop environment, run:
$ export SPARK_DIST_CLASSPATH=$(hadoop classpath)
Also, make it run automatically on each reboot:
$ sudo vi /etc/profile.d/sparkdistclasspath.sh
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
$ sudo chmod a+x /etc/profile.d/sparkdistclasspath.sh
$ . /etc/profile.d/sparkdistclasspath.sh

2.3. Benchmark Running

Note: please ensure firewalld is disabled on all machines; a reboot can bring firewalld back up.
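On a systemd-based distro (such as the CentOS hosts typically used for a Bigtop cluster; adjust for your OS), disabling firewalld persistently looks like this:

```shell
# Stop firewalld now and keep it off across reboots (run on every node)
sudo systemctl stop firewalld
sudo systemctl disable firewalld
```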

2.3.1. Hadoop: Micro/wordcount

$ ./bin/workloads/micro/wordcount/prepare/prepare.sh
$ ./bin/workloads/micro/wordcount/hadoop/run.sh

2.3.2. Spark: Micro/wordcount

$ ./bin/workloads/micro/wordcount/prepare/prepare.sh
$ ./bin/workloads/micro/wordcount/spark/run.sh

2.4. Run_all.sh

Modify conf/benchmarks.lst to include only the benchmarks you need, and conf/frameworks.lst to contain only the frameworks you need.
Then, run:
$ ./bin/run_all.sh
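For example, the two list files might be trimmed down like this (workload names follow the conf/workloads/ directory layout; check the shipped templates for the full set):

```
# conf/benchmarks.lst -- one workload per line; '#' comments a line out
micro.wordcount
micro.sort
micro.terasort
#websearch.nutchindexing

# conf/frameworks.lst -- frameworks to run each workload on
hadoop
spark
```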

2.4.1. Physical memory for each container

The default physical memory for each container is 1 GB. That is enough for most test tasks, but not for nutchindexing; please refer to the later section "Nutchindexing: Beyond Physical Memory Limits" for the error messages that pop up.
To fix it, raise the JVM heap for map and reduce tasks. In /usr/lib/hadoop/etc/hadoop/mapred-site.xml, these two parameters default to 1 GB:
   mapreduce.map.java.opts      -Xmx1024m
   mapreduce.reduce.java.opts   -Xmx1024m
Change both to -Xmx4096m (4 GB):
$ sudo vi /usr/lib/hadoop/etc/hadoop/mapred-site.xml
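If YARN still kills containers after raising the heap, the container sizes themselves may also need to grow, since the container limit must exceed the JVM heap. The property names below are standard Hadoop 2.x; the values are a starting point I have not tuned for this cluster:

```xml
<!-- mapred-site.xml: container memory must be larger than the -Xmx heap -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>5120</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>5120</value>
</property>
```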

=========
~Finished~
=========
