HiBench 7.0 Build and Run: A Step-by-Step Guide
This blog is a step-by-step guide to building and running HiBench 7.0 (https://github.com/intel-hadoop/HiBench) on a Big Data cluster.
The Big Data cluster used here is deployed through Bigtop 1.3.0. The components I installed are:
- Hadoop: 2.8.4
- Spark: 2.2.1
For how I deployed Bigtop 1.3.0 on multiple physical nodes, see my other blog post at https://collaborate.linaro.org/pages/viewpage.action?pageId=115311164
1. HiBench Build
This section installs HiBench 7.0 on the Bigtop master node.
1.1. Install Maven
Bigtop requires Maven 3.5.x. Refer to the Apache Maven site for how to download and install it:
# cd /usr/local/src
# tar -xf apache-maven-3.5.4-bin.tar.gz
# mv apache-maven-3.5.4/ apache-maven/
# cd /etc/profile.d/
# vim maven.sh
# Apache Maven Environment Variables
# MAVEN_HOME for Maven 1 - M2_HOME for Maven 2
export M2_HOME=/usr/local/src/apache-maven
export PATH=${M2_HOME}/bin:${PATH}
# chmod +x maven.sh
# source /etc/profile.d/maven.sh
# mvn --version
Apache Maven 3.5.4 ...
1.2. Build HiBench-7.0:
$ git clone https://github.com/intel-hadoop/HiBench
$ cd HiBench
$ git checkout -b working-hibench-7.0 HiBench-7.0
$ sudo yum -y install bc vim
$ mvn -Dspark=2.2 -Dscala=2.11 clean package
...
[INFO] BUILD SUCCESS
Note: the Spark version (2.2) comes from bigtop.bom; the -Dspark/-Dscala flags follow the HiBench build documentation.
2. HiBench Benchmarking
2.1. Configure hadoop.conf and spark.conf
$ cd HiBench
2.1.1. hadoop.conf
$ cp conf/hadoop.conf.template conf/hadoop.conf
$ vi conf/hadoop.conf
key | description | value |
hibench.hadoop.home | The Hadoop installation location | /usr/lib/hadoop |
hibench.hadoop.executable | The path of the hadoop executable. For Apache Hadoop, it is /YOUR/HADOOP/HOME/bin/hadoop | ${hibench.hadoop.home}/bin/hadoop |
hibench.hadoop.configure.dir | Hadoop configuration directory. For Apache Hadoop, it is /YOUR/HADOOP/HOME/etc/hadoop | ${hibench.hadoop.home}/etc/hadoop |
hibench.hdfs.master | The root HDFS path to store HiBench data, e.g. hdfs://localhost:8020/user/username | hdfs://d05-001.bigtop.deploy:8020 |
hibench.hadoop.release | Hadoop release provider. Supported values: apache, cdh5, hdp | apache |
2.1.2. spark.conf
$ cp conf/spark.conf.template conf/spark.conf
$ vi conf/spark.conf
hibench.spark.home /usr/lib/spark
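For reference, a minimal conf/spark.conf for a YARN-backed Bigtop cluster might look like the sketch below. Only hibench.spark.home comes from this post; the hibench.spark.master key and its yarn-client value are assumptions taken from the HiBench template, so adjust them to your cluster.

```
# Spark installation location on the Bigtop deployment (from this post)
hibench.spark.home      /usr/lib/spark
# Run mode; yarn-client is the template's value for a YARN cluster (assumed)
hibench.spark.master    yarn-client
```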
2.2. Additional Steps to Fix Known Issues
Please work through each of the following subsections and apply its fix. Otherwise, problems may show up while running the benchmarks.
2.2.1. Set `hibench.hadoop.examples.test.jar`
The Hadoop deployed by Bigtop lives in /usr/lib/hadoop, so the jar referenced by `hibench.hadoop.examples.test.jar` sits in a different location than HiBench expects, and the setting needs to be adjusted.
- Solution 1:
Add the setting to conf/hibench.conf:
$ vi conf/hibench.conf
hibench.hadoop.examples.test.jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-2.8.4-tests.jar
- Solution 2:
Patch bin/functions/load_config.py:
$ vi bin/functions/load_config.py
diff --git a/bin/functions/load_config.py b/bin/functions/load_config.py
index 61101dc..041e8e6 100755
--- a/bin/functions/load_config.py
+++ b/bin/functions/load_config.py
@@ -423,7 +423,7 @@ def probe_hadoop_examples_test_jars():
examples_test_jars_candidate_hdp0 = HibenchConf[
'hibench.hadoop.home'] + "/../hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient-tests.jar"
examples_test_jars_candidate_hdp1 = HibenchConf[
- 'hibench.hadoop.home'] + "/../hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar"
+ 'hibench.hadoop.home'] + "/../hadoop-mapreduce/hadoop-mapreduce-client-jobclient*-tests.jar"
examples_test_jars_candidate_list = [
examples_test_jars_candidate_apache0,
2.2.2. JAVA_HOME Not Set
JAVA_HOME needs to be set system-wide:
$ sudo vi /etc/profile.d/javahome.sh
export JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")
$ sudo chmod a+x /etc/profile.d/javahome.sh
$ . /etc/profile.d/javahome.sh
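To see what the javahome.sh one-liner does: `readlink -f` resolves the /usr/bin/java symlink chain to the real binary, and `sed` strips the trailing "bin/java", leaving the JDK/JRE root (with a trailing slash). The resolved path below is a made-up example, not taken from this cluster:

```shell
# Hypothetical result of: readlink -f /usr/bin/java
resolved=/usr/lib/jvm/java-1.8.0-openjdk/jre/bin/java
# Strip the trailing "bin/java" to get the Java home directory
java_home=$(echo "$resolved" | sed "s:bin/java::")
echo "$java_home"   # -> /usr/lib/jvm/java-1.8.0-openjdk/jre/
```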
2.2.3. HDFS Permission Denied
A message like this appears when running the wordcount prepare.sh as user 'guodong':
org.apache.hadoop.security.AccessControlException: Permission denied: user=guodong, access=WRITE, inode="/user":hdfs:hadoop:drwxr-xr-x
It means user 'guodong' wants to WRITE into the HDFS folder '/user', but '/user' is owned by user 'hdfs' and group 'hadoop' with permission drwxr-xr-x, and 'guodong' does not belong to group 'hadoop'. To fix the issue, make '/user' writable by group and others:
$ hadoop fs -chmod 777 hdfs://d05-001.bigtop.deploy:8020/user
(A narrower alternative is to have the HDFS superuser create /user/guodong and chown it to 'guodong'.)
2.2.4. Spark ClassNotFoundException
When running ./bin/workloads/micro/wordcount/spark/run.sh, 'spark-submit' fails with:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream
An analysis of the cause can be found at https://spark.apache.org/docs/latest/hadoop-provided.html
To fix it, SPARK_DIST_CLASSPATH must include Hadoop's package jars. In the HiBench/Bigtop environment, run:
$ export SPARK_DIST_CLASSPATH=$(hadoop classpath)
Also, make it take effect automatically in every login shell:
$ sudo vi /etc/profile.d/sparkdistclasspath.sh
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
$ sudo chmod a+x /etc/profile.d/sparkdistclasspath.sh
$ . /etc/profile.d/sparkdistclasspath.sh
2.3. Benchmark Running
Note: please ensure firewalld is disabled on all machines; a reboot can bring firewalld back up.
2.3.1. Hadoop: Micro/wordcount
$ ./bin/workloads/micro/wordcount/prepare/prepare.sh
$ ./bin/workloads/micro/wordcount/hadoop/run.sh
2.3.2. Spark: Micro/wordcount
$ ./bin/workloads/micro/wordcount/prepare/prepare.sh
$ ./bin/workloads/micro/wordcount/spark/run.sh
2.4. Run_all.sh
Edit conf/benchmarks.lst to include only the benchmarks you need.
Edit conf/frameworks.lst to contain only the frameworks you need.
Then, run:
$ ./bin/run_all.sh
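For illustration, a trimmed conf/benchmarks.lst that runs only the wordcount test covered above might look like the sketch below. The one-entry-per-line, category.workload format is assumed from the workload paths used in this post; check your checkout's template before editing.

```
# Keep only the workloads you want, one per line; comment out the rest
micro.wordcount
#micro.sort
```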
2.4.1. Physical Memory for Each Container
The default physical memory for each container is 1 GB. Although that is enough for most test tasks, it does not work for nutchindexing. Please refer to the later section "Nutchindexing: Beyond Physical Memory Limits" for the error messages that pop up.
To fix it, there are two parameters in /usr/lib/hadoop/etc/hadoop/mapred-site.xml that control this; update both to 4096 (MB), i.e. 4 GB:
$ sudo vi /usr/lib/hadoop/etc/hadoop/mapred-site.xml
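The post does not name the two parameters at this point; in a stock Hadoop 2.x mapred-site.xml, the per-container physical memory limits are mapreduce.map.memory.mb and mapreduce.reduce.memory.mb, so a sketch of the change, under that assumption, is:

```
<!-- Assumed parameter names: the standard MapReduce per-container memory limits. -->
<!-- Values are in MB; 4096 = 4 GB. -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>4096</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>4096</value>
</property>
```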
=========
~Finished~
=========