Wednesday, May 7, 2014

Hadoop Single Node Installation Notes

 
Environment: Windows 8.1 Professional, 6 GB memory, AMD Athlon 64 X2 Dual Core 5200+

 

Prerequisites:

  1. Install VirtualBox & the VirtualBox Extension Pack
  2. Install Ubuntu
    • Download the Ubuntu 13.10 Server (not Desktop) 32-bit ISO image from the official website.
    • Create a virtual machine with 2 GB RAM and 16 GB of disk space.
    • Install a new Ubuntu Server, accepting the default recommendations; in the final package-selection step, select OpenSSH server.
    • After the installation completes, boot the guest server to install the Guest Additions; this part is a bit fiddly.
    • Once Ubuntu is up, run the following in a terminal:
$ sudo apt-get update
$ sudo apt-get install dkms
$ sudo reboot


    • In the running guest window, open [Devices] --> [CD/DVD Devices] from the menu bar and mount VBoxGuestAdditions.iso in the virtual optical drive.
    • In Ubuntu, mount the disc and confirm it is mounted correctly (e.g. ls /media/cdrom):
$ sudo mount /dev/cdrom /media/cdrom
$ cd /media/cdrom
$ sudo sh ./VBoxLinuxAdditions.run


    • Change the guest's network adapter setting to 'Bridged Adapter' so the guest can reach the host network (a command-line alternative is sketched below).
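
If you prefer to set this from the host's command line instead of the GUI, something like the following should work while the VM is powered off (a rough sketch; the VM name and the host adapter name are placeholders for your own):

VBoxManage modifyvm "ubuntu-hadoop" --nic1 bridged --bridgeadapter1 "Realtek PCIe GBE Family Controller"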


Optional:


To make it easy to connect to the guest machine from Windows 8.1 via Remote Desktop, install the xrdp package on Ubuntu:

$ sudo apt-get install xrdp
$ sudo add-apt-repository ppa:xubuntu-dev/xfce-4.10
$ sudo apt-get update
$ sudo apt-get install xfce4
$ echo xfce4-session >~/.xsession
$ sudo service xrdp restart


Open Windows Remote Desktop and connect to ip:3389. In testing, the resolution could not exceed 1280 and the color depth had to be 16-bit, otherwise there were display problems (probably because the VM's virtual hardware is too weak).
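
The color depth is easiest to set in the RDP client's options or in a saved .rdp file, while the resolution can be forced when connecting from the command line; a hedged example (values illustrative):

mstsc /v:<guest-ip>:3389 /w:1280 /h:720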
 

Install Hadoop


1. Install Java

# sudo apt-get update
# sudo apt-get install default-jdk
# java -version                                 <-- confirm the Java version
# dpkg --get-selections | grep java             <-- confirm the installed Java packages


2. Create and Set Up SSH Certificates
# sudo apt-get install openssh-server        <-- install the OpenSSH server
# dpkg --get-selections | grep ssh           <-- confirm the installed package
# ssh-keygen -t rsa -P ""                    <-- generate an SSH key
# cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys    
# ssh localhost                              <-- test that the connection no longer asks for a password
# exit


  (Optional) To disable remote SSH connections (listen on localhost only), edit /etc/ssh/sshd_config and add the following:
     ListenAddress 127.0.0.1
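
After editing, restart the SSH service so the new setting takes effect (a minimal sketch; on this Ubuntu release the service name is ssh):

# sudo service ssh restart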

3. Fetch and Install Hadoop
# wget http://apache.mirrors.tds.net/hadoop/common/current/hadoop-2.4.0.tar.gz  <-- download the current release
# tar zxvf hadoop-2.4.0.tar.gz
# cp -r hadoop-2.4.0 /usr/local/hadoop
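
Depending on which account will run Hadoop, you may also want that account to own the directory. A hedged example, assuming Hadoop runs as the current login user:

# sudo chown -R $USER:$USER /usr/local/hadoop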


4. Edit the Configuration Files
4.1 Edit ~/.bashrc
# update-alternatives --config java                 <-- confirm the Java installation path
  /usr/lib/jvm/java-7-openjdk-i386/jre/bin/java     <-- the Java path reported by the command


  # vi ~/.bashrc, then append the following to the end of the file:
#HADOOP VARIABLES START
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
#HADOOP VARIABLES END

# source ~/.bashrc                    <-- make the new environment variables take effect
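
As a quick check that the new PATH works, the hadoop command should now be found:

# hadoop version                      <-- should report Hadoop 2.4.0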


4.2 Edit /usr/local/hadoop/etc/hadoop/hadoop-env.sh and set JAVA_HOME as follows:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386


4.3 Edit /usr/local/hadoop/etc/hadoop/core-site.xml and add the following inside the <configuration> tag:
<property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
</property>


4.4 Edit /usr/local/hadoop/etc/hadoop/yarn-site.xml and add the following inside the <configuration> tag:
<property>
   <name>yarn.nodemanager.aux-services</name>
   <value>mapreduce_shuffle</value>
</property>
<property>
   <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
   <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>


4.5 Create and Edit /usr/local/hadoop/etc/hadoop/mapred-site.xml
#cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml

Edit /usr/local/hadoop/etc/hadoop/mapred-site.xml and add the following inside the <configuration> tag:
<property>
   <name>mapreduce.framework.name</name>
   <value>yarn</value>
</property>


4.6 Edit /usr/local/hadoop/etc/hadoop/hdfs-site.xml
# mkdir -p /usr/local/hadoop_store/hdfs/namenode
# mkdir -p /usr/local/hadoop_store/hdfs/datanode
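
If these directories were created as root, make sure the account that runs Hadoop owns them (a sketch, again assuming the current login user):

# sudo chown -R $USER:$USER /usr/local/hadoop_store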


Edit /usr/local/hadoop/etc/hadoop/hdfs-site.xml and add the following inside the <configuration> tag:
<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>
<property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop_store/hdfs/namenode</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hadoop_store/hdfs/datanode</value>
</property>

 
6. Format the New Hadoop Filesystem
Once all of the above configuration is done, the Hadoop filesystem must be formatted before starting the newly built Hadoop.
#hdfs namenode -format

This command only needs to be run once; if it is run again, all data in the Hadoop filesystem will be wiped.

7. Start Hadoop
# start-dfs.sh
# start-yarn.sh
# jps

   You should see: Jps, NodeManager, NameNode, SecondaryNameNode, DataNode, and ResourceManager. If any of these is missing, something is wrong.
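
A small sketch to check at a glance that every expected daemon shows up in the jps output:

for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    jps | grep -qw "$d" && echo "$d is running" || echo "$d is MISSING"
done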

8. Test
Access the web interfaces (the Hadoop 2.x defaults): the NameNode UI at http://<guest-ip>:50070 and the ResourceManager UI at http://<guest-ip>:8088.

Test Hadoop:
# hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.4.0-tests.jar TestDFSIO -write
# hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.4.0-tests.jar TestDFSIO -clean
# hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar pi 2 5
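
A quick extra sanity check is to browse HDFS directly, for example:

# hdfs dfs -ls /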


9. Finally, Stop Hadoop
#stop-dfs.sh && stop-yarn.sh


 
Reference:
https://www.digitalocean.com/community/articles/how-to-install-hadoop-on-ubuntu-13-10
http://www.ercoppa.org/Linux-Install-Hadoop-220-on-Ubuntu-Linux-1304-Single-Node-Cluster.htm
