Environment: Windows 8.1 Professional, 6GB memory, AMD Athlon 64 X2 Dual Core 5200+
Prerequisites:
- Install VirtualBox & the VirtualBox Extension Pack
- Install Ubuntu
- Download the Ubuntu 13.10 Server (not Desktop) 32-bit ISO image from the official website.
- Create a virtual machine with 2GB RAM and 16GB of disk space.
- Install a new Ubuntu Server, accepting the default recommendations; at the final package-selection step, select OpenSSH.
- After the installation finishes, boot the guest server to install the Guest Additions; this part is a bit fiddly.
- After Ubuntu has booted, run the following in a terminal:
$ sudo apt-get update
$ sudo apt-get install dkms
$ sudo reboot
- In the running guest window, open [Devices] --> [CD/DVD Devices] from the menu bar and mount VBoxGuestAdditions.iso in the virtual CD drive.
- In Ubuntu, run mount and confirm the disc is mounted correctly (e.g. # ls /media/cdrom):
$ sudo mount /dev/cdrom /media/cdrom
$ cd /media/cdrom
$ sudo sh ./VBoxLinuxAdditions.run
- Change the guest's network adapter setting to 'Bridged' so it can communicate with the host network (quick checks for the Guest Additions module and the guest IP follow below).
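Once the prerequisites above are done, two quick checks can confirm that the Guest Additions built correctly and that the bridged adapter received an address. These commands are not from the original notes; they are a minimal sanity check, assuming the default vboxguest module name and that the bridged interface is eth0:
$ lsmod | grep vboxguest    <-- the vboxguest kernel module should be listed if the build succeeded
$ ifconfig eth0             <-- note the inet addr; this is the guest IP used later for SSH and Remote Desktop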
Optional:
To make it convenient to connect to the guest machine from Windows 8.1 via Remote Desktop, install the xrdp package on Ubuntu:
$ sudo apt-get install xrdp
$ sudo add-apt-repository ppa:xubuntu-dev/xfce-4.10
$ sudo apt-get update
$ sudo apt-get install xfce4
$ echo xfce4-session >~/.xsession
$ sudo service xrdp restart
Open Windows Remote Desktop and connect to ip:3389. In testing, the resolution must not exceed 1280 and the color depth should be set to 16-bit, otherwise there are display problems (probably because the VM's virtual hardware is too weak).
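If the remote desktop connection fails, it is worth verifying on the guest that xrdp is actually running and listening on port 3389. A quick check (extra, not part of the original steps):
$ sudo service xrdp status
$ sudo netstat -tlnp | grep 3389    <-- xrdp should appear in the LISTEN state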
Installing Hadoop
1. Install Java
# sudo apt-get update
# sudo apt-get install default-jdk
# java -version                        <-- confirm the Java version
# dpkg --get-selections | grep java    <-- confirm the installed packages
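The exact JDK path will be needed later for JAVA_HOME. One way to find it (an extra step, not in the original list) is to resolve the java binary through its symlinks:
# readlink -f $(which java)    <-- e.g. /usr/lib/jvm/java-7-openjdk-i386/jre/bin/java; JAVA_HOME is this path with the trailing /jre/bin/java removed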
2. Create and Set Up SSH Certificates
# sudo apt-get install openssh-server    <-- install the OpenSSH server
# dpkg --get-selections | grep ssh       <-- confirm the installed packages
# ssh-keygen -t rsa -P ""                <-- generate an SSH key
# cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
# ssh localhost                          <-- check that the connection no longer asks for a password
# exit
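If `ssh localhost` still prompts for a password, the most common cause is overly permissive permissions on the key files. A hedged fix worth trying:
# chmod 700 $HOME/.ssh
# chmod 600 $HOME/.ssh/authorized_keys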
(Optional) To disable remote SSH connections, edit /etc/ssh/sshd_config and add the following line:
ListenAddress 127.0.0.1
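The change only takes effect after the SSH daemon is restarted (assuming the Ubuntu service name `ssh`):
# sudo service ssh restart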
3. Fetch and Install Hadoop
# wget http://apache.mirrors.tds.net/hadoop/common/current/hadoop-2.4.0.tar.gz    <-- download the current release
# tar zxvf hadoop-2.4.0.tar.gz
# cp -r hadoop-2.4.0 /usr/local/hadoop
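Depending on which account unpacked and copied the files, /usr/local/hadoop may end up owned by root. If later steps fail with permission errors, taking ownership as the user that will run Hadoop is a reasonable fix (an extra step, not part of the original commands):
# sudo chown -R $(whoami):$(whoami) /usr/local/hadoop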
4. Edit the Configuration Files
4.1 Edit ~/.bashrc
# update-alternatives --config java    <-- confirm the Java installation path
/usr/lib/jvm/java-7-openjdk-i386/jre/bin/java    <-- the Java path reported by the command
# vi ~/.bashrc and append the following to the end of the file:
#HADOOP VARIABLES START
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
#HADOOP VARIABLES END
# source ~/.bashrc    <-- make the new environment variables take effect
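A quick way to confirm the new variables took effect (an extra check):
# echo $HADOOP_INSTALL    <-- should print /usr/local/hadoop
# hadoop version          <-- should report Hadoop 2.4.0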
4.2 Edit /usr/local/hadoop/etc/hadoop/hadoop-env.sh and change JAVA_HOME as follows:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386
4.3 Edit /usr/local/hadoop/etc/hadoop/core-site.xml and add the following inside the <configuration> tag:
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
4.4 Edit /usr/local/hadoop/etc/hadoop/yarn-site.xml and add the following inside the <configuration> tag:
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
4.5 Create and Edit /usr/local/hadoop/etc/hadoop/mapred-site.xml
# cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml
Edit /usr/local/hadoop/etc/hadoop/mapred-site.xml and add the following inside the <configuration> tag:
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
4.6 Edit /usr/local/hadoop/etc/hadoop/hdfs-site.xml
# mkdir -p /usr/local/hadoop_store/hdfs/namenode
# mkdir -p /usr/local/hadoop_store/hdfs/datanode
Edit /usr/local/hadoop/etc/hadoop/hdfs-site.xml and add the following inside the <configuration> tag:
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/usr/local/hadoop_store/hdfs/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/usr/local/hadoop_store/hdfs/datanode</value>
</property>
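If the hadoop_store directories above were created with sudo they will be owned by root, and the format step below can then fail with permission errors. Changing their owner to the user that will run Hadoop avoids this (an extra precaution, not spelled out above):
# sudo chown -R $(whoami):$(whoami) /usr/local/hadoop_store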
6. Format the New Hadoop Filesystem
After all of the configuration above is done, the Hadoop filesystem must be formatted before the newly built Hadoop is started for the first time.
# hdfs namenode -format
7. Start Hadoop
# start-dfs.sh
# start-yarn.sh
# jps
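On a single-node setup like this one, `jps` should list the NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager daemons once start-up finishes. A quick filtered check (extra, not from the original write-up):
# jps | grep -E 'NameNode|DataNode|ResourceManager|NodeManager'    <-- all of the daemons above should appear (process IDs will differ)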
8. Test
Access web interfaces:
- Cluster status: http://localhost:8088
- HDFS status: http://localhost:50070
- Secondary NameNode status: http://localhost:50090
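When browsing from the Windows host, replace localhost with the guest's bridged IP address. From inside the guest, the ports can be sanity-checked from the command line (assuming curl is installed; otherwise `sudo apt-get install curl`):
$ curl -s -o /dev/null -w "%{http_code}\n" http://localhost:50070    <-- expect 200 from the NameNode web UI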
Test Hadoop:
# hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.4.0-tests.jar TestDFSIO -write
# hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.4.0-tests.jar TestDFSIO -clean
# hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar pi 2 5
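As a further sanity check, the bundled wordcount example can be run against a small file copied into HDFS. This is only an illustrative sketch; the input/output paths here are arbitrary choices, not from the original post:
# hadoop fs -mkdir -p /user/$(whoami)/input
# hadoop fs -put /usr/local/hadoop/etc/hadoop/core-site.xml /user/$(whoami)/input
# hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar wordcount /user/$(whoami)/input /user/$(whoami)/output
# hadoop fs -cat /user/$(whoami)/output/part-r-00000    <-- word counts extracted from core-site.xml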
9. Finally, Stop Hadoop
# stop-dfs.sh && stop-yarn.sh
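Afterwards `jps` can confirm that the daemons have shut down (only the Jps process itself should remain):
# jps    <-- should no longer list NameNode, DataNode, ResourceManager or NodeManager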
References:
https://www.digitalocean.com/community/articles/how-to-install-hadoop-on-ubuntu-13-10
http://www.ercoppa.org/Linux-Install-Hadoop-220-on-Ubuntu-Linux-1304-Single-Node-Cluster.htm