Environment:
Node1&2 : Ubuntu 12.04.3 64bit
Controller : Ubuntu 12.04.3 32bit
Java ver. : Java 7
Hadoop ver. : hadoop-1.2.1.tar.gz
The three machines went back when I left the institute... XD
This project is on hold for now.
2013-12-26
Both Nodes now have Ubuntu 12.04.3 64bit installed, with SSH (port 22) open and no source restrictions. Still deciding which machine to use as the controller.
[@@@Enable SSH login on Ubuntu]
To install SSH:
$ sudo apt-get install ssh
Open SSH port 22:
$ sudo gedit /etc/ssh/sshd_config
Find the line
# Port 22
and remove the leading # (note this is the server config; /etc/ssh/ssh_config only configures the SSH client).
In this state, the machine accepts SSH connections from any IP address.
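After editing the config, the SSH daemon has to be restarted before the change takes effect; on Ubuntu 12.04 the service is named ssh:
$ sudo service ssh restart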
—
2013-12-28
[@@@Restrict which IP addresses may log in over SSH]
First, SSH was restricted so that only specific addresses can connect.
$ sudo gedit /etc/hosts.allow
Add on the last line:
ALL:
Under /etc there are two files, hosts.allow and hosts.deny: the former controls which IPs may log in to this machine, the latter which IPs may not. An example follows below.
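For example, to accept SSH from a single trusted address and refuse everyone else (140.113.1.1 is purely a placeholder; substitute your own network), a common TCP-wrappers pattern is:
in /etc/hosts.allow:
sshd: 140.113.1.1
in /etc/hosts.deny:
sshd: ALL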
Next, the Hadoop prerequisites.
[@@@Install Java 7]
According to the official Hadoop wiki, Ubuntu 12.04 with Java 7 is an acceptable combination.
$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java7-installer
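A quick sanity check that the JDK actually installed (the exact version string will vary):
$ java -version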
[@@@Create a dedicated Hadoop user account]
$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser
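To verify the new account landed in the right group (the output should list hadoop as hduser's group):
$ groups hduser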
[@@@Set up an SSH public key]
user@ubuntu:~$ su - hduser
hduser@ubuntu:~$ ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hduser/.ssh/id_rsa):
Created directory '/home/hduser/.ssh'.
Your identification has been saved in /home/hduser/.ssh/id_rsa.
Your public key has been saved in /home/hduser/.ssh/id_rsa.pub.
The key fingerprint is:
9b:82:ea:58:b4:e0:35:d7:ff:19:66:a6:ef:ae:0e:d2 hduser@ubuntu
The key's randomart image is:
[...omitted...]
hduser@ubuntu:~$
hduser@ubuntu:~$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
hduser@ubuntu:~$ ssh localhost
The authenticity of host 'localhost (::1)' can't be established.
RSA key fingerprint is d7:87:25:47:ae:02:00:eb:1d:75:4f:bb:44:f9:36:26.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
Linux ubuntu 2.6.32-22-generic #33-Ubuntu SMP Wed Apr 28 13:27:30 UTC 2010 i686 GNU/Linux
Ubuntu 10.04 LTS
[...more omitted...]
hduser@ubuntu:~$
At this point you can log out of the session.
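If ssh localhost still prompts for a password, the usual culprit is file permissions on the key material; tightening them is a common fix (assuming the default OpenSSH layout):
$ chmod 700 ~/.ssh
$ chmod 600 ~/.ssh/authorized_keys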
[@@@Disable IPv6]
It is not being used and might interfere, so IPv6 is disabled machine-wide first.
$ sudo vi /etc/sysctl.conf
Append at the end of the file:
# disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
Save and quit with :wq, then reboot:
$ sudo shutdown -r now
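After the reboot you can confirm IPv6 is really off; a value of 1 means disabled:
$ cat /proc/sys/net/ipv6/conf/all/disable_ipv6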
—
2013-12-30
First of all, I have to say I really can't believe it: why would the school block HDFS...
[@@@Download Hadoop]
I had originally downloaded hadoop-2.2.0.tar.gz and was ready to start, but it turns out 2.2.0 has no conf directory at all, which threw me completely, so I switched to hadoop-1.2.1.tar.gz. sudo problems also come up all the time; pay attention to which steps belong to which user.
Following the tutorial, we put Hadoop under /usr/local:
$ cd /usr/local
$ sudo wget http://apache.cdpa.nsysu.edu.tw/hadoop/core/stable1/hadoop-1.2.1.tar.gz
$ sudo tar xzf hadoop-1.2.1.tar.gz
$ sudo mv hadoop-1.2.1 hadoop
$ sudo chown -R hduser:hadoop hadoop
[@@@Update .bashrc]
Note that this is hduser's .bashrc (and JAVA_HOME should point at the Java 7 installed above, not the tutorial's java-6-sun):
# Set Hadoop-related environment variables
export HADOOP_HOME=/usr/local/hadoop
# Set JAVA_HOME (we will also configure JAVA_HOME directly for Hadoop later on)
export JAVA_HOME=/usr/lib/jvm/java-7-oracle
# Some convenient aliases and functions for running Hadoop-related commands
unalias fs &> /dev/null
alias fs="hadoop fs"
unalias hls &> /dev/null
alias hls="fs -ls"
# If you have LZO compression enabled in your Hadoop cluster and
# compress job outputs with LZOP (not covered in this tutorial):
# Conveniently inspect an LZOP compressed file from the command
# line; run via:
#
# $ lzohead /hdfs/path/to/lzop/compressed/file.lzo
#
# Requires installed 'lzop' command.
#
lzohead () {
    hadoop fs -cat "$1" | lzop -dc | head -1000 | less
}
# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin
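After saving, reload the file so the new settings take effect in the current shell; hadoop version is a quick check that the Hadoop binary is now found on the PATH:
$ source ~/.bashrc
$ hadoop version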
[@@@Configuration]
Apart from creating the tmp directory, which needs sudo, everything here is edited as hduser, directly with vi.
1. /usr/local/hadoop/conf/hadoop-env.sh
Change
# The java implementation to use. Required.
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun
to
# The java implementation to use. Required.
export JAVA_HOME=/usr/lib/jvm/java-7-oracle
2. Create the tmp directory
$ sudo mkdir -p /app/hadoop/tmp
$ sudo chown hduser:hadoop /app/hadoop/tmp
$ sudo chmod 750 /app/hadoop/tmp
Next, tweak the XML parameters. Each of the following properties goes between <configuration> and </configuration>:
3. conf/core-site.xml
<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system. A URI whose
  scheme and authority determine the FileSystem implementation. The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class. The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>
4. conf/mapred-site.xml
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at. If "local", then jobs are run in-process as a single map
  and reduce task.</description>
</property>
5. conf/hdfs-site.xml
<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication. The actual number of replications
  can be specified when the file is created. The default is used if replication
  is not specified in create time.</description>
</property>
[@@@Hadoop step 1: format the NameNode]
Everything from here on is done as hduser.
$ /usr/local/hadoop/bin/hadoop namenode -format
The output looks roughly like this:
hduser@ubuntu:/usr/local/hadoop$ bin/hadoop namenode -format
10/05/08 16:59:56 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = ubuntu/127.0.1.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
10/05/08 16:59:56 INFO namenode.FSNamesystem: fsOwner=hduser,hadoop
10/05/08 16:59:56 INFO namenode.FSNamesystem: supergroup=supergroup
10/05/08 16:59:56 INFO namenode.FSNamesystem: isPermissionEnabled=true
10/05/08 16:59:56 INFO common.Storage: Image file of size 96 saved in 0 seconds.
10/05/08 16:59:57 INFO common.Storage: Storage directory .../hadoop-hduser/dfs/name has been successfully formatted.
10/05/08 16:59:57 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ubuntu/127.0.1.1
************************************************************/
hduser@ubuntu:/usr/local/hadoop$
The line "10/05/08 16:59:57 INFO common.Storage: Storage directory .../hadoop-hduser/dfs/name has been successfully formatted." indicates success.
[@@@Starting and stopping Hadoop]
To start:
hduser@ubuntu:~$ /usr/local/hadoop/bin/start-all.sh
The output looks roughly like this:
hduser@ubuntu:/usr/local/hadoop$ bin/start-all.sh
starting namenode, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-namenode-ubuntu.out
localhost: starting datanode, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-datanode-ubuntu.out
localhost: starting secondarynamenode, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-secondarynamenode-ubuntu.out
starting jobtracker, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-jobtracker-ubuntu.out
localhost: starting tasktracker, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-tasktracker-ubuntu.out
hduser@ubuntu:/usr/local/hadoop$
Run jps to check whether everything started:
hduser@ubuntu:/usr/local/hadoop$ jps
2287 TaskTracker
2149 JobTracker
1938 DataNode
2085 SecondaryNameNode
2349 Jps
1788 NameNode
To test stopping Hadoop:
hduser@ubuntu:/usr/local/hadoop$ bin/stop-all.sh
stopping jobtracker
localhost: stopping tasktracker
stopping namenode
localhost: stopping datanode
localhost: stopping secondarynamenode
hduser@ubuntu:/usr/local/hadoop$
[@@@Testing MapReduce]
Download the test e-books from the following three pages, choosing the Plain Text UTF-8 format:
http://www.gutenberg.org/ebooks/20417
http://www.gutenberg.org/ebooks/5000
http://www.gutenberg.org/ebooks/4300
Save them in /tmp/gutenberg. One way to fetch them from the shell is sketched below.
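A possible way to download the three books from the command line (the cache URL pattern and file names are assumptions; the exact links on gutenberg.org may differ):
$ mkdir -p /tmp/gutenberg
$ cd /tmp/gutenberg
$ wget http://www.gutenberg.org/cache/epub/20417/pg20417.txt
$ wget http://www.gutenberg.org/cache/epub/5000/pg5000.txt
$ wget http://www.gutenberg.org/cache/epub/4300/pg4300.txt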
Then start Hadoop again:
hduser@ubuntu:~$ /usr/local/hadoop/bin/start-all.sh
Before running MapReduce, copy the test data into HDFS:
hduser@ubuntu:/usr/local/hadoop$ bin/hadoop dfs -copyFromLocal /tmp/gutenberg /user/hduser/gutenberg
hduser@ubuntu:/usr/local/hadoop$ bin/hadoop dfs -ls /user/hduser
Found 1 items
drwxr-xr-x   - hduser supergroup          0 2010-05-08 17:40 /user/hduser/gutenberg
hduser@ubuntu:/usr/local/hadoop$ bin/hadoop dfs -ls /user/hduser/gutenberg
Found 3 items
-rw-r--r--   3 hduser supergroup     674566 2011-03-10 11:38 /user/hduser/gutenberg/pg20417.txt
-rw-r--r--   3 hduser supergroup    1573112 2011-03-10 11:38 /user/hduser/gutenberg/pg4300.txt
-rw-r--r--   3 hduser supergroup    1423801 2011-03-10 11:38 /user/hduser/gutenberg/pg5000.txt
hduser@ubuntu:/usr/local/hadoop$
Run the MapReduce job:
hduser@ubuntu:/usr/local/hadoop$ bin/hadoop jar hadoop*examples*.jar wordcount /user/hduser/gutenberg /user/hduser/gutenberg-output
The output may look like this:
hduser@ubuntu:/usr/local/hadoop$ bin/hadoop jar hadoop*examples*.jar wordcount /user/hduser/gutenberg /user/hduser/gutenberg-output
10/05/08 17:43:00 INFO input.FileInputFormat: Total input paths to process : 3
10/05/08 17:43:01 INFO mapred.JobClient: Running job: job_201005081732_0001
10/05/08 17:43:02 INFO mapred.JobClient:  map 0% reduce 0%
10/05/08 17:43:14 INFO mapred.JobClient:  map 66% reduce 0%
10/05/08 17:43:17 INFO mapred.JobClient:  map 100% reduce 0%
10/05/08 17:43:26 INFO mapred.JobClient:  map 100% reduce 100%
10/05/08 17:43:28 INFO mapred.JobClient: Job complete: job_201005081732_0001
10/05/08 17:43:28 INFO mapred.JobClient: Counters: 17
10/05/08 17:43:28 INFO mapred.JobClient:   Job Counters
10/05/08 17:43:28 INFO mapred.JobClient:     Launched reduce tasks=1
10/05/08 17:43:28 INFO mapred.JobClient:     Launched map tasks=3
10/05/08 17:43:28 INFO mapred.JobClient:     Data-local map tasks=3
10/05/08 17:43:28 INFO mapred.JobClient:   FileSystemCounters
10/05/08 17:43:28 INFO mapred.JobClient:     FILE_BYTES_READ=2214026
10/05/08 17:43:28 INFO mapred.JobClient:     HDFS_BYTES_READ=3639512
10/05/08 17:43:28 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=3687918
10/05/08 17:43:28 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=880330
10/05/08 17:43:28 INFO mapred.JobClient:   Map-Reduce Framework
10/05/08 17:43:28 INFO mapred.JobClient:     Reduce input groups=82290
10/05/08 17:43:28 INFO mapred.JobClient:     Combine output records=102286
10/05/08 17:43:28 INFO mapred.JobClient:     Map input records=77934
10/05/08 17:43:28 INFO mapred.JobClient:     Reduce shuffle bytes=1473796
10/05/08 17:43:28 INFO mapred.JobClient:     Reduce output records=82290
10/05/08 17:43:28 INFO mapred.JobClient:     Spilled Records=255874
10/05/08 17:43:28 INFO mapred.JobClient:     Map output bytes=6076267
10/05/08 17:43:28 INFO mapred.JobClient:     Combine input records=629187
10/05/08 17:43:28 INFO mapred.JobClient:     Map output records=629187
10/05/08 17:43:28 INFO mapred.JobClient:     Reduce input records=102286
Check whether the output was written successfully:
hduser@ubuntu:/usr/local/hadoop$ bin/hadoop dfs -ls /user/hduser
Found 2 items
drwxr-xr-x   - hduser supergroup          0 2010-05-08 17:40 /user/hduser/gutenberg
drwxr-xr-x   - hduser supergroup          0 2010-05-08 17:43 /user/hduser/gutenberg-output
hduser@ubuntu:/usr/local/hadoop$ bin/hadoop dfs -ls /user/hduser/gutenberg-output
Found 2 items
drwxr-xr-x   - hduser supergroup          0 2010-05-08 17:43 /user/hduser/gutenberg-output/_logs
-rw-r--r--   1 hduser supergroup     880802 2010-05-08 17:43 /user/hduser/gutenberg-output/part-r-00000
hduser@ubuntu:/usr/local/hadoop$
Retrieve the merged results back out of HDFS:
hduser@ubuntu:/usr/local/hadoop$ mkdir /tmp/gutenberg-output
hduser@ubuntu:/usr/local/hadoop$ bin/hadoop dfs -getmerge /user/hduser/gutenberg-output /tmp/gutenberg-output
hduser@ubuntu:/usr/local/hadoop$ head /tmp/gutenberg-output/gutenberg-output
"(Lo)cra"   1
"1490       1
"1498,"     1
"35"        1
"40,"       1
"A          2
"AS-IS".    1
"A_         1
"Absoluti   1
"Alack!     1
hduser@ubuntu:/usr/local/hadoop$
[@@@Hadoop Web UI]
http://localhost:50070/ -> web UI of the NameNode daemon
http://localhost:50030/ -> web UI of the JobTracker daemon
http://localhost:50060/ -> web UI of the TaskTracker daemon
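On a headless node, a quick way to confirm a daemon is answering is to request its page from the shell and look for a 200 status code (curl is assumed to be installed):
$ curl -s -o /dev/null -w "%{http_code}\n" http://localhost:50070/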
—
// Currently working on hduserr (hduserl has not yet set up conf/core-site.xml)
-------------------------------- References --------------------------------
[1] Running Hadoop on Ubuntu Linux (Multi-Node Cluster): http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
[2] Running Hadoop on Ubuntu Linux (Single-Node Cluster): http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
[3] Ubuntu tutorial: how to set up SSH remote login: http://ipzoner.pixnet.net/blog/post/23520297-ubuntu%E4%BD%9C%E6%A5%AD%E7%B3%BB%E7%B5%B1%E6%95%99%E5%AD%B8%E2%94%80%E5%A6%82%E4%BD%95%E8%A8%AD%E5%AE%9Assh%E9%81%A0%E7%AB%AF%E9%80%A3%E7%B7%9A%E5%8A%9F%E8%83%BD
[4] How To Install Oracle (Sun) Java JDK & JRE in Ubuntu via PPA: http://community.linuxmint.com/tutorial/view/1414
[5] Hadoop Java Versions: http://wiki.apache.org/hadoop/HadoopJavaVersions
[6] Is there any difference between shutdown and reboot?: http://www.ubuntu-tw.org/modules/newbb/viewtopic.php?viewmode=flat&order=ASC&topic_id=8335&forum=2&move=prev
[7] The vim editor: http://linux.vbird.org/linux_basic/0310vi.php
[8] How to download files with WGET into a specified directory: http://www.inote.tw/2009/06/wget.html
[9] Warning: $HADOOP_HOME is deprecated (fix for hadoop 1.0.4): http://chenzhou123520.iteye.com/blog/1826002
[10] WARN snappy.LoadSnappy: Snappy native library not loaded: http://stackoverflow.com/questions/10878038/warn-snappy-loadsnappy-snappy-native-library-not-loaded