Hadoop/HBase

Hadoop

Prerequisites

Hadoop defaults to the Sun (Oracle) version of Java, which is not provided in the Debian repositories, so use OpenJDK instead.

apt-get install ssh openjdk-6-jre-headless rsync

Download

Download the latest version of the current stable release of Hadoop from one of the mirrors.

http://www.apache.org/dyn/closer.cgi/hadoop/common/

Install

dpkg -i hadoop_1.0.3-1_x86_64.deb

Configuration

Java

Edit /etc/hadoop/hadoop-env.sh and set JAVA_HOME.

export JAVA_HOME=/usr/lib/jvm/java-6-openjdk

Also set the maximum heap size; the default is too low to run even the examples.

export HADOOP_CLIENT_OPTS="-Xmx1g $HADOOP_CLIENT_OPTS"

Run an example job to confirm that Java is correctly configured.

$ mkdir input 
$ cp /etc/hadoop/*.xml input 
$ hadoop jar /usr/share/hadoop/hadoop-examples-*.jar grep input output 'dfs[a-z.]+' 
$ cat output/*

Configuration file

core-site.xml

One minimal possibility for a pseudo-distributed (single-node) setup; on a real cluster, replace localhost with the NameNode's hostname.
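<?xml version="1.0"?>
<configuration>
  <!-- URI of the default filesystem (the NameNode) -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>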

hdfs-site.xml

A minimal example; the dfs.name.dir and dfs.data.dir paths match the directories created under Format Disks below. With several disks, list the paths comma-separated. Replication of 1 suits a single node only.
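<?xml version="1.0"?>
<configuration>
  <!-- Where the NameNode stores its metadata -->
  <property>
    <name>dfs.name.dir</name>
    <value>/mnt/disk2/dfs/name</value>
  </property>
  <!-- Where DataNodes store HDFS blocks -->
  <property>
    <name>dfs.data.dir</name>
    <value>/mnt/disk2/dfs/data</value>
  </property>
  <!-- Number of block replicas; 1 is only appropriate for a single node -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>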

mapred-site.xml

A minimal example; as above, mapred.local.dir matches the directory created under Format Disks below.
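<?xml version="1.0"?>
<configuration>
  <!-- Host and port of the JobTracker -->
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <!-- Scratch space for MapReduce; comma-separate to spread across disks -->
  <property>
    <name>mapred.local.dir</name>
    <value>/mnt/disk2/mapred/local</value>
  </property>
</configuration>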

User account

Create the hadoop group.

groupadd hadoop

Create the hadoop user and set a password for it.

useradd -g hadoop -s /bin/bash -d /home/hadoop hadoop
passwd hadoop

Create the home directory and set its ownership and permissions.

mkdir /home/hadoop
chown hadoop:hadoop /home/hadoop
chmod 770 /home/hadoop

SSH

Hadoop's control scripts use SSH to start and stop daemons across the cluster and need passwordless logins. Set up SSH keys as the hadoop user.

ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Test by logging into localhost over SSH

ssh localhost

Also, copy the key to other nodes in the cluster.

ssh-copy-id h1.local

Format Disks

Use gparted (it can align partitions automatically) to format the disks. Make the partition tables GPT format and label the disks appropriately.

Add entries to fstab using the UUID as the device, mounted at /mnt/disk2, /mnt/disk3, etc.

fstab

Use blkid to get the UUID for the newly formatted disk.

Create an entry in fstab. The noatime option improves performance by not updating a file's last access time on reads.

# Disk 2
UUID=abc123... /mnt/disk2 xfs noatime 0 0
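
Assuming the entry above, create the mount point and mount the disk.

sudo mkdir -p /mnt/disk2
sudo mount /mnt/disk2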

Create directories

This needs to be done for each disk in the node.

cd /mnt
for i in *; do sudo mkdir -p "$i"/dfs/data ; done
for i in *; do sudo mkdir -p "$i"/dfs/name ; done
for i in *; do sudo mkdir -p "$i"/dfs/namesecondary ; done
for i in *; do sudo mkdir -p "$i"/mapred/local ; done

for i in *; do sudo chown -R hadoop:hadoop "$i"/dfs ; done
for i in *; do sudo chown -R hadoop:hadoop "$i"/mapred ; done

For example, for a single disk the loops above reduce to:

mkdir -p /mnt/disk2/dfs/name
mkdir -p /mnt/disk2/dfs/data
mkdir -p /mnt/disk2/dfs/namesecondary
mkdir -p /mnt/disk2/mapred/local

Set permissions

chown -R hadoop:hadoop /mnt/disk2/dfs
chown -R hadoop:hadoop /mnt/disk2/mapred

Format namenode

First, become the hadoop user.

su - hadoop

hadoop namenode -format

Controlling the daemons

Starting daemons

/usr/sbin/start-dfs.sh
/usr/sbin/start-mapred.sh

Verify

Verify that the daemons are listening with netstat.

netstat -tulnp
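
With the example configuration above, java processes should be listening on ports 9000 and 50070 (NameNode) and 9001 and 50030 (JobTracker), among others. jps (part of the JDK package, e.g. openjdk-6-jdk, not the headless JRE installed earlier) gives a quicker check:

jps

On a single node it should list NameNode, DataNode, SecondaryNameNode, JobTracker, and TaskTracker.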

Stopping daemons

/usr/sbin/stop-dfs.sh
/usr/sbin/stop-mapred.sh

Firewall
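
Hadoop daemons listen on many ports, so a reasonable policy is to allow the cluster subnet and block everyone else. A minimal iptables sketch, assuming the cluster lives on 192.168.1.0/24 and uses the default Hadoop 1.x and HBase ports plus the ports configured above:

# Allow other cluster nodes to reach the Hadoop/HBase service ports
iptables -A INPUT -s 192.168.1.0/24 -p tcp -m multiport \
  --dports 9000,9001,2181,50010,50030,50060,50070,50075,50090,60000,60010,60020,60030 -j ACCEPT
# Drop the same ports for everyone else
iptables -A INPUT -p tcp -m multiport \
  --dports 9000,9001,2181,50010,50030,50060,50070,50075,50090,60000,60010,60020,60030 -j DROP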

HBase

Download

Select a mirror at the page below.

http://www.apache.org/dyn/closer.cgi/hbase/

Install

cd /usr/local
sudo tar -xf /home/unique/Download/hbase-x.y.z.tar.gz

Set permissions

sudo chown -R hadoop:hadoop /usr/local/hbase-x.y.z

Configure

JAR

Use the hadoop-core JAR installed by the Hadoop package rather than the one bundled with HBase, so that the versions on HBase and the cluster match.

cd /usr/local/hbase-x.y.z/lib
sudo mv hadoop-core-x.y.z.jar hadoop-core-x.y.z.jar.old
sudo cp /usr/share/hadoop/hadoop-core-x.y.z.jar .

DO NOT USE THE JAR INCLUDED WITH HBASE IN PRODUCTION. IT IS ONLY SUITABLE FOR STANDALONE MODE.

Java

Set JAVA_HOME in conf/hbase-env.sh

export JAVA_HOME=/usr/lib/jvm/java-6-openjdk

Configuration files

hbase-site.xml
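
A minimal example for running HBase on the HDFS instance configured above; hbase.rootdir must use the same host and port as fs.default.name in core-site.xml.

<?xml version="1.0"?>
<configuration>
  <!-- Where HBase stores its data; must match fs.default.name -->
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
  </property>
  <!-- Run on top of HDFS rather than in standalone mode -->
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
</configuration>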

Running HBase

Become the hadoop user, change to the HBase root directory, and run the HBase start script.

su - hadoop
cd /usr/local/hbase-x.y.z
./bin/start-hbase.sh

The web interface will be at localhost:60010 if HBase started successfully.
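
To verify further, create and drop a throwaway table from the HBase shell (the table and column family names here are arbitrary):

./bin/hbase shell
hbase> create 'test', 'cf'
hbase> put 'test', 'row1', 'cf:a', 'value1'
hbase> scan 'test'
hbase> disable 'test'
hbase> drop 'test'
hbase> exit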

Stop HBase

HBase can take some time to stop.

bin/stop-hbase.sh
