BIG DATA: November 2016

Thursday 10 November 2016

Oozie Installation

Oozie Installation in Ubuntu

Step 1 :- Install the Oozie server package on Ubuntu or another Debian-based system:

$ sudo apt-get install oozie

Step 2 :- Install the Oozie client package on Ubuntu or another Debian-based system:

$ sudo apt-get install oozie-client

Step 3 :- Configuring which Hadoop Version to Use

To use MRv1 (without SSL):

$ sudo update-alternatives --set oozie-tomcat-conf /etc/oozie/tomcat-conf.http.mr1

Step 4 :- Edit the /etc/oozie/conf/oozie-env.sh file and add the following entry:

export CATALINA_BASE=/var/lib/oozie/tomcat-deployment
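If you script these steps, the entry can be added idempotently so that re-running the script does not duplicate it. The following is a minimal POSIX sh sketch; add_export_line is a hypothetical helper name, not part of Oozie:

```shell
# Hypothetical helper: append "export NAME=VALUE" to FILE unless an
# identical line is already present, so re-running it is harmless.
add_export_line() {
  file=$1
  line="export $2=$3"
  # -x matches the whole line, -F treats the line as a fixed string
  if ! grep -qxF "$line" "$file" 2>/dev/null; then
    printf '%s\n' "$line" >> "$file"
  fi
}
```

For example, from a root shell:

$ add_export_line /etc/oozie/conf/oozie-env.sh CATALINA_BASE /var/lib/oozie/tomcat-deployment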

Step 5 :- Start the Oozie server

$ sudo service oozie start

Step 6 :- Accessing the Oozie Server with the Oozie Client

The Oozie client is a command-line utility that interacts with the Oozie server via the Oozie web-services API.

Use the /usr/bin/oozie script to run the Oozie client.

For example, if you want to invoke the client on the same machine where the Oozie server is running:

$ oozie admin -oozie http://localhost:11000/oozie -status
System mode: NORMAL

For convenience, set the environment variable OOZIE_URL to the URL of the Oozie server; you can then omit the -oozie option.

For example, on the machine where the Oozie server is running, set OOZIE_URL to http://localhost:11000/oozie:

$ export OOZIE_URL=http://localhost:11000/oozie
$ oozie admin -version
Oozie server build version: 4.0.0-cdh5.0.0
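The server can take a while to come up after service oozie start. A small polling loop around the status command shown above can wait for it. This is a sketch: wait_for_oozie is a hypothetical helper, and it relies on OOZIE_URL being exported so that the -oozie option can be omitted:

```shell
# Hypothetical helper: poll "oozie admin -status" until the server
# reports "System mode: NORMAL" or MAX_TRIES attempts have failed.
# The oozie client reads the server URL from OOZIE_URL.
wait_for_oozie() {
  max_tries=${1:-30}
  i=1
  while [ "$i" -le "$max_tries" ]; do
    if oozie admin -status 2>/dev/null | grep -q 'System mode: NORMAL'; then
      return 0
    fi
    # Sleep only if another attempt follows
    if [ "$i" -lt "$max_tries" ]; then
      sleep 2
    fi
    i=$((i + 1))
  done
  echo "Oozie did not reach NORMAL mode after $max_tries attempts" >&2
  return 1
}
```

For example: wait_for_oozie 30 && oozie admin -version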

Step 7 :- Configuring MySQL for Oozie

Step 1: Create the Oozie database and Oozie MySQL user.

For example, using the MySQL mysql command-line tool:

$ mysql -u root -p

Enter password: ******

mysql> create database oozie;

Query OK, 1 row affected (0.03 sec)

mysql>  grant all privileges on oozie.* to 'oozie'@'localhost' identified by 'oozie';

Query OK, 0 rows affected (0.03 sec)

mysql>  grant all privileges on oozie.* to 'oozie'@'%' identified by 'oozie';

Query OK, 0 rows affected (0.03 sec)

mysql> exit

Bye
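The same database and grants can be created non-interactively by piping the statements into mysql. A sketch, assuming MySQL 5.x (MySQL 8.0 rejects GRANT ... IDENTIFIED BY and needs a separate CREATE USER statement); oozie_mysql_sql is a hypothetical helper, and its default credentials mirror the example above:

```shell
# Hypothetical helper: print the statements from the session above for a
# given database, user and password (defaults: oozie/oozie/oozie).
# Assumes MySQL 5.x syntax for GRANT ... IDENTIFIED BY.
oozie_mysql_sql() {
  db=${1:-oozie}
  user=${2:-oozie}
  pass=${3:-oozie}
  printf "create database %s;\n" "$db"
  printf "grant all privileges on %s.* to '%s'@'localhost' identified by '%s';\n" "$db" "$user" "$pass"
  # %% prints a literal % (the "any host" wildcard)
  printf "grant all privileges on %s.* to '%s'@'%%' identified by '%s';\n" "$db" "$user" "$pass"
}
```

For example:

$ oozie_mysql_sql oozie oozie oozie | mysql -u root -p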

Step 2: Configure Oozie to use MySQL.

Edit properties in the oozie-site.xml file as follows:

...

    <property>

        <name>oozie.service.JPAService.jdbc.driver</name>

        <value>com.mysql.jdbc.Driver</value>

    </property>

    <property>

        <name>oozie.service.JPAService.jdbc.url</name>

        <value>jdbc:mysql://localhost:3306/oozie</value>

    </property>

    <property>

        <name>oozie.service.JPAService.jdbc.username</name>

        <value>oozie</value>

    </property>

    <property>

        <name>oozie.service.JPAService.jdbc.password</name>

        <value>oozie</value>

    </property>

...
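If you prefer to generate these entries rather than type them, a small shell helper can print each property element. A sketch: xml_property and oozie_jpa_mysql_properties are hypothetical names, and the output still has to be pasted inside the existing configuration element of oozie-site.xml:

```shell
# Hypothetical helper: print one Hadoop-style <property> element.
# Values are inserted verbatim, so they must not contain XML special
# characters such as '<' or '&'.
xml_property() {
  printf '    <property>\n        <name>%s</name>\n        <value>%s</value>\n    </property>\n' "$1" "$2"
}

# Hypothetical helper: print the four JPAService properties shown above.
# The optional first argument is the MySQL host (default: localhost).
oozie_jpa_mysql_properties() {
  xml_property oozie.service.JPAService.jdbc.driver com.mysql.jdbc.Driver
  xml_property oozie.service.JPAService.jdbc.url "jdbc:mysql://${1:-localhost}:3306/oozie"
  xml_property oozie.service.JPAService.jdbc.username oozie
  xml_property oozie.service.JPAService.jdbc.password oozie
}
```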

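Step 3 : Creating the Oozie Database Schema

To generate the SQL script without running it (for example, to review it and run it manually):

$ sudo -u oozie /usr/lib/oozie/bin/ooziedb.sh create -sqlfile oozie-create.sql

or, to create the schema directly in the database:

$ sudo -u oozie /usr/lib/oozie/bin/ooziedb.sh create -run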

Step 4 : Enabling the Oozie Web Console

To enable Oozie's web console, you must download and add the ExtJS library to the Oozie server. If you have not already done this, proceed as follows.

Step 4.1: Download the Library

Download the ExtJS version 2.2 library from http://archive.cloudera.com/gplextras/misc/ext-2.2.zip and place it in a convenient location.

Step 4.2: Install the Library

Extract the ext-2.2.zip file and copy the resulting ext-2.2 directory into /var/lib/oozie:

$ cd ~/Downloads/

$ unzip ext-2.2.zip

$ sudo cp -avr ext-2.2 /var/lib/oozie/
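A guarded copy avoids overwriting an existing installation when the step is re-run. A sketch: install_extjs is a hypothetical helper that expects ext-2.2.zip to have been unzipped already:

```shell
# Hypothetical helper: copy an already-extracted ext-2.2 directory into
# the Oozie data directory unless it is already there.
install_extjs() {
  src=$1                       # directory that contains ext-2.2/
  dest=${2:-/var/lib/oozie}    # Oozie data directory
  if [ -d "$dest/ext-2.2" ]; then
    echo "ext-2.2 already present in $dest"
    return 0
  fi
  if [ ! -d "$src/ext-2.2" ]; then
    echo "no ext-2.2 directory under $src; unzip ext-2.2.zip first" >&2
    return 1
  fi
  cp -R "$src/ext-2.2" "$dest/"
}
```

Run it as root when the destination is /var/lib/oozie, for example: install_extjs ~/Downloads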

Step 5 : Installing the Oozie Shared Library in Hadoop HDFS

The Oozie installation bundles the Oozie shared library, which contains all of the necessary JARs to enable workflow jobs to run streaming, DistCp, Pig, Hive, and Sqoop actions.

The Oozie installation bundles two shared libraries, one for MRv1 and one for YARN. Make sure you install the right one for the MapReduce version you are using:

· The shared library file for MRv1 is oozie-sharelib-mr1.tar.gz.

· The shared library file for YARN is oozie-sharelib-yarn.tar.gz.
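Choosing the file can be expressed as a simple mapping. A sketch with a hypothetical sharelib_tarball helper that accepts mr1 or yarn:

```shell
# Hypothetical helper: map the MapReduce version to the shared library
# file named above. Accepts "mr1" or "yarn".
sharelib_tarball() {
  case $1 in
    mr1)  echo oozie-sharelib-mr1.tar.gz ;;
    yarn) echo oozie-sharelib-yarn.tar.gz ;;
    *)    echo "unknown MapReduce version: $1 (use mr1 or yarn)" >&2
          return 1 ;;
  esac
}
```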

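To check which shared libraries the running Oozie server knows about:

$ sudo -u oozie oozie admin -shareliblist -oozie http://localhost:11000/oozie

Restart the Oozie server:

$ sudo service oozie restart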

To install the Oozie shared library in HDFS, in the oozie user's home directory:

$ sudo -u hdfs hadoop fs -mkdir /user/oozie
$ sudo -u hdfs hadoop fs -chown oozie:oozie /user/oozie
$ sudo oozie-setup sharelib create -fs hdfs://localhost:8020 -locallib /usr/lib/oozie/oozie-sharelib-mr1
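The three commands can be wrapped in one function that stops at the first failure. A sketch: install_oozie_sharelib is hypothetical, it uses hadoop fs -mkdir -p so that an existing /user/oozie does not abort the run, and it assumes the CDH layout where the shared libraries live under /usr/lib/oozie/oozie-sharelib-mr1 or /usr/lib/oozie/oozie-sharelib-yarn:

```shell
# Hypothetical wrapper around the three commands above. FS_URI is the
# NameNode URI (e.g. hdfs://localhost:8020); FLAVOUR is mr1 or yarn.
install_oozie_sharelib() {
  fs_uri=$1
  flavour=${2:-mr1}
  case $flavour in
    mr1|yarn) ;;
    *) echo "FLAVOUR must be mr1 or yarn" >&2; return 1 ;;
  esac
  # Each step runs only if the previous one succeeded
  sudo -u hdfs hadoop fs -mkdir -p /user/oozie &&
  sudo -u hdfs hadoop fs -chown oozie:oozie /user/oozie &&
  sudo oozie-setup sharelib create -fs "$fs_uri" \
    -locallib "/usr/lib/oozie/oozie-sharelib-$flavour"
}
```

For example: install_oozie_sharelib hdfs://localhost:8020 mr1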

Add the following property to oozie-site.xml so that Oozie can find the Hadoop configuration it needs for the shared library functionality:

<property>
<name>oozie.service.HadoopAccessorService.hadoop.configurations</name>
<value>*=/etc/hadoop/conf</value>
<description>
Comma separated AUTHORITY=HADOOP_CONF_DIR, where AUTHORITY is the HOST:PORT of
the Hadoop service (JobTracker, HDFS). The wildcard '*' configuration is
used when there is no exact match for an authority. The HADOOP_CONF_DIR contains
the relevant Hadoop *-site.xml files. If the path is relative, it is looked up within
the Oozie configuration directory; the path can also be absolute (i.e. to point
to Hadoop client conf/ directories in the local filesystem).
</description>
</property>
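Since the wildcard entry points Oozie at /etc/hadoop/conf, it is worth checking that the directory really holds the client configuration. A sketch: check_hadoop_conf_dir is hypothetical, and the list of files it expects (core-site.xml, hdfs-site.xml, mapred-site.xml) is an assumption, so add yarn-site.xml on a YARN cluster:

```shell
# Hypothetical check: report which of the usual Hadoop *-site.xml files
# are missing from the directory the property points to.
check_hadoop_conf_dir() {
  dir=${1:-/etc/hadoop/conf}
  missing=0
  for f in core-site.xml hdfs-site.xml mapred-site.xml; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```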

Configuring Support for Oozie Uber JARs

An uber JAR is a JAR that contains other JARs with dependencies in a lib/ folder inside the JAR. You can configure the cluster to handle uber JARs properly for the MapReduce action (as long as it does not include any streaming or pipes) by setting the following property in the oozie-site.xml file:

...

    <property>

        <name>oozie.action.mapreduce.uber.jar.enable</name>

        <value>true</value>

    </property>

...

When this property is set, users can use the oozie.mapreduce.uber.jar configuration property in their MapReduce workflows to notify Oozie that the specified JAR file is an uber JAR.
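To check whether a JAR actually is an uber JAR in the sense above, you can look for JAR entries under lib/ in its listing. A sketch: has_nested_lib_jars is a hypothetical helper that reads a listing such as the one printed by jar tf:

```shell
# Hypothetical check: read a JAR listing on stdin (one entry per line,
# as printed by "jar tf" or "unzip -Z1") and succeed if it contains
# JAR files under lib/.
has_nested_lib_jars() {
  grep -q '^lib/.*\.jar$'
}
```

For example, with my-app.jar as a placeholder name: jar tf my-app.jar | has_nested_lib_jars && echo "uber JAR"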

Configuring Oozie to Run against a Federated Cluster

To run Oozie against a federated HDFS cluster using ViewFS, configure the oozie.service.HadoopAccessorService.supported.filesystems property in oozie-site.xml as follows:

<property>

     <name>oozie.service.HadoopAccessorService.supported.filesystems</name>

     <value>hdfs,viewfs</value>

</property>

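Troubleshooting

If Oozie cannot load the MySQL JDBC driver (com.mysql.jdbc.Driver), copy the MySQL Connector/J JAR into /var/lib/oozie/:

$ sudo cp mysql-connector-java-5.1.35-bin.jar /var/lib/oozie/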
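A guarded version of this fix copies the driver only when no Connector/J JAR is present yet. A sketch: ensure_mysql_connector is hypothetical and should be run as root when the destination is /var/lib/oozie:

```shell
# Hypothetical helper: copy the MySQL JDBC driver JAR into DEST_DIR
# (default /var/lib/oozie) unless a mysql-connector-java JAR is there.
ensure_mysql_connector() {
  jar=$1
  dest=${2:-/var/lib/oozie}
  for existing in "$dest"/mysql-connector-java*.jar; do
    # The glob stays literal when nothing matches, so test for a file
    if [ -f "$existing" ]; then
      echo "already present: $existing"
      return 0
    fi
  done
  cp "$jar" "$dest/"
}
```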

HBase Installation

Installing HBase

We can install HBase in any of the three modes: Standalone mode, Pseudo Distributed mode, and Fully Distributed mode.

Installing HBase in Standalone Mode

Download the latest stable version of HBase from http://www.interior-dsgn.com/apache/hbase/stable/ using the wget command, and extract it with tar -zxvf, as shown below.

$ cd /usr/local/

$ sudo wget http://www.interior-dsgn.com/apache/hbase/stable/hbase-0.98.8-hadoop2-bin.tar.gz

$ sudo tar -zxvf hbase-0.98.8-hadoop2-bin.tar.gz
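Because the folder name changes with every release, it helps to derive it from the tarball name instead of typing it. A sketch: hbase_dir_from_tarball is hypothetical and assumes the binary tarball unpacks to its own name minus -bin.tar.gz, which you can confirm with tar -tzf:

```shell
# Hypothetical helper: derive the directory an HBase binary tarball
# extracts to by stripping ".tar.gz" and then a trailing "-bin".
hbase_dir_from_tarball() {
  name=$(basename "$1")
  name=${name%.tar.gz}
  echo "${name%-bin}"
}
```

For example, hbase_dir_from_tarball hbase-0.98.8-hadoop2-bin.tar.gz gives hbase-0.98.8-hadoop2.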

Switch to superuser mode and move the contents of the extracted folder into /usr/local/HBase as shown below.

$ su

Password: (enter your password here)

# mkdir -p /usr/local/HBase

# mv hbase-0.98.8-hadoop2/* /usr/local/HBase/

Configuring HBase in Standalone Mode

Before proceeding with HBase, you have to edit the following files and configure HBase.

hbase-env.sh

Set the Java home for HBase: open the hbase-env.sh file in the conf folder and change the existing JAVA_HOME value to the path of your current Java installation, as shown below.

$ cd /usr/local/HBase/conf

$ gedit hbase-env.sh

This opens the hbase-env.sh file of HBase. Replace the existing JAVA_HOME value with your current value as shown below.

export JAVA_HOME=/usr/lib/jvm/java-1.7.0
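The edit can also be scripted. A sketch: set_java_home is a hypothetical helper that replaces an existing export JAVA_HOME line (commented out or not) or appends one, writing through a temporary file instead of relying on sed -i:

```shell
# Hypothetical helper: set "export JAVA_HOME=PATH" in FILE (e.g.
# hbase-env.sh). A path containing '|' or '&' would confuse the sed
# replacement below.
set_java_home() {
  file=$1
  java_home=$2
  tmp=$(mktemp) || return 1
  if grep -q '^[# ]*export JAVA_HOME=' "$file"; then
    # Replace the first matching line form, commented or not
    sed "s|^[# ]*export JAVA_HOME=.*|export JAVA_HOME=$java_home|" "$file" > "$tmp"
  else
    cat "$file" > "$tmp"
    printf 'export JAVA_HOME=%s\n' "$java_home" >> "$tmp"
  fi
  # Copy back with cat so the original file keeps its permissions
  cat "$tmp" > "$file" && rm -f "$tmp"
}
```

For example: set_java_home /usr/local/HBase/conf/hbase-env.sh /usr/lib/jvm/java-1.7.0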

hbase-site.xml

This is the main configuration file of HBase. Set the data directory to an appropriate location: open the HBase home folder /usr/local/HBase, go to its conf folder (which contains several files), and open the hbase-site.xml file as shown below.

# cd /usr/local/HBase/

# cd conf

# gedit hbase-site.xml

Inside the hbase-site.xml file, you will find the <configuration> and </configuration> tags. Within them, set the HBase directory under the property key with the name “hbase.rootdir” as shown below.

<configuration>

   <!-- Set the path where you want HBase to store its files. -->

   <property>

      <name>hbase.rootdir</name>

      <value>file:/home/hadoop/HBase/HFiles</value>

   </property>



   <!-- Set the path where you want HBase to store its built-in ZooKeeper files. -->

   <property>

      <name>hbase.zookeeper.property.dataDir</name>

      <value>/home/hadoop/zookeeper</value>

   </property>

</configuration>
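For repeatable setups, the standalone configuration can be written from a script. A sketch: write_standalone_hbase_site is hypothetical and overwrites the target file, so back up the original first:

```shell
# Hypothetical helper: write a standalone-mode hbase-site.xml with the
# two properties above. Overwrites FILE.
write_standalone_hbase_site() {
  file=$1
  rootdir=$2      # e.g. file:/home/hadoop/HBase/HFiles
  zk_dir=$3       # e.g. /home/hadoop/zookeeper
  cat > "$file" <<EOF
<?xml version="1.0"?>
<configuration>
  <!-- Where HBase stores its files -->
  <property>
    <name>hbase.rootdir</name>
    <value>$rootdir</value>
  </property>
  <!-- Where the built-in ZooKeeper stores its files -->
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>$zk_dir</value>
  </property>
</configuration>
EOF
}
```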

This completes the HBase installation and configuration. Start HBase with the start-hbase.sh script in HBase's bin folder: open the HBase home folder and run the start script as shown below.

$cd /usr/local/HBase/bin

$./start-hbase.sh

If everything goes well, the start script prints a message saying that HBase has started:

starting master, logging to /usr/local/HBase/bin/../logs/hbase-tpmaster-localhost.localdomain.out
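To confirm that the master JVM is actually up, you can look for the HMaster process in the output of the JDK's jps tool. A sketch: hmaster_running is a hypothetical helper that reads jps output on stdin (in standalone mode HBase runs all its daemons inside that single HMaster JVM):

```shell
# Hypothetical check: succeed if the jps output on stdin lists a
# process whose name is exactly HMaster.
hmaster_running() {
  awk '$2 == "HMaster" { found = 1 } END { exit !found }'
}
```

For example: jps | hmaster_running && echo "HBase master is running"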

Installing HBase in Pseudo-Distributed Mode

Let us now check how HBase is installed in pseudo-distributed mode.

Configuring HBase

Before proceeding with HBase, configure Hadoop and HDFS on your local system or on a remote system and make sure they are running. Stop HBase if it is running.

hbase-site.xml

Edit hbase-site.xml file to add the following properties.

<property>

<name>hbase.cluster.distributed</name>

   <value>true</value>

</property>

This property tells HBase to run in distributed mode. In the same file, change hbase.rootdir from the local file system to the address of your HDFS instance, using the hdfs://host:port/path URI syntax. In this example HDFS runs on localhost at port 8030; use the host and port given by fs.defaultFS in your Hadoop core-site.xml.

<property>

   <name>hbase.rootdir</name>

   <value>hdfs://localhost:8030/hbase</value>

</property>
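Because hbase.rootdir must use the same NameNode host and port as HDFS, you can derive it from Hadoop's core-site.xml instead of typing it. A sketch: hbase_rootdir_from_core_site is hypothetical, its awk-based XML parsing is naive, and the default path /etc/hadoop/conf/core-site.xml is an assumption:

```shell
# Hypothetical helper: read fs.defaultFS from core-site.xml and print the
# matching hbase.rootdir (fs.defaultFS + "/hbase"). Naive parsing: it
# expects <value> on the same line as <name> or on a following line.
hbase_rootdir_from_core_site() {
  fs=$(awk '
    /<name>fs\.defaultFS<\/name>/ { found = 1 }
    found && /<value>/ {
      sub(/.*<value>[ \t]*/, "")
      sub(/[ \t]*<\/value>.*/, "")
      print
      exit
    }
  ' "${1:-/etc/hadoop/conf/core-site.xml}")
  [ -n "$fs" ] || { echo "fs.defaultFS not found" >&2; return 1; }
  # Drop a trailing slash before appending the HBase directory
  echo "${fs%/}/hbase"
}
```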

Starting HBase

After configuration is over, browse to HBase home folder and start HBase using the following command.

$cd /usr/local/HBase

$bin/start-hbase.sh

Note: Before starting HBase, make sure Hadoop is running.

Checking the HBase Directory in HDFS

HBase creates its directory in HDFS. To see the created directory, browse to the Hadoop home folder and type the following command.

$ ./bin/hadoop fs -ls /hbase

If everything goes well, it will give you the following output.

Found 7 items

drwxr-xr-x - hbase users 0 2014-06-25 18:58 /hbase/.tmp

drwxr-xr-x - hbase users 0 2014-06-25 21:49 /hbase/WALs

drwxr-xr-x - hbase users 0 2014-06-25 18:48 /hbase/corrupt

drwxr-xr-x - hbase users 0 2014-06-25 18:58 /hbase/data

-rw-r--r-- 3 hbase users 42 2014-06-25 18:41 /hbase/hbase.id

-rw-r--r-- 3 hbase users 7 2014-06-25 18:41 /hbase/hbase.version

drwxr-xr-x - hbase users 0 2014-06-25 21:49 /hbase/oldWALs

Starting and Stopping a Master

Using the local-master-backup.sh script, you can start up to 9 backup master servers (10 masters in total, counting the primary). Open the HBase home folder and execute the following command to start backup masters with offsets 2 and 4.

$ ./bin/local-master-backup.sh start 2 4

To kill a backup master, you need its process ID, which is stored in a file named /tmp/hbase-USER-X-master.pid, where USER is your user name and X is the offset. You can then kill the backup master using the following command.

$ cat /tmp/hbase-user-1-master.pid |xargs kill -9
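Stopping a backup master by offset can be wrapped in a function that reads the PID file for you. A sketch: stop_backup_master is hypothetical, honours HBase's HBASE_PID_DIR setting (default /tmp), and sends SIGTERM by default so the master can shut down cleanly, with -9 available as in the command above:

```shell
# Hypothetical helper: stop the backup master started with offset N by
# reading /tmp/hbase-$USER-N-master.pid (or the same name under
# HBASE_PID_DIR). The optional second argument is the kill signal.
stop_backup_master() {
  offset=$1
  signal=${2:--15}
  pid_file="${HBASE_PID_DIR:-/tmp}/hbase-${USER:-$(id -un)}-$offset-master.pid"
  if [ ! -f "$pid_file" ]; then
    echo "no PID file: $pid_file" >&2
    return 1
  fi
  kill "$signal" "$(cat "$pid_file")"
}
```

For example: stop_backup_master 2, or stop_backup_master 2 -9 to force it.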

Starting and Stopping RegionServers

You can run multiple region servers from a single system using the following command.

$ ./bin/local-regionservers.sh start 2 3

To stop a region server, use the following command.

$ ./bin/local-regionservers.sh stop 3

Starting HBase Shell

After installing HBase successfully, you can start the HBase shell. The steps to start it are given below. Open a terminal and log in as superuser.

Start Hadoop File System

Browse to the sbin folder of the Hadoop home directory and start the Hadoop file system as shown below.

$ cd $HADOOP_HOME/sbin

$ ./start-all.sh

Start HBase

Browse to the HBase home directory and start HBase.

$cd /usr/local/HBase

$./bin/start-hbase.sh

Start HBase Master Server

This will be the same directory. Start it as shown below.

$ ./bin/local-master-backup.sh start 2

(The number is the offset that identifies the specific server.)

Start Region Server

Start the region server as shown below.

$ ./bin/local-regionservers.sh start 3

Start HBase Shell

You can start HBase shell using the following command.

$cd bin

$./hbase shell

This will give you the HBase Shell Prompt as shown below.

2016-12-09 14:24:27,526 INFO [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available

HBase Shell; enter 'help<RETURN>' for list of supported commands.

Type "exit<RETURN>" to leave the HBase Shell

Version 0.98.8-hadoop2, r6cfc8d064754251365e070a10a82eb169956d5fe, Fri Nov 14 18:26:29 PST 2016

hbase(main):001:0>
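The core of the startup sequence (start Hadoop, start HBase, open the shell) can be wrapped in one function. A sketch: start_hbase_shell is hypothetical, expects HADOOP_HOME to be set, defaults HBASE_HOME to /usr/local/HBase as used in this post, and leaves out the optional backup master and region server steps:

```shell
# Hypothetical wrapper: start Hadoop, then HBase, then open the HBase
# shell, stopping at the first step that fails.
start_hbase_shell() {
  hbase_home=${HBASE_HOME:-/usr/local/HBase}
  [ -n "$HADOOP_HOME" ] || { echo "HADOOP_HOME is not set" >&2; return 1; }
  "$HADOOP_HOME/sbin/start-all.sh" &&
  "$hbase_home/bin/start-hbase.sh" &&
  "$hbase_home/bin/hbase" shell
}
```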