Thursday 10 November 2016

Oozie Installation

Oozie Installation in Ubuntu

Step 1 :- To install the Oozie server package on Ubuntu and other Debian systems:
$ sudo apt-get install oozie  
 
Step 2 :- To install the Oozie client package on Ubuntu and other Debian systems:
$ sudo apt-get install oozie-client

Step 3 :- Configuring which Hadoop Version to Use

To use MRv1 (without SSL):
alternatives --set oozie-tomcat-conf /etc/oozie/tomcat-conf.http.mr1

Step 4 :- Edit the /etc/oozie/conf/oozie-env.sh file and add the following entry:

export CATALINA_BASE=/var/lib/oozie/tomcat-deployment

Step 5 :- Start the Oozie server
$ sudo service oozie start
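
To confirm that the server came up, you can check the service status (and, if it did not, look at the Oozie logs, which on a packaged install are under /var/log/oozie by default):
$ sudo service oozie status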

Step 6 :- Accessing the Oozie Server with the Oozie Client

The Oozie client is a command-line utility that interacts with the Oozie server via the Oozie web-services API.

Use the /usr/bin/oozie script to run the Oozie client.

For example, if you want to invoke the client on the same machine where the Oozie server is running:

$ oozie admin -oozie http://localhost:11000/oozie -status
System mode: NORMAL

To make it convenient to use this utility, set the environment variable OOZIE_URL to point to the URL of the Oozie server. Then you can skip the -oozie option.

For example, if you want to invoke the client on the same machine where the Oozie server is running, set the OOZIE_URL to http://localhost:11000/oozie.

$ export OOZIE_URL=http://localhost:11000/oozie
$ oozie admin -version
Oozie server build version: 4.0.0-cdh5.0.0
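
Once OOZIE_URL is set, the same client can also submit and monitor workflow jobs. A minimal sketch, assuming a job.properties file (a hypothetical name here) that points to a workflow application already uploaded to HDFS:

$ oozie job -config job.properties -run
$ oozie job -info <job-id>

Here <job-id> is the identifier printed by the -run command.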


Step 7 :- Configuring MySQL for Oozie

Step 1: Create the Oozie database and Oozie MySQL user.

For example, using the MySQL mysql command-line tool:
$ mysql -u root -p
Enter password: ******
 
mysql> create database oozie;
Query OK, 1 row affected (0.03 sec)
 
mysql>  grant all privileges on oozie.* to 'oozie'@'localhost' identified by 'oozie';
Query OK, 0 rows affected (0.03 sec)
 
mysql>  grant all privileges on oozie.* to 'oozie'@'%' identified by 'oozie';
Query OK, 0 rows affected (0.03 sec)
 
mysql> exit
Bye
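
As a quick sanity check, you can connect to the new database as the oozie user (you will be prompted for the password set above):

$ mysql -u oozie -p oozie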

Step 2: Configure Oozie to use MySQL.

Edit properties in the oozie-site.xml file as follows:
...
    <property>
        <name>oozie.service.JPAService.jdbc.driver</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>oozie.service.JPAService.jdbc.url</name>
        <value>jdbc:mysql://localhost:3306/oozie</value>
    </property>
    <property>
        <name>oozie.service.JPAService.jdbc.username</name>
        <value>oozie</value>
    </property>
    <property>
        <name>oozie.service.JPAService.jdbc.password</name>
        <value>oozie</value>
    </property>
    ...

Step 3 : Creating the Oozie Database Schema

$ sudo -u oozie /usr/lib/oozie/bin/ooziedb.sh create -sqlfile oozie-create.sql
 
or
 
$ sudo -u oozie /usr/lib/oozie/bin/ooziedb.sh create -run

The first form only writes the schema DDL to oozie-create.sql so it can be reviewed or applied manually; the second form creates the schema in the configured database directly.
 

Step 4 : Enabling the Oozie Web Console

To enable Oozie's web console, you must download and add the ExtJS library to the Oozie server. If you have not already done this, proceed as follows.

Step 4.1: Download the Library

Download the ExtJS version 2.2 library from http://archive.cloudera.com/gplextras/misc/ext-2.2.zip and place it in a convenient location.

Step 4.2: Install the Library

Extract the ext-2.2.zip file and copy the resulting ext-2.2 directory into /var/lib/oozie:
$ cd Downloads/
$ unzip ext-2.2.zip
$ sudo cp -avr ext-2.2 /var/lib/oozie/

Step 5 : Installing the Oozie Shared Library in Hadoop HDFS

The Oozie installation bundles the Oozie shared library, which contains all of the necessary JARs to enable workflow jobs to run streaming, DistCp, Pig, Hive, and Sqoop actions.
The Oozie installation bundles two shared libraries, one for MRv1 and one for YARN. Make sure you install the right one for the MapReduce version you are using:
· The shared library file for MRv1 is oozie-sharelib-mr1.tar.gz.
· The shared library file for YARN is oozie-sharelib-yarn.tar.gz.

After the shared library has been installed in HDFS (see below), you can list it and restart the Oozie server:
$ sudo -u oozie oozie admin -shareliblist -oozie http://localhost:11000/oozie
$ sudo service oozie restart

To install the Oozie shared library in Hadoop HDFS in the oozie user's home directory:

$ sudo -u hdfs hadoop fs -mkdir /user/oozie
$ sudo -u hdfs hadoop fs -chown oozie:oozie /user/oozie
$ sudo oozie-setup sharelib create -fs hdfs://localhost:8020 -locallib /usr/lib/oozie/oozie-sharelib-mr1
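
To confirm that the shared library was uploaded, you can list it in HDFS; the exact subdirectory layout (for example a lib_<timestamp> directory in newer Oozie releases) depends on the Oozie version:

$ sudo -u oozie hadoop fs -ls /user/oozie/share/lib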

Add the following property to oozie-site.xml so that Oozie can locate the Hadoop configuration:

<property>
    <name>oozie.service.HadoopAccessorService.hadoop.configurations</name>
    <value>*=/etc/hadoop/conf</value>
    <description>
        Comma-separated AUTHORITY=HADOOP_CONF_DIR, where AUTHORITY is the HOST:PORT of
        the Hadoop service (JobTracker, HDFS). The wildcard '*' configuration is
        used when there is no exact match for an authority. The HADOOP_CONF_DIR contains
        the relevant Hadoop *-site.xml files. If the path is relative, it is looked up within
        the Oozie configuration directory; the path can also be absolute (i.e. pointing
        to Hadoop client conf/ directories in the local filesystem).
    </description>
</property>

Configuring Support for Oozie Uber JARs

An uber JAR is a JAR that contains other JARs with dependencies in a lib/ folder inside the JAR. You can configure the cluster to handle uber JARs properly for the MapReduce action (as long as it does not include any streaming or pipes) by setting the following property in the oozie-site.xml file:
...
    <property>
        <name>oozie.action.mapreduce.uber.jar.enable</name>
        <value>true</value>
    </property>
    ...
When this property is set, users can use the oozie.mapreduce.uber.jar configuration property in their MapReduce workflows to notify Oozie that the specified JAR file is an uber JAR.
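
For example, inside the <map-reduce> action of a workflow.xml the property could be set along these lines (a sketch; the JAR path below is a hypothetical HDFS location):

    <configuration>
        <property>
            <name>oozie.mapreduce.uber.jar</name>
            <value>hdfs://localhost:8020/user/oozie/examples/my-uber.jar</value>
        </property>
    </configuration>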

Configuring Oozie to Run against a Federated Cluster

To run Oozie against a federated HDFS cluster using ViewFS, configure the oozie.service.HadoopAccessorService.supported.filesystems property in oozie-site.xml as follows:
<property>
     <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
     <value>hdfs,viewfs</value>
</property>
 
Troubleshooting

If Oozie fails to start because it cannot load the MySQL JDBC driver, copy the MySQL connector JAR into the Oozie library directory:
$ sudo cp mysql-connector-java-5.1.35-bin.jar /var/lib/oozie/

HBase Installation

Installing HBase

We can install HBase in any of the three modes: Standalone mode, Pseudo Distributed mode, and Fully Distributed mode.

Installing HBase in Standalone Mode

Download the latest stable version of HBase from http://www.interior-dsgn.com/apache/hbase/stable/ using the “wget” command, and extract it using the “tar zxvf” command. See the following commands.
$cd /usr/local/
$wget http://www.interior-dsgn.com/apache/hbase/stable/hbase-0.98.8-hadoop2-bin.tar.gz
$tar -zxvf hbase-0.98.8-hadoop2-bin.tar.gz
Shift to super user mode and move the extracted HBase folder to /usr/local/HBase as shown below.
$su
$password: enter your password here
mkdir /usr/local/HBase
mv hbase-0.98.8-hadoop2/* /usr/local/HBase/

Configuring HBase in Standalone Mode

Before proceeding with HBase, you have to edit the following files and configure HBase.

hbase-env.sh

Set the Java home for HBase by opening the hbase-env.sh file in the conf folder. Edit the JAVA_HOME environment variable and change the existing path to your current JAVA_HOME value as shown below.
cd /usr/local/HBase/conf
gedit hbase-env.sh
This will open the env.sh file of HBase. Now replace the existing JAVA_HOME value with your current value as shown below.
export JAVA_HOME=/usr/lib/jvm/java-1.7.0

hbase-site.xml

This is the main configuration file of HBase. Set the data directory to an appropriate location by opening the HBase home folder at /usr/local/HBase. Inside the conf folder you will find several files; open the hbase-site.xml file as shown below.
#cd /usr/local/HBase/
#cd conf
# gedit hbase-site.xml
Inside the hbase-site.xml file, you will find the <configuration> and </configuration> tags. Within them, set the HBase directory under the property key with the name “hbase.rootdir” as shown below.
<configuration>
   <!-- Here you have to set the path where you want HBase to store its files. -->
   <property>
      <name>hbase.rootdir</name>
      <value>file:/home/hadoop/HBase/HFiles</value>
   </property>

   <!-- Here you have to set the path where you want HBase to store its built-in ZooKeeper files. -->
   <property>
      <name>hbase.zookeeper.property.dataDir</name>
      <value>/home/hadoop/zookeeper</value>
   </property>
</configuration>

With this, the HBase installation and configuration is complete. We can start HBase using the start-hbase.sh script provided in the bin folder of HBase. For that, open the HBase home folder and run the HBase start script as shown below.
$cd /usr/local/HBase/bin
$./start-hbase.sh
If everything goes well, the HBase start script will print a message saying that the master has started.
starting master, logging to /usr/local/HBase/bin/../logs/hbase-tpmaster-localhost.localdomain.out


Installing HBase in Pseudo-Distributed Mode


Let us now check how HBase is installed in pseudo-distributed mode.

CONFIGURING HBASE

Before proceeding with HBase, configure Hadoop and HDFS on your local system or on a remote system and make sure they are running. Stop HBase if it is running.
hbase-site.xml
Edit hbase-site.xml file to add the following properties.
<property>
   <name>hbase.cluster.distributed</name>
   <value>true</value>
</property>

This property tells HBase which mode it should run in. In the same file, change hbase.rootdir from the local file system path to the address of your HDFS instance, using the hdfs:// URI syntax. Here we are running HDFS on the localhost at port 8030 (this should match the port your NameNode is listening on).
<property>
   <name>hbase.rootdir</name>
   <value>hdfs://localhost:8030/hbase</value>
</property>

Starting HBase

After configuration is over, browse to the HBase home folder and start HBase using the following command.
$cd /usr/local/HBase
$bin/start-hbase.sh
Note: Before starting HBase, make sure Hadoop is running.

Checking the HBase Directory in HDFS

HBase creates its directory in HDFS. To see the created directory, browse to the Hadoop bin directory and type the following command.
$ ./bin/hadoop fs -ls /hbase
If everything goes well, it will give you the following output.
Found 7 items
drwxr-xr-x - hbase users 0 2014-06-25 18:58 /hbase/.tmp
drwxr-xr-x - hbase users 0 2014-06-25 21:49 /hbase/WALs
drwxr-xr-x - hbase users 0 2014-06-25 18:48 /hbase/corrupt
drwxr-xr-x - hbase users 0 2014-06-25 18:58 /hbase/data
-rw-r--r-- 3 hbase users 42 2014-06-25 18:41 /hbase/hbase.id
-rw-r--r-- 3 hbase users 7 2014-06-25 18:41 /hbase/hbase.version
drwxr-xr-x - hbase users 0 2014-06-25 21:49 /hbase/oldWALs

Starting and Stopping a Master


Using the “local-master-backup.sh” script you can start up to 10 backup master servers. Open the home folder of HBase and execute the following command to start them.
$ ./bin/local-master-backup.sh start 2 4
To kill a backup master, you need its process id, which is stored in a file named “/tmp/hbase-USER-X-master.pid”. You can kill the backup master using the following command.
$ cat /tmp/hbase-user-1-master.pid |xargs kill -9

Starting and Stopping RegionServers

You can run multiple region servers from a single system using the following command.
$ ./bin/local-regionservers.sh start 2 3
To stop a region server, use the following command.
$ ./bin/local-regionservers.sh stop 3

Starting HBase Shell

After installing HBase successfully, you can start the HBase shell. Below is the sequence of steps to follow to start the HBase shell. Open the terminal and log in as super user.

Start Hadoop File System

Browse to the Hadoop sbin folder and start the Hadoop file system as shown below.
$cd $HADOOP_HOME/sbin
$./start-all.sh
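
You can confirm that the Hadoop daemons are up with the jps command (the exact list of processes depends on your Hadoop version and configuration):
$ jps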

Start HBase

Browse through the HBase root directory bin folder and start HBase.
$cd /usr/local/HBase
$./bin/start-hbase.sh

Start HBase Master Server

This will be the same directory. Start it as shown below.
$./bin/local-master-backup.sh start 2 (the number signifies the specific server.)

Start Region

Start the region server as shown below.
$./bin/local-regionservers.sh start 3

Start HBase Shell

You can start HBase shell using the following command.

$cd bin
$./hbase shell

This will give you the HBase Shell Prompt as shown below.
2016-12-09 14:24:27,526 INFO [main] Configuration.deprecation:
hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.98.8-hadoop2, r6cfc8d064754251365e070a10a82eb169956d5fe, Fri
Nov 14 18:26:29 PST 2016

hbase(main):001:0>
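
From this prompt you can run a few basic commands to verify the installation; the table name 'test' and column family 'cf' below are just examples:

hbase(main):001:0> status
hbase(main):002:0> create 'test', 'cf'
hbase(main):003:0> put 'test', 'row1', 'cf:a', 'value1'
hbase(main):004:0> scan 'test'
hbase(main):005:0> list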

Wednesday 12 October 2016

SQOOP INSTALLATION


Step 1: Installing Sqoop

The following commands are used to extract the Sqoop tarball and move it to the /usr/lib/sqoop directory.
$tar -xvf sqoop-1.4.4.bin__hadoop-2.0.4-alpha.tar.gz
$ su
password:
# mv sqoop-1.4.4.bin__hadoop-2.0.4-alpha /usr/lib/sqoop
#exit

Step 2: Configuring bashrc

You have to set up the Sqoop environment by appending the following lines to ~/.bashrc file:
#Sqoop
export SQOOP_HOME=/usr/lib/sqoop
export PATH=$PATH:$SQOOP_HOME/bin
The following command is used to reload the ~/.bashrc file.
$ source ~/.bashrc

Step 3: Configuring Sqoop

To configure Sqoop with Hadoop, you need to edit the sqoop-env.sh file, which is placed in the $SQOOP_HOME/conf directory. First, go to the Sqoop config directory and rename the template file using the following commands:
$ cd $SQOOP_HOME/conf
$ mv sqoop-env-template.sh sqoop-env.sh
Open sqoop-env.sh and edit the following lines:
export HADOOP_COMMON_HOME=/usr/local/hadoop
export HADOOP_MAPRED_HOME=/usr/local/hadoop

Step 4: Download and Configure mysql-connector-java

Download the mysql-connector-java-5.1.30.tar.gz file (MySQL Connector/J) from the MySQL website.
The following commands are used to extract mysql-connector-java tarball and move mysql-connector-java-5.1.30-bin.jar to /usr/lib/sqoop/lib directory.
$ tar -zxf mysql-connector-java-5.1.30.tar.gz
$ su
password:

# cd mysql-connector-java-5.1.30
# mv mysql-connector-java-5.1.30-bin.jar /usr/lib/sqoop/lib

Step 5: Verifying Sqoop

The following command is used to verify the Sqoop version.
$ cd $SQOOP_HOME/bin
$ sqoop-version
Expected output:
15/12/17 14:52:32 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5
Sqoop 1.4.5 git commit id 5b34accaca7de251fc91161733f906af2eddbe83
Compiled by abe on Fri Dec 1 11:19:26 PDT 2015
 Sqoop installation is complete.
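
With the MySQL connector in place, a quick end-to-end check is to list the databases on a local MySQL server (assuming MySQL is installed and you know the root password):

$ sqoop list-databases --connect jdbc:mysql://localhost/ --username root -P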

Monday 19 September 2016

WordCount Program Map Reducer

WordCount Program



Step 1:
Open the Eclipse IDE (download from http://www.eclipse.org/downloads/) and create a new project with 3 class files - WordCount.java, WordCountMapper.java and WordCountReducer.java
Step 2:
Open WordCount.java and paste the following code.
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;


public class WordCount extends Configured implements Tool{
      public int run(String[] args) throws Exception
      {
            //creating a JobConf object and assigning a job name for identification purposes
            JobConf conf = new JobConf(getConf(), WordCount.class);
            conf.setJobName("WordCount");

            //Setting configuration object with the Data Type of output Key and Value
            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(IntWritable.class);

            //Providing the mapper and reducer class names
            conf.setMapperClass(WordCountMapper.class);
            conf.setReducerClass(WordCountReducer.class);
            //We will give 2 arguments at run time: one is the input path and the other is the output path
            Path inp = new Path(args[0]);
            Path out = new Path(args[1]);
            //the hdfs input and output directory to be fetched from the command line
            FileInputFormat.addInputPath(conf, inp);
            FileOutputFormat.setOutputPath(conf, out);

            JobClient.runJob(conf);
            return 0;
      }
   
      public static void main(String[] args) throws Exception
      {
            // this main function will call run method defined above.
        int res = ToolRunner.run(new Configuration(), new WordCount(),args);
            System.exit(res);
      }
}
Step 3:
 Open WordCountMapper.java and paste the following code.
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

public class WordCountMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable>
{
      //hadoop supported data types
      private final static IntWritable one = new IntWritable(1);
      private Text word = new Text();
   
      //map method that performs the tokenizer job and framing the initial key value pairs
      // after all lines are converted into key-value pairs, reducer is called.
      public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException
      {
            //taking one line at a time from input file and tokenizing the same
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
       
          //iterating through all the words available in that line and forming the key value pair
            while (tokenizer.hasMoreTokens())
            {
               word.set(tokenizer.nextToken());
               //sending to output collector which in turn passes the same to reducer
                 output.collect(word, one);
            }
       }
}
Step 4:
 Open WordCountReducer.java and paste the following code.


import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

public class WordCountReducer extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable>
{
      //reduce method accepts the Key Value pairs from mappers, does the aggregation based on keys and produces the final output
      public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException
      {
            int sum = 0;
            /*iterates through all the values available with a key, adds them together and gives the
            final result as the key and the sum of its values*/
          while (values.hasNext())
          {
               sum += values.next().get();
          }
          output.collect(key, new IntWritable(sum));
      }
}

Step 5:

You need to resolve dependencies by adding the Hadoop JAR files from the Hadoop installation folder. Now click on the Project tab and go to Properties. Under the Libraries tab, click Add External JARs and select all the JARs in the /home/radha/radha/hadoop-1.2.1/common and /home/radha/radha/hadoop-1.2.1/share/hadoop/mapreduce folders (click on the first JAR, press Shift and click on the last JAR to select all the JARs in between, then click OK).

Step 6:

Now click on the Run tab and click Run Configurations. Click the New Configuration button on the top-left side, fill in the following properties and click Apply.
     Name - any name will do - Ex: WordCountConfig
     Project - Browse and select your project
     Main Class - Select WordCount.java - this is our main class

Step 7:

Now click on the File tab and select Export. Under Java, select Runnable JAR.

     In Launch Config - select the config file you created in Step 6 (WordCountConfig).

     Select an export destination (let's say the desktop).

     Under Library handling, select Extract Required Libraries into generated JAR and click Finish.

     Right-click the jar file, go to Properties and under the Permissions tab, check Allow executing file as a program, and give Read and Write access to all the users.
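
Once the runnable JAR has been exported, it can be run against input in HDFS. A minimal sketch, assuming the JAR was exported to the desktop as WordCount.jar and using example input/output paths:

$ hadoop fs -mkdir /user/hduser/wordcount/input
$ hadoop fs -put input.txt /user/hduser/wordcount/input
$ hadoop jar ~/Desktop/WordCount.jar /user/hduser/wordcount/input /user/hduser/wordcount/output
$ hadoop fs -cat /user/hduser/wordcount/output/part-00000

Because the program uses the old mapred API with a single reducer by default, the word counts end up in part-00000.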

Wednesday 14 September 2016

FLUME INSTALLATION


  My configuration is Apache Flume 1.6.0 on a machine with Ubuntu 14.04 and Apache Hadoop 1.2.1; the location of Hadoop is /usr/local/Hadoop.

Step 1: Download latest Flume release from Apache Website.

Step 2: Move/Copy it to the location you want to install, in my case it is “/usr/local/”.
              1. $ cd Downloads/
              2. $ sudo cp apache-flume-1.6.0-bin.tar.gz /usr/local/

Step 3: Extract the tar file. Go to the copied folder, in my case it is “/usr/local/”, and run the below commands:
          $ cd /usr/local
          $ sudo tar -xzvf apache-flume-1.6.0-bin.tar.gz

Step 4: Rename folder from “apache-flume-1.6.0-bin” to “flume” for simplicity.
          $ sudo mv apache-flume-1.6.0-bin flume

Step 5:  Update environments
           $ gedit ~/.bashrc
       Add Below Lines:
              export FLUME_HOME=/usr/local/flume
              export FLUME_CONF_DIR=$FLUME_HOME/conf
              export FLUME_CLASSPATH=$FLUME_CONF_DIR
              export PATH=$PATH:$FLUME_HOME/bin 

Step 6: Change owner to user and group, in my case it is user:hduser and group:hadoop
           $ sudo chown -R hduser:hadoop /usr/local/flume

Step 7: Rename “flume-env.sh.template” to “flume-env.sh” and add the below values.
        $ sudo mv /usr/local/flume/conf/flume-env.sh.template /usr/local/flume/conf/flume-env.sh
        $ gedit /usr/local/flume/conf/flume-env.sh

        Add the below lines (please use your installed Java path):
            export JAVA_OPTS="-Xms500m -Xmx1000m -Dcom.sun.management.jmxremote"
            export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
      
  How to check your Java installation path (JAVA_HOME):
     echo $JAVA_HOME

Step 8: Log in as hduser and run the Flume CLI:
            $ flume-ng --help 
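
As a quick smoke test, you can run a single-node agent with a minimal configuration. The file name example.conf and the agent name a1 below are just examples; the layout follows the standard netcat-source, logger-sink pattern from the Flume user guide.

Contents of /usr/local/flume/conf/example.conf:
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sinks.k1.type = logger
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start the agent:
$ flume-ng agent --conf $FLUME_CONF_DIR --conf-file $FLUME_CONF_DIR/example.conf --name a1 -Dflume.root.logger=INFO,console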

Saturday 10 September 2016

PIG LATIN INSTALLATION



1. Download a Pig release from the following link
http://mirror.wanxp.id/apache/pig/

2. Enter the directory where the stable version is downloaded. By default, it downloads into the Downloads directory.
$ cd Downloads/

3. Unzip the tar file.
$ tar -xvf pig-0.16.0.tar.gz

4.Create directory
$ sudo mkdir /usr/lib/pig

5. Move pig-0.16.0 to /usr/lib/pig
$ sudo mv pig-0.16.0 /usr/lib/pig

6.Set the PIG_HOME path in bashrc file.
$ gedit ~/.bashrc

In bashrc file append the below 2 statements
export PIG_HOME=/usr/lib/pig/pig-0.16.0
export PATH=$PATH:$PIG_HOME/bin

Save and exit.
          $ source .bashrc

7. Now let's test the installation. On the command prompt, type
$ pig -h

8. Starting Pig in MapReduce mode


$ pig
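
With no options, pig starts in MapReduce mode and connects to the Hadoop cluster. You can also choose the execution mode explicitly:
$ pig -x local
$ pig -x mapreduce
Local mode runs against the local file system and is handy for testing scripts; mapreduce mode is the default and runs jobs on the cluster.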