WordCount Program with MapReduce
Step 1:
Open the Eclipse IDE (download it from http://www.eclipse.org/downloads/) and create a new project with three class files: WordCount.java, WordCountMapper.java and WordCountReducer.java.
Step 2:
Open WordCount.java and paste the following code.
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;
public class WordCount extends Configured implements Tool
{
    public int run(String[] args) throws Exception
    {
        // Create a JobConf object and assign a job name for identification purposes
        JobConf conf = new JobConf(getConf(), WordCount.class);
        conf.setJobName("WordCount");
        // Set the data types of the output key and value
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        // Provide the mapper and reducer classes
        conf.setMapperClass(WordCountMapper.class);
        conf.setReducerClass(WordCountReducer.class);
        // We will give 2 arguments at run time: the first is the input path, the second is the output path
        Path inp = new Path(args[0]);
        Path out = new Path(args[1]);
        // The HDFS input and output directories are taken from the command line
        FileInputFormat.addInputPath(conf, inp);
        FileOutputFormat.setOutputPath(conf, out);
        // Submit the job and wait for it to finish
        JobClient.runJob(conf);
        return 0;
    }
    public static void main(String[] args) throws Exception
    {
        // This main function calls the run method defined above
        int res = ToolRunner.run(new Configuration(), new WordCount(), args);
        System.exit(res);
    }
}
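The run method reads args[0] and args[1] without checking that both were supplied, so a missing argument surfaces as an ArrayIndexOutOfBoundsException. A guard for this could be sketched as below. The class name ArgsCheck, the method validate and the usage message are hypothetical, and the return code 2 is only a convention chosen for this illustration.

```java
// Hypothetical helper, not part of the original WordCount classes.
public class ArgsCheck {
    // Returns 0 when exactly an input path and an output path were given,
    // and 2 (an arbitrary non-zero code chosen for this sketch) otherwise.
    public static int validate(String[] args) {
        if (args.length != 2) {
            System.err.println("Usage: WordCount <input path> <output path>");
            return 2;
        }
        return 0;
    }

    public static void main(String[] args) {
        // Two arguments pass the check; a single argument does not.
        System.out.println(validate(new String[] {"input", "output"}));
        System.out.println(validate(new String[] {"input"}));
    }
}
```

Inside run, the same check could return early, before the JobConf is built.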
Step 3:
Open WordCountMapper.java and paste the following code.
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
public class WordCountMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable>
{
    // Hadoop-supported data types
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();
    // The map method tokenizes each line and forms the initial key-value pairs.
    // Once all lines have been converted into key-value pairs, the reducer is called.
    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException
    {
        // Take one line at a time from the input file and tokenize it
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        // Iterate through all the words in that line and form a key-value pair for each
        while (tokenizer.hasMoreTokens())
        {
            word.set(tokenizer.nextToken());
            // Send the pair to the output collector, which in turn passes it to the reducer
            output.collect(word, one);
        }
    }
}
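To see what the mapper emits without a Hadoop cluster, the tokenizing loop can be sketched in plain Java. This is only an illustration and not part of the job: the class MapSketch and the method mapLine are hypothetical names, and a List stands in for Hadoop's OutputCollector.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical stand-alone sketch; it does not use any Hadoop classes.
public class MapSketch {
    // Mirrors the loop in WordCountMapper.map: one (word, 1) pair per token.
    public static List<Map.Entry<String, Integer>> mapLine(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            pairs.add(new SimpleEntry<>(tokenizer.nextToken(), 1));
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Prints [to=1, be=1, or=1, not=1, to=1, be=1]
        System.out.println(mapLine("to be or not to be"));
    }
}
```

Because StringTokenizer splits on whitespace only and the comparison is case-sensitive, tokens such as "be," and "Be" are counted as words distinct from "be".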
Step 4:
Open WordCountReducer.java and paste the following code.
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
public class WordCountReducer extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable>
{
    // The reduce method accepts the key-value pairs from the mappers, aggregates them by key and produces the final output
    public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException
    {
        int sum = 0;
        /* Iterate through all the values available for a key, add them together and
           emit the key with the sum of its values as the final result */
        while (values.hasNext())
        {
            sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum));
    }
}
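The aggregation the reducer performs can also be tried out in plain Java. The sketch below is an illustration under stated assumptions, not the job itself: ReduceSketch, reduce and countWords are hypothetical names, and a TreeMap stands in for the grouping and sorting by key that Hadoop does between the map and reduce phases.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-alone sketch; it does not use any Hadoop classes.
public class ReduceSketch {
    // Mirrors WordCountReducer.reduce: sum every value that arrived for one key.
    public static int reduce(String key, Iterator<Integer> values) {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next();
        }
        return sum;
    }

    // Stand-in for the shuffle: collect a 1 per word, grouped and sorted by key,
    // then call reduce once per key.
    public static Map<String, Integer> countWords(List<String> words) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String w : words) {
            grouped.computeIfAbsent(w, k -> new ArrayList<>()).add(1);
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            counts.put(e.getKey(), reduce(e.getKey(), e.getValue().iterator()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints {be=2, not=1, or=1, to=2}
        System.out.println(countWords(Arrays.asList("to", "be", "or", "not", "to", "be")));
    }
}
```

Each key reaches reduce exactly once with all of its values, which is why a simple running sum is enough.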
Step 5:
Now you need to resolve the missing dependencies by adding the Hadoop jar files to the project. Click on the Project tab and go to Properties. Under Java Build Path, open the Libraries tab, click Add External JARs and select all the jars (click on the first jar, then hold Shift and click on the last jar to select all the jars in between, and click OK) in the /home/radha/radha/hadoop-1.21/common and /home/radha/radha/hadoop1.2.1/share/hadoop/mapreduce folders. Adjust these paths to match your own Hadoop installation.
Step 6:
Now click on the Run tab and click Run Configurations. Click on the New Configuration button on the top-left side and click Apply after filling in the following properties.
Name - Any name will do, e.g. WordCountConfig
Project - Browse and select your project
Main Class - Select WordCount.java - this is our main class
Step 7:
Now click on the File tab and select Export. Under Java, select Runnable JAR.
In Launch Config, select the configuration you created in Step 6 (WordCountConfig).
Select an export destination (let's say the desktop).
Under Library handling, select Extract Required Libraries into generated JAR and click Finish.
Right-click the jar file, go to Properties and, under the Permissions tab, check Allow executing file as a program and give Read and Write access to all the users.