7.0 MapReduce Programming
Having learned how MapReduce is used, we can already handle statistical and retrieval tasks such as Word Count, but MapReduce is capable of far more.
MapReduce relies on developers to implement functionality through programming: data is processed by implementing the methods related to Map and Reduce.
To demonstrate this process simply, we will write a Word Count program by hand.
Note: MapReduce depends on the Hadoop libraries, but since this tutorial runs Hadoop in a Docker container, it is difficult to set up a full development environment there. Actual development work (including debugging) therefore requires a computer running Hadoop; here we only cover deploying the finished program.
MyWordCount.java File Code
/**
 * Reference declaration
 * This program is adapted from http://hadoop.apache.org/docs/r1.0.4/cn/mapred_tutorial.html
 */
package com.tutorialpro.hadoop;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

/**
 * Methods related to `Map`
 */
class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key,
                    Text value,
                    OutputCollector<Text, IntWritable> output,
                    Reporter reporter)
            throws IOException {
        // Split the line into tokens and emit a (word, 1) pair for each token
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            output.collect(word, one);
        }
    }
}

/**
 * Methods related to `Reduce`
 */
class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key,
                       Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output,
                       Reporter reporter)
            throws IOException {
        // Sum all the counts collected for this word and emit the total
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum));
    }
}

public class MyWordCount {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(MyWordCount.class);
        conf.setJobName("my_word_count");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        // The first argument is the input path
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        // The second argument is the output path
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}
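To see what the Map and Reduce phases actually do to the data, here is a minimal, Hadoop-free sketch of the same data flow that runs as plain Java. The class name `WordCountSketch`, the sample input lines, and the local in-memory grouping are all assumptions made for illustration; a real job shuffles the (word, 1) pairs across the cluster between the two phases instead of collecting them in one list.

```java
import java.util.*;

public class WordCountSketch {

    // Map phase: split each line into tokens and emit a (word, 1) pair per token,
    // mirroring the `Map` class above.
    static List<AbstractMap.SimpleEntry<String, Integer>> mapPhase(List<String> lines) {
        List<AbstractMap.SimpleEntry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                pairs.add(new AbstractMap.SimpleEntry<>(tokenizer.nextToken(), 1));
            }
        }
        return pairs;
    }

    // Reduce phase: group the pairs by key and sum the values per word,
    // mirroring the `Reduce` class above.
    static SortedMap<String, Integer> reducePhase(
            List<AbstractMap.SimpleEntry<String, Integer>> pairs) {
        SortedMap<String, Integer> counts = new TreeMap<>();
        for (AbstractMap.SimpleEntry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("I love hadoop", "I like hadoop");
        SortedMap<String, Integer> counts = reducePhase(mapPhase(lines));
        // Print each word and its count, tab-separated, sorted by key,
        // the same layout a TextOutputFormat result file uses.
        counts.forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}
```

Running this prints `I 2`, `hadoop 2`, `like 1`, `love 1` (tab-separated), which matches the shape of the real job's output below: keys sorted, one word per line, followed by its total.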
Please save the contents of this Java file to the NameNode container, suggested location:
/home/hadoop/MyWordCount/com/tutorialpro/hadoop/MyWordCount.java
Note: some Docker environments with the JDK installed do not support Chinese characters, so to be safe, remove any Chinese comments from the code above.
Enter the directory:
cd /home/hadoop/MyWordCount
Compile:
javac -classpath ${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-client-core-3.1.4.jar:${HADOOP_HOME}/share/hadoop/client/hadoop-client-api-3.1.4.jar com/tutorialpro/hadoop/MyWordCount.java
Package:
jar -cf my-word-count.jar com
Execute:
hadoop jar my-word-count.jar com.tutorialpro.hadoop.MyWordCount /wordcount/input /wordcount/output2
View the results:
hadoop fs -cat /wordcount/output2/part-00000
Output:
I 4
hadoop 2
like 2
love 2
tutorialpro 2