6.0 MapReduce Usage

Now that we have studied the concepts behind MapReduce, we should know what Map and Reduce are and understand how they work.

This chapter will teach you how to use MapReduce.

Word Count

Word Count is the classic MapReduce program. It reads a text file and counts how many times each word appears in it.

Hadoop includes many classic MapReduce example programs, including Word Count.
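
To make the computation concrete before involving Hadoop, here is a minimal sketch of the same counting logic using standard Unix tools. It is only an analogy for the map, shuffle, and reduce phases, not how Hadoop implements them, and the function name wordcount_local is made up for this illustration.

```shell
# wordcount_local is a hypothetical helper, not part of Hadoop.
# Usage: wordcount_local FILE
#
# Stage 1 ("map"):     awk emits every whitespace-separated word on its own line.
# Stage 2 ("shuffle"): sort brings identical words together (LC_ALL=C = byte order).
# Stage 3 ("reduce"):  uniq -c collapses each run of identical words into a count.
# Stage 4:             awk prints each result as word<TAB>count.
wordcount_local() {
    awk '{ for (i = 1; i <= NF; i++) print $i }' "$1" |
        LC_ALL=C sort |
        uniq -c |
        awk '{ printf "%s\t%s\n", $2, $1 }'
}
```

Calling wordcount_local on a small whitespace-separated text file prints one line per distinct word, with the word and its count separated by a tab.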

Note: This example does not need HDFS to be running, so let's first test it in standalone mode.

First, start a new container from the previously created hadoop_proto image:

docker run -d --name=word_count hadoop_proto

Enter the container:

docker exec -it word_count bash

Go to the HOME directory:

cd ~

Now prepare a text file named input.txt with the following content:

I love tutorialpro
I like tutorialpro
I love hadoop
I like hadoop

Save this content as input.txt using a text editor.
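
If no editor is at hand inside the container, the file can also be written directly from the shell. This is a small sketch using a here-document; it writes input.txt into the current directory, which is assumed to be HOME as above.

```shell
# Write the four sample lines to input.txt in the current directory.
# The quoted 'EOF' keeps the shell from expanding anything in the text.
cat > input.txt <<'EOF'
I love tutorialpro
I like tutorialpro
I love hadoop
I like hadoop
EOF
```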

Run the MapReduce job:

hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.4.jar wordcount input.txt output

What the command means:

hadoop jar runs a MapReduce job from a jar file; it is followed by the path to the jar containing the example programs.

wordcount selects the Word Count program inside that jar and takes two arguments: the first is the input file, and the second is the name of the output directory (a directory, because the result can consist of multiple files).
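
One practical detail when running the job a second time: Hadoop refuses to start a job whose output directory already exists, so a leftover directory from a previous run has to be removed first. Below is a minimal sketch for the standalone case, where the output directory is an ordinary local directory. The helper name clear_output is hypothetical, not a Hadoop command.

```shell
# clear_output is a hypothetical helper, not part of Hadoop.
# It removes a previous local result directory, if there is one,
# and does nothing when the directory does not exist.
clear_output() {
    dir="$1"
    if [ -d "$dir" ]; then
        rm -r -- "$dir"
    fi
}

# Usage, from the directory that contains input.txt:
#   clear_output output
#   hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.4.jar wordcount input.txt output
```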

After the job finishes, a directory named output should have been created, containing two files: _SUCCESS and part-r-00000.

_SUCCESS is an empty file that only signals that the job completed successfully, and part-r-00000 holds the result. Display its content:

cat ~/output/part-r-00000

You should be able to see the following information:

I       4
hadoop  2
like    2
love    2
tutorialpro  2
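
The two checks described above, looking for _SUCCESS and then reading the part file, can be combined into one step. The following is a sketch only; show_result is a hypothetical helper name, and it assumes the result files follow the part-r-* naming seen above.

```shell
# show_result is a hypothetical helper, not part of Hadoop.
# Usage: show_result OUTPUT_DIR
# Prints the result files only if the _SUCCESS marker is present.
show_result() {
    dir="$1"
    if [ ! -f "$dir/_SUCCESS" ]; then
        echo "no _SUCCESS marker in $dir: the job may have failed or not finished" >&2
        return 1
    fi
    # There may be more than one part file, so print all of them.
    cat "$dir"/part-r-*
}
```

For the run above, this would be called as show_result ~/output.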

Cluster Mode

Now let's run MapReduce in cluster mode.

Start the cluster containers configured in the previous chapter:

docker start nn dn1 dn2

Enter the NameNode container:

docker exec -it nn su hadoop

Go to HOME:

cd ~

Create input.txt with the following content:

I love tutorialpro
I like tutorialpro
I love hadoop
I like hadoop

Start HDFS:

start-dfs.sh

Create the input directory in HDFS:

hadoop fs -mkdir /wordcount
hadoop fs -mkdir /wordcount/input

Upload input.txt:

hadoop fs -put input.txt /wordcount/input/
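
The directory creation and upload can also be wrapped into a single step that lists the directory afterwards to confirm the file arrived. This is a sketch under some assumptions: upload_input is a hypothetical helper name, and the HADOOP_BIN variable is introduced here only so the commands can be dry-run (for example with HADOOP_BIN=echo) on a machine without Hadoop. The -p option of hadoop fs -mkdir creates missing parent directories, so /wordcount does not need to be created separately.

```shell
# HADOOP_BIN is an assumption of this sketch: it defaults to the real
# hadoop command and can be set to "echo" to only print the commands.
HADOOP_BIN="${HADOOP_BIN:-hadoop}"

# upload_input is a hypothetical helper, not part of Hadoop.
# Usage: upload_input LOCAL_FILE HDFS_DIR
upload_input() {
    local_file="$1"
    hdfs_dir="$2"
    # Create the target directory including parents, upload the file,
    # then list the directory to verify the upload.
    "$HADOOP_BIN" fs -mkdir -p "$hdfs_dir" &&
        "$HADOOP_BIN" fs -put "$local_file" "$hdfs_dir/" &&
        "$HADOOP_BIN" fs -ls "$hdfs_dir"
}

# Usage:
#   upload_input input.txt /wordcount/input
```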

Execute Word Count:

hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.4.jar wordcount /wordcount/input /wordcount/output

Check the execution result:

hadoop fs -cat /wordcount/output/part-r-00000

If everything worked, the following result is displayed:

I       4
hadoop  2
like    2
love    2
tutorialpro  2
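
Two follow-up steps are often useful in cluster mode: copying the result out of HDFS into a local file, and deleting /wordcount/output before running the job again, since the job will not start while that directory exists. The sketch below uses hadoop fs -getmerge and hadoop fs -rm for this. The helper names and the HADOOP_BIN variable are assumptions made for this illustration; HADOOP_BIN exists only so the commands can be dry-run with HADOOP_BIN=echo.

```shell
# HADOOP_BIN is an assumption of this sketch: it defaults to the real
# hadoop command and can be set to "echo" to only print the commands.
HADOOP_BIN="${HADOOP_BIN:-hadoop}"

# Hypothetical helper: concatenate the files of an HDFS directory
# into a single local file.
# Usage: fetch_hdfs_result HDFS_DIR LOCAL_FILE
fetch_hdfs_result() {
    "$HADOOP_BIN" fs -getmerge "$1" "$2"
}

# Hypothetical helper: delete an HDFS output directory so the job can
# be run again; -f suppresses the error if it does not exist.
# Usage: reset_hdfs_output HDFS_DIR
reset_hdfs_output() {
    "$HADOOP_BIN" fs -rm -r -f "$1"
}

# Usage:
#   fetch_hdfs_result /wordcount/output result.txt
#   reset_hdfs_output /wordcount/output
```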
