6.0 MapReduce Usage
The previous chapter introduced the concepts behind MapReduce, so we should already know what Map and Reduce are and how they work.
This chapter shows how to run a MapReduce job in practice.
Word Count
Word Count is the most classic MapReduce program. It reads a text file and counts the number of times each word appears.
Hadoop includes many classic MapReduce example programs, including Word Count.
Note: This example does not need HDFS to be running, so let's first test it in standalone mode.
First, start a new container from the previously created hadoop_proto image:
docker run -d --name=word_count hadoop_proto
Enter the container:
docker exec -it word_count bash
Go to the HOME directory:
cd ~
Now prepare a text file named input.txt with the following content:
I love tutorialpro
I like tutorialpro
I love hadoop
I like hadoop
Save the above content with a text editor.
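If no text editor is available in the container, the same file can be written directly from the shell. This is a minimal sketch that assumes the current directory is the HOME directory entered above:

```shell
# Write the four sample lines to input.txt using a here-document.
# The quoted 'EOF' keeps the shell from expanding anything inside the text.
cat > input.txt <<'EOF'
I love tutorialpro
I like tutorialpro
I love hadoop
I like hadoop
EOF
```

Running cat input.txt afterwards should print the four lines back.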
Execute MapReduce:
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.4.jar wordcount input.txt output
What the command means:
hadoop jar runs a MapReduce job from a jar file; it is followed by the path to the example program package.
wordcount selects the Word Count program inside that package. It takes two parameters: the first is the input file, and the second is the name of the output directory (a directory, because the result consists of multiple files). The output directory must not already exist, otherwise the job fails.
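Because Hadoop refuses to start a job whose output directory already exists, running the command a second time needs a cleanup step first. The helper below is a sketch for standalone mode only, where both paths are on the local file system. The name run_wordcount is hypothetical and not part of Hadoop; the jar path is the one used in this chapter.

```shell
# Hypothetical helper: run the bundled wordcount example in standalone mode.
# $1 = local input file or directory, $2 = local output directory.
run_wordcount() {
    # Remove a stale output directory; the job will not start if it exists.
    rm -rf -- "$2"
    hadoop jar "$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.4.jar" \
        wordcount "$1" "$2"
}
```

With this defined, run_wordcount input.txt output can be repeated as often as needed. Note that it deletes the previous result without asking.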
After the job finishes, a directory named output should have been created, containing two files: _SUCCESS and part-r-00000.
_SUCCESS is an empty file that only marks successful execution, and part-r-00000 holds the result. Display its content:
cat ~/output/part-r-00000
You should be able to see the following information:
I 4
hadoop 2
like 2
love 2
tutorialpro 2
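To get a feel for what the job does, the same counts can be reproduced with ordinary Unix tools. This is only an analogy, not how Hadoop implements Word Count: splitting the text into words plays the role of Map, sorting groups identical words together like the shuffle step, and counting each group plays the role of Reduce. The function name local_wordcount is made up for this sketch.

```shell
# Hypothetical helper: count words in a local file with standard tools.
# $1 = path to a text file.
local_wordcount() {
    # "Map": put every whitespace-separated word on its own line.
    tr -s '[:space:]' '\n' < "$1" |
        grep -v '^$' |
        # "Shuffle": byte-order sort brings identical words together.
        LC_ALL=C sort |
        # "Reduce": count each group, then print word<TAB>count.
        uniq -c |
        awk '{ print $2 "\t" $1 }'
}
```

Calling local_wordcount ~/input.txt should list the same five words and counts as part-r-00000.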
Cluster Mode
Now let's run MapReduce in cluster mode.
Start the cluster containers configured in the previous chapter:
docker start nn dn1 dn2
Enter the NameNode container:
docker exec -it nn su hadoop
Go to HOME:
cd ~
Create input.txt with the same content as before:
I love tutorialpro
I like tutorialpro
I love hadoop
I like hadoop
Start HDFS:
start-dfs.sh
Create the input directory in HDFS:
hadoop fs -mkdir /wordcount
hadoop fs -mkdir /wordcount/input
Upload input.txt to HDFS:
hadoop fs -put input.txt /wordcount/input/
Execute Word Count:
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.4.jar wordcount /wordcount/input /wordcount/output
Check the result, which is now stored in HDFS rather than on the local disk:
hadoop fs -cat /wordcount/output/part-r-00000
If everything is normal, the following result will be displayed:
I 4
hadoop 2
like 2
love 2
tutorialpro 2
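On larger inputs a job may use several reducers and write several part-r-* files, and the interesting question is usually which words are most frequent. The sketch below reads all part files and sorts them by count. The function name top_words is hypothetical; hadoop fs -cat is the same command used above.

```shell
# Hypothetical helper: show the most frequent words of a wordcount result.
# $1 = HDFS output directory, $2 = number of lines to show (default 10).
top_words() {
    # The quoted glob is expanded by Hadoop, not by the local shell,
    # so every reducer's part file is read.
    hadoop fs -cat "$1/part-r-*" |
        # Sort by the count column (descending), then by word.
        LC_ALL=C sort -t "$(printf '\t')" -k2,2nr -k1,1 |
        head -n "${2:-10}"
}
```

For the example in this chapter, top_words /wordcount/output 1 should print the line for I, the only word that appears four times.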