2.0 Hadoop Runtime Environment
Since Hadoop is designed for clusters, anyone learning it will sooner or later need to configure Hadoop on multiple computers. This presents two main obstacles for learners:
Expensive computer clusters. A cluster environment built from multiple physical computers requires expensive hardware.
Difficult deployment and maintenance. Installing an identical software environment on many computers is a significant amount of work, and the result is inflexible: once the environment changes, redeploying everything is hard.
To address these issues, we have a very mature solution: Docker.
Docker is a container management system that can run multiple "virtual machines" (containers) and assemble them into a cluster, much like real virtual machines. However, a virtual machine fully virtualizes a computer, which consumes a lot of hardware resources and is inefficient. Docker instead merely provides an isolated, reproducible runtime environment: all processes inside a container still execute in the host's kernel, so their efficiency is almost identical to that of processes running directly on the host (close to 100%).
This tutorial describes the usage of Hadoop with Docker as the underlying environment. If you are not familiar with Docker, please study the Docker Tutorial first.
Note: Windows users are advised to install Docker using a virtual machine solution.
Docker Deployment
After entering the Docker command line, pull a Linux image to serve as the Hadoop runtime environment. The CentOS image is recommended here (Debian and other images currently still have some issues).
docker pull centos:8
Then, you can view the local images with the docker images command:
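docker images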
Now, create a container:
docker run -d centos:8 /usr/sbin/init
You can view running containers with docker ps:
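docker ps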
We can also have Docker run a one-off container that prints "Hello World"; a typical command is:
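docker run centos:8 /bin/echo "Hello World"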
This indicates that Docker has been successfully installed and deployed.
Creating a Container
Hadoop supports running on a single device, mainly in two modes: standalone mode and pseudo-cluster mode.
This chapter covers the installation and standalone mode of Hadoop.
Configuring Java and SSH Environment
Now, create a container named java_ssh_proto in which to set up an environment containing Java and SSH. The --privileged flag together with the /usr/sbin/init entry point lets systemd run inside the container, which the sshd service below requires:
docker run -d --name=java_ssh_proto --privileged centos:8 /usr/sbin/init
Then, enter the container:
docker exec -it java_ssh_proto bash
Point the yum repositories at the USTC mirror:
sed -e 's|^mirrorlist=|#mirrorlist=|g' \
-e 's|^#baseurl=http://mirror.centos.org/$contentdir|baseurl=https://mirrors.ustc.edu.cn/centos|g' \
-i.bak \
/etc/yum.repos.d/CentOS-Linux-AppStream.repo \
/etc/yum.repos.d/CentOS-Linux-BaseOS.repo \
/etc/yum.repos.d/CentOS-Linux-Extras.repo \
/etc/yum.repos.d/CentOS-Linux-PowerTools.repo \
/etc/yum.repos.d/CentOS-Linux-Plus.repo
yum makecache
Install OpenJDK 8 and SSH services:
yum install -y java-1.8.0-openjdk-devel openssh-clients openssh-server
Then, enable SSH services:
systemctl enable sshd && systemctl start sshd
Note: For Ubuntu systems, use the following command to start the SSH service:
systemctl enable ssh && systemctl start ssh
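To confirm that the SSH daemon is actually running, you can check its status (shown here for the CentOS container above; on Ubuntu, query ssh instead):
systemctl status sshd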
If no errors occur, a prototype container with a Java runtime environment and an SSH environment has been created. This is a crucial container, so it is recommended to leave it with the exit command, then stop the container and save it as an image named java_ssh:
docker stop java_ssh_proto
docker commit java_ssh_proto java_ssh
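Afterwards, running docker images again should list the new java_ssh image alongside centos:8.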
Hadoop Installation
Download Hadoop
Hadoop official website: http://hadoop.apache.org/
Hadoop release versions download: https://hadoop.apache.org/releases.html
In current tests, the 3.1.x and 3.2.x release lines show the best compatibility. This tutorial uses version 3.1.4 as an example.
From a Hadoop 3.1.4 download mirror, fetch the tar.gz package and keep it for later use.
Create Hadoop Standalone Container
Now, create a container named hadoop_single using the previously saved java_ssh image:
docker run -d --name=hadoop_single --privileged java_ssh /usr/sbin/init
Copy the downloaded Hadoop package into the /root directory of the container:
docker cp <path where you stored the Hadoop compressed package> hadoop_single:/root/
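For example, if the package sits in your current directory on the host:
docker cp hadoop-3.1.4.tar.gz hadoop_single:/root/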
Enter the container:
docker exec -it hadoop_single bash
Go to the /root directory:
cd /root
Here, you should find the hadoop-3.1.4.tar.gz file that was just copied. Now, extract it:
tar -zxf hadoop-3.1.4.tar.gz
After extraction, you will get a folder named hadoop-3.1.4. Now, move it to a common location:
mv hadoop-3.1.4 /usr/local/hadoop
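If the move succeeded, listing the new location should show Hadoop's top-level directories (bin, etc, sbin, share, and so on):
ls /usr/local/hadoop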
Then, configure the environment variables:
echo "export HADOOP_HOME=/usr/local/hadoop" >> /etc/bashrc
echo "export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin" >> /etc/bashrc
Exit the Docker container and re-enter.
At this point, the result of echo $HADOOP_HOME should be /usr/local/hadoop. Next, register JAVA_HOME and HADOOP_HOME in Hadoop's own configuration file, hadoop-env.sh:
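# JAVA_HOME can point at /usr here because the yum-installed OpenJDK provides /usr/bin/java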
echo "export JAVA_HOME=/usr" >> $HADOOP_HOME/etc/hadoop/hadoop-env.sh
echo "export HADOOP_HOME=/usr/local/hadoop" >> $HADOOP_HOME/etc/hadoop/hadoop-env.sh
With Hadoop's internal environment variables in place, execute the following command to check whether everything succeeded:
hadoop version
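The first line of the output should read Hadoop 3.1.4, followed by build details.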
This indicates that your standalone Hadoop installation has been configured successfully.