4.0 HDFS Configuration and Usage
HDFS Configuration and Startup
First, check that Hadoop is available in the container:
hadoop version
If the result displays a Hadoop version number, Hadoop is installed.
Next, we can proceed with the actual setup steps.
Create a Hadoop User
Create a new user named hadoop:
adduser hadoop
Install the small utilities used for setting user passwords and managing sudo permissions:
yum install -y passwd sudo
Set the hadoop user password:
passwd hadoop
Enter the password twice, and make sure to remember it!
Change the ownership of the Hadoop installation directory to the hadoop user:
chown -R hadoop /usr/local/hadoop
Then, use a text editor to modify the /etc/sudoers file. After the line
root ALL=(ALL) ALL
add the line:
hadoop ALL=(ALL) ALL
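If you would rather not open an interactive editor inside the container, a minimal alternative (run as root) is to append the line from the shell; note that this skips the syntax check that visudo would normally perform:
# Append the sudo rule for the hadoop user directly to /etc/sudoers
echo 'hadoop ALL=(ALL) ALL' >> /etc/sudoers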
Then exit the container.
Stop and commit the container hadoop_single to the image hadoop_proto:
docker stop hadoop_single
docker commit hadoop_single hadoop_proto
Create a new container hdfs_single:
docker run -d --name=hdfs_single --privileged hadoop_proto /usr/sbin/init
With this, the new hadoop user has been set up and baked into the image.
Start HDFS
Now enter the newly created container:
docker exec -it hdfs_single su hadoop
Now you should be the hadoop user:
whoami
It should display "hadoop"
Generate SSH keys:
ssh-keygen -t rsa
You can simply press Enter at each prompt until key generation finishes.
Then add the generated key to the trusted list:
ssh-copy-id hadoop@172.17.0.2
Check the container IP address:
ip addr | grep 172
From the output you can see that the container's IP address is 172.17.0.2 (yours may differ).
Before starting HDFS, we will make some simple configurations. All Hadoop configuration files are stored in the etc/hadoop subdirectory under the installation directory, so we can enter this directory:
cd $HADOOP_HOME/etc/hadoop
Here we modify two files: core-site.xml and hdfs-site.xml.
In core-site.xml, inside the <configuration> tag, add the following property:
<property>
<name>fs.defaultFS</name>
<value>hdfs://<your IP>:9000</value>
</property>
In hdfs-site.xml, inside the <configuration> tag, add the following property:
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
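For reference, here is a minimal sketch of what the two complete files could look like, assuming the container IP of 172.17.0.2 used in this tutorial (substitute your own IP). Note that this overwrites the default, mostly empty configuration files:
# Write a complete core-site.xml pointing the default file system at this NameNode
cat > $HADOOP_HOME/etc/hadoop/core-site.xml << 'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://172.17.0.2:9000</value>
  </property>
</configuration>
EOF
# Write a complete hdfs-site.xml with a replication factor of 1 for a single-node setup
cat > $HADOOP_HOME/etc/hadoop/hdfs-site.xml << 'EOF'
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF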
Format the NameNode, which initializes the HDFS file structure:
hdfs namenode -format
Then start HDFS:
start-dfs.sh
The startup proceeds in three steps: the NameNode, the DataNode, and the Secondary NameNode are started in turn.
We can run jps to check the Java processes; NameNode, DataNode, and SecondaryNameNode should all be listed.
At this point the HDFS daemons are running. Since HDFS also ships with an HTTP panel, we can view the panel and detailed status in a browser at http://<your container IP>:9870/:
If this page appears, it means that HDFS is configured and started successfully.
Note: If you are not using a Linux system with a desktop environment and have no browser, you can skip this step. If you are using a Windows system but are not using Docker Desktop, this step will be difficult for you.
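If a browser is not available, a rough alternative is to query the same HTTP port from inside the container with curl, assuming the container IP 172.17.0.2 from above:
# Fetch the front page of the HDFS web panel
curl http://172.17.0.2:9870/
# Or ask the WebHDFS REST API for a JSON listing of the HDFS root directory
curl "http://172.17.0.2:9870/webhdfs/v1/?op=LISTSTATUS"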
HDFS Usage
HDFS Shell
Back in the hdfs_single container, the following commands can be used to operate HDFS:
# Display files and subdirectories in the root directory /, absolute path
hadoop fs -ls /
# Create a folder, absolute path
hadoop fs -mkdir /hello
# Upload a file
hadoop fs -put hello.txt /hello/
# Download a file
hadoop fs -get /hello/hello.txt
# Output file content
hadoop fs -cat /hello/hello.txt
These are the most basic HDFS commands; HDFS also supports many of the other operations found in traditional file systems.
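Note that the hello.txt passed to -put above is assumed to already exist in your local working directory. Putting the commands together, a quick end-to-end test might look like this:
# Create a local test file
echo "hello hdfs" > hello.txt
# Create the target directory in HDFS (no error if it already exists) and upload the file
hadoop fs -mkdir -p /hello
hadoop fs -put -f hello.txt /hello/
# Read the file back from HDFS
hadoop fs -cat /hello/hello.txt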
HDFS API
HDFS is supported by many backend platforms, and currently, the official release includes programming interfaces for C/C++ and Java. In addition, package managers for