4.0 HDFS Configuration and Usage
HDFS Configuration and Startup
First, check that Hadoop is available in the container:
hadoop version
If the result displays a Hadoop version number, Hadoop is installed.
Next, we can proceed with the actual setup steps.
Create a Hadoop User
Create a new user named hadoop:
adduser hadoop
Install the small utilities used for setting user passwords and managing sudo permissions:
yum install -y passwd sudo
Set the hadoop user password:
passwd hadoop
Enter the password twice, and make sure to remember it!
Change the ownership of the Hadoop installation directory to the hadoop user:
chown -R hadoop /usr/local/hadoop
Then, use a text editor to modify the /etc/sudoers file. After the line
root ALL=(ALL) ALL
add the line:
hadoop ALL=(ALL) ALL
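If you would rather not open an interactive editor inside the container, a minimal alternative (run as root) is to append the line from the shell; note that this skips the syntax check that visudo would normally perform:
# Append the sudo rule for the hadoop user directly to /etc/sudoers
echo 'hadoop ALL=(ALL) ALL' >> /etc/sudoers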
Then exit the container.
Stop and commit the container hadoop_single to the image hadoop_proto:
docker stop hadoop_single
docker commit hadoop_single hadoop_proto
Create a new container hdfs_single:
docker run -d --name=hdfs_single --privileged hadoop_proto /usr/sbin/init
With this, the new hadoop user has been set up and baked into the image.
Start HDFS
Now enter the newly created container:
docker exec -it hdfs_single su hadoop
Now you should be the hadoop user:
whoami
It should display "hadoop"
Generate SSH keys:
ssh-keygen -t rsa
You can simply press Enter at each prompt until key generation finishes.
Then add the generated key to the trusted list:
ssh-copy-id hadoop@172.17.0.2
Check the container IP address:
ip addr | grep 172
From the output you can see that the container's IP address is 172.17.0.2 (yours may differ).
Before starting HDFS, we will make some simple configurations. All Hadoop configuration files are stored in the etc/hadoop subdirectory under the installation directory, so we can enter this directory:
cd $HADOOP_HOME/etc/hadoop
Here we modify two files: core-site.xml and hdfs-site.xml.
In core-site.xml, inside the <configuration> tag, add the following property:
<property>
<name>fs.defaultFS</name>
<value>hdfs://<your IP>:9000</value>
</property>
In hdfs-site.xml, inside the <configuration> tag, add the following property:
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
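For reference, here is a minimal sketch of what the two complete files could look like, assuming the container IP of 172.17.0.2 used in this tutorial (substitute your own IP). Note that this overwrites the default, mostly empty configuration files:
# Write a complete core-site.xml pointing the default file system at this NameNode
cat > $HADOOP_HOME/etc/hadoop/core-site.xml << 'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://172.17.0.2:9000</value>
  </property>
</configuration>
EOF
# Write a complete hdfs-site.xml with a replication factor of 1 for a single-node setup
cat > $HADOOP_HOME/etc/hadoop/hdfs-site.xml << 'EOF'
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF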
Format the NameNode, which initializes the HDFS file structure:
hdfs namenode -format
Then start HDFS:
start-dfs.sh
The startup proceeds in three steps: the NameNode, the DataNode, and the Secondary NameNode are started in turn.
We can run jps to check the Java processes; NameNode, DataNode, and SecondaryNameNode should all be listed.
At this point the HDFS daemons are running. Since HDFS also ships with an HTTP panel, we can view the panel and detailed status in a browser at http://<your container IP>:9870/:
If this page appears, it means that HDFS is configured and started successfully.
Note: If you are not using a Linux system with a desktop environment and have no browser, you can skip this step. If you are using a Windows system but are not using Docker Desktop, this step will be difficult for you.
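If a browser is not available, a rough alternative is to query the same HTTP port from inside the container with curl, assuming the container IP 172.17.0.2 from above:
# Fetch the front page of the HDFS web panel
curl http://172.17.0.2:9870/
# Or ask the WebHDFS REST API for a JSON listing of the HDFS root directory
curl "http://172.17.0.2:9870/webhdfs/v1/?op=LISTSTATUS"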
HDFS Usage
HDFS Shell
Back in the hdfs_single container, the following commands can be used to operate HDFS:
# Display files and subdirectories in the root directory /, absolute path
hadoop fs -ls /
# Create a folder, absolute path
hadoop fs -mkdir /hello
# Upload a file
hadoop fs -put hello.txt /hello/
# Download a file
hadoop fs -get /hello/hello.txt
# Output file content
hadoop fs -cat /hello/hello.txt
These are the most basic HDFS commands; HDFS also supports many of the other operations found in traditional file systems.
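Note that the hello.txt passed to -put above is assumed to already exist in your local working directory. Putting the commands together, a quick end-to-end test might look like this:
# Create a local test file
echo "hello hdfs" > hello.txt
# Create the target directory in HDFS (no error if it already exists) and upload the file
hadoop fs -mkdir -p /hello
hadoop fs -put -f hello.txt /hello/
# Read the file back from HDFS
hadoop fs -cat /hello/hello.txt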
HDFS API
HDFS is supported by many backend platforms, and currently, the official release includes programming interfaces for C/C++ and Java. In addition, package managers for