1.0 Hadoop Tutorial
Hadoop is an open-source distributed computing and storage framework developed and maintained by the Apache Foundation.
Hadoop provides reliable and scalable application-level computing and storage support for large computer clusters. It allows for the distributed processing of large datasets across clusters of computers using simple programming models and supports scaling from a single computer to thousands of computers.
Hadoop is developed in Java, allowing it to be deployed and run on a wide range of hardware platforms. Its core components are the Hadoop Distributed File System (HDFS) and MapReduce.
Hadoop History
In 2003 and 2004, Google published two influential papers describing the Google File System (GFS) and MapReduce.
Together with the BigTable paper published in 2006, these became known as the "Google Three Papers."
Drawing on the ideas in these papers, Doug Cutting began developing Hadoop.
Hadoop's two core components correspond directly to these papers. GFS is a distributed file system designed to run on large computer clusters; HDFS is Hadoop's implementation of that design. MapReduce is a model for distributed computation; Hadoop implements it in a framework of the same name, which we discuss in detail in the later MapReduce section.

Hadoop has been an Apache top-level project since 2008. It and its many subprojects are widely used by large internet service companies such as Yahoo, Alibaba, and Tencent, and are supported by platform vendors such as IBM, Intel, and Microsoft.
Role of Hadoop
Hadoop's role is straightforward: it provides a unified, reliable storage and computing layer on top of a cluster of machines and serves as a platform for other distributed applications.
In a sense, Hadoop organizes many computers into a single logical computer working on the same task: HDFS serves as that computer's hard drive, and MapReduce as its CPU.
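To make the "cluster-wide hard drive" idea concrete, here is a minimal Java sketch that writes a file to HDFS and reads it back through Hadoop's FileSystem API. It assumes a cluster (for example, a local pseudo-distributed setup) whose NameNode is reachable at hdfs://localhost:9000; the address, file path, and class name are illustrative, not part of this tutorial's later configuration steps.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsHelloWorld {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Point the client at the NameNode; adjust to your cluster's address.
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        FileSystem fs = FileSystem.get(conf);

        Path path = new Path("/tmp/hello.txt");

        // Write a small file into the cluster-wide "hard drive".
        try (FSDataOutputStream out = fs.create(path, true)) {
            out.writeUTF("Hello, HDFS");
        }

        // Read it back; any node in the cluster could do the same.
        try (FSDataInputStream in = fs.open(path)) {
            System.out.println(in.readUTF());
        }

        fs.close();
    }
}
```

The point of the sketch is that the application sees one file system, even though the data is actually split into blocks and replicated across many machines; the corresponding "CPU" side, MapReduce, is covered in its own section.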