What does the Hadoop cluster topology mean

Hadoop file system

FileSystem
is a top-level abstraction; the concrete implementation depends on the instance you obtain. Through the FileSystem we can get a local file system to read and write files on the Linux hard drive, or a distributed file system to operate on files in HDFS.

Hadoop's file systems
ftp://: FTP file system; can upload and download files

webhdfs: a file system accessible from the browser, with which we can upload, download and modify files on HDFS via the browser

hdfs: Distributed file system, the most important

local: local file system
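The scheme in the URI is what selects the concrete FileSystem implementation. A minimal sketch, assuming an illustrative NameNode address node01:8020 (not from the original text):

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class FileSystemSchemes {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // local: operates on files on the Linux hard drive
        FileSystem local = FileSystem.get(URI.create("file:///"), conf);

        // hdfs: distributed file system, operates on files in HDFS
        // (node01:8020 is only an example address)
        FileSystem hdfs = FileSystem.get(URI.create("hdfs://node01:8020"), conf);

        System.out.println(local.getClass().getSimpleName()); // e.g. LocalFileSystem
        System.out.println(hdfs.getClass().getSimpleName());  // e.g. DistributedFileSystem

        hdfs.close();
        local.close();
    }
}
```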

HDFS originates from Google's GFS paper (GFS, MapReduce and BigTable are Google's old troika), which was published in October 2003. GFS is a distributed file system; HDFS, the Hadoop Distributed File System, is its clone.

  • Easily scalable distributed file system
  • Run on a large number of common cheap machines and provide a fault tolerance mechanism
  • Provide good file access services to a large number of users

The goal of HDFS file system design

  1. Hardware failure is the norm: hard drives run 7×24, and disk damage in particular is common.
    Solution: the replication mechanism.
  2. Streaming data access. All access reads large amounts of data through I/O streams; the emphasis is on stable throughput, not on efficiency.
  3. Large data sets. HDFS assumes that the data it stores is massive; it does not handle small files well, because each small file occupies a metadata entry, and metadata is kept in memory, consuming a large amount of the NameNode's memory.
  4. Simple coherency model. Files are assumed to be written once and read many times; there are no frequent updates that would cause frequent metadata changes, which makes HDFS well suited to storing historical data.
  5. Moving computation is cheaper than moving data. The computation can be shipped to the nodes that hold the data (for example to any of the replicas), whereas moving the data itself would be very expensive because of its volume.
  6. Portability across heterogeneous hardware and software platforms.

HDFS infrastructure

Heartbeats: heartbeat mechanism; each DataNode periodically sends a heartbeat to the NameNode to report that it is still alive. In this way the NameNode knows how many of its DataNodes are still active and available.
Balancing: balancing mechanism; data is written as evenly as possible across the machines.
Replication: replication mechanism; every block is stored as three copies in total, which are replicas of one another.

  1. NameNode: a central server and a single node (which simplifies the design and implementation of the system), responsible for managing the file system namespace and client access to files.
  2. File operations: the NameNode is responsible for operations on file metadata, while the DataNodes handle the reading and writing of file contents. The data flow related to file content does not go through the NameNode; the client only asks the NameNode which DataNode to contact, otherwise the NameNode would become the bottleneck of the system.
  3. Replica placement: which DataNodes hold the replicas is controlled by the NameNode, which makes block placement decisions based on the global situation. When reading a file, the NameNode tries to let the client read the nearest copy first, which reduces network overhead and read latency.

The NameNode fully manages the replication of data blocks. It periodically receives heartbeats and block status reports from every DataNode in the cluster. Receiving a heartbeat means the DataNode is working normally, and the block status report contains a list of all blocks stored on that DataNode. The NameNode makes sure that each block has three replicas; if there are not enough, it schedules additional copies.
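The per-file replication factor can also be inspected or changed from the Java API. A small sketch, assuming fs.defaultFS points at the cluster and using a hypothetical file path /data/a.txt:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationDemo {
    public static void main(String[] args) throws Exception {
        // assumes fs.defaultFS in the configuration points at the HDFS NameNode
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/data/a.txt"); // hypothetical file

        FileStatus status = fs.getFileStatus(file);
        System.out.println("current replication: " + status.getReplication());

        // request three replicas; the NameNode schedules any missing copies
        fs.setReplication(file, (short) 3);
        fs.close();
    }
}
```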

The difference between NameNode and DataNode

Replication mechanism and block storage

Block storage
All files are stored in the HDFS file system as blocks. In Hadoop 1 the default block size is 64 MB, and in Hadoop 2 it is 128 MB; the block size can be specified in the configuration file hdfs-site.xml.

If the files are large, you can increase the block size somewhat.
With a 128 MB block size, a 300 MB file is split into 3 blocks, and the metadata of 3 blocks is kept on the NameNode.
With a 256 MB block size, a 300 MB file is split into 2 blocks, and the metadata of only 2 blocks is kept on the NameNode.
This way the NameNode has to store less metadata.
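The block size can also be set from client code instead of hdfs-site.xml. A minimal sketch, assuming the Hadoop 2 property name dfs.blocksize and an illustrative path /demo/big.dat:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // same effect as setting dfs.blocksize in hdfs-site.xml: 256 MB blocks
        conf.setLong("dfs.blocksize", 256L * 1024 * 1024);

        FileSystem fs = FileSystem.get(conf);
        // files created through this FileSystem now use 256 MB blocks,
        // so a 300 MB file would be split into 2 blocks instead of 3
        FSDataOutputStream out = fs.create(new Path("/demo/big.dat"));
        out.close();
        fs.close();
    }
}
```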

Checking HDFS file permissions

Hdfs's file permission mechanism is similar to that of the Linux system:
r: read, w: write, x: execute
The x permission is ignored for files; for directories, it determines whether you are allowed to access their contents.

When the Linux system user zhangsan uses the hadoop command to create a file, the file in HDFS is owned by zhangsan

The purpose of HDFS file permissions is to stop good people from doing the wrong thing, not to stop bad people from doing bad things: HDFS simply believes that you are whoever you tell it you are.
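A small sketch of reading and setting permissions through the Java API; the directory /user/zhangsan is only an illustration and follows the zhangsan example above:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class PermissionDemo {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path dir = new Path("/user/zhangsan"); // illustrative path

        fs.mkdirs(dir);
        // rwxr-x--- : owner full access, group read/execute, others nothing
        fs.setPermission(dir, new FsPermission((short) 0750));

        FileStatus status = fs.getFileStatus(dir);
        // HDFS trusts the reported user; it does not authenticate it
        System.out.println(status.getOwner() + " " + status.getPermission());
        fs.close();
    }
}
```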

HDFS metadata information FSimage, Edits and SecondaryNN

edits: the operation log of the most recent period.
fsimage: stores the relatively complete metadata; one copy is on disk and one is in memory. The number of files that can be stored depends on the memory size of the NameNode, which is why HDFS is recommended for storing large files.

In general, operations on the NameNode are first recorded in edits. Why not directly in fsimage?
Because fsimage is the complete snapshot of the NameNode, its content is very large; loading it into memory each time to regenerate the tree topology would be very memory and computation intensive.

The fsimage contains the metadata of all files and file blocks managed by the NameNode, together with the DataNodes on which each block is located. As the edits content grows, it has to be merged into fsimage at some point.

View the file information in the FSimage file

Use the hdfs oiv command
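For example, dumping the fsimage to XML; the fsimage file name below is only a placeholder, use the actual file from the NameNode's metadata directory:

```shell
hdfs oiv -i fsimage_0000000000000000025 -p XML -o /tmp/fsimage.xml
```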

View the file information in the edits file

Use the hdfs oev command
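For example, dumping an edits file to XML; the edits file name below is only a placeholder:

```shell
hdfs oev -i edits_0000000000000000001-0000000000000000012 -p XML -o /tmp/edits.xml
```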

How Secondary NameNode helps manage FSImage and edit files

①: SecondaryNN notifies the NameNode to roll over to a new edits (log) file.
②: SecondaryNN fetches the fsimage and the edit log from the NameNode (via HTTP).
③: SecondaryNN loads the fsimage into memory and then merges the edit log into it; the merged result becomes a new fsimage (since the fsimage has to be read into memory, the SNN needs a lot of memory and usually runs on a separate machine).
④: SecondaryNN sends the new fsimage back to the NameNode.
⑤: The NameNode replaces the old fsimage with the new one.


Controlling the size of the edits file: by duration and by file size.
The edits file is merged into fsimage by the SecondaryNameNode.
fs.checkpoint.period: the default value is one hour (3600 s).
fs.checkpoint.size: a merge is also triggered when the edits file reaches a certain size (default 64 MB).

Process of writing and reading files

File upload process

  1. The client initiates a file upload request and establishes communication with the NameNode via RPC. The NameNode checks whether the target file already exists, whether the parent directory exists, and returns whether it can be uploaded. The file is split on the client.
  2. The client asks the NameNode to which DataNode servers the first block should be transferred.
  3. The NameNode performs the allocation according to the number of replicas specified in the configuration file and the rack awareness principle, and returns the addresses of the available DataNodes, e.g. A, B, C;
    Rack awareness principle: choose the machines closest to the client (crossing as few switches as possible).
    Note: Hadoop was designed with data security and efficiency in mind. By default, HDFS stores three copies of the data. The storage strategy is one local copy, one copy on another node in the same rack, and one copy on a node in a different rack.
  4. The client asks one of the three DataNodes, A, to upload the data (essentially an RPC call that sets up a pipeline). When A receives the request it calls B, and B in turn calls C, so that the whole pipeline is established; the acknowledgment is then returned to the client level by level.
  5. The client starts uploading the first block to A (first reading the data from the disk and placing it in a local memory cache). Data is sent in units of packets (default 64 KB): A receives a packet and forwards it to B, and B forwards it to C; every time A sends a packet it also puts it into a reply queue to wait for the acknowledgment.
  6. The data is divided into packets and transmitted one after another along the pipeline. In the opposite direction, acknowledgments (ack, the message confirmation mechanism) are passed back one by one, and finally the first DataNode in the pipeline, A, sends the pipeline ack to the client;
  7. When the transfer of one block is complete, the client asks the NameNode again for the DataNodes to which the second block should be uploaded, and the process repeats (a minimal client-side sketch follows this list).
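From the client's point of view, the whole pipeline above is hidden behind a couple of API calls. A minimal upload sketch, assuming fs.defaultFS points at the cluster; the local and HDFS paths are only examples:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class UploadDemo {
    public static void main(String[] args) throws Exception {
        // assumes fs.defaultFS in the configuration points at the HDFS NameNode
        FileSystem fs = FileSystem.get(new Configuration());

        // splits the file into blocks and streams packets through the
        // DataNode pipeline as described above
        fs.copyFromLocalFile(new Path("file:///home/hadoop/a.txt"),
                             new Path("/input/a.txt"));
        fs.close();
    }
}
```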

Process of reading files

  1. The client initiates an RPC request to the NameNode to determine the location of the requested block of files.
  2. Depending on the situation, the NameNode returns part or all of the file's block list. For each block, the NameNode returns the addresses of the DataNodes that hold a copy of it. The returned DataNode addresses are sorted by their distance to the client according to the cluster topology. Two sorting rules: the DataNode closest to the client in the network topology is ranked first; a DataNode whose heartbeat has timed out and whose reported status is therefore STALE is ranked lower.
  3. The client selects the DataNode ranked first to read the block. If the client itself is a DataNode holding a copy, the data is read directly from the local machine (short-circuit read).
  4. At the bottom layer this essentially establishes a socket stream (FSDataInputStream) and repeatedly calls the read method of the parent class DataInputStream until the data of this block has been read completely.
  5. If the file has not been read completely after this block list has been processed, the client requests the next batch of blocks from the NameNode.
  6. After a block has been read, a checksum verification is performed. If an error occurs while reading from a DataNode, the client notifies the NameNode and then continues reading from the next DataNode that holds a copy of the block. There is no resumable-transfer function.
  7. The read method fetches block information in parallel rather than block by block; the NameNode only returns the addresses of the DataNodes that hold the requested blocks, not the block data itself.
  8. In the end, all of the blocks are merged into one complete final file (a minimal client-side read sketch follows this list).
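A minimal read sketch using the same API; the HDFS path is only an example and fs.defaultFS is assumed to point at the cluster:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class ReadDemo {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // open() returns an FSDataInputStream whose read() pulls the blocks
        // from the nearest DataNodes, as described above
        FSDataInputStream in = fs.open(new Path("/input/a.txt"));
        IOUtils.copyBytes(in, System.out, 4096, false);
        IOUtils.closeStream(in);
        fs.close();
    }
}
```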

Java API operations

Since all software in the CDH distribution has copyright restrictions, the JAR packages are not hosted in the central Maven repository but on CDH's own server, so by default we cannot download them from the central Maven repository; the CDH repository has to be added manually so that the packages can be downloaded. The following two addresses are the official documentation; please check them carefully.1,2

Create a Maven project and import the JAR package
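A sketch of the relevant pom.xml fragment; the repository URL and the CDH version string are typical values used for illustration and should be checked against the official Cloudera documentation linked above:

```xml
<repositories>
    <repository>
        <id>cloudera</id>
        <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>
</repositories>

<dependencies>
    <!-- the version below is only an example of a CDH-flavoured Hadoop version -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.6.0-cdh5.14.0</version>
    </dependency>
</dependencies>
```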

Use the URL to access data
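A minimal sketch of reading an HDFS file through java.net.URL; the NameNode address and file path are placeholders, and the stream handler factory may be installed only once per JVM:

```java
import java.io.InputStream;
import java.net.URL;
import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;
import org.apache.hadoop.io.IOUtils;

public class UrlAccessDemo {
    static {
        // lets java.net.URL understand the hdfs:// scheme (one call per JVM)
        URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
    }

    public static void main(String[] args) throws Exception {
        InputStream in = null;
        try {
            in = new URL("hdfs://node01:8020/input/a.txt").openStream();
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}
```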

Different ways to get FileSystem
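A sketch of several common ways to obtain a FileSystem instance; the NameNode address node01:8020 and the user name "hadoop" are placeholders:

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class GetFileSystemDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // 1. from the configuration only (uses fs.defaultFS)
        FileSystem fs1 = FileSystem.get(conf);

        // 2. from an explicit URI
        FileSystem fs2 = FileSystem.get(URI.create("hdfs://node01:8020"), conf);

        // 3. from an explicit URI and a user name
        FileSystem fs3 = FileSystem.get(URI.create("hdfs://node01:8020"), conf, "hadoop");

        // 4. a new, non-cached instance
        FileSystem fs4 = FileSystem.newInstance(conf);

        System.out.println(fs1 + "\n" + fs2 + "\n" + fs3 + "\n" + fs4);
    }
}
```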