Hi, In this Blog i will explain about one of the most important part of Big Data Hadoop , known as HDFS ( hadoop distributed file structure) ( NOTE:- in this we will stick only to HDFS technical aspects not about its history and theory ) HDFS , as name suggest is something related to storage with distirbuted characteristics, so lets starts HDFS 🙂 WHAT IS HDFS :-
- HDFS stands for hadoop distributed file structure system , whichis used to store data in hadoop cluster ( hadoop is technology which is used to manage BIg Data)
- HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. HDFS provides high throughput access to application data and is suitable for applications that have large data sets
Architecture of HDFS :- HDFS has two components :- 1. NAME NODE 2. DATA NODE NameNode :- HDFS works on master slave architecture ( its an architecture where one server act as master give orders to other servers ,those server are known as slaves) the server which acts as master is known as namenode server , important :- namenode is daemon. As it is a master hence there will be only 1 namenode in full hadoop cluster ( sadly :- its a single point of failure in hadoop cluster) Here are the main functions which :-
- It keeps the directory tree of all files in the file system,
- And tracks where across the cluster the file data is kept.
- It does not store the data of these files itself. it just contain address of the data where is has been kept on HDFS
- Client applications talk to the NameNode whenever they wish to locate a file, or when they want to add/copy/move/delete a file.
- The NameNode responds the successful requests by returning a list of relevant DataNode(will cover later) servers where the data lives
- It also determines the mapping of blocks to DataNodes
The NameNode is a Single Point of Failure for the HDFS Cluster. HDFS is not currently a High Availability system. When the NameNode goes down, the file system goes offline. There is an optional SecondaryNameNode that can be hosted on a separate machine. It only creates checkpoints of the namespace by merging the edits file into the fsimage file and does not provide any real redundancy DATANODE :-
- As described earlier hdfs is master-slave architecture , datanodes are slaves in this architecture
- A DataNode stores data in the HadoopFileSystem. A functional filesystem has more than one DataNode, with data replicated ( duplicate copy) across them.
- Client applications can talk directly to a DataNode, once the NameNode has provided the location of the data
Now lets go to practical approach which will make things more clear A typical HDFS architecture :- Lets have a look on each component :- 1. Namenode :- described earlier 2.DataNOde:- described earlier 3.client :- MAchine which request HDFS to write,create,delete file in HDFS 4. Blocks :- blocks are basic building units in hdfs , which means a file is broken into blocks to be stored in HDFS ( for eg u have 40 m long pipe :- then u can break it into 10 + 10 + 10 + 10 meters small pipes , same like that as HDFS deals with big data , data usually in TB’S hence that data break into small units these units are known as Blocks . 5. RACK :- A computer rack (commonly called a rack) is a metal frame used to hold various hardware devices such as servers, hard disk drives, modems and other electronic equipment 6. Replication :- one of the best property of HDFS , replication means that there are more than 1 copy of data(block) in whole cluster (hdfs) , generally 3. which help us not to loose data(block) easily in hdfs. Now lets explain the above diagram :-
- Namenode contains metadata information about the data in HDFS like , which datanode store which block ( only address of datablock not the whole data) , and always be the first point of contact by the client
- Metadata(name,repl) denotes the address of data , and total replicated copies of that data
Read Operations in HDFS :- as explained earlier , first point of contact is Namenode hence :- 1. To start the file read operation, client opens the required file by calling open() on Filesystem object which is an instance of DistributedFileSystem. Open method initiate HDFS client for the read request. 2. DistributedFileSystem interacts with Namenode to get the block locations of file to be read. Block locations are stored in metadata of namenode. For each block,Namenode returns the sorted address of Datanode that holds the copy of that block.Here sorting is done based on the proximity of Datanode with respect to Namenode, picking up the nearest Datanode first. 3. DistributedFileSystem returns an FSDataInputStream, which is an input stream to support file seeks to the client. FSDataInputStream uses a wrapper DFSInputStream to manage I/O operations over Namenode and Datanode. Following steps are performed in read operation. a) Client calls read() on DFSInputStream. DFSInputStream holds the list of address of block locations on Datanode for the first few blocks of the file. It then locates the first block on closest Datanode and connects to it. b) Block reader gets initialized on target Block/Datanode along with below information:
- Block ID.
- Data start offset to read from.
- Length of data to read.
- Client name.
c) Data is streamed from the Datanode back to the client in form of packets, this data is copied directly to input buffer provided by client.DFS client is reading and performing checksum operation and updating the client buffer d) Read () is called repeatedly on stream till the end of block is reached. When end of block is reached DFSInputStream will close the connection to Datanode and search next closest Datanode to read the block from it. 4. Blocks are read in order, once DFSInputStream done through reading of the first few blocks, it calls the Namenode to retrieve Datanode locations for the next batch of blocks. 5. When client has finished reading it will call Close() on FSDataInputStream to close the connection. 6. If Datanode is down during reading or DFSInputStream encounters an error during communication, DFSInputStream will switch to next available Datanode where replica can be found. DFSInputStream remembers the Datanode which encountered an error so that it does not retry them for later blocks. As you can see that client with the help of Namenode gets the list of best Datanode for each block and communicates directly with Datanode to retrieve the data. Here Namenode serves the address of block location on Datanode rather than serving data itself which could become the bottleneck as the number of clients grows. This design allows HDFS to scale up to a large numbers of clients since the data traffic is spread across all the Datanodes of clusters. Write Operations in HDFS :-
- Step 1: The client creates the file by calling create() method on DistributedFileSystem.
- Step 2: DistributedFileSystem makes an RPC call to the namenode to create a new file in the filesystem’s namespace, with no blocks associated with it.
- The namenode performs various checks to make sure the file doesn’t already exist and that the client has the right permissions to create the file. If these checks pass, the namenode makes a record of the new file; otherwise, file creation fails and the client is thrown an IOException. The DistributedFileSystem returns anFSDataOutputStream for the client to start writing data to. Just as in the read case, FSDataOutputStream wraps a DFSOutputStream, which handles communication with the datanodes and namenode.
- Step 3: As the client writes data, DFSOutputStream splits it into packets, which it writes to an internal queue, called the data queue. The data queue is consumed by the DataStreamer, which is responsible for asking the namenode to allocate new blocks by picking a list of suitable datanodes to store the replicas. The list of datanodes forms a pipeline, and here we’ll assume the replication level is three, so there are three nodes in the pipeline. TheDataStreamer streams the packets to the first datanode in the pipeline, which stores the packet and forwards it to the second datanode in the pipeline.
- Step 4: Similarly, the second datanode stores the packet and forwards it to the third (and last) datanode in the pipeline.
- Step 5: DFSOutputStream also maintains an internal queue of packets that are waiting to be acknowledged by datanodes, called the ack queue. A packet is removed from the ack queue only when it has been acknowledged by all the datanodes in the pipeline.
- Step 6: When the client has finished writing data, it calls close() on the stream.
- Step 7: This action flushes all the remaining packets to the datanode pipeline and waits for acknowledgments before contacting the namenode to signal that the file is complete The namenode already knows which blocks the file is made up of (via DataStreamer asking for block allocations), so it only has to wait for blocks to be minimally replicated before returning successfully.
So what happens when datanode fails while data is being written to it ?
- First, the pipeline is closed, and any packets in the ack queue are added to the front of the data queue so that datanodes that are downstream from the failed node will not miss any packets.
- The current block on the good datanodes is given a new identity, which is communicated to the namenode, so that the partial block on the failed datanode will be deleted if the failed datanode recovers later on.
- The failed datanode is removed from the pipeline, and the remainder of the block’s data is written to the two good datanodes in the pipeline.
- The namenode notices that the block is under-replicated, and it arranges for a further replica to be created on another node. Subsequent blocks are then treated as normal.
Hope this above explanations gves u some insight about the HDFS . Thanks and cheers 🙂 Please do comment or like if u think …