In this blog I will explain what happens inside the map task and the reduce task.
As we know, broadly MapReduce follows a simple mechanism like this :-
But in reality a lot happens inside the two main phases, known as map and reduce (especially sorting and shuffling).
Let's start with the map phase :-
Shuffle :- The process of distributing map outputs to reducers as their input is known as shuffling.
1) Map Side :-
- When the map function starts producing output, it does not write it straight to disk.
- Instead, the process takes advantage of buffering writes in memory, which is more efficient.
- Each map task has a circular memory buffer to which it writes its output. The buffer is 100 MB by default.
- The buffer size can be changed with the io.sort.mb property. As the map writes output into the buffer, once the contents reach a threshold size (io.sort.spill.percent, 80% by default), a background thread starts spilling the contents to local disk.
- Map output continues to be written to the buffer while the spill to disk takes place in parallel.
- Spills are written in round-robin fashion to the directories specified by the mapred.local.dir property.
- But before writing a spill to disk, the thread first divides the data into partitions corresponding to the reducers that the data will ultimately be sent to.
- Within each partition, the background thread performs an in-memory sort by key.
- If there is a combiner function, it runs on the output of the sort, producing a more compact map output, so there is less data to write to local disk and to transfer to the reducers.
- Each time the buffer reaches the spill threshold, a new spill file is created, so after the map task has written its last output record there may be several spill files. Before the task completes, the spill files (streams) are merged into a single partitioned and sorted output file.
- The property io.sort.factor controls the maximum number of spill files that can be merged at once.
- Add-on :- It is a good idea to compress the map output as it is written, which saves disk space and reduces the amount of data to transfer to the reducers.
- Compression is not enabled by default; enable it by setting mapred.compress.map.output to true.
- Now the map output file (you can call it a partitioned file as well) is available to the reducers over HTTP. The maximum number of worker threads used to serve the file partitions is controlled by the tasktracker.http.threads property; this setting is per tasktracker, not per map task.
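The map-side steps above (buffer, spill, partition, sort, combine, final merge) can be sketched as a toy simulation. This is not Hadoop's actual code: the buffer is counted in records instead of bytes, and the partitioner and combiner below are simplified stand-ins chosen for illustration.

```python
import heapq
import itertools

NUM_REDUCERS = 2
SPILL_THRESHOLD = 4          # stand-in for io.sort.mb * io.sort.spill.percent

def partition(key):
    # Deterministic analogue of Hadoop's default hash partitioner
    return sum(map(ord, key)) % NUM_REDUCERS

def combiner(key, values):
    # Word-count-style combiner: pre-aggregate the values for one key
    return [sum(values)]

def spill(buffer):
    """Sort buffered records by (partition, key), combine, return a spill file."""
    buffer.sort(key=lambda kv: (partition(kv[0]), kv[0]))
    out = []
    for (p, k), group in itertools.groupby(
            buffer, key=lambda kv: (partition(kv[0]), kv[0])):
        for v in combiner(k, [v for _, v in group]):
            out.append((p, k, v))
    return out

def map_task(records):
    buffer, spills = [], []
    for kv in records:
        buffer.append(kv)
        if len(buffer) >= SPILL_THRESHOLD:   # threshold reached: spill to "disk"
            spills.append(spill(buffer))
            buffer = []
    if buffer:                               # final spill when the task ends
        spills.append(spill(buffer))
    # Merge the sorted spills into one partitioned, sorted output file
    return list(heapq.merge(*spills, key=lambda t: (t[0], t[1])))

print(map_task([("a", 1), ("b", 1), ("a", 1), ("b", 1), ("a", 1)]))
```

Each output tuple is (partition, key, value), already grouped by reducer and sorted by key, which is exactly the shape the reducers will fetch over HTTP.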
2) Reduce Side :-
- At this point we have the output of the map task, written to the map task's local disk.
- Now the reduce task needs to fetch the particular partition of each map task's output that it requires. Map tasks generally complete at different times, but the reduce task starts copying their partitioned outputs as soon as each one completes. This mechanism is known as the copy phase.
- The reduce task uses threads to fetch map output data in parallel. The default number of these threads is 5; it can be changed using the mapred.reduce.parallel.copies property.
- Map outputs are copied directly into the reduce task's JVM memory if they are small enough; otherwise, they are copied to disk.
- As data accumulates on disk, a background thread merges it into larger, sorted files. This saves time during the later merges.
- Note :- any map output that was compressed must be decompressed first.
- When all the map outputs have been copied, the reduce task moves into the sort phase (which should properly be called the merge phase, as the sorting was carried out on the map side), which merges the map outputs while maintaining their sort order. This is done in rounds.
- The merge factor plays an important role here. For example, if there are 50 map outputs and the merge factor is 10 (the default, changeable via io.sort.factor), there would be 5 rounds; each round merges 10 files into one, so at the end there are 5 intermediate files.
- Rather than merging these five files into a single sorted file, the merge saves a trip to disk by feeding them directly to the reduce function in the final round.
- The reduce function then performs its operation on the merged, sorted data and writes its output to the output filesystem, typically HDFS.
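The reduce-side merge can be sketched the same way. In this toy version (again, an illustration rather than Hadoop's real implementation), each "file" is an in-memory sorted list of (key, value) pairs; intermediate rounds merge at most `merge_factor` files at a time, and the final round streams straight into the reduce function instead of going back to disk.

```python
import heapq
import itertools

def merge_rounds(sorted_files, merge_factor=10):
    # Intermediate rounds: merge merge_factor files into one per round
    while len(sorted_files) > merge_factor:
        head, sorted_files = sorted_files[:merge_factor], sorted_files[merge_factor:]
        sorted_files.append(list(heapq.merge(*head)))
    return sorted_files          # at most merge_factor files remain

def reduce_task(sorted_files, reduce_fn, merge_factor=10):
    files = merge_rounds(sorted_files, merge_factor)
    # Final round: feed the merged stream directly to the reduce function
    stream = heapq.merge(*files)
    results = []
    for key, group in itertools.groupby(stream, key=lambda kv: kv[0]):
        results.append((key, reduce_fn(key, [v for _, v in group])))
    return results

# The example from the text: 50 sorted files with merge factor 10
files = [[(i, 1)] for i in range(50)]
print(len(merge_rounds(files, 10)))      # 5 intermediate files

print(reduce_task([[("a", 1), ("b", 2)], [("a", 3)]],
                  lambda k, vs: sum(vs)))  # [('a', 4), ('b', 2)]
```

The 50-file example reproduces the arithmetic in the text: five merge rounds leave five intermediate files, which the final pass merges on the fly for the reduce function.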
How do reducers know which tasktrackers to fetch map output from?
As map tasks complete successfully, they notify their parent tasktracker
of the status update, which in turn notifies the jobtracker. These notifications
are transmitted over the heartbeat communication mechanism
described earlier. Therefore, for a given job, the jobtracker knows the
mapping between map outputs and tasktrackers. A thread in the reducer
periodically asks the jobtracker for map output locations until it has
retrieved them all.
Tasktrackers do not delete map outputs from disk as soon as the first
reducer has retrieved them, as the reducer may fail. Instead, they wait
until they are told to delete them by the jobtracker, which is after the
job has completed.
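The fetch-notification flow described in that note can be sketched as a simple polling loop. All names here are hypothetical; the real jobtracker/tasktracker exchange is an RPC protocol piggybacked on heartbeats and is considerably more involved.

```python
def fetch_map_output_locations(jobtracker_events, num_maps, poll=lambda: None):
    """Hypothetical sketch: a thread in the reduce task repeatedly asks the
    jobtracker for newly completed map outputs until it knows them all."""
    known = {}                                  # map task id -> tasktracker host
    while len(known) < num_maps:
        for map_id, host in jobtracker_events():  # ask the jobtracker
            known[map_id] = host                # remember where to fetch from
        poll()                                  # wait before asking again
    return known

# Example: completion events arrive in two batches, since map tasks
# finish at different times
batches = [[("m1", "tt-host-1"), ("m2", "tt-host-2")], [("m3", "tt-host-1")]]
events = lambda: batches.pop(0) if batches else []
print(fetch_map_output_locations(events, num_maps=3))
```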
So this is the internal process of sorting and shuffling in the MapReduce mechanism.
In the next blog we will learn about some more features of MapReduce.
Hope you guys liked this. Please like or comment with your valuable feedback.
Thanks .. Cheers 🙂