Hadoop :- Inside MapReduce ( Process of Shuffling , sorting ) – Part II


In this blog i will explain about  Inside of map task + reduce task .

As we know that in broad way mapreduce follows a simple mechanism like :-


But in actual lot of done inside the  two main phases know as map and reduce ( specially sorting and shuffling)

Then lets start with the Map phase :-


Shuffle :- The process of distributing maps output to reducers as inputs is know as shuffling

1) Map Side :- 

  • when map function starts producing output it does not write them to disk
  • process takes advantages of buffering writes in memory which help out for some more efficiency
  • each map task has a circular memory which is used for writing out output  of each map task . this circular memory is basically  buffer , buffer size is 100 mb by default
  • it can be changed using property of io.sort.mb . after starting writting the map output to circular memory / buffer then when the contents reaches a threshold size ( io.sort.spill.percent by default 80%)
  • A new thread will start spilling the output records to local disk . Map task output written to buffer and spilling output to disk will takes place in parallel then afterwards .
  • spills are written in round robin manner to the specified directory by property :- mapred.local.dir
  • But before written spills onto the disks , the thread first divides the spill data into partitions corresponding to reducers that they will ultimately be sent to .
  • In each partition the data is arranged according to sort ( which we say and in-memory sort key process) .
  • If there is combiner function then it will run on the sort data , which means it will produce more sorted and compact map output , so less data needs to be transferred to reduce and written on local disk
  • As every time the buffer reaches spill threshold new spill record is generated , hence after map task write its last out there may be many spills , therefore before the map task completed thread merge spill records( streams) into a single partitioned and sorted output file.
  •  the property io.sort.factor will tell the maximum numbers of spills record that can be merge at once 
  • add on :-  it is good to compress the output of map task written which saves disk space, saves the amount of data to be transferred  to reducer
  • By default compression is not enabled , we need to enable it by setting mapred.compress.map.output to true
  • Now , map  output file ( u can call it partitioned file as well ) is available for reducers over http . The maximum number of  worker threads used to server the file partitions is controlled by the tasktracker.http.thrteads  property ; this setting is per  tasktracker, not per map task

2. The Reducer Side

  • At this point of time we have the output of maptask , which is written into map task local disk
  • Now the reduce task will start to verify which partition/output of map task is needed by it . Map tasks generally completed on different times , but reduce task start copying the map partioned output as as soon as each completes this copy mechanism is known as Copy phase
  • reduces using threads for copying the map output data which fetch output in parrellel. default numbner for these threads is 5 . can be change using mapred.reduce.parellel.copies property.
  • Map output directly copy to reduce JVM memory if there are small in numbers otherwise
  • they are copied to disk .
  • As now the data is on disk then a background thread merges them into larger,sorted files . This saves time merging later on .
  • Note :- every map output which is compressed must need to decompressed first
  • When all the mapout have been copied then rduce task moves into sort phase ( which should be called merged phase as sorting was carried in map phase) , which merge the map output managing their sort order . this  is done in round robin fashion .
  • Merge factor play an important role in merging files . For ex :- if there are 50 outputs and merge factor is 10 ( by default , can be change using io.sort.factor ), there would be 5 rounds . each round will merge 10 Files into one , so that at end there will be 5 intermediate files .
  • Rather than merging these fives files into a a single sorted files , the merge will feed these files to reduce function hence saving a trip to disk .
  • Once reduce function get these files it will perform its operation and write their output to filesystem , typically HDFS.

Note :-

How do reducers know which tasktrackers to fetch map output from?
As map tasks complete successfully, they notify their parent tasktracker
of the status update, which in turn notifies the jobtracker. These notifications
are transmitted over the heartbeat communication mechanism
described earlier. Therefore, for a given job, the jobtracker knows the
mapping between map outputs and tasktrackers. A thread in the reducer
periodically asks the jobtracker for map output locations until it has
retrieved them all.
Tasktrackers do not delete map outputs from disk as soon as the first
reducer has retrieved them, as the reducer may fail. Instead, they wait
until they are told to delete them by the jobtracker, which is after the
job has completed.

Hence this is the internal process of sorting shuffling in MapReduce  mechanism.

In next blog we will learn  about some more features of  MapReduce 

Hope you guys like this . Please like or comment for ypur valuable feedback.

Thanks .. Cheers 🙂


2 thoughts on “Hadoop :- Inside MapReduce ( Process of Shuffling , sorting ) – Part II

  1. […] Inside MapReduce ( Process of Shuffling , sorting )                           https://haritbigdata.wordpress.com/2015/07/21/hadoop-inside-mapreduce-process-of-shuffling-sorting-p…                                                                  […]


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s