Hadoop : Pig :- Logical Plan VS Physical Plan

Hi,

In this blog post I am going to explain the execution plans of a Pig script.

Pig creates three plans while executing a script: the logical plan, the physical plan and the MapReduce plan. Here I am going to explain the two most important ones, the logical plan and the physical plan, with the help of an example.

Let's say we have an employee file, a plain text file with columns separated by a space:

003 Amit Delhi India 12000
004 Anil Delhi India 15000
005 Deepak Delhi India 34000
006 Fahed Agra India 45000
007 Ravi Patna India 98777
008 Avinash Punjab India 120000
009 Saajan Punjab India 54000

 

And we have a Pig script as follows:

A = load '/emp' using PigStorage(' ') as (eid:int,name:chararray,city:chararray,country:chararray,salary:int);
salfil = filter A by salary > 50000;
dump salfil;

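Explanation:

The script loads the data from the '/emp' directory, and then the alias salfil keeps only the records whose salary is greater than 50000.

Finally we display (dump) the output, which will be as follows. Note that the last two records (eid 110 and 141) are not among the sample rows listed above, so the file evidently holds more rows than the excerpt shows: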

(7,Ravi,Patna,India,98777)
(8,Avinash,Punjab,India,120000)
(9,Saajan,Punjab,India,54000)
(110,Kaushal,Agra,India,90000)
(141,Ajay,Patna,India,120000)
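
As a rough cross-check, the filter can be imitated in plain Python on the seven sample rows listed earlier. This is only a sketch of what the script computes, not of how Pig runs it; the names SAMPLE, load and filter_by_salary are made up for this illustration. On those seven rows alone, only the first three tuples of the output above are produced.

```python
# Sample rows copied from the employee file shown earlier (space-separated).
SAMPLE = """\
003 Amit Delhi India 12000
004 Anil Delhi India 15000
005 Deepak Delhi India 34000
006 Fahed Agra India 45000
007 Ravi Patna India 98777
008 Avinash Punjab India 120000
009 Saajan Punjab India 54000
"""


def load(text, sep=" "):
    """Imitate the load step: split each line and cast eid and salary to int."""
    rows = []
    for line in text.splitlines():
        eid, name, city, country, salary = line.split(sep)
        rows.append((int(eid), name, city, country, int(salary)))
    return rows


def filter_by_salary(rows, threshold=50000):
    """Imitate the filter step: keep rows whose salary exceeds the threshold."""
    return [row for row in rows if row[4] > threshold]


salfil = filter_by_salary(load(SAMPLE))
for row in salfil:
    # Print each row in a Pig-like tuple format.
    print("(" + ",".join(str(field) for field in row) + ")")
```

Casting eid to int is what turns 007 into 7 in the dumped tuples.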

 

Let's look at the logical plan first:

  • Pig Latin programs are checked by an interpreter, statement by statement.
  • The logical plan is built up from the statements of the Pig script, one statement at a time.
  • The interpreter checks each statement for the syntax of its logical operators; if it finds an error, it throws an exception and program execution ends.
  • If no error is found for the statement, a plan is generated for it and added to the logical plan of the whole program.
  • With each statement the logical plan of the program grows, because each statement contributes its own logical operators.
  • For example, for the line:

A = load '/emp' using PigStorage(' ') as (eid:int,name:chararray,city:chararray,country:chararray,salary:int);

The interpreter only checks the syntax of the logical operators used in it: load, as well as the syntax of PigStorage.

If it finds the syntax correct, the logical plan of this statement is added to the logical plan of the main program.

This process is repeated for the other statements as well.

Important:

  • No data processing takes place while the logical plan is built; only syntax and semantic checks are performed.
  • The logical plan is a collection of logical operators connected by the flow of data between them; it says nothing about how those operators will be executed.

Reason: there is no point in loading the data up front when Pig does not yet know what will be done with it, for example that it will be filtered afterwards.

The trigger for Pig to actually start executing the statements is a DUMP (or STORE) statement.

If everything is correct in each statement, then on reaching the DUMP statement the logical plan is compiled into a physical plan.
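
The idea of building a plan first and touching the data only on DUMP can be sketched in a few lines of Python. This is a toy model under my own assumptions, not Pig's real classes: LazyScript, its methods and the rows_read counter are hypothetical names used only to show that nothing is read until dump is called.

```python
class LazyScript:
    """Toy model of deferred execution; not Pig's real implementation."""

    def __init__(self, source_rows):
        self.source_rows = source_rows  # stands in for the input file
        self.plan = []                  # list of (operator name, function)
        self.rows_read = 0              # how many rows were actually read

    def load(self):
        def run(_):
            # Reading happens only when the plan is executed.
            self.rows_read += len(self.source_rows)
            return list(self.source_rows)

        self.plan.append(("LOAD", run))
        return self

    def filter(self, predicate):
        # The statement is checked when it is added, before any data is read.
        if not callable(predicate):
            raise TypeError("filter needs a callable predicate")
        self.plan.append(("FILTER", lambda rows: [r for r in rows if predicate(r)]))
        return self

    def dump(self):
        # Only now are the operators run, in the order they were added.
        rows = None
        for _, operator in self.plan:
            rows = operator(rows)
        return rows


script = LazyScript([("Amit", 12000), ("Ravi", 98777), ("Saajan", 54000)])
script.load().filter(lambda r: r[1] > 50000)

plan_names = [name for name, _ in script.plan]
rows_before_dump = script.rows_read  # still 0: the plan exists, no data was read
result = script.dump()
print(plan_names, rows_before_dump, result)
```

Here the plan already holds both operators before dump is called, while rows_read only changes once dump runs the plan.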

 

The logical plan will look like this:

[Figure: logicalplan]

The flow of this chart is bottom to top, so the Load operator is at the very bottom. The lines between operators show the flow.

The actual logical plan for the above program will look like this:

[Figures: lplan1, lplan2]

Physical Plan:

  • The physical plan is the logical plan translated into physical operators; it is later compiled into a series of MapReduce jobs.
  • This plan describes the physical operators Pig will use to execute the script, without reference to how they will be executed in MapReduce.

The physical plan will look like this:

[Figure: pplan1]

 

GROUP

The logical operator co-group is converted into three physical operators: Local Rearrange, Global Rearrange and Package, as shown below:

[Figure: Group.png]

The Local Rearrange takes the input tuple and outputs a key-value pair, with the group field as the key and the tuple as the value.

The Global Rearrange converts the key-value pairs of the keys belonging to a partition into a set of (key, list of values) pairs. The partition is decided by which reducer the Global Rearrange is serving. This step does not need a separate implementation, because it is the intermediate (shuffle) step that happens between the mapper and the reducer.

The Package just takes each (key, list of values) pair and puts it into the format required by the co-group.
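
The three steps can be imitated in Python to make the data shapes concrete. This is a simplified sketch that assumes a single partition and uses made-up function names (local_rearrange, global_rearrange, package); it does not reflect Pig's actual operator classes.

```python
from collections import defaultdict


def local_rearrange(tuples, key_index):
    """Emit (key, tuple) pairs, with the group field as the key."""
    return [(t[key_index], t) for t in tuples]


def global_rearrange(pairs):
    """Collect the values of each key, as the shuffle would for one reducer."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    # Sort by key so the result is deterministic.
    return sorted(grouped.items())


def package(grouped):
    """Shape each (key, list of values) pair as (group, bag of tuples)."""
    return [(key, list(values)) for key, values in grouped]


# Three of the employees from the example, reduced to (eid, name, city).
employees = [
    (7, "Ravi", "Patna"),
    (8, "Avinash", "Punjab"),
    (9, "Saajan", "Punjab"),
]

# Group by city (field index 2).
result = package(global_rearrange(local_rearrange(employees, key_index=2)))
for group, bag in result:
    print(group, bag)
```

Each element of result pairs a city with the list of employee tuples for that city, which is roughly the shape a GROUP on city would hand to the next operator.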

Both plans look similar, but:

As explained earlier, the logical plan does not process data, and if we look closely at the physical and logical plans we can see the difference.

In the physical plan, the load function and the store function are resolved, which is not the case in the logical plan.

In the logical plan:

A: (Name: LOLoad Schema: eid#29:bytearray,name#30:bytearray,city#31:bytearray,country#32:bytearray,salary#33:bytearray)RequiredFields:null

In the physical plan:

A: Load(/emp:PigStorage(' '))

I hope this article helps you understand the concept of the logical and physical plans of a Pig program.

There is one plan left, the MapReduce plan, which I will explain later.

Hope you guys like it.

Please comment or like if you find it useful.

Thanks

Cheers 🙂
