In this blog I am going to explain the execution plan of a Pig script.
Pig creates three plans while executing a Pig script/program: the Logical Plan, the Physical Plan and the MapReduce Plan. Here I am going to explain the two most important ones, the Logical Plan and the Physical Plan, with the help of an example.
Let's say we have an employee file: a plain text file whose columns (eid, name, city, country, salary) are separated by a single space.
003 Amit Delhi India 12000
004 Anil Delhi India 15000
005 Deepak Delhi India 34000
006 Fahed Agra India 45000
007 Ravi Patna India 98777
008 Avinash Punjab India 120000
009 Saajan Punjab India 54000
And we have a Pig script as follows:
A = load '/emp' using PigStorage(' ') as (eid:int,name:chararray,city:chararray,country:chararray,salary:int);
salfil = Filter A by salary>50000 ;
dump salfil ;
The script loads the data from '/emp', and then the salfil alias filters the records, keeping only those whose salary is greater than 50000.
Finally, we display (DUMP) the output, which is as follows:
(7,Ravi,Patna,India,98777)
(8,Avinash,Punjab,India,120000)
(9,Saajan,Punjab,India,54000)
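To make the semantics of the script concrete, here is a small Python model of what it computes on the sample data. This is only an illustration of the LOAD ... USING PigStorage(' ') AS (...) plus FILTER logic, not how Pig actually runs the job; the function names load and filter_by_salary are hypothetical.

```python
# Simplified Python model of the Pig script's semantics (NOT Pig's implementation).
# load() and filter_by_salary() are hypothetical helper names.

EMP_LINES = [
    "003 Amit Delhi India 12000",
    "004 Anil Delhi India 15000",
    "005 Deepak Delhi India 34000",
    "006 Fahed Agra India 45000",
    "007 Ravi Patna India 98777",
    "008 Avinash Punjab India 120000",
    "009 Saajan Punjab India 54000",
]


def load(lines, delimiter=" "):
    """Split each line on the delimiter and apply the declared schema
    (eid:int, name:chararray, city:chararray, country:chararray, salary:int)."""
    rows = []
    for line in lines:
        eid, name, city, country, salary = line.split(delimiter)
        rows.append((int(eid), name, city, country, int(salary)))
    return rows


def filter_by_salary(rows, threshold=50000):
    """Keep only the tuples whose salary field is greater than the threshold."""
    return [row for row in rows if row[4] > threshold]


A = load(EMP_LINES)
salfil = filter_by_salary(A)
for row in salfil:
    # Mimic DUMP's style: each tuple in parentheses, fields comma-separated.
    print("(" + ",".join(str(field) for field in row) + ")")
```

Running it prints the three employees earning more than 50000 (eid 7, 8 and 9). The leading zeros disappear because eid is declared as int.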
Now let's explain the logical plan:
- Pig Latin programs are checked statement by statement by the interpreter.
- The logical plan is built up as each line of the Pig script/program is read.
- The interpreter checks each statement/line for syntax errors in the (logical) operators it uses. If it finds an error, it throws an exception and program execution ends.
- If no error is found in the statement/line, a plan is generated for it, and that plan is added to the overall logical plan of the program.
- With each line, the logical plan of the program grows bigger, because each statement contributes its own part of the plan.
- E.g., for the line:
A = load '/emp' using PigStorage(' ') as (eid:int,name:chararray,city:chararray,country:chararray,salary:int);
the interpreter only checks the syntax of the logical operator used in it (load), as well as the syntax of PigStorage.
If the syntax is correct, the logical plan of this statement is added to the logical plan of the main program.
This process repeats for the other statements as well.
- While the logical plan is being built, no data is processed; only syntax and semantic checks take place.
Reason: there is no point in loading all the data first if it is going to be filtered afterwards. By waiting until it knows the whole data flow, Pig can plan the load and the filter together.
- The logical plan is a graph of logical operators, with edges between the operators that show how data flows from one operator to the next.
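The statement-by-statement building of the logical plan can be sketched in Python. This is a hypothetical toy model, not Pig's code: the class LogicalPlan, the method add_statement and the small KNOWN_OPERATORS set are all assumptions made for illustration. Each statement is checked first; only if it passes is it added as an operator node, with an edge from the alias it reads from. No data is read at any point.

```python
# Hypothetical model of a logical plan that grows one statement at a time.
# Names are illustrative only; Pig's real logical operators are Java classes
# such as LOLoad and LOFilter.

# Assumed, deliberately tiny set of operators this toy interpreter accepts.
KNOWN_OPERATORS = {"LOAD", "FILTER", "GROUP", "FOREACH", "STORE"}


class LogicalPlan:
    def __init__(self):
        self.operators = {}  # alias -> (operator, details)
        self.edges = []      # (from_alias, to_alias): data flows from -> to

    def add_statement(self, alias, operator, details, inputs=()):
        # Check the statement first; on error, raise and leave the plan untouched.
        if operator not in KNOWN_OPERATORS:
            raise SyntaxError(f"unknown operator {operator!r} in statement for {alias}")
        for src in inputs:
            if src not in self.operators:
                raise NameError(f"undefined alias {src!r} used by {alias}")
        # Statement is valid: add its operator and connect it to its inputs.
        self.operators[alias] = (operator, details)
        for src in inputs:
            self.edges.append((src, alias))


plan = LogicalPlan()
plan.add_statement("A", "LOAD", "'/emp' using PigStorage(' ')")
plan.add_statement("salfil", "FILTER", "salary > 50000", inputs=["A"])

# Nothing has been read from '/emp'; the plan only records operators and edges.
print(plan.operators)
print(plan.edges)
```

A misspelled operator (for example "FILTR") raises an exception before anything is added, which mirrors how a syntax error stops the program.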
The trigger that makes Pig actually start executing the statements is a DUMP (or STORE) statement.
If every statement passes the checks, then when Pig reaches the DUMP statement, the logical plan is compiled into a physical plan.
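This lazy behaviour can be illustrated with a toy Python session. The class LazySession and its load/filter/dump methods are hypothetical; they only record steps and do no work until dump() is called. A counter shows that no input line is read before that point.

```python
# Hypothetical sketch of lazy evaluation: statements are only recorded,
# and no input is read until dump() is called. Names are illustrative.


class LazySession:
    def __init__(self, source_lines):
        self.source_lines = source_lines
        self.statements = []  # recorded steps, in script order
        self.rows_read = 0    # how many input lines have actually been read

    def load(self, delimiter=" "):
        self.statements.append(("load", delimiter))

    def filter(self, predicate):
        self.statements.append(("filter", predicate))

    def dump(self):
        # Only now are the recorded steps executed, in order.
        rows = []
        for kind, arg in self.statements:
            if kind == "load":
                rows = []
                for line in self.source_lines:
                    self.rows_read += 1
                    rows.append(line.split(arg))
            elif kind == "filter":
                rows = [r for r in rows if arg(r)]
        return rows


session = LazySession(["007 Ravi Patna India 98777", "003 Amit Delhi India 12000"])
session.load(" ")
session.filter(lambda r: int(r[4]) > 50000)
print(session.rows_read)  # 0: nothing has been read yet
result = session.dump()
print(session.rows_read)  # 2: both lines were read by dump()
print(result)             # [['007', 'Ravi', 'Patna', 'India', '98777']]
```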
The logical plan is drawn as a chart like this:
The flow of this chart is from bottom to top, so the Load operator is at the very bottom. The lines between the operators show the flow of data.
The actual logical plan for the program above looks like this:
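Physical Plan:
- The physical plan is produced from the logical plan. It is later compiled into the MapReduce plan, which is the series of MapReduce jobs that actually run.
- The physical plan describes the physical operators Pig will use to execute the script, without reference to how they will be executed in MapReduce.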
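The step from logical to physical operators can be pictured as a lookup from each logical operator to one or more physical operators. The operator names below (LOLoad, LOFilter, LOStore, LOCogroup and POLoad, POFilter, POStore, POLocalRearrange, POGlobalRearrange, POPackage) are class names from Pig's source. The translation table and the to_physical function, however, are a hypothetical simplification covering only these few operators, not Pig's actual translator.

```python
# Simplified, hypothetical sketch of logical -> physical translation.
# Operator names come from Pig's source; the table itself is a simplification.

LOGICAL_TO_PHYSICAL = {
    "LOLoad": ["POLoad"],
    "LOFilter": ["POFilter"],
    "LOStore": ["POStore"],
    # A (co)group is the one case here that expands into several physical operators.
    "LOCogroup": ["POLocalRearrange", "POGlobalRearrange", "POPackage"],
}


def to_physical(logical_ops):
    """Replace each logical operator by its physical operator(s), keeping order."""
    physical = []
    for op in logical_ops:
        physical.extend(LOGICAL_TO_PHYSICAL[op])
    return physical


print(to_physical(["LOLoad", "LOFilter", "LOStore"]))
print(to_physical(["LOLoad", "LOCogroup", "LOStore"]))
```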
The physical plan looks like this:
The script above has no GROUP, but if it had one, the logical co-group operator would be converted into three physical operators: Local Rearrange, Global Rearrange and Package, as described below:
Local Rearrange takes each input tuple and outputs a (key, value) pair, with the group field as the key and the tuple as the value.
Global Rearrange collects the key-value pairs whose keys belong to the same partition and turns them into (key, list of values). The partition is decided by which reducer the Global Rearrange is serving. We do not need to implement this step ourselves, because it is the intermediate step (the shuffle) that happens between the mapper and the reducer.
Package takes each (key, list of values) and puts it in the format required by the co-group.
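The three co-group steps can be modelled in a single Python process. This is a hypothetical sketch: the function names are illustrative stand-ins for POLocalRearrange, POGlobalRearrange and POPackage. The partitioning uses a CRC32 hash of the key as an assumed hash partitioner, similar in spirit to Hadoop's default, so that the result is deterministic.

```python
# Hypothetical single-process model of GROUP/COGROUP split into
# Local Rearrange (map side), Global Rearrange (shuffle) and Package (reduce side).
from collections import defaultdict
import zlib


def local_rearrange(tuples, key_index):
    """Map side: emit (key, tuple) with the group field as the key."""
    return [(t[key_index], t) for t in tuples]


def global_rearrange(pairs, num_reducers):
    """Shuffle: route each key to a partition (assumed hash partitioner),
    then collect all values for a key into one list."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        reducer = zlib.crc32(str(key).encode()) % num_reducers
        partitions[reducer][key].append(value)
    return partitions


def package(partition):
    """Reduce side: turn each (key, list of values) into a (group, bag) tuple."""
    return [(key, values) for key, values in partition.items()]


employees = [
    (3, "Amit", "Delhi", "India", 12000),
    (6, "Fahed", "Agra", "India", 45000),
    (8, "Avinash", "Punjab", "India", 120000),
    (9, "Saajan", "Punjab", "India", 54000),
]
pairs = local_rearrange(employees, key_index=2)  # group by city
for partition in global_rearrange(pairs, num_reducers=2):
    for group, bag in package(partition):
        print(group, [t[1] for t in bag])
```

Every tuple with the same city ends up in the same partition, so each reducer sees complete groups. Which reducer gets which city depends only on the hash.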
Both plans look similar, but:
As explained earlier, the logical plan is not for processing data, and if we look closely at the physical and logical plans we can see the difference.
In the physical plan, the load function and the store function are resolved, which was not the case in the logical plan.
In the logical plan:
A: (Name: LOLoad Schema: eid#29:bytearray,name#30:bytearray,city#31:bytearray,country#32:bytearray,salary#33:bytearray)RequiredFields:null
In the physical plan:
A: Load(/emp:PigStorage(' '))
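The difference can be pictured with two hypothetical Python records. The logical load node only describes the relation and its fields. The physical load node holds a concrete, callable load function bound to a location. LogicalLoad, PhysicalLoad, pig_storage and resolve are illustrative names, not Pig classes, and pig_storage is only a stand-in for PigStorage's field splitting.

```python
# Hypothetical model: a logical load node describes fields, while a physical
# load node carries a resolved, callable load function and a location.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class LogicalLoad:
    alias: str
    schema: List[str]  # field names only; nothing here can read data


@dataclass
class PhysicalLoad:
    alias: str
    location: str
    load_func: Callable[[str], List[str]]  # resolved function that parses one line


def pig_storage(delimiter):
    """Illustrative stand-in for PigStorage(delimiter): split a line into fields."""
    return lambda line: line.split(delimiter)


def resolve(logical, location, delimiter):
    """Turn the logical description into a physical node with a concrete loader."""
    return PhysicalLoad(logical.alias, location, pig_storage(delimiter))


logical_a = LogicalLoad("A", ["eid", "name", "city", "country", "salary"])
physical_a = resolve(logical_a, "/emp", " ")
print(physical_a.load_func("004 Anil Delhi India 15000"))
```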
I hope this article helps you understand the concept of the logical and physical plans of a Pig program.
There is one plan left, the MapReduce plan, which I will explain later.
I hope you like it.
Please comment or like if you find it useful.