Hadoop PIG & Sql Comparison

Hi,

In this blog we are going to compare PIG & SQL for this we use :-

Hope this will find you helpful for creating pig scripts

http://www.w3schools.com/sql/ ( SQL source)

Following is the dataset which we are going to follow (Separated by tab’/t’):-  ( from W3 Schools)

1              AlfredsFutter kiste          Maria Anders     Obere Str. 57     Berlin    12209    Germany

2              Ana Trujillo Emparedados y helados        Ana Trujillo         Avda. de la Constitución 2222     México D.F.        05021                Mexico

3              Antonio Moreno Taquería           Antonio Moreno              Mataderos 2312               México D.F.        05023                Mexico

4              Around the Horn              Thomas Hardy   120 Hanover Sq.               London WA1 1DP             UK

5              Berglunds snabbköp       Christina Berglund           Berguvsvägen 8                Luleå     S-95822 Sweden

SQL Statement:-

SELECT * FROM Customers;

Pig:-

A = load ‘/sqldata’ as (CustomerID:int,CustomerName:chararray,ContactName:chararray,Address:chararray,

City:chararray,PostalCode:chararray,Country:chararray);

dump A ;

SQL :-

SELECT DISTINCT City FROM Customers;

PIG :-

A = load ‘/sqldata’ as (CustomerID:int,CustomerName:chararray,ContactName:chararray,Address:chararray,

City:chararray,PostalCode:chararray,Country:chararray);

res = GROUP A by City ;

result = foreach res generate group ;

dump result ;

Output :-

(Berlin)

(London)

(Luleå)

(México D.F.)

SQL :-

SELECT * FROM Customers
WHERE Country=’ México D.F.’;

Pig :-

A = load ‘/sqldata’ as (CustomerID:int,CustomerName:chararray,ContactName:chararray,Address:chararray,

City:chararray,PostalCode:chararray,Country:chararray);

result = FILTER A by City == ‘México D.F.’;

dump result ;

Output :-

(2,Ana Trujillo Emparedados y helados,Ana Trujillo,Avda. de la Constitución 2222,México D.F.,05021,Mexico)

(3,Antonio Moreno Taquería,Antonio Moreno,Mataderos 2312,México D.F.,05023,Mexico)

SQL :-

SELECT * FROM Customers
WHERE CustomerID=1;

Pig :-

A = load ‘/sqldata’ as (CustomerID:int,CustomerName:chararray,ContactName:chararray,Address:chararray,

City:chararray,PostalCode:chararray,Country:chararray);

result = FILTER A by CustomerID == 1;

dump result ;

OutPut :-

(1,AlfredsFutter kiste,Maria Anders,Obere Str. 57,Berlin,12209,Germany)

SQL :-

SELECT * FROM Customers
WHERE Country=’Germany’
AND City=’Berlin’;

Pig:-

A = load ‘/sqldata’ as (CustomerID:int,CustomerName:chararray,ContactName:chararray,Address:chararray,

City:chararray,PostalCode:chararray,Country:chararray);

result = FILTER A by City == ‘Berlin’ and Country == ‘Germany’;

dump result ;

output:-

(1,AlfredsFutter kiste,Maria Anders,Obere Str. 57,Berlin,12209,Germany)

SQL :-

SELECT * FROM Customers
ORDER BY Country;

PIG :-

A = load ‘/sqldata’ as (CustomerID:int,CustomerName:chararray,ContactName:chararray,Address:chararray,

City:chararray,PostalCode:chararray,Country:chararray);

result = ORDER A BY Country ;

dump result ;

Output :-

(1,AlfredsFutter kiste,Maria Anders,Obere Str. 57,Berlin,12209,Germany)

(3,Antonio Moreno Taquería,Antonio Moreno,Mataderos 2312,México D.F.,05023,Mexico)

(2,Ana Trujillo Emparedados y helados,Ana Trujillo,Avda. de la Constitución 2222,México D.F.,05021,Mexico)

(5,Berglunds snabbköp,Christina Berglund,Berguvsvägen 8,Luleå,S-95822,Sweden)

(4,Around the Horn,Thomas Hardy,120 Hanover Sq.,London,WA1 1DP,UK)

SQL :-

SELECT TOP 2 * FROM Customers;

PIG :-

A = load ‘/sqldata’ as (CustomerID:int,CustomerName:chararray,ContactName:chararray,Address:chararray,

City:chararray,PostalCode:chararray,Country:chararray);

result = LIMIT A 2 ;

 

OutPut :-

(1,AlfredsFutter kiste,Maria Anders,Obere Str. 57,Berlin,12209,Germany)

(2,Ana Trujillo Emparedados y helados,Ana Trujillo,Avda. de la Constitución 2222,México D.F.,05021,Mexico)

Between in sql :-

We can use filter <= , <= for between  in PIG

Alias

SELECT CustomerName AS Customer, ContactName AS [Contact Person]
FROM Customers;

Pig :-

Most important thing is alias :-

A = load ‘/sqldata’ as (CustomerID:int,CustomerName:chararray,ContactName:chararray,Address:chararray,

City:chararray,PostalCode:chararray,Country:chararray);

Another example:-

Its related to webserver logs :-

a = load ‘/mcafe’ USING PigStorage(‘\t’) as (Source_ip:chararray,Destination_ip:chararray,

Domain:chararray,User:chararray,Time:chararray,Url:chararray,Http_Status:int ,Bytes_Send:int,

Bytes_received:int,User_Agent:chararray,cateogry:chararray,Media_Type:chararray,Media_status:chararray,

Malware:chararray ,Risk_cateogry:chararray,TCM_Method:chararray,Final_result:chararray,Proxy_server_name:chararray);

alias1 =FILTER a BY Malware != ‘-‘;

test = GROUP alias1 ALL ;

c = foreach test generate COUNT(alias1.Malware) AS ab ;

gr = Group alias1 BY Malware;

result = foreach gr generate group,(COUNT(alias1.Malware)*100/c.ab);

dump result ;

— dump c ;

Join : –

SQL :-

SELECT Orders.OrderID, Customers.CustomerName, Orders.OrderDate
FROM Orders
INNER JOIN Customers
ON Orders.CustomerID=Customers.CustomerID;

Pig :-

Sample data :-

003 Amit Delhi India 12000

004 Anil Delhi India 15000

005 Deepak Delhi India 34000

006 Fahed Agra India 45000

007 Ravi Patna India 98777

008 Avinash Punjab India 120000

009 Saajan Punjab India 54000

001 Harit Delhi India 20000

002 Hardy Agra India 20000

Need to find out maximum salary of particular city with name of employee:-

PIG :-

sdata = LOAD ‘/emp’ using PigStorage (‘ ‘) AS (eno:int,name:chararray,city:chararray,country:chararray,sal:int);

gr = GROUP sdata by city ;

d= FOREACH gr generate MAX(sdata.sal) as a2 ;

main = JOIN sdata by sal,d by a2 ;

result = FOREACH main generate $0,$1,$2,$3,$4;

dump result;

Output :-

(5,Deepak,Delhi,India,34000)

(6,Fahed,Agra,India,45000)

(7,Ravi,Patna,India,98777)

(8,Avinash,Punjab,India,120000)

Thanks for reading .Will update the blog with other pig scripts in comparison to SQL .

Please comment and like .

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s