Spark : Maximum salary by city



In this blog I will create a Spark program that finds the maximum salary among employees in each city.


Input :-

003 Amit Delhi India 12000
004 Anil Delhi India 15000
005 Deepak Delhi India 34000
006 Fahed Agra India 45000
007 Ravi Patna India 98777
008 Avinash Punjab India 120000
009 Saajan Punjab India 54000
001 Harit Delhi India 20000
002 Hardy Agra India 20000
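Before turning to Spark, the core logic — split each "id name city country salary" line, key by city, and keep the highest-paid employee — can be sketched with plain Scala collections on a subset of this sample data (a standalone sketch, not part of the Spark job below):

```scala
// Sample records in the same "id name city country salary" layout as the input
val input = Seq(
  "003 Amit Delhi India 12000",
  "005 Deepak Delhi India 34000",
  "006 Fahed Agra India 45000",
  "007 Ravi Patna India 98777",
  "008 Avinash Punjab India 120000",
  "002 Hardy Agra India 20000"
)

// Split each line, key by city (field 2), and keep the (name, salary)
// pair with the highest salary in each city
val maxByCity: Map[String, (String, Double)] = input
  .map(_.split(" "))
  .map(x => (x(2), (x(1), x(4).toDouble)))
  .groupBy(_._1)
  .map { case (city, rows) => city -> rows.map(_._2).maxBy(_._2) }

maxByCity.foreach(println)
```

The Spark version follows exactly this shape, with the `groupBy` step done across the cluster.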


Our program looks like this :-

import org.apache.spark._

object salmax extends App {

  // point Spark at the winutils binaries on Windows
  System.setProperty("hadoop.home.dir", "c://winutil//")
  val conf = new SparkConf().setMaster("local[2]").setAppName("testfilter")
  val sc = new SparkContext(conf)

  // build (city, (name, salary)) pairs and group them by city
  val rdd2 = sc.textFile("file:///D:/sparkprog/inputdata/maxsalary")
    .map(_.split(" "))
    .map(x => (x(2), (x(1), x(4).toDouble)))
    .groupByKey

  // keep the highest-paid employee per city and print the result
  rdd2.mapValues(_.maxBy(_._2)).collect.foreach(println)

  sc.stop()
}
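A note on the design: `groupByKey` ships every (name, salary) pair for a city across the network before the maximum is taken. Spark's `reduceByKey` (available on pair RDDs) combines values locally on each partition first, so for this job `rdd.reduceByKey(maxPair)` would be the cheaper choice. The combining function can be sketched — and exercised on plain Scala collections, so it runs without a cluster — like this:

```scala
// Combining function for a reduceByKey-style aggregation:
// given two (name, salary) pairs, keep the one with the larger salary.
def maxPair(a: (String, Double), b: (String, Double)): (String, Double) =
  if (a._2 >= b._2) a else b

// Simulating reduceByKey on in-memory pairs; this `pairs` sequence is a
// stand-in for the (city, (name, salary)) RDD built in the program above.
val pairs = Seq(
  ("Delhi", ("Amit", 12000.0)),
  ("Delhi", ("Deepak", 34000.0)),
  ("Agra", ("Fahed", 45000.0)),
  ("Agra", ("Hardy", 20000.0))
)
val reduced: Map[String, (String, Double)] =
  pairs.groupBy(_._1).map { case (city, rows) =>
    city -> rows.map(_._2).reduce(maxPair)
  }
```

In the actual job, `.groupByKey` followed by `mapValues(_.maxBy(_._2))` could be replaced by `.reduceByKey(maxPair)` with the same result.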



Our output will be like this (the order of cities may vary between runs) :-

(Punjab,(Avinash,120000.0))
(Delhi,(Deepak,34000.0))
(Agra,(Fahed,45000.0))
(Patna,(Ravi,98777.0))
I hope this helps you understand the program. If you have any doubts, please comment — and like and share!




