Spark Streaming: Word Counts for Text Files in a Folder

Hi,

In this blog I will write a Spark Streaming program that reads the text files in a folder on a regular basis and prints the word counts.

The program is as follows.

Note: I use Maven in this project (you can find the Maven dependency in my previous post).
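For quick reference, the spark-streaming dependency typically looks like the snippet below. The Scala suffix and version here are assumptions for illustration; use the exact coordinates from the previous post to match your Spark setup:

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming_2.11</artifactId>
  <!-- assumed version; replace with the one matching your cluster -->
  <version>1.6.1</version>
</dependency>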
import org.apache.spark.SparkConf
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._

object teststreaming {

  def main(args: Array[String]) {

    // Needed on Windows so Spark can locate winutils.exe
    System.setProperty("hadoop.home.dir", "c://winutil//")

    // local[2]: at least two threads, one to receive data and one to process it
    val conf = new SparkConf().setAppName("Application").setMaster("local[2]")

    // Batch interval of 30 seconds: the folder is checked for new files every batch
    val ssc = new StreamingContext(conf, Seconds(30))

    // DStream of lines from any new text files that appear in the folder
    val lines = ssc.textFileStream("file:///C://Users//HA848869//Desktop//sparkdata//")

    // Split each line into words, pair each word with a count of 1, then sum per word
    val words = lines.flatMap(_.split(" "))
    val pairs = words.map(word => (word, 1))
    val counts = pairs.reduceByKey(_ + _)

    // Print the first few word counts of each batch to the console
    counts.print()

    ssc.start()
    ssc.awaitTermination()
  }

}
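To try it out, start the program and then drop a new text file into the folder while it is running (textFileStream only picks up files created after the stream starts). For a file containing "hello world hello", the next 30-second batch would print something like the output below; the timestamp is just an illustration:

-------------------------------------------
Time: 1464000030000 ms
-------------------------------------------
(hello,2)
(world,1)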
