Dear Jagadish,

Hope you are doing well.

We are extremely sorry for the delayed response; please accept our sincere apologies.

Answer to Module 5, question 2:

Code:

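// Needed for .toDF (spark-shell normally imports this already)
import sqlContext.implicits._

// One record per line: employee id and PAN number
case class emp(id:Int,Pannum:String)
// One record per line: employee id and salary
case class emp_detail(id:Int,salary:Int)

// Load each comma-separated file and convert it to a DataFrame
val a = sc.textFile("file:///home/edureka/Desktop/Mydoc2").map(_.split(",")).map(x=>emp(x(0).toInt,x(1))).toDF
val b = sc.textFile("file:///home/edureka/Desktop/Mydoc").map(_.split(",")).map(x=>emp_detail(x(0).toInt,x(1).toInt)).toDF

// Register both DataFrames so they can be queried with SQL
a.registerTempTable("emp")
b.registerTempTable("emp_detail")

// Join on id; columns are listed explicitly so the result has a single id column
val c = sqlContext.sql("SELECT C.id, C.Pannum, I.salary FROM emp C JOIN emp_detail I ON C.id = I.id")
c.show
c.registerTempTable("newTable")
// collect() brings the rows to the driver so println output shows in the shell
sqlContext.sql("SELECT * FROM newTable").collect().foreach(println)
// The dataset stores the text "null", so the filter compares against the string 'null'
sqlContext.sql("SELECT * FROM newTable WHERE Pannum = 'null'").collect().foreach(println)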


Dataset:

We have to modify the dataset, i.e. we have to write null explicitly where the PAN number is missing. If that field is left blank, split(",") drops the trailing empty value, so x(1) does not exist and the job fails with:

org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1

1,AHFVH4560S
2,null
3,AHFVD4560A
4,JHDVH4560S
5,KHSVH8460S
6,HGFVH4520S
7,AGFVH3060S
8,AHGHH4542S
9,SKDVH2960S
10,SLTVH4028S
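
If you would rather not edit the dataset by hand, the parsing step can be made tolerant of a blank PAN field instead. The following is only a minimal sketch in plain Scala (it runs without Spark); the names Emp and parseEmp are hypothetical and are not part of the answer above.

```scala
// Hypothetical record type: the PAN number is optional
case class Emp(id: Int, pannum: Option[String])

// Hypothetical helper: parse one "id,pan" line without failing on a blank PAN
def parseEmp(line: String): Emp = {
  // A limit of -1 keeps trailing empty fields, so "2," gives Array("2", "")
  val fields = line.split(",", -1).map(_.trim)
  // lift(1) returns None instead of throwing when the second field is absent;
  // the filter turns an empty string into None as well
  val pan = fields.lift(1).filter(_.nonEmpty)
  Emp(fields(0).toInt, pan)
}

// Sample lines: a complete record, a blank PAN, and a missing comma
val sample = Seq("1,AHFVH4560S", "2,", "3")
sample.map(parseEmp).foreach(println)
```

Assuming the same approach is used inside the Spark job (mapping each line through parseEmp before calling toDF), the missing values should show up as real nulls, and the filter would then be written as WHERE pannum IS NULL rather than pannum='null'.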


Screen shot:


Regarding the other two questions, I am working on them.

We will get back to you with the answers at the earliest.


Please note: if you are not happy with the response on this ticket, please escalate it to escalations@edureka.in.
We assure you that we will get back to you within 24 hours.