Dear Suneel,

Hope you are doing well.

RDDs do not offer the groupBy(...).avg(...) aggregation that DataFrames provide, so you can convert the RDD to a DataFrame and then use the groupBy and avg operations.

The dataset you sent produces an error, so I have used a small example dataset to show groupBy and avg on a DataFrame.

Dataset:

file:
1,12
1,13
2,18
2,16

file2:
100,2
100,2
200,3
200,4

Code:

val data = sc.textFile("file:///home/edureka/Desktop/file").map(_.split(',')).map(x => (x(0).toInt,x(1).toInt)).toDF

val user = sc.textFile("file:///home/edureka/Desktop/file2").map(_.split(',')).map(x => (x(0).toInt,x(1).toInt)).toDF

val newDf = data.selectExpr("_1 as x1", "_2 as X2")   // (This will rename the columns)

val newDf1 = user.selectExpr("_1 as x3", "_2 as X4")  // (This will rename the columns)

val joined = newDf.join(newDf1)   // (No join condition is given, so every row of newDf is paired with every row of newDf1)

val df1 = joined.groupBy("x1").avg("X2")   // (This will give the average of X2 for each value of x1)
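
As a quick check of what groupBy and avg return, here is a minimal sketch that builds the sample "file" data in memory instead of reading it from disk. It assumes Spark 2.x or later (it uses SparkSession, which is not in the code above), and the names groupByAvgSketch and avgDf are only illustrative.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: Spark 2.x or later. Inside spark-shell, getOrCreate() reuses
// the session that is already running, so the master setting is ignored there.
val spark = SparkSession.builder()
  .appName("groupByAvgSketch")   // hypothetical application name
  .master("local[*]")
  .getOrCreate()
import spark.implicits._          // needed for .toDF on a local Seq

// Same values as the sample "file" above, with the columns named directly.
val data = Seq((1, 12), (1, 13), (2, 18), (2, 16)).toDF("x1", "X2")

// Group the rows by x1 and take the average of X2 within each group.
val avgDf = data.groupBy("x1").avg("X2")

avgDf.orderBy("x1").show()
```

For this sample, the average of X2 is (12 + 13) / 2 = 12.5 for x1 = 1 and (18 + 16) / 2 = 17.0 for x1 = 2. Joining with file2 without a join condition repeats every row of "file" the same number of times, so these averages stay the same.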


Please let us know if you have any concerns about this.

We are eagerly waiting for your response.


Please note: if you are not happy with the response on this ticket, you can escalate it to escalations@edureka.in.
We assure you that we will get back to you within 24 hours.