Hello Sunil,
Hope you are doing well.
Map in Hadoop:
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Emits (line length, 1) for every line of the input split.
public class LineLengthMapper
    extends Mapper<LongWritable, Text, IntWritable, IntWritable> {
  @Override
  protected void map(LongWritable lineNumber, Text line, Context context)
      throws IOException, InterruptedException {
    context.write(new IntWritable(line.getLength()), new IntWritable(1));
  }
}
Map in Spark:
lines.map(line => (line.length, 1))
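Here, lines is assumed to be an RDD[String]; for example, it could be loaded from HDFS (the path below is just a placeholder):

val lines = sc.textFile("hdfs:///path/to/input.txt")  // RDD[String], one element per line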
=======================================================================
Reduce in Hadoop:
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Reducer;

// Sums the counts for each line length and emits (length, total).
public class LineLengthReducer
    extends Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {
  @Override
  protected void reduce(IntWritable length, Iterable<IntWritable> counts, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable count : counts) {
      sum += count.get();
    }
    context.write(length, new IntWritable(sum));
  }
}
Reduce in Spark:
val lengthCounts = lines.map(line => (line.length, 1)).reduceByKey(_ + _)
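Note that reduceByKey is lazy; nothing actually runs until an action is called. A minimal sketch of materializing the result (the output path and formatting are just examples):

lengthCounts.collect().foreach { case (length, count) => println(s"$length\t$count") }  // bring results to the driver
// or persist them, e.g. lengthCounts.saveAsTextFile("hdfs:///path/to/output")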
I hope this helps.