Hello Sunil,


Hope you are doing well.


Map in Hadoop MapReduce:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Emits a (line length, 1) pair for every line of input.
public class LineLengthMapper
    extends Mapper<LongWritable, Text, IntWritable, IntWritable> {

  @Override
  protected void map(LongWritable lineNumber, Text line, Context context)
      throws IOException, InterruptedException {
    context.write(new IntWritable(line.getLength()), new IntWritable(1));
  }
}


Map in Spark:

lines.map(line => (line.length, 1))
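
Here `lines` is an RDD[String] that the snippet does not define. As a rough sketch (the app name and input path below are placeholders, not from the original), it would typically be created like this:

import org.apache.spark.{SparkConf, SparkContext}

// Placeholder setup -- app name and HDFS path are assumptions for illustration.
val conf = new SparkConf().setAppName("LineLengthCount")
val sc = new SparkContext(conf)

// RDD[String] with one element per line of the input file.
val lines = sc.textFile("hdfs:///path/to/input.txt")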

=======================================================================


Reduce in Hadoop MapReduce:


import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Reducer;

// Sums the 1s emitted by the mapper, producing (line length, number of lines of that length).
public class LineLengthReducer
    extends Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {

  @Override
  protected void reduce(IntWritable length, Iterable<IntWritable> counts, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable count : counts) {
      sum += count.get();
    }
    context.write(length, new IntWritable(sum));
  }
}


Reduce in Spark:

val lengthCounts = lines.map(line => (line.length, 1)).reduceByKey(_ + _)
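
Continuing from the SparkContext sketch above (the output path is again only a placeholder), the aggregated counts could then be printed or saved like this:

// Bring the (length, count) pairs back to the driver and print them.
lengthCounts.collect().foreach { case (length, count) =>
  println(s"$length\t$count")
}

// Or write them out as text files, one part file per partition.
lengthCounts.saveAsTextFile("hdfs:///path/to/output")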



I hope this helps.