Q. As per my understanding, the mapper key is nothing but the byte offset of each line.
But if I know my data very well and want to use a key other than the byte offset, is that possible?


The input to the mapper depends on what InputFormat is used. The InputFormat is responsible for reading the 
incoming data and shaping it into whatever format the Mapper expects.
The default InputFormat is TextInputFormat, which extends FileInputFormat<LongWritable, Text>.

1. As per my understanding, the mapper key is nothing but the byte offset of each line.

Yes, but not always. It is true when you use TextInputFormat (as in your case): the key is the byte offset at which the line starts in the file, and the value is the text of the line.
Keys and values depend on the type of InputFormat you are using and change accordingly.

2. But if I know my data very well and want to use a key other than the byte offset, is that possible?

Yes, you can change it. You can write your own custom InputFormat by subclassing FileInputFormat to achieve this.

Example:

Let's say you expect <Text, Text> input. You will have to choose an appropriate InputFormat. You can set the InputFormat in the job setup:

job.setInputFormatClass(MyInputFormat.class);

Now, let's say your input data is a bunch of newline-separated records, each with two fields delimited by a comma:

"1,India"
"2,US"

If you want the input to the mapper to be the key/value pairs ("1", "India"), ("2", "US"), you will have to implement a
custom InputFormat and RecordReader with the <Text, Text> signature.
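
For the sample records above, the per-line work of such a RecordReader comes down to splitting each line at its first comma. The sketch below shows only that step in plain Java, without any Hadoop types; the class name CommaKeyValue and the handling of a line with no comma (whole line as key, empty value) are assumptions made for illustration.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;

class CommaKeyValue {
    // Split one record at its first comma: the text before it becomes the key,
    // everything after it becomes the value.
    public static Map.Entry<String, String> split(String line) {
        int comma = line.indexOf(',');
        if (comma < 0) {
            // Assumption: a line without a delimiter is all key, with an empty value.
            return new SimpleImmutableEntry<>(line, "");
        }
        return new SimpleImmutableEntry<>(line.substring(0, comma), line.substring(comma + 1));
    }

    public static void main(String[] args) {
        String[] records = {"1,India", "2,US"};
        for (String record : records) {
            Map.Entry<String, String> kv = split(record);
            // Prints "key=1 value=India" and then "key=2 value=US".
            System.out.println("key=" + kv.getKey() + " value=" + kv.getValue());
        }
    }
}
```

In a real job the same split would fill two Text objects instead of returning Strings.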

In short, add a class which extends FileInputFormat<Text, Text> and a class which extends RecordReader<Text, Text>. Override the
createRecordReader method of your FileInputFormat subclass (it is called getRecordReader in the older org.apache.hadoop.mapred API), and have it return an instance of your custom RecordReader.
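
A minimal sketch of those two classes follows, assuming the newer org.apache.hadoop.mapreduce API (the one job.setInputFormatClass belongs to), where the factory method to override is createRecordReader. MyInputFormat is the name used in the job setup above; CommaRecordReader is a hypothetical name, and treating a line without a comma as "key only" is an assumption. The reader delegates line reading to Hadoop's LineRecordReader and only re-shapes each line into a <Text, Text> pair. It needs the Hadoop client libraries on the classpath.

```java
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;

public class MyInputFormat extends FileInputFormat<Text, Text> {

    @Override
    public RecordReader<Text, Text> createRecordReader(InputSplit split, TaskAttemptContext context) {
        // The framework calls initialize() on the returned reader before using it.
        return new CommaRecordReader();
    }

    // Hypothetical reader: wraps LineRecordReader and splits each line at its first comma.
    public static class CommaRecordReader extends RecordReader<Text, Text> {
        private final LineRecordReader lines = new LineRecordReader();
        private final Text key = new Text();
        private final Text value = new Text();

        @Override
        public void initialize(InputSplit split, TaskAttemptContext context) throws IOException {
            lines.initialize(split, context);
        }

        @Override
        public boolean nextKeyValue() throws IOException {
            if (!lines.nextKeyValue()) {
                return false; // end of this split
            }
            String line = lines.getCurrentValue().toString();
            int comma = line.indexOf(',');
            if (comma < 0) {
                // Assumption: no delimiter means the whole line is the key.
                key.set(line);
                value.set("");
            } else {
                key.set(line.substring(0, comma));
                value.set(line.substring(comma + 1));
            }
            return true;
        }

        @Override
        public Text getCurrentKey() {
            return key;
        }

        @Override
        public Text getCurrentValue() {
            return value;
        }

        @Override
        public float getProgress() throws IOException {
            return lines.getProgress();
        }

        @Override
        public void close() throws IOException {
            lines.close();
        }
    }
}
```

With this in place, the mapper would be declared with Text as both its input key and input value type, for example Mapper<Text, Text, Text, IntWritable>. For simple delimiter-separated lines like these, Hadoop's built-in KeyValueTextInputFormat, with the property mapreduce.input.keyvaluelinerecordreader.key.value.separator set to a comma, may already be enough, so a custom class is mainly worth writing when the record layout is more involved.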


Regards,
Abhishek Tiwari
edureka! Support Team
On Thu, 30 Jun at 4:11 PM , Hadoop at Edureka <hadoop@edureka.in> wrote:
Dear Reema,

Hope you are doing well.


Hope this resolves your query.
If you have any further queries, please let us know.


