Hello Harshvardhan,

Hope you are doing well.

map(inKey, inValue) -> list(intermediateKey, intermediateValue)

The purpose of the map phase is to organize the data in preparation for the processing done in the reduce phase. The input to the map function is a key-value pair, even though the input to a MapReduce program as a whole is one or more files. By default, the value is a single data record (typically one line of text) and the key is the offset of that record from the beginning of the data file.

The output consists of a collection of key-value pairs which are input for the reduce function. The content of the key-value pairs depends on the specific implementation.

For example, a common first program implemented in MapReduce counts the words in a file.[1] The map function is called once for each line of the file, and its output is a list of key-value pairs in which each word is a key and the number 1 is the value. A word that occurs several times is emitted several times, once per occurrence.
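
To make the word count mapper concrete, here is a minimal sketch in plain Python. It does not use any real MapReduce framework; the function name word_count_map and the whitespace-based splitting are my own assumptions for illustration.

```python
from typing import Iterator, Tuple


def word_count_map(in_key: int, in_value: str) -> Iterator[Tuple[str, int]]:
    """Hypothetical mapper: in_key is the line's offset in the file
    (unused here), in_value is the text of one line."""
    # Assumption: words are separated by whitespace; no punctuation handling.
    for word in in_value.split():
        # Emit one (word, 1) pair per occurrence, duplicates included.
        yield (word, 1)


# One call to the mapper for a single input line at offset 0.
pairs = list(word_count_map(0, "the quick the"))
print(pairs)
```

Note that the mapper does no counting itself. It only tags each word with a 1; the summing is left to the reduce phase.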

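To increase the throughput of the map phase, MapReduce can run several identical mappers in parallel, each on a different split of the input.[3] Since every mapper applies the same map function and each record is processed independently of the others, the combined output of all the mappers is the same as that of a single mapper run over the entire input.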
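
The equivalence between one mapper and several parallel mappers can be checked with a small sketch. This is only a simulation under my own assumptions: a thread pool stands in for the cluster, the input is split into two hand-picked chunks, and the names map_line and run_mapper are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import chain


def map_line(line):
    # Word count map function: one (word, 1) pair per word in the line.
    return [(word, 1) for word in line.split()]


def run_mapper(split):
    # A "mapper" applies the same map function to every record in its split.
    return list(chain.from_iterable(map_line(line) for line in split))


lines = ["a b a", "b c", "a c c", "d"]

# One mapper over the whole input.
single = run_mapper(lines)

# Two identical mappers, each working on half of the input.
splits = [lines[0:2], lines[2:4]]
with ThreadPoolExecutor(max_workers=2) as pool:
    parallel = list(chain.from_iterable(pool.map(run_mapper, splits)))

# Same pairs either way; sorting makes the check independent of ordering.
print(sorted(single) == sorted(parallel))
```

The comparison ignores ordering on purpose, since a real framework gives no guarantee about the order in which mapper output arrives.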


reduce(intermediateKey, list(intermediateValue)) -> list(outKey, outValue)

Each call to the reduce function processes all the intermediate values that the map phase generated for one particular key and produces the output for that key.[1] Every key is handled by exactly one reducer, but a single reducer may handle many keys, so the mapping from keys to reducers is many-to-one rather than one-to-one. Several reducers can run in parallel, since they work on disjoint sets of keys and are independent of one another. The number of reducers is set by the user; in Hadoop, for example, the default is 1.
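
Here is a minimal sketch of the reduce side for word count, again in plain Python rather than a real framework. The helper names (word_count_reduce, partition, shuffle) are hypothetical, and the CRC32-modulo partitioning is my assumption; it is similar in spirit to hash partitioning but is not taken from any particular implementation.

```python
import zlib
from collections import defaultdict


def word_count_reduce(key, values):
    # Reducer for word count: sum all the 1s emitted for this word.
    yield (key, sum(values))


def partition(key, num_reducers):
    # Assumption: assign a key to a reducer by a stable hash modulo the
    # reducer count, so the same key always goes to the same reducer.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(pairs, num_reducers):
    # Group intermediate values by key inside each reducer's bucket.
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)][key].append(value)
    return buckets


# Intermediate pairs as they might come out of the map phase.
pairs = [("the", 1), ("quick", 1), ("the", 1), ("fox", 1)]

counts = {}
for bucket in shuffle(pairs, num_reducers=2):
    # Each bucket represents one reducer, which may own several keys.
    for key, values in bucket.items():
        for out_key, out_value in word_count_reduce(key, values):
            counts[out_key] = out_value

print(counts)
```

With two reducers and three distinct words, at least one reducer necessarily receives more than one key, while each key still lands in exactly one bucket.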