MapReduce Interview Questions and Answers

  1. What is MapReduce?

MapReduce is a Java-based programming paradigm of the Hadoop framework that enables scalable, parallel processing of large data sets across Hadoop clusters.

  2. How does MapReduce work in Hadoop?

MapReduce distributes the workload into two kinds of jobs that can run in parallel: the Map job and the Reduce job. The Map job breaks the input data set down into key-value pairs (tuples). The Reduce job then takes the output of the Map job and combines those tuples into a smaller set of tuples.
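The two phases above can be sketched with the classic word-count example. This is a hedged illustration in plain Java, without Hadoop dependencies: the map phase emits a (word, 1) pair for every word, and the reduce phase groups the pairs by key and sums the values.

```java
import java.util.*;

public class WordCountSketch {
    // Map phase: emit one (word, 1) key-value pair per word in each input line.
    static List<Map.Entry<String, Integer>> map(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.toLowerCase().split("\\s+")) {
                if (!word.isEmpty()) {
                    pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
                }
            }
        }
        return pairs;
    }

    // Reduce phase: group the intermediate pairs by key and sum their values.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("the cat sat", "the cat ran");
        System.out.println(reduce(map(input))); // {cat=2, ran=1, sat=1, the=2}
    }
}
```

In a real Hadoop job the framework, not your code, performs the grouping (the shuffle and sort step) between the two phases.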

  3. What is a ‘key-value pair’ in MapReduce?

Key-value pairs are the intermediate data generated by the mappers and sent to the reducers, which combine them to produce the final output.

  4. What is the difference between the MapReduce engine and an HDFS cluster?

An HDFS cluster is the whole configuration of master and slave nodes on which the data is stored. The MapReduce engine is the programming module used to retrieve and analyze that data.

  5. Is a map like a pointer?

No, a map is not like a pointer.

  6. Why is the number of maps equal to the number of input splits?

One map task is launched per input split so that the records of every split are converted into key-value pairs; each mapper processes exactly one split.

  7. Is a job split into maps?

No, a job is not split into maps. Splits are created for the input file, which is placed on the DataNodes in blocks. For each split, a map task is needed.
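As a hedged sketch of the arithmetic: assuming the split size equals the HDFS block size (the common default), the number of map tasks follows directly from the file size.

```java
public class SplitCount {
    // Number of input splits (and hence map tasks) for one file,
    // assuming the split size equals the HDFS block size.
    static long numSplits(long fileSizeBytes, long blockSizeBytes) {
        return (fileSizeBytes + blockSizeBytes - 1) / blockSizeBytes; // ceiling division
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024; // 128 MB, a common HDFS default
        long fileSize  = 300L * 1024 * 1024; // a 300 MB input file
        System.out.println(numSplits(fileSize, blockSize)); // 3 splits -> 3 map tasks
    }
}
```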

  8. How can you set an arbitrary number of mappers to be created for a job in Hadoop?

This is a trick question. You cannot set it directly: the number of mappers is determined by the number of input splits, though you can influence it indirectly by changing the split size.

  9. How can you set an arbitrary number of reducers to be created for a job in Hadoop?

You can either do it programmatically, by calling the setNumReduceTasks method of the JobConf class, or set it as a configuration property.
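A minimal sketch using Hadoop's classic JobConf API (the class name MyJob is illustrative, not from the original text); this is a configuration fragment, not a complete job driver.

```java
import org.apache.hadoop.mapred.JobConf;

public class MyJob {
    public static void main(String[] args) {
        JobConf conf = new JobConf(MyJob.class);
        conf.setNumReduceTasks(4); // request four reduce tasks
        // Equivalent classic-API configuration property: mapred.reduce.tasks=4
    }
}
```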

  10. How will you write a custom partitioner for a Hadoop job?

The following steps are needed to write a custom partitioner:

– Create a new class that extends the Partitioner class

– Override the getPartition method

– In the wrapper that runs the MapReduce job, either

– add the custom partitioner to the job programmatically using the setPartitionerClass method, or

– add the custom partitioner to the job as a config file (if your wrapper reads from a config file or Oozie)
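The partition logic itself can be sketched in plain Java, without Hadoop dependencies. This hedged example reproduces the hash-based logic of Hadoop's default HashPartitioner; in a real job the same body would go inside the overridden getPartition method of a class extending the Partitioner class, as described above.

```java
public class PartitionSketch {
    // Hash-partition logic: mask off the sign bit so the result is
    // non-negative, then take the key's hash modulo the number of
    // reduce tasks. Every occurrence of a given key lands on the
    // same reducer, which is what makes the reduce phase correct.
    static int getPartition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 4;
        for (String key : new String[] {"alpha", "beta", "gamma"}) {
            System.out.println(key + " -> reducer " + getPartition(key, numReduceTasks));
        }
    }
}
```

A custom partitioner would replace the hash with job-specific logic, for example routing keys by a prefix or a date field, while keeping the same deterministic key-to-reducer property.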
