Map Reduce Interview Questions and Answers
- What is MapReduce?
MapReduce is a Java-based programming model in the Hadoop framework for
processing large data sets in parallel, providing scalability across the nodes
of a Hadoop cluster.
- How does MapReduce work in Hadoop?
MapReduce divides the workload into two kinds of tasks, namely
1. Map tasks and 2. Reduce tasks, which can run in parallel.
The map tasks break the input data set down into intermediate key-value pairs (tuples).
The reduce tasks then take the output of the map tasks and combine those tuples
into a smaller set of tuples, producing the final result.
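The flow above can be sketched in plain Java as an in-memory word-count simulation (this mimics the model only, not the Hadoop API; class and method names here are illustrative):

```java
import java.util.*;
import java.util.stream.*;

// In-memory sketch of the MapReduce flow: the "map" phase emits
// (word, 1) key-value pairs, grouping by key stands in for the shuffle,
// and the "reduce" phase sums each group's values.
public class WordCountSim {

    // Map phase: break one input line into (word, 1) key-value pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle + reduce: group the intermediate pairs by key,
    // then sum the values for each key.
    static Map<String, Integer> run(List<String> lines) {
        return lines.stream()
                .flatMap(line -> map(line).stream())
                .collect(Collectors.groupingBy(
                        Map.Entry::getKey,
                        Collectors.summingInt(Map.Entry::getValue)));
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = run(List.of("the quick fox", "the lazy dog"));
        System.out.println(counts.get("the")); // prints 2
    }
}
```

In real Hadoop the map and reduce phases run as separate tasks on different nodes, and the shuffle moves the intermediate pairs across the network between them.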
- What is a ‘key-value pair’ in MapReduce?
A key-value pair is the intermediate data generated by the mappers and sent to
the reducers, which aggregate it to generate the final output.
- What is the difference between the MapReduce engine and the HDFS cluster?
The HDFS cluster is the whole configuration of master and slave nodes where the
data is stored. The MapReduce engine is the programming module used to retrieve
and analyze that data.
- Is a map like a pointer?
No, a map is not like a pointer; it is a processing task that transforms the
records of its input split into key-value pairs.
- Why is the number of splits equal to the number of maps?
One map task is launched per input split so that the records of every split are
converted into key-value pairs; the number of maps therefore equals the number
of input splits.
- Is a job split into maps?
No, a job is not split into maps. Splits are created for the input file, which
is stored on the datanodes in blocks, and one map task is launched for each split.
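As a worked example of the split-to-map relationship (the file and block sizes here are hypothetical, and this assumes the common default where split size equals the HDFS block size):

```java
// Hypothetical numbers: a 1 GiB input file stored in 128 MiB HDFS blocks.
public class SplitCount {
    // With the default split size equal to the block size, one input split
    // (and therefore one map task) is created per block, so the split count
    // is the file size divided by the block size, rounded up.
    static long numSplits(long fileSizeBytes, long blockSizeBytes) {
        return (fileSizeBytes + blockSizeBytes - 1) / blockSizeBytes;
    }

    public static void main(String[] args) {
        long oneGiB = 1024L * 1024 * 1024;
        long block  = 128L * 1024 * 1024;
        System.out.println(numSplits(oneGiB, block)); // prints 8 -> 8 map tasks
    }
}
```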
- How can you set an arbitrary number of mappers to be created for a job in Hadoop?
This is a trick question. You cannot set it directly; the number of mappers is
determined by the number of input splits.
- How can you set an arbitrary number of reducers to be created for a job in Hadoop?
You can either do it programmatically, using the setNumReduceTasks method of the
JobConf class, or set it as a configuration setting.
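A minimal driver-side sketch of the programmatic option, assuming the classic org.apache.hadoop.mapred API (MyDriver is a hypothetical driver class, and Hadoop must be on the classpath; this fragment is not compiled here):

```java
// Sketch only: requires the Hadoop jars; MyDriver is a hypothetical driver class.
JobConf conf = new JobConf(MyDriver.class);
conf.setNumReduceTasks(4); // request four reduce tasks for this job
```

The same effect can be achieved through configuration; the exact property name varies by Hadoop version (mapred.reduce.tasks in the older API).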
- How will you write a custom partitioner for a Hadoop job?
The following steps are needed to write a custom partitioner:
– Create a new class that extends the Partitioner class
– Override the getPartition method
– In the wrapper that runs the MapReduce job, either
– add the custom partitioner to the job programmatically, or
– add the custom partitioner to the job as a config file (if your wrapper reads
from a config file or Oozie)
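The partitioning logic itself can be sketched as follows. This is a self-contained stand-in for Hadoop's getPartition method, not the real API (a real implementation would extend org.apache.hadoop.mapreduce.Partitioner and require the Hadoop jars):

```java
import java.util.*;

// Stand-in for Hadoop's Partitioner#getPartition: route each key
// to a reducer index in the range [0, numReduceTasks).
public class CustomPartitioner {
    // Masking with Integer.MAX_VALUE keeps the result non-negative
    // even when hashCode() returns a negative value.
    static int getPartition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int reducers = 3;
        for (String key : List.of("apple", "banana", "cherry")) {
            System.out.println(key + " -> reducer " + getPartition(key, reducers));
        }
    }
}
```

The key property a partitioner must guarantee is that the same key always maps to the same reducer, so that all values for a key are reduced together.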