What is BIG DATA?
Big Data refers to data sets so large and complex that they are difficult to capture, store, process, retrieve, and analyze with on-hand traditional database management tools.
What are the three major characteristics of Big Data?
According to IBM, the three characteristics of Big Data are:
Volume: the sheer scale of data; for example, Facebook generates 500+ terabytes of data per day.
Velocity: the speed at which data arrives and must be processed; for example, analyzing 2 million records each day to identify the reason for losses.
Variety: the many forms data takes, such as images, audio, video, sensor data, and log files.
What is Hadoop?
Hadoop is a framework that allows distributed processing of large data sets across clusters of commodity hardware (ordinary computers) using a simple programming model.
What is the basic difference between traditional RDBMS and Hadoop?
A traditional RDBMS is used by transactional systems to store and process data, whereas Hadoop is used to store and process large amounts of data in a distributed file system.
What are the basic components of Hadoop?
HDFS and MapReduce are the basic components of Hadoop.
HDFS is used to store large data sets, and MapReduce is used to process those large data sets.
What is HDFS?
HDFS stands for Hadoop Distributed File System. It is designed for storing very large files with streaming data access patterns, running on clusters of commodity hardware.
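One way HDFS handles very large files is by splitting them into fixed-size blocks (128 MB by default in recent Hadoop versions) and replicating those blocks across the cluster. The following plain-Java sketch illustrates only the splitting idea; it is not the HDFS API, and the tiny 8-byte block size is an assumption chosen purely for demonstration.

```java
import java.util.*;

public class BlockSplitSketch {
    // Illustrative block size only; the real HDFS default is 128 MB.
    static final int BLOCK_SIZE = 8;

    // Split a file's bytes into fixed-size blocks, as HDFS does
    // conceptually before distributing them across DataNodes.
    static List<byte[]> splitIntoBlocks(byte[] file) {
        List<byte[]> blocks = new ArrayList<>();
        for (int off = 0; off < file.length; off += BLOCK_SIZE) {
            int len = Math.min(BLOCK_SIZE, file.length - off);
            blocks.add(Arrays.copyOfRange(file, off, off + len));
        }
        return blocks;
    }

    public static void main(String[] args) {
        byte[] file = "a very large file on HDFS".getBytes(); // 25 bytes
        List<byte[]> blocks = splitIntoBlocks(file);
        System.out.println(blocks.size() + " blocks"); // 25 bytes / 8 -> 4 blocks
    }
}
```

Because each block is stored (and replicated) independently, a file far larger than any single disk can still be stored, and different blocks can be read in parallel by different tasks.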
What is MapReduce?
MapReduce is a Java-based programming paradigm of the Hadoop framework that provides scalability across the nodes of a Hadoop cluster.
How does MapReduce work in Hadoop?
MapReduce divides the workload into two jobs, namely 1. a Map job and 2. a Reduce job, whose tasks run in parallel across the cluster.
1. The Map job breaks the input data set down into key-value pairs (tuples).
2. The Reduce job then takes the output of the Map job and combines those tuples into a smaller set of tuples.
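The two steps above can be sketched in plain Java with the classic word-count example. This is a minimal, self-contained simulation of the Map and Reduce phases (plus the shuffle-and-group step the framework performs between them), not actual Hadoop API code; the class and method names are assumptions made for illustration.

```java
import java.util.*;
import java.util.stream.*;

public class WordCountSketch {
    // Map phase: break each input record into (word, 1) key-value pairs.
    static List<Map.Entry<String, Integer>> map(String record) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : record.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle: group all values by key, as the framework does
    // between the Map and Reduce phases.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce phase: combine each key's values into a smaller set of tuples.
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> counts = new TreeMap<>();
        grouped.forEach((word, ones) ->
            counts.put(word, ones.stream().mapToInt(Integer::intValue).sum()));
        return counts;
    }

    public static void main(String[] args) {
        List<String> records = List.of("big data big hadoop", "hadoop big");
        List<Map.Entry<String, Integer>> mapped = records.stream()
            .flatMap(r -> map(r).stream())
            .collect(Collectors.toList());
        System.out.println(reduce(shuffle(mapped)));
        // {big=3, data=1, hadoop=2}
    }
}
```

In real Hadoop, the map calls run in parallel on the nodes holding each data block, and the framework handles the shuffle across the network; the structure of the computation, however, is exactly this map, group, reduce sequence.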