Data Mining Interview Questions
- What is data mining?
Ans. It refers to the extraction, or “mining”, of knowledge from large amounts of data.
Data mining is the process of discovering useful knowledge from large amounts of data stored in databases, data warehouses, or other information repositories.
Alternatively, it is the process of analyzing data from different perspectives and summarizing it into useful information – information that can be used to increase revenue, cut costs, or both.
- What is KDD?
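Ans. KDD stands for Knowledge Discovery in Databases. It is the overall process of extracting useful knowledge from data, of which data mining is one step.
- List some of the applications of data mining.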
- Share Market analysis
- Market basket analysis
- Trend analysis
- Weather Forecasting Analysis
- Banking Industry
- Supermarkets such as Big Bazaar, D-Mart, N-mart, Reliance Mall
- Biological data analysis
- Call record analysis
- What is a data warehouse?
Ans. A data warehouse is an electronic store of an organization’s historical data, kept for reporting, analysis, and data mining or knowledge discovery.
- List the various data mining techniques.
- Classification Analysis
- Association Rule Learning
- Anomaly or Outlier Detection
- Clustering Analysis
- Regression Analysis
- Sequential Patterns
- Decision trees
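As one illustration of the techniques listed above, anomaly (outlier) detection can be sketched with a simple z-score rule. This is a minimal sketch, not a definitive method: the sample data, the function name `find_outliers`, and the threshold of 2.0 are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev

def find_outliers(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values are identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical daily sales figures with one suspicious spike.
daily_sales = [102, 98, 101, 97, 100, 99, 103, 250]
outliers = find_outliers(daily_sales)
print(outliers)
```

A single extreme value inflates both the mean and the standard deviation, so on real data a more robust rule (for example one based on the median) may be preferable.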
- What are the various steps used in data mining?
- Business understanding
- Data understanding
- Data Pre-processing
- Data preparation
- What are the steps involved in the KDD process?
- Data Cleaning
- Data Integration
- Data Selection
- Data Transformation
- Data Mining
- Pattern Evaluation
- Knowledge Presentation
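The seven steps above can be sketched as a chain of small functions, one per step. This is a toy illustration only: the two data sources, the function names, and the `min_count` threshold are assumptions, and the "mining" step is just a frequency count.

```python
from collections import Counter

# Hypothetical raw data from two sources; None marks a missing value.
store_a = [{"item": "Milk "}, {"item": None}, {"item": "bread"}]
store_b = [{"item": "milk"}, {"item": "Bread"}, {"item": "eggs"}]

def clean(records):
    """Data cleaning: drop records with a missing item."""
    return [r for r in records if r["item"] is not None]

def integrate(*sources):
    """Data integration: combine several sources into one list."""
    return [r for source in sources for r in source]

def select(records):
    """Data selection: keep only the attribute relevant to the task."""
    return [r["item"] for r in records]

def transform(items):
    """Data transformation: bring text into one consistent form."""
    return [i.strip().lower() for i in items]

def mine(items):
    """Data mining: count how often each item occurs."""
    return Counter(items)

def evaluate(counts, min_count=2):
    """Pattern evaluation: keep only patterns that are frequent enough."""
    return {item: n for item, n in counts.items() if n >= min_count}

def present(patterns):
    """Knowledge presentation: format the result for a reader."""
    return [f"{item}: {n} purchases" for item, n in sorted(patterns.items())]

records = integrate(clean(store_a), clean(store_b))
report = present(evaluate(mine(transform(select(records)))))
print(report)
```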
- Why do we pre-process the data?
Ans. To ensure data quality in terms of accuracy, completeness, consistency, timeliness, believability, and interpretability.
- What are the steps involved in data pre-processing?
Ans. Data cleaning, data integration, data reduction, data transformation.
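The four pre-processing steps can be illustrated on a small numeric example. This is a rough sketch under stated assumptions: the customer records are made up, mean imputation stands in for data cleaning, and min-max normalization stands in for data transformation; other choices are equally valid.

```python
from statistics import mean

# Hypothetical records from two sources, keyed by customer id.
ages = {1: 25, 2: None, 3: 45}  # None marks a missing value
incomes = {1: 30000, 2: 50000, 3: 70000}

# Data cleaning: fill the missing age with the mean of the known ages.
known_ages = [a for a in ages.values() if a is not None]
ages_clean = {k: (a if a is not None else mean(known_ages)) for k, a in ages.items()}

# Data integration: join the two sources on the shared customer id.
merged = {k: {"age": ages_clean[k], "income": incomes[k]} for k in ages_clean}

# Data reduction: keep only the attribute needed for this analysis.
age_only = {k: row["age"] for k, row in merged.items()}

# Data transformation: min-max normalization to the range [0, 1].
lo, hi = min(age_only.values()), max(age_only.values())
normalized = {k: (v - lo) / (hi - lo) for k, v in age_only.items()}
print(normalized)
```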
- What is metadata in data mining?
Ans. Metadata is simply defined as data about data.
- Define a predictive model.
Ans. A predictive model predicts unknown data values by making use of known results from a different set of sample data.
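A simple least-squares line is one way to picture a predictive model: it is fitted on known results and then applied to a value it has not seen. The sketch below is illustrative only; the function `fit_line` and the spend/units figures are assumptions, not data from any real source.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Known results (hypothetical): advertising spend vs. units sold.
spend = [1, 2, 3, 4]
units = [12, 14, 16, 18]

slope, intercept = fit_line(spend, units)

# Predict the outcome for a spend value that was not in the sample.
predicted = slope * 5 + intercept
print(predicted)
```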
- Define a descriptive model.
Ans. It is used to determine the patterns and relationships in sample data.
- Data mining tasks that belong to the predictive model
- Time series analysis
- Data mining tasks that belong to the descriptive model
- Association rules
- Sequence discovery
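Association rules, the first descriptive task above, are usually judged by their support and confidence. The following sketch computes both for the rule "bread implies milk"; the transactions and function names are made up for illustration.

```python
# Hypothetical market baskets, one set of items per transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that
    also contain the consequent."""
    return support(antecedent | consequent) / support(antecedent)

rule_support = support({"bread", "milk"})
rule_confidence = confidence({"bread"}, {"milk"})
print(rule_support, rule_confidence)
```

Here two of the four baskets contain both items, and two of the three baskets with bread also contain milk.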
- Define clustering
Ans. Clustering is the process of grouping physical or conceptual data objects into clusters, so that objects in the same cluster are similar to one another.
- Define cluster analysis
Ans. Cluster analysis examines data objects without consulting a known class label. The class labels are not present in the training data simply because they are not known to begin with.
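To make the idea of grouping without class labels concrete, here is a minimal one-dimensional k-means sketch. k-means is only one of many clustering algorithms, and the points, starting centroids, and iteration count below are assumptions chosen so the result is easy to follow.

```python
def kmeans(points, centroids, iterations=10):
    """Cluster 1-D points around the given starting centroids."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical unlabeled data with two visible groups.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters = kmeans(points, centroids=[1.0, 12.0])
print(centroids, clusters)
```

No label is ever supplied: the two groups emerge purely from the distances between the points.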
- What is CURE?
Ans. CURE stands for Clustering Using Representatives. Many clustering algorithms work well only on spherical clusters of similar size. CURE overcomes this limitation and is more robust with respect to outliers.
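The central idea in CURE is to describe each cluster by several well-scattered points that are then shrunk toward the cluster centroid, rather than by a single center. The sketch below shows only that representative-selection step, not the full hierarchical merging; the function name, the sample cluster, and the values of `count` and `alpha` are assumptions for illustration.

```python
from math import dist

def representatives(cluster, count=2, alpha=0.5):
    """Pick well-scattered points and shrink them toward the centroid."""
    n = len(cluster)
    centroid = tuple(sum(coords) / n for coords in zip(*cluster))
    chosen = []
    for _ in range(min(count, n)):
        if not chosen:
            # Start with the point farthest from the centroid.
            best = max(cluster, key=lambda p: dist(p, centroid))
        else:
            # Then take the point farthest from those already chosen.
            best = max(cluster, key=lambda p: min(dist(p, q) for q in chosen))
        chosen.append(best)
    # Shrinking by alpha dampens the influence of outlying points.
    return [
        tuple(c + alpha * (m - c) for c, m in zip(p, centroid))
        for p in chosen
    ]

# Hypothetical elongated (non-spherical) cluster in two dimensions.
cluster = [(0, 0), (1, 0), (2, 0), (3, 0), (9, 0)]
reps = representatives(cluster)
print(reps)
```

Because several representatives are kept per cluster, elongated shapes such as this one can be followed more closely than with a single centroid.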
- What is OLAP?
Ans. Online Analytical Processing (OLAP) refers to technology that allows users of multidimensional databases to generate online descriptive or comparative summaries (“views”) of data and to run other analytic queries.
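A summary "view" of multidimensional data can be pictured as a roll-up: sales facts are aggregated along whichever dimensions the user asks for. The sketch below uses plain Python rather than a real OLAP engine; the fact table, the dimension names, and the `roll_up` function are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical fact table: (region, product, quarter, sales amount).
facts = [
    ("North", "Tea", "Q1", 100),
    ("North", "Coffee", "Q1", 150),
    ("South", "Tea", "Q1", 80),
    ("South", "Tea", "Q2", 120),
]
DIMENSIONS = ("region", "product", "quarter")

def roll_up(facts, *dims):
    """Sum the sales amount grouped by the requested dimensions."""
    positions = [DIMENSIONS.index(d) for d in dims]
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[i] for i in positions)
        totals[key] += row[3]  # the last column holds the measure
    return dict(totals)

by_region = roll_up(facts, "region")
by_product_quarter = roll_up(facts, "product", "quarter")
print(by_region)
print(by_product_quarter)
```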
- What is OLTP?
Ans. Online Transaction Processing (OLTP) refers to an online operational database system that is used for efficient storage, retrieval, and management of large amounts of data.
- How is a database design represented in OLTP systems?
Ans. By an entity-relationship (ER) model.
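As a rough illustration of both answers, the sketch below maps two entities and one relationship onto tables and then runs a typical OLTP operation as a single transaction. It uses Python's built-in `sqlite3` module with an in-memory database; the `customer` and `account` tables, the sample row values, and the `transfer` function are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Two entities; the foreign key models the "customer owns account" relationship.
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customer(id),"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO customer VALUES (1, 'Asha')")
conn.execute("INSERT INTO account VALUES (10, 1, 100), (11, 1, 50)")
conn.commit()

def transfer(conn, source, target, amount):
    """Move money between accounts as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint rejected an overdraft

ok = transfer(conn, 10, 11, 30)
rejected = transfer(conn, 10, 11, 500)
balances = dict(conn.execute("SELECT id, balance FROM account"))
print(ok, rejected, balances)
```

The second transfer fails the balance check, so the credit that had already been applied to the target account is rolled back along with it.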