In computer science, “data mining” is also referred to as knowledge mining from data, knowledge extraction, data/pattern analysis, data archaeology, and data dredging. The term is often used as a synonym for Knowledge Discovery in Databases (KDD), which refers to the nontrivial extraction of implicit, previously unknown, and potentially useful information from data stored in databases. Strictly speaking, data mining is one step of the KDD process.
Data mining is needed to extract useful information from large datasets and to use it for predictions or better decision-making. Today it is used almost anywhere large amounts of data are stored and processed.
Examples include the banking sector, market basket analysis, and network intrusion detection.
KDD (Knowledge Discovery in Databases) is a process for extracting useful, previously unknown, and potentially valuable information from large datasets. The process is iterative: it usually takes several passes through the steps below to extract accurate knowledge from the data. The KDD process includes the following steps:
Data cleaning is the removal of noisy and irrelevant data from the collection.
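As a rough illustration of data cleaning, the sketch below drops records that are incomplete, implausible, or repeated. The function name `clean_records`, the field names, and the 0 to 120 age range are assumptions made for this example; treating duplicates and missing values as "noisy or irrelevant" is also an assumption, since what counts as noise depends on the dataset.

```python
def clean_records(records, required_fields, range_field, valid_range):
    """Drop incomplete, out-of-range, and duplicate records (hypothetical helper).

    range_field is assumed to be one of required_fields.
    """
    low, high = valid_range
    seen = set()
    cleaned = []
    for rec in records:
        # Incomplete: a required field is missing or empty.
        if any(rec.get(field) in (None, "") for field in required_fields):
            continue
        # Noisy: the value lies outside the plausible range.
        if not low <= rec[range_field] <= high:
            continue
        # Duplicate: the same combination of required fields was already kept.
        key = tuple(rec[field] for field in required_fields)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(rec)
    return cleaned


raw = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},  # missing value
    {"id": 3, "age": 420},   # implausible value
    {"id": 1, "age": 34},    # duplicate of the first record
    {"id": 4, "age": 51},
]
cleaned = clean_records(raw, ["id", "age"], "age", (0, 120))
print(cleaned)
```

Only the records with `id` 1 and 4 survive. In practice, missing values are often imputed rather than dropped; dropping is simply the shortest rule to show.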
Data integration combines heterogeneous data from multiple sources into a common store (a data warehouse). It is carried out using data migration tools, data synchronization tools, and the ETL (Extract, Transform, Load) process.
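The idea of combining heterogeneous sources can be sketched in a few lines. Here two hypothetical sources (`crm` and `billing`) use different key names and units, and `integrate` maps both onto one common schema. All names and the in-memory dictionary standing in for a warehouse are assumptions for illustration; a real pipeline would rely on ETL or migration tooling.

```python
def integrate(crm_rows, billing_rows):
    """Merge two differently shaped sources into one store keyed by customer id."""
    warehouse = {}
    for row in crm_rows:
        # Source A names the key "cust_id" and the name "full_name".
        warehouse[row["cust_id"]] = {
            "customer_id": row["cust_id"],
            "name": row["full_name"],
            "total_spent": 0.0,
        }
    for row in billing_rows:
        # Source B names the key "customer" and stores amounts in cents.
        record = warehouse.setdefault(
            row["customer"],
            {"customer_id": row["customer"], "name": None, "total_spent": 0.0},
        )
        record["total_spent"] += row["amount_cents"] / 100  # convert to currency units
    return warehouse


crm = [{"cust_id": 1, "full_name": "Ada"}, {"cust_id": 2, "full_name": "Lin"}]
billing = [
    {"customer": 1, "amount_cents": 1250},
    {"customer": 1, "amount_cents": 750},
    {"customer": 3, "amount_cents": 500},
]
warehouse = integrate(crm, billing)
for customer_id in sorted(warehouse):
    print(warehouse[customer_id])
```

Customer 3 appears only in the billing source, so the sketch keeps the record with an unknown name instead of discarding it. Whether to keep or drop such unmatched rows is a design decision the integration step has to make explicitly.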
Data selection is the process of deciding which data is relevant to the analysis and retrieving it from the data collection. Methods such as neural networks, decision trees, naive Bayes, clustering, and regression can be used for this.
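In its simplest form, selection means keeping only the rows and attributes that matter for the analysis at hand. The sketch below does this with a plain filter; it does not use the model-based methods named above, and `select_data`, the field names, and the "EU customers only" criterion are assumptions chosen for the example.

```python
def select_data(records, attributes, is_relevant):
    """Keep relevant records and, within them, only the listed attributes."""
    return [
        {name: rec[name] for name in attributes}
        for rec in records
        if is_relevant(rec)
    ]


customers = [
    {"id": 1, "region": "EU", "age": 34, "favorite_color": "red", "spend": 120.0},
    {"id": 2, "region": "US", "age": 45, "favorite_color": "blue", "spend": 80.0},
    {"id": 3, "region": "EU", "age": 29, "favorite_color": "green", "spend": 200.0},
]

# Hypothetical analysis goal: spending by age for EU customers.
selected = select_data(customers, ["age", "spend"], lambda rec: rec["region"] == "EU")
print(selected)
```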
Data transformation is the process of converting data into the form required by the mining procedure. Data transformation is a two-step process:
Data mining is the application of techniques to extract potentially useful patterns. It transforms task-relevant data into patterns and decides the purpose of the model using classification or characterization.
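To make "extracting patterns" concrete, here is a minimal sketch in the spirit of market basket analysis: it counts how often each pair of items appears together and keeps the pairs whose support reaches a threshold. This is brute-force pair counting, not a full algorithm such as Apriori, and the basket contents, the function name `frequent_pairs`, and the 0.5 threshold are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return item pairs whose support (share of transactions) meets min_support."""
    total = len(transactions)
    counts = Counter()
    for items in transactions:
        # Sort and deduplicate so each pair has one canonical form per transaction.
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return {
        pair: count / total
        for pair, count in counts.items()
        if count / total >= min_support
    }


baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "milk", "eggs"],
    ["butter", "eggs"],
]
patterns = frequent_pairs(baskets, min_support=0.5)
print(patterns)
```

With these four baskets, only the pair (bread, milk) occurs in at least half of the transactions (3 of 4), so it is the single pattern returned.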
Pattern evaluation identifies the truly interesting patterns that represent knowledge, based on given measures. It finds an interestingness score for each pattern and uses summarization and visualization to make the data understandable to the user.
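The text does not say which measures are used, so the sketch below assumes three that are common for association rules: support, confidence, and lift. It scores a single rule of the form "antecedent implies consequent" over a small set of transactions; `rule_metrics` and the basket data are hypothetical.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Compute support, confidence, and lift for the rule antecedent -> consequent."""
    total = len(transactions)
    item_sets = [set(t) for t in transactions]
    lhs, rhs = set(antecedent), set(consequent)

    count_lhs = sum(1 for items in item_sets if lhs <= items)
    count_rhs = sum(1 for items in item_sets if rhs <= items)
    count_both = sum(1 for items in item_sets if (lhs | rhs) <= items)

    support = count_both / total                              # P(lhs and rhs)
    confidence = count_both / count_lhs if count_lhs else 0.0  # P(rhs | lhs)
    # Lift compares confidence with the baseline frequency of the consequent.
    lift = confidence / (count_rhs / total) if count_rhs else 0.0
    return {"support": support, "confidence": confidence, "lift": lift}


baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "milk", "eggs"],
    ["butter", "eggs"],
]
metrics = rule_metrics(baskets, ["bread"], ["milk"])
print(metrics)
```

For the rule bread -> milk, support is 0.75 and confidence is 1.0. The lift of about 1.33 is above 1, which suggests the two items occur together more often than independence would predict. A rule with lift close to 1 would usually be considered uninteresting even if its confidence is high.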
Knowledge representation involves presenting the results in a way that is meaningful and can be used to make decisions.
Note: KDD is an iterative process in which evaluation measures can be enhanced, mining can be refined, and new data can be integrated and transformed to get different and more appropriate results. Preprocessing of databases consists of data cleaning and data integration.
| Parameter | KDD | Data Mining |
|---|---|---|
| Definition | KDD refers to a process of identifying valid, novel, potentially useful, and ultimately understandable patterns and relationships in data. | Data Mining refers to a process of extracting useful and valuable information or patterns from large data sets. |
| Objective | To find useful knowledge from data. | To extract useful information from data. |
| Techniques used | Data cleaning, data integration, data selection, data transformation, data mining, pattern evaluation, and knowledge representation and visualization. | Association rules, classification, clustering, regression, decision trees, neural networks, and dimensionality reduction. |
| Output | Structured information, such as rules and models, that can be used to make decisions or predictions. | Patterns, associations, or insights that can be used to improve decision-making or understanding. |
| Focus | Focus is on the discovery of useful knowledge, rather than simply finding patterns in data. | Focus is on the discovery of patterns or relationships in data. |
| Role of domain expertise | Domain expertise is important in KDD, as it helps in defining the goals of the process, choosing appropriate data, and interpreting the results. | Domain expertise is less critical in data mining, as the algorithms are designed to identify patterns without relying on prior knowledge. |