Data Mining: Discovering Hidden Value in Your Data Warehouse

Data mining is the process of discovering hidden patterns and predictive information from large databases. It allows companies to identify unknown relationships in their data to predict future trends and customer behavior, helping to make better business decisions. Key technologies now enabling effective data mining are massive data collection capabilities, powerful computers, and advanced algorithms. Common techniques used in data mining include neural networks, decision trees, genetic algorithms, and nearest neighbor methods. These techniques can automatically discover patterns and predict future outcomes and trends based on historical data.

Uploaded by

jeebala

Data Mining

Discovering hidden value in your data warehouse

Overview

Data mining, the extraction of hidden predictive information from large databases, is a powerful
new technology with great potential to help companies focus on the most important information
in their data warehouses. Data mining tools predict future trends and behaviors, allowing
businesses to make proactive, knowledge-driven decisions. The automated, prospective analyses
offered by data mining move beyond the analyses of past events provided by retrospective tools
typical of decision support systems. Data mining tools can answer business questions that
traditionally were too time-consuming to resolve. They scour databases for hidden patterns,
finding predictive information that experts may miss because it lies outside their expectations.

The Foundations of Data Mining

Data mining techniques are the result of a long process of research and product development.
This evolution began when business data was first stored on computers, continued with
improvements in data access, and more recently, generated technologies that allow users to
navigate through their data in real time. Data mining takes this evolutionary process beyond
retrospective data access and navigation to prospective and proactive information delivery. Data
mining is ready for application in the business community because it is supported by three
technologies that are now sufficiently mature:

• Massive data collection
• Powerful multiprocessor computers
• Data mining algorithms

Commercial databases are growing at unprecedented rates. A recent META Group survey of data
warehouse projects found that 19% of respondents are beyond the 50 gigabyte level, while 59%
expect to be there by the second quarter of 1996 [1]. In some industries, such as retail, these numbers
can be much larger. The accompanying need for improved computational engines can now be
met in a cost-effective manner with parallel multiprocessor computer technology. Data mining
algorithms embody techniques that have existed for at least 10 years, but have only recently been
implemented as mature, reliable, understandable tools that consistently outperform older
statistical methods.

In the evolution from business data to business information, each new step has built upon the
previous one. For example, dynamic data access is critical for drill-through in data navigation
applications, and the ability to store large databases is critical to data mining. From the user’s
point of view, the four steps listed in Table 1 were revolutionary because they allowed new
business questions to be answered accurately and quickly.

Evolutionary Step | Business Question | Enabling Technologies | Product Providers | Characteristics
--- | --- | --- | --- | ---
Data Collection (1960s) | "What was my total revenue in the last five years?" | Computers, tapes, disks | IBM, CDC | Retrospective, static data delivery
Data Access (1980s) | "What were unit sales in New England last March?" | Relational databases (RDBMS), Structured Query Language (SQL), ODBC | Oracle, Sybase, Informix, IBM, Microsoft | Retrospective, dynamic data delivery at record level
Data Warehousing & Decision Support (1990s) | "What were unit sales in New England last March? Drill down to Boston." | On-line analytic processing (OLAP), multidimensional databases, data warehouses | Pilot, Comshare, Arbor, Cognos, Microstrategy | Retrospective, dynamic data delivery at multiple levels
Data Mining (Emerging Today) | "What’s likely to happen to Boston unit sales next month? Why?" | Advanced algorithms, multiprocessor computers, massive databases | Pilot, Lockheed, IBM, SGI, numerous startups (nascent industry) | Prospective, proactive information delivery

Table 1. Steps in the Evolution of Data Mining.

The core components of data mining technology have been under development for decades, in
research areas such as statistics, artificial intelligence, and machine learning. Today, the maturity
of these techniques, coupled with high-performance relational database engines and broad data
integration efforts, makes these technologies practical for current data warehouse environments.

The Scope of Data Mining

Data mining derives its name from the similarities between searching for valuable business
information in a large database — for example, finding linked products in gigabytes of store
scanner data — and mining a mountain for a vein of valuable ore. Both processes require either
sifting through an immense amount of material, or intelligently probing it to find exactly where
the value resides. Given databases of sufficient size and quality, data mining technology can
generate new business opportunities by providing these capabilities:

• Automated prediction of trends and behaviors. Data mining automates the process of
finding predictive information in large databases. Questions that traditionally required
extensive hands-on analysis can now be answered directly from the data, and quickly. A
typical example of a predictive problem is targeted marketing. Data mining uses data on
past promotional mailings to identify the targets most likely to maximize return on
investment in future mailings. Other predictive problems include forecasting bankruptcy
and other forms of default, and identifying segments of a population likely to respond
similarly to given events.

• Automated discovery of previously unknown patterns. Data mining tools sweep
through databases and identify previously hidden patterns in one step. An example of
pattern discovery is the analysis of retail sales data to identify seemingly unrelated
products that are often purchased together. Other pattern discovery problems include
detecting fraudulent credit card transactions and identifying anomalous data that could
represent data entry keying errors.
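
The retail example above, finding products that are often purchased together, can be sketched in a few lines of Python. This is only an illustration and not the method of any particular data mining product: the basket contents are made up, and `pair_support` is a hypothetical helper name. It counts how often each pair of items appears in the same basket and reports that count as a fraction of all baskets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical scanner data: each inner list is one shopping basket.
baskets = [
    ["bread", "milk", "diapers"],
    ["beer", "diapers", "chips"],
    ["bread", "milk"],
    ["beer", "diapers", "milk"],
    ["bread", "beer", "diapers"],
]

def pair_support(baskets):
    """Return {(item_a, item_b): fraction of baskets containing both items}."""
    counts = Counter()
    for basket in baskets:
        # sorted(set(...)) drops duplicates and gives every pair one canonical order.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    total = len(baskets)
    return {pair: n / total for pair, n in counts.items()}

support = pair_support(baskets)
# The pair that co-occurs in the largest share of baskets.
top_pair, top_support = max(support.items(), key=lambda kv: kv[1])
print(top_pair, top_support)
```

Counting every pair is only practical for small item catalogs. On gigabytes of scanner data, a real tool would need to prune rare items before forming pairs.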

Data mining techniques can yield the benefits of automation on existing software and hardware
platforms, and can be implemented on new systems as existing platforms are upgraded and new
products developed. When data mining tools are implemented on high performance parallel
processing systems, they can analyze massive databases in minutes. Faster processing means that
users can automatically experiment with more models to understand complex data. High speed
makes it practical for users to analyze huge quantities of data. Larger databases, in turn, yield
improved predictions.

The most commonly used techniques in data mining are:

• Artificial neural networks: Non-linear predictive models that learn through training and
resemble biological neural networks in structure.

• Decision trees: Tree-shaped structures that represent sets of decisions. These decisions
generate rules for the classification of a dataset. Specific decision tree methods include
Classification and Regression Trees (CART) and Chi Square Automatic Interaction
Detection (CHAID).

• Genetic algorithms: Optimization techniques that use processes such as genetic
combination, mutation, and natural selection in a design based on the concepts of
evolution.

• Nearest neighbor method: A technique that classifies each record in a dataset based on a
combination of the classes of the k record(s) most similar to it in a historical dataset
(where k ≥ 1). Sometimes called the k-nearest neighbor technique.

• Rule induction: The extraction of useful if-then rules from data based on statistical
significance.
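
To make one of these techniques concrete, here is a minimal sketch of the nearest neighbor method in Python. It makes several assumptions that the text does not state: similarity is measured by Euclidean distance, the "combination of the classes" is a simple majority vote, and the small historical dataset of mailing responses is invented for illustration. The name `knn_classify` is hypothetical.

```python
import math
from collections import Counter

def knn_classify(record, history, k=3):
    """Classify `record` by majority vote among its k most similar historical records."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Rank historical records by Euclidean distance to the new record (an assumption;
    # other similarity measures are equally possible).
    by_distance = sorted(history, key=lambda item: math.dist(record, item[0]))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical history: (income in $1000s, years as customer) -> responded to a mailing?
history = [
    ((25, 1), "no"),
    ((30, 2), "no"),
    ((28, 1), "no"),
    ((60, 8), "yes"),
    ((65, 10), "yes"),
    ((70, 7), "yes"),
]

print(knn_classify((62, 9), history, k=3))
print(knn_classify((27, 2), history, k=1))
```

In practice the variables would usually be rescaled first so that one with a large numeric range, such as income here, does not dominate the distance. An odd k avoids tied votes when there are two classes.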

Glossary of Data Mining Terms

analytical model: A structure and process for analyzing a dataset. For example, a decision tree
is a model for the classification of a dataset.

anomalous data: Data that result from errors (for example, data entry keying errors) or that
represent unusual events. Anomalous data should be examined carefully because it may carry
important information.

artificial neural networks: Non-linear predictive models that learn through training and
resemble biological neural networks in structure.

CART: Classification and Regression Trees. A decision tree technique used for classification
of a dataset. Provides a set of rules that you can apply to a new (unclassified) dataset to
predict which records will have a given outcome. Segments a dataset by creating 2-way splits.
Requires less data preparation than CHAID.

CHAID: Chi Square Automatic Interaction Detection. A decision tree technique used for
classification of a dataset. Provides a set of rules that you can apply to a new (unclassified)
dataset to predict which records will have a given outcome. Segments a dataset by using chi
square tests to create multi-way splits. Preceded, and requires more data preparation than,
CART.

classification: The process of dividing a dataset into mutually exclusive groups such that the
members of each group are as "close" as possible to one another, and different groups are as
"far" as possible from one another, where distance is measured with respect to specific
variable(s) you are trying to predict. For example, a typical classification problem is to
divide a database of companies into groups that are as homogeneous as possible with respect to
a creditworthiness variable with values "Good" and "Bad."

clustering: The process of dividing a dataset into mutually exclusive groups such that the
members of each group are as "close" as possible to one another, and different groups are as
"far" as possible from one another, where distance is measured with respect to all available
variables.

data cleansing: The process of ensuring that all values in a dataset are consistent and
correctly recorded.

data mining: The extraction of hidden predictive information from large databases.

data navigation: The process of viewing different dimensions, slices, and levels of detail of a
multidimensional database. See OLAP.

data visualization: The visual interpretation of complex relationships in multidimensional data.

data warehouse: A system for storing and delivering massive quantities of data.

decision tree: A tree-shaped structure that represents a set of decisions. These decisions
generate rules for the classification of a dataset. See CART and CHAID.

dimension: In a flat or relational database, each field in a record represents a dimension. In
a multidimensional database, a dimension is a set of similar entities; for example, a
multidimensional sales database might include the dimensions Product, Time, and City.

exploratory data analysis: The use of graphical and descriptive statistical techniques to learn
about the structure of a dataset.

genetic algorithms: Optimization techniques that use processes such as genetic combination,
mutation, and natural selection in a design based on the concepts of natural evolution.

linear model: An analytical model that assumes linear relationships in the coefficients of the
variables being studied.

linear regression: A statistical technique used to find the best-fitting linear relationship
between a target (dependent) variable and its predictors (independent variables).

logistic regression: A linear regression that predicts the proportions of a categorical target
variable, such as type of customer, in a population.

multidimensional database: A database designed for on-line analytical processing. Structured as
a multidimensional hypercube with one axis per dimension.

multiprocessor computer: A computer that includes multiple processors connected by a network.
See parallel processing.

nearest neighbor: A technique that classifies each record in a dataset based on a combination
of the classes of the k record(s) most similar to it in a historical dataset (where k ≥ 1).
Sometimes called a k-nearest neighbor technique.

non-linear model: An analytical model that does not assume linear relationships in the
coefficients of the variables being studied.

OLAP: On-line analytical processing. Refers to array-oriented database applications that allow
users to view, navigate through, manipulate, and analyze multidimensional databases.

outlier: A data item whose value falls outside the bounds enclosing most of the other
corresponding values in the sample. May indicate anomalous data. Should be examined carefully;
may carry important information.

parallel processing: The coordinated use of multiple processors to perform computational tasks.
Parallel processing can occur on a multiprocessor computer or on a network of workstations or
PCs.

predictive model: A structure and process for predicting the values of specified variables in a
dataset.

prospective data analysis: Data analysis that predicts future trends, behaviors, or events based
on historical data.

RAID: Redundant Array of Inexpensive Disks. A technology for the efficient parallel storage of
data for high-performance computer systems.

retrospective data analysis: Data analysis that provides insights into trends, behaviors, or
events that have already occurred.

rule induction: The extraction of useful if-then rules from data based on statistical
significance.

SMP: Symmetric multiprocessor. A type of multiprocessor computer in which memory is shared
among the processors.

terabyte: One trillion bytes.

time series analysis: The analysis of a sequence of measurements made at specified time
intervals. Time is usually the dominating dimension of the data.
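
As an illustration of the 2-way splits mentioned in the CART entry, the following Python sketch searches for the single threshold on one numeric variable that best separates "Good" from "Bad" records. It assumes Gini impurity as the splitting criterion, a common choice for CART-style trees that the glossary itself does not specify. The data and the function names `gini` and `best_two_way_split` are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the group is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_two_way_split(values, labels):
    """Find the threshold on one numeric variable that minimizes weighted Gini impurity.

    Returns (threshold, impurity); records with value <= threshold form the left group.
    """
    n = len(values)
    best = (None, float("inf"))
    # Candidate thresholds are midpoints between consecutive distinct values.
    distinct = sorted(set(values))
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best[1]:
            best = (threshold, impurity)
    return best

# Hypothetical data: debt-to-income ratio (%) and a creditworthiness label.
debt_ratio = [10, 15, 20, 35, 40, 50]
credit = ["Good", "Good", "Good", "Bad", "Bad", "Bad"]

threshold, impurity = best_two_way_split(debt_ratio, credit)
print(threshold, impurity)
```

A full tree would apply the same search recursively to each resulting group and across all available variables. This sketch shows only one split on one variable.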
