Unit #5 - Data Warehouse and Data Mining & Storage Structure
M. S. Memon, CSE Dept., QUEST Nawabshah
May 20, 2023
Data Warehouse and Data Mining
Prof. Dr. M. S. Memon
sulleman@quest.edu.pk
03337037187
Data Mart
• A data mart is a special-purpose subset of enterprise data for a particular function or application (it may contain detailed data, summary data, or both).
• Data mart types:
– Independent: created directly from operational systems into a separate physical data store
– Logical: exists as a subset of an existing data warehouse
– Dependent: created from the data warehouse into a separate physical data store
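The difference between a logical and a dependent data mart can be illustrated with a short sketch using Python's built-in sqlite3 module. The table and view names (dw_sales, mart_north_logical, mart_north_dependent) and the sample rows are hypothetical. A view stands in for the logical mart and a copied summary table stands in for the dependent mart. In a real deployment the dependent mart would live in its own physical data store, not in the same database.

```python
import sqlite3

# In-memory database standing in for the enterprise data warehouse (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dw_sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO dw_sales VALUES (?, ?, ?)",
    [("North", "Tea", 120.0), ("South", "Tea", 80.0), ("North", "Rice", 200.0)],
)

# Logical data mart: a view over the existing warehouse; no data is copied.
conn.execute(
    "CREATE VIEW mart_north_logical AS "
    "SELECT product, amount FROM dw_sales WHERE region = 'North'"
)

# Dependent data mart: data extracted from the warehouse into its own table
# (a separate physical store in practice), here already summarized.
conn.execute(
    "CREATE TABLE mart_north_dependent AS "
    "SELECT product, SUM(amount) AS total FROM dw_sales "
    "WHERE region = 'North' GROUP BY product"
)

# An independent data mart would instead be loaded directly from the
# operational systems, bypassing dw_sales entirely (not shown here).

# A later warehouse load is visible through the view but not in the copied table.
conn.execute("INSERT INTO dw_sales VALUES ('North', 'Tea', 30.0)")

logical_total = conn.execute("SELECT SUM(amount) FROM mart_north_logical").fetchone()[0]
dependent_total = conn.execute("SELECT SUM(total) FROM mart_north_dependent").fetchone()[0]
print(logical_total, dependent_total)  # 350.0 320.0
```

The view always reflects the current warehouse contents. The dependent copy only changes when it is refreshed, which is the price paid for storing it separately.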
Data Marts
[Figure: operational systems, an independent data mart, the data warehouse, and a dependent data mart, arranged along a time axis]
Multi-dimensional Data
• Measures: the numerical data being tracked
• Dimensions: the business parameters that define a transaction
• Example: an analyst may want to view sales data (measure) by geography, by time, and by product (dimensions)
• Dimensional modeling is a technique for structuring data around business concepts
• ER models describe “entities” and “relationships”
• Dimensional models describe “measures” and “dimensions”
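A minimal sketch of viewing one measure by different dimensions follows. It assumes hypothetical in-memory sales records, and `total_by` is an illustrative helper, not a library function. Each call groups the same measure (amount) by a different choice of dimensions.

```python
from collections import defaultdict

# Hypothetical sales transactions: three dimensions (region, year, product)
# and one numerical measure (amount).
sales = [
    {"region": "North", "year": 2022, "product": "Tea", "amount": 100.0},
    {"region": "North", "year": 2023, "product": "Tea", "amount": 150.0},
    {"region": "South", "year": 2023, "product": "Rice", "amount": 200.0},
    {"region": "South", "year": 2023, "product": "Tea", "amount": 60.0},
]

def total_by(rows, *dimensions):
    """Sum the 'amount' measure, grouped by the given dimension names."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dimensions)  # one group per dimension-value combination
        totals[key] += row["amount"]
    return dict(totals)

print(total_by(sales, "region"))             # sales by geography
print(total_by(sales, "year", "product"))    # sales by time and product
```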
Multi-dimensional Model
[Figure: a central fact table holding numerical measures, with key columns joining it to dimension tables such as Store Info. Example queries: “Sales by product line over the past six months”; “Sales by store between 1990 and 1995”]
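The fact and dimension layout described above, commonly called a star schema, can be sketched with Python's sqlite3 module. The table names (fact_sales, dim_store, dim_time), the columns, and the data are hypothetical. The query answers a question of the form “Sales by store between 1990 and 1995”.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, store_name TEXT);
CREATE TABLE dim_time  (time_id  INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE fact_sales (
    store_id INTEGER REFERENCES dim_store(store_id),  -- key column to a dimension table
    time_id  INTEGER REFERENCES dim_time(time_id),    -- key column to a dimension table
    units    INTEGER,                                 -- numerical measure
    amount   REAL                                     -- numerical measure
);
INSERT INTO dim_store VALUES (1, 'Store A'), (2, 'Store B');
INSERT INTO dim_time  VALUES (1, 1989, 12), (2, 1990, 6), (3, 1995, 1), (4, 1996, 3);
INSERT INTO fact_sales VALUES
    (1, 1, 5, 50.0), (1, 2, 10, 100.0), (2, 3, 4, 40.0), (2, 4, 7, 70.0), (1, 3, 2, 20.0);
""")

# Join the fact table to its dimensions, filter on the time dimension,
# and aggregate the measure per store.
rows = conn.execute("""
    SELECT s.store_name, SUM(f.amount) AS total_sales
    FROM fact_sales AS f
    JOIN dim_store AS s ON f.store_id = s.store_id
    JOIN dim_time  AS t ON f.time_id  = t.time_id
    WHERE t.year BETWEEN 1990 AND 1995
    GROUP BY s.store_name
    ORDER BY s.store_name
""").fetchall()
print(rows)  # [('Store A', 120.0), ('Store B', 40.0)]
```

Answering “Sales by product line over the past six months” would follow the same pattern, using a product dimension and a filter on month.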
• Web-based Architecture
– Advantages:
• Use of existing software, reduced costs, platform independence
– Disadvantages:
• Security issues: data encryption, user access and identification
Distributed DW
• In most cases, the economics and technology greatly favor a single centralized DW
• In some cases, however, a distributed DW makes sense
• Types of distributed DW:
– Geographically distributed DW
• Local DWs and a global DW
– Technologically distributed DW
• Logically one DW, physically several DWs
– Independently evolving distributed DW
• Uncontrolled growth
Distributed DW
• Geographically distributed DW
– Corporations spread around the world need information both locally and globally
– A distributed DW makes sense:
• When much of the processing occurs at the local level
• When the local organizations are companies in their own right, even though the local branches report to the same balance sheet
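One way to picture a local DW/global DW split is sketched below. It rests on an assumption the slides do not state: each local DW keeps detailed transactions, and the global DW receives only monthly summaries from each site. The site codes, months, amounts, and the `summarize` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical local DWs: each site keeps its detailed transactions for local analysis.
local_dws = {
    "PK": [("2023-01", 120.0), ("2023-01", 80.0), ("2023-02", 50.0)],
    "UK": [("2023-01", 300.0), ("2023-02", 100.0), ("2023-02", 25.0)],
}

def summarize(transactions):
    """Local step (assumed): roll detailed transactions up to monthly totals."""
    monthly = defaultdict(float)
    for month, amount in transactions:
        monthly[month] += amount
    return dict(monthly)

# Global DW (assumed design): holds only per-site monthly summaries keyed by
# (site, month), so corporate-level questions avoid shipping detail rows.
global_dw = {
    (site, month): total
    for site, rows in local_dws.items()
    for month, total in summarize(rows).items()
}

# A corporate-level question answered from the global DW alone.
corporate_jan = sum(t for (_, month), t in global_dw.items() if month == "2023-01")
print(global_dw)
print(corporate_jan)  # 500.0
```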
Distributed DW
• Technologically distributed DW
– The DW is placed on a vendor's distributed technology
– Advantages:
• Low entry cost: large centralized hardware is expensive
• No theoretical limit on how much data can be placed in the DW: new servers can be added to the network
– As the DW expands, network data communication starts to play an important role
• Example: to simplify, consider 4 nodes, each holding the data for one of the last 4 years
• A query that needs to access the data from all of the last 4 years raises the issue of transporting large amounts of data between processors
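The data-transport issue in the 4-node example can be made concrete with a small simulation. The node layout, the row counts, and the two query plans (`query_ship_rows`, `query_ship_partials`) are hypothetical. The sketch only counts how many values must cross the network to compute a SUM over all 4 years.

```python
import random

# Hypothetical setup: 4 nodes, each holding 10,000 sales values for one year.
random.seed(0)
YEARS = [2020, 2021, 2022, 2023]
nodes = {year: [random.randint(1, 100) for _ in range(10_000)] for year in YEARS}

def query_ship_rows(nodes):
    """Naive plan: every node sends all its rows to a coordinator, which sums them."""
    shipped = 0
    total = 0
    for rows in nodes.values():
        shipped += len(rows)   # rows travelling over the network
        total += sum(rows)     # work done at the coordinator
    return total, shipped

def query_ship_partials(nodes):
    """Alternative plan: each node sums locally and sends one partial result."""
    partials = [sum(rows) for rows in nodes.values()]  # work done on each node
    return sum(partials), len(partials)                # one value shipped per node

total_a, shipped_a = query_ship_rows(nodes)
total_b, shipped_b = query_ship_partials(nodes)
assert total_a == total_b   # both plans give the same answer
print(shipped_a, shipped_b)  # 40000 4
```

Pushing partial aggregation to the nodes only works for aggregates such as SUM or COUNT, which can be combined from per-node results. Queries that need the detailed rows, such as joins across years, still have to move data between processors.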
Distributed DW
• Independently evolving distributed DW
– In practice, there are many cases in which independent DWs are developed concurrently, in an uncontrolled way, within the same organization
• The first step many corporations take is to build a DW for finance or marketing
• Once it is successfully set up, other parts of the organization independently follow the same process, so several independent DWs end up coexisting in the same organization
• This problem will be addressed later
Summary
• Storage structures:
– Multidimensional databases (MDB) are more suitable for DW
• Architectures:
– One-Tier Architecture: interesting for mobile applications
– N-Tier Architecture: complexity grows with N
– Web-based Architecture: reduced costs
• Security issues: data encryption, user access and identification
• When a DW is distributed, the distribution is usually geographical or technological