Chapter-06
Chapter 6
Normalization of Database Tables
1
Learning Objectives
●
Explain normalization and its role in the database design process
●
Identify and describe each of the normal forms: 1NF, 2NF, 3NF, BCNF, and 4NF
●
Explain how normal forms can be transformed from lower normal forms to
higher normal forms
●
Apply normalization rules to evaluate and correct table structures
●
Identify situations that require denormalization to generate information efficiently
●
Use a data-modeling checklist to check that the ERD meets a set of minimum
requirements
2
Normalization
●
The process of improving the database design
●
Assigns attributes to tables based on
determination (Chapter 3)
– Knowing the value of one attribute makes it
possible to determine the value of another
●
Reduces redundancies and anomalies
3
Data with Bad Structure
4
Issues
●
Data grouped by project (repeating groups)
●
Many redundancies
– Employee name
– Job class
– Hourly charge
●
Primary key?
5
Goals
●
Create well-formed relations (tables)
– Each table represents a single subject
– Each row/column intersection contains only one value and not a
group of values
– No data item will be unnecessarily stored in more than one table
– All nonprime attributes in a table are dependent on the primary
key
– Each table has no insertion, update, or deletion anomalies
6
Normal Forms
7
Definitions from Chapter 3
●
Determination
– State in which knowing the value of one attribute makes it possible to
determine the value of another
●
Functional dependence
– Within a relation R, an attribute B is functionally dependent on an
attribute A if and only if a given value of attribute A determines exactly
one value of attribute B
– The relationship “B is dependent on A” is equivalent to “A determines
B” and is written as A → B
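The definition above can be tested against sample data. The following is a minimal sketch, assuming rows are stored as Python dictionaries; the function name `determines` and the sample rows are illustrative and not part of the chapter. A sample can only refute a functional dependence; whether it holds in general is decided by the business rules.

```python
def determines(rows, lhs, rhs):
    """Return True if, in these rows, each lhs value maps to exactly one rhs value."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # A second, different value for the same key breaks lhs -> rhs.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample rows, not taken from the chapter's figures.
rows = [
    {"EMP_NUM": 101, "EMP_NAME": "Ann", "PROJ_NUM": 1, "HOURS": 10.0},
    {"EMP_NUM": 101, "EMP_NAME": "Ann", "PROJ_NUM": 2, "HOURS": 4.5},
    {"EMP_NUM": 102, "EMP_NAME": "Bob", "PROJ_NUM": 1, "HOURS": 7.0},
]

print(determines(rows, ("EMP_NUM",), ("EMP_NAME",)))  # True
print(determines(rows, ("EMP_NUM",), ("HOURS",)))     # False
```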
8
Bad Dependencies
●
Partial dependency
●
Transitive dependency
9
Partial dependency
●
Functional dependence in which the
determinant is only part of the primary key
– Assumption: one candidate key
– Straightforward
– Easy to identify
10
Partial Dependency
●
(PROJ_NUM, EMP_NUM) is the primary key
●
Dependencies
– (PROJ_NUM, EMP_NUM) → (EMP_NAME, HOURS)
– EMP_NUM → EMP_NAME
●
Then the functional dependence EMP_NUM → EMP_NAME is a partial
dependency
Diagram labels: dependency on primary key; partial dependency
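One way to look for partial dependencies in sample data is to test every proper subset of the composite key against each non-key attribute. This is a sketch under stated assumptions: the helper names `determines` and `partial_dependencies` and the three sample rows are hypothetical, and a dependency that appears in a small sample still has to be confirmed against the business rules.

```python
from itertools import combinations


def determines(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


def partial_dependencies(rows, primary_key):
    """List (determinant, attribute) pairs whose determinant is a proper subset of the key."""
    nonkey = [a for a in rows[0] if a not in primary_key]
    found = []
    for size in range(1, len(primary_key)):
        for subset in combinations(primary_key, size):
            for attr in nonkey:
                if determines(rows, subset, (attr,)):
                    found.append((subset, attr))
    return found


# Hypothetical sample rows using the attribute names from this slide.
rows = [
    {"PROJ_NUM": 1, "EMP_NUM": 101, "EMP_NAME": "Ann", "HOURS": 10.0},
    {"PROJ_NUM": 1, "EMP_NUM": 102, "EMP_NAME": "Bob", "HOURS": 7.0},
    {"PROJ_NUM": 2, "EMP_NUM": 101, "EMP_NAME": "Ann", "HOURS": 4.5},
]

partials = partial_dependencies(rows, ("PROJ_NUM", "EMP_NUM"))
print(partials)  # [(('EMP_NUM',), 'EMP_NAME')]
```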
11
Transitive dependency
●
Attribute is dependent on another attribute that
is not part of the primary key
– More difficult to identify among a set of data
– Occurs only when a functional dependence exists
among nonprime attributes
12
Transitive Dependency
●
EMP_NUM is the primary key
●
Dependencies
– (EMP_NUM) → (EMP_NAME, MGR_NUM, MGR_NAME)
– MGR_NUM → MGR_NAME
●
MGR_NUM is not part of the primary key
– MGR_NUM → MGR_NAME is a transitive dependency
Diagram label: transitive dependency
13
To First Normal Form (1NF)
●
Eliminate repeating groups
●
Identify primary key
●
Identify dependencies
– Draw a dependency diagram
14
Bad to 1NF
16
Still Problems
●
Update anomalies
– Modifying the JOB_CLASS for employee Annelise Jones requires updating many
entries; otherwise, it will generate data inconsistencies
●
Insertion anomalies
– Adding a new employee requires the employee to be assigned to a project and
therefore to enter duplicate project information. If the employee is not yet assigned to
a project, a phantom project must be created to complete the employee data entry
●
Deletion anomalies
– Suppose that only one employee is associated with a given project. If that employee
is deleted, the project information will also be deleted.
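The three anomalies can be reproduced on a small 1NF table. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the table name `assignment_1nf`, the column list, and the sample rows are assumptions made for illustration and do not reproduce the chapter's figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE assignment_1nf (
        PROJ_NUM  INTEGER NOT NULL,
        PROJ_NAME TEXT,
        EMP_NUM   INTEGER NOT NULL,
        EMP_NAME  TEXT,
        JOB_CLASS TEXT,
        HOURS     REAL,
        PRIMARY KEY (PROJ_NUM, EMP_NUM)
    )
""")
conn.executemany(
    "INSERT INTO assignment_1nf VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Alpha", 101, "Ann", "Analyst", 10.0),
        (2, "Beta", 101, "Ann", "Analyst", 4.5),
        (3, "Gamma", 102, "Bob", "Designer", 7.0),
    ],
)

# Insertion anomaly: an employee without a project has no valid key value.
try:
    conn.execute(
        "INSERT INTO assignment_1nf VALUES (NULL, NULL, 103, 'Cy', 'Analyst', NULL)"
    )
    inserted = True
except sqlite3.IntegrityError:
    inserted = False

# Update anomaly: changing the job class in only one of Ann's rows
# leaves two conflicting job classes for the same employee.
conn.execute(
    "UPDATE assignment_1nf SET JOB_CLASS = 'Manager' "
    "WHERE EMP_NUM = 101 AND PROJ_NUM = 1"
)
job_classes = {
    row[0]
    for row in conn.execute(
        "SELECT JOB_CLASS FROM assignment_1nf WHERE EMP_NUM = 101"
    )
}

# Deletion anomaly: Bob is the only employee on project 3, so deleting
# Bob also removes every trace of that project.
conn.execute("DELETE FROM assignment_1nf WHERE EMP_NUM = 102")
remaining_projects = {
    row[0] for row in conn.execute("SELECT PROJ_NUM FROM assignment_1nf")
}

print(inserted, sorted(job_classes), sorted(remaining_projects))
```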
17
To Second Normal Form (2NF)
●
Make new tables to remove partial dependencies
●
Table is in 2NF if:
– It is in 1NF
– Has no partial dependencies
●
Tables with a single-attribute primary key and in
1NF are already in 2NF
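Removing partial dependencies amounts to projecting the 1NF table onto smaller attribute sets, one per determinant. The following sketch assumes a hypothetical 1NF table with the columns shown; the helper `project` and the sample data are illustrative only.

```python
def project(rows, attrs):
    """Return the distinct combinations of attrs found in rows, in first-seen order."""
    seen = {}
    for row in rows:
        seen.setdefault(tuple(row[a] for a in attrs), None)
    return [dict(zip(attrs, values)) for values in seen]


# Hypothetical 1NF rows with primary key (PROJ_NUM, EMP_NUM).
rows_1nf = [
    {"PROJ_NUM": 1, "PROJ_NAME": "Alpha", "EMP_NUM": 101, "EMP_NAME": "Ann",
     "JOB_CLASS": "Analyst", "CHG_HOUR": 50.0, "HOURS": 10.0},
    {"PROJ_NUM": 1, "PROJ_NAME": "Alpha", "EMP_NUM": 102, "EMP_NAME": "Bob",
     "JOB_CLASS": "Designer", "CHG_HOUR": 40.0, "HOURS": 7.0},
    {"PROJ_NUM": 2, "PROJ_NAME": "Beta", "EMP_NUM": 101, "EMP_NAME": "Ann",
     "JOB_CLASS": "Analyst", "CHG_HOUR": 50.0, "HOURS": 4.5},
]

# One table per determinant: PROJ_NUM, EMP_NUM, and the full composite key.
project_table = project(rows_1nf, ("PROJ_NUM", "PROJ_NAME"))
employee_table = project(rows_1nf, ("EMP_NUM", "EMP_NAME", "JOB_CLASS", "CHG_HOUR"))
assignment_table = project(rows_1nf, ("PROJ_NUM", "EMP_NUM", "HOURS"))

print(len(project_table), len(employee_table), len(assignment_table))  # 2 2 3
```

Each employee name and project name is now stored once, while the assignment table keeps only the attribute that depends on the whole key.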
18
Not in 2NF
19
To Third Normal Form (3NF)
●
Make new tables to remove transitive
dependencies
●
Table is in 3NF when it:
– Is in 2NF
– Contains no transitive dependencies
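As an illustration, assume an EMPLOYEE table in which JOB_CLASS determines the hourly charge, so the charge depends on the key only transitively. The sketch below moves that dependency into its own JOB table using Python's built-in `sqlite3` module; the table layouts, the column name CHG_HOUR, and the data are assumptions, not the chapter's figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE JOB (
        JOB_CLASS TEXT PRIMARY KEY,
        CHG_HOUR  REAL NOT NULL
    );
    CREATE TABLE EMPLOYEE (
        EMP_NUM   INTEGER PRIMARY KEY,
        EMP_NAME  TEXT NOT NULL,
        JOB_CLASS TEXT NOT NULL REFERENCES JOB (JOB_CLASS)
    );
""")
conn.executemany(
    "INSERT INTO JOB VALUES (?, ?)",
    [("Analyst", 50.0), ("Designer", 40.0)],
)
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
    [(101, "Ann", "Analyst"), (102, "Bob", "Designer"), (103, "Cy", "Analyst")],
)

# The hourly charge is stored once per job class, so a single-row update
# reaches every employee in that class.
conn.execute("UPDATE JOB SET CHG_HOUR = 55.0 WHERE JOB_CLASS = 'Analyst'")

rates = conn.execute("""
    SELECT E.EMP_NUM, J.CHG_HOUR
    FROM EMPLOYEE AS E
    JOIN JOB AS J ON J.JOB_CLASS = E.JOB_CLASS
    ORDER BY E.EMP_NUM
""").fetchall()
print(rates)  # [(101, 55.0), (102, 40.0), (103, 55.0)]
```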
20
Now in 3NF
21
More Things to Look At
●
Evaluate PK assignments and naming
conventions
– Long JOB_CODE entries not the best
– Use a surrogate primary key
22
More...
●
Refine attribute atomicity
– Atomic attribute: cannot be further subdivided
– Atomicity: characteristic of an atomic attribute
●
Possibly split employee name into first, last,
middle initial
23
More...
●
Identify new attributes and new relationships
– Will probably need to store more attributes on
employees
– Original data showed project managers
●
Add relationship
24
More...
●
Refine primary keys as required for data
granularity
– Granularity: Level of detail represented by the
values stored in a table’s row
●
What does ASSIGN_HOURS represent?
– Identify time frame and update design
25
More...
●
Maintain historical accuracy
– JOB_CHG_HOUR can change over time
– Save value at time of employee assignment
●
Evaluate using derived attributes
– Possibly store total costs (hours * hourly charge)
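Both points can be sketched together: copy the hourly charge into the assignment record when the assignment is made, and derive the cost from the stored values. The class `Assignment` and its field names are hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assignment:
    emp_num: int
    proj_num: int
    assign_hours: float
    assign_chg_hour: float  # hourly charge in effect when the work was assigned

    @property
    def assign_charge(self):
        """Derived attribute: hours multiplied by the stored hourly charge."""
        return self.assign_hours * self.assign_chg_hour


# Hypothetical current hourly charges per job class.
job_chg_hour = {"Analyst": 50.0}

first = Assignment(101, 1, 10.0, job_chg_hour["Analyst"])

# The job's rate changes later; the earlier assignment keeps its own copy.
job_chg_hour["Analyst"] = 55.0
second = Assignment(101, 2, 4.0, job_chg_hour["Analyst"])

print(first.assign_charge, second.assign_charge)  # 500.0 220.0
```

Here the derived value is computed on demand. Storing it instead trades extra space and update logic for faster reporting, which is the evaluation the slide asks for.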
26
Final Results
27
Issues with Surrogate Keys
●
Two entries for the same job
●
Trade-off designers have to make
28
Boyce-Codd Normal Form (BCNF)
●
Note the dependency C → B
●
It is not partial
●
It is not transitive
●
Table is not in BCNF
29
BCNF Formal Definition
●
A special type of third normal form (3NF) in which every
determinant is a candidate key
– Determinant: Any attribute in a specific row whose value directly
determines other values in that row
– Candidate key: A minimal superkey; that is, a key that does not contain
a subset of attributes that is itself a superkey
– Superkey: An attribute or attributes that uniquely identify each entity in
a table
●
A table in BCNF must be in 3NF
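The definition can be checked mechanically once the functional dependencies are written down. The sketch below computes attribute closures and reports every nontrivial dependency whose determinant does not determine all attributes (that is, is not a superkey). The attribute names A to D and the dependency (A, B) → (C, D) are assumptions added for illustration; only C → B comes from the preceding slide.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """Return the nontrivial dependencies whose determinant is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and closure(lhs, fds) != set(all_attrs)
    ]


# Assumed table R(A, B, C, D) with key (A, B) and the extra dependency C -> B.
attrs = ("A", "B", "C", "D")
fds = [(("A", "B"), ("C", "D")), (("C",), ("B",))]

violations = bcnf_violations(attrs, fds)
print(violations)  # [(('C',), ('B',))]
```

C determines only B and itself, so C is a determinant that is not a candidate key, which is why the table fails BCNF even though the dependency is neither partial nor transitive.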
30
Concrete Example
31
Fourth Normal Form (4NF)
●
Rules
– All attributes must be dependent on the primary key, but they
must be independent of each other
– No row may contain two or more multivalued facts about an entity
●
Table is in 4NF when it:
– Is in 3NF
– Has no multivalued dependencies
32
Example
●
An Employee can volunteer for many organizations
●
An Employee can have many assignments
●
None of the three table versions is good
33
Split into Additional Tables
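A small sketch of why the split helps, using made-up data: when the two independent multivalued facts are kept in one table, every organization has to be paired with every assignment, while two separate tables store each fact once. The employee number, organization codes, and assignment numbers below are hypothetical.

```python
# Hypothetical data: one employee, two volunteer organizations, three assignments.
volunteer = [(10123, "RC"), (10123, "UW")]
assignment = [(10123, 1), (10123, 3), (10123, 4)]

# A single table holding both facts needs every organization/assignment pair.
combined = [
    (emp, org, assign)
    for emp, org in volunteer
    for emp2, assign in assignment
    if emp == emp2
]

print(len(volunteer) + len(assignment))  # 5
print(len(combined))  # 6
```

The gap grows multiplicatively as an employee gains more organizations or assignments, and each extra row is a chance for the two facts to fall out of step.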
34
Normalization and Data Design
●
Normalization should be part of the design process
– Proposed entities must meet the required normal form before
table structures are created
●
Normalization principles and procedures must be
understood in order to redesign and modify databases
– ERD is created through an iterative process
– Normalization focuses on the characteristics of specific entities
35
Denormalization
●
Opposing design goals
– Creation of normalized relations
– Processing requirements and speed
●
As tables are decomposed to conform to
normalization requirements
– Number of database tables expands
36
Denormalization
●
Joining a larger number of tables
– Takes additional input/output (I/O) operations and processing logic
– Reduces system speed
●
But unnormalized tables have defects of their own
– Data updates are less efficient because tables are larger
– Indexing is more cumbersome
– No simple strategies for creating virtual tables known as views
37
Data Modeling Checklist
●
Business rules
– Properly document and verify all business rules with the end users
– Ensure that all business rules are written precisely, clearly, and
simply
– The business rules must help identify entities, attributes,
relationships, and constraints
– Identify the source of all business rules, and ensure that each
business rule is justified, dated, and signed off by an approving
authority
38
Data Modeling Checklist
●
Data modeling
– Naming conventions: all names should be limited in
length (database-dependent size)
39
Naming Conventions
●
Entity names:
– Should be nouns that are familiar to business and should be short
and meaningful
– Should document abbreviations, synonyms, and aliases for each
entity
– Should be unique within the model
– For composite entities, may include a combination of abbreviated
names of the entities linked through the composite entity
40
Naming Conventions
●
Attribute names:
– Should be unique within the entity
– Should use the entity abbreviation as a prefix
– Should be descriptive of the characteristic
– Should use suffixes such as _ID, _NUM, or _CODE for the PK
attribute
– Should not be a reserved word
– Should not contain spaces or special characters such as @, !, or &
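Several of these attribute-naming rules are mechanical enough to check automatically. The following is a rough sketch, not a complete checker: the function name, the tiny reserved-word set, and the assumption that the entity abbreviation is joined to the rest of the name with an underscore are all illustrative. Real reserved-word lists depend on the DBMS.

```python
import re

# Small illustrative subset; the actual list is DBMS-specific.
RESERVED = {"SELECT", "TABLE", "ORDER", "GROUP"}


def attribute_name_problems(name, entity_prefix, is_pk=False):
    """Return a list of naming-rule violations for one attribute name."""
    problems = []
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9_]*", name):
        problems.append("contains spaces or special characters")
    if name.upper() in RESERVED:
        problems.append("is a reserved word")
    if not name.upper().startswith(entity_prefix.upper() + "_"):
        problems.append("lacks the entity prefix")
    if is_pk and not name.upper().endswith(("_ID", "_NUM", "_CODE")):
        problems.append("PK lacks an _ID, _NUM, or _CODE suffix")
    return problems


print(attribute_name_problems("EMP_NUM", "EMP", is_pk=True))  # []
print(attribute_name_problems("EMP_E-MAIL", "EMP"))
print(attribute_name_problems("ORDER", "ORD"))
```

Rules such as "descriptive of the characteristic" still need human review.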
41
Naming Conventions
●
Relationship names:
– Should be active or passive verbs that clearly
indicate the nature of the relationship
42
Data Modeling Checklist
●
Entities:
– Each entity should represent a single subject
– Each entity should represent a set of distinguishable entity instances
– All entities should be in 3NF or higher
– Any entities below 3NF should be justified
– Granularity of the entity instance should be clearly defined
– PK should be clearly defined and support the selected data
granularity
43
Data Modeling Checklist
●
Attributes:
– Should be simple and single-valued (atomic data)
– Should document default values, constraints, synonyms, and
aliases
– Derived attributes should be clearly identified and include source(s)
– Should not be redundant unless this is required for transaction
accuracy, performance, or maintaining a history
– Nonkey attributes must be fully dependent on the PK attribute
44
Data Modeling Checklist
●
Relationships:
– Should clearly identify relationship participants
– Should clearly define participation and connectivity, and
document cardinality
45
Data Modeling Checklist
●
ER model:
– Should be validated against expected processes: inserts, updates, and
deletions
– Should evaluate where, when, and how to maintain a history
– Should not contain redundant relationships except as required (see
attributes)
– Should minimize data redundancy to ensure single-place updates
– Should conform to the minimal data rule: All that is needed is there,
and all that is there is needed
46