Chapter-06

Chapter 6 of COMP255 discusses the normalization of database tables, outlining its importance in database design and detailing the various normal forms (1NF, 2NF, 3NF, BCNF, and 4NF). It emphasizes the need to eliminate redundancies and anomalies through proper structuring of tables and the application of normalization rules. The chapter also addresses the concept of denormalization and provides a data-modeling checklist to ensure compliance with design principles.
COMP255

Chapter 6
Normalization of Database Tables

1
Learning Objectives

Explain normalization and its role in the database design process

Identify and describe each of the normal forms: 1NF, 2NF, 3NF, BCNF, and 4NF

Explain how normal forms can be transformed from lower normal forms to
higher normal forms

Apply normalization rules to evaluate and correct table structures

Identify situations that require denormalization to generate information efficiently

Use a data-modeling checklist to check that the ERD meets a set of minimum
requirements

2
Normalization

The process of improving the database design

Assigns attributes to tables based on
determination (Chapter 3)
– Knowing the value of one attribute makes it
possible to determine the value of another

Reduces redundancies and anomalies

3
Data with Bad Structure

4
Issues

Data grouped by project (repeating groups)

Many redundancies
– Employee name
– Job class
– Hourly charge

Primary key?

5
Goals

Create well-formed relations (tables)
– Each table represents a single subject
– Each row/column intersection contains only one value and not a
group of values
– No data item will be unnecessarily stored in more than one table
– All nonprime attributes in a table are dependent on the primary
key
– Each table has no insertion, update, or deletion anomalies

6
Normal Forms

7
Definitions from Chapter 3

Determination
– State in which knowing the value of one attribute makes it possible to
determine the value of another

Functional dependence
– Within a relation R, an attribute B is functionally dependent on an
attribute A if and only if a given value of attribute A determines exactly
one value of attribute B
– The relationship “B is dependent on A” is equivalent to “A determines
B” and is written as A → B
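
The definition above can be tested against sample data. The following is a minimal sketch, not part of the original slides: the helper name `holds` and the sample rows are hypothetical. Note that sample rows can only refute a functional dependence; whether it truly holds is decided by the business rules.

```python
def holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs (lhs -> rhs)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[b] for b in rhs)
        # setdefault stores the first value seen for this key and returns it
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical sample rows (not taken from the chapter's figures).
rows = [
    {"PROJ_NUM": 15, "EMP_NUM": 103, "EMP_NAME": "Ann", "HOURS": 23.8},
    {"PROJ_NUM": 18, "EMP_NUM": 103, "EMP_NAME": "Ann", "HOURS": 12.0},
    {"PROJ_NUM": 15, "EMP_NUM": 101, "EMP_NAME": "Raj", "HOURS": 19.4},
]

print(holds(rows, ["EMP_NUM"], ["EMP_NAME"]))  # True
print(holds(rows, ["EMP_NUM"], ["HOURS"]))     # False
```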

8
Bad Dependencies

Partial dependency

Transitive dependency

9
Partial dependency

Functional dependence in which the
determinant is only part of the primary key
– Assumption: one candidate key
– Straightforward
– Easy to identify

10
Partial Dependency

(PROJ_NUM, EMP_NUM) is the primary key

Dependencies
– (PROJ_NUM, EMP_NUM) → (EMP_NAME, HOURS)
– EMP_NUM → EMP_NAME

Then the functional dependence EMP_NUM → EMP_NAME is a partial
dependency
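Dependency diagram (attributes: PROJ_NUM, EMP_NUM, EMP_NAME, HOURS)
– Dependency on primary key: (PROJ_NUM, EMP_NUM) → (EMP_NAME, HOURS)
– Partial dependency: EMP_NUM → EMP_NAME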
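
The partial dependency in this example can be found mechanically. The sketch below is illustrative only: the function name `partial_dependencies` is hypothetical, and it relies on the slide's assumption of a single candidate key, so that "prime attribute" means "part of the primary key".

```python
def partial_dependencies(primary_key, fds):
    """Return the FDs whose determinant is a proper subset of the primary key
    and whose dependents include at least one attribute outside the key."""
    pk = set(primary_key)
    result = []
    for lhs, rhs in fds:
        nonprime_dependents = set(rhs) - pk
        if set(lhs) < pk and nonprime_dependents:  # "<" means proper subset
            result.append((lhs, rhs))
    return result

pk = ("PROJ_NUM", "EMP_NUM")
fds = [
    (("PROJ_NUM", "EMP_NUM"), ("EMP_NAME", "HOURS")),
    (("EMP_NUM",), ("EMP_NAME",)),
]

print(partial_dependencies(pk, fds))
# [(('EMP_NUM',), ('EMP_NAME',))]
```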
11
Transitive dependency

Attribute is dependent on another attribute that
is not part of the primary key
– More difficult to identify among a set of data
– Occurs only when a functional dependence exists
among nonprime attributes

12
Transitive Dependency

EMP_NUM is the primary key

Dependencies
– (EMP_NUM) → (EMP_NAME, MGR_NUM, MGR_NAME)
– MGR_NUM → MGR_NAME

MGR_NUM is not part of the primary key
– MGR_NUM → MGR_NAME is a transitive dependency

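Dependency diagram (attributes: EMP_NUM, EMP_NAME, MGR_NUM, MGR_NAME)
– Dependency on primary key: EMP_NUM → (EMP_NAME, MGR_NUM, MGR_NAME)
– Transitive dependency: MGR_NUM → MGR_NAME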
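
A transitive dependency of this kind can be flagged the same way. This is a minimal sketch under the same single-candidate-key assumption; the function name `transitive_dependencies` is hypothetical. It reports every functional dependence whose determinant and dependents are all nonprime attributes.

```python
def transitive_dependencies(primary_key, fds):
    """Return the FDs in which neither the determinant nor the dependents
    contain any primary-key attribute."""
    pk = set(primary_key)
    result = []
    for lhs, rhs in fds:
        if not (set(lhs) & pk) and not (set(rhs) & pk):
            result.append((lhs, rhs))
    return result

pk = ("EMP_NUM",)
fds = [
    (("EMP_NUM",), ("EMP_NAME", "MGR_NUM", "MGR_NAME")),
    (("MGR_NUM",), ("MGR_NAME",)),
]

print(transitive_dependencies(pk, fds))
# [(('MGR_NUM',), ('MGR_NAME',))]
```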
13
To First Normal Form (1NF)

Eliminate repeating groups

Identify primary key

Identify dependencies
– Draw a dependency diagram
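
Eliminating repeating groups can be sketched as a "fill down" step. The example below is an assumption about how the badly structured report is laid out: project values appear only on the first row of each group and are blank (`None`) on the rows that follow. The data and the helper name `fill_repeating_groups` are hypothetical.

```python
def fill_repeating_groups(rows, group_columns):
    """Copy group values down into rows where they were left blank (None).
    Assumes the first row of the report has every group column filled in."""
    current = {}
    flat = []
    for row in rows:
        filled = dict(row)
        for col in group_columns:
            if filled[col] is None:
                filled[col] = current[col]   # reuse the value from the group header row
            else:
                current[col] = filled[col]   # a new group starts here
        flat.append(filled)
    return flat

# Hypothetical report rows grouped by project.
report = [
    {"PROJ_NUM": 15, "PROJ_NAME": "Alpha", "EMP_NUM": 103, "HOURS": 23.8},
    {"PROJ_NUM": None, "PROJ_NAME": None, "EMP_NUM": 101, "HOURS": 19.4},
    {"PROJ_NUM": 18, "PROJ_NAME": "Beta", "EMP_NUM": 103, "HOURS": 12.0},
]

flat = fill_repeating_groups(report, ["PROJ_NUM", "PROJ_NAME"])

# After flattening, (PROJ_NUM, EMP_NUM) identifies each row uniquely.
keys = {(r["PROJ_NUM"], r["EMP_NUM"]) for r in flat}
print(len(keys) == len(flat))  # True
```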

14
Bad to 1NF

Primary Key: PROJ_NUM and EMP_NUM


15
Dependency Diagram

16
Still Problems

Update anomalies
– Modifying the JOB_CLASS for employee Annelise Jones requires updating many
entries; otherwise, it will generate data inconsistencies

Insertion anomalies
– Adding a new employee requires the employee to be assigned to a project and
therefore to enter duplicate project information. If the employee is not yet assigned to
a project, a phantom project must be created to complete the employee data entry

Deletion anomalies
– Suppose that only one employee is associated with a given project. If that employee
is deleted, the project information will also be deleted.
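
The deletion anomaly can be reproduced with a small in-memory SQLite table. This is an illustrative sketch only; the table name `ASSIGNMENT_1NF`, its reduced column list, and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ASSIGNMENT_1NF (
        PROJ_NUM  INTEGER,
        PROJ_NAME TEXT,
        EMP_NUM   INTEGER,
        EMP_NAME  TEXT,
        PRIMARY KEY (PROJ_NUM, EMP_NUM)
    )
""")
conn.executemany(
    "INSERT INTO ASSIGNMENT_1NF VALUES (?, ?, ?, ?)",
    [(15, "Alpha", 103, "Ann"), (15, "Alpha", 101, "Raj"), (18, "Beta", 104, "Lee")],
)

# Employee 104 is the only employee on project 18.
conn.execute("DELETE FROM ASSIGNMENT_1NF WHERE EMP_NUM = 104")

remaining_projects = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT PROJ_NUM FROM ASSIGNMENT_1NF ORDER BY PROJ_NUM"
    )
]
print(remaining_projects)  # [15]
```

Project 18 disappears together with its only employee, because the project data has no table of its own.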

17
To Second Normal Form (2NF)

Make new tables to remove partial dependencies

Table is in 2NF if:
– It is in 1NF
– Has no partial dependencies

Tables with a single-attribute primary key and in
1NF are already in 2NF
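
Removing partial dependencies amounts to projecting the 1NF table onto one new table per determinant. The sketch below assumes the dependencies PROJ_NUM → PROJ_NAME and EMP_NUM → (EMP_NAME, JOB_CLASS, CHG_HOUR); the column list, the data, and the helper name `project` are hypothetical.

```python
def project(rows, columns):
    """Project rows onto the given columns and drop duplicate tuples, keeping order."""
    seen = set()
    out = []
    for row in rows:
        values = tuple(row[c] for c in columns)
        if values not in seen:
            seen.add(values)
            out.append(dict(zip(columns, values)))
    return out

# Hypothetical 1NF rows with primary key (PROJ_NUM, EMP_NUM).
rows = [
    {"PROJ_NUM": 15, "PROJ_NAME": "Alpha", "EMP_NUM": 103, "EMP_NAME": "Ann",
     "JOB_CLASS": "Analyst", "CHG_HOUR": 55.0, "HOURS": 23.8},
    {"PROJ_NUM": 15, "PROJ_NAME": "Alpha", "EMP_NUM": 101, "EMP_NAME": "Raj",
     "JOB_CLASS": "Programmer", "CHG_HOUR": 40.0, "HOURS": 19.4},
    {"PROJ_NUM": 18, "PROJ_NAME": "Beta", "EMP_NUM": 103, "EMP_NAME": "Ann",
     "JOB_CLASS": "Analyst", "CHG_HOUR": 55.0, "HOURS": 12.0},
]

# One table per determinant; the original key keeps only what depends on all of it.
project_table = project(rows, ["PROJ_NUM", "PROJ_NAME"])
employee_table = project(rows, ["EMP_NUM", "EMP_NAME", "JOB_CLASS", "CHG_HOUR"])
assignment_table = project(rows, ["PROJ_NUM", "EMP_NUM", "HOURS"])

print(len(project_table), len(employee_table), len(assignment_table))  # 2 2 3
```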

18
Not in 2NF

19
To Third Normal Form (3NF)

Make new tables to remove transitive
dependencies

Table is in 3NF when it:
– Is in 2NF
– Contains no transitive dependencies
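
Removing a transitive dependency follows the same pattern: the determinant and its dependents move to a new table, and the determinant stays behind as a foreign key. The sketch below assumes a 2NF employee table in which JOB_CLASS → CHG_HOUR; table names, columns, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 2NF table: CHG_HOUR depends on JOB_CLASS, which is not the key.
    CREATE TABLE EMPLOYEE_2NF (
        EMP_NUM   INTEGER PRIMARY KEY,
        EMP_NAME  TEXT,
        JOB_CLASS TEXT,
        CHG_HOUR  REAL
    );
    INSERT INTO EMPLOYEE_2NF VALUES
        (101, 'Raj', 'Programmer', 40.0),
        (103, 'Ann', 'Analyst', 55.0),
        (104, 'Lee', 'Programmer', 40.0);

    -- New table for the transitive dependency JOB_CLASS -> CHG_HOUR.
    CREATE TABLE JOB (
        JOB_CLASS TEXT PRIMARY KEY,
        CHG_HOUR  REAL
    );
    INSERT INTO JOB SELECT DISTINCT JOB_CLASS, CHG_HOUR FROM EMPLOYEE_2NF;

    -- 3NF employee table keeps JOB_CLASS only as a foreign key.
    CREATE TABLE EMPLOYEE (
        EMP_NUM   INTEGER PRIMARY KEY,
        EMP_NAME  TEXT,
        JOB_CLASS TEXT REFERENCES JOB (JOB_CLASS)
    );
    INSERT INTO EMPLOYEE SELECT EMP_NUM, EMP_NAME, JOB_CLASS FROM EMPLOYEE_2NF;
""")

job_rows = conn.execute(
    "SELECT JOB_CLASS, CHG_HOUR FROM JOB ORDER BY JOB_CLASS"
).fetchall()
print(job_rows)  # [('Analyst', 55.0), ('Programmer', 40.0)]
```

Each hourly charge is now stored once per job class, so changing it is a single-row update.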

20
Now in 3NF

21
More Things to Look At

Evaluate PK assignments and naming
conventions
– Long JOB_CODE entries not the best
– Use a surrogate primary key

22
More...

Refine attribute atomicity
– Atomic attribute: cannot be further subdivided
– Atomicity: characteristic of an atomic attribute

Possibly split employee name into first, last,
middle initial

23
More...

Identify new attributes and new relationships
– Will probably need to store more attributes on
employees
– Original data showed project managers

Add relationship

24
More...

Refine primary keys as required for data
granularity
– Granularity: Level of detail represented by the
values stored in a table’s row

What does ASSIGN_HOURS represent?
– Identify time frame and update design

25
More...

Maintain historical accuracy
– JOB_CHG_HOUR can change over time
– Save value at time of employee assignment

Evaluate using derived attributes
– Possibly store total costs (hours * hourly charge)
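
Both points can be illustrated with a small sketch. It assumes the assignment row copies the hourly charge in effect at assignment time and stores the derived total; the attribute names ASSIGN_CHG_HOUR and ASSIGN_CHARGE, the helper `make_assignment`, and the values are hypothetical.

```python
# Current hourly charge per job class (hypothetical values).
job_rates = {"Programmer": 40.0}

def make_assignment(emp_num, job_class, hours):
    """Build an assignment row that freezes the hourly charge in effect now."""
    rate = job_rates[job_class]
    return {
        "EMP_NUM": emp_num,
        "ASSIGN_HOURS": hours,
        "ASSIGN_CHG_HOUR": rate,                  # saved copy for historical accuracy
        "ASSIGN_CHARGE": round(hours * rate, 2),  # derived: hours * hourly charge
    }

assignment = make_assignment(101, "Programmer", 2.5)

# A later rate change does not rewrite the charge already recorded.
job_rates["Programmer"] = 45.0

print(assignment["ASSIGN_CHG_HOUR"], assignment["ASSIGN_CHARGE"])  # 40.0 100.0
```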

26
Final Results

27
Issues with Surrogate Keys


Two entries for the same job

Trade-off designers have to make

28
Boyce-Codd Normal Form (BCNF)

Note the dependency C → B
– It is not partial
– It is not transitive

Table is not in BCNF

29
BCNF Formal Definition

A special type of third normal form (3NF) in which every
determinant is a candidate key
– Determinant: Any attribute in a specific row whose value directly
determines other values in that row
– Candidate key: A minimal superkey; that is, a key that does not contain
a subset of attributes that is itself a superkey
– Superkey: An attribute or attributes that uniquely identify each entity in
a table

A table in BCNF must be in 3NF
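
The definition can be checked with an attribute-closure computation. The sketch below tests the usual equivalent condition that the determinant of every nontrivial functional dependence is a superkey. The relation R(A, B, C, D) and the dependence (A, B) → (C, D) are assumptions added for illustration; only C → B appears on the earlier slide. The function names are hypothetical.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(all_attrs, fds):
    """Return the nontrivial FDs whose determinant is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and closure(lhs, fds) != set(all_attrs)
    ]

attrs = {"A", "B", "C", "D"}
fds = [
    (("A", "B"), ("C", "D")),  # assumed: (A, B) is the primary key
    (("C",), ("B",)),          # the dependence noted on the earlier slide
]

print(bcnf_violations(attrs, fds))
# [(('C',), ('B',))]
```

C determines only B and itself, so C is not a superkey and the table is not in BCNF, even though C → B is neither partial nor transitive.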

30
Concrete Example

31
Fourth Normal Form (4NF)

Rules
– All attributes must be dependent on the primary key, but they
must be independent of each other
– No row may contain two or more multivalued facts about an entity

Table is in 4NF when it:
– Is in 3NF
– Has no multivalued dependencies

32
Example

An Employee can volunteer for many organizations

An Employee can have many assignments

None of the three table versions is good

33
Split into Additional Tables
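
The effect of the split can be sketched with plain tuples. The employee number, organization codes, and assignment numbers below are hypothetical. The single combined table shown is one possible layout (every combination of the two independent facts), not necessarily one of the versions in the original figure.

```python
# Two independent multivalued facts about the same employee (hypothetical data).
volunteer = [(10123, "RC"), (10123, "UW")]
assignment = [(10123, 1), (10123, 3), (10123, 8)]

# One table holding both facts without nulls needs every combination.
combined = [
    (emp, org, assign)
    for emp, org in volunteer
    for emp2, assign in assignment
    if emp == emp2
]

# Projecting the combined table recovers the two separate tables.
volunteer_again = sorted({(emp, org) for emp, org, _ in combined})
assignment_again = sorted({(emp, assign) for emp, _, assign in combined})

print(len(combined), len(volunteer) + len(assignment))  # 6 5
```

The separate tables store each fact once, and adding a new organization or assignment inserts a single row instead of one row per combination.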

34
Normalization and Data Design

Normalization should be part of the design process
– Proposed entities must meet the required normal form before
table structures are created

Normalization principles and procedures must be
understood in order to redesign and modify databases
– ERD is created through an iterative process
– Normalization focuses on the characteristics of specific entities

35
Denormalization

Opposing design goals
– Creation of normalized relations
– Processing requirements and speed

As tables are decomposed to conform to
normalization requirements
– Number of database tables expands

36
Denormalization

Joining a larger number of tables
– Takes additional input/output (I/O) operations and processing logic
– Reduces system speed

However, unnormalized tables have defects
– Data updates are less efficient because tables are larger
– Indexing is more cumbersome
– No simple strategies for creating virtual tables known as views
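
The cost of joins and the role of views can be seen in a small SQLite sketch. It assumes three normalized tables and one view that presents the joined result; the table definitions, the view name `ASSIGNMENT_REPORT`, and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE JOB (
        JOB_CODE     INTEGER PRIMARY KEY,
        JOB_CHG_HOUR REAL
    );
    CREATE TABLE EMPLOYEE (
        EMP_NUM  INTEGER PRIMARY KEY,
        EMP_NAME TEXT,
        JOB_CODE INTEGER REFERENCES JOB (JOB_CODE)
    );
    CREATE TABLE ASSIGNMENT (
        ASSIGN_NUM   INTEGER PRIMARY KEY,
        EMP_NUM      INTEGER REFERENCES EMPLOYEE (EMP_NUM),
        ASSIGN_HOURS REAL
    );
    INSERT INTO JOB VALUES (500, 40.0);
    INSERT INTO EMPLOYEE VALUES (101, 'Raj', 500);
    INSERT INTO ASSIGNMENT VALUES (1, 101, 2.5);

    -- The report needs two joins because the data lives in three tables.
    CREATE VIEW ASSIGNMENT_REPORT AS
    SELECT A.ASSIGN_NUM,
           E.EMP_NAME,
           A.ASSIGN_HOURS * J.JOB_CHG_HOUR AS CHARGE
    FROM ASSIGNMENT A
    JOIN EMPLOYEE E ON E.EMP_NUM = A.EMP_NUM
    JOIN JOB J ON J.JOB_CODE = E.JOB_CODE;
""")

report = conn.execute("SELECT * FROM ASSIGNMENT_REPORT").fetchall()
print(report)  # [(1, 'Raj', 100.0)]
```

A denormalized design would store EMP_NAME and the charge directly in the assignment rows, trading the joins for redundant data that must be kept consistent on every update.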

37
Data Modeling Checklist

Business rules
– Properly document and verify all business rules with the end users
– Ensure that all business rules are written precisely, clearly, and
simply
– The business rules must help identify entities, attributes,
relationships, and constraints
– Identify the source of all business rules, and ensure that each
business rule is justified, dated, and signed off by an approving
authority

38
Data Modeling Checklist

Data modeling
– Naming conventions: all names should be limited in
length (database-dependent size)

39
Naming Conventions

Entity names:
– Should be nouns that are familiar to business and should be short
and meaningful
– Should document abbreviations, synonyms, and aliases for each
entity
– Should be unique within the model
– For composite entities, may include a combination of abbreviated
names of the entities linked through the composite entity

40
Naming Conventions

Attribute names:
– Should be unique within the entity
– Should use the entity abbreviation as a prefix
– Should be descriptive of the characteristic
– Should use suffixes such as _ID, _NUM, or _CODE for the PK
attribute
– Should not be a reserved word
– Should not contain spaces or special characters such as @, !, or &
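
Several of these conventions can be checked automatically. The sketch below is illustrative only: the function name, the tiny reserved-word list, and the 30-character limit are hypothetical placeholders, since the real limits and reserved words depend on the DBMS.

```python
import re

# Small illustrative subset; the real list depends on the DBMS.
RESERVED_WORDS = {"SELECT", "TABLE", "ORDER", "GROUP", "DATE"}

def attribute_name_problems(name, entity_prefix, max_length=30):
    """Return a list of naming-convention problems found in an attribute name."""
    problems = []
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9_]*", name):
        problems.append("contains spaces or special characters")
    if name.upper() in RESERVED_WORDS:
        problems.append("is a reserved word")
    if not name.upper().startswith(entity_prefix.upper() + "_"):
        problems.append("does not use the entity abbreviation as a prefix")
    if len(name) > max_length:
        problems.append("is too long")
    return problems

print(attribute_name_problems("EMP_NUM", "EMP"))     # []
print(attribute_name_problems("EMP_E@MAIL", "EMP"))  # ['contains spaces or special characters']
```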

41
Naming Conventions

Relationship names:
– Should be active or passive verbs that clearly
indicate the nature of the relationship

42
Data Modeling Checklist

Entities:
– Each entity should represent a single subject
– Each entity should represent a set of distinguishable entity instances
– All entities should be in 3NF or higher
– Any entities below 3NF should be justified
– Granularity of the entity instance should be clearly defined
– PK should be clearly defined and support the selected data
granularity

43
Data Modeling Checklist

Attributes:
– Should be simple and single-valued (atomic data)
– Should document default values, constraints, synonyms, and
aliases
– Derived attributes should be clearly identified and include source(s)
– Should not be redundant unless this is required for transaction
accuracy, performance, or maintaining a history
– Nonkey attributes must be fully dependent on the PK attribute

44
Data Modeling Checklist

Relationships:
– Should clearly identify relationship participants
– Should clearly define participation and connectivity, and
document cardinality

45
Data Modeling Checklist

ER model:
– Should be validated against expected processes: inserts, updates, and
deletions
– Should evaluate where, when, and how to maintain a history
– Should not contain redundant relationships except as required (see
attributes)
– Should minimize data redundancy to ensure single-place updates
– Should conform to the minimal data rule: All that is needed is there,
and all that is there is needed

46
