Database Systems I Relational Algebra: CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 52

Database Systems I

Relational Algebra
Relational Query Languages
Query languages: Allow manipulation and
retrieval of data from a database.
Relational model supports simple, powerful
query languages:
Strong formal foundation based on logic.
High level, abstract formulation of queries.
Easy to program.
Allows the DBS to do much optimization.
DBS can choose, e.g., most efficient sorting
algorithm or the order of basic operations.
Relational Query Languages
Query Languages != programming languages!
QLs not expected to be Turing complete.
QLs not intended to be used for complex
calculations.
QLs support easy, efficient access to large data sets.
E.g., in a QL one cannot
determine whether the number of tuples of a table is
even or odd,
create a visualization of the results of a query,
ask the user for additional input.
Formal Query Languages
Two mathematical query languages form the
basis for real languages (e.g. SQL), and for
implementation:
Relational Algebra (RA): More procedural, very
useful for representing execution plans, relatively
close to SQL.
Relational Calculus (RC): Lets users describe what
they want, rather than how to compute it.
(Non-procedural, declarative.)
Understanding these formal query languages
is important for understanding SQL and
query processing.
Relational Algebra
An algebra consists of operators and operands.
Operands can be either variables or constants.
In the algebra of arithmetic, atomic operands
are variables such as x or y and constants
such as 15. Operators are the usual arithmetic
operators such as +, -, *.
Expressions are formed by applying operators
to atomic operands or other expressions.
For example,
15
x + 15
(x + 15) * y
Relational Algebra
Algebraic expressions can be re-ordered
according to commutativity or associativity
laws without changing their resulting value.
E.g.,
15 + 20 = 20 + 15
(x * y) * z = x * (y * z)
Parentheses group operators and define
precedence of operators, e.g.
(x + 15) * y
x + (15 * y)
Relational Algebra
In relational algebra, operands are relations /
tables, and an expression evaluates to a
relation / set of tuples.
The relational algebra operators are
set operations,
operations removing rows (selection) or columns
(projection) from a relation,
operations combining two relations into a new
one (Cartesian product, join),
a renaming operation, which changes the name of
the relation or of its attributes.
Preliminaries
A query is applied to relation instances, and the
result of a query is also a relation instance.
Schemas of input relations for a query are fixed (but
query will run regardless of instance!)
The schema for the result of a given query is also
fixed! Determined by definition of input relations and
query language constructs.
Positional vs. named-attribute notation:
Positional notation easier for formal definitions.
Named-attribute notation more readable.
Example Instances
Sailors and Reserves relations for our examples.

R1:
sid  bid  day
22   101  10/10/96
58   103  11/12/96

S1:
sid  sname   rating  age
22   dustin  7       45.0
31   lubber  8       55.5
58   rusty   10      35.0

S2:
sid  sname   rating  age
28   yuppy   9       35.0
31   lubber  8       55.5
44   guppy   5       35.0
58   rusty   10      35.0

We'll use positional or named-attribute notation, and
assume that names of attributes in query results are
'inherited' from names of attributes in query input
relations.
Relational Algebra Operations
Basic operations
Selection ($\sigma$)
Selects a subset of rows from relation.
Projection ($\pi$)
Deletes unwanted columns from relation.
Cartesian product ($\times$)
Combine two relations.
Set-difference ($-$)
Tuples in relation 1, but not in relation 2.
Union ($\cup$)
Tuples in relation 1 or in relation 2.
Relational Algebra Operations
Renaming of relations / attributes.
Additional operations:
Intersection, join, division.
Not essential, can be implemented using the five
basic operations.
But (very!) useful.
Since each operation returns a relation,
operations can be composed, i.e. output of one
operation can be input of the next operation.
Algebra is closed!
Renaming
Renames relations / attributes, without changing
the relation instance.
$\rho_{S(A_1, A_2, \ldots, A_n)}(R)$
relation R is renamed to S,
attributes are renamed A1, . . ., An
Rename only some attributes
$\rho_{S(1 \to A_1, \ldots, k \to A_k)}(R)$
using the positional notation to reference attributes
No renaming of attributes
$\rho_{S}(R)$
Projection
One input relation.
Deletes attributes that are not in projection list.
Schema of result contains exactly the attributes in
the projection list, with the same names that they
had in the (only) input relation.
Projection operation has to eliminate duplicates,
since relations are sets.
Duplicate elimination is expensive.
Therefore, commercial DBMSs typically don't do
duplicate elimination unless the user explicitly
asks for it.
Projection

S2:
sid  sname   rating  age
28   yuppy   9       35.0
31   lubber  8       55.5
44   guppy   5       35.0
58   rusty   10      35.0

$\pi_{\mathit{sname}, \mathit{rating}}(S2)$:
sname   rating
yuppy   9
lubber  8
guppy   5
rusty   10

$\pi_{\mathit{age}}(S2)$:
age
35.0
55.5
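The projection example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a relation is modelled as a Python set of tuples with a separate tuple of attribute names, and the helper name `project` is made up for this sketch.

```python
# S2 from the example instances: attribute names plus a set of tuples.
S2_ATTRS = ("sid", "sname", "rating", "age")
S2 = {
    (28, "yuppy", 9, 35.0),
    (31, "lubber", 8, 55.5),
    (44, "guppy", 5, 35.0),
    (58, "rusty", 10, 35.0),
}

def project(attrs, rows, keep):
    """Projection with set semantics: keep only the listed attributes."""
    idx = [attrs.index(a) for a in keep]
    # Building a set removes duplicate result tuples automatically.
    return {tuple(row[i] for i in idx) for row in rows}

names_ratings = project(S2_ATTRS, S2, ["sname", "rating"])
ages = project(S2_ATTRS, S2, ["age"])
print(sorted(ages))  # [(35.0,), (55.5,)]
```

Three sailors in S2 share the age 35.0, yet the result contains it only once, which is the duplicate elimination the slides describe.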
Selection
One input relation.
Selects all tuples that satisfy selection condition.
No duplicates in result!
(Why?)
Schema of result identical to schema of (only)
input relation.
Selection conditions:
simple conditions comparing attribute values
(variables) and / or constants or
complex conditions that combine simple conditions
using logical connectives AND and OR.
Selection

S2:
sid  sname   rating  age
28   yuppy   9       35.0
31   lubber  8       55.5
44   guppy   5       35.0
58   rusty   10      35.0

$\sigma_{\mathit{rating} > 8}(S2)$:
sid  sname   rating  age
28   yuppy   9       35.0
58   rusty   10      35.0

$\pi_{\mathit{sname}, \mathit{rating}}(\sigma_{\mathit{rating} > 8}(S2))$:
sname   rating
yuppy   9
rusty   10
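A minimal Python sketch of the selection example, and of composing selection with projection, might look as follows. It assumes relations are sets of tuples addressed by position; the helper `select` and the constant names are illustrative only.

```python
S2 = {
    (28, "yuppy", 9, 35.0),
    (31, "lubber", 8, 55.5),
    (44, "guppy", 5, 35.0),
    (58, "rusty", 10, 35.0),
}
SNAME, RATING = 1, 2  # positions of sname and rating in a Sailors tuple

def select(rows, condition):
    """Selection: keep exactly the tuples that satisfy the condition."""
    return {row for row in rows if condition(row)}

# sigma_{rating > 8}(S2)
high_rated = select(S2, lambda t: t[RATING] > 8)

# pi_{sname, rating}(sigma_{rating > 8}(S2)): the output of one
# operation is the input of the next.
names = {(t[SNAME], t[RATING]) for t in high_rated}
print(sorted(names))  # [('rusty', 10), ('yuppy', 9)]
```

Selection only filters tuples of a relation that already has no duplicates, so the result cannot contain duplicates either.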
Union, Intersection, Set-Difference
All of these set operations take two input relations,
which must be union-compatible:
Same sets of attributes.
Corresponding attributes have same type.
What is the schema of result?

$S1 \cup S2$:
sid  sname   rating  age
22   dustin  7       45.0
31   lubber  8       55.5
58   rusty   10      35.0
44   guppy   5       35.0
28   yuppy   9       35.0

$S1 \cap S2$:
sid  sname   rating  age
31   lubber  8       55.5
58   rusty   10      35.0

$S1 - S2$:
sid  sname   rating  age
22   dustin  7       45.0
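Because S1 and S2 are union-compatible, the three set operations map directly onto Python's built-in set operators. The sketch below assumes relations are modelled as sets of tuples.

```python
S1 = {
    (22, "dustin", 7, 45.0),
    (31, "lubber", 8, 55.5),
    (58, "rusty", 10, 35.0),
}
S2 = {
    (28, "yuppy", 9, 35.0),
    (31, "lubber", 8, 55.5),
    (44, "guppy", 5, 35.0),
    (58, "rusty", 10, 35.0),
}

union = S1 | S2         # tuples in S1 or in S2
intersection = S1 & S2  # tuples in both S1 and S2
difference = S1 - S2    # tuples in S1 but not in S2

print(len(union))   # 5
print(difference)   # {(22, 'dustin', 7, 45.0)}
```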
Cartesian Product
Also referred to as cross-product or product.
Two input relations.
Each tuple of the one relation is paired with each
tuple of the other relation.
Result schema has one attribute per attribute of
both input relations, with attribute names
'inherited' if possible.
In the result, there may be two attributes with
the same name, e.g. both S1 and R1 have an
attribute called sid.
Then, apply the renaming operation, e.g.
$\rho_{(1 \to \mathit{sid1},\ 5 \to \mathit{sid2})}(S1 \times R1)$
Cartesian Product

S1:
sid  sname   rating  age
22   dustin  7       45.0
31   lubber  8       55.5
58   rusty   10      35.0

R1:
sid  bid  day
22   101  10/10/96
58   103  11/12/96

$S1 \times R1$:
(sid)  sname   rating  age   (sid)  bid  day
22     dustin  7       45.0  22     101  10/10/96
22     dustin  7       45.0  58     103  11/12/96
31     lubber  8       55.5  22     101  10/10/96
31     lubber  8       55.5  58     103  11/12/96
58     rusty   10      35.0  22     101  10/10/96
58     rusty   10      35.0  58     103  11/12/96
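The pairing of every S1 tuple with every R1 tuple can be sketched with `itertools.product` from the Python standard library. Modelling relations as sets of tuples and the helper name `cross` are assumptions of this sketch.

```python
from itertools import product

S1 = {
    (22, "dustin", 7, 45.0),
    (31, "lubber", 8, 55.5),
    (58, "rusty", 10, 35.0),
}
R1 = {
    (22, 101, "10/10/96"),
    (58, 103, "11/12/96"),
}

def cross(r, s):
    """Cartesian product: concatenate each tuple of r with each tuple of s."""
    return {t + u for t, u in product(r, s)}

S1xR1 = cross(S1, R1)
print(len(S1xR1))  # 6
```

In positional notation the two sid columns are simply positions 1 and 5 of each result tuple (indices 0 and 4 in Python), which is why no name clash arises here.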
Join
Similar to Cartesian product with same result
schema.
Each tuple of the one relation is paired with
each tuple of the other relation if the two
tuples satisfy the join condition.
Theta-Join:
$R \bowtie_c S = \sigma_c(R \times S)$

Example: $S1 \bowtie_{S1.\mathit{sid} < R1.\mathit{sid}} R1$
(sid)  sname   rating  age   (sid)  bid  day
22     dustin  7       45.0  58     103  11/12/96
31     lubber  8       55.5  58     103  11/12/96
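The definition of the theta-join as a selection over a Cartesian product translates almost literally into Python. This is a sketch under the assumption that relations are sets of tuples; `theta_join` is an illustrative name.

```python
from itertools import product

S1 = {
    (22, "dustin", 7, 45.0),
    (31, "lubber", 8, 55.5),
    (58, "rusty", 10, 35.0),
}
R1 = {
    (22, 101, "10/10/96"),
    (58, 103, "11/12/96"),
}

def theta_join(r, s, condition):
    """Theta-join: pair the tuples of r and s, keep pairs satisfying condition."""
    return {t + u for t, u in product(r, s) if condition(t, u)}

# S1 joined with R1 under the condition S1.sid < R1.sid (sid is position 0 in both).
result = theta_join(S1, R1, lambda s, r: s[0] < r[0])
for row in sorted(result):
    print(row)
```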

Join
Equi-Join: A special case of Theta-join where
the condition c contains only equalities.

Example: $S1 \bowtie_{\mathit{sid}} R1$
sid  sname   rating  age   bid  day
22   dustin  7       45.0  101  10/10/96
58   rusty   10      35.0  103  11/12/96

Result schema similar to Cartesian product,
but only one copy of attributes for which
equality is specified.
Natural Join: Equi-join on all common
attributes.
Example: $S1 \bowtie R1$
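A natural join needs attribute names to find the common attributes, so the sketch below carries a tuple of names alongside each set of tuples. The representation and the helper name `natural_join` are assumptions made for illustration.

```python
S1_ATTRS = ("sid", "sname", "rating", "age")
S1 = {
    (22, "dustin", 7, 45.0),
    (31, "lubber", 8, 55.5),
    (58, "rusty", 10, 35.0),
}
R1_ATTRS = ("sid", "bid", "day")
R1 = {
    (22, 101, "10/10/96"),
    (58, 103, "11/12/96"),
}

def natural_join(r_attrs, r, s_attrs, s):
    """Equi-join on all common attribute names, keeping one copy of each."""
    common = [a for a in r_attrs if a in s_attrs]
    s_rest = [a for a in s_attrs if a not in common]
    out_attrs = tuple(r_attrs) + tuple(s_rest)
    out = set()
    for t in r:
        for u in s:
            if all(t[r_attrs.index(a)] == u[s_attrs.index(a)] for a in common):
                out.add(t + tuple(u[s_attrs.index(a)] for a in s_rest))
    return out_attrs, out

attrs, rows = natural_join(S1_ATTRS, S1, R1_ATTRS, R1)
print(attrs)  # ('sid', 'sname', 'rating', 'age', 'bid', 'day')
```

Since sid is the only attribute S1 and R1 share, the natural join and the equi-join on sid give the same tuples for these instances.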
Division
Not supported as a primitive operation, but
useful for expressing queries like:
Find sailors who have reserved all boats.
Let A have 2 attributes, x and y; B have only
attribute y:
$A/B = \{ \langle x \rangle \mid \forall \langle y \rangle \in B : \langle x, y \rangle \in A \}$
i.e., A/B contains all x tuples (sailors) such that for
every y tuple (boat) in B, there is an xy tuple
(reservation) in A.
In general, x and y can be any lists of attributes;
y is the list of attributes in B, and $x \cup y$ is the list
of attributes of A.
Division

A:
sno  pno
s1   p1
s1   p2
s1   p3
s1   p4
s2   p1
s2   p2
s3   p2
s4   p2
s4   p4

B1:
pno
p2

B2:
pno
p2
p4

B3:
pno
p1
p2
p4

A/B1:
sno
s1
s2
s3
s4

A/B2:
sno
s1
s4

A/B3:
sno
s1
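The three quotients can be checked with a direct Python rendering of the definition of division. In this sketch A is a set of (sno, pno) pairs and each B is a plain set of pno values; both choices, and the name `divide`, are assumptions for illustration.

```python
A = {
    ("s1", "p1"), ("s1", "p2"), ("s1", "p3"), ("s1", "p4"),
    ("s2", "p1"), ("s2", "p2"),
    ("s3", "p2"),
    ("s4", "p2"), ("s4", "p4"),
}
B1 = {"p2"}
B2 = {"p2", "p4"}
B3 = {"p1", "p2", "p4"}

def divide(a, b):
    """A/B: all x occurring in A such that (x, y) is in A for every y in B."""
    xs = {x for (x, _) in a}
    return {x for x in xs if all((x, y) in a for y in b)}

print(sorted(divide(A, B1)))  # ['s1', 's2', 's3', 's4']
print(sorted(divide(A, B2)))  # ['s1', 's4']
print(sorted(divide(A, B3)))  # ['s1']
```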
Division
Division is not an essential operation; can be
implemented using the five basic operations.
Also true of joins, but joins are so common that
systems implement joins specially.
Idea: For A/B, compute all x values in A that are
not 'disqualified' by some y value in B.
x value in A is disqualified if by attaching y value
from B, we obtain an xy tuple that is not in A.

Disqualified x values:
$\pi_x((\pi_x(A) \times B) - A)$
A/B:
$\pi_x(A) \; - \;$ all disqualified x values
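The same quotient can be computed using only projection, Cartesian product and set-difference, following the "disqualified x values" idea. The sketch assumes A is a set of (x, y) pairs and B a set of y values; the function name is made up for this example.

```python
from itertools import product

A = {
    ("s1", "p1"), ("s1", "p2"), ("s1", "p3"), ("s1", "p4"),
    ("s2", "p1"), ("s2", "p2"),
    ("s3", "p2"),
    ("s4", "p2"), ("s4", "p4"),
}

def divide_with_basic_ops(a, b):
    """A/B expressed with projection, Cartesian product and set-difference."""
    xs = {x for (x, _) in a}              # pi_x(A)
    all_pairs = set(product(xs, b))       # pi_x(A) x B
    missing = all_pairs - a               # (pi_x(A) x B) - A
    disqualified = {x for (x, _) in missing}
    return xs - disqualified              # pi_x(A) - disqualified x values

print(sorted(divide_with_basic_ops(A, {"p2", "p4"})))  # ['s1', 's4']
```

For B = {p2, p4}, the pairs (s2, p4) and (s3, p4) are missing from A, so s2 and s3 are disqualified and s1 and s4 remain.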
Example Queries
Find names of sailors who've reserved boat #103.
Solution 1:
$\pi_{\mathit{sname}}((\sigma_{\mathit{bid}=103}\,\mathit{Reserves}) \bowtie \mathit{Sailors})$
Solution 2:
$\rho_{\mathit{Temp1}}(\sigma_{\mathit{bid}=103}\,\mathit{Reserves})$
$\rho_{\mathit{Temp2}}(\mathit{Temp1} \bowtie \mathit{Sailors})$
$\pi_{\mathit{sname}}(\mathit{Temp2})$
Solution 3:
$\pi_{\mathit{sname}}(\sigma_{\mathit{bid}=103}(\mathit{Reserves} \bowtie \mathit{Sailors}))$
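To see that selecting before the join (Solution 1) and after it (Solution 3) give the same answer, one can evaluate both on the example instances S1 and R1, used here as stand-ins for Sailors and Reserves. The tuple layout and the helper `join_on_sid` are assumptions of this sketch.

```python
# Example instances S1 and R1, used as Sailors and Reserves.
Sailors = {
    (22, "dustin", 7, 45.0),
    (31, "lubber", 8, 55.5),
    (58, "rusty", 10, 35.0),
}
Reserves = {
    (22, 101, "10/10/96"),
    (58, 103, "11/12/96"),
}

def join_on_sid(reserves, sailors):
    """Natural join on sid; result layout: (sid, bid, day, sname, rating, age)."""
    return {r + s[1:] for r in reserves for s in sailors if r[0] == s[0]}

# Solution 1: select bid = 103 first, then join, then project sname.
solution1 = {t[3] for t in join_on_sid({r for r in Reserves if r[1] == 103}, Sailors)}

# Solution 3: join first, then select bid = 103, then project sname.
solution3 = {t[3] for t in join_on_sid(Reserves, Sailors) if t[1] == 103}

print(solution1, solution3)  # {'rusty'} {'rusty'}
```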
Example Queries
Find names of sailors who've reserved a red boat.
Information about boat color only available in
Boats; so need an extra join:
$\pi_{\mathit{sname}}((\sigma_{\mathit{color}='red'}\,\mathit{Boats}) \bowtie \mathit{Reserves} \bowtie \mathit{Sailors})$
A more efficient solution:
$\pi_{\mathit{sname}}(\pi_{\mathit{sid}}((\pi_{\mathit{bid}}\,\sigma_{\mathit{color}='red'}\,\mathit{Boats}) \bowtie \mathit{Reserves}) \bowtie \mathit{Sailors})$
A query optimizer can find the second solution
given the first one.
Example Queries
Find sailors who've reserved a red or a green
boat.
Can identify all red or green boats, then find
sailors who've reserved one of these boats:
$\rho_{\mathit{Tempboats}}(\sigma_{\mathit{color}='red' \text{ OR } \mathit{color}='green'}\,\mathit{Boats})$
$\pi_{\mathit{sname}}(\mathit{Tempboats} \bowtie \mathit{Reserves} \bowtie \mathit{Sailors})$
Can also define Tempboats using union! (How?)
What happens if OR is replaced by AND in this
query?
Example Queries
Find sailors who've reserved a red and a green
boat.
Previous approach won't work! Must identify
sailors who've reserved red boats, sailors
who've reserved green boats, then find the
intersection (note that sid is a key for Sailors):
$\rho_{\mathit{Tempred}}(\pi_{\mathit{sid}}((\sigma_{\mathit{color}='red'}\,\mathit{Boats}) \bowtie \mathit{Reserves}))$
$\rho_{\mathit{Tempgreen}}(\pi_{\mathit{sid}}((\sigma_{\mathit{color}='green'}\,\mathit{Boats}) \bowtie \mathit{Reserves}))$
$\pi_{\mathit{sname}}((\mathit{Tempred} \cap \mathit{Tempgreen}) \bowtie \mathit{Sailors})$
Example Queries
Find the names of sailors who've reserved all
boats.
Uses division; schemas of the input relations
must be carefully chosen:
$\rho_{\mathit{Tempsids}}((\pi_{\mathit{sid}, \mathit{bid}}\,\mathit{Reserves}) / (\pi_{\mathit{bid}}\,\mathit{Boats}))$
$\pi_{\mathit{sname}}(\mathit{Tempsids} \bowtie \mathit{Sailors})$
To find sailors who've reserved all Interlake
boats:
$\ldots / \pi_{\mathit{bid}}(\sigma_{\mathit{bname}='Interlake'}\,\mathit{Boats})$
Query Optimization
A user of a commercial DBMS formulates SQL
queries.
The query optimizer translates this query into
an equivalent RA query, i.e. an RA query with
the same result.
In order to optimize the efficiency of query
processing, the query optimizer can re-order
the individual operations within the RA query.
Re-ordering has to preserve the query
semantics and is based on RA equivalences.

Query Optimization
Why can re-ordering improve the efficiency?
Different orders can imply different sizes of the
intermediate results.
The smaller the intermediate results, the more
efficient.
Example:
$((\sigma_{\mathit{color}='red'}\,\mathit{Boats}) \bowtie \mathit{Reserves}) \bowtie \mathit{Sailors}$
much (!) more efficient than
$\sigma_{\mathit{color}='red'}((\mathit{Reserves} \bowtie \mathit{Sailors}) \bowtie \mathit{Boats})$
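The effect of operation order on intermediate result sizes can be made concrete with a small experiment. Everything in this sketch is hypothetical: the synthetic data (100 sailors, 10 boats of which one is red, every sailor reserving every boat), the reduced schemas and the variable names are chosen only to make the sizes easy to follow.

```python
# Hypothetical synthetic instances (not from the slides).
sailors = {(sid, f"sailor{sid}") for sid in range(100)}
boats = {(bid, "red" if bid == 0 else "green") for bid in range(10)}
reserves = {(sid, bid) for sid in range(100) for bid in range(10)}

# Plan A: select red boats first, then join with Reserves, then with Sailors.
red_boats = {b for b in boats if b[1] == "red"}
a1 = {(sid, bid, color)
      for (sid, bid) in reserves for (b, color) in red_boats if bid == b}
a2 = {(sid, bid, color, name)
      for (sid, bid, color) in a1 for (s, name) in sailors if s == sid}

# Plan B: join everything first, apply the selection last.
b1 = {(sid, bid, name)
      for (sid, bid) in reserves for (s, name) in sailors if s == sid}
b2 = {(sid, bid, color, name)
      for (sid, bid, name) in b1 for (b, color) in boats if b == bid}
b3 = {t for t in b2 if t[2] == "red"}

print("Plan A intermediates:", len(red_boats), len(a1))  # 1 100
print("Plan B intermediates:", len(b1), len(b2))         # 1000 1000
print("Same final result:", a2 == b3)                    # True
```

Both plans return the same 100 tuples, but Plan B materializes intermediate relations ten times larger before the selection finally discards most of them.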
Relational Algebra Equivalences
The most important RA equivalences are
commutative and associative laws.
A commutative law about some operation states
that the order of (two) arguments does not
matter.
An associative law about some (binary) operation
states that (more than two) arguments can be
grouped either from the left or from the right.
If an operation is both commutative and
associative, then any number of arguments can
be (re-)ordered in an arbitrary manner.
Relational Algebra Equivalences
The following (binary) RA operations are
commutative and associative:
$\times$, $\bowtie$, $\cup$, $\cap$
For example, we have:
$(R \bowtie S) \equiv (S \bowtie R)$ (Commutative)
$R \bowtie (S \bowtie T) \equiv (R \bowtie S) \bowtie T$ (Associative)
Proof method: show that each tuple produced
by the expression on the left is also produced
by the expression on the right and vice versa.
Relational Algebra Equivalences
Selections are crucial from the point of view of
query optimization, because they typically
reduce the size of intermediate results by a
significant factor.
Laws for selections only:
$\sigma_{c_1 \text{ AND } \ldots \text{ AND } c_n}(R) \equiv \sigma_{c_1}(\ldots(\sigma_{c_n}(R))\ldots)$ (Splitting)
$\sigma_{c_1}(\sigma_{c_2}(R)) \equiv \sigma_{c_2}(\sigma_{c_1}(R))$ (Commutative)
Relational Algebra Equivalences
Laws for the combination of selections and
other operations:
$\sigma_c(R \bowtie S) \equiv \sigma_c(R) \bowtie S$
if R has all attributes mentioned in c
$\sigma_c(R \bowtie S) \equiv R \bowtie \sigma_c(S)$
if S has all attributes mentioned in c
The above laws can be applied to push
selections down as much as possible in an
expression, i.e. performing selections as early
as possible.
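Pushing a selection below a join can be checked on the example instances S1 and R1. The sketch assumes relations are sets of tuples, uses a hypothetical helper `join` for the natural join on sid, and picks the condition rating > 7, which mentions only attributes of S1.

```python
S1 = {
    (22, "dustin", 7, 45.0),
    (31, "lubber", 8, 55.5),
    (58, "rusty", 10, 35.0),
}
R1 = {
    (22, 101, "10/10/96"),
    (58, 103, "11/12/96"),
}
RATING = 2  # position of rating in an S1 tuple and in the join result

def join(sailors, reserves):
    """Natural join on sid; S1 attributes first, then bid and day."""
    return {s + r[1:] for s in sailors for r in reserves if s[0] == r[0]}

# Selection applied after the join.
late = {t for t in join(S1, R1) if t[RATING] > 7}

# Selection pushed down onto S1 before the join.
early = join({s for s in S1 if s[RATING] > 7}, R1)

print(late == early)  # True
```

The pushed-down version joins only two sailors instead of three; on realistic data sizes this reduction is where the efficiency gain comes from.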
Relational Algebra Equivalences
A projection commutes with a selection that
only uses attributes retained by the projection.
Selection between attributes of the two
arguments of a Cartesian product converts
Cartesian product to a join.
Similarly, if a projection follows a join $R \bowtie S$,
we can 'push' it by retaining only attributes of
R (and S) that are needed for the join or are
kept by the projection.

Summary
The relational model has formal query
languages that are easy to use and allow
efficient optimization by the DBS.
Relational algebra (RA) is more procedural;
used as internal representation for SQL query
evaluation plans.
Five basic RA operations: selection, projection,
Cartesian product, union, set-difference.
Additional operations as shorthand for
important cases: intersection, join, division.
These operations can be implemented using the
basic operations.
Summary
Several ways of expressing a given query; a
query optimizer chooses the most efficient
version.
Query optimization exploits RA
equivalences to re-order the operations
within an RA expression.
Optimization criterion is to minimize the size
of intermediate relations.
