Database Systems I Relational Algebra: CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 52
Database Systems I
Relational Algebra
Relational Query Languages
Query languages: Allow manipulation and
retrieval of data from a database.
Relational model supports simple, powerful
query languages:
Strong formal foundation based on logic.
High level, abstract formulation of queries.
Easy to program.
Allows the DBS to do much optimization.
DBS can choose, e.g., most efficient sorting
algorithm or the order of basic operations.
Relational Query Languages
Query Languages != programming languages!
QLs not expected to be Turing complete.
QLs not intended to be used for complex
calculations.
QLs support easy, efficient access to large data sets.
E.g., in a QL one cannot
determine whether the number of tuples of a table is
even or odd,
create a visualization of the results of a query,
ask the user for additional input.
Formal Query Languages
Two mathematical query languages form the
basis for real languages (e.g. SQL), and for
implementation:
Relational Algebra (RA): More procedural, very
useful for representing execution plans, relatively
close to SQL.
Relational Calculus (RC): Lets users describe what
they want, rather than how to compute it.
(Non-procedural, declarative.)
Understanding these formal query languages
is important for understanding SQL and
query processing.
Relational Algebra
An algebra consists of operators and operands.
Operands can be either variables or constants.
In the algebra of arithmetic, atomic operands
are variables such as x or y and constants
such as 15. Operators are the usual arithmetic
operators such as +, -, *.
Expressions are formed by applying operators
to atomic operands or other expressions.
For example,
15
x + 15
(x + 15) * y
Relational Algebra
Algebraic expressions can be re-ordered
according to commutativity or associativity
laws without changing their resulting value.
E.g.,
15 + 20 = 20 + 15
(x * y) * z = x * (y * z)
Parentheses group operators and define
precedence of operators, e.g.
(x + 15) * y
x + (15 * y)
Relational Algebra
In relational algebra, operands are relations /
tables, and an expression evaluates to a
relation / set of tuples.
The relational algebra operators are
set operations,
operations removing rows (selection) or columns
(projection) from a relation,
operations combining two relations into a new
one (Cartesian product, join),
a renaming operation, which changes the name of
the relation or of its attributes.
Preliminaries
A query is applied to relation instances, and the
result of a query is also a relation instance.
Schemas of input relations for a query are fixed (but
query will run regardless of instance!)
The schema for the result of a given query is also
fixed! Determined by definition of input relations and
query language constructs.
Positional vs. named-attribute notation:
Positional notation easier for formal definitions.
Named-attribute notation more readable.
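
The two notations can be illustrated with a small Python sketch. It is only an illustration, not part of the formal definitions: the type name Sailor is a hypothetical choice, and the attribute names follow the Sailors relation used in the examples of these slides.

```python
from collections import namedtuple

# Hypothetical tuple type; attribute names follow the Sailors examples.
Sailor = namedtuple("Sailor", ["sid", "sname", "rating", "age"])

row = Sailor(22, "dustin", 7, 45.0)

# Positional notation: refer to the third field (index 2, counting from 0).
rating_by_position = row[2]

# Named-attribute notation: refer to the same field by its name.
rating_by_name = row.rating
```

Both expressions pick out the same value; the named form stays readable when the schema has many attributes, while the positional form needs no attribute names at all.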
Example Instances
Sailors and Reserves relations for our examples.

R1 (Reserves)
sid  bid  day
22   101  10/10/96
58   103  11/12/96

S1 (Sailors)
sid  sname   rating  age
22   dustin  7       45.0
31   lubber  8       55.5
58   rusty   10      35.0

S2 (Sailors)
sid  sname   rating  age
28   yuppy   9       35.0
31   lubber  8       55.5
44   guppy   5       35.0
58   rusty   10      35.0

We'll use positional or named-attribute notation and
assume that names of attributes in query results are
"inherited" from names of attributes in query input
relations.
Relational Algebra Operations
Basic operations
Selection (σ)
Selects a subset of rows from a relation.
Projection (π)
Deletes unwanted columns from a relation.
Cartesian product (×)
Combines two relations.
Set-difference (−)
Tuples in relation 1, but not in relation 2.
Union (∪)
Tuples in relation 1 or in relation 2.
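
The five basic operations can be sketched in Python by treating a relation as a set of tuples in positional notation. This is a minimal illustration under that assumption, not how a DBS implements them; the function names select, project and product are hypothetical, and the data are the example instances S1, S2 and R1.

```python
def select(rows, pred):
    """Selection: keep the rows that satisfy the predicate."""
    return {r for r in rows if pred(r)}

def project(rows, positions):
    """Projection: keep only the listed column positions (duplicates vanish)."""
    return {tuple(r[i] for i in positions) for r in rows}

def product(r1, r2):
    """Cartesian product: concatenate every row of r1 with every row of r2."""
    return {a + b for a in r1 for b in r2}

# Example instances; columns are (sid, sname, rating, age) and (sid, bid, day).
S1 = {(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5), (58, "rusty", 10, 35.0)}
S2 = {(28, "yuppy", 9, 35.0), (31, "lubber", 8, 55.5),
      (44, "guppy", 5, 35.0), (58, "rusty", 10, 35.0)}
R1 = {(22, 101, "10/10/96"), (58, 103, "11/12/96")}

high_rated = select(S2, lambda r: r[2] > 8)   # rows with rating above 8
ages = project(S2, [3])                       # the age column only
pairs = product(S1, R1)                       # every sailor with every reservation
only_in_s1 = S1 - S2                          # set-difference
in_either = S1 | S2                           # union
```

Because relations are sets, the projection onto age returns two tuples rather than four: the three rows with age 35.0 collapse into one.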
(R ⋈ S) ≡ (S ⋈ R) (Commutative)
R ⋈ (S ⋈ T) ≡ (R ⋈ S) ⋈ T (Associative)
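
The commutative and associative laws above can be checked on small instances. The sketch below assumes the operator in the two laws is the natural join and uses named-attribute rows so that column order does not matter. The helper names rel and natural_join are hypothetical, and the Boats relation T with its color attribute is invented for illustration only.

```python
def rel(schema, rows):
    """Build a relation whose rows are frozensets of (attribute, value) pairs."""
    return {frozenset(zip(schema, row)) for row in rows}

def natural_join(r, s):
    """Natural join: combine rows that agree on all shared attribute names."""
    out = set()
    for a in r:
        da = dict(a)
        for b in s:
            db = dict(b)
            if all(da[k] == db[k] for k in da.keys() & db.keys()):
                out.add(frozenset({**da, **db}.items()))
    return out

R = rel(("sid", "bid", "day"), [(22, 101, "10/10/96"), (58, 103, "11/12/96")])
S = rel(("sid", "sname", "rating", "age"),
        [(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5), (58, "rusty", 10, 35.0)])
# Hypothetical third relation, not part of the example instances.
T = rel(("bid", "color"), [(101, "blue"), (103, "green")])

commutes = natural_join(R, S) == natural_join(S, R)
associates = (natural_join(R, natural_join(S, T))
              == natural_join(natural_join(R, S), T))
```

Both results are the same relation, but the intermediate results differ in size: S ⋈ T shares no attributes and degenerates to a Cartesian product of 6 rows, while R ⋈ S has only 2 rows. This is why the order of joins matters for optimization.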
Relational Algebra Equivalences
Selections are crucial from the point of view of
query optimization, because they typically
reduce the size of intermediate results by a
significant factor.
Laws for selections only:
σ_{c1 AND ... AND cn}(R) ≡ σ_{c1}(... σ_{cn}(R) ...) (Splitting)
σ_{c1}(σ_{c2}(R)) ≡ σ_{c2}(σ_{c1}(R)) (Commutative)
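
The splitting and commutative laws for selections can be checked on the example instance S2. This is a minimal sketch with relations as sets of positional tuples; the function name select and the two conditions c1 and c2 are hypothetical choices for illustration.

```python
def select(rows, pred):
    """Selection: keep the rows that satisfy the predicate."""
    return {r for r in rows if pred(r)}

# Example instance S2; columns are (sid, sname, rating, age).
S2 = {(28, "yuppy", 9, 35.0), (31, "lubber", 8, 55.5),
      (44, "guppy", 5, 35.0), (58, "rusty", 10, 35.0)}

c1 = lambda r: r[2] > 5       # rating > 5
c2 = lambda r: r[3] == 35.0   # age = 35.0

# One selection with the conjunction c1 AND c2.
combined = select(S2, lambda r: c1(r) and c2(r))

# Splitting: apply the conditions one after the other.
split = select(select(S2, c2), c1)

# Commutativity: the order of the two selections does not matter.
swapped = select(select(S2, c1), c2)
```

All three expressions yield the same relation, so an optimizer is free to apply the cheaper or more selective condition first.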
Relational Algebra Equivalences
Laws for the combination of selections and
other operations:
σ_c(R ⋈ S) ≡ σ_c(R) ⋈ S
if R has all attributes mentioned in c
σ_c(R ⋈ S) ≡ R ⋈ σ_c(S)
if S has all attributes mentioned in c
The above laws can be applied to push
selections down as far as possible in an
expression, i.e., to perform selections as early
as possible.
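
Pushing a selection below a join can be sketched on the example instances S1 and R1. The sketch assumes the binary operator is a join on sid and represents relations as sets of positional tuples; the function names select and join_on_sid are hypothetical.

```python
def select(rows, pred):
    """Selection: keep the rows that satisfy the predicate."""
    return {r for r in rows if pred(r)}

def join_on_sid(sailors, reserves):
    """Join on equal sid (position 0 in both inputs); rows are concatenated."""
    return {s + r for s in sailors for r in reserves if s[0] == r[0]}

# Example instances; columns are (sid, sname, rating, age) and (sid, bid, day).
S1 = {(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5), (58, "rusty", 10, 35.0)}
R1 = {(22, 101, "10/10/96"), (58, 103, "11/12/96")}

# The condition mentions only rating, an attribute of the Sailors input.
c = lambda r: r[2] > 8

# Selection applied after the join.
late = select(join_on_sid(S1, R1), c)

# Selection pushed down to the Sailors input before the join.
filtered_sailors = select(S1, c)
early = join_on_sid(filtered_sailors, R1)
```

Both plans return the same single row, but the pushed-down plan feeds one sailor into the join instead of three, which is the size reduction the laws are meant to exploit.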
Relational Algebra Equivalences
A projection commutes with a selection that
only uses attributes retained by the projection.
A selection that compares attributes of the two
arguments of a Cartesian product converts the
Cartesian product into a join.
Similarly, if a projection follows a join R ⋈ S,
we can "push" it by retaining only attributes of
R (and S) that are needed for the join or are
kept by the projection.
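
The three equivalences on this slide can be checked on the example instances S1 and R1. This is a minimal sketch with relations as sets of positional tuples; the function names select, project, product and join_on_first are hypothetical, and the particular conditions and column lists are chosen only for illustration.

```python
def select(rows, pred):
    return {r for r in rows if pred(r)}

def project(rows, positions):
    return {tuple(r[i] for i in positions) for r in rows}

def product(r1, r2):
    return {a + b for a in r1 for b in r2}

def join_on_first(r1, r2):
    """Join on equality of the first column of both inputs."""
    return {a + b for a in r1 for b in r2 if a[0] == b[0]}

# Example instances; columns are (sid, sname, rating, age) and (sid, bid, day).
S1 = {(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5), (58, "rusty", 10, 35.0)}
R1 = {(22, 101, "10/10/96"), (58, 103, "11/12/96")}

# 1. Projection commutes with a selection on a retained attribute (rating).
select_then_project = project(select(S1, lambda r: r[2] > 7), [1, 2])
# After projecting to (sname, rating), rating sits at position 1.
project_then_select = select(project(S1, [1, 2]), lambda r: r[1] > 7)

# 2. Selection across the two arguments of a product equals a join.
#    In the product, the Sailors sid is position 0 and the Reserves sid is 4.
via_product = select(product(S1, R1), lambda r: r[0] == r[4])
via_join = join_on_first(S1, R1)

# 3. Pushing a projection below the join: keep sid (needed for the join)
#    plus sname and bid (kept by the final projection to (sname, bid)).
late_projection = project(via_join, [1, 5])
slim_sailors = project(S1, [0, 1])      # (sid, sname)
slim_reserves = project(R1, [0, 1])     # (sid, bid)
early_projection = project(join_on_first(slim_sailors, slim_reserves), [1, 3])
```

In each pair the two expressions produce the same relation; the second form of items 2 and 3 avoids building the full product and carries narrower rows through the join.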