DBMS
In defining relational algebra and calculus, the alternative of referring to fields by position is
more convenient than referring to fields by name: Queries often involve the computation of
intermediate results, which are themselves relation instances, and if we use field names to refer
to fields, the definition of query language constructs must specify the names of fields for all
intermediate relation instances.
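We use the following schema in our examples: Sailors(sid: integer, sname: string, rating: integer,
age: real), Boats(bid: integer, bname: string, color: string), and Reserves(sid: integer, bid: integer,
day: date). The domain of each field is listed after the field name.
Here sid is the key for Sailors, bid is the key for Boats, and all three fields together form the key
for Reserves. Fields in an instance of one of these relations will be referred to by name, or
positionally, using the order in which they are listed above.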
RELATIONAL ALGEBRA
Relational algebra is one of the two formal query languages associated with the relational
model. Queries in algebra are composed using a collection of operators. A fundamental
property is that every operator in the algebra accepts (one or two) relation instances as
arguments and returns a relation instance as the result.
Each relational query describes a step-by-step procedure for computing the desired
answer, based on the order in which operators are applied in the query.
Selection and Projection
Relational algebra includes operators to select rows from a relation (σ) and to project columns
(π). These operations allow us to manipulate data in a single relation. Consider the instance S2 of
Sailors. The expression σrating>8(S2) retrieves the rows of S2 for sailors whose rating is greater than 8.
The selection operator σ specifies the tuples to retain through a selection condition. In general,
the selection condition is a boolean combination (i.e., an expression using the logical connectives
∧ and ∨) of terms that have the form attribute op constant or attribute1 op attribute2, where op is
one of the comparison operators <, <=, =, ≠, >=, or >.
The projection operator π allows us to extract columns from a relation; for example, we can find
out all sailor names and ratings by using π. The expression πsname,rating(S2) evaluates to a
relation containing only the sname and rating fields of S2.
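Suppose that we wanted to find out only the ages of sailors. The expression
πage(S2)
evaluates to a relation with the single field age. Even if several sailors in S2 are aged 35, only
a single tuple with age=35.0 appears in the result of the projection. This follows from
the definition of a relation as a set of tuples. In practice, real systems often omit the expensive
step of eliminating duplicate tuples, leading to relations that are multisets. However, our
discussion of relational algebra and calculus assumes that duplicate elimination is always done
so that relations are always sets of tuples.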
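As a rough illustration of σ and π, the following Python sketch models a relation instance as a set of tuples. The rows of S2 and the helper names select and project are assumptions made for this example, not part of the notes.

```python
# Assumed Sailors instance S2; each tuple is (sid, sname, rating, age).
S2 = {
    (28, "yuppy", 9, 35.0),
    (31, "Lubber", 8, 55.5),
    (44, "guppy", 5, 35.0),
    (58, "Rusty", 10, 35.0),
}
FIELDS = ("sid", "sname", "rating", "age")

def select(relation, predicate):
    """sigma: keep the tuples that satisfy the selection condition."""
    return {t for t in relation if predicate(dict(zip(FIELDS, t)))}

def project(relation, names):
    """pi: keep only the named fields; building a set removes duplicates."""
    positions = [FIELDS.index(n) for n in names]
    return {tuple(t[i] for i in positions) for t in relation}

experts = select(S2, lambda row: row["rating"] > 8)   # sigma rating>8 (S2)
ages = project(S2, ["age"])                           # pi age (S2)
print(sorted(experts))
print(sorted(ages))
```

Because the result of project is a Python set, the three sailors aged 35.0 in this assumed instance contribute a single tuple, which mirrors the duplicate elimination discussed above.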
Set Operations
The following standard operations on sets are also available in relational algebra: union (∪),
intersection (∩), set-difference (−), and cross-product (×).
Union: R ∪ S returns a relation instance containing all tuples that occur in either
relation instance R or relation instance S (or both). R and S must be union-compatible, and
the schema of the result is defined to be identical to the schema of R.
Figure 4.9 S1 ∩ S2
sid  sname   rating  age
22   Dustin  7       45.0
Figure 4.10 S1 − S2
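The set operations map directly onto Python's built-in set type. The sketch below uses assumed instances S1, S2, and R1 (the rows are chosen for illustration so that S1 − S2 agrees with Figure 4.10); it is meant only to make the definitions concrete.

```python
# Assumed union-compatible Sailors instances; tuples are (sid, sname, rating, age).
S1 = {(22, "Dustin", 7, 45.0), (31, "Lubber", 8, 55.5), (58, "Rusty", 10, 35.0)}
S2 = {(28, "yuppy", 9, 35.0), (31, "Lubber", 8, 55.5),
      (44, "guppy", 5, 35.0), (58, "Rusty", 10, 35.0)}
# Assumed Reserves instance; tuples are (sid, bid, day).
R1 = {(22, 101, "10/10/96"), (58, 103, "11/12/96")}

union = S1 | S2                # tuples in S1, in S2, or in both
intersection = S1 & S2         # tuples in both S1 and S2
difference = S1 - S2           # tuples in S1 but not in S2
# Cross-product: every S1 tuple concatenated with every R1 tuple.
cross_product = {s + r for s in S1 for r in R1}

print(sorted(difference))
print(len(cross_product))
```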
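A cross-product can produce a result with two fields of the same name (for example, both S1 and
R1 have a field called sid), so it is convenient to be able to name the fields of a relation instance
defined by an algebra expression explicitly. We introduce a renaming operator ρ for this purpose.
The expression ρ(R(F), E) takes an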
arbitrary relational algebra expression E and returns an instance of a (new) relation called R. R
contains the same tuples as the result of E, and has the same schema as E, but some fields are
renamed. The field names in relation R are the same as in E, except for fields renamed in the
renaming list F.
For example, the expression ρ(C(1 → sid1, 5 → sid2), S1 × R1) returns a relation that contains
the tuples shown in Figure 4.11 and has the following schema: C(sid1: integer, sname: string,
rating: integer, age: real, sid2: integer, bid: integer, day: dates).
The join operation is one of the most useful operations in relational algebra and is the most
commonly used way to combine information from two or more relations. Although a join can
be defined as a cross-product followed by selections and projections, joins arise much more
frequently in practice than plain cross-products. For this reason, joins have received a lot of
attention, and there are several variants of the join operation.
Condition Joins
The most general version of the join operation accepts a join condition c and a pair of relation
instances as arguments, and returns a relation instance. The join condition is identical to a
selection condition in form. The operation is defined as follows:
R ⊲⊳c S = σc(R × S)
Thus ⊲⊳ is defined to be a cross-product followed by a selection. Note that the condition c can
(and typically does) refer to attributes of both R and S.
(sid)  sname   rating  age   (sid)  bid  day
22     Dustin  7       45.0  58     103  11/12/96
31     Lubber  8       55.5  58     103  11/12/96
Figure 4.12 S1 ⊲⊳S1.sid<R1.sid R1
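The definition of a condition join as a cross-product followed by a selection can be written out literally. In this sketch the instances S1 and R1 and the helper condition_join are assumptions for illustration; the rows are chosen so that the result agrees with Figure 4.12.

```python
from itertools import product

# Assumed instances: S1 tuples are (sid, sname, rating, age); R1 tuples are (sid, bid, day).
S1 = {(22, "Dustin", 7, 45.0), (31, "Lubber", 8, 55.5), (58, "Rusty", 10, 35.0)}
R1 = {(22, 101, "10/10/96"), (58, 103, "11/12/96")}

def condition_join(r, s, condition):
    """Compute the join literally as sigma_c(R x S)."""
    cross = {a + b for a, b in product(r, s)}      # R x S
    return {t for t in cross if condition(t)}      # sigma_c

# In the concatenated tuple, position 0 is S1.sid and position 4 is R1.sid.
result = condition_join(S1, R1, lambda t: t[0] < t[4])
for row in sorted(result):
    print(row)
```

A real system would avoid materializing the cross-product; the sketch only follows the definition.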
Equijoin
A common special case of the join operation R ⊲⊳ S is when the join condition consists solely
of equalities (connected by ∧) of the form R.name1 = S.name2, that is, equalities between two
fields in R and S. In this case, obviously, there is some redundancy in retaining both attributes
in the result.
Natural Join
A further special case of the join operation R ⊲⊳ S is an equijoin in which equalities are
specified on all fields having the same name in R and S. In this case, we can simply omit the
join condition; the default is that the join condition is a collection of equalities on all common
fields.
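A natural join can be sketched in the same style: equate all common field names and keep each common field only once. The function natural_join, the field lists, and the sample rows below are assumptions for illustration.

```python
def natural_join(r, r_fields, s, s_fields):
    """Equijoin on all fields with the same name, keeping one copy of each."""
    common = [f for f in r_fields if f in s_fields]
    s_rest = [f for f in s_fields if f not in common]
    result = set()
    for a in r:
        for b in s:
            # The implicit join condition: equality on every common field.
            if all(a[r_fields.index(f)] == b[s_fields.index(f)] for f in common):
                result.add(a + tuple(b[s_fields.index(f)] for f in s_rest))
    return list(r_fields) + s_rest, result

# Assumed instances.
S1 = {(22, "Dustin", 7, 45.0), (31, "Lubber", 8, 55.5), (58, "Rusty", 10, 35.0)}
R1 = {(22, 101, "10/10/96"), (58, 103, "11/12/96")}

fields, joined = natural_join(S1, ["sid", "sname", "rating", "age"],
                              R1, ["sid", "bid", "day"])
print(fields)
print(sorted(joined))
```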
The division operator is useful for expressing certain kinds of queries, for example: “Find the
names of sailors who have reserved all boats.” Understanding how to use the basic operators of
the algebra to define division is a useful exercise.
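One way to work through that exercise is to express A/B with projection, cross-product, and set-difference only: take all x values in A, then remove every x for which some pair (x, y) with y in B is missing from A. The sketch below assumes A holds (sid, bid) pairs and, for brevity, represents B as a set of plain bid values rather than one-field tuples; the data is made up.

```python
def divide(a, b):
    """A/B: the x values paired in A with every y value in B."""
    xs = {x for x, _ in a}                        # pi_x(A)
    all_pairs = {(x, y) for x in xs for y in b}   # pi_x(A) x B
    missing = all_pairs - a                       # pairs that A lacks
    disqualified = {x for x, _ in missing}        # pi_x of the missing pairs
    return xs - disqualified

# Assumed data: (sid, bid) pairs from Reserves and the bids from Boats.
reserved = {(22, 101), (22, 102), (22, 103), (31, 102), (31, 103), (58, 103)}
all_bids = {101, 102, 103}

print(divide(reserved, all_bids))
```

Only sailor 22 is paired with every boat in this assumed instance, so 22 is the only value that survives the final set-difference.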
(Q1) Find the names of sailors who have reserved boat 103.
(Q2) Find the names of sailors who have reserved a red boat.
πsname((σcolor=′red′ Boats) ⊲⊳ Reserves ⊲⊳ Sailors)
This query involves a series of two joins. First we choose (tuples describing) red boats.
(Q4) Find the names of sailors who have reserved at least one boat.
πsname(Sailors ⊲⊳ Reserves)
(Q5) Find the names of sailors who have reserved a red or a green boat.
ρ(Tempboats, (σcolor=′red′ Boats) ∪ (σcolor=′green′ Boats))
πsname(Tempboats ⊲⊳ Reserves ⊲⊳ Sailors)
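(Q6) Find the names of sailors who have reserved a red and a green boat.
It is tempting to try to do this by simply replacing ∪ by ∩ in the definition of Tempboats:
ρ(Tempboats2, (σcolor=′red′ Boats) ∩ (σcolor=′green′ Boats))
πsname(Tempboats2 ⊲⊳ Reserves ⊲⊳ Sailors)
However, this solution is incorrect: it instead tries to compute sailors who have reserved a boat
that is both red and green. The correct approach is to find sailors who have reserved a red boat,
then sailors who have reserved a green boat, and then take the intersection of these two sets:
ρ(Tempred, πsid((σcolor=′red′ Boats) ⊲⊳ Reserves))
ρ(Tempgreen, πsid((σcolor=′green′ Boats) ⊲⊳ Reserves))
πsname((Tempred ∩ Tempgreen) ⊲⊳ Sailors)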
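The difference between the two formulations of Q6 shows up clearly on a small example. The instances below are made up for illustration, and Reserves is reduced to (sid, bid) pairs.

```python
# Assumed instances.
boats = {(101, "Interlake", "blue"), (102, "Interlake", "red"),
         (103, "Clipper", "green"), (104, "Marine", "red")}
reserves = {(22, 101), (22, 102), (22, 103), (31, 102), (64, 103)}
sailors = {(22, "Dustin"), (31, "Lubber"), (64, "Horatio")}

def bids_with_color(color):
    return {bid for bid, _, c in boats if c == color}

def sids_reserving(color):
    """pi_sid((sigma_color Boats) join Reserves)."""
    return {sid for sid, bid in reserves if bid in bids_with_color(color)}

# Incorrect: intersecting the boats looks for a boat that is both red and green.
both_colors = bids_with_color("red") & bids_with_color("green")

# Correct: intersect the sailors, as in Tempred and Tempgreen.
tempred = sids_reserving("red")
tempgreen = sids_reserving("green")
names = {sname for sid, sname in sailors if sid in tempred & tempgreen}

print(both_colors)
print(names)
```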
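(Q7) Find the names of sailors who have reserved at least two boats.
ρ(Reservations, πsid,sname,bid(Sailors ⊲⊳ Reserves))
ρ(Reservationpairs(1 → sid1, 2 → sname1, 3 → bid1, 4 → sid2, 5 → sname2, 6 → bid2),
Reservations × Reservations)
πsname1 σ(sid1=sid2) ∧ (bid1≠bid2) Reservationpairs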
(Q8) Find the sids of sailors with age over 20 who have not reserved a red boat.
πsid(σage>20 Sailors) − πsid((σcolor=′red′ Boats) ⊲⊳ Reserves ⊲⊳ Sailors)
This query illustrates the use of the set-difference operator. Again, we use the fact that sid is the
key for Sailors.
(Q9) Find the names of sailors who have reserved all boats.
The use of the word all (or every) is a good indication that the division operation might be
applicable:
ρ(Tempsids, (πsid,bid Reserves)/(πbid Boats))
πsname(Tempsids ⊲⊳ Sailors)
(Q10) Find the names of sailors who have reserved all boats called Interlake.
ρ(Tempsids, (πsid,bid Reserves)/(πbid(σbname=′Interlake′ Boats)))
πsname(Tempsids ⊲⊳ Sailors)
RELATIONAL CALCULUS
A tuple variable is a variable that takes on tuples of a particular relation schema as values. That
is, every value assigned to a given tuple variable has the same number and type of fields.
For a given assignment of tuples to its free variables, a compound formula F evaluates to true if
one of the following holds:
F is of the form ¬p, and p is not true; or of the form p ∧ q, and both p and q are true;
or of the form p ∨ q, and one of them is true; or of the form p ⇒ q, and q is true
whenever p is true.
F is of the form ∃R(p(R)), and there is some assignment of tuples to the free variables
in p(R), including the variable R, that makes the formula p(R) true.
F is of the form ∀R(p(R)), and there is some assignment of tuples to the free
variables in p(R) that makes the formula p(R) true no matter what tuple is assigned to
R.
(Q12) Find the names and ages of sailors with a rating above 7.
(Q13) Find the sailor name, boat id, and reservation date for each reservation.
{P | ∃R ∈ Reserves ∃S ∈ Sailors
(R.sid = S.sid ∧ P.bid = R.bid ∧ P.day = R.day ∧ P.sname = S.sname)}
(Q1) Find the names of sailors who have reserved boat 103.
{P | ∃S ∈ Sailors ∃R ∈ Reserves (R.sid = S.sid ∧ R.bid = 103 ∧ P.sname = S.sname)}
This query can be read as follows: “Retrieve all sailor tuples for which there exists
a tuple in Reserves, having the same value in the sid field, and with bid = 103.”
(Q2) Find the names of sailors who have reserved a red boat.
{P | ∃S ∈ Sailors ∃R ∈ Reserves (R.sid = S.sid ∧ P.sname = S.sname
∧ ∃B ∈ Boats (B.bid = R.bid ∧ B.color = ′red′))}
(Q7) Find the names of sailors who have reserved at least two boats.
{P | ∃S ∈ Sailors ∃R1 ∈ Reserves ∃R2 ∈ Reserves (S.sid = R1.sid
∧ R1.sid = R2.sid ∧ R1.bid ≠ R2.bid ∧ P.sname = S.sname)}
(Q9) Find the names of sailors who have reserved all boats.
{P | ∃S ∈ Sailors ∀B ∈ Boats
(∃R ∈ Reserves (S.sid = R.sid ∧ R.bid = B.bid ∧ P.sname = S.sname))}
(Q14) Find sailors who have reserved all red boats.
{S | S ∈ Sailors ∧ ∀B ∈ Boats
(B.color = ′red′ ⇒ (∃R ∈ Reserves (S.sid = R.sid ∧ R.bid = B.bid)))}
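The quantifiers in Q14 translate naturally into Python's all and any, with the implication p ⇒ q written as (not p) or q. The namedtuple types and the rows below are assumptions for illustration.

```python
from collections import namedtuple

Sailor = namedtuple("Sailor", "sid sname")
Boat = namedtuple("Boat", "bid color")
Reserve = namedtuple("Reserve", "sid bid")

# Assumed instances.
sailors = {Sailor(22, "Dustin"), Sailor(31, "Lubber")}
boats = {Boat(102, "red"), Boat(103, "green"), Boat(104, "red")}
reserves = {Reserve(22, 102), Reserve(22, 104), Reserve(31, 102), Reserve(31, 103)}

# {S | S in Sailors and for all B in Boats
#      (B.color = 'red' => exists R in Reserves (S.sid = R.sid and R.bid = B.bid))}
answer = {
    s for s in sailors
    if all(
        b.color != "red"
        or any(r.sid == s.sid and r.bid == b.bid for r in reserves)
        for b in boats
    )
}
print(answer)
```

Sailor 31 has reserved red boat 102 but not red boat 104, so the universally quantified condition fails for that tuple.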
A domain variable is a variable that ranges over the values in the domain of some attribute (e.g.,
the variable can be assigned an integer if it appears in an attribute
whose domain is the set of integers). A DRC query has the form {〈x1, x2, ..., xn〉 |
p(〈x1, x2, ..., xn〉)}, where each xi is either a domain variable or a constant and p(〈x1, x2, ...,
xn〉) denotes a DRC formula whose only free variables are the variables among the xi, 1 ≤ i ≤ n.
The result of this query is the set of all tuples 〈x1, x2, ..., xn〉 for which the formula evaluates to
true.
(Q1) Find the names of sailors who have reserved boat 103.
{〈N〉 | ∃I, T, A (〈I, N, T, A〉 ∈ Sailors
∧ ∃Ir, Br, D (〈Ir, Br, D〉 ∈ Reserves ∧ Ir = I ∧ Br = 103))}
(Q2) Find the names of sailors who have reserved a red boat.
This section presents the syntax of a simple SQL query and explains its meaning through a
conceptual evaluation strategy. A conceptual evaluation strategy is a way to evaluate the query
that is intended to be easy to understand, rather than efficient. A DBMS would typically execute
a query in a different and more efficient way.
The answer to this query with and without the keyword DISTINCT on instance S3 of Sailors is
shown in Figures 5.4 and 5.5. The only difference is that the tuple for Horatio appears twice if
DISTINCT is omitted; this is because there are two sailors called Horatio with age 35.
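The conceptual evaluation strategy is usually described in four steps: compute the cross-product of the tables in the FROM list, discard rows that fail the WHERE condition, keep only the columns in the select-list, and eliminate duplicates if DISTINCT is specified. The Python sketch below follows those steps naively; the function evaluate and the sample rows are assumptions for illustration, not how a DBMS actually executes a query.

```python
from itertools import product

def evaluate(from_tables, where, select, distinct=False):
    rows = product(*from_tables)                # 1. cross-product of the FROM list
    rows = [r for r in rows if where(*r)]       # 2. keep rows satisfying WHERE
    rows = [select(*r) for r in rows]           # 3. keep the select-list columns
    if distinct:
        rows = list(dict.fromkeys(rows))        # 4. drop duplicates, keep order
    return rows

# Assumed Sailors rows: (sid, sname, rating, age), with two sailors called Horatio.
sailors = [(58, "Rusty", 10, 35.0), (64, "Horatio", 7, 35.0), (74, "Horatio", 9, 35.0)]

name_and_age = lambda s: (s[1], s[3])
without_distinct = evaluate([sailors], lambda s: True, name_and_age)
with_distinct = evaluate([sailors], lambda s: True, name_and_age, distinct=True)
print(without_distinct)
print(with_distinct)
```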
SELECT S.sid, S.sname, S.rating, S.age
FROM Sailors AS S
WHERE S.rating > 7
(Q16) Find the sids of sailors who have reserved a red boat.
SELECT R.sid
FROM Boats B, Reserves R
WHERE B.bid = R.bid AND B.color = 'red'
(Q2) Find the names of sailors who have reserved a red boat.
SELECT S.sname
FROM Sailors S, Reserves R, Boats B
WHERE S.sid = R.sid AND R.bid = B.bid AND B.color = 'red'
(Q4) Find the names of sailors who have reserved at least one boat.
SQL supports a more general version of the select-list than just a list of columns. Each item in a
select-list can be of the form expression AS column name, where expression is any arithmetic or
string expression over column names (possibly prefixed by range variables) and constants.
(Q5) Compute increments for the ratings of persons who have sailed two different boats on
the same day.
(Q6) Find the ages of sailors whose name begins and ends with B and has at least three
characters.
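A query of this kind is typically written with the LIKE operator, where % matches zero or more arbitrary characters and _ matches exactly one. The sketch below runs one possible formulation against an in-memory SQLite database through Python's sqlite3 module; the rows are made up, and the names use upper-case letters only because SQLite's LIKE ignores case for ASCII letters by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT, "
             "rating INTEGER, age REAL)")
conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?, ?)", [
    (29, "BRUTUS", 1, 33.0),   # begins with B but does not end with B
    (32, "BOB", 8, 25.5),      # begins and ends with B, three characters
    (71, "BB", 10, 16.0),      # begins and ends with B, but only two characters
    (85, "ART", 3, 25.5),
])

# 'B_%B': a B, exactly one character, any further characters, then a B.
ages = [row[0] for row in conn.execute(
    "SELECT S.age FROM Sailors S WHERE S.sname LIKE 'B_%B'")]
print(ages)
```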
SQL provides three set-manipulation constructs that extend the basic query form presented
earlier. Since the answer to a query is a multiset of rows, it is natural to consider the use of
operations such as union, intersection, and difference. SQL supports these operations under the
names UNION, INTERSECT, and EXCEPT. SQL also provides other set operations: IN (to
check if an element is in a given set), op ANY, op ALL (to compare a value with the elements in a
given set, using comparison operator op), and EXISTS (to check if a set is empty). IN and
EXISTS can be prefixed by NOT, with the obvious modification to their meaning. We cover
UNION, INTERSECT, and EXCEPT in this section. Consider the following query:
(Q1) Find the names of sailors who have reserved both a red and a green boat.
SELECT S.sname
FROM Sailors S, Reserves R1, Boats B1, Reserves R2, Boats B2
WHERE S.sid = R1.sid AND R1.bid = B1.bid
AND S.sid = R2.sid AND R2.bid = B2.bid
AND B1.color = 'red' AND B2.color = 'green'
(Q2) Find the sids of all sailors who have reserved red boats but not green boats.
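One way to express this query is with EXCEPT: compute the sids of sailors who have reserved a red boat and subtract the sids of sailors who have reserved a green boat. The sketch below tries this on an in-memory SQLite database via Python's sqlite3 module; the table contents are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Boats (bid INTEGER PRIMARY KEY, bname TEXT, color TEXT);
CREATE TABLE Reserves (sid INTEGER, bid INTEGER, day TEXT);
INSERT INTO Boats VALUES (102, 'Interlake', 'red'), (103, 'Clipper', 'green'),
                         (104, 'Marine', 'red');
INSERT INTO Reserves VALUES (22, 102, '10/10/96'), (22, 103, '10/8/96'),
                            (31, 102, '11/10/96'), (64, 103, '9/8/96');
""")

query = """
SELECT R.sid
FROM Boats B, Reserves R
WHERE R.bid = B.bid AND B.color = 'red'
EXCEPT
SELECT R2.sid
FROM Boats B2, Reserves R2
WHERE R2.bid = B2.bid AND B2.color = 'green'
"""
red_not_green = sorted(row[0] for row in conn.execute(query))
print(red_not_green)
```

In this assumed instance sailor 22 has reserved both a red and a green boat and is therefore removed by EXCEPT, leaving only sailor 31.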
A nested query is a query that has another query embedded within it; the embedded query is
called a subquery.
(Q1) Find the names of sailors who have reserved boat 103.
SELECT S.sname
FROM Sailors S
WHERE S.sid IN ( SELECT R.sid
FROM Reserves R
WHERE R.bid = 103 )
(Q2) Find the names of sailors who have reserved a red boat.
SELECT S.sname
FROM Sailors S
WHERE S.sid IN ( SELECT R.sid
FROM Reserves R
WHERE R.bid IN ( SELECT B.bid
FROM Boats B
WHERE B.color = 'red' ) )
(Q3) Find the names of sailors who have not reserved a red boat.
SELECT S.sname
FROM Sailors S
WHERE S.sid NOT IN ( SELECT R.sid
FROM Reserves R
WHERE R.bid IN ( SELECT B.bid
FROM Boats B
WHERE B.color = 'red' ) )
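In the nested queries that we have seen thus far, the inner subquery has been completely
independent of the outer query. In general, the inner subquery could depend on the row that is
currently being examined in the outer query. Let us rewrite the following query once more, this
time with such a correlated subquery: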
(Q1) Find the names of sailors who have reserved boat number 103.
SELECT S.sname
FROM Sailors S
WHERE EXISTS ( SELECT *
FROM Reserves R
WHERE R.bid = 103
AND R.sid = S.sid )
Set-Comparison Operators
(Q1) Find sailors whose rating is better than some sailor called Horatio.
SELECT S.sid
FROM Sailors S
WHERE S.rating > ANY ( SELECT S2.rating
FROM Sailors S2
WHERE S2.sname = 'Horatio' )
(Q2) Find the sailors with the highest rating.
SELECT S.sid
FROM Sailors S
WHERE S.rating >= ALL ( SELECT S2.rating
FROM Sailors S2 )
(Q1) Find the names of sailors who have reserved both a red and a green boat.
SELECT S.sname
FROM Sailors S, Reserves R, Boats B
WHERE S.sid = R.sid AND R.bid = B.bid AND B.color = 'red'
AND S.sid IN ( SELECT S2.sid
FROM Sailors S2, Boats B2, Reserves R2
WHERE S2.sid = R2.sid AND R2.bid = B2.bid
AND B2.color = 'green' )
(Q2) Find the names of sailors who have reserved all boats.
SELECT S.sname
FROM Sailors S
WHERE NOT EXISTS (( SELECT B.bid
FROM Boats B )
EXCEPT
(SELECT R.bid
FROM Reserves R
WHERE R.sid = S.sid ))
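The query above reads: there is no boat that sailor S has not reserved. An equivalent formulation that avoids EXCEPT uses two nested NOT EXISTS tests; this is the variant run in the sketch below, on an in-memory SQLite database with made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT);
CREATE TABLE Boats (bid INTEGER PRIMARY KEY, color TEXT);
CREATE TABLE Reserves (sid INTEGER, bid INTEGER);
INSERT INTO Sailors VALUES (22, 'Dustin'), (31, 'Lubber');
INSERT INTO Boats VALUES (101, 'blue'), (102, 'red'), (103, 'green');
INSERT INTO Reserves VALUES (22, 101), (22, 102), (22, 103), (31, 102);
""")

# For each sailor S: no boat B exists for which no reservation of B by S exists.
query = """
SELECT S.sname
FROM Sailors S
WHERE NOT EXISTS ( SELECT B.bid
                   FROM Boats B
                   WHERE NOT EXISTS ( SELECT R.bid
                                      FROM Reserves R
                                      WHERE R.bid = B.bid
                                        AND R.sid = S.sid ) )
"""
reserved_all = [row[0] for row in conn.execute(query)]
print(reserved_all)
```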
AGGREGATE OPERATORS
We now consider a powerful class of constructs for computing aggregate values such as MIN
and SUM.
(Q32) Find the age of the youngest sailor who is eligible to vote (i.e., is at least 18 years old)
for each rating level with at least two such sailors.
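A query like this is typically written with GROUP BY and HAVING: the WHERE clause keeps voting-age sailors, GROUP BY forms one group per rating, and HAVING keeps only groups with at least two rows. The sketch below shows one plausible formulation on an in-memory SQLite database; the rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT, "
             "rating INTEGER, age REAL)")
conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?, ?)", [
    (22, "Dustin", 7, 45.0), (64, "Horatio", 7, 35.0),
    (31, "Lubber", 8, 55.5), (32, "Andy", 8, 25.5),
    (58, "Rusty", 10, 35.0), (71, "Zorba", 10, 16.0),
    (29, "Brutus", 1, 33.0),
])

query = """
SELECT S.rating, MIN(S.age) AS minage
FROM Sailors S
WHERE S.age >= 18
GROUP BY S.rating
HAVING COUNT(*) > 1
ORDER BY S.rating
"""
youngest = conn.execute(query).fetchall()
print(youngest)
```

Rating 10 is absent from the result because only one of its two sailors in this assumed instance is at least 18.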
(Q3) For each red boat, find the number of reservations for this boat.
SELECT B.bid, COUNT (*) AS sailorcount
FROM Boats B, Reserves R
WHERE R.bid = B.bid AND B.color = 'red'
GROUP BY B.bid
(Q4) Find the average age of sailors for each rating level that has at least two sailors.
(Q5) Find the average age of sailors who are of voting age (i.e., at least 18 years old) for each
rating level that has at least two sailors.
SELECT S.rating, AVG ( S.age ) AS avgage
FROM Sailors S
WHERE S.age >= 18
GROUP BY S.rating
HAVING 1 < ( SELECT COUNT (*)
FROM Sailors S2
WHERE S.rating = S2.rating )
(Q6) Find the average age of sailors who are of voting age (i.e., at least 18 years old) for
each rating level that has at least two such sailors.
SELECT S.rating, AVG ( S.age ) AS avgage
FROM Sailors S
WHERE S.age >= 18
GROUP BY S.rating
HAVING COUNT (*) > 1
Q6 is a variant of Q5. The answer to Q6 on instance S3 is shown in Figure 5.16. It differs from
the answer to Q5 in that there is no tuple for rating 10, since there is only one tuple with rating 10
and age ≥ 18.
This formulation of Q6 takes advantage of the fact that the WHERE clause is applied before
grouping is done; thus, only sailors with age ≥ 18 are left when grouping is done, and COUNT (*)
counts only such sailors. It is instructive to consider yet another way of writing this query:
SELECT Temp.rating, Temp.avgage
FROM ( SELECT S.rating, AVG ( S.age ) AS avgage,
COUNT (*) AS ratingcount
FROM Sailors S
WHERE S.age >= 18
GROUP BY S.rating ) AS Temp
WHERE Temp.ratingcount > 1
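The difference between counting all sailors in a rating level (Q5) and counting only voting-age sailors (Q6) can be checked on a small instance. The sketch below runs three formulations on an in-memory SQLite database; the rows are made up so that rating 10 has two sailors, only one of whom is at least 18.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT, "
             "rating INTEGER, age REAL)")
conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?, ?)", [
    (22, "Dustin", 7, 45.0), (64, "Horatio", 7, 35.0),
    (31, "Lubber", 8, 55.5), (32, "Andy", 8, 25.5),
    (58, "Rusty", 10, 35.0), (71, "Zorba", 10, 16.0),
    (29, "Brutus", 1, 33.0),
])

# Q5: the subquery counts all sailors with the same rating, of any age.
q5 = conn.execute("""
    SELECT S.rating, AVG(S.age) AS avgage FROM Sailors S
    WHERE S.age >= 18 GROUP BY S.rating
    HAVING 1 < (SELECT COUNT(*) FROM Sailors S2 WHERE S.rating = S2.rating)
    ORDER BY S.rating""").fetchall()

# Q6: COUNT(*) sees only the rows that survived the WHERE clause.
q6 = conn.execute("""
    SELECT S.rating, AVG(S.age) AS avgage FROM Sailors S
    WHERE S.age >= 18 GROUP BY S.rating
    HAVING COUNT(*) > 1
    ORDER BY S.rating""").fetchall()

# Q6 again, with the grouping done in a subquery in the FROM clause.
q6_nested = conn.execute("""
    SELECT Temp.rating, Temp.avgage
    FROM (SELECT S.rating, AVG(S.age) AS avgage, COUNT(*) AS ratingcount
          FROM Sailors S WHERE S.age >= 18 GROUP BY S.rating) AS Temp
    WHERE Temp.ratingcount > 1
    ORDER BY Temp.rating""").fetchall()

print(q5)
print(q6)
```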
NULL VALUES
Thus far, we have assumed that column values in a row are always known. In practice, column
values can be unknown. For example, when a sailor, say Dan, joins a yacht club, he may not yet
have a rating assigned. Since the definition for the Sailors table has a rating column, what row
should we insert for Dan? What is needed here is a special value that denotes unknown. SQL
provides a special column value called null to use in such situations.
Consider a comparison such as rating = 8. If this is applied to the row for Dan, is this condition
true or false? Since Dan’s rating is unknown, it is reasonable to say that this comparison should
evaluate to the value unknown.
SQL also provides a special comparison operator IS NULL to test whether a column value is
null; for example, we can say rating IS NULL, which would evaluate to true on the row
representing Dan. We can also say rating IS NOT NULL, which would evaluate to false on the
row for Dan.
Now, what about boolean expressions such as rating = 8 OR age < 40 and rating = 8 AND
age < 40? Considering the row for Dan again, because age < 40, the first expression evaluates to
true regardless of the value of rating, but what about the second? We can only say unknown.
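These rules can be observed directly in a SQL engine. The sketch below inserts a row for Dan with a null rating into an in-memory SQLite database (the sid and age values are made up) and evaluates the expressions discussed above; Python's sqlite3 module reports the SQL value unknown/null as None and true/false as 1/0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT, "
             "rating INTEGER, age REAL)")
# Dan has no rating yet, so the rating column is null.
conn.execute("INSERT INTO Sailors VALUES (98, 'Dan', NULL, 30.0)")

row = conn.execute("""
    SELECT rating = 8,
           rating = 8 OR age < 40,
           rating = 8 AND age < 40,
           rating IS NULL,
           rating IS NOT NULL
    FROM Sailors
    WHERE sname = 'Dan'""").fetchone()
print(row)
```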
INTRODUCTION TO VIEWS
A view is a table whose rows are not explicitly stored in the database but are computed as needed
from a view definition. Consider the Students and Enrolled relations.
This view can be used just like a base table, or explicitly stored table, in defining new queries or
views.
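To make the idea of rows being computed as needed concrete, the sketch below defines a view over Students and Enrolled in an in-memory SQLite database. The column names, the view name BStudents, its definition (students with a grade of B), and the sample rows are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Students (sid TEXT PRIMARY KEY, name TEXT);
CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT);
-- Hypothetical view: names and courses of students who earned a B.
CREATE VIEW BStudents AS
    SELECT S.name AS name, S.sid AS sid, E.cid AS course
    FROM Students S, Enrolled E
    WHERE S.sid = E.sid AND E.grade = 'B';
INSERT INTO Students VALUES ('53666', 'Jones'), ('53832', 'Guldu');
INSERT INTO Enrolled VALUES ('53666', 'History105', 'A');
""")

# The view is queried like a base table.
before = conn.execute("SELECT name, course FROM BStudents").fetchall()

# Its rows are recomputed from the underlying tables on each use.
conn.execute("INSERT INTO Enrolled VALUES ('53832', 'Reggae203', 'B')")
after = conn.execute("SELECT name, course FROM BStudents").fetchall()
print(before, after)
```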
If we decide that we no longer need a base table and want to destroy it (i.e., delete all the rows
and remove the table definition information), we can use the DROP TABLE command. For
example, DROP TABLE Students RESTRICT destroys the Students table unless some view or
integrity constraint refers to Students; if so, the command fails. If the keyword RESTRICT is
replaced by CASCADE, Students is dropped and any referencing views or integrity constraints
are (recursively) dropped as well; one of these two keywords must always be specified. A view
can be dropped using the DROP VIEW command, which is just like DROP TABLE.
ALTER TABLE modifies the structure of an existing table. To add a column called maiden-name
to Students, for example, we would use the following command:
TRIGGERS
A trigger action can examine the answers to the query in the condition part of the trigger, refer to
old and new values of tuples modified by the statement activating the trigger, execute new
queries, and make changes to the database.
(identifying the modified table, Students, and the kind of modifying statement, an
INSERT), and the third field is the number of inserted Students tuples with age < 18.
(The trigger in Figure 5.19 only computes the count; an additional trigger is required to
insert the appropriate tuple into the statistics table.)
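As a rough, runnable approximation of such a trigger, the sketch below uses SQLite, which supports only row-level (FOR EACH ROW) triggers. It therefore swaps in a different technique from the statement-level trigger described above: the counter is incremented once per inserted row with age < 18 rather than once per statement. The table MinorCount, the trigger name, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Students (sid TEXT PRIMARY KEY, name TEXT, age INTEGER);
CREATE TABLE MinorCount (n INTEGER);      -- hypothetical statistics table
INSERT INTO MinorCount VALUES (0);

-- Event: an INSERT on Students. Condition: the new row has age < 18.
-- Action: increment the running count.
CREATE TRIGGER count_minors AFTER INSERT ON Students
WHEN NEW.age < 18
BEGIN
    UPDATE MinorCount SET n = n + 1;
END;
""")

conn.executemany("INSERT INTO Students VALUES (?, ?, ?)", [
    ("50000", "Dave", 19),
    ("53666", "Jones", 17),
    ("53688", "Smith", 16),
])
minors = conn.execute("SELECT n FROM MinorCount").fetchone()[0]
print(minors)
```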