SQL
CS 186, Spring 2007, Lecture 7
R&G, Chapter 5
Mary Roth
The important thing is not to stop questioning.
- Albert Einstein
Life is just a bowl of queries.
- Anon (not Forrest Gump)

Administrivia
• Homework 1 due Thursday, Feb 8, 10 p.m.
• Source code for diskmgr and global is available on the class web site
• Coming up:
– Homework 2 handed out Feb 13
– Midterm 1: in class February 22
• Questions?
Review
• Query languages provide 2 key advantages:
– Less work for user asking query
– More opportunities for optimization
• Algebra and safe calculus are simple and powerful
models of query languages for the relational model
– Have same expressive power
– Algebra is more operational; calculus is more
declarative
• SQL can express every query that is expressible in
relational algebra/calculus (and more).
Review: Where have we been?
[Figure: the DBMS stack. Practice layers, top to bottom: Query Optimization and Execution, Relational Operators, Files and Access Methods, Buffer Management, Disk Space Management. Theory layers: Relational Model, Relational Algebra, Relational Calculus. The layers covered so far are marked Lecture 2, Lectures 3 & 4, Lecture 5, and Lecture 6.]
Where are we going next?
[Figure: the same stack (Query Optimization and Execution, Relational Operators, Files and Access Methods, Buffer Management, Disk Space Management) with SQL added; annotated "This week", "Next week", and "After the midterm".]
Review: Relational Calculus Example
Find names, ages and reservation dates
of sailors rated > 7 who’ve reserved
boat #103
3 variables (S1, S, R), but only 1 is free.
The free variable, S1, defines the shape of the
result.
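$$\{\, S1 \mid \exists S \in \mathit{Sailors}\, \big( S.\mathit{rating} > 7 \land \exists R \in \mathit{Reserves}\, ( R.\mathit{bid} = 103 \land R.\mathit{sid} = S.\mathit{sid} \land S1.\mathit{sname} = S.\mathit{sname} \land S1.\mathit{age} = S.\mathit{age} \land S1.\mathit{day} = R.\mathit{day} ) \big) \,\}$$
Sailors (S ranges over these tuples)
sid  sname   rating  age
22   dustin  7       45.0
31   lubber  8       55.5
58   rusty   10      35.0
Reserves (R ranges over these tuples)
sid  bid  day
22   101  10/10/96
58   103  11/12/96
Result (S1)
sname  age   day
rusty  35.0  11/12/96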
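One way to see what the quantifiers do is to evaluate the query by brute force. In this Python sketch each ∃ becomes a loop over its relation, and S1 is built from every binding of S and R that satisfies the formula. Modeling tuples as dicts, and the function name `answer`, are assumptions of this illustration, not anything the calculus prescribes.

```python
# Brute-force evaluation of the relational calculus query above.
# Tuples are modeled as dicts (an assumption for this sketch).
sailors = [
    {"sid": 22, "sname": "dustin", "rating": 7, "age": 45.0},
    {"sid": 31, "sname": "lubber", "rating": 8, "age": 55.5},
    {"sid": 58, "sname": "rusty", "rating": 10, "age": 35.0},
]
reserves = [
    {"sid": 22, "bid": 101, "day": "10/10/96"},
    {"sid": 58, "bid": 103, "day": "11/12/96"},
]

def answer(sailors, reserves):
    result = []
    for s in sailors:                  # exists S in Sailors ...
        if not s["rating"] > 7:
            continue
        for r in reserves:             # ... exists R in Reserves ...
            if r["bid"] == 103 and r["sid"] == s["sid"]:
                # ... so the free variable S1 takes these attribute values
                s1 = {"sname": s["sname"], "age": s["age"], "day": r["day"]}
                if s1 not in result:   # calculus answers are sets
                    result.append(s1)
    return result

print(answer(sailors, reserves))
```

lubber passes the rating test but has no reservation for boat 103, so only rusty survives.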
Review: The SQL Query Language
• The most widely used relational query
language.
• Standardized
(although most systems add their own “special sauce”
-- including PostgreSQL)
• We will study SQL92 -- a basic subset
Review: SQL
• Two sublanguages:
– DDL – Data Definition Language
• Define and modify schema (at all 3 levels)
– DML – Data Manipulation Language
• Queries and IUD (insert, update, delete)
• DBMS is responsible for efficient evaluation.
– Relational completeness means we can define
precise semantics for relational queries.
– Optimizer can re-order operations, without affecting
query answer.
– Choices driven by “cost model”
Review: DDL
CREATE TABLE Sailors
  (sid INTEGER NOT NULL,
   sname CHAR(20),
   rating INTEGER,
   age REAL,
   PRIMARY KEY (sid));
CREATE TABLE Boats
  (bid INTEGER NOT NULL,
   bname CHAR(20),
   color CHAR(10),
   PRIMARY KEY (bid));
-- Each reservation links one sailor to one boat on one day;
-- each foreign key references the parent table's primary key.
CREATE TABLE Reserves
  (sid INTEGER NOT NULL,
   bid INTEGER NOT NULL,
   day DATE NOT NULL,
   PRIMARY KEY (sid, bid, day),
   FOREIGN KEY (sid) REFERENCES Sailors,
   FOREIGN KEY (bid) REFERENCES Boats);
Sailors
sid  sname  rating  age
1    Frodo  7       22
2    Bilbo  2       39
3    Sam    8       27
Boats
bid  bname        color
101  Nina         red
102  Pinta        blue
103  Santa Maria  red
Reserves
sid  bid  day
1    102  9/12
2    102  9/13
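As a quick sanity check, the DDL and sample instance can be loaded into SQLite through Python's standard sqlite3 module. This is a sketch assuming SQLite as a stand-in DBMS: it accepts this SQL-92 syntax but uses loose type affinity, so the DATE column simply stores the strings '9/12' and '9/13'.

```python
import sqlite3

# In-memory SQLite database used as a stand-in DBMS.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sailors (sid INTEGER NOT NULL, sname CHAR(20), rating INTEGER,
                      age REAL, PRIMARY KEY (sid));
CREATE TABLE Boats (bid INTEGER NOT NULL, bname CHAR(20), color CHAR(10),
                    PRIMARY KEY (bid));
CREATE TABLE Reserves (sid INTEGER NOT NULL, bid INTEGER NOT NULL,
                       day DATE NOT NULL,
                       PRIMARY KEY (sid, bid, day),
                       FOREIGN KEY (sid) REFERENCES Sailors,
                       FOREIGN KEY (bid) REFERENCES Boats);
""")

# Load the sample instance shown above.
conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?, ?)",
                 [(1, "Frodo", 7, 22), (2, "Bilbo", 2, 39), (3, "Sam", 8, 27)])
conn.executemany("INSERT INTO Boats VALUES (?, ?, ?)",
                 [(101, "Nina", "red"), (102, "Pinta", "blue"),
                  (103, "Santa Maria", "red")])
conn.executemany("INSERT INTO Reserves VALUES (?, ?, ?)",
                 [(1, 102, "9/12"), (2, 102, "9/13")])

for table in ("Sailors", "Boats", "Reserves"):
    count = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    print(table, count)
```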
Integrity Constraints (ICs)
• A foreign key constraint is an Integrity Constraint:
– a condition that must be true for any instance of the database;
– Specified when schema is defined.
– Checked when relations are modified.
• Beyond primary and foreign key constraints, databases support
more general constraints as well.
– e.g. domain constraints like:
• Rating must be between 1 and 10
ALTER TABLE SAILORS
ADD CONSTRAINT RATING_RANGE
CHECK (RATING >= 1 AND RATING <= 10)
• Or even more complex (and potentially nonsensical):
ALTER TABLE SAILORS
ADD CONSTRAINT RATING_AGE_SID
CHECK (RATING*AGE/4 <= SID)
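Here is a minimal sketch of the DBMS enforcing the rating domain constraint, again assuming SQLite via Python's sqlite3. SQLite has no ALTER TABLE ... ADD CONSTRAINT, so the CHECK is declared in CREATE TABLE instead; the constraint name `rating_range` and the helper `try_insert` are hypothetical names for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite cannot add a CHECK via ALTER TABLE, so it is declared at creation.
conn.execute("""
    CREATE TABLE Sailors (
        sid    INTEGER PRIMARY KEY,
        sname  CHAR(20),
        rating INTEGER,
        age    REAL,
        CONSTRAINT rating_range CHECK (rating >= 1 AND rating <= 10)
    )""")

def try_insert(row):
    """Insert one sailor; return False if a constraint rejects the row."""
    try:
        conn.execute("INSERT INTO Sailors VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError as err:
        print("rejected", row, "-", err)
        return False

print(try_insert((1, "Frodo", 7, 22)))   # accepted: 7 is in range
print(try_insert((2, "Bilbo", 10, 39)))  # accepted: the bound 10 is allowed
print(try_insert((3, "Sam", 0, 27)))     # rejected: an unrated sailor (rating 0)
```

The last insert is exactly the kind of rejection that can make constraints inconvenient for applications.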
DBMSs have fairly sophisticated support
for constraints!
• Specify them in CREATE TABLE or ALTER TABLE statements
• Column constraints:
a CHECK expression in a column constraint must produce a boolean result and
reference only that column's value.
{ NOT NULL | NULL | UNIQUE | PRIMARY KEY | CHECK ( expression ) |
  REFERENCES reftable [ ( refcolumn ) ]
  [ ON DELETE action ] [ ON UPDATE action ] }
action is one of:
NO ACTION, CASCADE, SET NULL, SET DEFAULT
• Table constraints:
{ UNIQUE ( column_name [, ... ] ) |
  PRIMARY KEY ( column_name [, ... ] ) |
  CHECK ( expression ) |
  FOREIGN KEY ( column_name [, ... ] ) REFERENCES reftable [ ( refcolumn [, ... ] ) ]
  [ ON DELETE action ]
  [ ON UPDATE action ] }
Here, expressions, keys, etc. can span multiple columns.
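A short sketch of a foreign key with an ON DELETE action, assuming SQLite via Python's sqlite3 (SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`). The schema is trimmed to the columns the demo needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname CHAR(20));
CREATE TABLE Reserves (
    sid INTEGER NOT NULL,
    bid INTEGER NOT NULL,
    day DATE NOT NULL,
    PRIMARY KEY (sid, bid, day),
    FOREIGN KEY (sid) REFERENCES Sailors (sid) ON DELETE CASCADE
);
INSERT INTO Sailors VALUES (1, 'Frodo'), (2, 'Bilbo');
INSERT INTO Reserves VALUES (1, 102, '9/12'), (2, 102, '9/13');
""")

# A reservation for a nonexistent sailor violates the foreign key.
try:
    conn.execute("INSERT INTO Reserves VALUES (9, 101, '9/14')")
except sqlite3.IntegrityError as err:
    print("rejected:", err)

# ON DELETE CASCADE: deleting sailor 1 also deletes sailor 1's reservations.
conn.execute("DELETE FROM Sailors WHERE sid = 1")
print(conn.execute("SELECT sid, bid, day FROM Reserves").fetchall())
```

With NO ACTION instead of CASCADE, the DELETE itself would be rejected while reservations still referenced sailor 1.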
Integrity Constraints can help prevent
data consistency errors
• …but they have drawbacks:
– Expensive: each constraint must be checked whenever the relations it
covers are modified.
– Can't always return a meaningful error to the application.
e.g., what if you saw this error when you enrolled in a
course online?
“A violation of the constraint imposed by a unique
index or a unique constraint occurred”.
– Can be inconvenient
e.g., what if the ‘Sailing Class’ application wants to register
new (unrated) sailors with rating 0?
• So they aren’t widely used
– Software developers often prefer to keep the integrity
logic in applications instead
Intermission
SQL DML
• DML includes 4 main statements:
SELECT (query), INSERT, UPDATE, and DELETE
We'll spend a lot of time on SELECT.
• e.g., to find the names of all 19-year-old students:
SELECT S.name
FROM Students S
WHERE S.age=19
(The SELECT clause performs a relational PROJECT; the WHERE clause performs a relational SELECT.)
Students
sid    name   login       age  gpa
53666  Jones  jones@cs    18   3.4
53688  Smith  smith@ee    18   3.2
53650  Smith  smith@math  19   3.8
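The same query can be run against the Students instance above; this is a sketch using SQLite through Python's sqlite3 as a stand-in DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students "
             "(sid INTEGER, name TEXT, login TEXT, age INTEGER, gpa REAL)")
conn.executemany("INSERT INTO Students VALUES (?, ?, ?, ?, ?)", [
    (53666, "Jones", "jones@cs", 18, 3.4),
    (53688, "Smith", "smith@ee", 18, 3.2),
    (53650, "Smith", "smith@math", 19, 3.8),
])

# WHERE keeps only the 19-year-olds (selection);
# the SELECT list keeps only their names (projection).
rows = conn.execute(
    "SELECT S.name FROM Students S WHERE S.age = 19").fetchall()
print(rows)  # [('Smith',)]
```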
Querying Multiple Relations
• Can specify a join over two tables as follows:
SELECT S.name, E.cid
FROM Students S, Enrolled E
WHERE S.sid=E.sid AND E.grade='B'
(The SELECT list is a PROJECT, S.sid=E.sid is a JOIN condition, and E.grade='B' is a SELECT.)
Enrolled
sid    cid          grade
53831  Carnatic101  C
53831  Reggae203    B
53650  Topology112  A
53666  History105   B
Students
sid    name   login     age  gpa
53666  Jones  jones@cs  18   3.4
53688  Smith  smith@ee  18   3.2
result =
S.name  E.cid
Jones   History105
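A sketch of the join in SQLite (via Python's sqlite3), using the full three-student Students instance from the previous example together with Enrolled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Students (sid INTEGER, name TEXT, login TEXT, age INTEGER, gpa REAL);
CREATE TABLE Enrolled (sid INTEGER, cid TEXT, grade TEXT);
INSERT INTO Students VALUES (53666, 'Jones', 'jones@cs', 18, 3.4),
                            (53688, 'Smith', 'smith@ee', 18, 3.2),
                            (53650, 'Smith', 'smith@math', 19, 3.8);
INSERT INTO Enrolled VALUES (53831, 'Carnatic101', 'C'),
                            (53831, 'Reggae203', 'B'),
                            (53650, 'Topology112', 'A'),
                            (53666, 'History105', 'B');
""")

# S.sid = E.sid is the join condition; E.grade = 'B' is a selection.
rows = conn.execute("""
    SELECT S.name, E.cid
    FROM Students S, Enrolled E
    WHERE S.sid = E.sid AND E.grade = 'B'
""").fetchall()
print(rows)  # [('Jones', 'History105')]
```

The Reggae203 enrollment also has grade B but drops out, because no student has sid 53831.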
Basic SQL Query
SELECT [DISTINCT] target-list
FROM relation-list
WHERE qualification
• relation-list: a list of relation names, possibly with a range variable after each name.
• target-list: a list of attributes of tables in relation-list.
• qualification: comparisons combined using AND, OR, and NOT. Comparisons are Attr op const or Attr1 op Attr2, where op is one of <, >, =, <=, >=, <>, etc.
• DISTINCT: optional keyword indicating that the answer should not contain duplicates. In SQL, the default is that duplicates are not eliminated! (The result is called a “multiset”.)
Query Semantics
• Semantics of an SQL query are defined in terms of the
following conceptual evaluation strategy:
1. FROM clause: compute cross-product of all tables
2. WHERE clause: Check conditions, discard tuples that fail.
(called “selection”).
3. SELECT clause: Delete unwanted fields. (called
“projection”).
4. If DISTINCT specified, eliminate duplicate rows.
• Probably the least efficient way to compute a query!
– An optimizer will find more efficient strategies to get the
same answer.
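The four-step conceptual evaluation strategy can be sketched directly in Python. `conceptual_eval` is a hypothetical helper, relations are lists of tuples, and the WHERE and SELECT clauses are passed in as functions; the sample instance is the Sailors/Reserves one from the example that follows.

```python
from itertools import product

def conceptual_eval(relations, where, select, distinct=False):
    """Naive SELECT-FROM-WHERE evaluation, following the four steps above.

    relations: list of relations, each a list of tuples (the FROM clause)
    where:     predicate on one row of the cross product (the WHERE clause)
    select:    maps a surviving row to an output tuple (the SELECT clause)
    """
    # 1. FROM: cross product, concatenating one tuple from each relation
    rows = [sum(combo, ()) for combo in product(*relations)]
    # 2. WHERE: selection
    rows = [row for row in rows if where(row)]
    # 3. SELECT: projection (duplicates are kept, so this is a multiset)
    out = [select(row) for row in rows]
    # 4. DISTINCT: drop repeats, keeping first occurrences in order
    if distinct:
        out = list(dict.fromkeys(out))
    return out

# Sample instance: (sid, sname, rating, age) and (sid, bid, day)
sailors = [(1, "Frodo", 7, 22), (2, "Bilbo", 2, 39), (3, "Sam", 8, 27)]
reserves = [(1, 102, "9/12"), (2, 103, "9/13")]

# SELECT sname FROM Sailors, Reserves
# WHERE Sailors.sid = Reserves.sid AND bid = 103
# Row layout after step 1: 0 sid, 1 sname, 2 rating, 3 age, 4 sid, 5 bid, 6 day
result = conceptual_eval(
    [sailors, reserves],
    where=lambda r: r[0] == r[4] and r[5] == 103,
    select=lambda r: (r[1],),
)
print(result)  # [('Bilbo',)]
```

A real optimizer would never materialize the full cross product; this sketch only pins down what answer any strategy must produce.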
Query Semantics Example
SELECT sname
FROM Sailors, Reserves
WHERE Sailors.sid=Reserves.sid AND bid=103
Sailors
sid  sname  rating  age
1    Frodo  7       22
2    Bilbo  2       39
3    Sam    8       27
X
Reserves
sid  bid  day
1    102  9/12
2    103  9/13
Boats
bid  bname        color
101  Nina         red
102  Pinta        blue
103  Santa Maria  red
Step 1: Compute the cross product
Sailors X Reserves
sid  sname  rating  age  sid  bid  day
1    Frodo  7       22   1    102  9/12
1    Frodo  7       22   2    103  9/13
2    Bilbo  2       39   1    102  9/12
2    Bilbo  2       39   2    103  9/13
3    Sam    8       27   1    102  9/12
3    Sam    8       27   2    103  9/13
Step 1: How big?
Question:
If |S| is the cardinality of Sailors and
|R| is the cardinality of Reserves,
what is the cardinality of Sailors X Reserves?
Answer: |S| * |R|
Here, |Sailors X Reserves| = 3 * 2 = 6
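A small Python check of the cardinality rule on the sample instance, using `itertools.product` to pair every Sailors tuple with every Reserves tuple.

```python
from itertools import product

sailors = [(1, "Frodo", 7, 22), (2, "Bilbo", 2, 39), (3, "Sam", 8, 27)]
reserves = [(1, 102, "9/12"), (2, 103, "9/13")]

# Every Sailors tuple is paired with every Reserves tuple.
cross = [s + r for s, r in product(sailors, reserves)]

print(len(cross))     # 6 rows: |S| * |R| = 3 * 2
print(len(cross[0]))  # 7 columns: 4 from Sailors + 3 from Reserves
```

Cardinalities multiply while arities add, which is why the cross product is such an expensive first step on realistic table sizes.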
Step 2: Check conditions in where clause
SELECT sname
FROM Sailors, Reserves
WHERE Sailors.sid=Reserves.sid AND bid=103
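Sailors X Reserves
sid  sname  rating  age  sid  bid  day
1    Frodo  7       22   1    102  9/12   (fails bid=103)
1    Frodo  7       22   2    103  9/13   (fails sid match)
2    Bilbo  2       39   1    102  9/12   (fails both)
2    Bilbo  2       39   2    103  9/13   <- satisfies both conditions
3    Sam    8       27   1    102  9/12   (fails both)
3    Sam    8       27   2    103  9/13   (fails sid match)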
Step 3: Delete unwanted fields
SELECT sname
FROM Sailors, Reserves
WHERE Sailors.sid=Reserves.sid AND bid=103
After Step 2, only the row (2, Bilbo, 2, 39, 2, 103, 9/13) remains.
Projecting it onto sname gives the answer:
sname
Bilbo
Range Variables
• Used as shorthand
• Needed when ambiguity could arise,
e.g., two tables with the same column name:
SELECT sname
FROM Sailors, Reserves
WHERE Sailors.sid=Reserves.sid AND Reserves.bid=103
SELECT sname
FROM Sailors S, Reserves R
WHERE S.sid=R.sid AND R.bid=103
Question: do range variables remind you of anything?
Answer: variables in relational calculus
Sometimes you need a range variable
e.g., a self-join:
SELECT R1.bid, R1.day
FROM Reserves R1, Reserves R2
WHERE R1.bid = R2.bid and
R1.day = R2.day and
R1.sid != R2.sid
Reserves (scanned twice: once as R1 and once as R2)
sid  bid  day
1    102  9/12
3    103  9/12
4    103  9/13
2    103  9/12
Result:
bid  day
103  9/12
103  9/12
Sometimes you need a range variable
SELECT R1.bid, R1.day
FROM Reserves R1, Reserves R2
WHERE R1.bid = R2.bid and
R1.day = R2.day and
R1.sid != R2.sid
bid  day
103  9/12
103  9/12
What are we computing?
Boats reserved on the same day
by different sailors
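A sketch of the self-join in SQLite via Python's sqlite3, using the Reserves instance above. SQLite, like PostgreSQL, accepts `!=`; the SQL-standard spelling is `<>`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Reserves (sid INTEGER, bid INTEGER, day TEXT)")
conn.executemany("INSERT INTO Reserves VALUES (?, ?, ?)",
                 [(1, 102, "9/12"), (3, 103, "9/12"),
                  (4, 103, "9/13"), (2, 103, "9/12")])

# Two range variables over the same table: R1 and R2.
rows = conn.execute("""
    SELECT R1.bid, R1.day
    FROM Reserves R1, Reserves R2
    WHERE R1.bid = R2.bid AND R1.day = R2.day AND R1.sid != R2.sid
""").fetchall()
print(rows)  # [(103, '9/12'), (103, '9/12')]

# Using < instead of != keeps each pair of sailors only once.
once = conn.execute("""
    SELECT R1.bid, R1.day
    FROM Reserves R1, Reserves R2
    WHERE R1.bid = R2.bid AND R1.day = R2.day AND R1.sid < R2.sid
""").fetchall()
print(once)  # [(103, '9/12')]
```

Boat 103 on 9/12 shows up twice because sailors 3 and 2 match each other in both orders: (R1=3, R2=2) and (R1=2, R2=3).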
SELECT Clause Expressions
• Can use “*” if you want all columns:
SELECT *
FROM Sailors x
WHERE x.age > 20
• Can use arithmetic expressions (and other operations
we’ll discuss later)
SELECT S.age, S.age-5 AS age1, 2*S.age AS age2
FROM Sailors S
WHERE S.sname = 'Dustin'
• Can use AS to provide column names
SELECT S1.sname AS name1, S2.sname AS name2
FROM Sailors S1, Sailors S2
WHERE 2*S1.rating = S2.rating - 1
WHERE Clause Expressions
• Can also have expressions in WHERE clause:
SELECT S1.sname AS name1, S2.sname AS name2
FROM Sailors S1, Sailors S2
WHERE 2*S1.rating = S2.rating - 1
•“LIKE” is used for string matching.
SELECT S.age, S.age-5 AS age1, 2*S.age AS age2
FROM Sailors S
WHERE S.sname LIKE 'B_l%o'
'_' stands for any single character and '%' stands for zero or more
arbitrary characters.
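A sketch of the LIKE query with its arithmetic SELECT list, run in SQLite via Python's sqlite3 on the three-sailor instance used earlier. Note that SQLite's LIKE is case-insensitive for ASCII letters by default, unlike the SQL standard.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors "
             "(sid INTEGER, sname TEXT, rating INTEGER, age REAL)")
conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?, ?)",
                 [(1, "Frodo", 7, 22), (2, "Bilbo", 2, 39), (3, "Sam", 8, 27)])

# 'B_l%o': a B, any one character, an l, any run of characters, then an o.
# Only Bilbo matches: B, i, l, b, o.
rows = conn.execute("""
    SELECT S.age, S.age-5 AS age1, 2*S.age AS age2
    FROM Sailors S
    WHERE S.sname LIKE 'B_l%o'
""").fetchall()
print(rows)  # [(39.0, 34.0, 78.0)]
```

The age values come back as floats because the column is declared REAL.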
SELECT DISTINCT
Find sailors that have reserved at least one boat
SELECT DISTINCT S.sid
FROM Sailors S, Reserves R
WHERE S.sid=R.sid
Sailors
sid  sname  rating  age
1    Frodo  7       22
2    Bilbo  2       39
3    Sam    8       27
Reserves
sid  bid  day
1    102  9/12
2    103  9/12
2    102  9/13
Result:
sid
1
2
SELECT DISTINCT
• How about:
SELECT S.sid
FROM Sailors S, Reserves R
WHERE S.sid=R.sid
Result:
sid
1
2
2
Without DISTINCT, sailor 2 appears once per reservation.
SELECT DISTINCT
How about:
SELECT S.sname
FROM Sailors S, Reserves R
WHERE S.sid=R.sid
Result:
sname
Frodo
Bilbo
Bilbo
vs:
SELECT DISTINCT S.sname
FROM Sailors S, Reserves R
WHERE S.sid=R.sid
Result:
sname
Frodo
Bilbo
Sailors
sid  sname  rating  age
1    Frodo  7       22
2    Bilbo  2       39
3    Sam    8       27
4    Bilbo  5       32
Reserves
sid  bid  day
1    102  9/12
2    103  9/13
4    105  9/13
Do we find all sailors
that reserved at least
one boat?
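A sketch comparing the three queries in SQLite via Python's sqlite3, on the four-sailor instance above. `column` is a hypothetical helper that sorts results, since none of these queries has an ORDER BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sailors (sid INTEGER, sname TEXT, rating INTEGER, age REAL);
CREATE TABLE Reserves (sid INTEGER, bid INTEGER, day TEXT);
INSERT INTO Sailors VALUES (1, 'Frodo', 7, 22), (2, 'Bilbo', 2, 39),
                           (3, 'Sam', 8, 27), (4, 'Bilbo', 5, 32);
INSERT INTO Reserves VALUES (1, 102, '9/12'), (2, 103, '9/13'), (4, 105, '9/13');
""")

join = "FROM Sailors S, Reserves R WHERE S.sid = R.sid"

def column(sql):
    """Run a one-column query and return its values sorted."""
    return sorted(row[0] for row in conn.execute(sql))

print(column("SELECT S.sname " + join))           # ['Bilbo', 'Bilbo', 'Frodo']
print(column("SELECT DISTINCT S.sname " + join))  # ['Bilbo', 'Frodo']
print(column("SELECT DISTINCT S.sid " + join))    # [1, 2, 4]
```

Three different sailors reserved boats, but DISTINCT S.sname reports only two names because sailors 2 and 4 are both called Bilbo; DISTINCT on the key (sid) does not have this problem.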
ANDs, ORs, UNIONs and INTERSECTs
Find sids of sailors who’ve reserved a red or a green boat
SELECT R.sid
FROM Boats B, Reserves R
WHERE (B.color='red' OR
B.color='green')
AND R.bid=B.bid
Result:
sid
2
4
Sailors
sid  sname  rating  age
1    Frodo  7       22
2    Bilbo  2       39
3    Sam    8       27
Boats
bid  bname        color
101  Nina         red
102  Pinta        blue
103  Santa Maria  red
105  Titanic      green
Reserves
sid  bid  day
1    102  9/12
2    103  9/13
4    105  9/13
ANDs and ORs
Find sids of sailors who’ve reserved a red and a green
boat
SELECT R.sid
FROM Boats B, Reserves R
WHERE (B.color='red' AND
B.color='green')
AND R.bid=B.bid
This returns no rows: a single Boats tuple B cannot be both red and green.
Sailors
sid  sname  rating  age
1    Frodo  7       22
2    Bilbo  2       39
3    Sam    8       27
Boats
bid  bname        color
101  Nina         red
102  Pinta        blue
103  Santa Maria  red
105  Titanic      green
Reserves
sid  bid  day
1    101  9/12
2    103  9/13
1    105  9/13
Use INTERSECT instead of AND
SELECT R.sid
FROM Boats B, Reserves R
WHERE B.color = 'red'
AND R.bid=B.bid
INTERSECT
SELECT R.sid
FROM Boats B, Reserves R
WHERE B.color = 'green'
AND R.bid=B.bid
sid {1, 2} ∩ sid {1} = sid {1}
Exercise: try to rewrite this
query using a self join instead
of INTERSECT!
Boats
bid  bname        color
101  Nina         red
102  Pinta        blue
103  Santa Maria  red
105  Titanic      green
Reserves
sid  bid  day
1    101  9/12
2    103  9/13
1    105  9/13
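A sketch contrasting the AND query with the INTERSECT query in SQLite via Python's sqlite3, on the Boats and Reserves instance above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Boats (bid INTEGER, bname TEXT, color TEXT);
CREATE TABLE Reserves (sid INTEGER, bid INTEGER, day TEXT);
INSERT INTO Boats VALUES (101, 'Nina', 'red'), (102, 'Pinta', 'blue'),
                         (103, 'Santa Maria', 'red'), (105, 'Titanic', 'green');
INSERT INTO Reserves VALUES (1, 101, '9/12'), (2, 103, '9/13'), (1, 105, '9/13');
""")

# One boat tuple can never be red AND green, so this finds nothing.
with_and = conn.execute("""
    SELECT R.sid FROM Boats B, Reserves R
    WHERE (B.color = 'red' AND B.color = 'green') AND R.bid = B.bid
""").fetchall()

# Sids of red-boat reservers intersected with sids of green-boat reservers.
with_intersect = conn.execute("""
    SELECT R.sid FROM Boats B, Reserves R
    WHERE B.color = 'red' AND R.bid = B.bid
    INTERSECT
    SELECT R.sid FROM Boats B, Reserves R
    WHERE B.color = 'green' AND R.bid = B.bid
""").fetchall()

print(with_and)        # []
print(with_intersect)  # [(1,)]
```

INTERSECT works because each branch uses its own Boats tuple, so the "red" and "green" conditions no longer have to hold for the same boat.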
Could also use UNION for
the OR query
SELECT R.sid
FROM Boats B, Reserves R
WHERE B.color = 'red'
AND R.bid=B.bid
UNION
SELECT R.sid
FROM Boats B, Reserves R
WHERE B.color = 'green'
AND R.bid=B.bid
sid {2} ∪ sid {4} = sid {2, 4}
Boats
bid  bname        color
101  Nina         red
102  Pinta        blue
103  Santa Maria  red
105  Titanic      green
Reserves
sid  bid  day
1    102  9/12
2    103  9/13
4    105  9/13
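A sketch checking that the OR query and the UNION query agree on this instance, again in SQLite via Python's sqlite3. Results are sorted before comparison because neither query has an ORDER BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Boats (bid INTEGER, bname TEXT, color TEXT);
CREATE TABLE Reserves (sid INTEGER, bid INTEGER, day TEXT);
INSERT INTO Boats VALUES (101, 'Nina', 'red'), (102, 'Pinta', 'blue'),
                         (103, 'Santa Maria', 'red'), (105, 'Titanic', 'green');
INSERT INTO Reserves VALUES (1, 102, '9/12'), (2, 103, '9/13'), (4, 105, '9/13');
""")

with_or = conn.execute("""
    SELECT R.sid FROM Boats B, Reserves R
    WHERE (B.color = 'red' OR B.color = 'green') AND R.bid = B.bid
""").fetchall()

with_union = conn.execute("""
    SELECT R.sid FROM Boats B, Reserves R
    WHERE B.color = 'red' AND R.bid = B.bid
    UNION
    SELECT R.sid FROM Boats B, Reserves R
    WHERE B.color = 'green' AND R.bid = B.bid
""").fetchall()

print(sorted(with_or))     # [(2,), (4,)]
print(sorted(with_union))  # [(2,), (4,)]
```

One difference this data does not exercise: UNION eliminates duplicate rows, while the OR query keeps one row per matching reservation, so a sailor with two red reservations would appear twice in the OR result but once in the UNION result.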
EXCEPT: Set Difference
Find sids of sailors who have not
reserved a boat
SELECT S.sid
FROM Sailors S
EXCEPT
SELECT S.sid
FROM Sailors S, Reserves R
WHERE S.sid=R.sid
First find the set of sailors who
have reserved a boat, and then
subtract it from the set of all sailors.
Sailors
sid  sname  rating  age
1    Frodo  7       22
2    Bilbo  2       39
3    Sam    8       27
Reserves
sid  bid  day
1    102  9/12
2    103  9/13
1    105  9/13
Result:
sid
3
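A sketch of the set-difference query in SQLite via Python's sqlite3, on the Sailors and Reserves instance above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sailors (sid INTEGER, sname TEXT, rating INTEGER, age REAL);
CREATE TABLE Reserves (sid INTEGER, bid INTEGER, day TEXT);
INSERT INTO Sailors VALUES (1, 'Frodo', 7, 22), (2, 'Bilbo', 2, 39), (3, 'Sam', 8, 27);
INSERT INTO Reserves VALUES (1, 102, '9/12'), (2, 103, '9/13'), (1, 105, '9/13');
""")

# All sailors, minus the sailors that appear in some reservation.
rows = conn.execute("""
    SELECT S.sid FROM Sailors S
    EXCEPT
    SELECT S.sid FROM Sailors S, Reserves R WHERE S.sid = R.sid
""").fetchall()
print(rows)  # [(3,)]
```

Sailor 1 has two reservations, but like UNION and INTERSECT, EXCEPT works on sets, so that repetition does not matter.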