CS425 - Database Organization - 2022 Fall

Course webpage for CS425 - 2022 Fall taught by Boris Glavic

Lecture Notes for CS425

Overview

Lecture <2022-08-24 Wed>

Relations

  • $$D_1 = \{ false, true \}$$
  • $$D_2 = \{ 1,2,3 \}$$

R = (A,B)

$$D_1 \times D_2$$

A B
false 1
false 2
false 3
true 1
true 2
true 3
A B
false 1
true 1
true 3
A B
true 3
false 1
true 1

PKs and FKs

Instructor

ID Name SSN salary
1 Peter Smith 11-22 4000
2 Peter Smith 22-33 5000
3 Alice Smith 55-55 4000
  • Superkeys all attributes are always a superkey {ID, Name, SSN, salary }
  • Superkeys: K1 = {ID}, K2 = {SSN}, K3 = {ID,SSN}, K4 = {ID,Name}
  • Candidate keys are K1, K2 (the only minimal superkeys)

    • e.g., K3 = {ID,SSN} is not minimal, because superkey K1 = {ID} is a proper subset of K3

JSON

  • JSON data allows nesting of object ({}) and array ([]) collection types

    • e.g., the value of the address field available for some customers is itself an object
  • JSON is semi-structured, e.g., only some customers have an address field
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
{
    "customers": [
        {
        "name": "Peter",
        "age": 79
    },
        {
        "name": "Peter",
        "address": {
            "city": "Chicago",
            "zip": 60616,
            "street": "40 31st Street"
        }
    },
        {
        "name": "Peter",
        "age": 79
    }
    ],
    "products": [
        {
        "id": 15324,
        "price": 56,
        "name": "Lawnmover"
    }
    ]
}

Lecture <2022-08-31 Wed>

Recap relational model

  • structured data model

    • compare to semi-structured (JSON) and unstructure (text document)

Schema vs. Instance (data)

schema
  • relation schema: R(A1: D1, ..., An: Dn), R(A1, ..., An)

    • Student(Name,CWID,GPA)
  • database schema: D = {R1, ..., Rm}

    • UniversityDB = { Student(Name,CWID,GPA), Course(Title, Nr, Major), ...
instance (data)
  • relation instance of schema R(A1: D1, ..., An: Dn):

    • it is a set of tuples (rows) which is a subset D1 x ... x Dn
  • database instance of DB schema UniversityDB = { Student(Name:string,CWID:string,GPA:float), Course(Title:string, Nr:int, Major:string)}

    • is a set of relation instance, one for each relation schema

Integrity Constraints

  • candidate keys / primary keys

    • super key: set of attributes that uniquely identify each row in a table
    • candidate key: minimal superkeys
    • primary key: user-selected candidate used to identify rows
  • foreign keys

    • "link" tuples from one table to another by referring to the primary key of the other table
foreign key violations
IName Dname
A BIO
B BIO
C BIO
D CS
E CS
DName Building
BIO the big building
CS the small building
  • delete BIO
  • change departments of existing BIO instructors to something else and then delete
  • (1) change departments of existing BIO instructors to NULL
  • (2) delete instructors for BIO
  • (3) don't do it and throw ERROR
  • we will see later that relational database implement options (1) to (3) and allow the user to choose when they define a foreign key, which option should be used

Lecture <2022-09-14 Wed> and <2022-09-19 Mon>

Application Domain for Project

banking system

persons (roles)
  • clients / account holders
  • employee

    • teller
    • loan specialist
    • manager
data
  • account holders
  • accounts

    • account types
  • loans
processes
  • deposit / withdraw / transfer

    • data: accounts
    • persons: clients
  • manage accounts (create / close / change type?)

    • data: accounts, account holders
    • persons: clients, employee
  • loan management

    • data: loans, accounts, clients
    • persons: employee, clients
  • interest management

    • data: accounts
    • person: (automatic)
  • analytics

    • data: accounts, loans, clients, account types
    • person: managers

Postgres Overview

Postgres is a server-based database. That the postgres process runs in the background listing on a network port on the machine you are running it on. You can then connect to the server using a client application that understand the network protocol postgres is using. A Postgres server process manages a directory called the cluster which stores the content of your database on disk. When a postgres server is started, it is assigned a cluster directory to manage. Once you have a running Postgres server you can connect with a client application to send SQL commands to the server and receive results. Some important client applications are discussed below.

Cluster directory

Depending on how you installed Postgres, a cluster directory may have been created for you already. If not then use the initdb command as explained below to create one yourself. A cluster directory is used by Postgres to store your database files. The directory also contains important configuration files:

  • postgresql.conf - configuration settings

    • listen_addresses = '*' - allow connections from anywhere
  • pg_hba.conf - controls access to the database

    • each line is rule, first matching rule applies

Postgres binaries

initdb - creates clusters
1
2
mkdir cs425
initdb -D cs425
pg_ctl - start / stop servers
  • start server at cluster directory DIR writing logs to LOGFILE
1
pg_ctl -D DIR -l LOGFILE start
  • stop server at cluster directory DIR
1
pg_ctl -D DIR stop
psql - commandline client
  • connect to running server and run SQL commands
  • -p - network port
  • -h - host (IP address) where 127.0.0.1 is localhost
  • -U - postgres user to connect with
  • last one is the database to connect to (default DB created is postgres)
1
psql -h 127.0.0.1 -U lord_pretzel -p 5556 postgres

Client applications (pgAdmin)

Many client applications exist that can talk to a postgres server. For our purpose we want a GUI or CLI to iteratively run SQL commands and inspect their results. We already discussed the build-in psql CLI above. And explain how to use pgAdmin a widely used GUI client in the following.

pgAdmin

You can download pgAdmin from here: https://www.pgadmin.org/.

Setup
  1. When you first start pgAdmin, it will ask you to set a master password. Do that first.
  2. To add a new database connection for your Postgres server, right-click on the browser shown on the left-hand side and select Register->Server
Register Server with pgAdmin
Register Server with pgAdmin
  1. give the server a name (any name would do)
Select connection parameters
Select connection parameters
  1. then click on connection and enter the connection information

    1. host: if you are running postgres locally this would be 127.0.0.1 (the local machine)
    2. port: unless you are running the postgres server on a non-standard port this will be 5432
    3. maintenance database: the first database to connect to (for most installation methods this would be postgres)
    4. username: your database user name (for most installations this would be postgres)
    5. password: the password of your database user
    6. Save password: select yes to not have to enter the password every time you connect to the server
Inspecting the database schema

To inspect the schema of a database, select the server in the left-hand side browser and expand it. Then expand the schema public and then select tables.

Browse database schema
Browse database schema

Writing SQL

Rightclick on a database in the browser on the left and select QueryTool

Open the Query Tool
Open the Query Tool

Then you can enter queries and execute them with the "play" button.

Create schema and insert data

The following sql script generates the university example schema. The script is also available here: https://github.com/IITDBGroup/cs425/blob/master/university_schema_postgres.sql

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
DROP TABLE IF EXISTS prereq CASCADE;
DROP TABLE IF EXISTS time_slot CASCADE;
DROP TABLE IF EXISTS advisor CASCADE;
DROP TABLE IF EXISTS takes CASCADE;
DROP TABLE IF EXISTS student CASCADE;
DROP TABLE IF EXISTS teaches CASCADE;
DROP TABLE IF EXISTS section CASCADE;
DROP TABLE IF EXISTS instructor CASCADE;
DROP TABLE IF EXISTS course CASCADE;
DROP TABLE IF EXISTS department CASCADE;
DROP TABLE IF EXISTS classroom CASCADE;

CREATE TABLE classroom
  (building		varchar(15),
  room_number	varchar(7),
  capacity		numeric(4,0),
  PRIMARY KEY (building, room_number)
  );

CREATE TABLE department
  (dept_name		varchar(20),
  building		varchar(15),
  budget		        numeric(12,2) CHECK (budget > 0),
  PRIMARY KEY (dept_name)
  );

CREATE TABLE course
  (course_id		varchar(8),
  title			varchar(50),
  dept_name		varchar(20),
  credits		numeric(2,0) CHECK (credits > 0),
  PRIMARY KEY (course_id),
  FOREIGN KEY (dept_name) REFERENCES department
  ON DELETE SET NULL
  );

CREATE TABLE instructor
  (ID			varchar(5),
  name			varchar(20) NOT NULL,
  dept_name		varchar(20),
  salary			numeric(8,2) CHECK (salary > 29000),
  PRIMARY KEY (ID),
  FOREIGN KEY (dept_name) REFERENCES department
  ON DELETE SET NULL
  );

CREATE TABLE section
  (course_id		varchar(8),
  sec_id			varchar(8),
  semester		varchar(6)  CHECK (semester in ('Fall', 'Winter', 'Spring', 'Summer')),
  year			numeric(4,0) CHECK (year > 1701 AND year < 2100),
  building		varchar(15),
  room_number		varchar(7),
  time_slot_id		varchar(4),
  PRIMARY KEY (course_id, sec_id, semester, year),
  FOREIGN KEY (course_id) REFERENCES course
  ON DELETE CASCADE,
  FOREIGN KEY (building, room_number) REFERENCES classroom
  ON DELETE SET NULL
  );

CREATE TABLE teaches
  (ID			varchar(5),
  course_id		varchar(8),
  sec_id			varchar(8),
  semester		varchar(6),
  year			numeric(4,0),
  PRIMARY KEY (ID, course_id, sec_id, semester, year),
  FOREIGN KEY (course_id,sec_id, semester, year) REFERENCES section
  ON DELETE CASCADE,
  FOREIGN KEY (ID) REFERENCES instructor
  ON DELETE CASCADE
  );

CREATE TABLE student
  (ID			varchar(5),
  name			varchar(20) NOT NULL,
  dept_name		varchar(20),
  tot_cred		numeric(3,0) CHECK (tot_cred >= 0),
  PRIMARY KEY (ID),
  FOREIGN KEY (dept_name) REFERENCES department
  ON DELETE SET NULL
  );

CREATE TABLE takes
  (ID			varchar(5),
  course_id		varchar(8),
  sec_id			varchar(8),
  semester		varchar(6),
  year			numeric(4,0),
  grade		        varchar(2),
  PRIMARY KEY (ID, course_id, sec_id, semester, year),
  FOREIGN KEY (course_id,sec_id, semester, year) REFERENCES section
  ON DELETE CASCADE,
  FOREIGN KEY (ID) REFERENCES student
  ON DELETE CASCADE
  );

CREATE TABLE advisor
  (s_ID			varchar(5),
  i_ID			varchar(5),
  PRIMARY KEY (s_ID),
  FOREIGN KEY (i_ID) REFERENCES instructor (ID)
  ON DELETE SET NULL,
  FOREIGN KEY (s_ID) REFERENCES student (ID)
  ON DELETE CASCADE
  );

CREATE TABLE time_slot
  (time_slot_id		varchar(4),
  day			varchar(1),
  start_hr		numeric(2) CHECK (start_hr >= 0 AND start_hr < 24),
  start_min		numeric(2) CHECK (start_min >= 0 AND start_min < 60),
  end_hr			numeric(2) CHECK (end_hr >= 0 AND end_hr < 24),
  end_min		numeric(2) CHECK (end_min >= 0 AND end_min < 60),
  PRIMARY KEY (time_slot_id, day, start_hr, start_min)
  );

CREATE TABLE prereq
  (course_id		varchar(8),
  prereq_id		varchar(8),
  PRIMARY KEY (course_id, prereq_id),
  FOREIGN KEY (course_id) REFERENCES course
  ON DELETE CASCADE,
  FOREIGN KEY (prereq_id) REFERENCES course
  );

DELETE FROM prereq;
DELETE FROM time_slot;
DELETE FROM advisor;
DELETE FROM takes;
DELETE FROM student;
DELETE FROM teaches;
DELETE FROM section;
DELETE FROM instructor;
DELETE FROM course;
DELETE FROM department;
DELETE FROM classroom;
INSERT INTO classroom VALUES ('Packard', '101', '500');
INSERT INTO classroom VALUES ('Painter', '514', '10');
INSERT INTO classroom VALUES ('Taylor', '3128', '70');
INSERT INTO classroom VALUES ('Watson', '100', '30');
INSERT INTO classroom VALUES ('Watson', '120', '50');
INSERT INTO department VALUES ('Biology', 'Watson', '90000');
INSERT INTO department VALUES ('Comp. Sci.', 'Taylor', '100000');
INSERT INTO department VALUES ('Elec. Eng.', 'Taylor', '85000');
INSERT INTO department VALUES ('Finance', 'Painter', '120000');
INSERT INTO department VALUES ('History', 'Painter', '50000');
INSERT INTO department VALUES ('Music', 'Packard', '80000');
INSERT INTO department VALUES ('Physics', 'Watson', '70000');
INSERT INTO course VALUES ('BIO-101', 'Intro. to Biology', 'Biology', '4');
INSERT INTO course VALUES ('BIO-301', 'Genetics', 'Biology', '4');
INSERT INTO course VALUES ('BIO-399', 'Computational Biology', 'Biology', '3');
INSERT INTO course VALUES ('CS-101', 'Intro. to Computer Science', 'Comp. Sci.', '4');
INSERT INTO course VALUES ('CS-190', 'Game Design', 'Comp. Sci.', '4');
INSERT INTO course VALUES ('CS-315', 'Robotics', 'Comp. Sci.', '3');
INSERT INTO course VALUES ('CS-319', 'Image Processing', 'Comp. Sci.', '3');
INSERT INTO course VALUES ('CS-347', 'Database System Concepts', 'Comp. Sci.', '3');
INSERT INTO course VALUES ('EE-181', 'Intro. to Digital Systems', 'Elec. Eng.', '3');
INSERT INTO course VALUES ('FIN-201', 'Investment Banking', 'Finance', '3');
INSERT INTO course VALUES ('HIS-351', 'World History', 'History', '3');
INSERT INTO course VALUES ('MU-199', 'Music Video Production', 'Music', '3');
INSERT INTO course VALUES ('PHY-101', 'Physical Principles', 'Physics', '4');
INSERT INTO instructor VALUES ('10101', 'Srinivasan', 'Comp. Sci.', '65000');
INSERT INTO instructor VALUES ('12121', 'Wu', 'Finance', '90000');
INSERT INTO instructor VALUES ('15151', 'Mozart', 'Music', '40000');
INSERT INTO instructor VALUES ('22222', 'Einstein', 'Physics', '95000');
INSERT INTO instructor VALUES ('32343', 'El Said', 'History', '60000');
INSERT INTO instructor VALUES ('33456', 'Gold', 'Physics', '87000');
INSERT INTO instructor VALUES ('45565', 'Katz', 'Comp. Sci.', '75000');
INSERT INTO instructor VALUES ('58583', 'Califieri', 'History', '62000');
INSERT INTO instructor VALUES ('76543', 'Singh', 'Finance', '80000');
INSERT INTO instructor VALUES ('76766', 'Crick', 'Biology', '72000');
INSERT INTO instructor VALUES ('83821', 'Brandt', 'Comp. Sci.', '92000');
INSERT INTO instructor VALUES ('98345', 'Kim', 'Elec. Eng.', '80000');
INSERT INTO section VALUES ('BIO-101', '1', 'Summer', '2009', 'Painter', '514', 'B');
INSERT INTO section VALUES ('BIO-301', '1', 'Summer', '2010', 'Painter', '514', 'A');
INSERT INTO section VALUES ('CS-101', '1', 'Fall', '2009', 'Packard', '101', 'H');
INSERT INTO section VALUES ('CS-101', '1', 'Spring', '2010', 'Packard', '101', 'F');
INSERT INTO section VALUES ('CS-190', '1', 'Spring', '2009', 'Taylor', '3128', 'E');
INSERT INTO section VALUES ('CS-190', '2', 'Spring', '2009', 'Taylor', '3128', 'A');
INSERT INTO section VALUES ('CS-315', '1', 'Spring', '2010', 'Watson', '120', 'D');
INSERT INTO section VALUES ('CS-319', '1', 'Spring', '2010', 'Watson', '100', 'B');
INSERT INTO section VALUES ('CS-319', '2', 'Spring', '2010', 'Taylor', '3128', 'C');
INSERT INTO section VALUES ('CS-347', '1', 'Fall', '2009', 'Taylor', '3128', 'A');
INSERT INTO section VALUES ('EE-181', '1', 'Spring', '2009', 'Taylor', '3128', 'C');
INSERT INTO section VALUES ('FIN-201', '1', 'Spring', '2010', 'Packard', '101', 'B');
INSERT INTO section VALUES ('HIS-351', '1', 'Spring', '2010', 'Painter', '514', 'C');
INSERT INTO section VALUES ('MU-199', '1', 'Spring', '2010', 'Packard', '101', 'D');
INSERT INTO section VALUES ('PHY-101', '1', 'Fall', '2009', 'Watson', '100', 'A');
INSERT INTO teaches VALUES ('10101', 'CS-101', '1', 'Fall', '2009');
INSERT INTO teaches VALUES ('10101', 'CS-315', '1', 'Spring', '2010');
INSERT INTO teaches VALUES ('10101', 'CS-347', '1', 'Fall', '2009');
INSERT INTO teaches VALUES ('12121', 'FIN-201', '1', 'Spring', '2010');
INSERT INTO teaches VALUES ('15151', 'MU-199', '1', 'Spring', '2010');
INSERT INTO teaches VALUES ('22222', 'PHY-101', '1', 'Fall', '2009');
INSERT INTO teaches VALUES ('32343', 'HIS-351', '1', 'Spring', '2010');
INSERT INTO teaches VALUES ('45565', 'CS-101', '1', 'Spring', '2010');
INSERT INTO teaches VALUES ('45565', 'CS-319', '1', 'Spring', '2010');
INSERT INTO teaches VALUES ('76766', 'BIO-101', '1', 'Summer', '2009');
INSERT INTO teaches VALUES ('76766', 'BIO-301', '1', 'Summer', '2010');
INSERT INTO teaches VALUES ('83821', 'CS-190', '1', 'Spring', '2009');
INSERT INTO teaches VALUES ('83821', 'CS-190', '2', 'Spring', '2009');
INSERT INTO teaches VALUES ('83821', 'CS-319', '2', 'Spring', '2010');
INSERT INTO teaches VALUES ('98345', 'EE-181', '1', 'Spring', '2009');
INSERT INTO student VALUES ('00128', 'Zhang', 'Comp. Sci.', '102');
INSERT INTO student VALUES ('12345', 'Shankar', 'Comp. Sci.', '32');
INSERT INTO student VALUES ('19991', 'Brandt', 'History', '80');
INSERT INTO student VALUES ('23121', 'Chavez', 'Finance', '110');
INSERT INTO student VALUES ('44553', 'Peltier', 'Physics', '56');
INSERT INTO student VALUES ('45678', 'Levy', 'Physics', '46');
INSERT INTO student VALUES ('54321', 'Williams', 'Comp. Sci.', '54');
INSERT INTO student VALUES ('55739', 'Sanchez', 'Music', '38');
INSERT INTO student VALUES ('70557', 'Snow', 'Physics', '0');
INSERT INTO student VALUES ('76543', 'Brown', 'Comp. Sci.', '58');
INSERT INTO student VALUES ('76653', 'Aoi', 'Elec. Eng.', '60');
INSERT INTO student VALUES ('98765', 'Bourikas', 'Elec. Eng.', '98');
INSERT INTO student VALUES ('98988', 'Tanaka', 'Biology', '120');
INSERT INTO takes VALUES ('00128', 'CS-101', '1', 'Fall', '2009', 'A');
INSERT INTO takes VALUES ('00128', 'CS-347', '1', 'Fall', '2009', 'A-');
INSERT INTO takes VALUES ('12345', 'CS-101', '1', 'Fall', '2009', 'C');
INSERT INTO takes VALUES ('12345', 'CS-190', '2', 'Spring', '2009', 'A');
INSERT INTO takes VALUES ('12345', 'CS-315', '1', 'Spring', '2010', 'A');
INSERT INTO takes VALUES ('12345', 'CS-347', '1', 'Fall', '2009', 'A');
INSERT INTO takes VALUES ('19991', 'HIS-351', '1', 'Spring', '2010', 'B');
INSERT INTO takes VALUES ('23121', 'FIN-201', '1', 'Spring', '2010', 'C+');
INSERT INTO takes VALUES ('44553', 'PHY-101', '1', 'Fall', '2009', 'B-');
INSERT INTO takes VALUES ('45678', 'CS-101', '1', 'Fall', '2009', 'F');
INSERT INTO takes VALUES ('45678', 'CS-101', '1', 'Spring', '2010', 'B+');
INSERT INTO takes VALUES ('45678', 'CS-319', '1', 'Spring', '2010', 'B');
INSERT INTO takes VALUES ('54321', 'CS-101', '1', 'Fall', '2009', 'A-');
INSERT INTO takes VALUES ('54321', 'CS-190', '2', 'Spring', '2009', 'B+');
INSERT INTO takes VALUES ('55739', 'MU-199', '1', 'Spring', '2010', 'A-');
INSERT INTO takes VALUES ('76543', 'CS-101', '1', 'Fall', '2009', 'A');
INSERT INTO takes VALUES ('76543', 'CS-319', '2', 'Spring', '2010', 'A');
INSERT INTO takes VALUES ('76653', 'EE-181', '1', 'Spring', '2009', 'C');
INSERT INTO takes VALUES ('98765', 'CS-101', '1', 'Fall', '2009', 'C-');
INSERT INTO takes VALUES ('98765', 'CS-315', '1', 'Spring', '2010', 'B');
INSERT INTO takes VALUES ('98988', 'BIO-101', '1', 'Summer', '2009', 'A');
INSERT INTO takes VALUES ('98988', 'BIO-301', '1', 'Summer', '2010', null);
INSERT INTO advisor VALUES ('00128', '45565');
INSERT INTO advisor VALUES ('12345', '10101');
INSERT INTO advisor VALUES ('23121', '76543');
INSERT INTO advisor VALUES ('44553', '22222');
INSERT INTO advisor VALUES ('45678', '22222');
INSERT INTO advisor VALUES ('76543', '45565');
INSERT INTO advisor VALUES ('76653', '98345');
INSERT INTO advisor VALUES ('98765', '98345');
INSERT INTO advisor VALUES ('98988', '76766');
INSERT INTO time_slot VALUES ('A', 'M', '8', '0', '8', '50');
INSERT INTO time_slot VALUES ('A', 'W', '8', '0', '8', '50');
INSERT INTO time_slot VALUES ('A', 'F', '8', '0', '8', '50');
INSERT INTO time_slot VALUES ('B', 'M', '9', '0', '9', '50');
INSERT INTO time_slot VALUES ('B', 'W', '9', '0', '9', '50');
INSERT INTO time_slot VALUES ('B', 'F', '9', '0', '9', '50');
INSERT INTO time_slot VALUES ('C', 'M', '11', '0', '11', '50');
INSERT INTO time_slot VALUES ('C', 'W', '11', '0', '11', '50');
INSERT INTO time_slot VALUES ('C', 'F', '11', '0', '11', '50');
INSERT INTO time_slot VALUES ('D', 'M', '13', '0', '13', '50');
INSERT INTO time_slot VALUES ('D', 'W', '13', '0', '13', '50');
INSERT INTO time_slot VALUES ('D', 'F', '13', '0', '13', '50');
INSERT INTO time_slot VALUES ('E', 'T', '10', '30', '11', '45 ');
INSERT INTO time_slot VALUES ('E', 'R', '10', '30', '11', '45 ');
INSERT INTO time_slot VALUES ('F', 'T', '14', '30', '15', '45 ');
INSERT INTO time_slot VALUES ('F', 'R', '14', '30', '15', '45 ');
INSERT INTO time_slot VALUES ('G', 'M', '16', '0', '16', '50');
INSERT INTO time_slot VALUES ('G', 'W', '16', '0', '16', '50');
INSERT INTO time_slot VALUES ('G', 'F', '16', '0', '16', '50');
INSERT INTO time_slot VALUES ('H', 'W', '10', '0', '12', '30');
INSERT INTO prereq VALUES ('BIO-301', 'BIO-101');
INSERT INTO prereq VALUES ('BIO-399', 'BIO-101');
INSERT INTO prereq VALUES ('CS-190', 'CS-101');
INSERT INTO prereq VALUES ('CS-315', 'CS-101');
INSERT INTO prereq VALUES ('CS-319', 'CS-101');
INSERT INTO prereq VALUES ('CS-347', 'CS-101');
INSERT INTO prereq VALUES ('EE-181', 'PHY-101');
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
DELETE 0
DELETE 0
DELETE 0
DELETE 0
DELETE 0
DELETE 0
DELETE 0
DELETE 0
DELETE 0
DELETE 0
DELETE 0
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1

More DDL examples

  • create a table for storing information about seminars
1
2
3
4
5
6
7
8
CREATE TABLE seminars (
  title VARCHAR(200), -- could use TEXT is postgres
  presenter VARCHAR(200),
  semdate TIMESTAMP,
  building		varchar(15), -- PK attributes from classroom for the FK
  room_number	varchar(7), -- PK attributes from classroom for the FK
  FOREIGN KEY (building, room_number) REFERENCES classroom -- seminars should be scheduled in existing rooms
  );
CREATE TABLE

DDL change table definitions

dropping a table

1
DROP TABLE seminars;
DROP TABLE

alter table

  • adding an attribute (new values will be NULL
1
ALTER TABLE seminars ADD organizing_dept varchar(20);
ALTER TABLE
1
SELECT * FROM seminars;
title presenter semdate building room_number organizing_dept
  • add foreign key
1
ALTER TABLE seminars ADD FOREIGN KEY (organizing_dept) REFERENCES department;
ALTER TABLE

DML - inserts and delete

Insert

  • insert a single row into a table
1
INSERT INTO seminars VALUES ('Why databases are great', 'Bert', '2022-09-28 9:00', 'Packard', '101', 'Comp. Sci.');
INSERT 0 1
1
INSERT INTO seminars VALUES ('Why databases are stupid', 'Bert', '2022-09-29 10:00', 'Packard', '101', 'Comp. Sci.');
INSERT 0 1
1
SELECT * FROM seminars;
title presenter semdate building room_number organizing_dept
Why databases are great Bert 2022-09-28 09:00:00 Packard 101 Comp. Sci.
Why databases are stupid Bert 2022-09-29 10:00:00 Packard 101 Comp. Sci.

Delete

  • delete all rows that fulfill WHERE condition from table
1
DELETE FROM seminars WHERE presenter = 'Bert';
DELETE 2
1
SELECT * FROM seminars;
title presenter semdate building room_number organizing_dept
1
2
3
INSERT INTO seminars VALUES ('Why databases are great', 'Bert', '2022-09-28 9:00', 'Packard', '101', 'Comp. Sci.');
INSERT INTO seminars VALUES ('Why databases are stupid', 'Bert', '2022-09-29 10:00', 'Packard', '101', 'Comp. Sci.');
INSERT INTO seminars VALUES ('Why Mozart is great', 'Alice', '2022-10-30 11:00', 'Packard', '101', 'Music');
INSERT 0 1
INSERT 0 1
INSERT 0 1
1
SELECT * FROM seminars;
title presenter semdate building room_number organizing_dept
Why databases are great Bert 2022-09-28 09:00:00 Packard 101 Comp. Sci.
Why databases are stupid Bert 2022-09-29 10:00:00 Packard 101 Comp. Sci.
Why Mozart is great Alice 2022-10-30 11:00:00 Packard 101 Music
1
DELETE FROM seminars WHERE organizing_dept = 'Music' OR semdate > '2022-10-01 0:00';
DELETE 1
1
SELECT * FROM seminars;
title presenter semdate building room_number organizing_dept
Why databases are great Bert 2022-09-28 09:00:00 Packard 101 Comp. Sci.
Why databases are stupid Bert 2022-09-29 10:00:00 Packard 101 Comp. Sci.

DML - queries

basic SELECT-FROM-WHERE blocks

examples
  • * means all attributes from tables in the FROM clause
1
SELECT * FROM student;
id name dept_name tot_cred
00128 Zhang Comp. Sci. 102
12345 Shankar Comp. Sci. 32
19991 Brandt History 80
23121 Chavez Finance 110
44553 Peltier Physics 56
45678 Levy Physics 46
54321 Williams Comp. Sci. 54
55739 Sanchez Music 38
70557 Snow Physics 0
76543 Brown Comp. Sci. 58
76653 Aoi Elec. Eng. 60
98765 Bourikas Elec. Eng. 98
98988 Tanaka Biology 120
  • Names of students studying CS
1
SELECT name FROM student WHERE dept_name = 'Comp. Sci.';
name
Zhang
Shankar
Williams
Brown
  • alias in FROM clause: TABLENAME ALIAS or TABLENAME AS ALIAS
  • dot notation to specify which table an attribute belongs to TABLE.ATTR
  • in SELECT clause us AS to rename a result column ATTRIBUTE AS NEWNAME
1
2
3
SELECT name, d.dept_name AS major, building AS dept_building
FROM student s, department d
WHERE s.dept_name = d.dept_name AND (building = 'Taylor' OR building = 'Watson');
name major dept_building
Zhang Comp. Sci. Taylor
Shankar Comp. Sci. Taylor
Peltier Physics Watson
Levy Physics Watson
Williams Comp. Sci. Taylor
Snow Physics Watson
Brown Comp. Sci. Taylor
Aoi Elec. Eng. Taylor
Bourikas Elec. Eng. Taylor
Tanaka Biology Watson
relational algebra equivalence
1
2
3
SELECT A1, ..., An
FROM R1, ..., Rm
WHERE P

is equivalent to $$\pi_{A_1, \ldots, A_n}(\sigma_{P}(R_1 \times \ldots \times R_m)$$

casting

SQL databases automatically cast between datatypes where possible, e.g., in the condition mydatecolumn < '2002-01-01' we are comparing a DATE against a VARCHAR (string) value. Postgres would detect that it does not have a comparison operator < for these types and correctly decide to try to cast '2002-01-01' as a DATE.

You can also manually cast between types:

  • cast result of expression as datatype: expr::datatype
  • SQL standard CAST(expr AS datatype)
1
2
3
SELECT room_number::INT AS p,
       CAST(room_number AS INT) AS x
  FROM classroom;
p x
101 101
514 514
3128 3128
100 100
120 120

keywords are case-insensitive

1
SeLeCT name FRom student;
name
Zhang
Shankar
Brandt
Chavez
Peltier
Levy
Williams
Sanchez
Snow
Brown
Aoi
Bourikas
Tanaka

joins

natural join
1
2
SELECT *
  FROM instructor NATURAL JOIN teaches;
id name dept_name salary course_id sec_id semester year
10101 Srinivasan Comp. Sci. 65000.00 CS-101 1 Fall 2009
10101 Srinivasan Comp. Sci. 65000.00 CS-315 1 Spring 2010
10101 Srinivasan Comp. Sci. 65000.00 CS-347 1 Fall 2009
12121 Wu Finance 90000.00 FIN-201 1 Spring 2010
15151 Mozart Music 40000.00 MU-199 1 Spring 2010
22222 Einstein Physics 95000.00 PHY-101 1 Fall 2009
32343 El Said History 60000.00 HIS-351 1 Spring 2010
45565 Katz Comp. Sci. 75000.00 CS-101 1 Spring 2010
45565 Katz Comp. Sci. 75000.00 CS-319 1 Spring 2010
76766 Crick Biology 72000.00 BIO-101 1 Summer 2009
76766 Crick Biology 72000.00 BIO-301 1 Summer 2010
83821 Brandt Comp. Sci. 92000.00 CS-190 1 Spring 2009
83821 Brandt Comp. Sci. 92000.00 CS-190 2 Spring 2009
83821 Brandt Comp. Sci. 92000.00 CS-319 2 Spring 2010
98345 Kim Elec. Eng. 80000.00 EE-181 1 Spring 2009
outer joins
  • LEFT, RIGHT, FULL
1
SELECT * FROM course NATURAL LEFT OUTER JOIN prereq;
course_id title dept_name credits prereq_id
BIO-301 Genetics Biology 4 BIO-101
BIO-399 Computational Biology Biology 3 BIO-101
CS-190 Game Design Comp. Sci. 4 CS-101
CS-315 Robotics Comp. Sci. 3 CS-101
CS-319 Image Processing Comp. Sci. 3 CS-101
CS-347 Database System Concepts Comp. Sci. 3 CS-101
EE-181 Intro. to Digital Systems Elec. Eng. 3 PHY-101
MU-199 Music Video Production Music 3
HIS-351 World History History 3
FIN-201 Investment Banking Finance 3
PHY-101 Physical Principles Physics 4
BIO-101 Intro. to Biology Biology 4
CS-101 Intro. to Computer Science Comp. Sci. 4
specifying join conditions
  • NATURAL
  • ON cond, specify join condition cond (as in $\theta$-joins)
  • USING (A1, ... An) join on equality on all attributes A1, ..., An
1
2
SELECT i.name, t.course_id
FROM instructor i JOIN teaches t ON (i.ID = t.ID)
name course_id
Srinivasan CS-101
Srinivasan CS-315
Srinivasan CS-347
Wu FIN-201
Mozart MU-199
Einstein PHY-101
El Said HIS-351
Katz CS-101
Katz CS-319
Crick BIO-101
Crick BIO-301
Brandt CS-190
Brandt CS-190
Brandt CS-319
Kim EE-181

Lecture <2022-09-21 Wed>

SQL is bag semantics

  • bag semantics means we can have duplicates!
1
2
3
SELECT dept_name
FROM student
ORDER BY dept_name;
dept_name
Biology
Comp. Sci.
Comp. Sci.
Comp. Sci.
Comp. Sci.
Elec. Eng.
Elec. Eng.
Finance
History
Music
Physics
Physics
Physics

subqueries

  • FROM clause can contain queries (and not just tables)
1
2
3
4
5
SELECT d.*
FROM department d,
     (SELECT DISTINCT dept_name
        FROM student) dn
WHERE d.dept_name = dn.dept_name;
dept_name building budget
Biology Watson 90000.00
Comp. Sci. Taylor 100000.00
Elec. Eng. Taylor 85000.00
Finance Painter 120000.00
History Painter 50000.00
Music Packard 80000.00
Physics Watson 70000.00

WITH clause

  • query qj can refer to query qi if i < j
1
2
3
4
5
WITH name1 AS q1,
     name2 AS q2,
     ...
     namem AS qm
SELECT ...
1
2
3
4
5
6
WITH dn AS (SELECT DISTINCT dept_name
        FROM student)
SELECT d.*
FROM department d,
     dn
WHERE d.dept_name = dn.dept_name;
dept_name building budget
Biology Watson 90000.00
Comp. Sci. Taylor 100000.00
Elec. Eng. Taylor 85000.00
Finance Painter 120000.00
History Painter 50000.00
Music Packard 80000.00
Physics Watson 70000.00

DISTINCT

  • remove duplicates
1
2
SELECT DISTINCT dept_name
FROM student;
dept_name
Finance
History
Physics
Music
Comp. Sci.
Biology
Elec. Eng.

String operations

1
2
SELECT 'A' || ID || substring(name, 1, 1) AS aid
FROM Student;
aid
A00128Z
A12345S
A19991B
A23121C
A44553P
A45678L
A54321W
A55739S
A70557S
A76543B
A76653A
A98765B
A98988T
  • string pattern matching LIKE

    • . matches any one character
    • % matches any sequence of characters (including empty sequence)
  • return students whose name starts with Z
1
2
3
SELECT *
FROM student
WHERE name LIKE 'Z%';
id name dept_name tot_cred
00128 Zhang Comp. Sci. 102
  • return students whose name contains 'a' or starts with 'Z'
1
2
3
SELECT *
FROM student
WHERE name LIKE '%a%' OR name LIKE 'Z%';
id name dept_name tot_cred
00128 Zhang Comp. Sci. 102
12345 Shankar Comp. Sci. 32
19991 Brandt History 80
23121 Chavez Finance 110
54321 Williams Comp. Sci. 54
55739 Sanchez Music 38
98765 Bourikas Elec. Eng. 98
98988 Tanaka Biology 120
  • string length
1
2
SELECT name, character_length(name) AS namel
FROM student;
name namel
Zhang 5
Shankar 7
Brandt 6
Chavez 6
Peltier 7
Levy 4
Williams 8
Sanchez 7
Snow 4
Brown 5
Aoi 3
Bourikas 8
Tanaka 6

Case distinction and coalesce

CASE

  • list of rules, first condition $c_i$ that matches, we return $e_i$
1
2
3
4
5
6
case
         when c1 then e1
         when c2 then e2
          
          [else en]
end
1
2
3
4
5
6
7
SELECT name, tot_cred, dept_name AS major,
       CASE
           WHEN dept_name LIKE 'Comp%' AND tot_cred >= 40 THEN 'Ready'
           WHEN tot_cred >= 45 THEN 'Ready'
           ELSE 'Not ready'
       END || ' to graduate' AS is_ready
FROM student
name tot_cred major is_ready
Zhang 102 Comp. Sci. Ready to graduate
Shankar 32 Comp. Sci. Not ready to graduate
Brandt 80 History Ready to graduate
Chavez 110 Finance Ready to graduate
Peltier 56 Physics Ready to graduate
Levy 46 Physics Ready to graduate
Williams 54 Comp. Sci. Ready to graduate
Sanchez 38 Music Not ready to graduate
Snow 0 Physics Not ready to graduate
Brown 58 Comp. Sci. Ready to graduate
Aoi 60 Elec. Eng. Ready to graduate
Bourikas 98 Elec. Eng. Ready to graduate
Tanaka 120 Biology Ready to graduate
1
2
3
4
5
6
SELECT name, tot_cred, dept_name AS major
FROM student
WHERE CASE
          WHEN (dept_name LIKE 'Comp%' AND tot_cred >= 40) OR tot_cred >= 45 THEN true
          ELSE false
      END
name tot_cred major
Zhang 102 Comp. Sci.
Brandt 80 History
Chavez 110 Finance
Peltier 56 Physics
Levy 46 Physics
Williams 54 Comp. Sci.
Brown 58 Comp. Sci.
Aoi 60 Elec. Eng.
Bourikas 98 Elec. Eng.
Tanaka 120 Biology
1
2
3
SELECT name, tot_cred, dept_name AS major
FROM student
WHERE (dept_name LIKE 'Comp%' AND tot_cred >= 40) OR tot_cred >= 45
name tot_cred major
Zhang 102 Comp. Sci.
Brandt 80 History
Chavez 110 Finance
Peltier 56 Physics
Levy 46 Physics
Williams 54 Comp. Sci.
Brown 58 Comp. Sci.
Aoi 60 Elec. Eng.
Bourikas 98 Elec. Eng.
Tanaka 120 Biology

coalesce

  • return first non-null value COALESCE(e1,...,en) return first non-null value
1
2
SELECT name, COALESCE(tot_cred, 0)
FROM student;
  • insert student with NULL
1
2
3
4
5
DROP TABLE IF EXISTS otherstudents;
CREATE TABLE otherstudents (name VARCHAR(200), tot_cred NUMERIC(3,0));
DELETE FROM otherstudents;
INSERT INTO otherstudents VALUES ('Peter', NULL);
INSERT INTO otherstudents VALUES ('Alice', 100);
DROP TABLE
CREATE TABLE
DELETE 0
INSERT 0 1
INSERT 0 1
  • can be expressed with CASE
1
2
3
4
5
6
SELECT name,
       CASE
           WHEN tot_cred >= 0 THEN tot_cred
           ELSE 0
       END AS tot_cred
FROM otherstudents;
name tot_cred
Peter 0
Alice 100

IS NULL and IS NOT NULL

  • check whether a value is NULL (or not)
1
2
3
SELECT *
  FROM otherstudents
 WHERE tot_cred IS NOT NULL;
name tot_cred
Alice 100

Ordering results

  • ORDER BY clause orders results on a list of expressions

    • ASC ascending order (smallest first)
    • DESC descending order (largest first)
1
2
3
SELECT name, id AS cid, tot_cred, dept_name
FROM student
ORDER BY tot_cred DESC, cid ASC;
name cid tot_cred dept_name
Tanaka 98988 120 Biology
Chavez 23121 110 Finance
Zhang 00128 102 Comp. Sci.
Bourikas 98765 98 Elec. Eng.
Brandt 19991 80 History
Aoi 76653 60 Elec. Eng.
Brown 76543 58 Comp. Sci.
Peltier 44553 56 Physics
Williams 54321 54 Comp. Sci.
Levy 45678 46 Physics
Sanchez 55739 38 Music
Shankar 12345 32 Comp. Sci.
Snow 70557 0 Physics

Set operations

  • three set operations

    • UNION, INTERSECT, and EXCEPT (set difference)
    • come in two flavors:

      • set version
      • bag version (by add ALL after the operation)
    • applied to queries
1
2
3
4
5
6
Q1 UNION Q2
Q1 UNION ALL Q2
Q1 INTERSECT Q2
Q1 INTERSECT ALL Q2
Q1 EXCEPT Q2
Q1 EXCEPT ALL Q2

set union

1
2
3
4
5
6
SELECT * FROM (
(SELECT name FROM student)
UNION
(SELECT name FROM instructor)
) AS names
ORDER BY name;
name
Aoi
Bourikas
Brandt
Brown
Califieri
Chavez
Crick
Einstein
El Said
Gold
Katz
Kim
Levy
Mozart
Peltier
Sanchez
Shankar
Singh
Snow
Srinivasan
Tanaka
Williams
Wu
Zhang

bag union

  • if we have n duplicate of a row in the left input and m duplicates in the right input, then there will n + m duplicates of this row in the result of UNION ALL
1
2
3
4
5
6
SELECT * FROM (
(SELECT name FROM student)
UNION ALL
(SELECT name FROM instructor)
) AS names
ORDER BY name;
name
Aoi
Bourikas
Brandt
Brandt
Brown
Califieri
Chavez
Crick
Einstein
El Said
Gold
Katz
Kim
Levy
Mozart
Peltier
Sanchez
Shankar
Singh
Snow
Srinivasan
Tanaka
Williams
Wu
Zhang
1
2
3
4
5
6
7
8
SELECT * FROM (
(SELECT name FROM student)
UNION ALL
(SELECT name FROM instructor)
UNION ALL
(SELECT name FROM instructor)
) AS names
ORDER BY name;
name
Aoi
Bourikas
Brandt
Brandt
Brandt
Brown
Califieri
Califieri
Chavez
Crick
Crick
Einstein
Einstein
El Said
El Said
Gold
Gold
Katz
Katz
Kim
Kim
Levy
Mozart
Mozart
Peltier
Sanchez
Shankar
Singh
Singh
Snow
Srinivasan
Srinivasan
Tanaka
Williams
Wu
Wu
Zhang

set & bag intersection

  • set intersection
1
2
3
4
5
6
SELECT * FROM (
(SELECT name FROM student)
INTERSECT
(SELECT name FROM instructor)
) AS names
ORDER BY name;
name
Brandt
  • bag intersection

    • if we have n duplicate of a row in the left input and m duplicates in the right input, then there will min(n,m) duplicates of this row in the result of UNION ALL
1
2
3
4
5
6
SELECT * FROM (
((SELECT name FROM student) UNION ALL (SELECT name FROM student) UNION ALL (SELECT name FROM student))
INTERSECT ALL
((SELECT name FROM instructor) UNION ALL (SELECT name FROM instructor))
) AS names
ORDER BY name;
name
Brandt
Brandt

set & bag difference

1
2
3
4
5
6
SELECT * FROM (
(SELECT name FROM student)
EXCEPT
(SELECT name FROM instructor)
) AS names
ORDER BY name;
name
Aoi
Bourikas
Brown
Chavez
Levy
Peltier
Sanchez
Shankar
Snow
Tanaka
Williams
Zhang
  • no Brandt
  • bag intersection

    • if we have n duplicate of a row in the left input and m duplicates in the right input, then there will max(n - m, 0) duplicates of this row in the result of EXCEPT ALL
1
2
3
4
5
6
SELECT * FROM (
((SELECT name FROM student) UNION ALL (SELECT name FROM student) UNION ALL (SELECT name FROM student))
EXCEPT ALL
((SELECT name FROM instructor))
) AS names
ORDER BY name;
name
Aoi
Aoi
Aoi
Bourikas
Bourikas
Bourikas
Brandt
Brandt
Brown
Brown
Brown
Chavez
Chavez
Chavez
Levy
Levy
Levy
Peltier
Peltier
Peltier
Sanchez
Sanchez
Sanchez
Shankar
Shankar
Shankar
Snow
Snow
Snow
Tanaka
Tanaka
Tanaka
Williams
Williams
Williams
Zhang
Zhang
Zhang

Aggregation and Grouping

  • aggregation are part of the SELECT clause

    • count, avg, min, max, sum
1
2
SELECT count(*) AS num_students, avg(tot_cred) AS avgcred
FROM student;
num_students avgcred
13 65.6923076923076923
  • GROUP BY clause
1
2
3
SELECT dept_name AS major, count(*)::float / 1000.0 AS num_students, round(avg(tot_cred::float / 3.0)::numeric,2) AS avgcred
FROM student
GROUP BY dept_name;
major num_students avgcred
Finance 0.001 36.67
History 0.001 26.67
Physics 0.003 11.33
Music 0.001 12.67
Comp. Sci. 0.004 20.50
Biology 0.001 40.00
Elec. Eng. 0.002 26.33
  • if aggregation and / or group by used, only group-by expressions or aggregated columns can be used in the SELECT clause
1
2
3
SELECT name, dept_name AS major, count(*)::float / 1000.0 AS num_students, round(avg(tot_cred::float / 3.0)::numeric,2) AS avgcred
FROM student
GROUP BY dept_name;
ERROR:  column "student.name" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT name, dept_name AS major, count(*)::float / 1000.0 AS...

Lecture <2022-09-26 Mon>

Aggregation and NULL and empty inputs

empty tables

1
2
CREATE TABLE IF NOT EXISTS myemptytable(a int, b int);
TRUNCATE myemptytable;
CREATE TABLE
  • count over an empty table returns 0
  • sum and other aggregates return NULL
1
2
SELECT count(*), sum(A), sum(A) IS NULL
FROM myemptytable;
count sum ?column?
0 t

some null values in tables

1
2
INSERT INTO myemptytable VALUES (NULL, 10);
INSERT INTO myemptytable VALUES (NULL, 20);
INSERT 0 1
INSERT 0 1
1
SELECT * FROM myemptytable;
a b
10
20
1
2
SELECT count(*), sum(A), sum(A) IS NULL
FROM myemptytable;
count sum ?column?
2 t
1
2
INSERT INTO myemptytable VALUES (5, 10);
INSERT INTO myemptytable VALUES (5, 20);
INSERT 0 1
INSERT 0 1
  • if there are some non-null values, all null values are ignored
1
2
SELECT count(*), sum(A), sum(A) IS NULL
FROM myemptytable;
count sum ?column?
4 10 f

Aggregation with HAVING

  • filter departments with less than 3 students
1
2
3
4
5
6
SELECT nst, dept_name -- can use *
  FROM
    (SELECT count(*) AS nst, dept_name
       FROM student
      GROUP BY dept_name) AS sc
 WHERE nst < 3;
nst dept_name
1 Finance
1 History
1 Music
1 Biology
2 Elec. Eng.
  • HAVING clause filters after aggregation, evaluated after GROUP BY, but before SELECT

    • can only refer to group-by expression and aggregated results
1
2
3
4
SELECT count(*) AS nst, dept_name
  FROM student
 GROUP BY dept_name
HAVING count(*) < 3;
nst dept_name
1 Finance
1 History
1 Music
1 Biology
2 Elec. Eng.

LIMIT and OFFSET and top-k queries

  • order students on tot_creds
1
2
3
SELECT *
FROM student
ORDER BY tot_cred DESC;
id name dept_name tot_cred
98988 Tanaka Biology 120
23121 Chavez Finance 110
00128 Zhang Comp. Sci. 102
98765 Bourikas Elec. Eng. 98
19991 Brandt History 80
76653 Aoi Elec. Eng. 60
76543 Brown Comp. Sci. 58
44553 Peltier Physics 56
54321 Williams Comp. Sci. 54
45678 Levy Physics 46
55739 Sanchez Music 38
12345 Shankar Comp. Sci. 32
70557 Snow Physics 0
  • LIMIT n returns the first n row

    • evaluated after ORDER BY
  • return 3 students with highest tot_cred
1
2
3
4
SELECT *
FROM student
ORDER BY tot_cred DESC
LIMIT 3;
id name dept_name tot_cred
98988 Tanaka Biology 120
23121 Chavez Finance 110
00128 Zhang Comp. Sci. 102
  • return 3 students with lowest tot_cred
1
2
3
4
SELECT *
FROM student
ORDER BY tot_cred ASC
LIMIT 3;
id name dept_name tot_cred
70557 Snow Physics 0
12345 Shankar Comp. Sci. 32
55739 Sanchez Music 38
  • OFFSET n skips first n rows

    • return 4th to 6th students with highest tot_cred
1
2
3
4
SELECT *
FROM student
ORDER BY tot_cred DESC
OFFSET 3 LIMIT 3;
id name dept_name tot_cred
98765 Bourikas Elec. Eng. 98
19991 Brandt History 80
76653 Aoi Elec. Eng. 60

Bag (multiset) relational algebra

Nested & correlated subqueries

scalar subqueries

  • students whose tot_cred are larger than the avg tot_cred
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
SELECT s.*
  FROM student s,
       (SELECT avg(tot_cred) AS atc
          FROM student) AS avgct
WHERE tot_cred > atc;

WITH avgct AS (SELECT avg(tot_cred) AS atc
          FROM student)
SELECT s.*
  FROM student s, acgct
WHERE tot_cred > atc;
id name dept_name tot_cred
00128 Zhang Comp. Sci. 102
19991 Brandt History 80
23121 Chavez Finance 110
98765 Bourikas Elec. Eng. 98
98988 Tanaka Biology 120
  • scalar subqueries are queries that evaluate to a single row, and can be used like scalar values in, e.g., WHERE clause
1
2
3
4
SELECT *
FROM student
WHERE tot_cred > (SELECT avg(tot_cred)
                    FROM student);
id name dept_name tot_cred
00128 Zhang Comp. Sci. 102
19991 Brandt History 80
23121 Chavez Finance 110
98765 Bourikas Elec. Eng. 98
98988 Tanaka Biology 120
  • what happens if scalar subquery returns more than one row?

    • runtime error
1
2
3
4
5
SELECT *
FROM student
WHERE tot_cred > (SELECT avg(tot_cred)
                    FROM student
                  GROUP BY dept_name);

EXISTS subqueries

  • EXISTS q is a boolean operator that returns true, if q returns at least result, false otherwise
1
2
3
SELECT *
FROM student
WHERE EXISTS (SELECT * FROM student WHERE tot_cred > 100);
id name dept_name tot_cred
00128 Zhang Comp. Sci. 102
12345 Shankar Comp. Sci. 32
19991 Brandt History 80
23121 Chavez Finance 110
44553 Peltier Physics 56
45678 Levy Physics 46
54321 Williams Comp. Sci. 54
55739 Sanchez Music 38
70557 Snow Physics 0
76543 Brown Comp. Sci. 58
76653 Aoi Elec. Eng. 60
98765 Bourikas Elec. Eng. 98
98988 Tanaka Biology 120

IN subqueries

  • return CS instructors that teach at least one course
1
2
3
SELECT *
FROM instructor i
WHERE dept_name = 'Comp. Sci.' AND (i.id IN (SELECT id FROM teaches));
id name dept_name salary
10101 Srinivasan Comp. Sci. 65000.00
12121 Wu Finance 90000.00
15151 Mozart Music 40000.00
22222 Einstein Physics 95000.00
32343 El Said History 60000.00
45565 Katz Comp. Sci. 75000.00
76766 Crick Biology 72000.00
83821 Brandt Comp. Sci. 92000.00
98345 Kim Elec. Eng. 80000.00

correlations

  • use attribute from the outer query inside the nested subquery (inner query)

    • semantics: for each row from the outer query FROM clause substitute its values for the correlated attributes, then evaluate the resulting inner query.
  • find students where there is at least one student in the same department that has tot_cred > 100
1
2
3
SELECT *
FROM student s1
WHERE EXISTS (SELECT * FROM student s2 WHERE s2.tot_cred > 100 AND s1.dept_name = s2.dept_name);
id name dept_name tot_cred
00128 Zhang Comp. Sci. 102
12345 Shankar Comp. Sci. 32
23121 Chavez Finance 110
54321 Williams Comp. Sci. 54
76543 Brown Comp. Sci. 58
98988 Tanaka Biology 120

Lecture <2022-09-28 Wed>

nested subqueries

example of evaluating nested subqueries with correlations

  • find students where there is at least one student in the same department that has tot_cred > 100
1
2
3
SELECT *
FROM student s1
WHERE EXISTS (SELECT * FROM student s2 WHERE s2.tot_cred > 100 AND s1.dept_name = s2.dept_name);
id name dept_name tot_cred
00128 Zhang Comp. Sci. 102
12345 Shankar Comp. Sci. 32
23121 Chavez Finance 110
54321 Williams Comp. Sci. 54
76543 Brown Comp. Sci. 58
98988 Tanaka Biology 120
1
SELECT * FROM student ORDER BY dept_name;
id name dept_name tot_cred
98988 Tanaka Biology 120
12345 Shankar Comp. Sci. 32
54321 Williams Comp. Sci. 54
00128 Zhang Comp. Sci. 102
76543 Brown Comp. Sci. 58
76653 Aoi Elec. Eng. 60
98765 Bourikas Elec. Eng. 98
23121 Chavez Finance 110
19991 Brandt History 80
55739 Sanchez Music 38
44553 Peltier Physics 56
70557 Snow Physics 0
45678 Levy Physics 46
  • foreach student, replacet s1.dept_name with the student's dept_name and then evaluate the query and WHERE
id name dept_name tot_cred
12345 Shankar Comp. Sci. 32
1
SELECT * FROM student s2 WHERE s2.tot_cred > 100 AND 'Comp. Sci.' = s2.dept_name;
id name dept_name tot_cred
00128 Zhang Comp. Sci. 102
1
SELECT EXISTS (SELECT * FROM student s2 WHERE s2.tot_cred > 100 AND 'Comp. Sci.' = s2.dept_name);
exists
t

other nesting constructs ANY and ALL

  • ANY and ALL can be used comparison operators

    • ANY returns true if for at least one result of the nested query the comparison evaluates to true
    • ALL returns true if for all results of the nested query the comparison evaluates to true
1
2
3
SELECT *
FROM student s1
WHERE tot_cred >= ALL (SELECT tot_cred FROM student s2)
id name dept_name tot_cred
98988 Tanaka Biology 120
1
2
3
SELECT *
FROM student s1
WHERE tot_cred > ANY (SELECT tot_cred FROM student s2)
id name dept_name tot_cred
00128 Zhang Comp. Sci. 102
12345 Shankar Comp. Sci. 32
19991 Brandt History 80
23121 Chavez Finance 110
44553 Peltier Physics 56
45678 Levy Physics 46
54321 Williams Comp. Sci. 54
55739 Sanchez Music 38
76543 Brown Comp. Sci. 58
76653 Aoi Elec. Eng. 60
98765 Bourikas Elec. Eng. 98
98988 Tanaka Biology 120
1
2
3
4
5
6
7
SELECT *
FROM student s1
WHERE tot_cred = ANY (SELECT tot_cred FROM student s2 WHERE s1.dept_name = s2.dept_name AND s1.id <> s2.id);

SELECT *
FROM student s1
WHERE tot_cred IN (SELECT tot_cred FROM student s2 WHERE s1.dept_name = s2.dept_name AND s1.id <> s2.id);
id name dept_name tot_cred

universal quantification with nested subqueries

  • EXISTS is a boolean operator, so we can for example, negate it
  • students that are the only students in their department
1
2
3
SELECT *
FROM student s1
WHERE NOT EXISTS (SELECT * FROM student s2 WHERE s1.id <> s2.id AND s1.dept_name = s2.dept_name);
id name dept_name tot_cred
19991 Brandt History 80
23121 Chavez Finance 110
55739 Sanchez Music 38
98988 Tanaka Biology 120
  • departments where all of the students have a tot_cred > 70

    • first with ALL
1
2
3
SELECT DISTINCT dept_name
FROM student s1
WHERE 70 < ALL (SELECT tot_cred FROM student s2 WHERE s1.dept_name = s2.dept_name);
dept_name
Biology
Finance
History
1
2
3
SELECT DISTINCT dept_name
FROM student s1
WHERE NOT EXISTS (SELECT tot_cred FROM student s2 WHERE s1.dept_name = s2.dept_name AND s2.tot_cred <= 70);
dept_name
Finance
History
Biology
  • Give me the name of students that have taken all Biology classes

    • for student s, FORALL x in Biology classes: EXISTS takes(s,x)

      • NOT EXISTS x in Biology classes: (NOT EXISTS takes(s,x))
1
2
3
4
5
6
7
WITH bioclasses AS (SELECT course_id FROM course WHERE dept_name = 'Biology')

SELECT DISTINCT id
FROM takes t1
WHERE NOT EXISTS (SELECT * FROM bioclasses c
                   WHERE NOT EXISTS (SELECT * FROM takes t2
                                      WHERE c.course_id = t2.course_id AND t1.id = t2.id))
id

Window functions

  • OVER()
1
2
SELECT name, avg(tot_cred) OVER () AS avgc
FROM student;
name avgc
Zhang 65.6923076923076923
Shankar 65.6923076923076923
Brandt 65.6923076923076923
Chavez 65.6923076923076923
Peltier 65.6923076923076923
Levy 65.6923076923076923
Williams 65.6923076923076923
Sanchez 65.6923076923076923
Snow 65.6923076923076923
Brown 65.6923076923076923
Aoi 65.6923076923076923
Bourikas 65.6923076923076923
Tanaka 65.6923076923076923
  • PARTITION BY - basically group by
1
2
SELECT name, tot_cred, dept_name, avg(tot_cred) OVER (PARTITION BY dept_name) AS tcred
  FROM student;
name tot_cred dept_name tcred
Tanaka 120 Biology 120.0000000000000000
Shankar 32 Comp. Sci. 61.5000000000000000
Williams 54 Comp. Sci. 61.5000000000000000
Zhang 102 Comp. Sci. 61.5000000000000000
Brown 58 Comp. Sci. 61.5000000000000000
Aoi 60 Elec. Eng. 79.0000000000000000
Bourikas 98 Elec. Eng. 79.0000000000000000
Chavez 110 Finance 110.0000000000000000
Brandt 80 History 80.0000000000000000
Sanchez 38 Music 38.0000000000000000
Peltier 56 Physics 34.0000000000000000
Snow 0 Physics 34.0000000000000000
Levy 46 Physics 34.0000000000000000
  • can also include / exclude rows based on ordering
1
2
3
SELECT name, tot_cred, dept_name, sum(tot_cred) OVER (ORDER BY tot_cred ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS tcred
  FROM student
ORDER BY tot_cred;
name tot_cred dept_name tcred
Snow 0 Physics 0
Shankar 32 Comp. Sci. 32
Sanchez 38 Music 70
Levy 46 Physics 116
Williams 54 Comp. Sci. 170
Peltier 56 Physics 226
Brown 58 Comp. Sci. 284
Aoi 60 Elec. Eng. 344
Brandt 80 History 424
Bourikas 98 Elec. Eng. 522
Zhang 102 Comp. Sci. 624
Chavez 110 Finance 734
Tanaka 120 Biology 854
1
2
3
SELECT name, tot_cred, dept_name, sum(tot_cred) OVER (ORDER BY tot_cred ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS tcred
  FROM student
ORDER BY tot_cred;
name tot_cred dept_name tcred
Snow 0 Physics 854
Shankar 32 Comp. Sci. 854
Sanchez 38 Music 822
Levy 46 Physics 784
Williams 54 Comp. Sci. 738
Peltier 56 Physics 684
Brown 58 Comp. Sci. 628
Aoi 60 Elec. Eng. 570
Brandt 80 History 510
Bourikas 98 Elec. Eng. 430
Zhang 102 Comp. Sci. 332
Chavez 110 Finance 230
Tanaka 120 Biology 120
1
2
3
SELECT name, tot_cred, dept_name, sum(tot_cred) OVER (ORDER BY tot_cred ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS tcred
  FROM student
ORDER BY tot_cred;
name tot_cred dept_name tcred
Snow 0 Physics 32
Shankar 32 Comp. Sci. 70
Sanchez 38 Music 116
Levy 46 Physics 138
Williams 54 Comp. Sci. 156
Peltier 56 Physics 168
Brown 58 Comp. Sci. 174
Aoi 60 Elec. Eng. 198
Brandt 80 History 238
Bourikas 98 Elec. Eng. 280
Zhang 102 Comp. Sci. 310
Chavez 110 Finance 332
Tanaka 120 Biology 230
  • specify how many rows to include

    • ROWS BETWEEN x AND y

      • n PRECEDING
      • n FOLLOWING
      • UNLIMITED PRECEDING / FOLLOWING
      • CURRENT ROW
1
2
SELECT name, tot_cred, dept_name, min(tot_cred) OVER (ORDER BY tot_cred ROWS BETWEEN 3 PRECEDING AND 2 PRECEDING) AS tcred
  FROM student;
name tot_cred dept_name tcred
Snow 0 Physics
Shankar 32 Comp. Sci.
Sanchez 38 Music 0
Levy 46 Physics 0
Williams 54 Comp. Sci. 32
Peltier 56 Physics 38
Brown 58 Comp. Sci. 46
Aoi 60 Elec. Eng. 54
Brandt 80 History 56
Bourikas 98 Elec. Eng. 58
Zhang 102 Comp. Sci. 60
Chavez 110 Finance 80
Tanaka 120 Biology 98

Lecture <2022-10-03 Mon>

window functions

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
CREATE TABLE sales (year INT, month INT, totsales NUMERIC(9,2));

INSERT INTO sales VALUES (2022, 1, 40.54);
INSERT INTO sales VALUES (2022, 2, 100.54);
INSERT INTO sales VALUES (2022, 3, 12.2 );
INSERT INTO sales VALUES (2022, 4, 66.66);
INSERT INTO sales VALUES (2022, 5, 8.888);
INSERT INTO sales VALUES (2022, 6, 10.00);
INSERT INTO sales VALUES (2022, 7, 100.0);
INSERT INTO sales VALUES (2022, 8, 54453.12);
INSERT INTO sales VALUES (2022, 9, 12.1);
INSERT INTO sales VALUES (2022, 10, 15.34);
INSERT INTO sales VALUES (2022, 11, 65.33);
INSERT INTO sales VALUES (2022, 12, 102.2);
CREATE TABLE
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
1
SELECT * FROM sales;
year month totsales
2022 1 40.54
2022 2 100.54
2022 3 12.20
2022 4 66.66
2022 5 8.89
2022 6 10.00
2022 7 100.00
2022 8 54453.12
2022 9 12.10
2022 10 15.34
2022 11 65.33
2022 12 102.20
  • running total of sales in 2022 over the month
1
2
3
SELECT year, month, totsales, sum(totsales) OVER (ORDER BY month ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM sales
WHERE year = 2022
year month totsales running_total
2022 1 40.54 40.54
2022 2 100.54 141.08
2022 3 12.20 153.28
2022 4 66.66 219.94
2022 5 8.89 228.83
2022 6 10.00 238.83
2022 7 100.00 338.83
2022 8 54453.12 54791.95
2022 9 12.10 54804.05
2022 10 15.34 54819.39
2022 11 65.33 54884.72
2022 12 102.20 54986.92
  • sales for current month and the two previous months
1
2
3
SELECT year, month, totsales, sum(totsales) OVER (ORDER BY month ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS running_3_month_total
FROM sales
WHERE year = 2022
year month totsales running_3_month_total
2022 1 40.54 40.54
2022 2 100.54 141.08
2022 3 12.20 153.28
2022 4 66.66 179.40
2022 5 8.89 87.75
2022 6 10.00 85.55
2022 7 100.00 118.89
2022 8 54453.12 54563.12
2022 9 12.10 54565.22
2022 10 15.34 54480.56
2022 11 65.33 92.77
2022 12 102.20 182.87

recursive queries (views)

  • flights (Boston, Chicago, 'United', 544)
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
DROP TABLE IF EXISTS flights;
CREATE TABLE flights
  (origin TEXT,
  destination TEXT,
  airline TEXT,
  flightnr INT,
  PRIMARY KEY(airline, flightnr));

INSERT INTO flights VALUES ('ORD', 'SEA', 'UA', 455);
INSERT INTO flights VALUES ('SEA', 'LAX', 'UA', 560);
INSERT INTO flights VALUES ('SEA', 'ORD', 'UA', 566);
INSERT INTO flights VALUES ('SEA', 'BER', 'DE', 111);
INSERT INTO flights VALUES ('BER', 'PAR', 'LH', 666);
DROP TABLE
CREATE TABLE
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
1
SELECT * FROM flights;
origin destination airline flightnr
ORD SEA UA 455
SEA LAX UA 560
SEA ORD UA 566
SEA BER DE 111
BER PAR LH 666
  • is there a direct connection from ORD to BER
1
2
3
SELECT *
  FROM flights
WHERE origin = 'ORD' AND destination = 'BER';
origin destination airline flightnr
  • is there a direct connection from ORD to BER with at most one stop
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
WITH nostop AS (SELECT 0 AS numstops, f.origin, f.destination
  FROM flights f),

  onestop AS (SELECT 1 AS numstops, f1.origin, f2.destination
       FROM flights f1, flights f2
      WHERE f1.destination = f2.origin),

  twostops AS (SELECT 2 AS numstops, f1.origin, f2.destination
       FROM onestop f1, flights f2
      WHERE f1.destination = f2.origin),

  threestops AS (SELECT 3 AS numstops, f1.origin, f2.destination
       FROM twostops f1, flights f2
      WHERE f1.destination = f2.origin)

SELECT numstops
FROM (SELECT * FROM nostop
      UNION
      SELECT * FROM onestop
      UNION
      SELECT * FROM twostops
      UNION
      SELECT * FROM threestops) f
WHERE f.origin = 'ORD' AND f.destination = 'BER';
numstops
1
3
  • way to express recursive computations in SQL

    • fixpoint computation:

      • initialization (specifies initial content)
      • recursive step (compute next iteration content based on current content)
1
2
3
4
5
6
7
WITH RECURSIVE allconnections AS (
  SELECT origin, destination, 0 AS numstops FROM flights -- initialization
  UNION
  SELECT a.origin, f.destination, a.numstops + 1 FROM allconnections a, flights f WHERE a.destination = f.origin AND numstops < 10 -- recusive step
  )

SELECT * FROM allconnections WHERE origin = 'ORD' ORDER BY destination, numstops;
origin destination numstops
ORD BER 1
ORD BER 3
ORD BER 5
ORD BER 7
ORD BER 9
ORD LAX 1
ORD LAX 3
ORD LAX 5
ORD LAX 7
ORD LAX 9
ORD ORD 1
ORD ORD 3
ORD ORD 5
ORD ORD 7
ORD ORD 9
ORD PAR 2
ORD PAR 4
ORD PAR 6
ORD PAR 8
ORD PAR 10
ORD SEA 0
ORD SEA 2
ORD SEA 4
ORD SEA 6
ORD SEA 8
ORD SEA 10
  • this is a fixpoint computation
  • initialization, let $T_i$ is the state of the result at iteration $i$

\[ T_0 = Q_{init}(D) \]

  • recusive steps

\[ T_{i+1} = T_i \cup Q_{recursive}(T_i, D) \]

  • termination: $T_{i+1} = T_{i}$

explaining query plans

Lecture <2022-10-05 Wed>

DDL

views

  • how many students per department
1
2
3
4
5
DROP VIEW deptheadcount;
CREATE VIEW deptheadcount AS (
SELECT dept_name, count(*) AS headcnt
  FROM student
GROUP BY dept_name);
DROP VIEW
CREATE VIEW
1
2
3
SELECT dept_name, headcnt
FROM deptheadcount
WHERE headcnt < 3;
dept_name headcnt
History 1
Music 1
Biology 1
Elec. Eng. 2
Finance 1
  • this is equivalent to
1
2
3
4
5
SELECT dept_name, headcnt
FROM (SELECT dept_name, count(*) AS headcnt
  FROM student
GROUP BY dept_name) deph
WHERE headcnt < 3;
dept_name headcnt
History 1
Music 1
Biology 1
Elec. Eng. 2
Finance 1
  • materialized view store the view result
1
2
3
4
5
6
DROP VIEW deptheadcount;
CREATE MATERIALIZED VIEW deptheadcount AS (
SELECT dept_name, count(*) AS headcnt
  FROM student
GROUP BY dept_name);
;
DROP VIEW
SELECT 7
1
2
3
SELECT dept_name, headcnt
FROM deptheadcount
WHERE headcnt < 3;
dept_name headcnt
History 1
Music 1
Biology 1
Elec. Eng. 2
Finance 1
  • when the base tables are updated, materialized views get "out of sync"
  • use REFRESH to make sure the view is up to date.
1
SELECT * FROM deptheadcount;
1
DELETE FROM student WHERE name = 'Shankar';
1
REFRESH MATERIALIZED VIEW deptheadcount;
1
SELECT * FROM deptheadcount;
dept_name headcnt
History 1
Music 1
Physics 3
Biology 1
Elec. Eng. 2
Finance 1
Comp. Sci. 2

catalog (information schema)

  • catalog makes schema information query-able
1
2
3
SELECT table_schema, table_name, table_type
  FROM information_schema.tables
  ORDER BY table_schema, table_name;
table_schema table_name table_type
information_schema _pg_foreign_data_wrappers VIEW
information_schema _pg_foreign_servers VIEW
information_schema _pg_foreign_table_columns VIEW
information_schema _pg_foreign_tables VIEW
information_schema _pg_user_mappings VIEW
information_schema administrable_role_authorizations VIEW
information_schema applicable_roles VIEW
information_schema attributes VIEW
information_schema character_sets VIEW
information_schema check_constraint_routine_usage VIEW
information_schema check_constraints VIEW
information_schema collation_character_set_applicability VIEW
information_schema collations VIEW
information_schema column_column_usage VIEW
information_schema column_domain_usage VIEW
information_schema column_options VIEW
information_schema column_privileges VIEW
information_schema column_udt_usage VIEW
information_schema columns VIEW
information_schema constraint_column_usage VIEW
information_schema constraint_table_usage VIEW
information_schema data_type_privileges VIEW
information_schema domain_constraints VIEW
information_schema domain_udt_usage VIEW
information_schema domains VIEW
information_schema element_types VIEW
information_schema enabled_roles VIEW
information_schema foreign_data_wrapper_options VIEW
information_schema foreign_data_wrappers VIEW
information_schema foreign_server_options VIEW
information_schema foreign_servers VIEW
information_schema foreign_table_options VIEW
information_schema foreign_tables VIEW
information_schema information_schema_catalog_name VIEW
information_schema key_column_usage VIEW
information_schema parameters VIEW
information_schema referential_constraints VIEW
information_schema role_column_grants VIEW
information_schema role_routine_grants VIEW
information_schema role_table_grants VIEW
information_schema role_udt_grants VIEW
information_schema role_usage_grants VIEW
information_schema routine_column_usage VIEW
information_schema routine_privileges VIEW
information_schema routine_routine_usage VIEW
information_schema routine_sequence_usage VIEW
information_schema routine_table_usage VIEW
information_schema routines VIEW
information_schema schemata VIEW
information_schema sequences VIEW
information_schema sql_features BASE TABLE
information_schema sql_implementation_info BASE TABLE
information_schema sql_parts BASE TABLE
information_schema sql_sizing BASE TABLE
information_schema table_constraints VIEW
information_schema table_privileges VIEW
information_schema tables VIEW
information_schema transforms VIEW
information_schema triggered_update_columns VIEW
information_schema triggers VIEW
information_schema udt_privileges VIEW
information_schema usage_privileges VIEW
information_schema user_defined_types VIEW
information_schema user_mapping_options VIEW
information_schema user_mappings VIEW
information_schema view_column_usage VIEW
information_schema view_routine_usage VIEW
information_schema view_table_usage VIEW
information_schema views VIEW
pg_catalog pg_aggregate BASE TABLE
pg_catalog pg_am BASE TABLE
pg_catalog pg_amop BASE TABLE
pg_catalog pg_amproc BASE TABLE
pg_catalog pg_attrdef BASE TABLE
pg_catalog pg_attribute BASE TABLE
pg_catalog pg_auth_members BASE TABLE
pg_catalog pg_authid BASE TABLE
pg_catalog pg_available_extension_versions VIEW
pg_catalog pg_available_extensions VIEW
pg_catalog pg_backend_memory_contexts VIEW
pg_catalog pg_cast BASE TABLE
pg_catalog pg_class BASE TABLE
pg_catalog pg_collation BASE TABLE
pg_catalog pg_config VIEW
pg_catalog pg_constraint BASE TABLE
pg_catalog pg_conversion BASE TABLE
pg_catalog pg_cursors VIEW
pg_catalog pg_database BASE TABLE
pg_catalog pg_db_role_setting BASE TABLE
pg_catalog pg_default_acl BASE TABLE
pg_catalog pg_depend BASE TABLE
pg_catalog pg_description BASE TABLE
pg_catalog pg_enum BASE TABLE
pg_catalog pg_event_trigger BASE TABLE
pg_catalog pg_extension BASE TABLE
pg_catalog pg_file_settings VIEW
pg_catalog pg_foreign_data_wrapper BASE TABLE
pg_catalog pg_foreign_server BASE TABLE
pg_catalog pg_foreign_table BASE TABLE
pg_catalog pg_group VIEW
pg_catalog pg_hba_file_rules VIEW
pg_catalog pg_index BASE TABLE
pg_catalog pg_indexes VIEW
pg_catalog pg_inherits BASE TABLE
pg_catalog pg_init_privs BASE TABLE
pg_catalog pg_language BASE TABLE
pg_catalog pg_largeobject BASE TABLE
pg_catalog pg_largeobject_metadata BASE TABLE
pg_catalog pg_locks VIEW
pg_catalog pg_matviews VIEW
pg_catalog pg_namespace BASE TABLE
pg_catalog pg_opclass BASE TABLE
pg_catalog pg_operator BASE TABLE
pg_catalog pg_opfamily BASE TABLE
pg_catalog pg_partitioned_table BASE TABLE
pg_catalog pg_policies VIEW
pg_catalog pg_policy BASE TABLE
pg_catalog pg_prepared_statements VIEW
pg_catalog pg_prepared_xacts VIEW
pg_catalog pg_proc BASE TABLE
pg_catalog pg_publication BASE TABLE
pg_catalog pg_publication_rel BASE TABLE
pg_catalog pg_publication_tables VIEW
pg_catalog pg_range BASE TABLE
pg_catalog pg_replication_origin BASE TABLE
pg_catalog pg_replication_origin_status VIEW
pg_catalog pg_replication_slots VIEW
pg_catalog pg_rewrite BASE TABLE
pg_catalog pg_roles VIEW
pg_catalog pg_rules VIEW
pg_catalog pg_seclabel BASE TABLE
pg_catalog pg_seclabels VIEW
pg_catalog pg_sequence BASE TABLE
pg_catalog pg_sequences VIEW
pg_catalog pg_settings VIEW
pg_catalog pg_shadow VIEW
pg_catalog pg_shdepend BASE TABLE
pg_catalog pg_shdescription BASE TABLE
pg_catalog pg_shmem_allocations VIEW
pg_catalog pg_shseclabel BASE TABLE
pg_catalog pg_stat_activity VIEW
pg_catalog pg_stat_all_indexes VIEW
pg_catalog pg_stat_all_tables VIEW
pg_catalog pg_stat_archiver VIEW
pg_catalog pg_stat_bgwriter VIEW
pg_catalog pg_stat_database VIEW
pg_catalog pg_stat_database_conflicts VIEW
pg_catalog pg_stat_gssapi VIEW
pg_catalog pg_stat_progress_analyze VIEW
pg_catalog pg_stat_progress_basebackup VIEW
pg_catalog pg_stat_progress_cluster VIEW
pg_catalog pg_stat_progress_copy VIEW
pg_catalog pg_stat_progress_create_index VIEW
pg_catalog pg_stat_progress_vacuum VIEW
pg_catalog pg_stat_replication VIEW
pg_catalog pg_stat_replication_slots VIEW
pg_catalog pg_stat_slru VIEW
pg_catalog pg_stat_ssl VIEW
pg_catalog pg_stat_subscription VIEW
pg_catalog pg_stat_sys_indexes VIEW
pg_catalog pg_stat_sys_tables VIEW
pg_catalog pg_stat_user_functions VIEW
pg_catalog pg_stat_user_indexes VIEW
pg_catalog pg_stat_user_tables VIEW
pg_catalog pg_stat_wal VIEW
pg_catalog pg_stat_wal_receiver VIEW
pg_catalog pg_stat_xact_all_tables VIEW
pg_catalog pg_stat_xact_sys_tables VIEW
pg_catalog pg_stat_xact_user_functions VIEW
pg_catalog pg_stat_xact_user_tables VIEW
pg_catalog pg_statio_all_indexes VIEW
pg_catalog pg_statio_all_sequences VIEW
pg_catalog pg_statio_all_tables VIEW
pg_catalog pg_statio_sys_indexes VIEW
pg_catalog pg_statio_sys_sequences VIEW
pg_catalog pg_statio_sys_tables VIEW
pg_catalog pg_statio_user_indexes VIEW
pg_catalog pg_statio_user_sequences VIEW
pg_catalog pg_statio_user_tables VIEW
pg_catalog pg_statistic BASE TABLE
pg_catalog pg_statistic_ext BASE TABLE
pg_catalog pg_statistic_ext_data BASE TABLE
pg_catalog pg_stats VIEW
pg_catalog pg_stats_ext VIEW
pg_catalog pg_stats_ext_exprs VIEW
pg_catalog pg_subscription BASE TABLE
pg_catalog pg_subscription_rel BASE TABLE
pg_catalog pg_tables VIEW
pg_catalog pg_tablespace BASE TABLE
pg_catalog pg_timezone_abbrevs VIEW
pg_catalog pg_timezone_names VIEW
pg_catalog pg_transform BASE TABLE
pg_catalog pg_trigger BASE TABLE
pg_catalog pg_ts_config BASE TABLE
pg_catalog pg_ts_config_map BASE TABLE
pg_catalog pg_ts_dict BASE TABLE
pg_catalog pg_ts_parser BASE TABLE
pg_catalog pg_ts_template BASE TABLE
pg_catalog pg_type BASE TABLE
pg_catalog pg_user VIEW
pg_catalog pg_user_mapping BASE TABLE
pg_catalog pg_user_mappings VIEW
pg_catalog pg_views VIEW
public advisor BASE TABLE
public classroom BASE TABLE
public course BASE TABLE
public department BASE TABLE
public flights BASE TABLE
public instructor BASE TABLE
public myemptytable BASE TABLE
public otherstudents BASE TABLE
public prereq BASE TABLE
public sales BASE TABLE
public section BASE TABLE
public seminars BASE TABLE
public student BASE TABLE
public takes BASE TABLE
public teaches BASE TABLE
public time_slot BASE TABLE
1
2
3
4
SELECT table_name, column_name, ordinal_position, data_type
FROM information_schema.columns
WHERE table_schema = 'public' AND table_name LIKE 's%'
ORDER BY table_name, ordinal_position;
table_name column_name ordinal_position data_type
sales year 1 integer
sales month 2 integer
sales totsales 3 numeric
section course_id 1 character varying
section sec_id 2 character varying
section semester 3 character varying
section year 4 numeric
section building 5 character varying
section room_number 6 character varying
section time_slot_id 7 character varying
seminars title 1 character varying
seminars presenter 2 character varying
seminars semdate 3 timestamp without time zone
seminars building 4 character varying
seminars room_number 5 character varying
seminars organizing_dept 6 character varying
student id 1 character varying
student name 2 character varying
student dept_name 3 character varying
student tot_cred 4 numeric

More on constraints

foreign keys
  • for foreign keys we can specify what action to take if the referenced table is updated

    • for deletions ON DELETE
    • for update ON UPDATE
  • options

    • SET NULL - set fk columns to NULL
    • RESTRICT - disallow deletion / updates for tuples that referenced
    • CASCADE

      • for deletion: delete referencing rows
      • for update: update the values of referencing rows
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
CREATE TABLE addr (
  id INT PRIMARY KEY,
  city TEXT,
  zip NUMERIC(5,0)
);

INSERT INTO addr VALUES (1, 'Chicago', 60616);
INSERT INTO addr VALUES (2, 'Chicago', 60612);
INSERT INTO addr VALUES (3, 'Chicago', 60645);
INSERT INTO addr VALUES (4, 'Evanston', 60555);
CREATE TABLE
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
  • persons that reference address
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
CREATE TABLE person (
  name TEXT,
  addrid INT REFERENCES addr
);

INSERT INTO person VALUES ('A', 1);
INSERT INTO person VALUES ('B', 1);
INSERT INTO person VALUES ('C', 2);
INSERT INTO person VALUES ('D', 4);
INSERT INTO person VALUES ('E', 4);
INSERT INTO person VALUES ('F', 4);
CREATE TABLE
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
1
DELETE FROM addr WHERE id = 1;
1
2
3
4
--ALTER TABLE person DROP CONSTRAINT person_addrid_fkey;
ALTER TABLE person ADD CONSTRAINT ourfk
  FOREIGN KEY (addrid) REFERENCES addr
              ON DELETE CASCADE ON UPDATE SET NULL;
ALTER TABLE
1
DELETE FROM addr WHERE id = 1;
DELETE 1
1
SELECT * FROM person;
name addrid
C 2
D 4
E 4
F 4
1
UPDATE addr SET id = 5 WHERE id = 4;
UPDATE 1
1
SELECT * FROM person;
name addrid
C 2
D
E
F
CHECK constraints
  • check boolean condition over the values of columns (from this table!)
1
SELECT * FROM addr WHERE NOT (zip::TEXT LIKE '60%');
id city zip
1 NY 15555
1
ALTER TABLE addr ADD CONSTRAINT local_check CHECK (zip::TEXT LIKE '60%');
ALTER TABLE
  • violates the constraint since 15555 does not start with 60
1
INSERT INTO addr VALUES (1, 'NY', 15555);

Updates (DML)

transactions

  • allow us to group SQL statements into atomic

    • either all will be executed or none
    • and current transactions cannot interact (later when we talk concurrency control)
  • transaction end if you run a COMMIT or ABORT
  • the database will abort transactions if a statement throws an error
1
2
3
4
BEGIN WORK;
INSERT INTO addr VALUES (6,'Chicago', 60754);
INSERT INTO addr VALUES (7,'NY', 15555);
COMMIT;
BEGIN
INSERT 0 1
1
SELECT * FROM addr;
id city zip
2 Chicago 60612
3 Chicago 60645
5 Evanston 60555

advanced update statement

insert query results
  • INSERT INTO ... SELECT
1
2
3
4
5
INSERT INTO student (SELECT 'AB' || substring(id,0,4),
                            name,
                            'History',
                            tot_cred
                       FROM student);
INSERT 0 11
1
SELECT * FROM student;
id name dept_name tot_cred
19991 Brandt History 80
23121 Chavez Finance 110
44553 Peltier Physics 56
45678 Levy Physics 46
54321 Williams Comp. Sci. 54
55739 Sanchez Music 38
70557 Snow Physics 0
76543 Brown Comp. Sci. 58
76653 Aoi Elec. Eng. 60
98765 Bourikas Elec. Eng. 98
98988 Tanaka Biology 120
AB199 Brandt History 80
AB231 Chavez History 110
AB445 Peltier History 56
AB456 Levy History 46
AB543 Williams History 54
AB557 Sanchez History 38
AB705 Snow History 0
AB765 Brown History 58
AB766 Aoi History 60
AB987 Bourikas History 98
AB989 Tanaka History 120
update / delete with nested subqueries
  • can use nested subqueries in WHERE of delete or update
1
2
3
SELECT count(*), dept_name
   FROM student s2
  GROUP BY s2.dept_name;
count dept_name
3 Physics
1 Biology
2 Elec. Eng.
1 Finance
2 Comp. Sci.
12 History
1 Music
1
2
3
4
DELETE FROM student s
 WHERE 3 > (SELECT count(*)
              FROM student s2
             WHERE s2.dept_name = s.dept_name);
DELETE 7
1
SELECT * FROM student;
id name dept_name tot_cred
19991 Brandt History 80
44553 Peltier Physics 56
45678 Levy Physics 46
70557 Snow Physics 0
AB199 Brandt History 80
AB231 Chavez History 110
AB445 Peltier History 56
AB456 Levy History 46
AB543 Williams History 54
AB557 Sanchez History 38
AB705 Snow History 0
AB765 Brown History 58
AB766 Aoi History 60
AB987 Bourikas History 98
AB989 Tanaka History 120

Lecture <2022-10-12 Wed>

exam topics review and questions

Access control

1
2
3
REVOKE ALL ON student FROM testuser;
DROP USER IF EXISTS testuser;
CREATE USER testuser WITH LOGIN PASSWORD 'test';
REVOKE
DROP ROLE
CREATE ROLE
1
GRANT SELECT ON student TO testuser;
GRANT
1
GRANT INSERT ON student TO testuser;
GRANT
1
2
3
4
5
CREATE VIEW studentstats AS (
  SELECT dept_name, count(*) AS numst, avg(tot_cred) AS avgcred
    FROM student
   GROUP BY dept_name
  );
CREATE VIEW
  • roles

    1
    2
    3
    4
    
    CREATE ROLE hr;
    GRANT SELECT on studentstats TO hr;
    GRANT hr TO testuser;
    REVOKE ALL ON student FROM testuser;
    GRANT
    GRANT ROLE
    REVOKE

User defined types and functions

UDT = user-defined types

  • define new types
  • in postgres use C and SQL to register the new type and its functions: https://www.postgresql.org/docs/current/xtypes.html

    • input: take string and return binary representation
    • output: take binary and return string representation
    • send/receive: serialize / deserialize for network transfer

UDF = user-defined functions and operators

  • can be written in PL like C, Python, …

Procedural extensions

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
CREATE FUNCTION get_num_students() RETURNS int
AS $$
DECLARE
  cnt int;
BEGIN
    SELECT count(*) INTO cnt
        FROM student;
    RETURN cnt;
END;
$$ LANGUAGE plpgsql;
CREATE FUNCTION
1
SELECT get_num_students();
get_num_students
16
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
CREATE FUNCTION get_num_students(dept TEXT) RETURNS int
AS $$
DECLARE
  cnt int;
BEGIN
    SELECT count(*) INTO cnt
      FROM student
      WHERE dept_name = get_num_students.dept;
    RETURN cnt;
END;
$$ LANGUAGE plpgsql;
CREATE FUNCTION
1
2
SELECT name, dept_name, get_num_students(dept_name)
FROM student
name dept_name get_num_students
Brandt History 13
Peltier Physics 3
Levy Physics 3
Snow Physics 3
Brandt History 13
Chavez History 13
Peltier History 13
Levy History 13
Williams History 13
Sanchez History 13
Snow History 13
Brown History 13
Aoi History 13
Bourikas History 13
Tanaka History 13
test History 13
1
2
3
4
5
6
7
8
9
CREATE FUNCTION plus_one(i INT) RETURNS int
AS $$
DECLARE
  res INT;
BEGIN
  res := i + 1;
  RETURN res;
END;
$$ LANGUAGE plpgsql;
CREATE FUNCTION
1
SELECT plus_one(1);
plus_one
2

control flow and iteration

conditional control flow
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
CREATE FUNCTION add_one_if_large_else_add_two(i INT) RETURNS int
AS $$
DECLARE
  res INT;
BEGIN
  IF (i > 100)
  THEN
    res := i + 1;
  ELSE
    res := i + 2;
  END IF;
  RETURN res;
END;
$$ LANGUAGE plpgsql;
CREATE FUNCTION
1
2
SELECT add_one_if_large_else_add_two(3);
SELECT add_one_if_large_else_add_two(200);
add_one_if_large_else_add_two
5
add_one_if_large_else_add_two
201
looping
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
CREATE OR REPLACE FUNCTION add_one_hun(i INT) RETURNS int
AS $$
DECLARE
  j INT;
  res INT;
BEGIN
  j := 0;
  res := i;
  WHILE (j < 100)
    LOOP
    res := res + 1;
    j := j + 1;
  END LOOP;
  RETURN res;
END;
$$ LANGUAGE plpgsql;
CREATE FUNCTION
1
SELECT add_one_hun(100);
add_one_hun
200
looping through query results

https://www.postgresql.org/docs/current/plpgsql-control-structures.html#PLPGSQL-RECORDS-ITERATING

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
CREATE OR REPLACE FUNCTION another_cnt() RETURNS int
AS $$
DECLARE
  res INT;
  st RECORD;
BEGIN
  res := 0;
  FOR st IN (SELECT * FROM student)
  LOOP
    IF (st.dept_name = 'History')
      THEN
        res := res + 1;
    END IF;
  END LOOP;
  RETURN res;
END;
$$ LANGUAGE plpgsql;
CREATE FUNCTION
1
SELECT another_cnt();
another_cnt
13
table returning functions
1
SELECT i FROM generate_series(1,10) AS f(i);
i
1
2
3
4
5
6
7
8
9
10

Lecture <2022-10-19 Wed>

Triggers

  • triggers are special functions that are executed when a certain action is performed on a table
  • in Postgres you first have to define the trigger function and then you can create a trigger

postgres trigger creation syntax

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
CREATE [ OR REPLACE ] [ CONSTRAINT ] TRIGGER name { BEFORE | AFTER | INSTEAD OF } { event [ OR ... ] }
    ON table_name
    [ FROM referenced_table_name ]
    [ NOT DEFERRABLE | [ DEFERRABLE ] [ INITIALLY IMMEDIATE | INITIALLY DEFERRED ] ]
    [ REFERENCING { { OLD | NEW } TABLE [ AS ] transition_relation_name } [ ... ] ]
    [ FOR [ EACH ] { ROW | STATEMENT } ]
    [ WHEN ( condition ) ]
    EXECUTE { FUNCTION | PROCEDURE } function_name ( arguments )

where event can be one of:

    INSERT
    UPDATE [ OF column_name [, ... ] ]
    DELETE
    TRUNCATE

trigger function

  • returns value of type trigger
1
2
3
4
5
6
7
8
CREATE FUNCTION trigger_function()
   RETURNS TRIGGER
   LANGUAGE PLPGSQL
AS $$
BEGIN
   -- trigger logic
END;
$$

trigger example: implement history tables with triggers

  • do not loose the history of the table (keep old version)
1
2
3
4
5
6
7
CREATE TABLE nohistory_time (a int PRIMARY KEY, b int, t_begin TIMESTAMP DEFAULT CURRENT_TIMESTAMP);

INSERT INTO nohistory_time (a,b) VALUES (1,1), (2,2), (3,3);

CREATE VIEW nohistory AS (SELECT a, b FROM nohistory_time);

CREATE TABLE history (a int, b int, t_begin TIMESTAMP, t_end TIMESTAMP);
CREATE TABLE
INSERT 0 3
CREATE VIEW
CREATE TABLE
1
SELECT * FROM nohistory_time;
a b t_begin
1 1 2022-10-19 15:31:29.584735
2 2 2022-10-19 15:31:29.584735
3 3 2022-10-19 15:31:29.584735
1
SELECT * FROM history;
a b t_begin t_end
1
2
3
4
5
6
7
8
9
CREATE OR REPLACE FUNCTION keep_history_delete()
   RETURNS TRIGGER
   LANGUAGE PLPGSQL
AS $$
BEGIN
   INSERT INTO history VALUES (OLD.a, OLD.b, OLD.t_begin, CURRENT_TIMESTAMP);
   RETURN NEW;
END;
$$
CREATE FUNCTION
1
2
3
4
CREATE OR REPLACE TRIGGER keep_history_delete_trigger
  AFTER DELETE ON nohistory_time
  FOR EACH ROW
    EXECUTE FUNCTION keep_history_delete();
CREATE TRIGGER
1
DELETE FROM nohistory_time WHERE a = 1;
INSERT 0 1
DELETE 1
1
SELECT * FROM history;
a b t_begin t_end
1 1 2022-10-19 15:35:20.330632 2022-10-19 15:35:20.339711
1
DELETE FROM nohistory_time WHERE a = 2;
DELETE 1
1
INSERT INTO nohistory_time (a,b) VALUES (4,2), (5,3);
INSERT 0 2
1
SELECT * FROM nohistory_time;
a b t_begin
3 3 2022-10-19 15:31:29.584735
4 2 2022-10-19 15:38:07.803667
5 3 2022-10-19 15:38:07.803667
1
DELETE FROM nohistory_time WHERE a <> b;
DELETE 2
1
SELECT * FROM history ORDER BY t_begin, t_end, a, b;
a b t_begin t_end
2 2 2022-10-19 15:31:29.584735 2022-10-19 15:36:48.183113
1 1 2022-10-19 15:35:20.330632 2022-10-19 15:35:20.339711
4 2 2022-10-19 15:38:07.803667 2022-10-19 15:38:13.546615
5 3 2022-10-19 15:38:07.803667 2022-10-19 15:38:13.546615
  • get data as of a certain time
1
SELECT a, b FROM history WHERE '2022-10-19 15:35:20.339710'::TIMESTAMP BETWEEN t_begin AND t_end;
a b
1 1
2 2
  • now updates
1
2
3
4
5
6
7
8
9
CREATE OR REPLACE FUNCTION keep_history_update()
   RETURNS TRIGGER
   LANGUAGE PLPGSQL
AS $$
BEGIN
   INSERT INTO history VALUES (OLD.a, OLD.b, OLD.t_begin, CURRENT_TIMESTAMP);
   RETURN NEW;
END;
$$
CREATE FUNCTION
1
2
3
4
CREATE OR REPLACE TRIGGER keep_history_update_trigger
  AFTER UPDATE ON nohistory_time
  FOR EACH ROW
    EXECUTE FUNCTION keep_history_update();
CREATE TRIGGER
1
INSERT INTO nohistory_time (a,b) VALUES (5,5), (6,6), (7,1), (8,2);
INSERT 0 4
1
SELECT * FROM nohistory_time;
a b t_begin
3 3 2022-10-19 15:31:29.584735
5 5 2022-10-19 15:42:57.311625
6 6 2022-10-19 15:42:57.311625
7 1 2022-10-19 15:44:53.411487
8 2 2022-10-19 15:44:53.411487
1
UPDATE nohistory_time SET b = a WHERE a <> b;
UPDATE 2
1
SELECT * FROM history;
a b t_begin t_end
1 1 2022-10-19 15:35:20.330632 2022-10-19 15:35:20.339711
2 2 2022-10-19 15:31:29.584735 2022-10-19 15:36:48.183113
4 2 2022-10-19 15:38:07.803667 2022-10-19 15:38:13.546615
5 3 2022-10-19 15:38:07.803667 2022-10-19 15:38:13.546615
7 7 2022-10-19 15:42:57.311625 2022-10-19 15:44:53.392018
8 8 2022-10-19 15:42:57.311625 2022-10-19 15:44:53.392018
7 1 2022-10-19 15:44:53.411487 2022-10-19 15:45:03.793728
8 2 2022-10-19 15:44:53.411487 2022-10-19 15:45:03.793728
1
2
3
4
5
6
SELECT a, b
  FROM
    (SELECT * FROM history
      UNION ALL
     SELECT *, NULL AS t_end FROM nohistory_time) sub
 WHERE '2022-10-19 15:35:20.339710'::TIMESTAMP BETWEEN t_begin AND t_end OR (t_begin <= '2022-10-19 15:35:20.339710' AND t_end IS NULL);
a b
1 1
2 2
3 3

ER-model

requirement analysis: music collection

  • genres, albums, artists
  • We store information about songs

    • Songs have a title
    • Songs are written by one or more artists
    • Songs can be on albums as tracks and appear with a particular track number
    • Songs can have a genre
  • We store information about artists.

    • Artists have a name
    • Artists have a date of birth and optionally a date of death
  • We want to store information about albums.

    • Albums have a release date
    • Albums have a number of tracks which are songs and have additionally a track number and length
    • Albums have artists playing on the album
    • Albums are released by some publisher (typically record company)
    • Albums have titles

Lecture <2022-10-24 Mon>

ER-model

Entities and relationships

students taking courses example

../../static/imgs/studentcourse.jpg

Students taking courses ER-digagem
Students taking courses ER-digagem

Examples

Music collection

../../static/imgs/music.jpg

Music collection ER-diagram
Music collection ER-diagram

Lecture <2022-10-31 Mon>

Lecture <2022-11-07 Mon>

Lecture <2022-11-09 Wed>

LEFT OUTER JOIN for not exists checks

1
2
3
4
5
SELECT emp
FROM servedby s LEFT OUTER JOIN (SELECT *, 1 AS indicator FROM mail) m ON (s.addr = m.to_addr
                                           AND s.startdate<=m.delivered_date
                                           AND ( s.enddate IS NULL OR s.enddate>=m.delivered_date))
WHERE m.indicator IS NULL

exam scaling

  • grade = 0.25 * 100/85 * midterm score + …

Lecture <2022-11-14 Mon>

Lecture <2022-11-21 Mon>

Lecture <2022-11-30 Wed>

example of determining normal forms

2NF

  • 1NF (always assumed)
  • $$R = (A,B,C,D,E)$$
  • $$F = \{ A \rightarrow E, BC \rightarrow D, C \rightarrow E \}$$
  • 2NF

    • candidate keys $$\{\{ A,B,C \}\}$$
    • $$ABC^{+} = ABCDE$$
    • non-prime = $$\{D,E\}$$
    • counterexample: $$A \rightarrow E$$ is a dependency from part of a candidate key to a non-prime attribute
  • this is not in 2NF
  • $$R = (A,B,C,D,E)$$
  • $$F = \{ A \rightarrow E, BC \rightarrow AD, C \rightarrow E \}$$
  • candidate keys $$\{\{ B,C \}\}$$

    • non-prime = $$\{A,D,E\}$$
  • counterexample: $$C \rightarrow E$$ is a dependency from part of a candidate key to a non-prime attribute
  • this is in not 2NF
  • $$R = (A,B,C,D,E)$$
  • $$F = \{ A \rightarrow E, BC \rightarrow AD \}$$
  • candidate keys $$\{\{ B,C \}\}$$

    • non-prime = $$\{A,D,E\}$$
  • counterexample: is a dependency from part of a candidate key to a non-prime attribute
  • this is in 2NF

3NF

  • $$R = (A,B,C,D,E)$$
  • $$F = \{ A \rightarrow E, BC \rightarrow AD \}$$
  • candidate keys $$\{\{ B,C \}\}$$

    • non-prime = $$\{A,D,E\}$$
  • counterexample: is a dependency from part of a candidate key to a non-prime attribute
  • this is in 2NF
  • $$A \rightarrow E$$ is neither trivial, nor is A a superkey, nor is E part of a candidate key!
  • this is not in 3NF
  • just an exercise: $$BC \rightarrow AD$$ ok, BC is a superkey
  • $$BC \rightarrow B$$ - trivial and BC is a superkey $$BC^{+} = BCADE$$

BCNF

  • same as first two options for 3NF (exploit that)

attribute closures and canonical cover

  • $$F = \{ A \rightarrow BCD, B \rightarrow A, C \rightarrow D \}$$
  • $$F^{+} = \{ AB \rightarrow B, A \rightarrow D, \ldots \}$$
  • attribute closure $$A^{+} = ABCD$$
  • $$F^{C} = \{ A \rightarrow BCD, B \rightarrow A, C \rightarrow D \}$$

    • find extraneous attributes in RHS, LHS
    • is $$D$$ extraneous in $$A \rightarrow BCD$$

      • $$F' = \{ A \rightarrow BC, B \rightarrow A, C \rightarrow D \}$$

        • attribute closure $$A^{+}$$ = $$ABCD$$
  • $$F^{C} = \{ A \rightarrow BC, B \rightarrow A, C \rightarrow D \}$$

    • is $$B$$ extraneous in $$A \rightarrow BC$$
    • $$F' = \{ A \rightarrow C, B \rightarrow A, C \rightarrow D \}$$

      • attribute closure $$A^{+}$$ = $$ACD$$ (not extraneous)
    • is $$C$$ extraneous in $$A \rightarrow BC$$?
    • $$F' = \{ A \rightarrow B, B \rightarrow A, C \rightarrow D \}$$

      • attribute closure $$A^{+}$$ = $$AB$$ (not extraneous)
  • $$F^{C} = \{ A \rightarrow BC, B \rightarrow A, C \rightarrow D \}$$ is the final result

decomposition into 3NF

Let's use $$F^{C} = \{ A \rightarrow BC, B \rightarrow A, C \rightarrow D \}$$ as an example.

  • start with empty set of fragments $$\{\}$$
  • $$A \rightarrow BC$$ - $$R_1 = (A,B,C)$$
  • $$B \rightarrow A$$ - we already have $$R_1$$
  • $$C \rightarrow D$$ - $$R_2 = (C,D)$$
  • we already have $$R_1$$ containing candidate key $$A$$
  • no contained fragments
  • final decomposed result is $$\{R_1, R_2\}$$
Last updated on 24 Aug 2020
Published on 24 Aug 2020