2020-06-04 20:17
Page 1

# Databases

## Lecture 6: Functional Dependencies and Normal Forms

• Lecture notes: chapter 5
• Book: chapter 3
Page 2

# Functional Dependencies

## Definition

• Consider a relational schema R(a,b,c,d,e).
• What does a functional dependency a b → c d mean?
• If two rows agree on the values of a and b, they should also agree on the values of c and d.
• The values of a and b determine the values of c and d.
Page 3

## Example 2 (from last time)

• Countries
countrycurrencyvalue
FinlandEUR1.09
EstoniaEUR1.09
SwedenSEK0.12
Schema: `Countries(country,currency,value)`
• Functional Dependencies:
• country→currency
currency→value
• If country is the primary key then country→currency will be trivially satisfied.
• But how do we ensure that currency→value is satisfied?
• The functional dependencies help us identify a problematic schema.
Page 4

## Notation

•  a → b c is the same as a → b a → c (a determines both b and c)
• but
•  a b → c is not the same as a → c b → c
• a b → c means that a and b together determine c.
Page 5

# Properties of Functional Dependencies

## Transitivity

•  a → bb → c implies a → c
• A more complex example:
•  a → cb → dc d → e implies a b → e
Page 6

Page 7

## Reflexivity and trivial dependencies

•  a → a an attribute always determines itself
•  a b → a two or more attributes determine a subset of the attributes (reflexivity + augmentation)
Page 8

## F- Minimal basis (minimal cover)

• If F is a set of functional dependencies,
• then minimal basis F- is a simplified but equivalent set of dependencies
• no trivial dependencies
• no dependencies that are implied by transitivity
• no dependencies that are augmentations of other dependencies
Page 9
##### Properties of Functional Dependencies
• Closure
• X+: the set of attributes determined (directly or indirectly) by X.
• Superkey
• X is a superkey if X+ includes all attributes in the relation.
• Key
• A minimal superkey.
• Candidates for primary keys.
Page 10

# Functional Dependencies

• Example 2 (continuing from last time)
• Countries
countrycurrencyvalue
FinlandEUR1.09
EstoniaEUR1.09
SwedenSEK0.12
Schema: `Countries(country,currency,value)`
• Functional Dependencies:
• country→currency
currency→value
country→value (implied by transitivity)
• Closures:
• country+ = {country,currency,value}
currency+ = {currency,value}
value+ = {value}
• Superkeys: {country}, {country,currency}, {country,value}, {country,currency,value}
• Key: {country}
• Non-trivial FD that is not a superkey: currency→value
Page 11

## Split tables to avoid redundancy

• JustCountries
countrycurrency
FinlandEUR
EstoniaEUR
SwedenSEK
Currencies
currencyvalue
EUR1.09
SEK0.12
• Countries = JustCountries ⨝ Currencies
• We can insert new info about which currency is used in a country without knowing the exchange rate.
• We can insert the exchange rate for a new currency without knowing which country it is used in.
Page 12

# Another Example (1)

• Consider a database to keep track of room bookings:

Bookings
codenamedaytimeslotroomseats
TDA357DatabaserTuesday0GD236
TDA357DatabaserTuesday1GD236
ERE033ReglerteknikTuesday0HB4224
ERE033ReglerteknikFriday0GD236
• Problems?
Page 13

# Another Example (2)

• Consider a database to keep track of room bookings:

Bookings
codenamedaytimeslotroomseats
TDA357DatabaserTuesday0GD236
TDA357DatabaserTuesday1GD236
ERE033ReglerteknikTuesday0HB4224
ERE033ReglerteknikFriday0GD236
• There is some redundancy
• For example, the number of seats in GD is mentioned 3 times.
• Problems:
• Insert anomaly: we can't insert the number of seats in a room if there is no booking in that room
• Update anomaly: we get an inconsistency if the number of seats in GD is changed in only one of the rows.
• Delete anomaly: the number of seats in a room is lost of all bookings in that room are removed.
Page 14

## A better design

• Courses
codename
TDA357Databases
ERE033Reglerteknik
Rooms
roomseats
GD236
HB4224
Bookings
codedaytimeslotroom
TDA357Tuesday0GD
TDA357Tuesday1GD
ERE033Tuesday0HB4
ERE033Friday0GD
• Courses ⨝ Bookings ⨝ Rooms gives us the original table
• This is probably what we would get if we started from an Entity-Relationship model
• What would the E-R diagram look like?
• What are the primary keys in these tables?
Page 15

# Database design using FDs

## Another way detect and fix the problems with the original table

• Consider the functional dependencies between the attributes.
• This will reveal that the table is not in a normal form.
• Split it into smaller tables that are in a normal form.
Page 16

# Normal Forms

• 1NF: First Normal Form (1970)
• 2NF: Second Normal Form (1971)
• 3NF: Third Normal Form (1971)
• BCNF (3½NF): Boyce-Codd Normal Form (1974)
• 4NF: Fourth Normal Form (1977)
Page 17

# 1NF: First Normal Form

• Requirement: Only atomic values in the tables
• 1NF
idnamephoneprogram
1Alice1234,
1235
TKDAT
2Bob4567TKITE
3Carl9870MPSC
1NF
idnamephoneprogram
1Alice1234TKDAT
1Alice1235TKDAT
2Bob4567TKITE
3Carl9870MPSC
Page 18
##### 1NF: First Normal Form
• Another way to convert to 1NF
• 1NF
idnamephoneprogram
1Alice1234,
1235
TKDAT
2Bob4567TKITE
3Carl9870MPSC
1NF
idnamephone1phone2program
1Alice12341235TKDAT
2Bob4567TKITE
3Carl9870MPSC
Page 19

# Purpose of Normal Forms (beyond 1NF)

• To free the collection of relations from undesirable insertion, update and deletion dependencies.
• To reduce the need for restructuring the collection of relations, as new types of data are introduced, and thus increase the life span of application programs.
• To make the collection of relations neutral to the query statistics, where these statistics are liable to change as time goes by.

• (Codd, E.F.: "Further Normalization of the Data Base Relational Model")
Page 20

# 2NF: Second Normal Form

• Requirement: 1NF + No Partial Dependencies
• Example: `CompletedCourses(student,course,grade,credits)`
• CompletedCourses (2NF)
1C1G5.0
2C2VG7.5
1C4G9.0
4C3G5.0
4C1VG5.0
2C5G9.0
• Functional Dependencies:
course→credits (partial dependency!)
Page 21

## Convert to 2NF by splitting the table

• ```Grades(student,course,grade) Credits(cource,credits)```
1C1G
2C2VG
1C4G
4C3G
4C1VG
2C5G
Credits (2NF)
coursecredits
C15.0
C27.5
C35.0
C49.0
C59.0
• CompletedCourses = Grades ⨝ Credits
Page 22

# 3NF: Third Normal Form

• Requirement: 2NF + no transitive dependencies from the primary key to non-key attributes
idnamepostcodecity
1Alice41279Göteborg
2Bob43168Mölndal
3Carl41279Göteborg
• Functional dependencies:
• id→name,postcode,city
postcode→city
• Transitive dependency: id→postcode→city
Page 23

## Convert to 3NF by splitting the table

• HomeCodes (3NF)
idnamepostcode
1Alice41296
2Bob43168
3Carl41296
PostcodeCities (3NF)
postcodecity
41279Göteborg
43168Mölndal
• HomeAddress = HomeCodes ⨝ PostcodeCities
Page 24

## Does it avoid all problems with redundancies?

• Consider the following schema and functional dependencies:
• `Postcodes(city,street,postcode)`
• city street → postcode
postcode → city
• {city,street} is a key, since {city,street}+ = {city,street,postcode}
• Is it in 2NF? Is it free from partial dependencies?
• Is it in 3NF? Is it free from transitive dependencies from the key to non-key attributes?
Page 25

## Example

• Postcodes (3NF)
citystreetpostcode
GöteborgFramnäsgatan41264
GöteborgRännvägen41258
GöteborgHörsalsvägen41258
GöteborgBarnhusgatan41102
StockholmBarnhusgatan11123
• The same city,postcode pair can appear many times.
• But postcode → city, so there is some redundancy.
Page 26

# BCNF: Boyce-Codd Normal Form

• Requirement: for each non-trivial functional dependency X→Y, X is a superkey
• BNFC ⇒ 2NF, 3NF
Page 27

# BCNF Normalisation Algorithm

## To normalise a relation R(S)

• Find a non-trivial FD X→y such that X+ ≠ S (X+ is not a superkey)
• If there is no such FD, R is already in BCNF, so we are done.
• Otherwise, split R into R1(X+) and R2(X ∪ (S – X+)).
• Repeat the procedure for R1 and R2.
Page 28

## Example

• ```Postcodes(city,street,postcode) ```
• city street → postcode
postcode → city
• Pick postcode → city:
• postcode+ = {postcode,city} ≠ {city,street,postcode}
• A BNFC violation
• Split into {postcode,city} and {postcode,street}
• These have no BNFC violations.
Page 29
##### BCNF Normalisation Algorithm → Example
• Streets (BCNF)
streetpostcode
Framnäsgatan41264
Rännvägen41258
Hörsalsvägen41258
Barnhusgatan41102
Barnhusgatan11123
Cities (BCNF)
citypostcode
Göteborg41264
Göteborg41258
Göteborg41102
Stockholm11123
• Postcodes = Streets ⨝ Cities
• The city,postcode duplications are avoided, but there are other problems…
Page 30

# Another BCNF Normalisation example

• `Bookings(code,name,day,timeslot,room,seats)`
• code → name
room → seats
day timesslot room → code
day timeslot code → room
Page 31

# Multivalued Dependencies (MVDs)

• Returning to the 1NF example:
• 1NF
idnamephoneprogram
1Alice1234,
1235
TKDAT
2Bob4567TKITE
3Carl9870MPSC
1NF
idnamephoneprogram
1Alice1234TKDAT
1Alice1235TKDAT
2Bob4567TKITE
3Carl9870MPSC
• In some sense, id still determines the phone numbers, even if there is more than one.
• This is called a multivalued dependency: idphone
Page 32

## Another Example

• Courses
coursebookauthorteacher
DatabasesDTCBUllmanJonas
DatabasesDTCBUllmanThomas
ReglerteknikRTB 1Author1Lennart
ReglerteknikRTB 2Author2Lennart
• courseteacher
• coursebook author
Page 33

# Properties of Multivalued Dependencies

• Every FD is also a MVD.
• MVDs do not have the same properties as normal FDs.
• XY Z can not be split into XY and XZ
• Complement: if XY, then XS–Y
• Transitivity: if XY and if YZ then XZ–Y
Page 34

# 4NF: Fourth Normal Form

• Requirement: BCNF + for all non-trivial MVDs XY, X is a superkey
Page 35

# 4NF Normalisation algorithm

## To normalise a relation R(S)

• Find a non-trivial MVD XY such that X+ is not a superkey.
• If there is no such MVD, R is already in 4NF, so we are done.
• Otherwise, split R into R1(X ∪ Y) and R2(S–Y).
• Repeat the procedure for R1 and R2.
Page 36

# Good to know

• A relational schema obtained from an E-R diagram is in 3NF.
• And if there are no additional functional dependencies, it is also in BCNF.
• Sometimes BCNF normalisation causes problems, so using 3NF can be the better choice.
• The lecture notes describe a 3NF synthesis algorithm that starts from a the minimal basis of a set FDs.
Page 37

# Database Design workflow

• Start by creating an E-R model.
• Obtain a relational schema from the E-R model.