2020-06-04 20:17

Page 1
- Lecture notes: chapter 5
- Book: chapter 3

- Consider a relational schema R(a,b,c,d,e).
- What does a functional dependency
`a b → c d`mean?- If two rows agree on the values of
`a`and`b`, they should also agree on the values of`c`and`d`. - The values of
`a`and`b`*determine*the values of`c`and`d`.

- If two rows agree on the values of

Schema:Countries country currency value Finland EUR 1.09 Estonia EUR 1.09 Sweden SEK 0.12 `Countries(country,currency,value)`

- Functional Dependencies:
- country→currency

currency→value

- country→currency
- If country is the primary key then country→currency will be trivially satisfied.
- But how do we ensure that currency→value is satisfied?
- The functional dependencies help us identify a problematic schema.

a → b c is the same as a → b

a → c( `a`determines both`b`and`c`)- but
a b → c is **not**the same asa → c

b → c`a b → c`means that`a`and`b`together determine`c`.

a → b

b → cimplies a → c - A more complex example:
a → c

b → d

c d → eimplies a b → e

a b → d | implies | a b c → d |

a → a an attribute always determines itself a b → a two or more attributes determine a subset of the attributes

(reflexivity + augmentation)

- If F is a set of functional dependencies,
- then
*minimal basis F*is a simplified but equivalent set of dependencies^{-}- no trivial dependencies
- no dependencies that are implied by transitivity
- no dependencies that are augmentations of other dependencies

**Closure**- X
^{+}: the set of attributes determined (directly or indirectly) by X.

- X
**Superkey**- X is a superkey if X
^{+}includes all attributes in the relation.

- X is a superkey if X
**Key**- A minimal superkey.
- Candidates for primary keys.

- Example 2 (continuing from last time)

Schema:Countries country currency value Finland EUR **1.09**Estonia EUR **1.09**Sweden SEK 0.12 `Countries(country,currency,value)`

- Functional Dependencies:
- country→currency

currency→value

country→value (implied by transitivity)

- country→currency
- Closures:
- country+ = {country,currency,value}

currency+ = {currency,value}

value+ = {value}

- country+ = {country,currency,value}
- Superkeys: {country}, {country,currency}, {country,value}, {country,currency,value}
- Key: {country}
*Non-trivial FD that is not a superkey*: currency→value

JustCountries country currency Finland EUR Estonia EUR Sweden SEK Currencies currency value EUR 1.09 SEK 0.12 - Countries = JustCountries ⨝ Currencies
- We can insert new info about which currency is used in a country without knowing the exchange rate.
- We can insert the exchange rate for a new currency without knowing which country it is used in.

Consider a database to keep track of room bookings:

Bookings code name day timeslot room seats TDA357 Databaser Tuesday 0 GD 236 TDA357 Databaser Tuesday 1 GD 236 ERE033 Reglerteknik Tuesday 0 HB4 224 ERE033 Reglerteknik Friday 0 GD 236 - Problems?

Consider a database to keep track of room bookings:

Bookings code name day timeslot room seats TDA357 Databaser Tuesday 0 GD 236 TDA357 Databaser Tuesday 1 GD 236 ERE033 Reglerteknik Tuesday 0 HB4 224 ERE033 Reglerteknik Friday 0 GD 236 - There is some redundancy
- For example, the number of seats in GD is mentioned 3 times.

- Problems:
**Insert anomaly**: we can't insert the number of seats in a room if there is no booking in that room**Update anomaly**: we get an inconsistency if the number of seats in GD is changed in only one of the rows.**Delete anomaly**: the number of seats in a room is lost of all bookings in that room are removed.

Courses code name TDA357 Databases ERE033 Reglerteknik Rooms room seats GD 236 HB4 224 Bookings code day timeslot room TDA357 Tuesday 0 GD TDA357 Tuesday 1 GD ERE033 Tuesday 0 HB4 ERE033 Friday 0 GD - Courses ⨝ Bookings ⨝ Rooms gives us the original table
- This is probably what we would get if we started from an Entity-Relationship
model
- What would the E-R diagram look like?
- What are the primary keys in these tables?

- Consider the
*functional dependencies*between the attributes. - This will reveal that the table is not in a
*normal form*. - Split it into smaller tables that are in a normal form.

- 1NF: First Normal Form (1970)
- 2NF: Second Normal Form (1971)
**3NF**: Third Normal Form (1971)**BCNF**(3½NF): Boyce-Codd Normal Form (1974)**4NF**: Fourth Normal Form (1977)- …

- Requirement:
*Only atomic values in the tables* ~~1NF~~id name phone program 1 Alice 1234,

1235TKDAT 2 Bob 4567 TKITE 3 Carl 9870 MPSC ⟹ 1NF id name phone program 1 Alice 1234 TKDAT 1 Alice 1235 TKDAT 2 Bob 4567 TKITE 3 Carl 9870 MPSC

- Another way to convert to 1NF
~~1NF~~id name phone program 1 Alice 1234,

1235TKDAT 2 Bob 4567 TKITE 3 Carl 9870 MPSC ⟹ 1NF id name phone1 phone2 program 1 Alice 1234 1235 TKDAT 2 Bob 4567 TKITE 3 Carl 9870 MPSC

- To free the collection of relations from undesirable insertion, update and deletion dependencies.
- To reduce the need for restructuring the collection of relations, as new types of data are introduced, and thus increase the life span of application programs.
- To make the relational model more informative to users.
- To make the collection of relations neutral to the query statistics,
where these statistics are liable to change as time goes by.

(Codd, E.F.: "Further Normalization of the Data Base Relational Model")

- Requirement:
*1NF + No Partial Dependencies* - Example:
`CompletedCourses(`

__student__,__course__,grade,credits) CompletedCourses ( ~~2NF~~)student course grade credits 1 C1 G 5.0 2 C2 VG 7.5 1 C4 G 9.0 4 C3 G 5.0 4 C1 VG 5.0 2 C5 G 9.0 - Functional Dependencies:
- student,course→grade

course→credits (partial dependency!)

- student,course→grade

`Grades(`

__student__,__course__,grade)

Credits(__cource__,credits)Grades (2NF) student course grade 1 C1 G 2 C2 VG 1 C4 G 4 C3 G 4 C1 VG 2 C5 G Credits (2NF) course credits C1 5.0 C2 7.5 C3 5.0 C4 9.0 C5 9.0 - CompletedCourses = Grades ⨝ Credits

- Requirement:
*2NF + no transitive dependencies from the primary key to non-key attributes* HomeAddress ( ~~3NF~~)__id__name postcode city 1 Alice 41279 Göteborg 2 Bob 43168 Mölndal 3 Carl 41279 Göteborg - Functional dependencies:
- id→name,postcode,city

postcode→city

- id→name,postcode,city
*Transitive dependency*: id→postcode→city

HomeCodes (3NF) __id__name postcode 1 Alice 41296 2 Bob 43168 3 Carl 41296 PostcodeCities (3NF) __postcode__city 41279 Göteborg 43168 Mölndal - HomeAddress = HomeCodes ⨝ PostcodeCities

- Consider the following schema and functional dependencies:
`Postcodes(city,street,postcode)`

- city street → postcode

postcode → city - {city,street} is a key,
since {city,street}
^{+}= {city,street,postcode}

- city street → postcode
- Is it in 2NF? Is it free from partial dependencies?
*Yes!* - Is it in 3NF? Is it free from transitive dependencies from the key
to non-key attributes?
*Yes!*

Postcodes (3NF) city street postcode Göteborg Framnäsgatan 41264 Göteborg Rännvägen 41258 Göteborg Hörsalsvägen 41258 Göteborg Barnhusgatan 41102 Stockholm Barnhusgatan 11123 - The same city,postcode pair can appear many times.
- But postcode → city, so there is some redundancy.

- Requirement:
*for each non-trivial functional dependency X→Y, X is a superkey* - BNFC ⇒ 2NF, 3NF

- Find a non-trivial FD X→y such that X
^{+}≠ S (X^{+}is not a superkey) - If there is no such FD, R is already in BCNF, so we are done.
- Otherwise, split R into R
_{1}(X^{+}) and R_{2}(X ∪ (S – X^{+})). - Repeat the procedure for R
_{1}and R_{2}.

`Postcodes(city,street,postcode)`

- city street → postcode

postcode → city

- city street → postcode
- Pick postcode → city:
- postcode
^{+}= {postcode,city} ≠ {city,street,postcode} - A BNFC violation
- Split into {postcode,city} and {postcode,street}
- These have no BNFC violations.

- postcode

Streets (BCNF) street postcode Framnäsgatan 41264 Rännvägen 41258 Hörsalsvägen 41258 Barnhusgatan 41102 Barnhusgatan 11123 Cities (BCNF) city __postcode__Göteborg 41264 Göteborg 41258 Göteborg 41102 Stockholm 11123 - Postcodes = Streets ⨝ Cities
- The city,postcode duplications are avoided, but there are other problems…

`Bookings(code,name,day,timeslot,room,seats)`

- code → name

room → seats

day timesslot room → code

day timeslot code → room

- code → name
- …

- Returning to the 1NF example:
~~1NF~~id name phone program 1 Alice 1234,

1235TKDAT 2 Bob 4567 TKITE 3 Carl 9870 MPSC ⟹ 1NF id name phone program 1 Alice 1234 TKDAT 1 Alice 1235 TKDAT 2 Bob 4567 TKITE 3 Carl 9870 MPSC - In some sense, id still determines the phone numbers, even if there is more than one.
- This is called a
*multivalued dependency*: id↠phone

Courses course book author teacher Databases DTCB Ullman Jonas Databases DTCB Ullman Thomas Reglerteknik RTB 1 Author1 Lennart Reglerteknik RTB 2 Author2 Lennart - course↠teacher
- course↠book author

- Every FD is also a MVD.
- MVDs do not have the same properties as normal FDs.
- X↠Y Z can
**not**be split into X↠Y and X↠Z - Complement: if X↠Y, then X↠S–Y
- Transitivity: if X↠Y and if Y↠Z then X↠Z–Y
- …

- X↠Y Z can

- Requirement:
*BCNF + for all non-trivial MVDs X↠Y, X is a superkey*

- Find a non-trivial MVD X↠Y such that X
^{+}is not a superkey. - If there is no such MVD, R is already in 4NF, so we are done.
- Otherwise, split R into R
_{1}(X ∪ Y) and R_{2}(S–Y). - Repeat the procedure for R
_{1}and R_{2}.

- A relational schema obtained from an E-R diagram is in 3NF.
- And if there are no additional functional dependencies, it is also in BCNF.

- Sometimes BCNF normalisation causes problems, so using 3NF can be the better choice.
- The lecture notes describe a
*3NF synthesis algorithm*that starts from a the minimal basis of a set FDs.

- Start by creating an E-R model.
- Obtain a relational schema from the E-R model.
- Identify additional functional dependencies.
- Use them to add UNIQUE constraints.
- Apply BNCF decomposition when appropriate.

- Jonas Duregård's slides
- Normal Forms in DBMS (geeksforgeeks.org)
- Database normalization (Wikipedia)
- Multivalued dependency (Wikipedia)