Relation Database Design
Pitfalls in Relational Database Design
A bad design may lead to:
- Repetition of information
- Inability to represent certain information
So the design goal are:
- Avoid redundant data
- Ensure that relationships among attributes are represented
- Facilitate the checking of updates for violation of database integrity constraints.
Functional Dependencies
Let be a relation schema The functional dependency holds on if and only if for any legal relations whenever any two tuples and of agree on the attributes they also agree on the attributes : Using functional dependency define key.
is a super key for relation schema if and only if .
is a candidate key for if and only if:
- for no ,
We use functional dependencies to:
- Test relations to see if they are legal under a given set of functional dependencies
- Specify constraints on the set of legal relations
Trivial dependency: a functional dependency is trivial if it is satisfied by all instances of a relation. In general, is trivial if .
Transitive dependency: a functional dependency is transitive if then is transitive dependency on .
Partial dependency: a functional dependency is partial if: is partially dependent on .
Logical imply: a function dependency is logically implied by , the set of functional dependencies on , if every relation instance satisfies .
Closure of a Set of Functional Dependencies: for , the set of all functional dependencies logically implied by is the closure of , denoting the closure by .
Armstrong's Axioms:
- Reflexivity:
- Augmentation:
- Transitivity:
And some additional rules:
-
union:
-
decomposition:
-
pseudo transitivity:
Closure of attributes, the set of attributes that are functionally determined by under . And using attribute closure:
- Test for superkey: if is a superkey, we computer and check if contains all attributes of .
- Test functional dependencies: To check if a functional dependency holds, just check if
- Computing closure of
Equvialent functional dependencies: Let and be two sets of functional dependencies: if , then and are equvialent.
Extraneous Attributes
An attribute of a functional dependency is said to be extraneous if remove it without changing the closure of the set of functional dependencies.
Consider a set of of functional dependencies and the functional dependency in :
- If and logical implies , is an extraneous attribute.
- If and logical implies , is an extraneous attribute.
Giving two ways to judge whether attribute is a extraneous attribute:
-
Attribute is extraneous in if and logically implies
-
Attibute is extraneous in if and the set of functional dependencies logically implies .
Canonical/Minimal Cover
Set of functional dependencies may contains redundant dependencies that can be inferred from the others.
A canonical cover of , a.k.a is a minimal set of functional dependencies equivalent to , with no redundant dependencies or having redundant parts of dependencies.
No functional dependency in contains an extraeouse attribute. Each left side of fucntional dependenyc in is unique.
Normal Forms
First Normal Form
Domain is atomic if its elements are considered to be indivisible units.
A relational schema is in first normal form if the domains of all attributes of are atomic.
Second Normal Form
A relation schema is in second normal form if each attribute in meets one of the following criteria:
- It apperars in a candiate key
- It is not partially dependent on a candidate key
In practice, if is a first normal form and every non-key attribute is not a partial dependency of candidate key , is third normal form.
Third Normal Form
A realtion schema is in third normal form if for all: at least one of the following holds:
- is trivial
- is a superkey for
- Each attribute in is contained in a candidate key for .
In practise, if is a second normal form and every non-key attribute is not a tranitive dependency of candidate key, is third normal form.
Boyce-Codd Normal Form
A relation schema is in boyce codd normal form with respect to a set of functional dependencies if for all functional dependencies in of the form where and at least one of the following holds:
- is trivial
- is a superkey for
In practise, if any non-keys attribute directly dpendent on all candidate keys.
If a relation is in boyce codd normal form, it is in third normal form.
For emaxple, for relation
SPC(SNO, PNO, CNO)
and functional dependencies: This relation is third normal form but not boyce-codd normal form as is not a superkey of .
求一个关系模式的所有候选键:
将关系模式中的所有属性分成四类:
- L类:只出现在函数依赖左部的属性
- R类:只出现在函数依赖右部的属性
- N类:在函数依赖左右均未出现的属性
- LR类:在函数依赖左右两边均出现的属性
如果LN的闭包包含了R中的所有属性,那么LN就是关系模式的所有候选键。
但是LN的闭包中不包含R中所有的属性,那么需要从LR类属性中添加。
Decomposition
If a relation is not in a good form, decompose it into a set of relations: And all attributes of an original schema must appear in the decomposition:
Lossless-join Decomposition
If a decomposition of into a set of relations has: the decomposition is lossy join decompositions.
A descomposition of into and is lossless join if and only if at least one fo the following dependencies is in :
Dependency Preservation
Decompose a relation schema with a set of fucntional dependencies into , ... . Let be the set of dependencies that include only attributes in .
If this decomposition is dependency preservation.
Follow the algorithm to test for dependency preservation:
Third Normal Function Decompition algorithm
The algorithm ensures:
- Each relation schema is in third normal form
- Decomposition is dependency preservation
- DeComposition is lossless-join
For example:
Firstly, get the canonical cover: Secondly, get sets:
Thirdly, check the candidate key.
Boyce-codd Decomposition Algorithm
The main problem in BCNF decomposition is to get dependency function set after decomposition in the .
Overall Database Design Process
Design Goals
Goal for a relation database design is to :
- BCNF
- Lossless join
- Dependency preservation
And if we cannot achieve this. we accept one of:
- Lack of dependency preservation
- Redundancy due to use of 3NF
When an E-R diagram is carefully designed, identifuing all entities correctly, the tables generated from the E-R diagram should not need further normalization. However, in a real design there can be a function dependence from non-key attributes of an entity to other attributes of the entity. And we may want to use non-normalized schema for performance.