8b4c1724380a4997c89995603a2a411b.ppt
- Количество слайдов: 51
Database Normalization
Normalization • Normalization theory is based on the observation that relations with certain properties are more effective in inserting, updating and deleting data than other sets of relations containing the same data • Normalization is a multi-step process beginning with an “unnormalized”relation
Definition • • • This is the process which allows to remove out redundant data within the database. This involves restructuring the tables to successively meeting higher forms of Normalization. A properly normalized database should have the following characteristics – – Scalar values in each fields Absence of redundancy. Minimal use of null values. Minimal loss of information.
Results of Normalization • Removes the following modification anomalies (integrity errors) with the database – Insertion – Deletion – Update
ANOMALIES • Insertion – inserting one fact in the database requires knowledge of other facts unrelated to the fact being inserted • Deletion – Deleting one fact from the database causes loss of other unrelated data from the database • Update – Updating the values of one fact requires multiple changes to the database
Levels of Normalization 1 NF 2 NF 3 NF 4 NF 5 NF Each higher level is a subset of the lower level
First Normal Form (1 NF) A table is considered to be in 1 NF if all the fields contain only scalar values (as opposed to list/group of values). Example (Not 1 NF) ISBN Title Au. Name Au. Phone Pub. Name Pub. Phone Price 0 -32132 -1 Balloon Sleepy, Snoopy, Grumpy 321 -1111, 232 -234 -1234, 665 -235 -6532 Small House 714 -0000 $34. 00 0 -55 -123456 -9 Main Street Jones, Smith 123 -3333, 654 -223 -3455 Small House 714 -0000 $22. 95 0 -123 -45678 -0 Ulysses Joyce 666 -6666 Alpha Press 999 -9999 $34. 00 1 -22 -233700 -0 Visual Basic Roman 444 -4444 Big House 123 -456 -7890 $25. 00 Author and Au. Phone columns are not scalar
Unnormalized Relation
First Normal Form • In most definitions of the relational model – All data values should be atomic – This means that table entries should be single values, not sets or composite objects • A relation is said to be in first normal form (1 NF) if all data values are atomic
First Normal Form
1 NF Storage Anomalies • Insertion: A new patient has not yet undergone surgery -hence no surgeon # -- Since surgeon # is part of the key, we cannot insert. • Insertion: If a surgeon is newly hired and has not operated yet -- there will be no way to include that person in the database. • Update: If a patient comes in for a new procedure, and has moved, we need to change multiple address entries. • Deletion (type 1): Deleting a patient record may also delete all info about a surgeon. • Deletion (type 2): When there are functional dependencies (like side effects and drug) changing one item eliminates other information.
Functional Dependencies 1. If one set of attributes in a table determines another set of attributes in the table, then the second set of attributes is said to be functionally dependent on the first set of attributes. Example 1 ISBN Title Price 0 -32132 -1 Balloon $34. 00 0 -55 -123456 -9 Main Street $22. 95 0 -123 -45678 -0 Ulysses $34. 00 1 -22 -233700 -0 Visual Basic $25. 00 Table Scheme: {ISBN, Title, Price} Functional Dependencies: {ISBN} {Title} {ISBN} {Price}
Functional Dependency definition • A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y • X Y holds if whenever two tuples have the same value for X, they must have the same value for Y If t 1[X]=t 2[X], then t 1[Y]=t 2[Y] in any relation instance r(R) • X Y in R specifies a constraint on all relation instances r(R) • FDs are derived from the real-world constraints on the attributes
Examples of FD constraints • Social Security Number determines employee name SSN ENAME • Project Number determines project name and location PNUMBER {PNAME, PLOCATION} • Employee SSN and project number determines the hours per week that the employee works on the project {SSN, PNUMBER} HOURS
Additional Useful Inference Rules • Decomposition – If X YZ, then X Y and X Z • Union – If X Y and X Z, then X YZ • Psuedotransitivity – If X Y and WY Z, then WX Z
Functional Dependencies Example 2 Pub. ID Pub. Name Pub. Phone 1 Big House 999 -9999 2 Small House 123 -456 -7890 3 Alpha Press 111 -1111 Table Scheme: {Pub. ID, Pub. Name, Pub. Phone} Functional Dependencies: {Pub. Id} {Pub. Phone} {Pub. Id} {Pub. Name} {Pub. Name, Pub. Phone} {Pub. ID} Example 3 Au. ID Au. Name Au. Phone 1 Sleepy 321 -1111 2 Snoopy 232 -234 -1234 3 Grumpy 665 -235 -6532 4 Jones 123 -3333 5 Smith 654 -223 -3455 6 Joyce 666 -6666 7 Roman 444 -4444 Table Scheme: {Au. ID, Au. Name, Au. Phone} Functional Dependencies: {Au. Id} {Au. Phone} {Au. Id} {Au. Name} {Au. Name, Au. Phone} {Au. ID}
Second Normal Form (2 NF) For a table to be in 2 NF, there are two requirements – – The database is in first normal form All nonkey attributes in the table must be functionally dependent on the entire primary key Note: Remember that we are dealing with non-key attributes Example 1 (Not 2 NF) Scheme {Title, Pub. Id, Au. Id, Price, Au. Address} 1. 2. 3. 4. 5. Key {Title, Pub. Id, Au. Id} {Title, Pub. Id, Au. ID} {Price} {Au. ID} {Au. Address} Au. Address does not belong to a key Au. Address functionally depends on Au. Id which is a subset of a key
Second Normal Form (2 NF) Example 2 (Not 2 NF) Scheme {City, Street, House. Number, House. Color, City. Population} 1. 2. 3. 4. 5. key {City, Street, House. Number} {House. Color} {City} {City. Population} City. Population does not belong to any key. City. Population is functionally dependent on the City which is a proper subset of the key Example 3 (Not 2 NF) Scheme {studio, movie, budget, studio_city} 1. 2. 3. 4. 5. Key {studio, movie} {budget} {studio} {studio_city} studio_city is not a part of a key studio_city functionally depends on studio which is a proper subset of the key
Second Normal Form • A relation is said to be in Second Normal Form when every non-key attribute is fully functionally dependent on the primary key. – That is, every non-key attribute needs the full primary key for unique identification
Why is this not in 2 NF?
Second Normal Form
Second Normal Form
Second Normal Form
1 NF Storage Anomalies Removed • Insertion: Can now enter new patients without surgery. • Insertion: Can now enter Surgeons who have not operated. • Deletion (type 1): If Charles Brown dies, the corresponding tuples from Patient and Surgery tables can be deleted without losing information on David Rosen. • Update: If John White comes in for third time, and has moved, we only need to change the Patient table
2 NF Storage Anomalies • Insertion: Cannot enter the fact that a particular drug has a particular side effect unless it is given to a patient. • Deletion: If John White receives some other drug because of the penicillin rash, and a new drug and side effect are entered, we lose the information that penicillin cause a rash • Update: If drug side effects change (a new formula) we have to update multiple occurrences of side effects.
2 NF - Decomposition 1. 2. 3. If a data item is fully functionally dependent on only a part of the primary key, move that data item and that part of the primary key to a new table. If other data items are functionally dependent on the same part of the key, place them in the new table also Make the partial primary key copied from the original table the primary key for the new table. Place all items that appear in the repeating group in a new table Example 1 (Convert to 2 NF) Old Scheme {Title, Pub. Id, Au. Id, Price, Au. Address} New Scheme {Title, Pub. Id, Au. Id, Price} New Scheme {Au. Id, Au. Address}
2 NF - Decomposition Example 2 (Convert to 2 NF) Old Scheme {Studio, Movie, Budget, Studio. City} New Scheme {Movie, Studio, Budget} New Scheme {Studio, City} Example 3 (Convert to 2 NF) Old Scheme {City, Street, House. Number, House. Color, City. Population} New Scheme {City, Street, House. Number, House. Color} New Scheme {City, City. Population}
Third Normal Form (3 NF) This form dictates that all non-key attributes of a table must be functionally dependent on a candidate key i. e. there can be no interdependencies among non-key attributes. For a table to be in 3 NF, there are two requirements – – The table should be second normal form No attribute is transitively dependent on the primary key Example (Not in 3 NF) Scheme {Title, Pub. ID, Page. Count, Price } 1. 2. 3. 4. 5. Key {Title, Pub. Id} {Page. Count} {Price} Both Price and Page. Count depend on a key hence 2 NF Transitively {Title, Pub. ID} {Price} hence not in 3 NF
Third Normal Form (3 NF) Example 2 (Not in 3 NF) Scheme {Studio, Studio. City, City. Temp} 1. 2. 3. 4. 5. 6. Primary Key {Studio} {Studio. City} {City. Temp} {Studio} {City. Temp} Both Studio. City and City. Temp depend on the entire key hence 2 NF City. Temp transitively depends on Studio hence violates 3 NF Example 3 (Not in 3 NF) Scheme {Building. ID, Contractor, Fee} 1. 2. 3. 4. 5. 6. Building. ID Contractor Fee 100 Randolph 1200 150 Ingersoll 1100 Randolph 1200 Pitkin 1100 Randolph 1200 Primary Key {Building. ID} {Contractor} 250 {Contractor} {Fee} 300 {Building. ID} {Fee} Fee transitively depends on the Building. ID Both Contractor and Fee depend on the entire key hence 2 NF
Third Normal Form • A relation is said to be in Third Normal Form if there is no transitive functional dependency between non-key attributes – When one non-key attribute can be determined with one or more non-key attributes there is said to be a transitive functional dependency. • The side effect column in the Surgery table is determined by the drug administered – Side effect is transitively functionally dependent on drug so Surgery is not 3 NF
Why is this not in 3 NF?
Third Normal Form
Third Normal Form
2 NF Storage Anomalies Removed • Insertion: We can now enter the fact that a particular drug has a particular side effect in the Drug relation. • Deletion: If John White receives some other drug as a result of the rash from penicillin, the information on penicillin and rash is maintained. • Update: The side effects for each drug appear only once.
Transitive Dependency Table: Student-Dorm-Fee SID DORM FEE 101 Oracle 1000 102 Oracle 1000 103 DB 2 800 104 DB 2 800 105 Sybase 500 Bordoloi
Transitive Dependency • Occurs when a non-key attribute is functionally dependent on one or more non-key attributes. Example: HOUSING (SID, DORM, FEE) PRIMARY KEY: SID FUNCTIONAL DEPENDENCIES: SID DORM FEE • A table is in 3 NF if it is in 2 NF and has no transitive dependencies Bordoloi BUILDING FEE
3 NF • Besides SID, FEE is also functionally dependent on DORM which is a non-key attribute. • A table is in 3 NF if it is in 2 NF and has no transitive Dependencies. Bordoloi
3 NF STUDENT_DORM SID 101 102 103 DORM Oracle DB 2 104 105 DB 2 Sybase DOM_FEE DORM Oracle DB 2 Sybase FEE 1000 800 500
3 NF - Decomposition 1. 2. 3. Move all items involved in transitive dependencies to a new entity. Identify a primary key for the new entity. Place the primary key for the new entity as a foreign key on the original entity.
3 NF - Decomposition Example 2 (Convert to 3 NF) Old Scheme {Studio, Studio. City, City. Temp} New Scheme {Studio, Studio. City} New Scheme {Studio. City, City. Temp} Example 3 (Convert to 3 NF) Old Scheme {Building. ID, Contractor, Fee} New Scheme {Building. ID, Contractor} New Scheme {Contractor, Fee} Building. ID Contractor Fee 100 Randolph 1200 150 Ingersoll 1100 200 Randolph Pitkin 1100 250 Pitkin 300 Randolph
Boyce-Codd Normal Form (BCNF) Boyce-Codd normal form (BCNF) A relation is in BCNF, if and only if, every determinant is a candidate key. The difference between 3 NF and BCNF is that for a functional dependency A B, 3 NF allows this dependency in a relation if B is a primary-key attribute and A is not a candidate key, whereas BCNF insists that for this dependency to remain in a relation, A must be a candidate key.
Boyce-Codd Normal Form (BCNF) Example 1 - Address (Not in BCNF) Scheme {City, Street, Zip. Code } 1. Key 1 {City, Street } 2. Key 2 {Zip. Code, Street} 3. No non-key attribute hence 3 NF 4. {City, Street} {Zip. Code} 5. {Zip. Code} {City} 6. Dependency between attributes belonging to a key
Boyce Codd Normal Form (BCNF) Example 2 - Movie (Not in BCNF) Scheme {Movie. Title, Movie. ID, Person. Name, Role, Payment } 1. 2. 3. 4. 5. Key 1 {Movie. Title, Person. Name} Key 2 {Movie. ID, Person. Name} Both role and payment functionally depend on both candidate keys thus 3 NF {Movie. ID} {Movie. Title} Dependency between Movie. ID & Movie. Title Violates BCNF Example 3 - Consulting (Not in BCNF) Scheme {Client, Problem, Consultant} 1. 2. 3. 4. 5. 6. Key 1 {Client, Problem} Key 2 {Client, Consultant} No non-key attribute hence 3 NF {Client, Problem} {Consultant} {Client, Consultant} {Problem} Dependency between attributess belonging to keys violates BCNF
BCNF - Decomposition 1. 2. Place the two candidate primary keys in separate entities Place each of the remaining data items in one of the resulting entities according to its dependency on the primary key. Example 1 (Convert to BCNF) Old Scheme {City, Street, Zip. Code } New Scheme 1 {Zip. Code, Street} New Scheme 2 {City, Street} • Loss of relation {Zip. Code} {City} Alternate New Scheme 1 {Zip. Code, Street } Alternate New Scheme 2 {Zip. Code, City}
1. 2. 3. 4. Decomposition – Loss of Information If decomposition does not cause any loss of information it is called a lossless decomposition. If a decomposition does not cause any dependencies to be lost it is called a dependency-preserving decomposition. Any table scheme can be decomposed in a lossless way into a collection of smaller schemas that are in BCNF form. However the dependency preservation is not guaranteed. Any table can be decomposed in a lossless way into 3 rd normal form that also preserves the dependencies. • 3 NF may be better than BCNF in some cases Use your own judgment when decomposing schemas
BCNF - Decomposition Example 2 (Convert to BCNF) Old Scheme {Movie. Title, Movie. ID, Person. Name, Role, Payment } New Scheme {Movie. ID, Person. Name, Role, Payment} New Scheme {Movie. Title, Person. Name} • Loss of relation {Movie. ID} {Movie. Title} New Scheme {Movie. ID, Person. Name, Role, Payment} New Scheme {Movie. ID, Movie. Title} • We got the {Movie. ID} {Movie. Title} relationship back Example 3 (Convert to BCNF) Old Scheme {Client, Problem, Consultant} New Scheme {Client, Problem}
Fourth Normal Form (4 NF) • Fourth normal form eliminates independent many-to-one relationships between columns. • To be in Fourth Normal Form, – – a relation must first be in Boyce-Codd Normal Form. a given relation may not contain more than one multi-valued attribute. Example (Not in 4 NF) Scheme {Movie. Name, Screening. City, Genre) Primary Key: {Movie. Name, Screening. City, Genre) 1. All columns are a part of the only candidate key, hence BCNF 2. Many Movies can have the same Genre 3. Many Cities can have the same movie Movie Screening. City Hard Code Los Angles 4. Violates 4 NF Genre Comedy Hard Code New York Comedy Bill Durham Santa Cruz Drama Bill Durham Drama The Code Warrier New York Horror
Fourth Normal Form (4 NF) Example 2 (Not in 4 NF) Scheme {Manager, Child, Employee} 1. 2. 3. 4. Manager Primary Key {Manager, Child, Employee} Each manager can have more than one child Each manager can supervise more than one employee 4 NF Violated Employee Jim Child Beth Mary Bob Jane Mary NULL Adam Alice Example 3 (Not in 4 NF) Scheme {Employee, Skill, Foreign. Language} 1. 2. 3. 4. Primary Key {Employee, Skill, Language } Each employee can speak multiple languages Each employee can have multiple skills Thus violates 4 NF Employee Skill Language 1234 Cooking French 1234 Cooking German 1453 Carpentry Spanish 1453 Cooking Spanish 2345 Cooking Spanish
4 NF - Decomposition 1. 2. Move the two multi-valued relations to separate tables Identify a primary key for each of the new entity. Example 1 (Convert to 3 NF) Old Scheme {Movie. Name, Screening. City, Genre} New Scheme {Movie. Name, Screening. City} New Scheme {Movie. Name, Genre} Movie Genre Movie Screening. City Hard Code Comedy Hard Code Los Angles Bill Durham Drama Hard Code New York The Code Warrier Horror Bill Durham Santa Cruz Bill Durham The Code Warrier New York
4 NF - Decomposition Example 2 (Convert to 4 NF) Old Scheme {Manager, Child, Employee} Jim Child Beth New Scheme {Manager, Child} Mary Bob New Scheme {Manager, Employee} Example 3 (Convert to 4 NF) Old Scheme {Employee, Skill, Foreign. Language} New Scheme {Employee, Skill} New Scheme {Employee, Foreign. Language} Employee Skill Employee Language 1234 Cooking 1234 French 1453 Carpentry 1234 German 1453 Cooking 1453 Spanish 2345 Cooking 2345 Spanish Manager Employee Jim Alice Mary Jane Mary Manager Adam
Fifth Normal Form (5 NF) • Fifth normal form is satisfied when all tables are broken into as many tables as possible in order to avoid redundancy. Once it is in fifth normal form it cannot be broken into smaller relations without changing the facts or the meaning.
8b4c1724380a4997c89995603a2a411b.ppt