
fb8c7e450d0c930aeee9066e4928520e.ppt
- Количество слайдов: 56
Lecture No. 3 - Conceptual Design Data. Base Design Modelling Normalisation CSE 3180 Semester 1 2005 week 3 / 1
Notice • Thursday 31 st March is the Semester 1 HECS census date • It is also the last day to discontinue semester 1 or full-year units without ‘WD’ showing on your academic record • Last day to discontinue Semester 1 units without incurring fees • Last day to discontinue full-year units without a ‘WD’ showing on the academic record • Last day to qualify fro a 100% refund of student amenities fee relevant to Semester 1. CSE 3180 Semester 1 2005 week 3 / 2
Lecture 3 This lecture will introduce • More aspects of database design • Modelling and transformations • Normalisation • Some examples of modelling • Some pitfalls CSE 3180 Semester 1 2005 week 3 / 3
Design Stages Requirements Analysis Conceptual Design Physical Design Implementation Design CSE 3180 Semester 1 2005 week 3 / 4
Conceptual Design - Create an enterprise model - Normalise user views - Integrate user views - Create a database design - Review CSE 3180 Semester 1 2005 week 3 / 5
Modelling User Requirements DATA Entity Relationship Modelling (Semantic) Normalisation (Minimality) PROCESS Data Flow Analysis Procedure Formation Functional Analysis CSE 3180 Semester 1 2005 week 3 / 6
Well Structured Data ADVANTAGES • Better integration of applications through shared access to common data. · More adaptable systems because files and databases will follow the natural structures inherent in data rather than application oriented relationships. · Processing should be simpler if data is held in its simplest least redundant form. CSE 3180 Semester 1 2005 week 3 / 7
Data Centered Development 1. Develop broad data model and broad function model. 2. Partition data model by function. 3. Develop detailed data model by each functional partition. 4. Develop the detailed procedure models by functional partition. CSE 3180 Semester 1 2005 week 3 / 8
Attributes · A data item or element that supports one entity type. STUDENT RESULT Student. No Unit Code Year Semester Grade S 1 CSE 4230 2004 1 N S 1 CSE 4230 2004 2 HD PRIMARY KEY or IDENTIFIER · Some or all of the attributes describing an entity type serve to uniquely identify each entity instance. CSE 3180 Semester 1 2005 week 3 / 9
Value Sets or Domains · Each attribute type describes one entity type and is based on an underlying domain or value set. EMPLOYEE Emp. No E 1 Emp_Gender Female ? Range M/F Emp_Name Jones Present Emp_Hire_Date 20/6/1989 Date OK ? DEPENDENT Emp. No Depend_Name Depend_Gender Birthdate E 1 Smith ? Range ? Present Male M/F 23/6/1946 Date OK ? CSE 3180 Semester 1 2005 week 3 / 10
Value Sets or Domains • Domains in Relational Database can be extensive and complex. • A ‘domain’ (a restriction of value or expression) can be applied to the result of a function or of a derived value. For example, the multiplication of a person’s age by the person’s I. D. would not lead to a realistic value • The addition (+) of an I. D. and date of birth would also be a non realistic value CSE 3180 Semester 1 2005 week 3 / 11
Value Sets or Domains • Domains can be used to limit which attributes can be associated with other attributes - this leads to interesting and complex processes - Rules and Procedures (Ingres) and Triggers and Constraints (Oracle). CSE 3180 Semester 1 2005 week 3 / 12
Synonyms or Aliases · Be careful to differentiate between things and their names. ITEM = PART = PRODUCT · ? ? ? SYNONYM or ALIAS The same object called different names by different people. · This problem concerns entity types, relationship types and attribute types. CSE 3180 Semester 1 2005 week 3 / 13
Homonyms Different things called the same name by different people ? ? EMPLOYEE START DATE · In the Personnel area this may mean the date an employee started with the company. · In a given department this may mean the date an employee started with the department. · This problem also concerns entity types, relationships types and attribute types. CSE 3180 Semester 1 2005 week 3 / 14
Design and Data Structures Data structures are the mechanisms which hold databases together. Data structures (for the ANSI/SPARC standard) are defined in the internal model level and implemented in the physical data organisation. Data structures are often hidden from the application programmer, since they are primarily used by the DBMS and Operating System. Understanding data structures is important for performance reasons, to improve program design and allow easier communication with DBMS specialists. CSE 3180 Semester 1 2005 week 3 / 15
Goals of Relational Design What Relations should exist and what Attributes should they contain? Avoid Redundancy if possible - minimise storage space Avoid Anomalies Avoid Nulls (if practical - but test first) Avoid Joins which produce unnecessary or confusing rows CSE 3180 Semester 1 2005 week 3 / 16
Some Assumptions A group of attributes has a natural “inherent” structure This structure is independent of the way the data is used Normalisation • introduced by E. F. Codd together with relational database theory • originally Codd defined three normal forms • later expanded to Boyce-Codd and fourth and fifth normal forms • Oracle state that 3 rd Normal Form is adequate for all databases CSE 3180 Semester 1 2005 week 3 / 17
A Sample Data Structure PRIME _MINISTER PM_NAME BIRTH_YR YRS_SERVED DEATH_AGE STATE_BORN STATE_REP MARRIAGE PM_NAME SPOUSE_NAME MAR_YR PM_AGE NR_CHILDREN MINISTRY MIN_NR PM_NAME PARTY DAY_COMM MTH_COMM YR_COMM CSE 3180 Semester 1 2005 week 3 / 18
Normalisation Formal measure of why one grouping of attributes may be better than another Each Normal Form requires that a Relation satisfies criteria for that form and this eliminates a different kind of redundancy or anomaly Normalised Relations will remain consistent following database operations and will store each fact only once Database operations applied to unnormalised relations may lead to anomalies PM_NAME BIRTH_YR YRS_SERVED DEATH_AGE STATE_BORN STATE_REP CSE 3180 Semester 1 2005 week 3 / 19
Some Anomalies Relation : ASSIGN Person-Id Project_budget Project Time Spent on Project S 75 32 P 1 7 S 75 40 P 2 8 S 79 32 P 1 4 S 79 27 P 3 1 S 80 40 P 2 5 - 17 P 4 - Null Values are considered to be anomalies CSE 3180 Semester 1 2005 week 3 / 20
Anomalies Insertion Anomaly add row (ASSIGN , <S 85, 35, P 1, 9>) - two conflicting budgets for P 1 S 75 32 P 1 7 S 79 32 P 1 4 Deletion Anomaly delete row(ASSIGN, <S 79, 27, P 3, 11>) - removes project budget for P 3 S 79 27 P 3 1 CSE 3180 Semester 1 2005 week 3 / 21
Functional Dependencies - the values of one set of attributes affect the values of another attributes X Y The value of X determines the value of Y The value of Y depends on the value of X The simplest case is where 1 attribute determines another single attribute CSE 3180 Semester 1 2005 week 3 / 22
Functional Dependencies Project Budget Person-Id Project Time Spent on Project Functional Dependency Diagram Project Budget CSE 3180 Semester 1 2005 week 3 / 23
Functional Dependencies • Two attributes are FUNCTIONALLY DEPENDENT if a value for ONE attribute specifies a SINGLE value for the other attribute Grade Rate of Pay Driver 265. 00 Crane Driver 310. 95 Keyboard Operator 215. 55 Systems Analyst 450. 00 The Rate of Pay is functionally dependent on Grade or, Grade determines Rate of Pay or, grade----> Rate of Pay. Grade is a DETERMINANT CSE 3180 Semester 1 2005 week 3 / 24
Functional Dependencies Some Terms: Primary Key Candidate Key Foreign Key Concatenated Key Non-Ambiguous Keys: An attribute (or attributes) of a table is the Primary Key for that table if 1. All attributes in the table are functionally dependent on the Primary Key AND 2. Where the Primary Key is a collection of attributes (concatenated key), no subcollection of the attributes used also exhibits the functional dependence property CSE 3180 Semester 1 2005 week 3 / 25
Functional Dependencies Translation: 1. The primary key will identify a given row (unique) 2. No portion of the primary key can be an independent primary key CSE 3180 Semester 1 2005 week 3 / 26
Minimal FDs Employee Department Employee determines Department Location Department determines Location Employee determines Location - THIS Functional Dependency IS REDUNDANT CSE 3180 Semester 1 2005 week 3 / 27
Normalisation • 1 st Normal Form Repeating groups must not occur Unit CSE 3180 CSE 2138 CSE 9001 Student No. 9142717 9131618 9077184 8967384 8737980 9142717 6932475 Result Name C D P N P P HD Talmun Renoir Gilbey Breton Balzac Talmen Gilbey 1 st Normal Form CSE 3180 Semester 1 2005 week 3 / 28
Normalisation Corrected Table Unit CSE 3180 CSE 2138 CSE 9001 Student No. 9142717 9131618 9077184 8967384 8737980 9142717 6932475 Result C D P N P P HD Name Talmun Renoir Gilbey Breton Balzac Talmen Gilbey Formally expressed as: Results(subject code, studentno, result, name CSE 3180 Semester 1 2005 week 3 / 29
Normalisation • 2 nd Normal Form Designed to eliminate Update Anomalies (4 categories) Examples 1. Update Where a Description alteration requires total file scan and alterations are made 2. Inconsistent Data Possibility of a code having more than one description 3. Additions Condition where a row cannot be fully identified e. g. unit code only 4. Deletions If unit Code CSE 9001 is deleted, location of associated data is not possible CSE 3180 Semester 1 2005 week 3 / 30
Normalisation General Solution: Create additional table(s) using sub-sets of the Primary Key Give each table a Name CSE 3180 Semester 1 2005 week 3 / 31
Normalisation 3 rd Normal Form Final Stage ---> 3 rd Normal Form Watch for Determinants Primary Keys are Determinants Candidate Keys are Determinants A table is in 3 rd Normal Form (3 NF) if (i) It is in 2 nd Normal Form and (ii) Determinants are Candidate Keys CSE 3180 Semester 1 2005 week 3 / 32
Normalisation As a check, review 1. Update 2. Inconsistent Data 3. Additions 4. Deletions for Conditions and Difficulties of access (and retrieval) CSE 3180 Semester 1 2005 week 3 / 33
Normalisation Review A relation is in 3 NF if, and only if, the NONKEY ATTRIBUTES are (1) Mutually Independent (2) Fully Dependent on the Total Primary Key Or: A relation is in 3 NF if, and only if, for the life of the database, each row consists of a Primary Key Value which identifies some Entity instance, together with a set of Zero or more mutually independent attribute values which describe the Entity in some way. (Date, C. J) Or: Each field must represent a fact about the key, the Whole Key and Nothing but the Key. (Kent, W) CSE 3180 Semester 1 2005 week 3 / 34
Normalisation - Examples 1. 1 NF or First Normal Form Each Row MUST CONTAIN the same number of columns Example: Course Instructor table class code C 3576 K 567 B 6745 R 3289 lecturer Doe, J Nguyen, L Fabbri, M Pratt, W tutor Jones, R Smith, V tutor Ng, K Ong, W Archer, V Barrat, N The number of columns (attributes) is not consistent Solution : Create a table of TUTORS with Class Codes CSE 3180 Semester 1 2005 week 3 / 35
Normalisation - Examples 2 NF: 2 criteria (1) The table must be in 1 NF (2) every attribute which is NOT part of the Primary key must be functionally dependent on the Primary Key (transitive dependencies may occur) Building Room Seats No. of Levels A 214 85 4 A 242 A 25 4 B 213 135 6 The number of levels is dependent on Building, not Building + Room. Solution: accomm(building, room, seats) building(building, no_levels) CSE 3180 Semester 1 2005 week 3 / 36
Normalisation - Examples Building A A B Room Seats 214 85 242 A 25 213 135 Building No. of Levels A 4 B 7 CSE 3180 Semester 1 2005 week 3 / 37
Normalisation - Examples 3 NF : A table is in 3 rd Normal Form if (1) the table is in 2 NF and (2) every attribute which is NOT part of the Primary Key is functionally dependent ONLY on the key - i. e. not dependent on any other non-key attribute CSE 3180 Semester 1 2005 week 3 / 38
Normalisation - Examples This table is NOT in 3 NF: Class. Code Lecturer C 3567 Doe, J K 4567 Fabbri, M B 7645 Nguyen, L R 3289 Pratt, W Lecturer’s Office Phone 101 Bldg A 32412 312 Bldg B 31523 209 Bldg F 31518 406 Bldg E 32581 Lecturer’s Office Phone Doe, J 101 Bldg A 32412 Fabbri, M 312 Bldg B 31523 Nguyen, L 209 Bldg F 31518 Pratt, W 406 Bldg E 32581 CSE 3180 Semester 1 2005 week 3 / 39
A Guide When developing a database table DO NOT CONFUSE A REPORT (which people can interpret) - which a DBMS cannot interpret, and which must be in at least 3 rd Normal Form AND A TABLE CSE 3180 Semester 1 2005 week 3 / 40
Normalisation Some Thoughts: • Databases should be constructed so that data occurs only ONCE. (apart from Primary/Foreign key data) • Redundant or repetitive data may produce inconsistencies in the database during update e. g. a person’s address is in 1 row only, and other rows in other tables have keys referring to it. (known as referenced and referencing keys/tables) • A quick design test - ‘If a row is deleted, will valuable data be lost ? ’ If YES, remove the valuable information and place it in another table (known as set retention) CSE 3180 Semester 1 2005 week 3 / 41
Normalisation Diagrammatic Functional Dependencies supplier name p. k supplier number status city part name p. k part number Notice that these attributes are dependencies of the primary key colour mass city assets This attribute IS NOT CSE 3180 Semester 1 2005 week 3 / 42
Other Normalisation Forms Other forms of Normalisation: 4 th Normal Form 4 NF A row should NOT contain two or more independent multivalued facts about an entity. Facts are independent of each other when there is no direct connection between them. Generally occurs in a many to many relationship Example: A book publishing company can have many authors producing many books The books would (normally) be sold from many booksellers stores CSE 3180 Semester 1 2005 week 3 / 43
Other Normalisation Forms 5 th Normal Form A table is in 5 th Normal Form when its data can no longer be decomposed into smaller tables without each row having the same Primary Key Oracle insist that all data can and should be restricted to 3 NF CSE 3180 Semester 1 2005 week 3 / 44
Multi. Valued Dependencies MVD’s Functional Dependencies cannot express all cases of associations between attributes Consider PERSON and SKILL(s) A PERSON may have one or many SKILLS A SKILL may be owned by many PERSONS PERSON --- > SKILL ----> PERSON Neither of these expressions cover the above statements CSE 3180 Semester 1 2005 week 3 / 45
Multi. Valued Dependencies MVD’s An MVD defines that an ATTRIBUTE is always associated with a given set of values of another ATTRIBUTE PERSON --->> SKILL Interpretation: a single value of a set of attributes determines a set of values of other attributes AND this set of values is the same, independent of any other dependencies of the first named attributes PERSON ---->> skill_id, date_acquired CSE 3180 Semester 1 2005 week 3 / 46
Stages in Data Modelling - 1 • • • Analyse User Requirements - User Views Aggregate Functions - Highlight Synonyms Prepare first cut of Entities Determine Cardinality of Entities – Develop E-R Diagrams Determine Relationships 1: 1, 1: M, M: N Introduce Attributes - Data Structures (use Natural Mapping) Analyse Candidate Keys Determine success of relationships Review (if necessary) E-R Diagram CSE 3180 Semester 1 2005 week 3 / 51
Stages in Data Modelling - 2 • • • Select Primary, Foreign Keys for given applications Select Candidate keys for access Normalise Test Normalisation Review E-R Diagram and Data Structure Diagrams if necessary • Examine Data Structures/E-R Diagrams against original views, requirements e. g. report contents • Obtain confirmation and acceptance from User(s) • Move to Physical Design Considerations and, don’t forget the documentation ! CSE 3180 Semester 1 2005 week 3 / 52
Database Development Processes Project Identification Selection Project Initiation and Planning Analysis Logical Design Physical Design Implementation Maintenance CSE 3180 Semester 1 2005 week 3 / 53
Data Modelling - Some Guidelines • Do not prematurely combine entities into tables • Concentrate on Access Mechanisms which can be shared among requests • Deviate from the model in a responsible manner (i. e. get user acceptance, enthusiasm) • Use table/view and attribute names which closely reflect the data model names (Natural Mapping) CSE 3180 Semester 1 2005 week 3 / 54
Data Modelling - Some Guidelines • Be aware of the presence of composite attributes in a table. These should be represented as multiple attributes • Watch data types for each attribute, especially complex keys • Develop an ‘architecture’ to support ‘Business Rules’ for procedures, integrity and access • Look for developing technology trends - Objects, Extended Relational Datatypes, Internet Sourced DBMS (e. g. Oracle 9 i, 10) CSE 3180 Semester 1 2005 week 3 / 55
Database Design Guidelines • Determine the reason/purpose of the proposed database • Determine the entities supported by the database - what are the ‘main’ collections of data --> entities • Determine the attributes of each entity • Determine the Entity relationships • Determine the Entity-Relationship types 1: 1, 1: M, M: N CSE 3180 Semester 1 2005 week 3 / 56
Database Design Guidelines • Determine how the relationships can be related - Primary Key/Foreign Key. Are additional attributes needed ? • Re-assess your Logical design - Can the objectives of (1) be achieved ? If no, what addition/reduction of Entities or Attributes is necessary ? • Modify your design as required, and re-evaluate (it is an ITERATIVE Process) CSE 3180 Semester 1 2005 week 3 / 57
Some Common Design Problems • Not all attributes in a table relate to the same entity or subject. e. g. mixing passengers and plane details in the same table Hint: Re-examine the Primary Key and determinant aspects • Blank attributes in tables (no values) Generally means they belong in another table • Multiple Tables contain the same data-items Can indicate that aggregation or consolidation might be necessary CSE 3180 Semester 1 2005 week 3 / 58
Some Common Design Problems • Continuous Design alterations 01. Normal during the logical design stage Some suggestions: Make prototypes, use sample data, determine weaknesses, oversights Consult with the user. Get approval / acceptance Modify, re-design 02. Unworkable at physical design stage A clear indication of a rushed or poorly developed logical design. May be more appropriate to start again (and may also be more cost effective) CSE 3180 Semester 1 2005 week 3 / 59
A Quick Survey • You should now be familiar with some of the concepts and the terminology associated with Database • You should not expect that you will remember or even understand everything in these lectures • But you will gradually find that the concepts and environment become more ‘understandable’ as we proceed • That’s enough for today – and Easter (and the Semester break) is not far away !! CSE 3180 Semester 1 2005 week 3 / 60
fb8c7e450d0c930aeee9066e4928520e.ppt