d1ffa6f9a183341305b878c7accc95fa.ppt
- Количество слайдов: 35
Databases and Database Design William A. Yasnoff, MD, Ph. D Oregon Health Division 1
Databases & Database Design u Introduction u Relational Databases u Database Design: Data Models 2
Introduction u Database = large collection of information u Stored on hard disk u Retrieval – 100, 000 items X 10 ms = 17 minutes – Need to organize data to improve retrieval speed – How do you organize your paper files? 3
Introduction (continued) u Data Organization – sorting data: phonebook in alpha v. random order – “tabs” at key points: dictionary with tabs for each letter – direct pointers to information = index » table of key element value and address of other data » e. g. name & mailbox number --> letters in mailbox 4
Relational Database Model 5
Relational Database Model u Developed by EF Codd, CJ Date (70 s) u Table = Entity = Relation u Table row = tuple = instance u Table column = attribute u Table linkage by values u Entity-Relationship Model 6
Attributes u Key – uniquely identify row/tuple – may be one or more attributes u Non-key – other properties of instance – dependent on key 7
Retrieval in RDMS u SQL – Select. . . – From. . . » optional: can be computed – Where. . . u Query by example – Fill in the blanks – WONDER 8
Advantages of RDMS u Very strong theoretical basis – storage – retrieval u Easy to implement u Conflicts and anomalies can be avoided u Intuitive appeal u Easy retrieval 9
Relational Model: Summary u popular method of organizing data u strong theoretical properties facilitate retrieval u relation = table u attribute = column u tuple = row u key = {attributes} that uniquely identify each row 10
Relational Model: Summary 2 u Allows complex data relationships with multiple tables: – {patient id, patient demographics} – {patient id, patient visit date, blood pressure} u Structured Query Language (SQL) retrieval » Select patient _name where blood_lead_level > 10 11
Database Design: Data Models 12
Data Models u Definition: A representation of the data and data relationships of an activity – data: case report of serious E. Coli illness – data: E. Coli serotype O 157 – relationship: case report “contains” serotype (semantic relationship) u Database Design = Development of a quality data model 13
Data Model Quality u Correctness – Conceptual (concepts represented properly) » real world representation – Syntactic (relationships represented properly) » real world language 14 u Completeness (wholeness) – Conceptual (all concepts represented) – Syntactic (all relationships represented)
Entity u “Any distinguishable object that is to be represented in a database” [C. J. Date] u Properties – within scope of model – single concept – a set of distinguishable “instances” – satisfies normalization rules u Also 15 called a relation
Entity Guidelines u Relationship with at least one other entity u Unique, descriptive name u Class or set of things (not just one) u Single meaning (no homonyms) u No synonyms (two entities describing same class) 16
Attribute u “a type or characteristic of an entity” (e. g. “gender” is an attribute of the entity “patient”) u Domain = the set of values from which an attribute may be selected (e. g. the domain of the attribute “gender” is [male, female]) u An entity typically has many attributes 17
Attribute Characteristics u Key – value uniquely identifies entity » e. g. “Lab test ID” is key attribute of “specimen” u Non-Key – value does not uniquely identify entity » e. g. “author” does not uniquely identify “publication” 18
Attribute Characteristics u Atomic = individual data value (one and only one fact) u Description = complete and clear definition 19 – e. g. “professional privilege date” = The date on which a health care professional is granted privilege to practice in a particular health care facility, establishing the provider’s eligibility for patient care assignments and liability coverage
Attribute Guidelines u Unique name – no plurals, possessives, articles, conjunctions, verbs, or prepositions u Clear, complete, unambiguous description u Atomic (no positional information) u Domain with 2 or more values u Originates in only one entity 20
Primary Key u Attribute of an entity whose values uniquely determine its occurrences – {birth certificate number, person name, birth date, mother name, physician name} – {facility, patient name, physician name, date, temperature, pulse, blood pressure} 21
Primary Key Characteristics u Stable: does not change over time u Minimal: fewest attributes necessary u Factless: no hidden information u Definitive: value always exists u Accessible: available when data created u Unique: absolutely no duplicates 22
Relationship u Semantic: “contains”, “is part of”, “belongs to” u One-to-one – serotype of an organism – immunization status of a child u One-to-many – antibiotic resistances of an organism – vaccines administered to a child 23
Relationship Guidelines u No circular references (e. g. “health plans -> markets -> products -> health plans” should be “health plans -> health plan markets -> market products <- health plans”) u Single relationship between two entities u No recursive relationships 24
Single Relationship of Entities u Problem: employee <--> job assignment <--> job [double circular] u Solution: person --> job assignment <-job [“person” includes “status” as employee or contractor] 25
Recursive Relationships u Problem: supervisor <--> employee [may be circular] u Solution: person {person ID } --> employee-supervisor relationship {employee ID, supervisor ID} – multiple supervisory roles for each person 26
Normalization u Formalization of common sense rules of information organization u An attribute is functionally dependent on X only if each of its values is determined by the value of X (X may be composite) u Example: DOB is functionally dependent on Driver’s License number 27
Key Normalization Concept u Functional dependence of each entity must be – based on entire primary key – NOT based on any other attributes 28
Benefits of Normalization u Aids in database design, integration u Ensures precise capture of business logic u Minimizes redundancy u Minimizes need for null values u Prevents – information loss – unintentional results 29
Summary of Normalization u. One Fact in One Place 30
Common DB Design Errors u Multiple instances in same row of table, e. g. first_value, second_value, third_value [Problem: what to do with 4 th value? ] u Same data item repeated in multiple places, e. g. address appears in two different tables [Problem: how to keep two values synchronized? ] 31
Data Model Quality u Correctness – Conceptual (concepts represented properly) » real world representation – Syntactic (relationships represented properly) » real world language 32 u Completeness (wholeness) – Conceptual (all concepts represented) – Syntactic (all relationships represented)
Database Design Pearls u Accommodate all data needed u Correct relationships between data items u No duplicate representation u Anticipated retrievals use indexes u Meet confidentiality requirements 33
References - 1 u Reingruber MC & Gregory WW: The Data Modeling Handbook (New York: John Wiley & Sons, 1994) u Montgomery SL: Object-Oriented Information Engineering (Boston: AP Professional/Harcourt Brace, 1994) 34
References - 2 u Codd EF: The Relational Model for Database Management (Reading, MA: Addison-Wesley, 1990) u Date CJ: An Introduction to Database Systems, 5 th ed. (Reading, MA: Addison. Wesley, 1990) u Duncan KA: Health Information and Health Reform (San Francisco: Jossey. Bass, 1994) 35
d1ffa6f9a183341305b878c7accc95fa.ppt