Скачать презентацию Moving Towards A Data Repository That Facilitates Data Скачать презентацию Moving Towards A Data Repository That Facilitates Data

5c5d36323dc56691863512cfbc9df27b.ppt

  • Количество слайдов: 42

Moving Towards A Data Repository That Facilitates Data Analysis CHOP November 18, 2009 Moving Towards A Data Repository That Facilitates Data Analysis CHOP November 18, 2009

Relational Database Design Relational Database Design

Normalization • Normalization - process of efficiently organizing data in a database to reduce Normalization • Normalization - process of efficiently organizing data in a database to reduce redundancies of data • Goal - consistency of data – Store data once and one time only! – security – disk space – speed of queries – efficiency of database updates – data integrity § In normalized database no aggregation and no calculated fields 3

Data Anomolies 4 Data Anomolies 4

Unnormalized data set Patient ID Name Address DOB Doc Appt Date Location DX 111111 Unnormalized data set Patient ID Name Address DOB Doc Appt Date Location DX 111111 Cindy Marselis 2320 Edge Hill Road 1/11/64 Armstrong 9/1/09 11: 00 AM Alter 2011 Herniated Disc Flu 111111 Cindy Marselis 9331 Rising Sun Avenue 1/11/64 Morningstar 9/1/09 11: 00 AM Alter 2011 Herniated Disc 111111 Cindy Marselis 2320 Edge Hill Road 1/11/64 Allen 11/1/09 10: 00 AM Alter 2012 Psoriasis 222222 Kathryn Marselis 2320 Edge Hill Road 11/3/04 Dershaw 8/1/09 11: 00 AM Speakman 105 Well baby check 111111 Cindy Schwartz 9331 Rising Sun Avenue 1/11/64 Armstrong 8/11/09 3: 00 PM Alter 105 Psoriasis Herniated Disc 5

Normalized db - before 6 Normalized db - before 6

Normalized db - after 7 Normalized db - after 7

Example of Appointment Entity Relationship Diagram 8 Example of Appointment Entity Relationship Diagram 8

Structured, free text, unstructured text Structured, free text, unstructured text

Free text • Issues with string searches – Must match exactly in case, punctuation, Free text • Issues with string searches – Must match exactly in case, punctuation, spelling, etc. • Use of lookup tables where possible 10

Unstructured Text • Gartner: white-collar workers spend from 30 to 40% of time managing Unstructured Text • Gartner: white-collar workers spend from 30 to 40% of time managing documents • Merrill Lynch: > 85 % of business information exists as unstructured data – e-mails, memos, notes from call centers and support operations, news, user groups, chats, reports, letters, surveys, white papers, marketing material, research, presentations and Web pages. • In relational db, data that can't be stored in rows and columns. – stored in a BLOB (binary large object) – e-mail files, word-processing text documents, Power. Point presentations, JPEG and GIF image files, and MPEG video file • Metadata (data about data can be stored) http: //www. informationmanagement. com/issues/20030201/6287 -1. html 11

Approaches to structured and unstructured data 1. Unique database: consolidates all structured and unstructured Approaches to structured and unstructured data 1. Unique database: consolidates all structured and unstructured data together – expensive to buy and maintain – large volume of data can clog the database making it slow and inefficient 2. Use two databases: one structured data, and one for unstructured data. – Avoids performance issues with structured data – significant performance limitations for unstructured data 12

Approaches to structured and unstructured data 3. Unstructured data left on file servers with Approaches to structured and unstructured data 3. Unstructured data left on file servers with database to record and links to unstructured data files. – Avoids issue with volumes of data – Fragile as links are broken when files and folders moved around. – Must create links every time new document created 4. Complex and expensive connectors used to tap in all databases and file servers providing unified view of data. – Expensive and complex requiring purchase and maintenance of multiple databases and file servers with the added cost of all required connectors. 5. Patents currently under development. 13

Certification Commission for Health Information Technology (CCHIT) EHR Construct EMAR: Electronic Medication Administration Record Certification Commission for Health Information Technology (CCHIT) EHR Construct EMAR: Electronic Medication Administration Record CPOE: Computerized Physician Order Entry PFS: Physician Fee Schedule OC/RR: Physician Order Communication/Results Retrieval CPOE: Computerized Physician Order Entry PFS: Physician Fee Schedule R-ADT: Registration Admission Discharge Transfer 14

Data Warehousing Data Warehousing

Pressures Driving Need for Business Intelligence and Data Warehousing • External and internal forces Pressures Driving Need for Business Intelligence and Data Warehousing • External and internal forces require tactical and strategic decisions • Search for competitive advantage • Business environments are dynamic • Decision-making cycle time is reduced 16

Operational vs. Decision Support Data • Operational data • DSS – Relational, normalized – Operational vs. Decision Support Data • Operational data • DSS – Relational, normalized – Snapshot of operational database data – Optimized to support – Summarized transactions – Large amounts of data – Real time updates • Data analyst viewpoint – Timespan – Granularity – Dimensionality 17

Creating a Data Warehouse 18 Creating a Data Warehouse 18

Codd’s Key Data Warehouse Rules • • • Separated from operational environment Integrated Data Codd’s Key Data Warehouse Rules • • • Separated from operational environment Integrated Data Historical data over long time horizon Snapshot data captured at given time Subject-oriented data Mainly read-only data with periodic batch updates from operational source, no online updates 19

Codd’s Key Data Warehouse Rules contd. • Contains different levels of data detail – Codd’s Key Data Warehouse Rules contd. • Contains different levels of data detail – Current and old detail – Lightly and highly summarized • Metadata (data about the data) critical components – Identify and define data elements – Provide the source, transformation, integration, storage, usage, relationships, and history of data elements 20

Decision Support Systems DSS Decision Support Systems DSS

DSS Components 22 DSS Components 22

Decomposition of DSS – Operational Data o o o o Tumor registry A/D/T Radiology Decomposition of DSS – Operational Data o o o o Tumor registry A/D/T Radiology narrative Pathology narrative Lab results Patient Accounting Charge Master 23

Decomposition of DSS – External Data o o Research spider Treatment guidelines Reimbursement schedules Decomposition of DSS – External Data o o Research spider Treatment guidelines Reimbursement schedules NCI/NIH protocols 24

Decomposition of DSS – ETL • • Rationalize normal lab values Transform gender codes Decomposition of DSS – ETL • • Rationalize normal lab values Transform gender codes and free text Narrative dumps Doctor cleansing o Similar names o Which practice gets credit? 25

ETL – Extraction, Transformation, Load Extract data from source systems Transform: cleanse data for ETL – Extraction, Transformation, Load Extract data from source systems Transform: cleanse data for consistency and output exceptions o Apply business rules o Selecting certain columns to load (not null records) o Translating coded values (1, M, male =0) o Derive new calculated value (sale_amount = qty * unit_price) o Join data from multiple sources (lookup, merge) o Aggregate (rollup/summarize data – average LOS for each doctor by DRG) o Transpose/pivot (turning columns into rows) o Data validation. Load: data into repository 26

ETL Best Practice • • • ETL: 60 -80% of development effort Create multi-departmental ETL Best Practice • • • ETL: 60 -80% of development effort Create multi-departmental team charged with consensus on Transformation! Review exceptions carefully o Indicator of issues with operational db design o Indicator of changes needed in transformation 27

Decomposition of DSS – Business Data • • Business data – central repository Includes Decomposition of DSS – Business Data • • Business data – central repository Includes metadata: source, format, timing of feeds Characteristic Factors Integrated • • Centralized Holds data retrieved from entire organization Subject. Oriented • • Optimized to give answers to diverse questions Used by all functional areas Time Variant • • Flow of data through time Projected data Non-Volatile • • Data never removed Always growing 28

Decomposition of DSS – Business Model Data • Comprehensive Cancer Center definition of a Decomposition of DSS – Business Model Data • Comprehensive Cancer Center definition of a patient o Must have seen a physician for suspected or confirmed benign or malignant condition o What about patients seen for screening mammography? 29

Decomposition of DSS – End user query tool • • Web-based or client-server? OLAP Decomposition of DSS – End user query tool • • Web-based or client-server? OLAP – Online Analytic Processing o Microsoft o Business Objects (bought by SAP) o Micro. Strategy o Cognos (bought by IBM) o Oracle (includes Hyperion) 30

Decomposition of DSS – End –user tool o o Drill down functionality Roll up Decomposition of DSS – End –user tool o o Drill down functionality Roll up Charts – not data level Export features http: //demos. telerik. com/aspnetajax/chart/examples/functionality/ drilldown/defaultcs. aspx 31

Design – Star Schema Design – Star Schema

Star Schema • • Center fact table o usually contains numeric information for summary Star Schema • • Center fact table o usually contains numeric information for summary reports. Dimension table radiate from fact table Dimension table is hierarchial • ‘rollup’ allows to compare types of hospitals, disease categories, or even patient age bands. Creates logical data cube dimensions identifying a set of numeric measurements within the cube. 33

Star Schema • Data-modeling technique • Maps multidimensional decision support into relational database • Star Schema • Data-modeling technique • Maps multidimensional decision support into relational database • Yield model for multidimensional data analysis while preserving relational structure of operational DB • Four Components: – Facts – Dimensions – Attribute hierarchies 34

Simple Star Schema 35 Simple Star Schema 35

Star Schema 36 Star Schema 36

Entity Relationship Diagram 37 Entity Relationship Diagram 37

Analysis Analysis

Online Analytical Processing (OLAP) • Advanced data analysis environment • Supports decision making, business Online Analytical Processing (OLAP) • Advanced data analysis environment • Supports decision making, business modeling, and operations research activities • Characteristics of OLAP – Use multidimensional data analysis techniques – Provide advanced database support – Provide easy-to-use end-user interfaces – Support client/server architecture 39

Healthcare Cube – slice and dice view Dia gno sis Time Provider Ph y Healthcare Cube – slice and dice view Dia gno sis Time Provider Ph y i sic an Clinic Year Time Strategic Period Specialty Quarter Group Month Physician Week Day Shift Hour 40

Dashboard 41 Dashboard 41

Scorecard including Key Performace Indicators (KPI) • • • Risk-adjusted mortality index Risk-adjusted complications Scorecard including Key Performace Indicators (KPI) • • • Risk-adjusted mortality index Risk-adjusted complications index Risk-adjusted patient safety index Severity-adjusted average length of stay Expense per adjusted discharge, case mix- and wage-adjusted 42