- Number of slides: 131
Software and Enterprise Architectures
CSE 5810
Prof. Steven A. Demurjian, Sr.
Computer Science & Engineering Department
The University of Connecticut
371 Fairfield Road, Box U-255, Storrs, CT 06269-2155
steve@engr.uconn.edu
http://www.engr.uconn.edu/~steve
(860) 486-4818
Copyright © 2008 by S. Demurjian, Storrs, CT.
SWEA 1
Software Architectures
- Emerging Discipline in Mid-1990s
- Software as Collection of Interacting Components
- What are Local Interactions (within Component)?
- What are Global Interactions (between Components)?
- Advantages of SW Architectural Design
  - Understand Communication/Synchronization
  - Definition of Database Requirements
  - Identification of Performance/Scaling Issues
  - Detailing of Security Needs and Constraints
- Towards Large-Scale Software Development
- For Biomedical Informatics:
  - What are Architectures for Data Sharing?
  - How is Interoperability Facilitated?
Concepts of Software Architectures
- Exceed Traditional Algorithm/Data Structure Perspective
- Emphasize Componentwise Organization and System Functionality
- Focus on Global and Local Interactions
- Identify Communication/Synchronization Requirements
- Define Database Needs and Dependencies
- Consider Performance/Scaling Issues
- Understand Potential Evolution Dimensions
Software Design Levels
- Architecture:
  - Modules
  - Interconnections Among Modules
  - Decomposition into Subsystems
- Code:
  - Algorithms/Data Structures
  - Tasking/Control Threads
- Executable:
  - Memory Management
  - Runtime Environment
- Is this a Realistic/Accurate View?
  - Yes for a Single “Application”
  - What about Application of Applications?
  - System of Systems?
Software Engineering - an Oxymoron?
- Is there any Engineering?
- Is there any Science?
- Collection of Disparate Techniques:
  - Data-Flow Diagrams
  - E-R Diagrams
  - Finite State Machines
  - Petri Nets
  - UML Class, Object, Sequence, Etc.
  - Design Patterns
  - Model-Driven Architectures
- What is being “Engineered”?
- How do we Know we are Done?
  - E.g., Does Artifact Match Specification?
What's Available for Engineering Software?
- Specification (Abstract Models, Algebraic Semantics)
- Software Structure (Bundling Representation with Algorithms)
- Language Issues (Models, Scope, User-Defined Types)
- Information Hiding (Protect Integrity of Information)
- Integrity Constraints (Invariants of Data Structures)
- Is this up to date?
- What else can be Added to List?
  - Design Patterns
  - Model-Driven Architectures
  - XML – Data Modeling and Dependencies
  - Others?
Engineering Success in Computing
- Compilers Have Had Great Success
  - Originally by Hand
  - Then Compilers
  - Parser Generators - Lex/Yacc
- Solid Science Behind Compilers
  - Regular, Context Free, Context Sensitive Languages
  - FSAs, PDAs, CFGs, etc.
- Science has Provided Engineering Success re. Ease and Accuracy of Modern Compiler Writing
History of Programming
- C - Still Remains Industry Workhorse
  - Separate Compilation
  - Decomposition of System into Subsystems, etc.
  - Shared Declarations
  - ADTs in C, But Compiler won't Enforce Them
- Modula-2 and Ada 83 Had
  - Information Hiding
  - Public/Private Paradigm
  - Module/Package Concepts
  - Import/Export Paradigm
- Rigor Enforced by Compiler – but Can’t
  - Bind/Group Modules into Subsystems
  - Precisely Specify Interconnections and Interactions Among Subsystems and Components
‘Recent-Past’ Generation?
- C++ and Ada 95
  - Considered “Legacy” Languages - Old
- Java, C# - Are they Headed Toward Legacy?
  - How do they Rate?
  - What Do they Offer that Hasn't been Offered Before?
  - What are Unique Benefits and Potential of Java?
- What about new Web Technologies?
  - JavaScript, Perl, PHP, Python, Ruby
  - XML and SOAP
  - Mobile Computing
  - How do all of these fit into this process?
  - Particularly in Regard to C/S Solutions!
What's Next Step?
- Architectural Description Languages
  - Provide Tools to Describe Architectures
  - Definition and Communication
- Codification of Architectural Expertise
- Frameworks for Specific Domains
  - DB vs. GUI vs. Embedded vs. C/S
- Formal Underpinning for Engineering Rigor
- What has Appeared for Each of these?
  - Struts for GUI
  - Open Source Frameworks (MediaWiki)
  - Wide-Ranging Standards (XML)
  - Model-Driven Architectures
  - What Else?
Architectural Styles
- What are Popular Architectural Styles?
  - How are they Characterized?
  - Example in Practice
- Explore a Taxonomy of Styles
- Focus on “Micro-Architectures”
  - Components
  - Flow Among Components
  - Represents “Single” Application
- Forms Basis for “Macro-Architectures”
  - System of Systems
  - Application of Applications
  - Significantly Scaling Up
Taxonomy of Architectural Styles
- Data Flow Systems
  - Batch Sequential
  - Pipes and Filters
- Call & Return Systems
  - Main/Subroutines (C, Pascal)
  - Object Oriented
  - Implicit Invocation
  - Hierarchical Systems
- Virtual Machines
  - Interpreters
  - Rule Based Systems
- Data Centered Systems
  - DBS
  - Hypertext
  - Blackboards
- Independent Components
  - Communicating Processes/Event Systems
- Client/Server
  - Two-Tier
  - Multi-Tier
Taxonomy of Architectural Styles
- Establish Framework of …
  - Components
    - Building Blocks for Constructing Systems
    - A Major Unit of Functionality
    - Examples Include: Client, Server, Filter, Layer, DB
  - Connectors
    - Defining the Ways that Components Interact
    - What are the Protocols that Mandate the Allowable Interactions Among Components?
    - How are Protocols Enforced at Run/Design Time?
    - Examples Include: Procedure Call, Event Broadcast, DB Protocol, Pipe
Overall Framework
- What Is the Design Vocabulary?
  - Connectors and Components
- What Are Allowable Structural Patterns?
  - Constraints on Combining Components & Connectors
- What Is the Underlying Conceptual Model?
  - Von Neumann, Parallel, Agent, Message-Passing…
  - Are there New Emerging Models?
  - Collaborative Environments/Shareware?
- What Are Essential Invariants of a Style?
  - Limits on Allowable Components & Connectors
- Common Examples of Usage
- Advantages and Disadvantages of a Style
- Common Specializations of a Style
Pipes and Filters
- Components (Filters, e.g., Sort, Merge) are Independent Entities with Input and Output. No Shared State!
- Connectors (Pipes) for Flow: Streams of I/O
- Filters:
  - Invariant: Unaware of Upstream and Downstream Behavior
  - Streamed Behavior: Output Can Go From One Filter to the Next, Allowing Multiple Filters to Run in Parallel
Pipes and Filters
- Possible Specializations:
  - Pipelines - Linear Sequence
  - Bounded - Limits on Data Amounts
  - Typed Pipes - Known Data Format
- What is a Classic Example?
- Other Examples:
  - Compilers
  - Sequential Processes
  - Parallel Processes
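
The pipeline specialization can be sketched in a few lines. This is a minimal illustration, assuming Python generators stand in for pipes; the filter names (`split_words`, `normalize`, `keep_keywords`) and the keyword-scanning task are hypothetical and not from the course material. Each filter sees only its input stream, shares no state, and knows nothing about its neighbors.

```python
from typing import Iterable, Iterator


def split_words(lines: Iterable[str]) -> Iterator[str]:
    """Filter: turn a stream of lines into a stream of words."""
    for line in lines:
        yield from line.split()


def normalize(words: Iterable[str]) -> Iterator[str]:
    """Filter: lowercase each word and strip surrounding punctuation."""
    for word in words:
        cleaned = word.strip(".,;:!?").lower()
        if cleaned:
            yield cleaned


def keep_keywords(words: Iterable[str], keywords: set[str]) -> Iterator[str]:
    """Filter: pass through only the words that match a keyword."""
    for word in words:
        if word in keywords:
            yield word


def pipeline(lines: Iterable[str], keywords: set[str]) -> list[str]:
    """Connect the filters in a linear sequence (a pipeline)."""
    return list(keep_keywords(normalize(split_words(lines)), keywords))
```

Because generators are lazy, each word flows through all three filters before the next one is read, which approximates the streamed behavior described on the previous slide (without true parallelism).
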
Pipes and Filters - Another Example
- Text Information Retrieval Systems
  - Scanning Newspapers for Key Words, Etc.
  - Also, Boolean Search Expressions
- Where is Such an Architecture Utilized Today?
- What is Potential Usage in BMI?
- [Figure labels: User Commands; Search; Disk Controller; Programming; Result; Query Resolver; Control; Term; Search; Comparator; Data; DB]
Pipes and Filters – In BMI
- Can be Structured to Model Medical Workflows
- Series of Actions taken by Stakeholders on Patient
Patterns for Ontologies
- Extension of Rishi’s work …
- Linear Ontology Architectural Pattern (LOAP)
  - Model Knowledge in a Process
  - Continue with Examples from Prior PPT
- http://www.engr.uconn.edu/~steve/Cse5810/Attaining-Semantic-Enterprise-Interoperability-through-Ontology-Architectural-Patterns.pdf
Patterns for Ontologies
- Linear Ontology Architectural Pattern (LOAP)
  - Diagnosis, Test, and Anatomy Ontologies
Extending the Example
What has OO Evolved Into?
- What has Classic OO Solution Evolved into Today?
  - Client (Browser + Struts)
  - Server (Many Variants of OO Languages)
  - Database Server (typically Relational)
- Different Style (e.g., Design Pattern)
  - Does Pattern Capture All Aspects of Style?
  - Do we Need to Couple Technology with Pattern?
- [Figure: sample record "Dr. D, Jan 01, 08; Fever, Flu, Bed Rest; No Scripts; No Tests" stored as Item(Phy_Name*, Date*, Visit_Flag, Symptom, Diagnosis, Treatment, Presc_Flag, Pre_No, Pharm_Name, Medication, Test_Flag, Test_Code, Spec_No, Status, Tech)]
Design Patterns as Software Architectures
- Emerged as the Recognition that in Object-Oriented Systems Repetitions in Design Occurred
- Gained Prominence in 1995 with Publication of “Design Patterns: Elements of Reusable Object-Oriented Software”, Addison-Wesley
  - “… descriptions of communicating objects and classes that are customized to solve a general design problem in a particular context…”
  - Akin to Complicated Generic
- Usage of Patterns Requires
  - Consistent Format and Abstraction
  - Common Vocabulary and Descriptions
- Simple to Complex Patterns – Wide Range
The Observer Pattern
- Utilized to Define a One-to-Many Relationship Between Objects
- When Object Changes State – All Dependents are Notified and Automatically Updated
- Loosely Coupled Objects
  - When one Object (Subject – an Active Object) Changes State, then Multiple Objects (Observers – Passive Objects) are Notified
  - Observer Object Implements Interface to Specify the Way that Changes are to Occur
  - Two Interfaces and Two Concrete Classes
The Observer Pattern
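
The "two interfaces and two concrete classes" structure can be sketched as follows. This is an illustrative sketch only: the class names `PatientRecord` and `ProviderView` are hypothetical examples chosen to fit the BMI setting, and Python abstract base classes stand in for the interfaces.

```python
from abc import ABC, abstractmethod


class Observer(ABC):
    """Interface 1: how a passive object is told about a change."""

    @abstractmethod
    def update(self, state: str) -> None: ...


class Subject(ABC):
    """Interface 2: how observers register and get notified."""

    @abstractmethod
    def attach(self, observer: Observer) -> None: ...

    @abstractmethod
    def notify(self) -> None: ...


class PatientRecord(Subject):
    """Concrete subject (hypothetical): the active object whose state changes."""

    def __init__(self) -> None:
        self._observers: list[Observer] = []
        self._state = ""

    def attach(self, observer: Observer) -> None:
        self._observers.append(observer)

    def notify(self) -> None:
        for observer in self._observers:
            observer.update(self._state)

    def set_state(self, state: str) -> None:
        # Every state change is pushed to all dependents automatically.
        self._state = state
        self.notify()


class ProviderView(Observer):
    """Concrete observer (hypothetical): records every state it is told about."""

    def __init__(self) -> None:
        self.seen: list[str] = []

    def update(self, state: str) -> None:
        self.seen.append(state)
```

The subject only depends on the `Observer` interface, which is what keeps the objects loosely coupled.
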
Model View Controller
- http://java.sun.com/blueprints/patterns/MVC-detailed.html
Model View Controller
- Three Parts of the Pattern:
  - Model
    - Enterprise Data and Business Rules for Accessing and Updating Data
  - View
    - Renders the Contents (or Portion) of Model
    - Deals with Presentation of Stored Data
    - Pull or Push Model Possible
  - Controller
    - Translates Interactions with View into Actions on Model
    - Actions could be Button Clicks (GUI), Get/Post HTTP (Web), etc.
The Façade Design Pattern
- Unified Higher-Level Global Interface/System Developed from
  - A Set of Complex Heterogeneous Source Interfaces/Subsystems
  - Makes Local Sources Easier to Utilize for the Clients
- Composition of Pattern
  - Subsystems
  - System Composed of Subsystems
  - Clients
Façade
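
A small sketch of the three roles (subsystems, the system composed of them, and clients). The subsystem classes and their methods are hypothetical stand-ins for heterogeneous source interfaces; they are assumptions for illustration, not part of the course material.

```python
class LabSubsystem:
    """Hypothetical subsystem with its own interface."""

    def results_for(self, patient_id: str) -> list[str]:
        return [f"{patient_id}: CBC normal"]


class PharmacySubsystem:
    """Hypothetical subsystem with a different interface."""

    def active_prescriptions(self, patient_id: str) -> list[str]:
        return [f"{patient_id}: amoxicillin"]


class PatientChartFacade:
    """Unified higher-level interface over both subsystems."""

    def __init__(self, labs: LabSubsystem, pharmacy: PharmacySubsystem) -> None:
        self._labs = labs
        self._pharmacy = pharmacy

    def chart(self, patient_id: str) -> dict[str, list[str]]:
        # Clients make one call; the facade fans out to each subsystem.
        return {
            "labs": self._labs.results_for(patient_id),
            "prescriptions": self._pharmacy.active_prescriptions(patient_id),
        }
```

A client only needs `PatientChartFacade.chart`, so a subsystem can change its own interface without the client noticing, as long as the façade is updated.
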
Other Ontology Architectural Patterns
- Leverage Façade Pattern for
  - Local As View (LAV) Methodology
  - MApping FRAmework (MAFRA): Provides a Conceptual Framework for Building Semantic Mappings between Heterogeneous Ontology Models using Semantic Bridges
  - High Level Centralized Ontology Architectural Patterns (COAP)
- Extend Façade Concept
  - Subsystems are Local Schemas
  - System is Global Schema
LAV Ontology and Example
MAFRA Ontology and Example
COAP Ontology and Example
- COAP Allows us to Define and Integrate Ontologies at a Much Higher Level
- Integrating Multiple Ontologies
COAP Ontology and Example
- Example Unifies ICS, DSM, SNOMED etc.
Layered Systems
- [Figure labels: Users; Useful Systems; Base Utility; Core Level]
- Components - Virtual Machine at Each Layer
- Connectors - Protocols That Specify How Layers Interact
- Interaction Is Restricted to Adjacent Layers
Layered Systems
- Advantages:
  - Increasing Levels of Abstraction
  - Support Enhancement - New Layers
  - Support for Reuse
- Drawbacks:
  - Not Feasible for All Systems
  - Performance Issues With Multiple Layers
  - Defining Abstractions Is Difficult
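
The adjacency restriction can be made concrete with a three-layer sketch. The layer names and their methods are hypothetical, chosen only to show that each layer holds a reference to the layer directly beneath it and to nothing else.

```python
class CoreLayer:
    """Innermost layer: raw byte storage."""

    def __init__(self) -> None:
        self._blocks: dict[str, bytes] = {}

    def write_block(self, key: str, data: bytes) -> None:
        self._blocks[key] = data

    def read_block(self, key: str) -> bytes:
        return self._blocks[key]


class UtilityLayer:
    """Middle layer: text encoding built on the core layer only."""

    def __init__(self, core: CoreLayer) -> None:
        self._core = core

    def save_text(self, key: str, text: str) -> None:
        self._core.write_block(key, text.encode("utf-8"))

    def load_text(self, key: str) -> str:
        return self._core.read_block(key).decode("utf-8")


class ApplicationLayer:
    """Outer layer: talks to the utility layer, never to the core directly."""

    def __init__(self, utility: UtilityLayer) -> None:
        self._utility = utility

    def record_note(self, patient_id: str, note: str) -> None:
        self._utility.save_text(f"note:{patient_id}", note)

    def show_note(self, patient_id: str) -> str:
        return f"Patient {patient_id}: " + self._utility.load_text(f"note:{patient_id}")
```

Every request from the application passes through two calls before reaching storage, which is a small-scale view of the performance drawback listed above.
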
Layered Systems in BMI
- One Approach to Constructing Access to Patient Data for Clinical Research and Clinical Practice
- Construct Layered Data Repositories as Below
  - Each Layer Targets Different User Group
  - Need to Fine Tune Access Even within Layers
- [Figure labels: Aggregated De-identified Patient Data; Provider; Cl. Researchers; Public Health Researchers]
ISO as Layered Architecture
- ISO Open Systems Interconnection (OSI) Model
  - Now Widely Used as a Reference Architecture
  - 7-layer Model
  - Provides Framework for Specific Protocols (Such as IP, TCP, FTP, RPC, UDP, RSVP, …)
- Layers (top to bottom): Application, Presentation, Session, Transport, Network, Data Link, Physical
ISO OSI Model
- Layers (top to bottom): Application, Presentation, Session, Transport, Network, Data Link, Physical
- Physical (Hardware)/Data Link Layer Networks: Ethernet, Token Ring, ATM
- Network Layer Net: The Internet
- Transport Layer Net: TCP-based Network
- Presentation/Session Layer Net: HTTP/HTML, RPC, PVM, MPI
- Applications, e.g., WWW, Window System, Algorithm
Layered Ontology Architectural Pattern (LaOAP)
- Consider a Set of Domain Models
LaOAP and Example
Implementation from Model to Code
Other Ontology Patterns
- Gangemi, A., & Presutti, V. (2009). Ontology Design Patterns. In Handbook on Ontologies: International Handbooks on Information Systems (pp. 221-243). IOS Press.
Other Ontology Patterns
- Gangemi, A. (2006). Ontology Patterns for Semantic Web Content. Proceedings of the 4th International Semantic Web Conference (pp. 262-276).
Repositories
- [Figure: Blackboard (shared data) surrounded by knowledge sources ks1 through ks8]
- Knowledge Sources Interact With the Blackboard
- Blackboard Contains the Problem Solving State Data
- Control Is Driven by the State of the Blackboard
- DB Systems Are a Form of Repository With a Layer Between the BB and the KSs - Supports
  - Concurrent Access, Security, Integrity, Recovery
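
A minimal sketch of state-driven control on a blackboard. The `KnowledgeSource` class, the `run` loop, and the fever example are all illustrative assumptions; real blackboard systems use far richer scheduling. The point shown is that no knowledge source calls another: each one only inspects and updates the shared state.

```python
from typing import Any, Callable

Blackboard = dict[str, Any]


class KnowledgeSource:
    """Pairs a readiness test with an action on the shared blackboard."""

    def __init__(self, name: str,
                 can_contribute: Callable[[Blackboard], bool],
                 contribute: Callable[[Blackboard], None]) -> None:
        self.name = name
        self.can_contribute = can_contribute
        self.contribute = contribute


def run(blackboard: Blackboard, sources: list[KnowledgeSource],
        max_steps: int = 10) -> Blackboard:
    """Control loop: the blackboard state decides which source acts next."""
    for _ in range(max_steps):
        ready = [ks for ks in sources if ks.can_contribute(blackboard)]
        if not ready:
            break
        ready[0].contribute(blackboard)
    return blackboard


def _flag_fever(bb: Blackboard) -> None:
    bb["fever"] = bb["temp_f"] >= 100.4


def _advise(bb: Blackboard) -> None:
    bb["advice"] = "see provider" if bb["fever"] else "rest at home"


# Listed in "wrong" order on purpose: ordering comes from state, not position.
SOURCES = [
    KnowledgeSource("advisor", lambda bb: "fever" in bb and "advice" not in bb, _advise),
    KnowledgeSource("fever_check", lambda bb: "temp_f" in bb and "fever" not in bb, _flag_fever),
]
```
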
Database System as a Repository
- [Figure: Database (shared data) surrounded by clients c1 through c8]
- Clients Interact With the DBMS
- Database Contains the Problem Solving State Data
- Control is Driven by the State of the Database
  - Concurrent Access, Security, Integrity, Recovery
  - Single Layer System: Clients have Direct Access
  - Control of Access to Information must be Carefully Defined within DB Security/Integrity
Team Project as a Repository
- [Figure: Shared Web Portal surrounded by clients c1 through c8]
- Clients are Providers, Patients, Clinical Researchers
- Database Underlies Web Portal
- Simply a Portion of Architecture
  - Interactions with PHR (Patients)
  - Interactions with EMR (Providers)
  - Interactions with Database/Warehouse (Researchers)
Virtual Chart as a Repository
- [Figure: Virtual Chart surrounded by clients c1 through c8]
- Clients are Providers, Patients, Clinical Researchers
Interpreters
- [Figure labels: Inputs; Outputs; Program Being Interpreted; Data (Program State); Simulated Interpretation Engine; Selected Instruction; Selected Data; Internal Interpreter State]
- What Are Components and Connectors?
- Where Have Interpreters Been Used in CS&E?
  - LISP, ML, Java, Other Languages, OS Command Line
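
The parts named in the figure can be mapped onto a toy stack-machine interpreter. The instruction set (`push`, `load`, `add`, `mul`, `print`) is invented for this sketch and does not correspond to any real language; comments mark which variable plays which role from the figure.

```python
def interpret(program: list[tuple], inputs: dict[str, float]) -> list[float]:
    """Run a tiny stack program and return the values it prints."""
    stack: list[float] = []      # data (program state)
    outputs: list[float] = []    # outputs
    pc = 0                       # internal interpreter state
    while pc < len(program):     # simulated interpretation engine
        op, *args = program[pc]  # selected instruction
        if op == "push":
            stack.append(args[0])
        elif op == "load":
            stack.append(inputs[args[0]])  # selected data from the inputs
        elif op == "add":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "mul":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        elif op == "print":
            outputs.append(stack.pop())
        else:
            raise ValueError(f"unknown instruction: {op}")
        pc += 1
    return outputs
```

For example, the program `[("load", "x"), ("push", 2), ("mul",), ("push", 1), ("add",), ("print",)]` computes `2 * x + 1`.
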
Java as Interpreter
Process Control Paradigms
- [Figure, With Feedback: Input Variables and Set Point enter the Controller, which applies Δs to Manipulated Variables of the Process; the Controlled Variable is fed back to the Controller]
- [Figure, Without Feedback: same path, but the Controlled Variable is not returned to the Controller]
- Also:
  - Open vs. Closed Loop Systems
  - Well Defined Control and Computational Characteristics
  - Heavily Used in Engineering Fields
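
The difference between the two paradigms can be sketched numerically. This is a simplified illustration, assuming a proportional controller and a process whose controlled variable moves by exactly the applied change; the function names and the `gain` parameter are assumptions for the example.

```python
def closed_loop(set_point: float, initial: float, gain: float, steps: int) -> list[float]:
    """With feedback: each change is computed from the measured error."""
    value = initial
    history = []
    for _ in range(steps):
        error = set_point - value   # controlled variable is fed back
        value += gain * error       # change applied to the manipulated variable
        history.append(value)
    return history


def open_loop(initial: float, delta: float, steps: int) -> list[float]:
    """Without feedback: a fixed change is applied, ignoring the result."""
    value = initial
    history = []
    for _ in range(steps):
        value += delta
        history.append(value)
    return history
```

With feedback the value approaches the set point and the corrections shrink; without feedback the controller keeps applying the same change whether or not the target has been reached.
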
Process Architecture: Statechart Diagram?
Process Architecture: Activity Diagram?
- Clear Applicability to Medical Processes that have Underlying BMI – Low Level Processes
- [Figure labels: Waiting for Heart Signal; timeout; irregular beat; Heart Signal; Waiting for Resp. Signal; Breath; Trigger Local Alarm; Trigger Remote Alarm; Resp Signal; Alarm Reset]
Single and Multi-Tier Architectures
- Widespread use in Practice for All Types of Distributed Systems and Applications
- Two Kinds of Components
  - Servers: Provide Services - May be Unaware of Clients
    - Web Servers (unaware?)
    - Database Servers and Functional Servers (aware?)
  - Clients: Request Services from Servers
    - Must Identify Servers
    - May Need to Identify Self
    - A Server Can be Client of Another Server
- Expanding from Micro-Architectures (Single Computer/One Application) to Macro-Architecture
Single and Multi-Tier Architectures
- Normally, Clients and Servers are Independent Processes Running in Parallel
- Connectors Provide Means for Service Requests and Answers to be Passed Among Clients/Servers
- Connectors May be RPC, RMI, etc.
- Advantages
  - Parallelism, Independence
  - Separation of Concerns, Abstraction
  - Others?
- Disadvantages
  - Complex Implementation Mechanisms
  - Scalability, Correctness, Real-Time Limits
  - Others?
Example: Software Architectural Structure
- [Figure: clients (Initial Data Entry Operator (Scanning & Posting), Advanced Data Entry Operators, Analyst, Manager) connect over a 10-100 MB Network to a Document Server (Stored Images/CD), a Database Server Running Oracle, and a Functional Server (RMI Registry, RMI Act. Obj/Server)]
Business Process Model
- [Figure labels: DB Historical Completed Records; Applications; Licensing DB; Supervisor Review; Scanner; Licensing Division Scanning Operator; Stored Images; Licensing Division Printer; Data Entry Operator; DB Basic Information Entered; New Licenses; New Appointments; FOI Letters (Request Information, etc.)]
Two-Tier Architecture
- Small Manufacturer Previously on C++
- New Order Entry, Inventory, and Invoicing Applications in Java Programming Language
- Existing Customer and Order Database
- Most of Business Logic in Stored Procedures
- Tool-generated GUI Forms for Java Objects
Three-Tier Architecture
- Passenger Check-in for Regional Airline
- Local Database for Seating on Today's Flights
- Clients Invoke EJBs at Local Site Through RMI
- EJBs Update Database and Queue Updates
- JMS Queues Updates to Legacy System
- JDBC API Used to Access Local Database
Four-Tier Architecture
- Web Access to Brokerage Accounts
- Only HTML Browser Required on Front End
- "Brokerbean" EJB Provides Business Logic
- Login, Query, Trade Servlets Call Brokerbean
- Use JNDI to Find EJBs, RMI to Invoke Them
Architecture Comparisons
- Two-tier Through JDBC API is Simplest
- Multi-tier: Separate Business Logic, Protect Database Integrity, More Scalable
- JMS Queues vs. Synchronous (RMI or IDL):
  - Availability, Response Time, Decoupling
- JMS Publish & Subscribe: Off-line Notification
- RMI IIOP vs. JRMP vs. Java IDL:
  - Standard Cross-language Calls or Full Java Functionality
- JTS: Distributed Integrity, Lockstep Actions
Comments on Architectural Styles
- Architectural Styles Provide Patterns
  - Suppose Designing a New System
  - During Requirements Discovery, Behavior and Structure of System Will Emerge
  - Attempt to Match to Architectural Style
  - Modify, Extend Style as Needed
- By Choosing Existing Architectural Style
  - Know Advantages and Disadvantages
  - Ability to Focus in on Problem Areas and Bottlenecks
  - Can Adjust Architecture Accordingly
- Architectures Range from Large Scale to Small Scale in their Applicability
- We’ll see Examples for BMI Shortly …
The Next Big Challenge
- Macro-Architectures
  - System of Systems
  - Application of Applications
  - Particularly for HIT and HIE!
- Involves Two Key Issues
  - Interoperability
    - Heterogeneous Distributed Databases
    - Heterogeneous Distributed Systems
    - Autonomous Applications
  - Scalability
    - Rapid and Continuous Growth
    - Amount of Data
    - Variety of Data Types
    - Different Privacy Levels or Ownerships of Data
Interoperability: A Classic View
- [Figure, Simple Federation: Local Schemas combined by Federated Integration into a Global Schema (FDB)]
- [Figure, Multiple Nested Federation: Local Schemas and federations (FDB 1, FDB 3) combined by Federated Integration into a higher-level federation (FDB 4)]
Database Interoperability in the Internet
- Technology
  - Web/HTTP, JDBC/ODBC, CORBA (ORBs + IIOP), XML
- Architecture
  - Information Broker
  - Mediator-Based Systems
  - Agent-Based Systems
Connecting a DB to the Web
- [Figure: Browser, Internet, Web Server, CGI Script Invocation or JDBC Invocation, DBMS]
- Web Servers are Stateless
- DB Interactions Tend to be Stateful
- Invoking a CGI Script on Each DB Interaction is Very Expensive, Mainly Due to the Cost of DB Open
Connecting More Efficiently
- [Figure: Browser, Internet, Web Server, CGI Script or JDBC Invocation, Helper Processes, DBMS]
- To Avoid Cost of Opening Database, One can Use Helper Processes that Always Keep Database Open and Outlive Web Connection
- Newly Invoked CGI Scripts Connect to a Preexisting Helper Process
- System is Still Stateless
DB-Internet Architecture
- [Figure: WWW Clients (Netscape, InfoExplore, HotJava) connect over the Internet to an HTTP Server, which reaches the database through a DBWeb Dispatcher and DBWeb Gateways]
Biomedical Architectures
- Transcend Normal Two, Three, and Four Tier Solutions – Macro-Architecture
- Emerging Standards
  - FHIR, SMART, Open mHealth
- An Architecture of Architectures!
  - Need to Integrate Systems that are Themselves Multi-Tier and Distributed
  - Need to Resolve Data Ownership Issues
    - State of Connecticut Agencies Don’t Share
    - Competing Hospitals Seek to Protect Market Share
  - T1, T2, and Clinical Research Requires
    - Interoperating Genomic Databases/Supercomputers
    - Integration of De-identified Patient Data from Multiple Sources to Allow Sufficient Study Samples
    - De-identified Data Repositories or Data Marts
  - Dealing with Ownership Issues (DNA Research)
Internet and the Web
- A Major Opportunity for Business
  - A Global Marketplace
    - Business Across State and Country Boundaries
  - A Way of Extending Services
    - Online Payment vs. VISA, Mastercard
  - A Medium for Creation of New Services
    - Publishers, Travel Agents, Teller, Virtual Yellow Pages, Online Auctions …
- A Boon for Academia
  - Research Interactions and Collaborations
  - Free Software for Classroom/Research Usage
  - Opportunities for Exploration of Technologies in Student Projects
- What are Implications for BMI, HIE?
WWW: Three Market Segments
- Intranet (Corporate Network)
  - Decision Support
  - Mfg. System Monitoring
  - Corporate Repositories
  - Workgroups
- Business to Business (Corporate Network to Provider Network over the Internet)
  - Information Sharing
  - Ordering Info./Status
  - Targeted Electronic Commerce
- Internet (Exposure to Outside)
  - Sales
  - Marketing
  - Information Services
Information Delivery Problems on the Net
- Everyone can Publish Information on the Web Independently at Any Time
  - Consequently, there is an Information Explosion
  - Identifying Information Content More Difficult
- There are too Many Search Engines but too Few Capable of Returning High Quality Data
- Most Search Engines are Useful for Ad-hoc Searches but Awkward for Tracking Changes
- What are Information Delivery Issues for BMI?
  - Publishing of Patient Education Materials
  - Publishing of Provider Education Materials
  - How Can Patients/Providers Find what they Need?
  - How do they Know if it's Relevant? Reputable?
Example Web Applications
- Scenario 1: World Wide Wait
  - A Major Event is Underway and the Latest, Up-to-the-Minute Results are Being Posted on the Web
  - You Want to Monitor the Results for this Important Event, so you Fire up your Trusty Web Browser, Pointing at the Result Posting Site, and Wait, and Wait …
- What is the Problem?
  - The Scalability Problems are the Result of a Mismatch Between the Data Access Characteristics of the Application and the Technology Used to Implement the Application
- May not be Relevant to BMI: Hard to Apply Scenario
Example Web Applications
- Scenario 2:
  - Many Applications Today Need to Track Changes in Local and Remote Data Sources and Send Notifications If Some Condition Over the Data Source(s) is Met
  - To Monitor Changes on Web, You Need to Fire up Your Trusty Web Browser from Time to Time, Cache the Most Recent Result, and Compare Manually Each Time You Poll the Data Source(s)
- Issue: Pure Pull is Not the Answer to All Problems
- BMI: If a Patient Enters Data that Sets off a Chain Reaction, how Can Provider be Notified and in Turn the Provider Notify the Patient (Bad Health Event)
What is the Problem?
- Applications are Asymmetric but the Web is Not
  - Computation Centric vs. Information Flow Centric
- Types of Asymmetry
  - Network Asymmetry
    - Satellite, CATV, Mobile Clients, Etc.
  - Client to Server Ratio
    - Too Many Clients can Swamp Servers
  - Data Volume
    - Mouse and Key Click vs. Content Delivery
  - Update and Information Creation
    - Clients Need to be Informed or Must Poll
- Clearly, for BMI, Simple Web Environment/Browser is Not Sufficient – No Auto-Notification
- FHIR and Moving to Mobile Dominated World
What are Information Delivery Styles?
- Pull-Based System
  - Transfer of Data from Server to Client is Initiated by a Client Pull
  - Clients Determine when to Get Information
  - Potential for Information to be Old Unless Client Periodically Pulls
- Push-Based System
  - Transfer of Data from Server to Client is Initiated by a Server Push
  - Clients may get Overloaded if Push is Too Frequent
- Hybrid
  - Pull and Push Combined
  - Pull First and then Push Continually
Publish/Subscribe
- Semantics: Servers Publish/Clients Subscribe
  - Servers Publish Information Online
  - Clients Subscribe to the Information of Interest (Subscription-based Information Delivery)
  - Data Flow is Initiated by the Data Sources (Servers) and is Aperiodic
  - Danger: Subscriptions can Lead to Other Unwanted Subscriptions
- Applications
  - Unicast: Database Triggers and Active Databases
  - 1-to-n: Online News Groups
- May work for Clinical Researcher to Provider Push
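
Subscription-based delivery can be sketched with an in-process topic broker. The `Broker` class, topic names, and callbacks are hypothetical; a deployed system would sit on a messaging service rather than a Python object. The flow matches the semantics above: clients register interest once, and data flow is then initiated by the publisher.

```python
from collections import defaultdict
from typing import Callable


class Broker:
    """Routes each published message to the subscribers of its topic."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[str], None]) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: str) -> int:
        """Push the message to every subscriber; return how many received it."""
        receivers = self._subscribers.get(topic, [])
        for callback in receivers:
            callback(message)
        return len(receivers)
```

A 1-to-n delivery is just several callbacks on one topic; a publish on a topic with no subscribers reaches nobody and returns 0.
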
Design Options for Nodes
- Three Types of Nodes:
  - Data Sources
    - Provide Base Data which is to be Disseminated
  - Clients
    - Who are the Net Consumers of the Information
  - Information Brokers
    - Acquire Information from Other Data Sources, Add Value to that Information and then Distribute this Information to Other Consumers
    - By Creating a Hierarchy of Brokers, Information Delivery can be Tailored to the Need of Many Users
- Brokers may be Ideal Intermediaries for BMI!
  - Act on Behalf of Patients, Providers
  - Incorporate Secure Access
Research Challenges
- Ubiquitous/Pervasive: Many Computers and Information Appliances Everywhere, Networked Together
- Inherent Complexity:
  - Coping with Latency (Sometimes Unpredictable)
  - Failure Detection and Recovery (Partial Failure)
  - Concurrency, Load Balancing, Availability, Scale
  - Service Partitioning
  - Ordering of Distributed Events
- “Accidental” Complexity:
  - Heterogeneity: Beyond the Local Case: Platform, Protocol, Plus All Local Heterogeneity in Spades
  - Autonomy: Change and Evolve Autonomously
  - Tool Deficiencies: Language Support (Sockets, RPC), Debugging, Etc.
Infosphere
- Problem: Too Many Sources, Too Much Information
- Internet: Information Jungle
- Goal: Clean, Reliable, Timely Information, Anywhere
- [Figure labels: Adaptation; Property Mgmt; Resource; Personalized Filtering & Info. Delivery; Microfeedback; Digital Earth; Specialization; Infopipes; Sensors; Continual Queries; Information Quality]
Current State-of-Art
- [Figure labels: Web Server; Mainframe; Database Server; Thin Client]
Infosphere Scenario – for BMI
- [Figure labels: Infotaps & Fat Clients; Sensors; Variety of Servers; Many Sources; Database Server]
Heterogeneity and Autonomy
- Heterogeneity:
  - How Much can we Really Integrate?
  - Syntactic Integration
    - Different Formats and Models
    - Web/SQL Query Languages
  - Semantic Interoperability
    - Basic Research on Ontology, Etc.
- Autonomy
  - No Central DBA on the Net
  - Independent Evolution of Schema and Content
  - Interoperation is Voluntary
  - Interface Technology (Support for ISVs)
    - DCOM: Microsoft Standard
    - CORBA, Etc.
Security and Data Quality
- Security
  - System Security in the Broad Sense
  - Attacks: Penetrations, Denial of Service
  - System (and Information) Survivability
    - Security Fault Tolerance
    - Replication for Performance, Availability, and Survivability
- Data Quality
  - Web Data Quality Problems
    - Local Updates with Global Effects
    - Unchecked Redundancy (Mutual Copying)
    - Registration of Unchecked Information
    - Spam on the Rise
Data Warehousing and Data Mining
- Data Warehousing
  - Provide Access to Data for Complex Analysis, Knowledge Discovery, and Decision Making
  - Underlying Infrastructure in Support of Mining
  - Provides Means to Interact with Multiple DBs
  - OLAP (On-Line Analytical Processing) vs. OLTP
- Data Mining – Role in BMI and Healthcare?
  - Discovery of Information in Vast Data Sets
  - Search for Patterns and Common Features
  - Discover Information not Previously Known
    - Medical Records Accessible Nationwide
    - Research/Discover Cures for Rare Diseases
  - Relies on Knowledge Discovery in DBs (KDD)
Data Warehousing and OLAP
- A Data Warehouse
  - Database is Maintained Separately from an Operational Database
  - “A Subject-Oriented, Integrated, Time-Variant, and Non-Volatile Collection of Data in Support of Management’s Decision Making Process” [W. H. Inmon]
- OLAP (On-Line Analytical Processing)
  - Analysis of Complex Data in the Warehouse
  - Attempt to Attain “Value” through Analysis
  - Relies on Trained and Skilled Knowledge Workers who Discover Information
- Data Mart
  - Organized Data for a Subset of an Organization
  - Establish De-Identified Marts for BMI Research
Building a Data Warehouse
- Option 1: Consolidate Data Marts
  - Leverage Existing Repositories
  - Collate and Collect
  - May Not Capture All Relevant Data
- Option 2: Build from Scratch
  - Start from Scratch
  - Utilize Underlying Corporate Data
- [Figure: Corporate Data Warehouse built either from Data Marts (Option 1) or directly from Corporate Data (Option 2)]
BMI – Partition/Excerpt Data Warehouse
- Clinical and Epidemiological Research (and for T2 and T1)
- Each Study Submitted to Institutional Review Board (IRB)
  - For Human Subjects (Assess Risks, Protect Privacy)
  - See: http://resadm.uchc.edu/hspo/irb/
- To Satisfy IRB (and Privacy, Security, etc.), Reverse Process to Create a Data Mart for each Approved Study
  - Export/Excerpt Study Data from Warehouse
  - May be Single or Multiple Sources
- [Figure: BMI Data Warehouse feeding multiple Data Marts]
Data Warehouse Characteristics
- Utilizes a “Multi-Dimensional” Data Model
- Warehouse Comprised of
  - Store of Integrated Data from Multiple Sources
  - Processed into Multi-Dimensional Model
- Warehouse Supports
  - Time Series and Trend Analysis
  - “Super-Excel” Integrated with DB Technologies
- Data is Less Volatile than Regular DB
  - Doesn’t Dramatically Change Over Time
  - Updates at Regular Intervals
  - Specific Refresh Policy Regarding Some Data
Three Tier Architecture
- [Figure labels: External Data Sources; Operational Databases; Extract, Transform, Load, Refresh; monitor; integrator; metadata; Data Warehouse; Data Marts; serve; OLAP Server; Summarization; report; Query report; Data mining]
Data Warehouse Design
- Most Data Warehouses use a Star Schema to Represent the Multi-Dimensional Data Model
- Each Dimension is Represented by a Dimension Table that Describes it
- A Fact Table Connects All Dimension Tables with a Multiple Join
  - Each Tuple in the Fact Table Represents One Recorded Fact (a Point in the Multi-Dimensional Space)
  - Each Tuple in the Fact Table Consists of a Pointer to Each of the Dimension Tables, which Provide its Multidimensional Coordinates, and Stores the Measures for those Coordinates
  - Links Between the Fact Table and the Dimension Tables Form a Shape Like a Star
What is a Multi-Dimensional Data Cube?
- Representation of Information in Two or More Dimensions
- Typical Two-Dimensional - Spreadsheet
- In Practice, to Track Trends or Conduct Analysis, Three or More Dimensions are Useful
- For BMI – Axes for Diagnosis, Drug, Subject Age
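
A three-axis cube along the BMI axes above can be sketched with a dictionary keyed by coordinates. The function names `build_cube` and `roll_up`, the count measure, and the sample records are assumptions for illustration; OLAP servers store and index cubes very differently.

```python
from collections import defaultdict

Coords = tuple[str, ...]


def build_cube(records: list[tuple[str, str, str]]) -> dict[Coords, int]:
    """Count records at each (diagnosis, drug, age_group) coordinate."""
    cube: dict[Coords, int] = defaultdict(int)
    for diagnosis, drug, age_group in records:
        cube[(diagnosis, drug, age_group)] += 1
    return dict(cube)


def roll_up(cube: dict[Coords, int], keep: tuple[int, ...]) -> dict[Coords, int]:
    """Aggregate the cube onto the axes whose positions are listed in keep."""
    summary: dict[Coords, int] = defaultdict(int)
    for coords, count in cube.items():
        summary[tuple(coords[i] for i in keep)] += count
    return dict(summary)
```

Rolling up with `keep=(0,)` collapses the drug and age axes and leaves counts per diagnosis; `keep=(0, 2)` gives a two-dimensional, spreadsheet-like view of diagnosis by age group.
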
Multi-Dimensional Schemas
- Supporting Multi-Dimensional Schemas Requires Two Types of Tables:
  - Dimension Table: Tuples of Attributes for Each Dimension
  - Fact Table: Measured/Observed Variables with Pointers into Dimension Tables
- Star Schema
  - Characterizes Data Cubes by having a Single Fact Table with One Table for Each Dimension
- Snowflake Schema
  - Dimension Tables from Star Schema are Organized into Hierarchy via Normalization
- Both Represent Storage Structures for Cubes

Example of Star Schema
- Sale Fact Table: Date, Product, Store, Customer, Unit_Sales, Dollar_Sales
- Date Dimension: Date, Month, Year
- Product Dimension: ProductNo, ProdName, ProdDesc, Category
- Store Dimension: StoreID, City, State, Country, Region
- Customer Dimension: CustID, CustName, CustCity, CustCountry

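A minimal sketch of the star schema above, using Python's built-in `sqlite3` module. The table names, the surrogate `date_id` key, and every sample row are assumptions made for illustration; the Customer dimension and a few attributes are omitted to keep it short. The query joins the fact table to two dimension tables and aggregates a measure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One fact table that points into each dimension table (the "star").
conn.executescript("""
CREATE TABLE date_dim    (date_id INTEGER PRIMARY KEY, date TEXT, month TEXT, year INTEGER);
CREATE TABLE product_dim (product_no INTEGER PRIMARY KEY, prod_name TEXT, category TEXT);
CREATE TABLE store_dim   (store_id INTEGER PRIMARY KEY, city TEXT, state TEXT, region TEXT);
CREATE TABLE sale_fact (
    date_id      INTEGER REFERENCES date_dim(date_id),
    product_no   INTEGER REFERENCES product_dim(product_no),
    store_id     INTEGER REFERENCES store_dim(store_id),
    unit_sales   INTEGER,
    dollar_sales REAL
);
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO date_dim VALUES (?, ?, ?, ?)",
                 [(1, "1999-07-03", "Jul", 1999), (2, "1999-07-30", "Jul", 1999)])
conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?)",
                 [(10, "Pants", "Clothing"), (20, "Diapers", "Baby")])
conn.executemany("INSERT INTO store_dim VALUES (?, ?, ?, ?)",
                 [(100, "Rolla", "MO", "Central"), (200, "Los Angeles", "CA", "West")])
conn.executemany("INSERT INTO sale_fact VALUES (?, ?, ?, ?, ?)", [
    (1, 10, 100, 2, 50.0),
    (1, 20, 100, 1, 12.5),
    (2, 10, 200, 3, 75.0),
    (2, 10, 100, 1, 25.0),
])

# Aggregate the dollar_sales measure by two dimension attributes.
rows = conn.execute("""
SELECT s.region, p.category, SUM(f.dollar_sales)
FROM sale_fact f
JOIN store_dim s   ON f.store_id = s.store_id
JOIN product_dim p ON f.product_no = p.product_no
GROUP BY s.region, p.category
ORDER BY s.region, p.category
""").fetchall()
```

Each fact row carries only keys and measures; the descriptive attributes live in the dimension tables, which is what lets the same facts be grouped by region, category, or month without changing the fact table.
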
Example of Star Schema for BMI
- Patient Fact Table: Visit Date, Vitals, Symptoms, Patient, Medications, Etc.
- Date Dimension: Date, Month, Year
- Vitals Dimension: BP, Temp, Resp, HR (Pulse)
- Symptoms Dimension: Pulmonary, Heart, Mus-Skel, Skin, Digestive
- Patient Dimension: PatientID, PatientName, PatientCity, PatientCountry
- Medications: Reference another Star Schema for all Meds

A Second Example of Star Schema …
and Corresponding Snowflake Schema

Data Warehouse Issues
- Data Acquisition
  - Extraction from Heterogeneous Sources
  - Reformatted into Warehouse Context: Names, Meanings, Data Domains Must be Consistent
  - Data Cleaning for Validity and Quality: Is the Data as Expected w.r.t. Content? Value?
  - Transition of Data into Data Model of Warehouse
  - Loading of Data into the Warehouse
- Other Issues Include:
  - How Current is the Data? Frequency of Update?
  - Availability of Warehouse? Dependencies of Data?
  - Distribution, Replication, and Partitioning Needs?
  - Loading Time (Clean, Format, Copy, Transmit, Index Creation, etc.)?
  - For CTSA – Data Ownership (Competing Hosps).

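The acquisition steps above (extract, reformat, clean, load) can be sketched in a few lines of Python. The two sources, their field names, and the "non-empty diagnosis" cleaning rule are all hypothetical; a real warehouse load would apply far richer validation.

```python
from datetime import datetime

# Two hypothetical sources with different field names and date formats.
source_a = [
    {"pid": "1", "dx": "Asthma ", "visit": "01/05/2008"},
    {"pid": "2", "dx": "", "visit": "02/10/2008"},  # missing diagnosis
]
source_b = [
    {"PatientID": 3, "Diagnosis": "Diabetes", "VisitDate": "2008-03-15"},
]

def transform_a(rec):
    """Reformat source A into the warehouse's names and data domains."""
    return {
        "patient_id": int(rec["pid"]),
        "diagnosis": rec["dx"].strip().lower(),
        "visit_date": datetime.strptime(rec["visit"], "%m/%d/%Y").date().isoformat(),
    }

def transform_b(rec):
    """Reformat source B; its dates are already in ISO format."""
    return {
        "patient_id": int(rec["PatientID"]),
        "diagnosis": rec["Diagnosis"].strip().lower(),
        "visit_date": rec["VisitDate"],
    }

def is_clean(rec):
    """Cleaning rule assumed for this sketch: a diagnosis must be present."""
    return bool(rec["diagnosis"])

def load(sources):
    """Run every (records, transform) pair and split clean rows from rejects."""
    warehouse, rejected = [], []
    for records, transform in sources:
        for raw in records:
            rec = transform(raw)
            (warehouse if is_clean(rec) else rejected).append(rec)
    return warehouse, rejected

warehouse, rejected = load([(source_a, transform_a), (source_b, transform_b)])
```

Keeping the rejected rows, rather than silently dropping them, supports the "Is the Data as Expected?" question: someone can inspect what failed and why.
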
Knowledge Discovery
- Data Warehousing Requires Knowledge Discovery to Organize/Extract Information Meaningfully
- Knowledge Discovery
  - Technology to Extract Interesting Knowledge (Rules, Patterns, Regularities, Constraints) from a Vast Data Set
  - Process of Non-trivial Extraction of Implicit, Previously Unknown, and Potentially Useful Information from a Large Collection of Data
- Data Mining
  - A Critical Step in the Knowledge Discovery Process
  - Extracts Implicit Information from Large Data Set

Steps in a KDD Process m CSE m 5810 m m m m Learning the Application Domain (goals) Gathering and Integrating Data Cleaning Data Integration Data Transformation/Consolidation Data Mining q Choosing the Mining Method(s) and Algorithm(s) q Mining: Search for Patterns or Rules of Interest Analysis and Evaluation of the Mining Results Use of Discovered Knowledge in Decision Making Important Caveats q This is Not an Automated Process! q Requires Significant Human Interaction! SWEA 104
OLAP Strategies
- OLAP Strategies
  - Roll-Up: Summarization of Data
  - Drill-Down: from the General to Specific (Details)
  - Pivot: Cross Tabulate the Data Cubes
  - Slice and Dice: Projection Operations Across Dimensions
  - Sorting: Ordering Result Sets
  - Selection: Access by Value or Value Range
- Implementation Issues
  - Persistent with Infrequent Updates (Loading)
  - Optimization for Performance on Queries is More Complex - Across Multi-Dimensional Cubes
  - Recovery Less Critical - Mostly Read Only
  - Temporal Aspects of Data (Versions) Important

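As a rough illustration of roll-up, slice, and dice, the sketch below treats a cube as a list of fact tuples. The data values and the helper names (`roll_up`, `slice_`, `dice`) are invented for this example and do not correspond to any particular OLAP product.

```python
from collections import defaultdict

# Each fact is (product, city, month, sales); values are made up for illustration.
facts = [
    ("Electronics", "Atlanta",  "Jan", 100),
    ("Electronics", "Atlanta",  "Feb", 150),
    ("Clothing",    "Atlanta",  "Jan", 80),
    ("Electronics", "Savannah", "Jan", 60),
    ("Clothing",    "Savannah", "Feb", 40),
]
DIMS = {"product": 0, "city": 1, "month": 2}

def roll_up(facts, keep):
    """Summarize sales over every dimension not named in `keep`."""
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(fact[DIMS[d]] for d in keep)
        totals[key] += fact[3]
    return dict(totals)

def slice_(facts, dim, value):
    """Fix one dimension to a single value."""
    return [f for f in facts if f[DIMS[dim]] == value]

def dice(facts, **criteria):
    """Constrain several dimensions at once; each criterion is a set of allowed values."""
    return [f for f in facts
            if all(f[DIMS[d]] in allowed for d, allowed in criteria.items())]

by_city = roll_up(facts, ["city"])                      # summarize away product and month
atlanta = slice_(facts, "city", "Atlanta")              # slice on one city
atl_electronics = dice(facts, city={"Atlanta"}, product={"Electronics"})
```

Drill-down is the reverse of `roll_up`: calling it with more dimensions in `keep` (for example `["city", "month"]`) exposes the finer-grained totals again.
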
On-Line Analytical Processing
- Data Cube
  - A Multidimensional Array
  - Each Attribute is a Dimension
- In Example Below, the Data Must be Interpreted so that it Can be Aggregated by Region/Product/Date
[Table: sample sales with columns Product, Store, Date, Sale; rows include acron (Rolla, MO, 7/3/99), budwiser (LA, CA, 5/22/99), large pants (NY, NY, 2/12/99), and 3’ diaper (Cuba, MO, 7/30/99); Sale values 833.92, 771.24, 325.24, 81.99]
[Figure: data cube with axes Product (Pants, Diapers, Beer, Nuts), Region (West, East, Central, Mountain, South), and Date (Jan, Feb, March, April)]

On-Line Analytical Processing
- For BMI – Imagine a Data Table with Patient Data
  - Define Axis
  - Summarize Data
  - Create Perspective to Match Research Goal
  - Essentially De-identified Data Mart
Patient | Med | BirthDat | Dosage
Steve | Lipitor | 1/1/45 | 10 mg
John | Zocor | 2/2/55 | 80 mg
Harry | Crestor | 3/3/65 | 5 mg
Lois | Lipitor | 4/4/66 | 20 mg
Charles | Crestor | 7/1/59 | 10 mg
[Figure: data cube with axes Medication (Lescol, Crestor, Zocor, Lipitor), Dosage (5, 10, 20, 40, 80), and Decade (1940s, 1950s, 1960s, 1970s)]

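A small sketch of how the patient table could be turned into the Medication/Dosage/Decade cube. The rows follow the slide's sample table, with two assumptions: two-digit birth years are read as 19xx (consistent with the 1940s to 1970s axis), and Steve's dosage is read as 10 mg from the flattened table.

```python
from collections import Counter

# (patient, medication, birth year, dosage in mg).
# Assumptions: years are 19xx; Steve's dosage is taken to be 10 mg.
patients = [
    ("Steve",   "Lipitor", 1945, 10),
    ("John",    "Zocor",   1955, 80),
    ("Harry",   "Crestor", 1965, 5),
    ("Lois",    "Lipitor", 1966, 20),
    ("Charles", "Crestor", 1959, 10),
]

def decade(year):
    """Generalize a birth year to its decade label, e.g. 1945 -> '1940s'."""
    return f"{year // 10 * 10}s"

# Define the axes: the patient name is dropped, the birth date is generalized.
cube = Counter((med, dose, decade(year)) for _name, med, year, dose in patients)

# Summarize: collapse the dosage axis to get counts by medication and decade.
by_med_decade = Counter()
for (med, _dose, dec), count in cube.items():
    by_med_decade[(med, dec)] += count
```

Because the cube cells hold only counts keyed by generalized attributes, no patient name or exact birth date survives, which is the sense in which the result approximates a de-identified data mart. Small cell counts can still be identifying in practice, so this is not a privacy guarantee.
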
Examples of Data Mining
- The Slicing Action
  - A Vertical or Horizontal Slice Across Entire Cube
[Figure: multi-dimensional data cube with axes Months, Products Sales, and Cities; slice on city Atlanta]

Examples of Data Mining
- The Dicing Action
  - A Slice First Identifies One Dimension
  - A Selection of Any Cube within the Slice which Essentially Constrains All Three Dimensions
[Figure: data cube with axes Months, Products Sales, and Cities; dice on Electronics and Atlanta, selecting the cell for Electronics, March 2000, Atlanta]

Examples of Data Mining
- Roll Up: Combines Multiple Dimensions, From Individual Cities to State
- Drill Down: Takes a Facet (e.g., Q1) and Decomposes into Finer Detail
[Figure: cube of Products Sales by quarter (Q1 to Q4) and Location (city, GA: Atlanta, Columbus, Gainesville, Savannah); roll up on Location (State, USA: California, Arizona, Georgia, Iowa); drill down on Q1 into Jan, Feb, March]

Mining Other Types of Data
- Analysis and Access Dramatically More Complicated!
- Time Series Data for Glucose, BP, Peak Flow, etc.
[Figure labels: Spatial databases, Multimedia databases, World Wide Web, Time series data, Geographical and Satellite Data]

Advantages/Objectives of Data Mining
- Descriptive Mining
  - Discover and Describe General Properties
  - 60% of People who Buy Beer on Friday also have Bought Nuts or Chips in the Past Three Months
- Predictive Mining
  - Infer Interesting Properties based on Available Data
  - People who Buy Beer on Friday usually also Buy Nuts or Chips
- Result of Mining
  - Order from Chaos
  - Mining Large Data Sets in Multiple Dimensions Allows Businesses, Individuals, etc. to Learn about Trends, Behavior, etc.
  - Impact on Marketing Strategy

Data Mining Methods (1)
- Association
  - Discover the Frequency of Items Occurring Together in a Transaction or an Event
  - Example
    - 80% of Customers who Buy Milk also Buy Bread; Hence, Bread and Milk Adjacent in Supermarket
    - 50% of Customers Forget to Buy Milk/Soda/Drinks; Hence, Available at Register
- Prediction
  - Predicts Some Unknown or Missing Information based on Available Data
  - Example
    - Forecast Sale Value of Electronic Products for Next Quarter via Available Data from Past Three Quarters

Association Rules
- Motivated by Market Analysis
- Rules of the Form
  - $\text{Item}_1 \wedge \text{Item}_2 \wedge \dots \wedge \text{Item}_k \rightarrow \text{Item}_{k+1} \wedge \dots \wedge \text{Item}_n$
- Example
  - “Beer $\wedge$ Soft Drink $\rightarrow$ Pop Corn”
- Problem: Discovering All Interesting Association Rules in a Large Database is Difficult!
  - Issues
    - Interestingness
    - Completeness
    - Efficiency
  - Basic Measurement for Association Rules
    - Support of the Rule
    - Confidence of the Rule

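The two basic measurements can be computed directly. The sketch below uses the usual definitions: the support of a rule is the fraction of transactions that contain every item in the rule, and its confidence is that support divided by the support of the left-hand side. The transactions are invented for illustration.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"beer", "soft drink", "popcorn"},
    {"beer", "soft drink"},
    {"beer", "popcorn"},
    {"milk", "bread"},
    {"beer", "soft drink", "popcorn", "chips"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs, transactions):
    """Support of (lhs and rhs together) divided by support of lhs alone.

    Assumes lhs occurs in at least one transaction; otherwise this divides by zero.
    """
    return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)

# Rule: beer AND soft drink -> popcorn
rule_support = support({"beer", "soft drink", "popcorn"}, transactions)
rule_confidence = confidence({"beer", "soft drink"}, {"popcorn"}, transactions)
```

Here two of five transactions contain all three items (support 0.4), and two of the three transactions with beer and a soft drink also contain popcorn (confidence about 0.67). Enumerating every candidate rule this way grows exponentially with the number of items, which is the efficiency issue the slide raises.
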
Data Mining Methods (2)
- Classification
  - Determine the Class or Category of an Object based on its Properties
  - Example
    - Classify Companies based on the Final Sale Results in the Past Quarter
- Clustering
  - Organize a Set of Multi-dimensional Data Objects in Groups to Minimize Inter-group Similarity and Maximize Intra-group Similarity
  - Example
    - Group Crime Locations to Find Distribution Patterns

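The slides do not name a clustering algorithm. As one common choice, the sketch below uses k-means on a handful of made-up 2-D "location" points, with the initial centroids fixed so the run is repeatable.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # A cluster that ends up empty keeps its previous centroid.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical (x, y) locations forming two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
```

Points within a cluster end up close to their shared centroid (high intra-group similarity) and far from the other centroid (low inter-group similarity). `math.dist` requires Python 3.8 or later.
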
Classification
- Two Stages
  - Learning Stage: Construction of a Classification Function or Model
  - Classification Stage: Prediction of Classes of Objects Using the Function or Model
- Tools for Classification
  - Decision Tree
  - Bayesian Network
  - Neural Network
  - Regression
- Problem
  - Given a Set of Objects whose Classes are Known (Training Set), Derive a Classification Model which can Correctly Classify Future Objects

An Example
- Attributes
- Class Attribute - Play/Don’t Play the Game
- Training Set
  - Values that Set the Condition for the Classification
  - What are the Patterns Below?
Attribute | Possible Values
outlook | sunny, overcast, rain
temperature | continuous
humidity | continuous
windy | true, false
Outlook | Temperature | Humidity | Windy | Play
sunny | 85 | | false | No
overcast | 83 | 78 | false | Yes
sunny | 80 | 90 | true | No
sunny | 72 | 95 | false | No
sunny | 72 | 70 | false | Yes
… | … | … | … | …

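To make the learning and classification stages concrete, here is a sketch of a one-level decision tree (a "decision stump", also known as the OneR method) trained on the categorical columns of the rows shown above. Leaving out the continuous temperature and humidity columns is a simplification made for this example, not something the slides prescribe.

```python
from collections import Counter, defaultdict

# (outlook, windy, play) from the rows visible in the training set;
# temperature and humidity are omitted to keep the sketch short.
training = [
    ("sunny",    False, "No"),
    ("overcast", False, "Yes"),
    ("sunny",    True,  "No"),
    ("sunny",    False, "No"),
    ("sunny",    False, "Yes"),
]
ATTRIBUTES = {"outlook": 0, "windy": 1}

def learn(training):
    """Learning stage: pick the attribute whose majority-class rule makes the fewest errors."""
    best = None
    for name, idx in ATTRIBUTES.items():
        by_value = defaultdict(Counter)
        for row in training:
            by_value[row[idx]][row[-1]] += 1
        rule = {value: counts.most_common(1)[0][0] for value, counts in by_value.items()}
        errors = sum(rule[row[idx]] != row[-1] for row in training)
        if best is None or errors < best[2]:
            best = (name, rule, errors)
    return best

def classify(model, example):
    """Classification stage: apply the learned rule; returns None for an unseen value."""
    name, rule, _errors = model
    return rule.get(example[name])

model = learn(training)
prediction = classify(model, {"outlook": "overcast", "windy": True})
```

On these five rows the stump splits on `outlook` (one training error) rather than `windy` (two errors). A full decision tree would keep splitting each branch, and would also need thresholds for the continuous attributes.
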
Data Mining Methods (3)
- Summarization
  - Characterization (Summarization) of General Features of Objects in the Target Class
  - Example
    - Characterize People’s Buying Patterns on the Weekend
    - Potential Impact on “Sale Items” & “When Sales Start”
    - Department Stores with Bonus Coupons
- Discrimination
  - Comparison of General Features of Objects Between a Target Class and a Contrasting Class
  - Example
    - Comparing Students in Engineering and in Art
    - Attempt to Arrive at Commonalities/Differences

Summarization Technique
- Attribute-Oriented Induction
- Generalization using Concept Hierarchy (Taxonomy)
barcode | category | brand | content | size
14998 | milk | Dairyland | Skim | 2 L
12998 | mechanical | MotorCraft | valve 23a | 12 in
… | … | … | … | …
Category | Content | Count
milk | skim | 280
milk | 2% | 98
… | … | …
[Figure: concept hierarchy rooted at food, with categories (Milk, …, bread), content (Skim milk, …, 2% milk; White bread, …, whole wheat), and brands (Lucern, …, Dairyland; Wonder, …, Safeway)]

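A rough sketch of attribute-oriented induction on records shaped like the table above: low-level attributes are dropped, identical generalized tuples are merged with a count, and the category is then replaced by its parent in the concept hierarchy. The item records and the `PARENT` mapping are hypothetical.

```python
from collections import Counter

# Hypothetical item records: (barcode, category, brand, content, size).
items = [
    (14998, "milk",  "Dairyland", "skim",  "2 L"),
    (14999, "milk",  "Lucern",    "skim",  "1 L"),
    (15000, "milk",  "Dairyland", "2%",    "2 L"),
    (15001, "bread", "Wonder",    "white", "1 loaf"),
]

# One level of the concept hierarchy: category -> higher-level concept.
PARENT = {"milk": "food", "bread": "food"}

def generalize(items):
    """Drop barcode, brand, and size, then count identical (category, content) tuples."""
    return Counter((category, content)
                   for _barcode, category, _brand, content, _size in items)

def climb(counts):
    """Climb the hierarchy: replace each category by its parent and merge the counts."""
    merged = Counter()
    for (category, _content), n in counts.items():
        merged[PARENT[category]] += n
    return merged

summary = generalize(items)  # Category/Content/Count table, as on the slide
top = climb(summary)         # one row per top-level concept
```

The `summary` counter plays the role of the Category/Content/Count table; how far to climb the hierarchy is a trade-off between compactness and detail.
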
Why is Data Mining Popular?
- Technology Push
  - Technology for Collecting Large Quantity of Data
    - Bar Code, Scanners, Satellites, Cameras
  - Technology for Storing Large Collection of Data
    - Databases, Data Warehouses
    - Variety of Data Repositories, such as Virtual Worlds, Digital Media, World Wide Web
- Corporations want to Improve Direct Marketing and Promotions - Driving Technology Advances
  - Targeted Marketing by Age, Region, Income, etc.
  - Exploiting User Preferences/Customized Shopping
- What is Potential for BMI?
  - How do you see Data Mining Utilized?
  - What are Key Issues to Worry About?

Requirements & Challenges in Data Mining
- Security and Social
  - What Information is Available to Mine?
  - Preferences via Store Cards/Web Purchases
  - What is Your Comfort Level with Trends?
- User Interfaces and Visualization
  - What Tools Must be Provided for End Users of Data Mining Systems?
  - How are Results for Multi-Dimensional Data Displayed?
- Performance Guarantees
  - Range from Real-Time for Some Queries to Long-Term for Other Queries
- Data Sources of Complex Data Types or Unstructured Data - Ability to Format, Clean, and Load Data Sets

An Initiative of the University of Connecticut Center for Public Health and Health Policy
Robert H. Aseltine, Jr., Ph.D.
Cal Collins
January 16, 2008

What is CHIN?
- State of Connecticut Agencies Collect and Maintain Data in Separate Databases such as:
  - Vital Statistics: Birth, Death (DPH)
  - Surveillance data: Lead Screening and Immunization Registries (DPH)
  - Administrative services: LINK system (DCF), CAMRIS (DMR)
  - Benefit programs: WIC (DPH), Medicaid (DSS)
  - Educational achievement: (PSIS)
- Such Data is Un-Integrated
  - Impossible to Track and Assess Target Populations
  - Difficult to Develop Evidence-Based Practices
  - Limits Meaningful Interactions Among State Agencies

What Do We Mean by “Integration?”

UCONN Health Center Low Birth Weight Infant Registry
Last Name | First Name | DOB | SSN | Birth Wt. (kg)
Appel | April | 01/01/1999 | 016-000-9876 | 2.8
Berry | John | 02/02/1997 | 216-000-4576 | 2.9
Carat | Colleen | 03/03/1993 | 119-000-1234 | 1.9
Ernst | Max | 04/04/1994 | 116-000-3456 | 2.7
Gomez | Gloria | 05/05/1995 | 036-000-9999 | 2.6
Hurst | William | 06/06/1996 | 016-000-5599 | 3.1
Keller | Helene | 07/07/1997 | 017-000-2340 | 2.5
Martinez | Pedro | 08/08/1998 | 018-000-9886 | 3.0
Rodriguez | Felix | 09/09/1999 | 029-000-9111 | 2.8
Smith | Peggy | 10/10/2000 | 016-000-8787 | 2.5

Dept. of Mental Retardation Birth to Three System
Last Name | First Name | DOB | Street | Town
Allen | Gwen | 01/01/1999 | Apple | Enfie
Buck | Jerome | 07/01/1999 | Burbank | West
Cleary | Jane | 03/03/1993 | Cedar | Tolla
Dory | Daniel | 03/03/1993 | Dogfish | Hartf
Ernst | Max | 04/04/1994 | Elm | Enfie
Friday | Joe | 11/03/1999 | Fruit | Wind
Glenn | Valerie | 03/23/1998 | Glen | Branf
Martinez | Pedro | 08/08/1998 | High | Hartf
Riley | Lily | 03/03/1996 | Ipswich | Bridg
Sanchez | Ramon | 03/03/1993 | Juniper | New

CT Dept. of Education PSIS System
Last Name | First Name | CMT Math | Polio Vac Date | Days in Attendance
Appel | April | 134 | 01/05/1999 | 179
Carat | Colleen | 256 | 05/01/1998 | 122
Cleary | Jane | 268 | 01/28/2000 | 178
Ernst | Max | 152 | 01/09/1999 | 145
Gomez | Gloria | 289 | 01/01/1999 | 168
Friday | Joe | 265 | 10/01/1999 | 170
Keller | Helene | 309 | 11/01/2001 | 180
Martinez | Pedro | 248 | 12/01/2003 | 180
Riley | Lily | 201 | 01/01/1999 | 122
Sanchez | Ramon | 249 | 01/01/1999 | 159

Integrated Result
Last Name | First Name | DOB | SSN | Birth Wt. | Street | Town | CMT Math Grade 3 | Polio Vaccination Date | Days in Attendance
Ernst | Max | 04/04/1994 | 116-000-3456 | 2.7 | Elm | Enfield | 152 | 01/09/1999 | 145
Martinez | Pedro | 08/08/1998 | 018-000-9886 | 3.0 | High | Hartford | 248 | 12/01/2003 | 180

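The integration shown above amounts to matching the same individual across three sources. The sketch below uses a naive exact match on name and date of birth over a few rows taken from the slide; the field names are invented, SSNs are left out, and real case matching would need to tolerate misspellings, missing values, and name changes.

```python
# A few rows from each source; field names are assumptions made for this sketch.
registry = [
    {"last": "Appel",    "first": "April", "dob": "01/01/1999", "birth_wt_kg": 2.8},
    {"last": "Ernst",    "first": "Max",   "dob": "04/04/1994", "birth_wt_kg": 2.7},
    {"last": "Martinez", "first": "Pedro", "dob": "08/08/1998", "birth_wt_kg": 3.0},
]
birth_to_three = [
    {"last": "Allen",    "first": "Gwen",  "dob": "01/01/1999", "street": "Apple"},
    {"last": "Ernst",    "first": "Max",   "dob": "04/04/1994", "street": "Elm"},
    {"last": "Martinez", "first": "Pedro", "dob": "08/08/1998", "street": "High"},
]
education = [
    {"last": "Appel",    "first": "April", "cmt_math": 134},
    {"last": "Ernst",    "first": "Max",   "cmt_math": 152},
    {"last": "Martinez", "first": "Pedro", "cmt_math": 248},
]

def index(records, key_fields):
    """Build a lookup from a tuple of key fields to the record."""
    return {tuple(r[f] for f in key_fields): r for r in records}

def integrate(registry, birth_to_three, education):
    """Keep only individuals found in all three sources and merge their fields."""
    b3 = index(birth_to_three, ("last", "first", "dob"))
    edu = index(education, ("last", "first"))  # this source carries no DOB
    merged = []
    for rec in registry:
        b = b3.get((rec["last"], rec["first"], rec["dob"]))
        e = edu.get((rec["last"], rec["first"]))
        if b and e:
            merged.append({**rec, **b, **e})
    return merged

integrated = integrate(registry, birth_to_three, education)
```

As on the slide, only Ernst and Martinez appear in every source, so only they reach the integrated result. Matching the education records on name alone is fragile, which illustrates why unique identification of individuals is listed among the key challenges.
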
Key Challenges to Integrating Data
- Security and Privacy
  - HIPAA
  - FERPA
  - WIC, Social Security (Medicaid/Medicare) regulations
  - State statutes
- Alteration/disruption of business practices
- Unique identification of individuals/cases
- Accuracy and reliability of data
- Disparate hardware/software platforms

The Solution: CHIN
- Connecticut Health Information Network
- A Federated Network That:
  - Allows Shared Access to “Health”-related Data From Heterogeneous Databases
  - Allows Agencies to Retain Complete Control Over Access to Data
  - Has Minimal Impact on Business Practices
  - Complies with Security and Privacy Statutes
  - Incorporates Cutting-edge Approaches to Case Matching
- Partnership of:
  - Early Partners: DPH, DCF, DDS, DoE, DOIT, UConn, Akaza Research

Current CHIN Architecture

Path – Modular Data Integration
- Produce relational, record-level datasets by merging data from multiple agencies to support research into health, education, social services, and licensing
- De-identify or anonymize that data to the level necessary for a particular application
- Utilized internally within an agency to integrate data that does not need to be anonymized
- Supports integration with legacy systems that hold data in incompatible formats
- http://www.publichealth.uconn.edu/pathproduct.html

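The slides do not describe how Path de-identifies records. As a generic illustration only, the sketch below replaces direct identifiers with a keyed pseudonym (HMAC-SHA256 from Python's standard library) and generalizes the date of birth to a year. The key, field names, and choice of retained fields are assumptions, and the result is not claimed to satisfy HIPAA or FERPA.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would come from a managed key store.
SECRET_KEY = b"replace-with-a-managed-secret"

def pseudonym(*identifiers):
    """Derive a stable, keyed pseudonym from direct identifiers."""
    msg = "|".join(identifiers).encode("utf-8")
    return hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()[:16]

def deidentify(record):
    """Drop names, keep a pseudonym for linkage, and generalize DOB to birth year."""
    return {
        "pid": pseudonym(record["last"], record["first"], record["dob"]),
        "birth_year": record["dob"][-4:],  # assumes MM/DD/YYYY
        "birth_wt_kg": record["birth_wt_kg"],
    }

record = {"last": "Ernst", "first": "Max", "dob": "04/04/1994", "birth_wt_kg": 2.7}
safe = deidentify(record)
```

Because the pseudonym is deterministic for a given key, records for the same person from different agencies can still be linked after the names are removed. How much further the data must be generalized depends on the application, as the slide notes.
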
Path – Capabilities
- integrates data from diverse sources that may or may not share a universal record identifier
- handles data in a HIPAA and FERPA compliant manner
- utilizes a highly secure architecture
- maintains the autonomy of agency data - exposure, location, and schema
- provides an extremely easy to learn and flexible user interface
- requires no changes to agency database schemas
- needs minimal upgrade to departmental computer hardware and software
- once installed, it can quickly and efficiently produce integrated datasets

Concluding Remarks
- Only Scratched Surface on Architectures
  - Micro Architectures
  - Macro Architectures
  - Super-Macro Architectures (We’ll see …)
- What are Key Facets in the Discussion?
  - Role and Impact of Standards
  - Open Solutions
  - Architectural Variants – Reuse “Architecture”
    - Can we Reuse CHIN for Clinical Practice?
    - Are All Contributors Simply Each Hospital and EHR?
    - How do we Connect all of the Pieces?
- What are Next Steps?
  - Let’s Review Some other Work
  - Source: Wide Range of Presentations on Web


