
Software and Enterprise Architectures
CSE 5095
Prof. Steven A. Demurjian, Sr.
Computer Science & Engineering Department
The University of Connecticut
371 Fairfield Road, Box U-255
Storrs, CT 06269-2155
[email protected] uconn.edu
http://www.engr.uconn.edu/~steve
(860) 486-4818
Copyright © 2008 by S. Demurjian, Storrs, CT.

Software Architectures
• Emerging Discipline in Mid-1990s
• Software as Collection of Interacting Components
• What are Local Interactions (within Component)?
• What are Global Interactions (between Components)?
• Advantages of SW Architectural Design
  - Understand Communication/Synchronization
  - Definition of Database Requirements
  - Identification of Performance/Scaling Issues
  - Detailing of Security Needs and Constraints
• Towards Large-Scale Software Development
• For Biomedical Informatics:
  - What are Architectures for Data Sharing?
  - How is Interoperability Facilitated?

Concepts of Software Architectures
• Exceed Traditional Algorithm/Data Structure Perspective
• Emphasize Componentwise Organization and System Functionality
• Focus on Global and Local Interactions
• Identify Communication/Synchronization Requirements
• Define Database Needs and Dependencies
• Consider Performance/Scaling Issues
• Understand Potential Evolution Dimensions

The HTSS Software Architecture
[Figure: HTSS architecture diagram. Client nodes: IL, CR, IC, SDO, EDO, Non-Local Client Int. Flows: Payment, Item, Order. Servers and databases: Inventory Control, ItemDB Global Server, ItemDB Local Server, CreditCardDB, ATM-BanKDB, OrderDB, SupplierDB.]
Legend:
  IL: Item Locator
  CR: Cash Register
  IC: Invent. Control
  DO: Deli Orderer for Shopper/Employee

Multiple Backend Database System (MBDS)
[Figure: Host/User connected to a Database Controller, which is connected to several Backend Database Processors.]

The MBDS Processes
[Figure: processes in each MBDS node.]
• Database Controller: Request Preparation, Post Processing, Put Msg., Get Msg.
• Backend Database Processor: Put Msg., Directory Management, Record Processing, Concurrency Control, Disk I/O

Multiple Processes in MBDS

No.  Type
1    New Request
2    Results of Request
3    Number of Reqs in Transaction
4    Aggregate Operators (Sum, etc.)
6    Parsed Request to Backends
12   Backend Aggregate Operator Results
15   Ids for Accessing Database Indexes
16   Request and Disk Addresses
21   Ids for Accessing Database Records
22   Locks Obtained: Okay to Execute Request
23   ID of Finished Request

SRC: Host, PoPr, ReqP, RecP, DM, DM, DM, CC, RecP
DST: ReqP, Host, PoPr, DMs, RecP, CC

Message Passing in MBDS
[Figure: message flow among Request Preparation, Post Processing, Directory Management, Record Processing, Concurrency Control, and Disk I/O through Get Msg./Put Msg., including "From Other Backend" and "To Backend(s)" links. Arrow labels (letter and message number): A 1, B 3, C 4, D 6, E 15, F 15, G 21, H 22, I 16, J 23, K 12.]

Software Design Levels
• Architecturally:
  - Modules
  - Interconnections Among Modules
  - Decomposition into Subsystems
• Code:
  - Algorithms/Data Structures
  - Tasking/Control Threads
• Executable:
  - Memory Management
  - Runtime Environment
• Is this a Realistic/Accurate View?
  - Yes for a Single “Application”
  - What about Application of Applications?
  - System of Systems?

Software Engineering - an Oxymoron?
• Is there any Engineering?
• Is there any Science?
• Collection of Disparate Techniques:
  - Data-Flow Diagrams
  - E-R Diagrams
  - Finite State Machines
  - Petri Nets
  - UML Class, Object, Sequence, Etc.
  - Design Patterns
  - Model Driven Architectures
• What is being “Engineered”?
• How do we Know we are Done?
  - E.g., Does Artifact Match Specification?

What's Available for Engineering Software?
• Specification (Abstract Models, Algebraic Semantics)
• Software Structure (Bundling Representation with Algorithms)
• Language Issues (Models, Scope, User-Defined Types)
• Information Hiding (Protect Integrity of Information)
• Integrity Constraints (Invariants of Data Structures)
• Is this up to date? What else can be Added to List?
  - Design Patterns
  - Model Driven Architectures
  - XML: Data Modeling and Dependencies
  - Others?

Engineering Success in Computing
• Compilers Have Had Great Success
  - Originally by Hand
  - Then Compilers
  - Parser Generators - Lex/Yacc
• Solid Science Behind Compilers
  - Regular, Context Free, Context Sensitive Languages
  - FSAs, PDAs, CFGs, etc.
• Science has Provided Engineering Success re: Ease and Accuracy of Modern Compiler Writing

History of Programming
• C - Still Remains Industry Workhorse
  - Separate Compilation
  - Decomposition of System into Subsystems, etc.
  - Shared Declarations
  - ADTs in C, But Compiler won't Enforce Them
• Modula-2 and Ada 83 Had
  - Information Hiding
  - Public/Private Paradigm
  - Module/Package Concepts
  - Import/Export Paradigm
• Rigor Enforced by Compiler, but Can’t
  - Bind/Group Modules into Subsystems
  - Precisely Specify Interconnections and Interactions Among Subsystems and Components

‘Recent-Past’ Generation?
• C++ and Ada 95
  - Considered “Legacy” Languages - Old
• Java, C# - Are they Headed Toward Legacy?
  - How do they Rate?
  - What Do they Offer that Hasn't been Offered Before?
  - What are Unique Benefits and Potential of Java?
• What about new Web Technologies?
  - JavaScript, Perl, PHP, Python, Ruby
  - XML and SOAP
  - How do all of these fit into this process?
  - Particularly in Regards to C/S Solutions!

What's Next Step?
• Architectural Description Languages
  - Provide Tools to Describe Architectures
  - Definition and Communication
• Codification of Architectural Expertise
• Frameworks for Specific Domains: DB vs. GUI vs. Embedded vs. C/S
• Formal Underpinning for Engineering Rigor
• What has Appeared for Each of these?
  - Struts for GUI
  - Open Source Frameworks (mediawiki)
  - Wide-Ranging Standards (XML)
  - Model-Driven Architectures
  - What Else?

Architectural Styles
• What are Popular Architectural Styles?
  - How are they Characterized?
  - Example in Practice
• Explore a Taxonomy of Styles
• Focus on “Micro-Architectures”
  - Components
  - Flow Among Components
  - Represents “Single” Application
• Forms Basis for “Macro-Architectures”
  - System of Systems
  - Application of Applications
  - Significantly Scaling Up

Taxonomy of Architectural Styles
• Data Flow Systems
  - Batch Sequential
  - Pipes and Filters
• Call & Return Systems
  - Main/Subroutines (C, Pascal)
  - Object Oriented
  - Implicit Invocation
  - Hierarchical Systems
• Virtual Machines
  - Interpreters
  - Rule Based Systems
• Data Centered Systems
  - DBS
  - Hypertext
  - Blackboards
• Independent Components
  - Communicating Processes/Event Systems
• Client/Server
  - Two-Tier
  - Multi-Tier

Taxonomy of Architectural Styles
• Establish Framework of …
  - Components
    - Building Blocks for Constructing Systems
    - A Major Unit of Functionality
    - Examples Include: Client, Server, Filter, Layer, DB
  - Connectors
    - Defining the Ways that Components Interact
    - What are the Protocols that Mandate the Allowable Interactions Among Components?
    - How are Protocols Enforced at Run/Design Time?
    - Examples Include: Procedure Call, Event Broadcast, DB Protocol, Pipe

Overall Framework
• What Is the Design Vocabulary?
  - Connectors and Components
• What Are Allowable Structural Patterns?
  - Constraints on Combining Components & Connectors
• What Is the Underlying Conceptual Model?
  - Von Neumann, Parallel, Agent, Message-Passing…
  - Are there New Emerging Models?
  - Collaborative Environments/Shareware?
• What Are Essential Invariants of a Style?
  - Limits on Allowable Components & Connectors
• Common Examples of Usage
• Advantages and Disadvantages of a Style
• Common Specializations of a Style

Pipes and Filters
[Figure: filters (including Sort and Merge) joined by pipes. Annotations: Components are Independent Entities, No Shared State! Components with Input and Output. Connectors for Flow: Streams of I/O.]
• Filters:
  - Invariant: Unaware of Up and Down Stream Behavior
  - Streamed Behavior: Output Could Go From One Filter to the Next One, Allowing Multiple Filters to Run in Parallel
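The filter invariants above can be sketched with Python generators. This is only an illustration, not part of the original slides: the filter names (`split_words`, `to_lower`, `drop_short`) are hypothetical, and generators interleave work lazily on one thread rather than running truly in parallel.

```python
from typing import Iterable, Iterator


def split_words(lines: Iterable[str]) -> Iterator[str]:
    """Filter 1: turn a stream of lines into a stream of words."""
    for line in lines:
        yield from line.split()


def to_lower(words: Iterable[str]) -> Iterator[str]:
    """Filter 2: normalize case; knows nothing about its neighbors."""
    for word in words:
        yield word.lower()


def drop_short(words: Iterable[str], min_len: int = 3) -> Iterator[str]:
    """Filter 3: pass along only words of at least min_len characters."""
    for word in words:
        if len(word) >= min_len:
            yield word


def run_pipeline(lines: Iterable[str]) -> list:
    """Connect the filters; each 'pipe' is just the iterator handed downstream."""
    return list(drop_short(to_lower(split_words(lines))))


result = run_pipeline(["Pipes AND Filters", "no shared state"])
```

Each filter only sees its input stream and produces an output stream, so filters can be reordered or replaced without touching the others.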

Pipes and Filters
• Possible Specializations:
  - Pipelines - Linear Sequence
  - Bounded - Limits on Data Amounts
  - Typed Pipes - Known Data Format
• What is a Classic Example?
• Other Examples:
  - Compilers
  - Sequential Processes
  - Parallel Processes

Pipes and Filters - Another Example
• Text Information Retrieval Systems
  - Scanning Newspapers for Key Words, Etc.
  - Also, Boolean Search Expressions
• Where is Such an Architecture Utilized Today?
• What is Potential Usage in BMI?
[Figure: User Commands enter a Search Disk Controller and Query Resolver; Programming, Control, and Term flows drive a Search Comparator that reads Data from the DB and returns the Result.]

ADTs and OO Architectures
• Widespread Usage in the 1990’s
• Advantages Are Well Known
[Figure: objects (obj) as Components, each exposing operations (op); operation calls between objects are the Connectors.]
• Disadvantages:
  - Interaction Requires Object Identity
  - If Identity Changes, It Is Difficult to Track All Affected Objects

Implicit Invocation
• Similar to OO in the Sense that Components Can Call Services on Other Components
• How Does this Work?
  - Components Have List of Events they can Raise and List of Procedures to Handle Events
  - When Event is Raised, it is Broadcast
  - All Components that Have Procedure to Handle Broadcast Event will Act Upon it
  - The Component That Raised the Event has no Knowledge of Which Component(s) will Handle Event
• What are Some Examples?
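A minimal sketch of the raise-and-broadcast mechanism described above. The `EventBus` class, the event name, and the handlers are hypothetical names chosen for illustration; a real event system may give no ordering guarantee, whereas this sketch simply calls handlers in registration order.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class EventBus:
    """Broadcasts a raised event to every registered handler."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[object], None]]] = defaultdict(list)

    def register(self, event: str, handler: Callable[[object], None]) -> None:
        """A component announces that it has a procedure for this event."""
        self._handlers[event].append(handler)

    def raise_event(self, event: str, payload: object) -> int:
        """The raiser does not know who handles the event, only how many did."""
        handlers = list(self._handlers.get(event, []))
        for handler in handlers:
            handler(payload)
        return len(handlers)


log = []
bus = EventBus()
bus.register("file_saved", lambda name: log.append(("indexer", name)))
bus.register("file_saved", lambda name: log.append(("backup", name)))
handled = bus.raise_event("file_saved", "report.txt")
```

Adding a third handler requires no change to the component that raises `file_saved`, which is the main appeal of the style.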

Implicit Invocation
• Advantages
  - No Need to Know the Targeted Components
  - Single Event can Impact Multiple Components
  - New Event Handlers can Easily be Added
  - New Events Can then be Raised
• Disadvantages
  - No Control Over the Order of Processing When an Event is Raised
  - No Control Over “Who” and “How Many” Process Events
  - Very Non-Deterministic System Behavior

What has OO Evolved Into?
• What has Classic OO Solution Evolved into Today?
  - Client (Browser + Struts)
  - Server (Many Variants of OO Languages)
  - Database Server (typically Relational)
• Different Style (e.g., Design Pattern)
  - Does Pattern Capture All Aspects of Style?
  - Do we Need to Couple Technology with Pattern?
[Figure: example visit record "Dr. D, Jan 01, 08; Fever, Flu, Bed Rest; No Scripts; No Tests" stored as Item(Phy_Name*, Date*, Visit_Flag, Symptom, Diagnosis, Treatment, Presc_Flag, Pre_No, Pharm_Name, Medication, Test_Flag, Test_Code, Spec_No, Status, Tech)]

Layered Systems
[Figure: nested layers labeled Useful Systems, Base Utility, Core level, with Users at the outside.]
• Components - Virtual Machine at Each Layer
• Connectors - Protocols That Specify How Layers Interact
• Interaction Is Restricted to Adjacent Layers

Layered Systems
• Advantages:
  - Increasing Levels of Abstraction
  - Support Enhancement - New Layers
  - Support for Reuse
• Drawbacks:
  - Not Feasible for All Systems
  - Performance Issues With Multiple Layers
  - Defining Abstractions Is Difficult

Layered Systems in BMI
• One Approach to Constructing Access to Patient Data for Clinical Research and Clinical Practice
• Construct Layered Data Repositories as Below
  - Each Layer Targets Different User Group
  - Need to Fine Tune Access Even within Layers
[Figure: data layers Aggregated, De-identified, Patient Data; user groups Provider, Cl. Researchers, Public Health Researchers.]

ISO as Layered Architecture
• ISO Open Systems Interconnect (OSI) Model
  - Now Widely Used as a Reference Architecture
  - 7-layer Model
  - Provides Framework for Specific Protocols (Such as IP, TCP, FTP, RPC, UDP, RSVP, …)
[Figure: layer stack, top to bottom: Application, Presentation, Session, Transport, Network, Data Link, Physical.]

ISO OSI Model
[Figure: two peer layer stacks, top to bottom: Application, Presentation, Session, Transport, Network, Data Link, Physical.]
• Physical (Hardware)/Data Link Layer Networks: Ethernet, Token Ring, ATM
• Network Layer Net: The Internet
• Transport Layer Net: TCP-based Network
• Presentation/Session Layer Net: HTTP/HTML, RPC, PVM, MPI
• Applications, E.g., WWW, Window System, Algorithm

Repositories
[Figure: knowledge sources ks1 through ks8 arranged around a central Blackboard (shared data).]
• Knowledge Sources Interact With the Blackboard
• Blackboard Contains the Problem Solving State Data
• Control Is Driven by the State of the Blackboard
• DB Systems Are a Form of Repository With a Layer Between the BB and the KSs - Supports
  - Concurrent Access, Security, Integrity, Recovery
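As a rough sketch of "control is driven by the state of the blackboard": each knowledge source below inspects the shared data and acts only when its precondition holds. The knowledge sources (`ks_tokenize`, `ks_count`) and the dictionary-based blackboard are hypothetical, chosen only to keep the example small.

```python
def ks_tokenize(board: dict) -> bool:
    """Fires when raw text is present but has not been tokenized yet."""
    if "text" in board and "tokens" not in board:
        board["tokens"] = board["text"].split()
        return True
    return False


def ks_count(board: dict) -> bool:
    """Fires only after some other source has posted tokens."""
    if "tokens" in board and "count" not in board:
        board["count"] = len(board["tokens"])
        return True
    return False


def run_blackboard(board: dict, sources: list) -> dict:
    """Keep offering the board to every source until none can contribute."""
    progress = True
    while progress:
        progress = False
        for source in sources:
            if source(board):
                progress = True
    return board


# The sources are listed in the "wrong" order on purpose: the board state,
# not the list order, decides which source is able to act.
board = run_blackboard({"text": "control follows the data"}, [ks_count, ks_tokenize])
```

The knowledge sources never call each other; they communicate only through what is written on the board.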

Database System as a Repository
[Figure: clients c1 through c8 arranged around a central Database (shared data).]
• Clients Interact With the DBMS
• Database Contains the Problem Solving State Data
• Control is Driven by the State of the Database
  - Concurrent Access, Security, Integrity, Recovery
  - Single Layer System: Clients have Direct Access
  - Control of Access to Information must be Carefully Defined within DB Security/Integrity

Team Project as a Repository
[Figure: clients c1 through c8 arranged around a central Shared Web Portal.]
• Clients are Providers, Patients, Clinical Researchers
• Database Underlies Web Portal
• Simply a Portion of Architecture
  - Interactions with PHR (Patients)
  - Interactions with EMR (Providers)
  - Interactions with Database/Warehouse (Researchers)

Interpreters
[Figure: Inputs and Outputs connect to Data (program state); a Simulated interpretation engine with Internal interpreter state reads the Program being interpreted, taking the Selected instruction and Selected data.]
• What Are Components and Connectors?
• Where Have Interpreters Been Used in CS&E?
  - LISP, ML, Java, Other Languages, OS Command Line
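The parts named in the interpreter figure can be mapped onto a very small stack machine. This is a sketch under stated assumptions: the instruction set (`READ`, `PUSH`, `ADD`, `MUL`) is invented for illustration and does not come from the slides.

```python
def interpret(program: list, inputs: list):
    """Run a tiny stack program and return the value left on top of the stack."""
    stack = []            # data (program state)
    pending = list(inputs)
    pc = 0                # internal interpreter state: the program counter
    while pc < len(program):
        op, *operand = program[pc]   # selected instruction
        if op == "PUSH":
            stack.append(operand[0])
        elif op == "READ":
            stack.append(pending.pop(0))   # consume the next input
        elif op == "ADD":
            right, left = stack.pop(), stack.pop()   # selected data
            stack.append(left + right)
        elif op == "MUL":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown instruction: {op}")
        pc += 1
    return stack.pop()    # output


# Program being interpreted: computes 2 * x + 1 for one input x.
double_plus_one = [("READ",), ("PUSH", 2), ("MUL",), ("PUSH", 1), ("ADD",)]
answer = interpret(double_plus_one, [20])
```

Here the components are the program, the state, and the engine loop; the connectors are the fetches of the selected instruction and data.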

Java as Interpreter

Process Control Paradigms
[Figure, With Feedback: Set point and Input variables feed a Controller, which sends Δs to manipulated variables of the Process; the Controlled variable is fed back to the Controller.]
[Figure, Without Feedback: Set point and Input variables feed a Controller, which sends Δs to manipulated variables of the Process; the Controlled variable is not fed back.]
• Also:
  - Open vs. Closed Loop Systems
  - Well Defined Control and Computational Characters
  - Heavily Used in Engineering Fields
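A minimal sketch of the feedback (closed loop) case, assuming a simple proportional controller and an idealized process that applies each change immediately. The function name, the gain value, and the numbers are illustrative assumptions, not taken from the slides.

```python
def run_feedback_loop(set_point: float, initial: float, gain: float, steps: int) -> list:
    """Return the controlled variable after each control step."""
    value = initial
    history = [value]
    for _ in range(steps):
        error = set_point - value   # controller compares feedback to set point
        delta = gain * error        # the change sent to the manipulated variable
        value = value + delta       # idealized process applies the change
        history.append(value)
    return history


history = run_feedback_loop(set_point=100.0, initial=20.0, gain=0.5, steps=10)
```

With a gain of 0.5 the remaining error halves on every step, so the controlled variable approaches the set point; without the feedback line the controller would have no error to act on.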

Process Architecture: Statechart Diagram?

Process Architecture: Activity Diagram?
• Clear Applicability to Medical Processes that have Underlying BMI: Low Level Processes
[Figure: states Waiting for Heart Signal and Waiting for Resp. Signal; events timeout, irregular beat, Heart Signal, Breath, Resp Signal, Alarm Reset; actions Trigger Local Alarm and Trigger Remote Alarm.]

Design Patterns as Software Architectures
• Emerged as the Recognition that in Object-Oriented Systems Repetitions in Design Occurred
• Gained Prominence in 1995 with Publication of “Design Patterns: Elements of Reusable Object-Oriented Software”, Addison-Wesley
  - “… descriptions of communicating objects and classes that are customized to solve a general design problem in a particular context…”
  - Akin to Complicated Generic
• Usage of Patterns Requires
  - Consistent Format and Abstraction
  - Common Vocabulary and Descriptions
• Simple to Complex Patterns: Wide Range

The Observer Pattern
• Utilized to Define a One-to-Many Relationship Between Objects
• When Object Changes State, all Dependents are Notified and Automatically Updated
• Loosely Coupled Objects
  - When one Object (Subject, an Active Object) Changes State, then Multiple Objects (Observers, Passive Objects) are Notified
  - Observer Object Implements Interface to Specify the Way that Changes are to Occur
  - Two Interfaces and Two Concrete Classes
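The "two interfaces and two concrete classes" structure can be sketched as follows, using Python abstract base classes in place of interfaces. The concrete names (`Thermometer`, `Display`) are hypothetical examples and do not come from the slides.

```python
from abc import ABC, abstractmethod


class Observer(ABC):
    """Interface 1: how an observer is told about a change."""

    @abstractmethod
    def update(self, state: float) -> None: ...


class Subject(ABC):
    """Interface 2: how observers attach and how they are notified."""

    @abstractmethod
    def attach(self, observer: Observer) -> None: ...

    @abstractmethod
    def notify(self) -> None: ...


class Thermometer(Subject):
    """Concrete subject: the active object whose state changes."""

    def __init__(self) -> None:
        self._observers = []
        self._temperature = 0.0

    def attach(self, observer: Observer) -> None:
        self._observers.append(observer)

    def notify(self) -> None:
        for observer in self._observers:
            observer.update(self._temperature)

    def set_temperature(self, value: float) -> None:
        self._temperature = value
        self.notify()   # every dependent is updated automatically


class Display(Observer):
    """Concrete observer: passively records the last state it was given."""

    def __init__(self) -> None:
        self.shown = None

    def update(self, state: float) -> None:
        self.shown = state


sensor = Thermometer()
ward_display, nurse_display = Display(), Display()
sensor.attach(ward_display)
sensor.attach(nurse_display)
sensor.set_temperature(38.5)
```

The subject depends only on the `Observer` interface, so new kinds of observers can be attached without changing `Thermometer`.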

The Observer Pattern

Model View Controller
• http://java.sun.com/blueprints/patterns/MVC-detailed.html

Model View Controller
• Three Parts of the Pattern:
  - Model
    - Enterprise Data and Business Rules for Accessing and Updating Data
  - View
    - Renders the Contents (or Portion) of Model
    - Deals with Presentation of Stored Data
    - Pull or Push Model Possible
  - Controller
    - Translates Interactions with View into Actions on Model
    - Actions could be Button Clicks (GUI), Get/Post http (Web), etc.
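A small sketch of the three parts, using the "pull" variant in which the view asks the model for its data when rendering. All class and action names here are hypothetical; a real web or GUI framework would supply its own controller and view machinery.

```python
class VisitModel:
    """Model: holds the data and the rule for updating it."""

    def __init__(self) -> None:
        self._symptoms = []

    def add_symptom(self, symptom: str) -> None:
        if not symptom:
            raise ValueError("symptom must not be empty")   # business rule
        self._symptoms.append(symptom)

    def symptoms(self) -> list:
        return list(self._symptoms)


class TextView:
    """View: renders model contents; pulls the data when asked to render."""

    def render(self, model: VisitModel) -> str:
        return "Symptoms: " + ", ".join(model.symptoms())


class Controller:
    """Controller: translates a user interaction into an action on the model."""

    def __init__(self, model: VisitModel, view: TextView) -> None:
        self.model = model
        self.view = view

    def handle(self, action: str, value: str) -> str:
        if action == "add":
            self.model.add_symptom(value)
        return self.view.render(self.model)


controller = Controller(VisitModel(), TextView())
controller.handle("add", "fever")
page = controller.handle("add", "flu")
```

Swapping `TextView` for an HTML view would not require any change to `VisitModel`, which is the separation the pattern is after.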

Model View Controller
• http://java.sun.com/blueprints/patterns/MVC-detailed.html

UML for System Modeling
• UML is a Language for Specifying, Visualizing, Constructing, and Documenting Software Artifacts
• What Does a Modeling Language Provide?
  - Model Elements: Concepts and Semantics
  - Notation: Visual Rendering of Model Elements
  - Guidelines: Hints and Suggestions for Using Elements in Notation
• References and Resources
  - Web: http://www.uml.org/
• Is UML Sufficient for Complexity of BMI?
  - Able to Model Information Needs for BMI?
  - Able to Represent Required Architectures?

UML Diagrammatic Representations
• Component Diagram: Captures the Physical Structure of the Implementation
• Deployment Diagram: Captures the Topology of a System’s Hardware
• Collaboration Diagram: Captures Dynamic Behavior (Message-Oriented)
• What About Other Diagrams?
  - State Chart Diagram: Captures Dynamic Behavior (Event-Oriented)
  - Activity Diagram: Captures Dynamic Behavior (Activity-Oriented)
• These and Others Seem too Low Level …
• What is Role of UML for BMI?
  - Yet Another Design Artifact
  - Can it be More?

Component Diagram
• Captures the Physical Structure of the Implementation

Deployment Diagram
• Captures the Topology of a System’s Hardware

Collaboration Diagram

Single and Multi-Tier Architectures
• Widespread use in Practice for All Types of Distributed Systems and Applications
• Two Kinds of Components
  - Servers: Provide Services - May be Unaware of Clients
    - Web Servers (unaware?)
    - Database Servers and Functional Servers (aware?)
  - Clients: Request Services from Servers
    - Must Identify Servers
    - May Need to Identify Self
    - A Server Can be Client of Another Server
• Expanding from Micro-Architectures (Single Computer/One Application) to Macro-Architecture

Single and Multi-Tier Architectures
• Normally, Clients and Servers are Independent Processes Running in Parallel
• Connectors Provide Means for Service Requests and Answers to be Passed Among Clients/Servers
• Connectors May be RPC, RMI, etc.
• Advantages
  - Parallelism, Independence
  - Separation of Concerns, Abstraction
  - Others?
• Disadvantages
  - Complex Implementation Mechanisms
  - Scalability, Correctness, Real-Time Limits
  - Others?

Example: Software Architectural Structure
[Figure: clients Initial Data Entry Operator (Scanning & Posting), Advanced Data Entry Operators, Analyst, and Manager on a 10-100 MB Network; servers are a Document Server (Stored Images/CD), a Database Server Running Oracle, and a Functional Server (RMI Registry, RMI Act.Obj/Server).]

Business Process Model
[Figure: Licensing Division workflow. Applications pass through a Scanner (Scanning Operator, Stored Images) and a Data Entry Operator (Basic Information Entered) into the Licensing DB, then Licensing Supervisor Review; a Licensing Division Printer produces New Licenses, New Appointments, and FOI Letters (Request Information, etc.); Historical and Completed Records are kept in DBs.]

Two-Tier Architecture
• Small Manufacturer Previously on C++
• New Order Entry, Inventory, and Invoicing Applications in Java Programming Language
• Existing Customer and Order Database
• Most of Business Logic in Stored Procedures
• Tool-generated GUI Forms for Java Objects

Three-Tier Architecture
• Passenger Check-in for Regional Airline
• Local Database for Seating on Today's Flights
• Clients Invoke EJBs at Local Site Through RMI
• EJBs Update Database and Queue Updates
• JMS Queues Updates to Legacy System
• JDBC API Used to Access Local Database

Four-Tier Architecture
• Web Access to Brokerage Accounts
• Only HTML Browser Required on Front End
• "Brokerbean" EJB Provides Business Logic
• Login, Query, Trade Servlets Call Brokerbean
• Use JNDI to Find EJBs, RMI to Invoke Them

Architecture Comparisons
• Two-tier Through JDBC API is Simplest
• Multi-tier: Separate Business Logic, Protect Database Integrity, More Scalable
• JMS Queues vs. Synchronous (RMI or IDL):
  - Availability, Response Time, Decoupling
• JMS Publish & Subscribe: Off-line Notification
• RMI IIOP vs. JRMP vs. Java IDL:
  - Standard Cross-language Calls or Full Java Functionality
• JTS: Distributed Integrity, Lockstep Actions

Comments on Architectural Styles
• Architectural Styles Provide Patterns
  - Suppose Designing a New System
  - During Requirements Discovery, Behavior and Structure of System Will Emerge
  - Attempt to Match to Architectural Style
  - Modify, Extend Style as Needed
• By Choosing Existing Architectural Style
  - Know Advantages and Disadvantages
  - Ability to Focus in on Problem Areas and Bottlenecks
  - Can Adjust Architecture Accordingly
• Architectures Range from Large Scale to Small Scale in their Applicability
• We’ll see Examples for BMI Shortly …

Other Issues in Software Architectures
• Consider a Set of Applications
  - New Software
  - Legacy, COTS, Databases, etc.
• A Distributed Application is a Set of Applications Deployed Over a Network that Communicate
• Relationship Between Applications
• Different Implementations of “Same” Application on Different Hardware Platforms
• Configuration of Various Hardware Nodes
• Different Node Types in the Network
• Issue:
  - What is the ‘Best’ Way to Deploy Applications Across the Network of Available Resources?

Distributed Application & Hardware Nodes
• Computers & Connections May have Different Characteristics that Affect their Usage
  - Speed
  - Storage
  - Bandwidth

Objective: ‘Best’ Deployment
• A Distributed System is Optimally Deployed if it Yields the Best Performance:
• Efficient Use of Resources via Throughput, Response Time, or Number of Messages
• What are Implications in BMI?
  - Need to Bring Together Multiple Assets
  - Work Efficiently Across Network
  - Unifying Clinical Research Repositories

Distr. Systems: Combo of Requirements
[Figure: a Specification combines interaction patterns, software elements, hardware elements, interfaces, connections, and protocols.]

Deployment Influenced by Many Factors
[Figure: Performance is influenced by algorithms, software architecture, underlying network, replication degree, processing nodes, usage patterns, middleware, and deployment.]

Framework for Design and Deployment
[Figure: SOFTWARE and HARDWARE are linked through Dependencies and Deployment, which determine PERFORMANCE.]

What is I5?
• Five Definition Languages
  - Interface
  - Inheritance
  - Implementation
  - Instantiation
  - Installation
• Five Formal Integrated Graphical Languages Based on UML’s Implementation Diagrams
• The Application, Network, Dependencies and the Deployment are Part of an Integrated Framework

The Five Levels of I5
(ordered from Abstraction to Detail)
• Interface (I1) - Types of Components, Nodes and Connectors
• Implementation (I2) - Classes of Components, Nodes and Connectors
• Integration (I3) - Dependencies Between Component and Node Classes
• Instantiation (I4) - Instances of Each Class Definition
• Installation (I5) - Deployment of Each Instance (Requirements and Complete Deployment)

Levels of Specification in I5
• Types - Generic Definition of Components, Nodes, and Connectors According to Their Role
  - Defined in I1
  - Used in I2 to Define Classes
• Classes - Different Implementations of the Types
  - Defined in I2
  - Used in I3 to Associate Software Components and Hardware Artifacts and I4 to Define Instances
• Instances - Identical Copies of the Different Classes
  - Defined in I4
  - Used in I5 to Deploy Instances Across Nodes
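The type/class/instance chain above can be pictured as three small record types. This is only an illustrative sketch, not the I5 notation itself: the dataclass names and fields are assumptions, and the sample names (`Client`, `XCtrCl`) are borrowed from the component figures with the guess that `XCtrCl` is a class realizing the `Client` type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComponentType:
    """I1: generic definition of a component by its role."""
    name: str


@dataclass(frozen=True)
class ComponentClass:
    """I2: one implementation of a type."""
    name: str
    realizes: ComponentType


@dataclass(frozen=True)
class ComponentInstance:
    """I4: one copy of a class, later placed on a node in I5."""
    ident: str
    of_class: ComponentClass


client_type = ComponentType("Client")                      # defined in I1
x_client = ComponentClass("XCtrCl", realizes=client_type)  # defined in I2
instances = [ComponentInstance(f"c{i}", x_client) for i in (1, 2)]  # defined in I4
```

Each level only refers to the level above it, which mirrors the "Defined in / Used in" pairs on the slide.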

UML
• UML is a Set of Graphical Specification Languages (OMG’s Standard Design Language Since November, 1997)
• Implementation Diagrams
  - Component Diagrams:
    - Show the Physical Structure of the Code in Terms of Code Components and Their Dependencies
  - Deployment Diagrams:
    - Show the Physical Architecture of the Hardware and Software in the System
    - They Have a Type and an Instance Version

UML
• When to Use Deployment Diagrams
• “… In practice, I haven’t seen this kind of diagram used much. Most people do draw diagrams to show this kind of information but they are informal cartoons. On the whole, I don’t have a problem with that since each system has its own physical characteristics that you want to emphasize. As we wrestle more and more with distributed systems, however, I’m sure we will require more formality as we understand better which issues need to be highlighted in deployment diagrams.”
  - From “UML Distilled. Applying the Standard Object Modeling Language”, by Martin Fowler. Addison-Wesley, Object Technology Series, 7th Reprint, June 1998.

Pros and Cons of Graphical Modeling
• Advantages:
  - Clear to Show Structure
  - Excellent Communication Vehicle
  - Addresses Different Aspects of Modeling in an Integrated Fashion
• Disadvantages:
  - Shows Little (or No) Details
  - There is a Big Gap Between Specification and Implementation
  - Limited by Screen Size & Printable Page
• Solution: Associate a Complete Textual Specification to Graphical Model that Contains the Necessary Details for Each Element

Design Concepts
• Interface: Interaction With the Outer World; Signature + Requested Services
• Type: Abstract Entity - Interface + Semantics
• Subtype: Inherits the Supertype Definition
• Class: Implementation of a Type
• Realization: Relation Between a Type and a Class That Implements It
• Subclass: Inherits the Superclass Implementation
• Instance: Element of a Class

The I5 Framework
• An Integrated Specification Framework for Distributed Systems
  - Support for the Architectural Specification of OO and Component Based Distributed Systems
  - Heterogeneous Network - Platforms
• A Five Level Framework for Defining Software and Hardware (Platforms) With a Uniform Notation and With Different Levels of Abstraction
• Specified Textually in Z or Graphically in UML
  - Emphasis on Implementation Diagrams
• Please See http://www.engr.uconn.edu/~cecilia

Dependencies Between Levels
[Figure: artifacts produced at each level and the dependencies between them.]
  INTERFACE: Component Types, Node Types
  IMPLEMENTATION: Component Classes, Node Classes
  INTEGRATION: Implementation Dependencies
  INSTANTIATION: Inst. Components, Inst. Nodes, System Instantiation, Installation Req. (together, separated)
  INSTALLATION: Installation Req. (fix location), Complete Installation

Interface - Software: I1S
- Component Types
  - Type
  - Supertypes
  - Associated Interfaces
  - Calls
- Properties
  - Types are Unique
  - Supertypes Must Be Part of I1S
  - Calls Must Be Satisfied in I1S

Interface - Software: I1S
[Diagram: component types Client, FrontEnd, and Replica, with interfaces response, request, receive, and gossip connected by <<call>> relations.]

Interface - Hardware: I1H
- Node Types
- Connector Types
- Connections
- Properties
  - All Node Types Must Be Connected
  - Only Node and Connector Types Defined Take Part in the Connections
[Diagram: node types SUN and Intel Pentium; connector types MPI and Sockets.]

Implementation - Software: I2S
- Component Classes
  - Component Type
  - Class
  - Superclasses
  - Calls to Classes Interfaces
- Properties:
  - Only Types in I1S are Allowed
  - Superclasses Are Realizations of the Supertypes
  - Calls & Inheritance are Satisfied Within I2S

Implementation - Software: I2S
[Diagram: component classes PCCtrCl, XCtrCl, XFrontEnd, and Counter, with interfaces response, request, receive, and gossip connected by <<call>> relations.]

Implementation - Hardware: I2H
- Node Classes
  - Node Type
  - Class
- Connector Classes
  - Type
  - Class
- Connections Between Node Classes
- Properties
  - Node and Connector Classes Refine the Types in I1H
  - Connections are With Connector Classes That Refine Connector Types in I1H

Implementation - Hardware: I2H
[Diagram: connector types MPI and Sockets and node types SUN and Intel Pentium, with the classes MPI_Impl, CSockets, SUN OS 4.1.4, and Win95 linked to them by <<realizes>>.]

Software and Hardware Integration: I3
- Relation <<supports>>
  - Instances of the Component Class May Run on Instances of the Node Class
  - Important Step Since it Constrains Deployment Options
- Properties
  - Only Node and Component Classes Defined in I2 Can Participate in the <<supports>> Relation

Software and Hardware Integration: I3
[Diagram: component classes XCtrCl, PCCtrCl, XFrontEnd, and Counter (interfaces response, request, receive, gossip) linked by <<supports>> to the hardware classes MPI_Impl, CSockets, Win95, and SUN OS 4.1.4.]

Instantiation - Software: I4S
- Component Instances
  - Class
  - Identification
  - Calls
- Properties
  - Instance Calls Refine Class Calls
  - Only Classes in I2S May Be Instantiated

Instantiation - Software: I4S
[Diagram: component instances c1: PCCtrCl, c2: PCCtrCl, c3: PCCtrCl, c4: XCtrCl, fe1: XFrontEnd, fe2: XFrontEnd, and ct1 to ct6: Counter, connected through request, response, receive, and gossip calls.]

Instantiation - Hardware: I4H
- Node Instances
  - Class
  - Identification
- Connector Instances
  - Class
  - Identification
  - Set of Connected Nodes
- Properties
  - There are Only Instances of the Node & Connector Classes Defined in I2H
  - Connectors Refine I2H Connections

Instantiation - Hardware: I4H
[Diagram: node instances pc1 to pc4: Win95 and sun1 to sun10: SunOS 4.1.4; connector instances sock1 to sock4 and mpi1.]

Installation Requirements
- A Set of Component Instances Must Be Deployed Together or Separated
- Fix the Location of Some Component Instances
- All Installation Requirements Must Be Consistent With the Requirements Imposed by All the Previous Specification Levels
- Requirements
  - Together
  - Separated
  - Fix

Installation - Requirements: Ifix, Iseparated
[Diagram: fix requirements involving the instances fe1: XFrontEnd and fe2: XFrontEnd (interfaces request, receive) and the nodes sun2: SunOS 4.1.4 and sun3: SunOS 4.1.4.]
separated = {ct1: Counter, ct2: Counter, ct3: Counter, ct4: Counter, ct5: Counter, ct6: Counter}

Mapping Applications to Hardware
- Applications (Left) and Hardware (Right) Instances
- Restrictions on
  - Which Applications can be Deployed on Which Hardware?
  - Which Applications Deployed Together?
  - Which Applications Must be Separate?
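
The restrictions listed here (supports, together, separated, fix) can be checked mechanically against a candidate deployment. The following is a minimal sketch, not the I5 tooling itself: the function name violations, the dictionary layout, and the sample supports and fix data are assumptions made for illustration, while the instance names are borrowed from the earlier slides.

```python
def violations(placement, comp_class, node_class, supports,
               together=(), separated=(), fix=None):
    """Return the installation requirements that a placement breaks.

    placement maps component instance -> node instance.
    supports is a set of (component class, node class) pairs.
    """
    problems = []
    for comp, node in placement.items():
        if (comp_class[comp], node_class[node]) not in supports:
            problems.append(f"{comp} is not supported on {node}")
    for group in together:
        # All members must share one node.
        if len({placement[c] for c in group}) > 1:
            problems.append(f"together violated: {sorted(group)}")
    for group in separated:
        # No two members may share a node.
        nodes = [placement[c] for c in group]
        if len(set(nodes)) < len(nodes):
            problems.append(f"separated violated: {sorted(group)}")
    for comp, node in (fix or {}).items():
        if placement[comp] != node:
            problems.append(f"fix violated: {comp} must be on {node}")
    return problems


# Hypothetical sample data.
comp_class = {"fe1": "XFrontEnd", "fe2": "XFrontEnd",
              "ct1": "Counter", "ct2": "Counter"}
node_class = {"sun2": "SunOS", "sun3": "SunOS"}
supports = {("XFrontEnd", "SunOS"), ("Counter", "SunOS")}
fix = {"fe1": "sun2", "fe2": "sun3"}
separated = [{"ct1", "ct2"}]

good = {"fe1": "sun2", "fe2": "sun3", "ct1": "sun2", "ct2": "sun3"}
bad = {"fe1": "sun2", "fe2": "sun3", "ct1": "sun2", "ct2": "sun2"}

good_result = violations(good, comp_class, node_class, supports,
                         separated=separated, fix=fix)
bad_result = violations(bad, comp_class, node_class, supports,
                        separated=separated, fix=fix)
```

A search for a "best" deployment could enumerate candidate placements and keep those for which the returned list is empty.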

Objective: "Best" Optimal Deployment

Using I5 for BMI
- Focus at Architectural Level
  - Multiple Assets to Bring Together
    - Hospital EMRs, Provider EMRs, Other Systems
  - Multiple and Disparate Hardware
  - Different Contexts and Needs
    - Clinical Practice: (Near) Real-Time Integration/Access
    - Clinical Research: De-Identified Integrated Repository
- Performance will be Key Issue
  - Clinical Practice: Time of Access
  - Clinical Research: Volume of Information
    - Some Genomic Data Requires Terabytes of Data!
    - Information Overload Possible

The Next Big Challenge
- Macro-Architectures
  - System of Systems
  - Application of Applications
- Involves Two Key Issues
  - Interoperability
    - Heterogeneous Distributed Databases
    - Heterogeneous Distributed Systems
    - Autonomous Applications
  - Scalability
    - Rapid and Continuous Growth
    - Amount of Data
    - Variety of Data Types
    - Different Privacy Levels or Ownerships of Data

Interoperability: A Classic View
[Diagram with two panels. Simple Federation: Local Schemas combined by Federated Integration into an FDB Global Schema. Multiple Nested Federation: Local Schemas combined by Federated Integration into federations (labels FDB 1, FDB 3, 4) that are themselves federated.]

What is CORBA?
- Differs from Typical Programming Languages
- Objects can …
  - be Located Throughout the Network
  - Interoperate with Objects on Other Platforms
  - be Written in Any PL for which there is a Mapping from IDL to that Language

What is CORBA?
- Allow Interactions from Client to Server
- CORBA Installed on All Participating Machines

CORBA-Based Development
[Diagram: an IDL file is passed through the IDL Compiler to produce a Stub for the Client Application and a Skeleton for the Object Implementation; each side communicates through ORB/IIOP.]

Database Interoperability in the Internet
- Technology
  - Web/HTTP, JDBC/ODBC, CORBA (ORBs + IIOP), XML
- Architecture
  - Information Broker
    - Mediator-Based Systems
    - Agent-Based Systems

ORB Integration: Java Client + Legacy Application
[Diagram: Java Client and Legacy Application (with Java Wrapper) connected through the Object Request Broker (ORB).]
- CORBA is the Medium of Info. Exchange
- Requires Java/CORBA Capabilities

Java Client with Wrapper to Legacy Application
[Diagram: Java Client containing Java Application Code and a WRAPPER. JAVA LAYER: Mapping Classes. NATIVE LAYER: Native Functions (C++), RPC Client Stubs (C). The wrapper reaches the Legacy Application over the Network.]
- Interactions Between Java Client and Legacy Appl. via C and RPC
- C is the Medium of Info. Exchange
- Java Client with C++/C Wrapper

COTS and Legacy Appls. to Java Clients
[Diagram: COTS Application and Legacy Application, each with a wrapper. NATIVE LAYER: Native Functions that Map to COTS Appl, Native Functions that Map to Legacy Appl. JAVA LAYER: Java Application Code, Mapping Classes. JAVA NETWORK WRAPPER connects over the Network to the Java Client.]
- Java is Medium of Info. Exchange: C/C++ Appls with Java Wrappers

Java Client to Legacy App via RDBS
[Diagram: the Legacy Application performs Extract and Generate Data, then Transform and Store Data into a Relational Database System (RDS); the Java Client reads Transformed Legacy Data from the RDS and writes Updated Data back.]

JDBC
- JDBC API Provides DB Access Protocols for Open, Query, Close, etc.
- Different Drivers for Different DB Platforms
[Diagram: Java Application, JDBC API, Driver Manager, and one Driver each for Oracle, Access, and Sybase.]

Connecting a DB to the Web
- Web Servers are Stateless
- DB Interactions Tend to be Stateful
- Invoking a CGI Script on Each DB Interaction is Very Expensive, Mainly Due to the Cost of DB Open
[Diagram: Browser, Internet, Web Server, CGI Script Invocation or JDBC Invocation, DBMS.]

Connecting More Efficiently
- To Avoid Cost of Opening Database, One can Use Helper Processes that Always Keep Database Open and Outlive Web Connection
- Newly Invoked CGI Scripts Connect to a Preexisting Helper Process
- System is Still Stateless
[Diagram: Browser, Internet, Web Server, CGI Script or JDBC Invocation, Helper Processes, DBMS.]

DB-Internet Architecture
[Diagram: WWW Clients (Netscape, InfoExplore, HotJava) connect over the Internet to an HTTP Server, which reaches the database through DBWeb Gateways and a DBWeb Dispatcher.]

Biomedical Architectures
- Transcend Normal Two, Three, and Four Tier Solutions: Macro-Architecture
- An Architecture of Architectures!
  - Need to Integrate Systems that are Themselves Multi-Tier and Distributed
  - Need to Resolve Data Ownership Issues
    - State of Connecticut Agencies Don't Share
    - Competing Hospitals Seek to Protect Market Share
  - T1, T2, and Clinical Research Requires
    - Interoperating Genomic Databases/Supercomputers
    - Integration of De-identified Patient Data from Multiple Sources to Allow Sufficient Study Samples
    - De-identified Data Repositories or Data Marts
  - Dealing with Ownership Issues (DNA Research)

Consider Team Project Architecture
[Diagram: Patients (PHR) and Providers (EMR) use a Web-Based Portal (XML + HL7) backed by an Open Source DB (XML or MySQL); also shown are a Feedback Repository, Clinical Researchers, and Education Materials.]

Internet and the Web
- A Major Opportunity for Business
  - A Global Marketplace
    - Business Across State and Country Boundaries
  - A Way of Extending Services
    - Online Payment vs. VISA, Mastercard
  - A Medium for Creation of New Services
    - Publishers, Travel Agents, Teller, Virtual Yellow Pages, Online Auctions …
- A Boon for Academia
  - Research Interactions and Collaborations
  - Free Software for Classroom/Research Usage
  - Opportunities for Exploration of Technologies in Student Projects
- What are Implications for BMI? Where is the Advantage?

WWW: Three Market Segments
[Diagram: servers on corporate and provider networks, grouped into three segments.]
- Intranet (Corporate Network)
  - Decision Support
  - Mfg. System Monitoring
  - Corporate Repositories
  - Workgroups
- Business to Business
  - Information Sharing
  - Ordering Info./Status
  - Targeted Electronic Commerce
- Internet (Provider Network, Exposure to Outside)
  - Sales
  - Marketing
  - Information Services

Information Delivery Problems on the Net
- Everyone can Publish Information on the Web Independently at Any Time
  - Consequently, there is an Information Explosion
  - Identifying Information Content More Difficult
- There are too Many Search Engines but too Few Capable of Returning High Quality Data
- Most Search Engines are Useful for Ad-hoc Searches but Awkward for Tracking Changes
- What are Information Delivery Issues for BMI?
  - Publishing of Patient Education Materials
  - Publishing of Provider Education Materials
  - How Can Patients/Providers Find What They Need?
  - How do they Know if it's Relevant? Reputable?

Example Web Applications
- Scenario 1: World Wide Wait
  - A Major Event is Underway and the Latest, Up-to-the-Minute Results are Being Posted on the Web
  - You Want to Monitor the Results for this Important Event, so you Fire up your Trusty Web Browser, Pointing at the Result Posting Site, and Wait, and Wait …
- What is the Problem?
  - The Scalability Problems are the Result of a Mismatch Between the Data Access Characteristics of the Application and the Technology Used to Implement the Application
- May not be Relevant to BMI: Hard to Apply Scenario

Example Web Applications
- Scenario 2:
  - Many Applications Today have the Need for Tracking Changes in Local and Remote Data Sources and Notifying of Changes If Some Condition Over the Data Source(s) is Met
  - To Monitor Changes on Web, You Need to Fire Up Your Trusty Web Browser from Time to Time, Cache the Most Recent Result, and Difference Manually Each Time You Poll the Data Source(s)
- Issue: Pure Pull is Not the Answer to All Problems
- BMI: If a Patient Enters Data that Sets off a Chain Reaction, how Can Provider be Notified and in Turn the Provider Notify the Patient (Bad Health Event)
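
The poll, cache, and difference cycle of Scenario 2 can be sketched in a few lines. This is an illustrative sketch only: the function names diff_snapshots and poll_once, the dictionary-shaped snapshots, and the glucose threshold are assumptions, not part of the course material.

```python
def diff_snapshots(previous, current):
    """Return entries that are new or whose value changed since the last poll."""
    return {k: v for k, v in current.items()
            if k not in previous or previous[k] != v}


def poll_once(fetch, cache, condition, notify):
    """One pull cycle: fetch, difference against the cache, notify on matches."""
    current = fetch()
    for key, value in diff_snapshots(cache, current).items():
        if condition(key, value):
            notify(key, value)
    return current  # becomes the cache for the next poll


# Hypothetical successive states of a patient-entered data source.
readings = [
    {"glucose": 110},
    {"glucose": 110, "bp": 120},
    {"glucose": 190, "bp": 120},
]

alerts = []
cache = {}
for snapshot in readings:
    cache = poll_once(
        fetch=lambda s=snapshot: s,
        cache=cache,
        condition=lambda k, v: k == "glucose" and v > 180,
        notify=lambda k, v: alerts.append((k, v)),
    )
```

The client learns about the high reading only when it next polls, which is the weakness of pure pull that the slide points out.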

What is the Problem?
- Applications are Asymmetric but the Web is Not
  - Computation Centric vs. Information Flow Centric
- Types of Asymmetry
  - Network Asymmetry
    - Satellite, CATV, Mobile Clients, Etc.
  - Client to Server Ratio
    - Too Many Clients can Swamp Servers
  - Data Volume
    - Mouse and Key Click vs. Content Delivery
  - Update and Information Creation
    - Clients Need to be Informed or Must Poll
- Clearly, for BMI, Simple Web Environment/Browser is Not Sufficient: No Auto-Notification

What are Information Delivery Styles?
- Pull-Based System
  - Transfer of Data from Server to Client is Initiated by a Client Pull
  - Clients Determine when to Get Information
  - Potential for Information to be Old Unless Client Periodically Pulls
- Push-Based System
  - Transfer of Data from Server to Client is Initiated by a Server Push
  - Clients may get Overloaded if Push is Too Frequent
- Hybrid
  - Pull and Push Combined
  - Pull First and then Push Continually

Publish/Subscribe
- Semantics: Servers Publish/Clients Subscribe
  - Servers Publish Information Online
  - Clients Subscribe to the Information of Interest (Subscription-based Information Delivery)
  - Data Flow is Initiated by the Data Sources (Servers) and is Aperiodic
  - Danger: Subscriptions can Lead to Other Unwanted Subscriptions
- Applications
  - Unicast: Database Triggers and Active Databases
  - 1-to-n: Online News Groups
- May Work for Clinical Researcher to Provider Push
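
A minimal in-process sketch of publish/subscribe semantics follows. The Broker class, the topic names, and the message text are hypothetical; a real deployment would place the broker behind a network protocol.

```python
from collections import defaultdict


class Broker:
    """Routes published messages to the callbacks subscribed to a topic."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Data flow is initiated by the source, not by the client.
        for callback in self._subscribers[topic]:
            callback(message)
        return len(self._subscribers[topic])


broker = Broker()
provider_inbox = []
broker.subscribe("study-results", provider_inbox.append)

delivered = broker.publish("study-results", "New guideline available")
ignored = broker.publish("unrelated-topic", "No one is listening")
```

The subscriber never polls: the message arrives in provider_inbox as a side effect of the publish call, which is the push style that a clinical researcher to provider channel would need.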

Design Options for Nodes
- Three Types of Nodes:
  - Data Sources
    - Provide Base Data which is to be Disseminated
  - Clients
    - Who are the Net Consumers of the Information
  - Information Brokers
    - Acquire Information from Other Data Sources, Add Value to that Information and then Distribute this Information to Other Consumers
    - By Creating a Hierarchy of Brokers, Information Delivery can be Tailored to the Need of Many Users
- Brokers may be Ideal Intermediaries for BMI!
  - Act on Behalf of Patients, Providers
  - Incorporate Secure Access

Research Challenges
- Ubiquitous/Pervasive: Many Computers and Information Appliances Everywhere, Networked Together
- Inherent Complexity:
  - Coping with Latency (Sometimes Unpredictable)
  - Failure Detection and Recovery (Partial Failure)
  - Concurrency, Load Balancing, Availability, Scale
  - Service Partitioning
  - Ordering of Distributed Events
- "Accidental" Complexity:
  - Heterogeneity: Beyond the Local Case: Platform, Protocol, Plus All Local Heterogeneity in Spades
  - Autonomy: Change and Evolve Autonomously
  - Tool Deficiencies: Language Support (Sockets, RPC), Debugging, Etc.

Infosphere
Problem: too many sources, too much information
[Diagram: the Internet as an Information Jungle is turned into Clean, Reliable, Timely Information, Anywhere. Labels: Infopipes, Sensors, Digital Earth, Personalized Filtering & Info. Delivery, Microfeedback, Continual Queries, Information Quality, Adaptation, Resource Mgmt, Property, specialization.]

Current State-of-Art
[Diagram: Thin Client, Web Server, Mainframe, Database Server.]

Infosphere Scenario for BMI
[Diagram: Infotaps & Fat Clients, Sensors, Variety of Servers, Many Sources, Database Server.]

Heterogeneity and Autonomy
- Heterogeneity:
  - How Much can we Really Integrate?
  - Syntactic Integration
    - Different Formats and Models
    - Web/SQL Query Languages
  - Semantic Interoperability
    - Basic Research on Ontology, Etc.
- Autonomy
  - No Central DBA on the Net
  - Independent Evolution of Schema and Content
  - Interoperation is Voluntary
  - Interface Technology (Support for ISVs)
    - DCOM: Microsoft Standard
    - CORBA, Etc.

Security and Data Quality
- Security
  - System Security in the Broad Sense
  - Attacks: Penetrations, Denial of Service
  - System (and Information) Survivability
    - Security Fault Tolerance
    - Replication for Performance, Availability, and Survivability
- Data Quality
  - Web Data Quality Problems
    - Local Updates with Global Effects
    - Unchecked Redundancy (Mutual Copying)
    - Registration of Unchecked Information
    - Spam on the Rise

Legacy Data Challenge
- Legacy Applications and Data
  - Definition: Important and Difficult to Replace
  - Typically, Mainframe Mission Critical Code
  - Most are OLTP and Database Applications
- Evolution of Legacy Databases
  - Client-server Architectures
  - Wrappers
  - Expensive and Gradual in Any Case

Potential Value Added/Jumping on Bandwagon
- Sophisticated Query Capability
  - Combining SQL with Keyword Queries
- Consistent Updates
  - Atomic Transactions and Beyond
- But Everything has to be in a Database!
  - Only If we Stick with Classic DB Assumptions
- Relaxing DB Assumptions
  - Interoperable Query Processing
  - Extended Transaction Updates
- Commodities DB Software
  - A Little Help is Still Good If it is Cheap
  - Internet Facilitates Software Distribution
  - Databases as Middleware

Data Warehousing and Data Mining
- Data Warehousing
  - Provide Access to Data for Complex Analysis, Knowledge Discovery, and Decision Making
  - Underlying Infrastructure in Support of Mining
  - Provides Means to Interact with Multiple DBs
  - OLAP (On-Line Analytical Processing) vs. OLTP
- Data Mining
  - Discovery of Information in Vast Data Sets
  - Search for Patterns and Common Features
  - Discover Information not Previously Known
    - Medical Records Accessible Nationwide
    - Research/Discover Cures for Rare Diseases
  - Relies on Knowledge Discovery in DBs (KDD)

Data Warehousing and OLAP
- A Data Warehouse
  - Database that is Maintained Separately from an Operational Database
  - "A Subject-Oriented, Integrated, Time-Variant, and Non-Volatile Collection of Data in Support of Management's Decision Making Process" [W. H. Inmon]
- OLAP (On-Line Analytical Processing)
  - Analysis of Complex Data in the Warehouse
  - Attempt to Attain "Value" through Analysis
  - Relies on Trained and Adept Skilled Knowledge Workers who Discover Information
- Data Mart
  - Organized Data for a Subset of an Organization
  - Establish De-Identified Marts for BMI Research

Building a Data Warehouse
- Option 1
  - Leverage Existing Repositories
  - Collate and Collect
  - May Not Capture All Relevant Data
- Option 2
  - Start from Scratch
  - Utilize Underlying Corporate Data
[Diagram: Corporate Data Warehouse. Option 1: Consolidate Data Marts (Data Mart ... Data Mart). Option 2: Build from Scratch (Corporate Data).]

BMI: Partition/Excerpt Data Warehouse
- Clinical and Epidemiological Research (and for T2 and T1)
- Each Study Submitted to Institutional Review Board (IRB)
  - For Human Subjects (Assess Risks, Protect Privacy)
  - See: http://resadm.uchc.edu/hspo/irb/
- To Satisfy IRB (and Privacy, Security, etc.), Reverse Process to Create a Data Mart for each Approved Study
  - Export/Excerpt Study Data from Warehouse
  - May be Single or Multiple Sources
[Diagram: BMI Data Warehouse feeding Data Mart ... Data Mart.]

Data Warehouse Characteristics
- Utilizes a "Multi-Dimensional" Data Model
- Warehouse Comprised of
  - Store of Integrated Data from Multiple Sources
  - Processed into Multi-Dimensional Model
- Warehouse Supports
  - Time Series and Trend Analysis
  - "Super-Excel" Integrated with DB Technologies
- Data is Less Volatile than Regular DB
  - Doesn't Dramatically Change Over Time
  - Updates at Regular Intervals
  - Specific Refresh Policy Regarding Some Data

Three Tier Architecture
[Diagram: External Data Sources and Operational Databases pass through Extract, Transform, Load, Refresh (with monitor, integrator, and metadata) into the Data Warehouse and Data Marts; an OLAP Server serves Summarization Report, Query Report, and Data Mining.]

Data Warehouse Design
- Most Data Warehouses use a Star Schema to Represent the Multi-Dimensional Data Model
- Each Dimension is Represented by a Dimension Table that Provides its Multidimensional Coordinates and Stores Measures for those Coordinates
- A Fact Table Connects All Dimension Tables with a Multiple Join
  - Each Tuple in Fact Table Represents the Content of One Dimension
  - Each Tuple in the Fact Table Consists of a Pointer to Each of the Dimensional Tables
  - Links Between the Fact Table and the Dimensional Tables Form a Shape Like a Star

What is a Multi-Dimensional Data Cube?
- Representation of Information in Two or More Dimensions
- Typical Two-Dimensional: Spreadsheet
- In Practice, to Track Trends or Conduct Analysis, Three or More Dimensions are Useful
- For BMI: Axes for Diagnosis, Drug, Subject Age

Multi-Dimensional Schemas
- Supporting Multi-Dimensional Schemas Requires Two Types of Tables:
  - Dimension Table: Tuples of Attributes for Each Dimension
  - Fact Table: Measured/Observed Variables with Pointers into Dimension Tables
- Star Schema
  - Characterizes Data Cubes by having a Fact Table with a Single Table for Each Dimension
- Snowflake Schema
  - Dimension Tables from Star Schema are Organized into Hierarchy via Normalization
- Both Represent Storage Structures for Cubes

Example of Star Schema
- Sale Fact Table: Date, Product, Store, Customer, Unit_Sales, Dollar_Sales
- Date: Date, Month, Year
- Product: ProductNo, ProdName, ProdDesc, Category
- Store: StoreID, City, State, Country, Region
- Customer: CustID, CustName, CustCity, CustCountry
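
A star schema of this shape can be tried out with an in-memory database. The following is a sketch under stated assumptions: it uses Python's standard sqlite3 module, keeps only the Product and Store dimensions with a reduced set of columns, and all rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one per dimension.
CREATE TABLE Product (ProductNo INTEGER PRIMARY KEY, ProdName TEXT, Category TEXT);
CREATE TABLE Store   (StoreID   INTEGER PRIMARY KEY, City TEXT, State TEXT);

-- Fact table: measures plus a pointer into each dimension table.
CREATE TABLE Sale (
    ProductNo    INTEGER REFERENCES Product(ProductNo),
    StoreID      INTEGER REFERENCES Store(StoreID),
    Unit_Sales   INTEGER,
    Dollar_Sales REAL
);

INSERT INTO Product VALUES (1, 'Beer', 'Drinks'), (2, 'Nuts', 'Snacks');
INSERT INTO Store   VALUES (10, 'Atlanta', 'GA'), (20, 'Savannah', 'GA'),
                           (30, 'Phoenix', 'AZ');
INSERT INTO Sale    VALUES (1, 10, 3, 30.0), (2, 10, 1, 5.0),
                           (1, 20, 2, 20.0), (1, 30, 4, 40.0);
""")

# Join the fact table to its dimensions and aggregate a measure.
rows = conn.execute("""
    SELECT s.State, p.ProdName, SUM(f.Dollar_Sales)
    FROM Sale AS f
    JOIN Store   AS s ON s.StoreID   = f.StoreID
    JOIN Product AS p ON p.ProductNo = f.ProductNo
    GROUP BY s.State, p.ProdName
    ORDER BY s.State, p.ProdName
""").fetchall()
conn.close()
```

Every analytical query follows the same pattern: start at the fact table, join outward along the points of the star, and group by the dimension attributes of interest.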

Example of Star Schema for BMI
- Patient Fact Table: Visit Date, Vitals, Symptoms, Patient, Medications, Etc.
- Date: Date, Month, Year
- Vitals: BP, Temp, Resp, HR (Pulse)
- Symptoms: Pulmonary, Heart, Mus-Skel, Skin, Digestive
- Patient: PatientID, PatientName, PatientCity, PatientCountry
- Medications: Reference another Star Schema for all Meds

A Second Example of Star Schema …

and Corresponding Snowflake Schema

Data Warehouse Issues
- Data Acquisition
  - Extraction from Heterogeneous Sources
  - Reformatted into Warehouse Context: Names, Meanings, Data Domains Must be Consistent
  - Data Cleaning for Validity and Quality: is the Data as Expected w.r.t. Content? Value?
  - Transition of Data into Data Model of Warehouse
  - Loading of Data into the Warehouse
- Other Issues Include:
  - How Current is the Data? Frequency of Update?
  - Availability of Warehouse? Dependencies of Data?
  - Distribution, Replication, and Partitioning Needs?
  - Loading Time (Clean, Format, Copy, Transmit, Index Creation, etc.)?
  - For CTSA: Data Ownership (Competing Hosps).

Knowledge Discovery
- Data Warehousing Requires Knowledge Discovery to Organize/Extract Information Meaningfully
- Knowledge Discovery
  - Technology to Extract Interesting Knowledge (Rules, Patterns, Regularities, Constraints) from a Vast Data Set
  - Process of Non-trivial Extraction of Implicit, Previously Unknown, and Potentially Useful Information from Large Collection of Data
- Data Mining
  - A Critical Step in the Knowledge Discovery Process
  - Extracts Implicit Information from Large Data Set

Steps in a KDD Process
- Learning the Application Domain (Goals)
- Gathering and Integrating Data
- Data Cleaning
- Data Integration
- Data Transformation/Consolidation
- Data Mining
  - Choosing the Mining Method(s) and Algorithm(s)
  - Mining: Search for Patterns or Rules of Interest
- Analysis and Evaluation of the Mining Results
- Use of Discovered Knowledge in Decision Making
- Important Caveats
  - This is Not an Automated Process!
  - Requires Significant Human Interaction!

OLAP Strategies
- OLAP Strategies
  - Roll-Up: Summarization of Data
  - Drill-Down: from the General to Specific (Details)
  - Pivot: Cross Tabulate the Data Cubes
  - Slice and Dice: Projection Operations Across Dimensions
  - Sorting: Ordering Result Sets
  - Selection: Access by Value or Value Range
- Implementation Issues
  - Persistent with Infrequent Updates (Loading)
  - Optimization for Performance on Queries is More Complex: Across Multi-Dimensional Cubes
  - Recovery Less Critical: Mostly Read Only
  - Temporal Aspects of Data (Versions) Important
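
The OLAP operations named on this slide reduce to grouping and filtering over fact records. The sketch below assumes a tiny, made-up set of sales facts held in plain Python structures; the helper names aggregate and select are hypothetical and do not come from any OLAP product.

```python
from collections import defaultdict

# Hypothetical fact records: (city, state, product, month, sales).
FACTS = [
    ("Atlanta",  "GA", "Electronics", "Jan", 100),
    ("Atlanta",  "GA", "Electronics", "Feb", 120),
    ("Atlanta",  "GA", "Clothing",    "Jan", 80),
    ("Savannah", "GA", "Electronics", "Jan", 60),
    ("Phoenix",  "AZ", "Electronics", "Jan", 90),
]
FIELDS = ("city", "state", "product", "month", "sales")
RECORDS = [dict(zip(FIELDS, row)) for row in FACTS]


def aggregate(records, *dims):
    """Sum sales grouped by the given dimensions."""
    totals = defaultdict(int)
    for r in records:
        totals[tuple(r[d] for d in dims)] += r["sales"]
    return dict(totals)


def select(records, **fixed):
    """Keep the records whose dimensions equal the given values."""
    return [r for r in records if all(r[k] == v for k, v in fixed.items())]


by_city = aggregate(RECORDS, "city")            # detailed view
by_state = aggregate(RECORDS, "state")          # roll-up: cities to states
atlanta = select(RECORDS, city="Atlanta")       # slice: fix one dimension
atl_elec = select(RECORDS, city="Atlanta",
                  product="Electronics")        # dice: fix several dimensions
pivot = aggregate(RECORDS, "product", "month")  # pivot: cross-tabulation
```

Drill-down is the reverse of the roll-up shown here: moving from by_state back to by_city, or from a quarter to its months.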

On-Line Analytical Processing
- Data Cube
  - A Multidimensional Array
  - Each Attribute is a Dimension
- In Example Below, the Data Must be Interpreted so that it Can be Aggregated by Region/Product/Date

Product      Store      Date     Sale
acron        Rolla, MO  7/3/99
budwiser     LA, CA     5/22/99  833.92
large pants  NY, NY     2/12/99  771.24
3' diaper    Cuba, MO   7/30/99

Sale values 325.24 and 81.99 also appear on the slide.

[Diagram: data cube with axes Product (Pants, Diapers, Beer, Nuts), Region (West, East, Central, Mountain, South), and Date (Jan, Feb, March, April).]

On-Line Analytical Processing
- For BMI: Imagine a Data Table with Patient Data
  - Define Axis
  - Summarize Data
  - Create Perspective to Match Research Goal
  - Essentially De-identified Data Mart

Patient  Med      BirthDat  Dosage
Steve    Lipitor  1/1/45
John     Zocor    2/2/55    80 mg
Harry    Crestor  3/3/65    5 mg
Lois     Lipitor  4/4/66    20 mg
Charles  Crestor  7/1/59    10 mg

A further dosage value, 10 mg, also appears on the slide.

[Diagram: data cube with axes Medication (Lescol, Crestor, Zocor, Lipitor), Dosage (5, 10, 20, 40, 80), and Decade (1940s, 1950s, 1960s, 1970s).]
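
Summarizing the patient table along the Medication and Decade axes can be sketched as below. The rows are taken from the slide; the helper name decade and the assumption that two-digit birth years fall in the 1900s are mine. Dosage is left out because one dosage value is not recoverable from the slide.

```python
from collections import Counter

# (patient, medication, birth date as m/d/yy) rows from the slide.
ROWS = [
    ("Steve",   "Lipitor", "1/1/45"),
    ("John",    "Zocor",   "2/2/55"),
    ("Harry",   "Crestor", "3/3/65"),
    ("Lois",    "Lipitor", "4/4/66"),
    ("Charles", "Crestor", "7/1/59"),
]


def decade(birth_date):
    """Map 'm/d/yy' to a decade label, assuming the year is 19yy."""
    year = 1900 + int(birth_date.split("/")[-1])
    return f"{year // 10 * 10}s"


# Dropping the patient name while counting yields a de-identified summary.
cube = Counter((med, decade(born)) for _name, med, born in ROWS)
```

Each key of cube is one cell of the Medication by Decade face of the data cube, and the value is the number of patients in that cell.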

Examples of Data Mining
- The Slicing Action
  - A Vertical or Horizontal Slice Across Entire Cube
[Diagram: Multi-Dimensional Data Cube of Sales with axes Products, Months, and Cities; Slice on City Atlanta yields a Products by Months table of Sales.]

Examples of Data Mining
- The Dicing Action
  - A Slice First Identifies One Dimension
  - A Selection of Any Cube within the Slice which Essentially Constrains All Three Dimensions
[Diagram: cube of Sales with axes Products, Months, and Cities; the Atlanta slice; Dice on Electronics and Atlanta yields the cell Electronics, March 2000, Atlanta.]

Examples of Data Mining
- Roll Up: Combines Multiple Dimensions, From Individual Cities to State
- Drill Down: Takes a Facet (e.g., Q1) and Decomposes into Finer Detail
[Diagram: cube of Products Sales by quarter (Q1 to Q4) and Location (city, GA): Columbus, Atlanta, Gainesville, Savannah. Roll Up on Location (State, USA) gives California, Arizona, Georgia, Iowa. Drill Down on Q1 gives Jan, Feb, March.]

Mining Other Types of Data
- Analysis and Access Dramatically More Complicated!
- Time Series Data for Glucose, BP, Peak Flow, etc.
[Diagram labels: Spatial Databases, Multimedia Databases, World Wide Web, Time Series Data, Geographical and Satellite Data.]

Advantages/Objectives of Data Mining
- Descriptive Mining
  - Discover and Describe General Properties
  - 60% of People who Buy Beer on Friday also have Bought Nuts or Chips in the Past Three Months
- Predictive Mining
  - Infer Interesting Properties based on Available Data
  - People who Buy Beer on Friday usually also Buy Nuts or Chips
- Result of Mining
  - Order from Chaos
  - Mining Large Data Sets in Multiple Dimensions Allows Businesses, Individuals, etc. to Learn about Trends, Behavior, etc.
  - Impact on Marketing Strategy

Data Mining Methods (1)
- Association
  - Discover the Frequency of Items Occurring Together in a Transaction or an Event
  - Example
    - 80% of Customers who Buy Milk also Buy Bread; Hence, Bread and Milk are Adjacent in the Supermarket
    - 50% of Customers Forget to Buy Milk/Soda/Drinks; Hence, these are Available at the Register
- Prediction
  - Predicts Some Unknown or Missing Information based on Available Data
  - Example
    - Forecast Sale Value of Electronic Products for Next Quarter via Available Data from Past Three Quarters

Association Rules
- Motivated by Market Analysis
- Rules of the Form
  - Item1 ^ Item2 ^ … ^ Itemk → Itemk+1 ^ … ^ Itemn
- Example
  - “Beer ^ Soft Drink → Pop Corn”
- Problem: Discovering All Interesting Association Rules in a Large Database is Difficult!
  - Issues
    - Interestingness
    - Completeness
    - Efficiency
  - Basic Measurement for Association Rules
    - Support of the Rule
    - Confidence of the Rule
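Support and confidence, the two basic measurements named on the slide, can be computed directly from a list of transactions. The sketch below uses the standard definitions (support = fraction of transactions containing every item of the rule; confidence = fraction of transactions containing the left-hand side that also contain the right-hand side). The `TRANSACTIONS` baskets are hypothetical and exist only to make the arithmetic visible.

```python
# Hypothetical market baskets; each transaction is a set of items.
TRANSACTIONS = [
    {"beer", "soft drink", "popcorn"},
    {"beer", "soft drink", "popcorn", "chips"},
    {"beer", "soft drink"},
    {"milk", "bread"},
    {"beer", "popcorn"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)


def support(lhs, rhs, transactions):
    """Support of the rule lhs -> rhs: share of all transactions with lhs and rhs."""
    return count(lhs | rhs, transactions) / len(transactions)


def confidence(lhs, rhs, transactions):
    """Confidence of lhs -> rhs: share of lhs transactions that also contain rhs."""
    lhs_count = count(lhs, transactions)
    if lhs_count == 0:
        return 0.0  # rule never fires; treat confidence as zero
    return count(lhs | rhs, transactions) / lhs_count


# The slide's example rule: Beer ^ Soft Drink -> Pop Corn
rule_lhs = {"beer", "soft drink"}
rule_rhs = {"popcorn"}
rule_support = support(rule_lhs, rule_rhs, TRANSACTIONS)
rule_confidence = confidence(rule_lhs, rule_rhs, TRANSACTIONS)
```

With these baskets, two of five transactions contain all three items and three contain the left-hand side, so the rule has support 2/5 and confidence 2/3. Enumerating every candidate rule this way is what makes the "Efficiency" issue on the slide bite for large databases.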

Data Mining Methods (2)
- Classification
  - Determine the Class or Category of an Object based on its Properties
  - Example
    - Classify Companies based on the Final Sale Results in the Past Quarter
- Clustering
  - Organize a Set of Multi-dimensional Data Objects in Groups to Minimize Inter-group Similarity and Maximize Intra-group Similarity
  - Example
    - Group Crime Locations to Find Distribution Patterns

Classification
- Two Stages
  - Learning Stage: Construction of a Classification Function or Model
  - Classification Stage: Prediction of Classes of Objects Using the Function or Model
- Tools for Classification
  - Decision Tree
  - Bayesian Network
  - Neural Network
  - Regression
- Problem
  - Given a Set of Objects whose Classes are Known (Training Set), Derive a Classification Model which can Correctly Classify Future Objects

An Example
- Attributes
  - Values that Set the Condition for the Classification
- Class Attribute: Play/Don’t Play the Game
- Training Set
  - What are the Patterns Below?
  Attribute     Possible Values
  outlook       sunny, overcast, rain
  temperature   continuous
  humidity      continuous
  windy         true, false
  Outlook    Temperature  Humidity  Windy  Play
  sunny      85           85        false  No
  overcast   83           78        true   Yes
  sunny      80           90        false  No
  sunny      72           95        …      No
  sunny      72           70        …      Yes
  …          …            …         …      …
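The two stages of classification can be tried on the rows of this training set. The sketch below learns a one-level decision tree (a "stump") on a single numeric attribute by choosing the threshold with the highest information gain, then uses it to classify a new object. It is a simplified stand-in for a full decision-tree learner, and it assumes the Play labels line up with the rows in the order they appear on the slide; the Windy column is left out because the slide shows it only partially.

```python
import math
from collections import Counter

# (outlook, temperature, humidity, play); row/label alignment is an assumption.
TRAINING = [
    ("sunny", 85, 85, "No"),
    ("overcast", 83, 78, "Yes"),
    ("sunny", 80, 90, "No"),
    ("sunny", 72, 95, "No"),
    ("sunny", 72, 70, "Yes"),
]


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def best_threshold(rows, index):
    """Return (gain, threshold) for the best binary split on a numeric column."""
    values = sorted({r[index] for r in rows})
    base = entropy([r[-1] for r in rows])
    best = (0.0, None)
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # candidate split halfway between neighbours
        left = [r[-1] for r in rows if r[index] <= t]
        right = [r[-1] for r in rows if r[index] > t]
        remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(rows)
        if base - remainder > best[0]:
            best = (base - remainder, t)
    return best


def train_stump(rows, index):
    """Learning stage: pick a threshold and the majority class on each side."""
    _, t = best_threshold(rows, index)
    if t is None:
        raise ValueError("no informative split on this attribute")
    left = Counter(r[-1] for r in rows if r[index] <= t).most_common(1)[0][0]
    right = Counter(r[-1] for r in rows if r[index] > t).most_common(1)[0][0]
    return t, left, right


def classify(stump, value):
    """Classification stage: apply the learned rule to a new value."""
    t, left, right = stump
    return left if value <= t else right


HUMIDITY = 2  # column index of the humidity attribute
humidity_stump = train_stump(TRAINING, HUMIDITY)
```

On these five rows humidity separates the classes cleanly, so the learned rule is "humidity at or below 81.5 means Play", which is one answer to the slide's question about patterns in the data. Five rows are far too few to trust such a rule on future objects.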

Data Mining Methods (3)
- Summarization
  - Characterization (Summarization) of General Features of Objects in the Target Class
  - Example
    - Characterize People’s Buying Patterns on the Weekend
    - Potential Impact on “Sale Items” & “When Sales Start”
    - Department Stores with Bonus Coupons
- Discrimination
  - Comparison of General Features of Objects Between a Target Class and a Contrasting Class
  - Example
    - Comparing Students in Engineering and in Art
    - Attempt to Arrive at Commonalities/Differences

Summarization Technique
- Attribute-Oriented Induction
- Generalization using Concept Hierarchy (Taxonomy)
[Figure: item table (barcode, category, brand, content, size) with entries such as 14998 / milk / Dairyland / skim / 2 L and 12998 / mechanical / MotorCraft / valve 23a / 12 in; a concept hierarchy over food (milk: skim milk, 2% milk; bread: white bread, whole wheat; brands Lucerne, Dairyland, Wonder, Safeway); and a generalized table (category, content, count), e.g., milk / skim / 280 and milk / 2% / 98]

Why is Data Mining Popular?
- Technology Push
  - Technology for Collecting Large Quantity of Data
    - Bar Code, Scanners, Satellites, Cameras
  - Technology for Storing Large Collection of Data
    - Databases, Data Warehouses
    - Variety of Data Repositories, such as Virtual Worlds, Digital Media, World Wide Web
- Corporations want to Improve Direct Marketing and Promotions, Driving Technology Advances
  - Targeted Marketing by Age, Region, Income, etc.
  - Exploiting User Preferences/Customized Shopping
- What is the Potential for BMI?
  - How do you see Data Mining Utilized?
  - What are Key Issues to Worry About?

Requirements & Challenges in Data Mining
- Security and Social
  - What Information is Available to Mine?
  - Preferences via Store Cards/Web Purchases
  - What is Your Comfort Level with Trends?
- User Interfaces and Visualization
  - What Tools Must be Provided for End Users of Data Mining Systems?
  - How are Results for Multi-Dimensional Data Displayed?
- Performance Guarantees
  - Range from Real-Time for Some Queries to Long-Term for Other Queries
- Data Sources of Complex Data Types or Unstructured Data: Ability to Format, Clean, and Load Data Sets

An Initiative of the University of Connecticut Center for Public Health and Health Policy
Robert H. Aseltine, Jr., Ph.D.
Cal Collins
January 16, 2008

What is CHIN?
- State of Connecticut Agencies Collect and Maintain Data in Separate Databases such as:
  - Vital Statistics: Birth, Death (DPH)
  - Surveillance data: Lead Screening and Immunization Registries (DPH)
  - Administrative services: LINK system (DCF), CAMRIS (DMR)
  - Benefit programs: WIC (DPH), Medicaid (DSS)
  - Educational achievement: (PSIS)
- Such Data is Un-Integrated
  - Impossible to Track and Assess Target Populations
  - Difficult to Develop Evidence-Based Practices
  - Limits Meaningful Interactions Among State Agencies

What Do We Mean by “Integration?”
UCONN Health Center Low Birth Weight Infant Registry
  Last Name   First Name  DOB         SSN           Birth Wt. (kg)
  Appel       April       01/01/1999  016-000-9876  2.8
  Berry       John        02/02/1997  216-000-4576  2.9
  Carat       Colleen     03/03/1993  119-000-1234  1.9
  Ernst       Max         04/04/1994  116-000-3456  2.7
  Gomez       Gloria      05/05/1995  036-000-9999  2.6
  Hurst       William     06/06/1996  016-000-5599  3.1
  Keller      Helene      07/07/1997  017-000-2340  2.5
  Martinez    Pedro       08/08/1998  018-000-9886  3.0
  Rodriguez   Felix       09/09/1999  029-000-9111  2.8
  Smith       Peggy       10/10/2000  016-000-8787  2.5
Dept. of Mental Retardation Birth to Three System (town names cut off on the slide)
  Last Name   First Name  DOB         Street    Town
  Allen       Gwen        01/01/1999  Apple     Enfie…
  Buck        Jerome      07/01/1999  Burbank   West…
  Cleary      Jane        03/03/1993  Cedar     Tolla…
  Dory        Daniel      03/03/1993  Dogfish   Hartf…
  Ernst       Max         04/04/1994  Elm       Enfie…
  Friday      Joe         11/03/1999  Fruit     Wind…
  Glenn       Valerie     03/23/1998  Glen      Branf…
  Martinez    Pedro       08/08/1998  High      Hartf…
  Riley       Lily        03/03/1996  Ipswich   Bridg…
  Sanchez     Ramon       03/03/1993  Juniper   New…
CT Dept. of Education PSIS System
  Last Name   First Name  CMT Math  Polio Vac Date  Days in Attendance
  Appel       April       134       01/05/1999      179
  Carat       Colleen     256       05/01/1998      122
  Cleary      Jane        268       01/28/2000      178
  Ernst       Max         152       01/09/1999      145
  Gomez       Gloria      289       01/01/1999      168
  Friday      Joe         265       10/01/1999      170
  Keller      Helene      309       11/01/2001      180
  Martinez    Pedro       248       12/01/2003      180
  Riley       Lily        201       01/01/1999      122
  Sanchez     Ramon       249       01/01/1999      159
Integrated Result (individuals present in all three sources)
  Last Name   First Name  DOB         SSN           Birth Wt.  Street  Town      CMT Math Grade 3  Polio Vaccination Date  Days in Attendance
  Ernst       Max         04/04/1994  116-000-3456  2.7        Elm     Enfield   152               01/09/1999              145
  Martinez    Pedro       08/08/1998  018-000-9886  3.0        High    Hartford  248               12/01/2003              180
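The integration shown on this slide amounts to an inner join of the three agency tables on a shared identity. A minimal sketch follows, using a handful of rows from the slide and joining on last and first name only, since the PSIS table carries no date of birth. The field names and the choice of join key are assumptions for illustration; SSNs are deliberately left out of the sample.

```python
# A few rows copied from the slide; dictionary field names are hypothetical.
REGISTRY = [  # Low Birth Weight Infant Registry
    {"last": "Appel", "first": "April", "dob": "01/01/1999", "birth_wt_kg": 2.8},
    {"last": "Ernst", "first": "Max", "dob": "04/04/1994", "birth_wt_kg": 2.7},
    {"last": "Martinez", "first": "Pedro", "dob": "08/08/1998", "birth_wt_kg": 3.0},
]
BIRTH_TO_THREE = [  # Birth to Three System
    {"last": "Allen", "first": "Gwen", "dob": "01/01/1999", "street": "Apple"},
    {"last": "Ernst", "first": "Max", "dob": "04/04/1994", "street": "Elm", "town": "Enfield"},
    {"last": "Martinez", "first": "Pedro", "dob": "08/08/1998", "street": "High", "town": "Hartford"},
]
PSIS = [  # Dept. of Education PSIS System
    {"last": "Appel", "first": "April", "cmt_math": 134, "days_in_attendance": 179},
    {"last": "Ernst", "first": "Max", "cmt_math": 152, "days_in_attendance": 145},
    {"last": "Martinez", "first": "Pedro", "cmt_math": 248, "days_in_attendance": 180},
]


def identity(record):
    """Join key: case-insensitive (last name, first name)."""
    return (record["last"].lower(), record["first"].lower())


def integrate(*sources):
    """Inner join: keep only individuals found in every source, merging their fields."""
    indexed = [{identity(r): r for r in source} for source in sources]
    common = set(indexed[0]).intersection(*indexed[1:])
    merged = []
    for key in sorted(common):
        row = {}
        for index in indexed:
            row.update(index[key])
        merged.append(row)
    return merged


integrated = integrate(REGISTRY, BIRTH_TO_THREE, PSIS)
```

Only Ernst and Martinez survive the join, matching the slide's integrated result. Joining on exact names is fragile (misspellings, shared names), which is the gap the case-matching work later in the deck is meant to close.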

Key Challenges to Integrating Data
- Security and Privacy
  - HIPAA
  - FERPA
  - WIC, Social Security (Medicaid/Medicare) regulations
  - State statutes
- Alteration/disruption of business practices
- Unique identification of individuals/cases
- Accuracy and reliability of data
- Disparate hardware/software platforms

The Solution: CHIN
- Connecticut Health Information Network
- A Federated Network That:
  - Allows Shared Access to “Health”-related Data From Heterogeneous Databases
  - Allows Agencies to Retain Complete Control Over Access to Data
  - Has Minimal Impact on Business Practices
  - Complies with Security and Privacy Statutes
  - Incorporates Cutting-edge Approaches to Case Matching
- Partnership of:
  - Early Partners: DPH, DCF, DDS, DoE, DOIT, UConn, Akaza Research

CHIN Processes and Components
[Figure: process flow. Steps: Define data elements in CHIN; Map data elements to source database; Publish “metadata” to CHIN with security and privacy rules; Build Query; Review Committee Approval; Query Execution: Identifier Matching and Data Merge; De-identify Data; Integrated, De-identified Data. Components shown: Contributor; CHIN Metadata Registry; CHIN Trusted Broker; CHIN Query Builder; CHIN Enterprise Administration; CHIN GRID; De-Identification Engine]

[Figure: Original CHIN Architecture; source: http://publichealth.uconn.edu/CHIN.php]

[Figure: Second CHIN Architecture: User Side; labels include A & A and Contributor]

[Figure: Second CHIN Architecture: Contributor Side; labels include A & A, Front End, and Trusted Broker]

[Figure: Current CHIN Architecture]

CHIN Architecture: Standards-based
- All data is mapped to Health Level Seven’s Clinical Document Architecture (CDA) in XML
  - Health Level Seven (HL7) is an ANSI-approved Standards Developing Organization
  - HL7 has its own XML Special Interest Group, responsible for developing XML implementations of its standards
  - HL7 is also an active participant in W3C, the organization responsible for the development of XML
  - CDA was approved as an ANSI standard in November of 2000
- Component Architecture communicates via Web Services and OGSA Grid standards

CHIN Arch.: Proven, Open Components
- Components are based on open-source libraries
  - The grid-based servers Mako and Virtual Mako are part of the Mobius Project from Ohio State University’s Dept. of BioInformatics
  - The translation tools to get data into XML are provided by the XQuare and XBridge projects, hosted on the ObjectWeb website, an open source middleware community
  - The algorithm and code for identity management is FEBRL, Freely Extensible Biomedical Record Linkage, which was developed at Australian National University
  - NuSOAP Web Services Engine for component integration

FEBRL
- Identifier matching in FEBRL proceeds in four steps:
- Data cleansing and standardization
  - Removes, to the degree possible, string discrepancies based on common misspellings, extra white space, or misplaced name or address components
- Indexing
  - Reduces the number of record comparisons which must be performed, for scalability; blocking, sorting, and bigram indexing methods are all supported
- Record comparison
  - Conducted using an arbitrary composition of exact or inexact string comparison methods over any combination of fields
- Classification
  - Follows the Fellegi-Sunter model [34], with record pairs assigned a weight based on a palette of probabilities and matches determined based on the record pair weights
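The four steps can be illustrated with a small standalone sketch. This is not FEBRL's API: every function name below is hypothetical, blocking is done on date of birth purely for simplicity, and the m/u probabilities and the decision threshold are made-up values. The classification step uses the usual Fellegi-Sunter form, adding log2(m/u) for a field that agrees and log2((1-m)/(1-u)) for one that disagrees. The names come from the earlier integration slide; the misspelling and stray whitespace are invented to exercise the cleaning and inexact comparison.

```python
import math
from itertools import combinations

RECORDS = [
    {"id": "A1", "first": "Max", "last": "Ernst", "dob": "04/04/1994"},
    {"id": "B1", "first": "  MAX", "last": "Ernst ", "dob": "04/04/1994"},
    {"id": "A2", "first": "Pedro", "last": "Martinez", "dob": "08/08/1998"},
    {"id": "B2", "first": "Pedro", "last": "Martines", "dob": "08/08/1998"},
    {"id": "A3", "first": "Jane", "last": "Cleary", "dob": "03/03/1993"},
    {"id": "B3", "first": "Daniel", "last": "Dory", "dob": "03/03/1993"},
]

# Hypothetical per-field probabilities: m = P(agree | match), u = P(agree | non-match).
M_U = {"first": (0.95, 0.05), "last": (0.95, 0.01)}


def standardize(value):
    """Step 1, cleaning: lower-case and collapse extra white space."""
    return " ".join(value.lower().split())


def bigrams(text):
    return {text[i:i + 2] for i in range(len(text) - 1)}


def bigram_similarity(a, b):
    """Step 3, inexact comparison: Dice coefficient over character bigrams."""
    ga, gb = bigrams(a), bigrams(b)
    if not ga or not gb:
        return 1.0 if a == b else 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def pair_weight(a, b, agree_at=0.8):
    """Step 4, classification weight: sum of per-field log-likelihood ratios."""
    total = 0.0
    for field, (m, u) in M_U.items():
        if bigram_similarity(a[field], b[field]) >= agree_at:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def link(records, threshold=5.0):
    cleaned = [dict(r, first=standardize(r["first"]), last=standardize(r["last"]))
               for r in records]
    blocks = {}  # Step 2, indexing: only compare records that share a block key
    for rec in cleaned:
        blocks.setdefault(rec["dob"], []).append(rec)
    matches = []
    for block in blocks.values():
        for a, b in combinations(block, 2):
            if pair_weight(a, b) >= threshold:
                matches.append(tuple(sorted((a["id"], b["id"]))))
    return sorted(matches)


links = link(RECORDS)
```

Blocking cuts the work from 15 possible pairs to 3, and the bigram comparison still links "Martinez" to "Martines" while keeping Jane Cleary and Daniel Dory apart even though they share a date of birth.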

FEBRL
- The current prototype uses FEBRL to implement a simplistic method of linkage whereby record pairs are declared a match if the first and last name are exactly equal.
- Next Steps
  - Evaluate the accuracy of linking records over a rubric of five data fields: first name, last name, date of birth, social security number, and gender.
  - Exact and inexact matching (i.e., misspellings and slight discrepancies), including experimental variations of the service based on the blinded bigram matching algorithm.
  - Assess false positives and false negatives produced by each palette of field comparison algorithms.
  - Evaluate the accuracy of linking records using fabricated data sets with characteristics similar to real datasets.
  - Experiment with variations of the canopy cluster matching algorithm.
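Assessing false positives and false negatives requires a data set whose true matches are known, which is why fabricated data is on the list of next steps. The sketch below applies the prototype's rule (exact equality of first and last name) to two small fabricated sources and scores the result. All records, identifiers, and function names are hypothetical.

```python
from itertools import product

# Fabricated sources: (record id, first name, last name).
SOURCE_A = [("A1", "Max", "Ernst"), ("A2", "Pedro", "Martinez"), ("A3", "Jane", "Cleary")]
SOURCE_B = [("B1", "Max", "Ernst"), ("B2", "Pedro", "Martines"), ("B3", "Jane", "Cleary")]

# Ground truth for the fabricated data: A3 and B3 are different people who
# happen to share a name, and B2 is a misspelled copy of A2.
TRUE_MATCHES = {("A1", "B1"), ("A2", "B2")}


def exact_name_links(source_a, source_b):
    """Declare a match when first and last name are exactly equal."""
    return {(a[0], b[0]) for a, b in product(source_a, source_b)
            if (a[1], a[2]) == (b[1], b[2])}


def evaluate(predicted, truth):
    """Count true/false positives and false negatives; derive precision and recall."""
    predicted, truth = set(predicted), set(truth)
    tp = len(predicted & truth)
    fp = len(predicted - truth)   # linked, but not the same person
    fn = len(truth - predicted)   # same person, but not linked
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision, "recall": recall}


report = evaluate(exact_name_links(SOURCE_A, SOURCE_B), TRUE_MATCHES)
```

Here the exact-name rule produces one false positive (the two Jane Clearys) and one false negative (the misspelled Martinez), which is the kind of evidence that motivates adding date of birth and inexact comparison to the rubric.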

Other CHIN Issues
- Why Choose an Open Architecture?
  - Increased Accountability
  - Plenty of Documentation and Research
  - Greater Transparency
  - Ease of Installation, Maintenance, Dissemination
- How is Data Ported into CHIN?
  - CHIN is based on a Grid, with each organization supporting its own data through a Contributor server
  - Agency staff has complete control over access to data on CHIN by other users
  - Only one server faces the outside network

Creating a Contributor Server
- Contributor Server Contains:
  - XML generated files
  - Mako service
  - Java files
  - *.xqy files
  - XML files to generate CDA compliant files
[Figure: Datasource; Generate XML; Contributor Server behind Firewall; Data Elements Published to MDR; External IP Address; SSL Connection to CHIN Trusted Broker]

Connecting to rest of Network
- Metadata Registry takes information
  - About data elements
  - About data security information
- Contributor profile is registered with CHIN Network Admin
- Contributor Server Contains:
  - XML generated files
  - Mako service
  - Java files
  - *.xqy files
  - XML files to generate CDA compliant files
[Figure: Datasource; Generate XML; Contributor Server behind Firewall; Data Elements Published to MDR; Access to CHIN; External IP Address; SSL Connection to CHIN Trusted Broker]

How do we get data out?
- The Trusted Broker component:
  - Pulls XML from the Virtual Mako, which reaches out to all Contributors
  - Compares records from different Contributors using FEBRL
  - De-identifies data sets to generate a final data set for Investigators
- The Front End component:
  - Provides a central place for users to connect to the system
  - Connects to the Metadata Registry and the Trusted Broker via Web Services calls
  - Allows different users of the system to perform different actions
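The de-identification step performed by the Trusted Broker can be pictured as dropping direct identifiers from a linked record and substituting a keyed pseudonym, so that rows belonging to the same person still group together. The sketch below is an assumption about how such a step might look, not a description of CHIN's engine: the field list, the `case_id` and `birth_year` outputs, and the secret are all hypothetical, and removing these fields alone does not by itself satisfy HIPAA or FERPA.

```python
import hashlib
import hmac

# Fields treated as direct identifiers in this sketch (hypothetical list).
IDENTIFIERS = {"last", "first", "ssn", "dob", "street"}

# One linked record, using values from the integration slide (SSN omitted).
RECORD = {
    "last": "Ernst", "first": "Max", "dob": "04/04/1994",
    "street": "Elm", "town": "Enfield",
    "birth_wt_kg": 2.7, "cmt_math": 152, "days_in_attendance": 145,
}


def pseudonym(record, secret):
    """Keyed hash of the identity fields; stable for a given secret."""
    material = "|".join(record[f] for f in ("last", "first", "dob"))
    digest = hmac.new(secret, material.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:12]


def de_identify(record, secret):
    """Drop direct identifiers, keep measures, add a pseudonymous case id."""
    out = {k: v for k, v in record.items() if k not in IDENTIFIERS}
    out["case_id"] = pseudonym(record, secret)
    out["birth_year"] = record["dob"][-4:]  # coarsen the date of birth
    return out


released = de_identify(RECORD, b"demo-secret-held-by-the-broker")
```

Using a keyed HMAC rather than a plain hash means someone without the broker's secret cannot rebuild the pseudonym from a guessed name and birth date.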

[Figure: Getting Data from CHIN]

Getting Data From CHIN
- CHIN also contains:
  - A Front-end server to take queries
  - A Trusted Broker to compare data, perform record linkage, and de-identify results
[Figure: XML Files; FEBRL; Result Set; De-identify; Final Result Set]

Progress to Date
- Needs assessment completed
- Technical and functional specifications identified
- MOUs with state agencies
- Expanding list of partners
- Prototype developed
- Funding for Model Network Development/Deployment/Evaluation 2008

Demo

EMR Architectures
- Provider-Based Systems have Two Variants
  - All Data In House
    - Larger Providers (Clinics)
    - Control All Own Data
    - Sizeable IT Staff for 24/7 Operations
    - Control of Own Backups
  - Limited In House with Off-Site Storage (Larger, Multi-Site Practices)
    - Smaller Providers: Limited IT Staff
    - Desire Out-of-Box Solution
    - Local Data for Ease of Access
    - Remote Storage: Promotes Off-Hours Access
  - Even 1st Variant: Service for “Backups”

[Figure: EMR for Large Providers - AllScript]

EMR for Smaller Providers
[Figure: Provider’s Office (Local EMR, Patient Data) connected by Remote Access to the Vendor’s Location (Server/Data Farm, Remote EMR)]

Integrating Clinical Repositories
- Provider/Hospital Relationship
  - Provider has Privileges at Hospital
  - Provider Chooses Office-Based EMR
  - More Easily Integrated with Hospital EMR
  - Emerging at Community Hospital Level
- Example:
  - Milford Hospital, MA
  - All Area Providers with Privileges Linked in
  - Ability to See Patient Records, Tests, at Hospital
  - Unclear on Uploads from Providers to Hospital
  - However, No Link to UMass Medical Center (with which Milford Hospital is Affiliated)

Integrating Clinical Repositories
- CTSA: Region-Wide Clinical/Translational Research
- Target Area Hospitals
  - St. Francis, Hartford, Hosp. Central CT, CCMC
  - Each Hospital has Own Clinical Repository (EMR)
- For Wider-Scoped T1, T2, and Clinical Research
  - Need to Integrate these Repositories at Some Level
  - What is Most Practical?
    - Setting up Centralized De-Identified Repository?
    - Creating Data Marts as you go?
    - What are Pros and Cons of Each?
  - Researcher Seeking CHF Patient Data Needs to have De-Identified Data Mart

[Figure: Integrating Clinical Repositories (three diagram slides)]

[Figure: Integrating Clinical Repositories: NHIN Prototype Phase I]

[Figure: Integrating Clinical Repositories: NHIN Prototype Phase II]

[Figure: Personal Health Record Integration]

Concluding Remarks
- Only Scratched the Surface on Architectures
  - Micro Architectures
  - Macro Architectures
  - Super-Macro Architectures (We’ll see …)
- What are Key Facets in the Discussion?
  - Role and Impact of Standards
  - Open Solutions
  - Architectural Variants: Reuse “Architecture”
    - Can we Reuse CHIN for Clinical Practice?
    - Are All Contributors Simply Each Hospital and EHR?
    - How do we Connect all of the Pieces?
- What are Next Steps?
  - Let’s Review Some other Work
  - Source: Wide Range of Presentations on Web