
d8875e5a7ba3cf2bf0e37dd380660cc9.ppt
- Number of slides: 194
Software and Enterprise Architectures
CSE 5095
Prof. Steven A. Demurjian, Sr.
Computer Science & Engineering Department
The University of Connecticut
371 Fairfield Road, Box U-255
Storrs, CT 06269-2155
steve@engr.uconn.edu
http://www.engr.uconn.edu/~steve
(860) 486-4818
Copyright © 2008 by S. Demurjian, Storrs, CT.
SWEA 1
Software Architectures
- Emerging Discipline in Mid-1990s
- Software as Collection of Interacting Components
- What are Local Interactions (within Component)?
- What are Global Interactions (between Components)?
- Advantages of SW Architectural Design
  - Understand Communication/Synchronization
  - Definition of Database Requirements
  - Identification of Performance/Scaling Issues
  - Detailing of Security Needs and Constraints
- Towards Large-Scale Software Development
- For Biomedical Informatics:
  - What are Architectures for Data Sharing?
  - How is Interoperability Facilitated?
Concepts of Software Architectures
- Exceed Traditional Algorithm/Data Structure Perspective
- Emphasize Componentwise Organization and System Functionality
- Focus on Global and Local Interactions
- Identify Communication/Synchronization Requirements
- Define Database Needs and Dependencies
- Consider Performance/Scaling Issues
- Understand Potential Evolution Dimensions
The HTSS Software Architecture
[Figure: HTSS architecture diagram]
- Legend: IL: Item Locator; CR: Cash Register; IC: Invent. Control; DO: Deli Orderer for Shopper/Employee (SDO, EDO)
- Labeled Elements: Payment, Item, Order, Non-Local Client Int., Inventory Control, CreditCardDB, ItemDB Global Server, ItemDB Local Server, ATM-BankDB, OrderDB, SupplierDB
Multiple Backend Database System (MBDS)
[Figure: Host/User, Database Controller, and three Backend Database Processors]
The MBDS Processes
[Figure: processes of the Database Controller and of a Backend Database Processor]
- Database Controller: Request Preparation, Post Processing, Put Msg., Get Msg.
- Backend Database Processor: Put Msg., Directory Management, Record Processing, Concurrency Control, Disk I/O
Multiple Processes in MBDS
No.  Type
1    New Request
2    Results of Request
3    Number of Reqs in Transaction
4    Aggregate Operators (Sum, etc.)
6    Parsed Request to Backends
12   Backend Aggregate Operator Results
15   Ids for Accessing Database Indexes
16   Request and Disk Addresses
21   Ids for Accessing Database Records
22   Locks Obtained: Okay to Execute Request
23   ID of Finished Request
SRC column: Host, PoPr, ReqP, RecP, DM, DM, DM, CC, RecP
DST column: ReqP, Host, PoPr, DMs, RecP, CC
Message Passing in MBDS
[Figure: message flows labeled A1, B3, C4, D6, E15, F15, G21, H22, I16, J23, and K12 among Request Preparation, Post Processing, Get Msg., Put Msg., Directory Management, Record Processing, Concurrency Control, and Disk I/O, including flows To Backend(s) and From Other Backend]
Software Design Levels
- Architecturally:
  - Modules
  - Interconnections Among Modules
  - Decomposition into Subsystems
- Code:
  - Algorithms/Data Structures
  - Tasking/Control Threads
- Executable:
  - Memory Management
  - Runtime Environment
- Is this a Realistic/Accurate View?
  - Yes for a Single "Application"
  - What about Application of Applications?
  - System of Systems?
Software Engineering - an Oxymoron?
- Is there any Engineering?
- Is there any Science?
- Collection of Disparate Techniques:
  - Data-Flow Diagrams
  - E-R Diagrams
  - Finite State Machines
  - Petri Nets
  - UML Class, Object, Sequence, Etc.
  - Design Patterns
  - Model Driven Architectures
- What is being "Engineered"?
- How do we Know we are Done?
  - E.g., Does Artifact Match Specification?
What's Available for Engineering Software?
- Specification (Abstract Models, Algebraic Semantics)
- Software Structure (Bundling Representation with Algorithms)
- Language Issues (Models, Scope, User-Defined Types)
- Information Hiding (Protect Integrity of Information)
- Integrity Constraints (Invariants of Data Structures)
- Is this up to date? What else can be Added to List?
  - Design Patterns
  - Model Driven Architectures
  - XML - Data Modeling and Dependencies
  - Others?
Engineering Success in Computing
- Compilers Have Had Great Success
  - Originally by Hand
  - Then Compilers
  - Parser Generators - Lex/Yacc
- Solid Science Behind Compilers
  - Regular, Context Free, Context Sensitive Languages
  - FSAs, PDAs, CFGs, etc.
- Science has Provided Engineering Success re: Ease and Accuracy of Modern Compiler Writing
History of Programming
- C - Still Remains Industry Workhorse
  - Separate Compilation
  - Decomposition of System into Subsystems, etc.
  - Shared Declarations
  - ADTs in C, But Compiler won't Enforce Them
- Modula-2 and Ada 83 Had
  - Information Hiding
  - Public/Private Paradigm
  - Module/Package Concepts
  - Import/Export Paradigm
- Rigor Enforced by Compiler - but Can't
  - Bind/Group Modules into Subsystems
  - Precisely Specify Interconnections and Interactions Among Subsystems and Components
'Recent-Past' Generation?
- C++ and Ada 95
  - Considered "Legacy" Languages - Old
- Java, C# - Are they Headed Toward Legacy?
  - How do they Rate?
  - What Do they Offer that Hasn't been Offered Before?
  - What are Unique Benefits and Potential of Java?
- What about new Web Technologies?
  - JavaScript, Perl, PHP, Python, Ruby
  - XML and SOAP
  - How do all of these fit into this process?
  - Particularly in Regards to C/S Solutions!
What's Next Step?
- Architectural Description Languages
  - Provide Tools to Describe Architectures
  - Definition and Communication
- Codification of Architectural Expertise
- Frameworks for Specific Domains
  - DB vs. GUI vs. Embedded vs. C/S
- Formal Underpinning for Engineering Rigor
- What has Appeared for Each of these?
  - Struts for GUI
  - Open Source Frameworks (MediaWiki)
  - Wide-Ranging Standards (XML)
  - Model-Driven Architectures
  - What Else???
Architectural Styles
- What are Popular Architectural Styles?
  - How are they Characterized?
  - Example in Practice
- Explore a Taxonomy of Styles
- Focus on "Micro-Architectures"
  - Components
  - Flow Among Components
  - Represents "Single" Application
- Forms Basis for "Macro-Architectures"
  - System of Systems
  - Application of Applications
  - Significantly Scaling Up
Taxonomy of Architectural Styles
- Data Flow Systems
  - Batch Sequential
  - Pipes and Filters
- Call & Return Systems
  - Main/Subroutines (C, Pascal)
  - Object Oriented
  - Implicit Invocation
  - Hierarchical Systems
- Virtual Machines
  - Interpreters
  - Rule Based Systems
- Data Centered Systems
  - DBS
  - Hypertext
  - Blackboards
- Independent Components
  - Communicating Processes/Event Systems
- Client/Server
  - Two-Tier
  - Multi-Tier
Taxonomy of Architectural Styles
- Establish Framework of ...
  - Components
    - Building Blocks for Constructing Systems
    - A Major Unit of Functionality
    - Examples Include: Client, Server, Filter, Layer, DB
  - Connectors
    - Defining the Ways that Components Interact
    - What are the Protocols that Mandate the Allowable Interactions Among Components?
    - How are Protocols Enforced at Run/Design Time?
    - Examples Include: Procedure Call, Event Broadcast, DB Protocol, Pipe
Overall Framework
- What Is the Design Vocabulary?
  - Connectors and Components
- What Are Allowable Structural Patterns?
  - Constraints on Combining Components & Connectors
- What Is the Underlying Conceptual Model?
  - Von Neumann, Parallel, Agent, Message-Passing...
  - Are there New Emerging Models?
  - Collaborative Environments/Shareware?
- What Are Essential Invariants of a Style?
  - Limits on Allowable Components & Connectors
- Common Examples of Usage
- Advantages and Disadvantages of a Style
- Common Specializations of a Style
Pipes and Filters
[Figure: filters such as Sort and Merge drawn as components with input and output, joined by connectors that carry streams of I/O]
- Components are Independent Entities. No Shared State!
- Filters:
  - Invariant: Unaware of Up and Down Stream Behavior
  - Streamed Behavior: Output Could Go From One Filter to the Next One, Allowing Multiple Filters to Run in Parallel.
Pipes and Filters
- Possible Specializations:
  - Pipelines - Linear Sequence
  - Bounded - Limits on Data Amounts
  - Typed Pipes - Known Data Format
- What is a Classic Example?
- Other Examples:
  - Compilers
  - Sequential Processes
  - Parallel Processes
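A minimal sketch of a linear pipeline, assuming Python generators as the pipes. The filter names (`read_lines`, `strip_blank`, `grep`, `to_upper`) are hypothetical and chosen only for illustration. Each filter sees just its input stream and knows nothing about its neighbors, which is the invariant stated on the previous slide.

```python
from typing import Iterable, Iterator, List


def read_lines(text: str) -> Iterator[str]:
    """Source filter: emit the input one line at a time."""
    for line in text.splitlines():
        yield line


def strip_blank(lines: Iterable[str]) -> Iterator[str]:
    """Drop empty lines and trim surrounding whitespace."""
    for line in lines:
        if line.strip():
            yield line.strip()


def grep(lines: Iterable[str], word: str) -> Iterator[str]:
    """Pass through only the lines that contain the given word."""
    for line in lines:
        if word in line:
            yield line


def to_upper(lines: Iterable[str]) -> Iterator[str]:
    """Upper-case every line that reaches this filter."""
    for line in lines:
        yield line.upper()


def run_pipeline(text: str, word: str) -> List[str]:
    """Compose the filters into a linear pipeline and collect the output."""
    return list(to_upper(grep(strip_blank(read_lines(text)), word)))
```

Generators stream one item at a time, so no filter waits for the whole input, but this sketch still runs in a single thread. Running filters in parallel would need processes or threads joined by queues.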
Pipes and Filters - Another Example
- Text Information Retrieval Systems
  - Scanning Newspapers for Key Words, Etc.
  - Also, Boolean Search Expressions
- Where is Such an Architecture Utilized Today?
- What is Potential Usage in BMI?
[Figure labels: User; Commands; Search; Disk Controller; Programming; Result; Query Resolver; Control; Term; Search; Comparator; Data; DB]
ADTs and OO Architectures
- Widespread Usage in the 1990's
- Advantages Are Well Known
[Figure: objects (obj) as Components, operations (op) as Connectors]
- Disadvantages:
  - Interaction Requires Object Identity
  - If Identity Changes, It Is Difficult to Track All Affected Objects.
Implicit Invocation
- Similar to OO in the Sense that Components Can Call Services on Other Components
- How Does this Work?
  - Components Have List of Events they can Raise and List of Procedures to Handle Events
  - When Event is Raised, it is Broadcast
  - All Components that Have Procedure to Handle Broadcast Event will Act Upon it
  - The Component That Raised the Event has no Knowledge of Which Component(s) will Handle Event
- What are Some Examples?
Implicit Invocation
- Advantages
  - No Need to Know the Targeted Components
  - Single Event can Impact Multiple Components
  - New Event Handlers can Easily be Added
  - New Events Can then be Raised
- Disadvantages
  - No Control Over the Order of Processing When an Event is Raised
  - No Control Over "Who" and "How Many" Process Events
  - Very Non-Deterministic System Behavior
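The broadcast mechanism can be sketched as a small event bus. This is an illustrative sketch, not a specific framework. The class `EventBus` and the event name used in the test are hypothetical. The raiser gets back only a count of handlers, never their identities.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

Handler = Callable[[Any], None]


class EventBus:
    """Broadcasts each raised event to every handler registered for it."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def register(self, event: str, handler: Handler) -> None:
        """A component announces that it can handle the named event."""
        self._handlers[event].append(handler)

    def raise_event(self, event: str, payload: Any) -> int:
        """Broadcast the event and return how many handlers were invoked."""
        handlers = list(self._handlers.get(event, []))
        for handler in handlers:
            handler(payload)
        return len(handlers)
```

Handlers here run in registration order because the bus is a plain list. A real broadcast system may give no such guarantee, which is the ordering drawback listed above.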
What has OO Evolved Into?
- What has Classic OO Solution Evolved into Today?
  - Client (Browser + Struts)
  - Server (Many Variants of OO Languages)
  - Database Server (typically Relational)
- Different Style (e.g., Design Pattern)
  - Does Pattern Capture All Aspects of Style?
  - Do we Need to Couple Technology with Pattern?
[Figure: sample record "Dr. D, Jan 01, 08; Fever, Flu, Bed Rest; No Scripts; No Tests" for Item(Phy_Name*, Date*, Visit_Flag, Symptom, Diagnosis, Treatment, Presc_Flag, Pre_No, Pharm_Name, Medication, Test_Flag, Test_Code, Spec_No, Status, Tech)]
Layered Systems
[Figure: nested layers labeled Core Level, Base Utility, Useful Systems, Users]
- Components - Virtual Machine at Each Layer
- Connectors - Protocols That Specify How Layers Interact
- Interaction Is Restricted to Adjacent Layers
Layered Systems
- Advantages:
  - Increasing Levels of Abstraction
  - Support Enhancement - New Layers
  - Support for Reuse
- Drawbacks:
  - Not Feasible for All Systems
  - Performance Issues With Multiple Layers
  - Defining Abstractions Is Difficult.
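A small sketch of the adjacent-layer restriction, assuming each layer holds a reference only to the layer directly beneath it. The `Layer` class and `build_stack` helper are hypothetical. The layer names used in the test are taken from the figure labels on the earlier Layered Systems slide.

```python
from typing import List, Optional


class Layer:
    """A layer that can reach only the layer directly below it."""

    def __init__(self, name: str, below: Optional["Layer"] = None) -> None:
        self.name = name
        self._below = below

    def handle(self, request: str) -> List[str]:
        """Record this layer, then delegate to the adjacent lower layer."""
        trace = [self.name]
        if self._below is not None:
            trace += self._below.handle(request)
        return trace


def build_stack(names_top_to_bottom: List[str]) -> Layer:
    """Build the layers bottom-up and return the topmost one."""
    layer: Optional[Layer] = None
    for name in reversed(names_top_to_bottom):
        layer = Layer(name, below=layer)
    if layer is None:
        raise ValueError("at least one layer is required")
    return layer
```

Every request crosses every layer on its way down. That is the performance cost listed under Drawbacks.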
Layered Systems in BMI
- One Approach to Constructing Access to Patient Data for Clinical Research and Clinical Practice
- Construct Layered Data Repositories as Below
  - Each Layer Targets Different User Group
  - Need to Fine Tune Access Even within Layers
[Figure: layered repositories labeled Aggregated De-identified Patient Data, Provider, Cl. Researchers, Public Health Researchers]
ISO as Layered Architecture
- ISO Open Systems Interconnect (OSI) Model
  - Now Widely Used as a Reference Architecture
  - 7-Layer Model
  - Provides Framework for Specific Protocols (Such as IP, TCP, FTP, RPC, UDP, RSVP, ...)
[Figure: layer stack, top to bottom: Application, Presentation, Session, Transport, Network, Data Link, Physical]
ISO OSI Model
[Figure: two seven-layer stacks, each top to bottom: Application, Presentation, Session, Transport, Network, Data Link, Physical]
- (Hardware)/Data Link Layer Networks: Ethernet, Token Ring, ATM
- Network Layer Net: The Internet
- Transport Layer Net: TCP-based Network
- Presentation/Session Layer Net: HTTP/HTML, RPC, PVM, MPI
- Applications, E.g., WWW, Window System, Algorithm
Repositories
[Figure: Blackboard (shared data) surrounded by knowledge sources ks1 through ks8]
- Knowledge Sources Interact With the Blackboard
- Blackboard Contains the Problem Solving State Data.
- Control Is Driven by the State of the Blackboard.
- DB Systems Are a Form of Repository With a Layer Between the BB and the KSs - Supports
  - Concurrent Access, Security, Integrity, Recovery
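A toy sketch of blackboard-style control, assuming each knowledge source is a function that inspects the shared state and contributes only when its preconditions hold. The names `Blackboard`, `tokenize`, `count_tokens`, and `run` are hypothetical. Note that the state of the blackboard, not the order of the list, decides which source acts.

```python
from typing import Callable, Dict, List


class Blackboard:
    """Shared problem-solving state that all knowledge sources read and write."""

    def __init__(self, **state: object) -> None:
        self.state: Dict[str, object] = dict(state)


def tokenize(bb: Blackboard) -> bool:
    """Knowledge source: split raw text into tokens once text is available."""
    if "text" in bb.state and "tokens" not in bb.state:
        bb.state["tokens"] = str(bb.state["text"]).split()
        return True
    return False


def count_tokens(bb: Blackboard) -> bool:
    """Knowledge source: count tokens once tokens are available."""
    if "tokens" in bb.state and "count" not in bb.state:
        bb.state["count"] = len(bb.state["tokens"])  # type: ignore[arg-type]
        return True
    return False


def run(bb: Blackboard, sources: List[Callable[[Blackboard], bool]]) -> Dict[str, object]:
    """Keep offering the blackboard to every source until none can contribute."""
    progress = True
    while progress:
        progress = False
        for source in sources:
            if source(bb):
                progress = True
    return bb.state
```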
Database System as a Repository
[Figure: Database (shared data) surrounded by clients c1 through c8]
- Clients Interact With the DBMS
- Database Contains the Problem Solving State Data
- Control is Driven by the State of the Database
  - Concurrent Access, Security, Integrity, Recovery
  - Single Layer System: Clients have Direct Access
  - Control of Access to Information must be Carefully Defined within DB Security/Integrity
Team Project as a Repository
[Figure: Web Portal (shared) surrounded by clients c1 through c8]
- Clients are Providers, Patients, Clinical Researchers
- Database Underlies Web Portal
- Simply a Portion of Architecture
  - Interactions with PHR (Patients)
  - Interactions with EMR (Providers)
  - Interactions with Database/Warehouse (Researchers)
Interpreters
[Figure: Inputs, Outputs, Program being interpreted, Data (program state), Simulated interpretation engine, Selected instruction, Selected data, Internal interpreter state]
- What Are Components and Connectors?
- Where Have Interpreters Been Used in CS&E?
  - LISP, ML, Java, Other Languages, OS Command Line
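The four parts named in the figure can be seen in a very small stack-machine interpreter. This is a sketch with a hypothetical three-instruction language (`push`, `add`, `mul`). The comments map each variable to a figure label.

```python
from typing import List, Sequence, Tuple

Instruction = Tuple  # e.g. ("push", 2) or ("add",)


def interpret(program: Sequence[Instruction]) -> List[int]:
    """Run a program on a simulated stack machine and return the final stack."""
    stack: List[int] = []   # data (program state)
    pc = 0                  # internal interpreter state
    while pc < len(program):
        op, *args = program[pc]   # selected instruction
        if op == "push":
            stack.append(args[0])
        elif op == "add":
            right, left = stack.pop(), stack.pop()   # selected data
            stack.append(left + right)
        elif op == "mul":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown instruction: {op}")
        pc += 1
    return stack
```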
Java as Interpreter
[Figure: Java as an interpreter]
Process Control Paradigms
[Figure: With Feedback: Input variables and Set point feed a Controller, which sends Δs to manipulated variables of the Process, and the Controlled variable is fed back]
[Figure: Without Feedback: Input variables and Set point feed a Controller, which sends Δs to manipulated variables of the Process, producing the Controlled variable]
- Also:
  - Open vs. Closed Loop Systems
  - Well Defined Control and Computational Characteristics
  - Heavily Used in Engineering Fields.
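The difference between the two paradigms can be sketched numerically. This assumes a simple proportional controller, which is an illustrative choice and not something the slide specifies. The function names and the `gain` parameter are hypothetical.

```python
from typing import List


def closed_loop(set_point: float, initial: float, gain: float = 0.5, steps: int = 20) -> List[float]:
    """With feedback: each change is proportional to the measured error."""
    value = initial
    history = []
    for _ in range(steps):
        error = set_point - value   # compare controlled variable to set point
        value += gain * error       # change applied to the manipulated variable
        history.append(value)
    return history


def open_loop(initial: float, fixed_delta: float, steps: int) -> List[float]:
    """Without feedback: the same change is applied, whatever the result."""
    value = initial
    history = []
    for _ in range(steps):
        value += fixed_delta
        history.append(value)
    return history
```

With a gain between 0 and 1 the closed loop shrinks the error on every step and settles at the set point. The open loop keeps adding its fixed change and can overshoot, because it never measures the controlled variable.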
Process Architecture: Statechart Diagram?
[Figure: statechart diagram]
Process Architecture: Activity Diagram?
- Clear Applicability to Medical Processes that have Underlying BMI - Low Level Processes
[Figure: diagram with states Waiting for Heart Signal and Waiting for Resp. Signal; events timeout, irregular beat, Heart Signal, Breath, Resp Signal, Alarm Reset; actions Trigger Local Alarm and Trigger Remote Alarm]
Design Patterns as Software Architectures
- Emerged as the Recognition that in Object-Oriented Systems Repetitions in Design Occurred
- Gained Prominence in 1995 with Publication of "Design Patterns: Elements of Reusable Object-Oriented Software", Addison-Wesley
  - "... descriptions of communicating objects and classes that are customized to solve a general design problem in a particular context..."
  - Akin to Complicated Generic
- Usage of Patterns Requires
  - Consistent Format and Abstraction
  - Common Vocabulary and Descriptions
- Simple to Complex Patterns - Wide Range
The Observer Pattern
- Utilized to Define a One-to-Many Relationship Between Objects
- When Object Changes State - all Dependents are Notified and Automatically Updated
- Loosely Coupled Objects
  - When one Object (Subject - an Active Object) Changes State, then Multiple Objects (Observers - Passive Objects) are Notified
  - Observer Object Implements Interface to Specify the Way that Changes are to Occur
  - Two Interfaces and Two Concrete Classes
The Observer Pattern
[Figure: Observer pattern class diagram]
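A minimal sketch of the "two interfaces and two concrete classes" structure, using Python abstract base classes in place of interfaces. The concrete names `Thermometer` and `Display` are hypothetical examples and do not come from the slides.

```python
from abc import ABC, abstractmethod
from typing import List, Optional


class Observer(ABC):
    """Interface: how a dependent object is told about a change."""

    @abstractmethod
    def update(self, state: float) -> None: ...


class Subject(ABC):
    """Interface: how observers attach and how notification is triggered."""

    @abstractmethod
    def attach(self, observer: Observer) -> None: ...

    @abstractmethod
    def notify(self) -> None: ...


class Thermometer(Subject):
    """Concrete subject: the active object whose state changes."""

    def __init__(self) -> None:
        self._observers: List[Observer] = []
        self._temperature: Optional[float] = None

    def attach(self, observer: Observer) -> None:
        self._observers.append(observer)

    def notify(self) -> None:
        for observer in self._observers:
            observer.update(self._temperature)

    def set_temperature(self, value: float) -> None:
        self._temperature = value
        self.notify()   # every state change is pushed to all observers


class Display(Observer):
    """Concrete observer: a passive object that records what it is told."""

    def __init__(self) -> None:
        self.shown: List[float] = []

    def update(self, state: float) -> None:
        self.shown.append(state)
```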
Model View Controller
- http://java.sun.com/blueprints/patterns/MVC-detailed.html
[Figure: Model View Controller diagram]
Model View Controller
- Three Parts of the Pattern:
  - Model
    - Enterprise Data and Business Rules for Accessing and Updating Data
  - View
    - Renders the Contents (or Portion) of Model
    - Deals with Presentation of Stored Data
    - Pull or Push Model Possible
  - Controller
    - Translates Interactions with View into Actions on Model
    - Actions could be Button Clicks (GUI), Get/Post HTTP (Web), etc.
Model View Controller
- http://java.sun.com/blueprints/patterns/MVC-detailed.html
[Figure: Model View Controller diagram]
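The three parts can be sketched in a few lines. This assumes the push variant, in which the model notifies the view after each update. The classes and the `"add"` action are hypothetical and are not taken from the Java BluePrints material.

```python
from typing import Callable, List, Tuple


class Model:
    """Holds the data and the rules for updating it."""

    def __init__(self) -> None:
        self._items: List[str] = []
        self._listeners: List[Callable[["Model"], None]] = []

    def subscribe(self, listener: Callable[["Model"], None]) -> None:
        self._listeners.append(listener)

    def add_item(self, name: str) -> None:
        if not name:
            raise ValueError("item name must not be empty")   # business rule
        self._items.append(name)
        for listener in self._listeners:   # push the change to the views
            listener(self)

    @property
    def items(self) -> Tuple[str, ...]:
        return tuple(self._items)


class View:
    """Renders the contents of the model."""

    def __init__(self) -> None:
        self.rendered = ""

    def render(self, model: Model) -> None:
        self.rendered = ", ".join(model.items)


class Controller:
    """Translates user interactions into actions on the model."""

    def __init__(self, model: Model) -> None:
        self._model = model

    def handle(self, action: str, value: str) -> None:
        if action == "add":
            self._model.add_item(value.strip())
        else:
            raise ValueError(f"unsupported action: {action}")
```

In a pull variant the view would instead read `model.items` whenever it needs to redraw, and the model would hold no listener list.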
UML for System Modeling
- UML is a Language for Specifying, Visualizing, Constructing, and Documenting Software Artifacts
- What Does a Modeling Language Provide?
  - Model Elements: Concepts and Semantics
  - Notation: Visual Rendering of Model Elements
  - Guidelines: Hints and Suggestions for Using Elements in Notation
- References and Resources
  - Web: http://www.uml.org/
- Is UML Sufficient for Complexity of BMI?
  - Able to Model Information Needs for BMI?
  - Able to Represent Required Architectures?
UML Diagrammatic Representations
- Component Diagram: Captures the Physical Structure of the Implementation
- Deployment Diagram: Captures the Topology of a System's Hardware
- Collaboration Diagram: Captures Dynamic Behavior (Message-Oriented)
- What About Other Diagrams?
  - State Chart Diagram: Captures Dynamic Behavior (Event-Oriented)
  - Activity Diagram: Captures Dynamic Behavior (Activity-Oriented)
- These and Others Seem too Low Level ...
- What is Role of UML for BMI?
  - Yet Another Design Artifact
  - Can it be More?
Component Diagram
- Captures the Physical Structure of the Implementation
[Figure: component diagram]
Deployment Diagram
- Captures the Topology of a System's Hardware
[Figure: deployment diagram]
Collaboration Diagram
[Figure: collaboration diagram]
Single and Multi-Tier Architectures
- Widespread use in Practice for All Types of Distributed Systems and Applications
- Two Kinds of Components
  - Servers: Provide Services - May be Unaware of Clients
    - Web Servers (unaware?)
    - Database Servers and Functional Servers (aware?)
  - Clients: Request Services from Servers
    - Must Identify Servers
    - May Need to Identify Self
    - A Server Can be Client of Another Server
- Expanding from Micro-Architectures (Single Computer/One Application) to Macro-Architecture
Single and Multi-Tier Architectures
- Normally, Clients and Servers are Independent Processes Running in Parallel
- Connectors Provide Means for Service Requests and Answers to be Passed Among Clients/Servers
- Connectors May be RPC, RMI, etc.
- Advantages
  - Parallelism, Independence
  - Separation of Concerns, Abstraction
  - Others?
- Disadvantages
  - Complex Implementation Mechanisms
  - Scalability, Correctness, Real-Time Limits
  - Others?
Example: Software Architectural Structure
[Figure: Initial Data Entry Operator (Scanning & Posting), Advanced Data Entry Operators, Analyst, and Manager connected over a 10-100 MB Network to a Document Server (Stored Images/CD), a Database Server Running Oracle, and a Functional Server (RMI Registry, RMI Act. Obj/Server)]
Business Process Model
[Figure labels: DB; Historical Completed Records; Applications; Licensing; Supervisor Review; Scanner; Licensing Division; Scanning Operator; Stored Images; Printer; Data Entry Operator; Basic Information Entered; New Licenses; New Appointments; FOI Letters (Request Information, etc.)]
Two-Tier Architecture
- Small Manufacturer Previously on C++
- New Order Entry, Inventory, and Invoicing Applications in Java Programming Language
- Existing Customer and Order Database
- Most of Business Logic in Stored Procedures
- Tool-generated GUI Forms for Java Objects
Three-Tier Architecture
- Passenger Check-in for Regional Airline
- Local Database for Seating on Today's Flights
- Clients Invoke EJBs at Local Site Through RMI
- EJBs Update Database and Queue Updates
- JMS Queues Updates to Legacy System
- JDBC API Used to Access Local Database
Four-Tier Architecture
- Web Access to Brokerage Accounts
- Only HTML Browser Required on Front End
- "Brokerbean" EJB Provides Business Logic
- Login, Query, Trade Servlets Call Brokerbean
- Use JNDI to Find EJBs, RMI to Invoke Them
Architecture Comparisons
- Two-tier Through JDBC API is Simplest
- Multi-tier: Separate Business Logic, Protect Database Integrity, More Scalable
- JMS Queues vs. Synchronous (RMI or IDL):
  - Availability, Response Time, Decoupling
- JMS Publish & Subscribe: Off-line Notification
- RMI-IIOP vs. JRMP vs. Java IDL:
  - Standard Cross-language Calls or Full Java Functionality
- JTS: Distributed Integrity, Lockstep Actions
Comments on Architectural Styles
- Architectural Styles Provide Patterns
  - Suppose Designing a New System
  - During Requirements Discovery, Behavior and Structure of System Will Emerge
  - Attempt to Match to Architectural Style
  - Modify, Extend Style as Needed
- By Choosing Existing Architectural Style
  - Know Advantages and Disadvantages
  - Ability to Focus in on Problem Areas and Bottlenecks
  - Can Adjust Architecture Accordingly
- Architectures Range from Large Scale to Small Scale in their Applicability
- We'll see Examples for BMI Shortly ...
Other Issues in Software Architectures
- Consider a Set of Applications
  - New Software
  - Legacy, COTS, Databases, etc.
- A Distributed Application is a Set of Applications Deployed Over a Network that Communicate
- Relationship Between Applications
- Different Implementations of "Same" Application on Different Hardware Platforms
- Configuration of Various Hardware Nodes
- Different Node Types in the Network
- Issue:
  - What is the 'Best' Way to Deploy Applications Across the Network of Available Resources?
Distributed Application & Hardware Nodes
- Computers & Connections May have Different Characteristics that Affect their Usage
  - Speed
  - Storage
  - Bandwidth
Objective: 'Best' Deployment
- A Distributed System is Optimally Deployed if it Yields the Best Performance:
  - Efficient Use of Resources via Throughput, Response Time, or Number of Messages
- What are Implications in BMI?
  - Need to Bring Together Multiple Assets
  - Work Efficiently Across Network
  - Unifying Clinical Research Repositories
Distr. Systems: Combo of Requirements
[Figure: Specification at the center, combining interaction patterns, software elements, hardware elements, interfaces, connections, and protocols]
Deployment Influenced by Many Factors
[Figure: Performance at the center, influenced by algorithms, software architecture, underlying network, replication degree, processing nodes, usage patterns, middleware, and deployment]
Framework for Design and Deployment
[Figure: SOFTWARE and HARDWARE linked by Dependencies and Deployment, leading to PERFORMANCE]
What is I5?
- Five Definition Languages
  - Interface
  - Implementation
  - Integration
  - Instantiation
  - Installation
- Five Formal Integrated Graphical Languages Based on UML's Implementation Diagrams
- The Application, Network, Dependencies and the Deployment are Part of an Integrated Framework
The Five Levels of I5
- Interface (I1) - Types of Components, Nodes and Connectors
- Implementation (I2) - Classes of Components, Nodes and Connectors
- Integration (I3) - Dependencies Between Component and Node Classes
- Instantiation (I4) - Instances of Each Class Definition
- Installation (I5) - Deployment of Each Instance (Requirements and Complete Deployment)
[Figure: the levels arranged along an axis from Abstraction to Detail]
Levels of Specification in I5
- Types - Generic Definition of Components, Nodes, and Connectors According to Their Role
  - Defined in I1
  - Used in I2 to Define Classes
- Classes - Different Implementations of the Types
  - Defined in I2
  - Used in I3 to Associate Software Components and Hardware Artifacts and I4 to Define Instances
- Instances - Identical Copies of the Different Classes
  - Defined in I4
  - Used in I5 to Deploy Instances Across Nodes
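The type, class, and instance chain can be sketched as three small records. This is only an illustrative data model and not the I5 notation itself, which the slides describe as specified in Z or UML. The record names and the `instantiate` helper are hypothetical. The test's assumption that class `PCCtrCl` realizes type `Client` is a guess made for the example.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class ComponentType:
    """I1: generic definition of a component according to its role."""
    name: str


@dataclass(frozen=True)
class ComponentClass:
    """I2: one implementation (realization) of a component type."""
    name: str
    realizes: ComponentType


@dataclass(frozen=True)
class ComponentInstance:
    """I4: an identified copy of a component class."""
    ident: str
    cls: ComponentClass


def instantiate(cls: ComponentClass, ident: str, defined_classes: Set[ComponentClass]) -> ComponentInstance:
    """Create an instance, allowing only classes already defined at the class level."""
    if cls not in defined_classes:
        raise ValueError(f"class {cls.name} is not defined at the class level")
    return ComponentInstance(ident, cls)
```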
UML
- UML is a Set of Graphical Specification Languages (OMG's Standard Design Language Since November, 1997)
- Implementation Diagrams
  - Component Diagrams:
    - Show the Physical Structure of the Code in Terms of Code Components and Their Dependencies
  - Deployment Diagrams:
    - Show the Physical Architecture of the Hardware and Software in the System.
    - They Have a Type and an Instance Version.
UML
- When to Use Deployment Diagrams
  "... In practice, I haven't seen this kind of diagram used much. Most people do draw diagrams to show this kind of information but they are informal cartoons. On the whole, I don't have a problem with that since each system has its own physical characteristics that you want to emphasize. As we wrestle more and more with distributed systems, however, I'm sure we will require more formality as we understand better which issues need to be highlighted in deployment diagrams."
  - From "UML Distilled: Applying the Standard Object Modeling Language", by Martin Fowler. Addison-Wesley, Object Technology Series, 7th Reprint, June 1998.
Pros and Cons of Graphical Modeling
- Advantages:
  - Clear to Show Structure
  - Excellent Communication Vehicle
  - Addresses Different Aspects of Modeling in an Integrated Fashion
- Disadvantages:
  - Shows Little (or No) Details
  - There is a Big Gap Between Specification and Implementation
  - Limited by Screen Size & Printable Page
- Solution: Associate a Complete Textual Specification to Graphical Model that Contains the Necessary Details for Each Element
Design Concepts
- Interface: Interaction With the Outer World - Signature + Requested Services
- Type: Abstract Entity - Interface + Semantics
- Subtype: Inherits the Supertype Definition
- Class: Implementation of a Type
- Realization: Relation Between a Type and a Class That Implements It
- Subclass: Inherits the Superclass Implementation
- Instance: Element of a Class
The I5 Framework
- An Integrated Specification Framework for Distributed Systems
  - Support for the Architectural Specification of OO and Component Based Distributed Systems
  - Heterogeneous Network - Platforms
- A Five Level Framework for Defining Software and Hardware (Platforms) With a Uniform Notation and With Different Levels of Abstraction
- Specified Textually in Z or Graphically in UML
  - Emphasis on Implementation Diagrams
- Please See http://www.engr.uconn.edu/~cecilia
Dependencies Between Levels
[Figure: dependencies among the five levels]
- INTERFACE: Component Types, Node Types
- IMPLEMENTATION: Component Classes, Node Classes
- INTEGRATION: Implementation Dependencies
- INSTANTIATION: Inst. Components, Inst. Nodes, System Instantiation
- INSTALLATION: Installation Req. (together, separated), Installation Req. (fix location), Complete Installation
Interface - Software: I1S
- Component Types
  - Type
  - Supertypes
  - Associated Interfaces
  - Calls
- Properties
  - Types are Unique
  - Supertypes Must Be Part of I1S
  - Calls Must Be Satisfied in I1S
Interface - Software: I1S
[Figure: component type diagram; legible labels: response, Client]
Interface - Hardware: I1H
- Node Types
- Connector Types
- Connections
- Properties
  - All Node Types Must Be Connected
  - Only Node and Connector Types Defined Take Part in the Connections
[Figure: connector types MPI and Sockets; node types SUN and Intel Pentium]
Implementation - Software: I2S
- Component Classes
  - Component Type
  - Class
  - Superclasses
  - Calls to Classes Interfaces
- Properties:
  - Only Types in I1S are Allowed
  - Superclasses Are Realizations of the Supertypes
  - Calls & Inheritance are Satisfied Within I2S
Implementation - Software: I2S
[Figure: component class diagram; legible labels: PCCtrCl, response, XCtrCl]
Implementation - Hardware: I2H
- Node Classes
  - Node Type
  - Class
- Connector Classes
  - Type
  - Class
- Connections Between Node Classes
- Properties
  - Node and Connector Classes Refine the Types in I1H
  - Connections are With Connector Classes That Refine Connector Types in I1H
Implementation - Hardware: I2H
[Figure: node and connector class diagram; legible labels: MPI, Sockets, SUN]
Software and Hardware Integration: I3
- Relation <
Software and Hardware Integration: I3
[Figure: integration diagram; legible labels: response, XCtrCl]
Instantiation - Software: I4S
- Component Instances
  - Class
  - Identification
  - Calls
- Properties
  - Instance Calls Refine Class Calls
  - Only Classes in I2S May Be Instantiated
Instantiation - Software: I4S
[Figure: component instances c1: PCCtrCl, c2: PCCtrCl, c3: PCCtrCl, c4: XCtrCl, fe1: XFrontEnd, fe2: XFrontEnd, and ct1: Counter through ct6: Counter, linked by request, response, receive, and gossip calls]
Instantiation - Hardware: I4H
- Node Instances
  - Class
  - Identification
- Connector Instances
  - Class
  - Identification
  - Set of Connected Nodes
- Properties
  - There are Only Instances of the Node & Connector Classes Defined in I2H
  - Connectors Refine I2H Connections
Instantiation - Hardware: I4H
[Figure: node instances pc1: Win95 through pc4: Win95 and sun1: SunOS 4.1.4 through sun10: SunOS 4.1.4; connector instances sock1 through sock4 and mpi1]
Installation Requirements
- A Set of Component Instances Must Be Deployed Together or Separated
- Fix the Location of Some Component Instances
- All Installation Requirements Must Be Consistent With the Requirements Imposed by All the Previous Specification Levels
- Requirements
  - Together
  - Separated
  - Fix
Installation - Requirements: Ifix, Iseparated
[Figure: fix requirements involving fe1: XFrontEnd, fe2: XFrontEnd, sun2: SunOS 4.1.4, and sun3: SunOS 4.1.4, with request and receive calls]
separated = {ct1: Counter, ct2: Counter, ct3: Counter, ct4: Counter, ct5: Counter, ct6: Counter}
Mapping Applications to Hardware
- Applications (Left) and Hardware (Right) Instances
- Restrictions on
  - Which Applications can be Deployed on Which Hardware?
  - Which Applications Deployed Together?
  - Which Applications Must be Separate?
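The three kinds of installation requirements (together, separated, fix) can be checked against a candidate mapping of instances to nodes. This is a hedged sketch. The function `check_deployment` and its return format are hypothetical. The instance and node names in the test follow the figures, but the particular placements are made up for the example.

```python
from typing import Dict, Iterable, List, Optional, Sequence, Tuple


def check_deployment(
    placement: Dict[str, str],
    together: Iterable[Sequence[str]] = (),
    separated: Iterable[Sequence[str]] = (),
    fixed: Optional[Dict[str, str]] = None,
) -> List[Tuple]:
    """Return the requirements that a placement (instance -> node) violates."""
    violations: List[Tuple] = []
    for group in together:
        # All members of a "together" group must share one node.
        if len({placement[i] for i in group}) > 1:
            violations.append(("together", tuple(group)))
    for group in separated:
        # No two members of a "separated" group may share a node.
        nodes = [placement[i] for i in group]
        if len(set(nodes)) < len(nodes):
            violations.append(("separated", tuple(group)))
    for instance, node in (fixed or {}).items():
        # A "fix" requirement pins an instance to one node.
        if placement[instance] != node:
            violations.append(("fix", instance))
    return violations
```

A checker like this only says whether a mapping is legal. Finding the best legal mapping is a separate search problem that also needs a performance model.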
Objective: 'Best' Optimal Deployment
[Figure: optimal deployment of application instances onto hardware instances]
Using I5 for BMI
- Focus at Architectural Level
  - Multiple Assets to Bring Together
    - Hospital EMRs, Provider EMRs, Other Systems
  - Multiple and Disparate Hardware
  - Different Contexts and Needs
    - Clinical Practice - (Near) Real-Time Integration/Access
    - Clinical Research - De-Identified Integrated Repository
- Performance will be Key Issue
  - Clinical Practice - Time of Access
  - Clinical Research - Volume of Information
    - Some Genomic Data Requires Terabytes of Data!
    - Information Overload Possible
The Next Big Challenge
- Macro-Architectures
  - System of Systems
  - Application of Applications
- Involves Two Key Issues
  - Interoperability
    - Heterogeneous Distributed Databases
    - Heterogeneous Distributed Systems
    - Autonomous Applications
  - Scalability
    - Rapid and Continuous Growth
    - Amount of Data
    - Variety of Data Types
    - Different Privacy Levels or Ownerships of Data
Interoperability: A Classic View
[Figure labels: Simple Federation; Multiple Nested Federation; FDB 1; FDB 3; FDB 4; Global Schema; Federated Integration; Local Schema; Federation]
What is CORBA?
- Differs from Typical Programming Languages
- Objects can be ...
  - Located Throughout Network
  - Interoperate with Objects on other Platforms
  - Written in Any PLs for which there is Mapping from IDL to that Language
What is CORBA?
- Allow Interactions from Client to Server
- CORBA Installed on All Participating Machines
CORBA-Based Development
[Figure: an IDL file passes through the IDL Compiler to produce a Stub for the Client Application and a Skeleton for the Object Implementation; both sides communicate over ORB/IIOP]
Database Interoperability in the Internet
- Technology
  - Web/HTTP, JDBC/ODBC, CORBA (ORBs + IIOP), XML
- Architecture
  - Information Broker
    - Mediator-Based Systems
    - Agent-Based Systems
ORB Integration: Java Client + Legacy Application
[Figure: Java Client and Legacy Application with Java Wrapper, connected through an Object Request Broker (ORB)]
- CORBA is the Medium of Info. Exchange
- Requires Java/CORBA Capabilities
Java Client with Wrapper to Legacy Application
[Figure: Java Client containing Java Application Code and a WRAPPER with Mapping Classes (JAVA LAYER), Native Functions in C++ and RPC Client Stubs in C (NATIVE LAYER), connected over the Network to the Legacy Application]
- Interactions Between Java Client and Legacy Appl. via C and RPC
- C is the Medium of Info. Exchange
- Java Client with C++/C Wrapper
COTS and Legacy Appls. to Java Clients CSE 5095 COTS Application Legacy Application Java Application Code Native Functions that Map to COTS Appl NATIVE LAYER Native Functions that Map to Legacy Appl NATIVE LAYER JAVA LAYER Mapping Classes JAVA NETWORK WRAPPER Network Java Client Java is Medium of Info. Exchange - C/C++ Appls with Java Wrappers SWEA 101
Java Client to Legacy App via RDBS CSE 5095 Transformed Legacy Data Java Client Updated Data Relational Database System(RDS) Extract and Generate Data Transform and Store Data Legacy Application SWEA 102
JDBC m CSE 5095 m JDBC API Provides DB Access Protocols for Open, Query, Close, etc. Different Drivers for Different DB Platforms JDBC API Java Application Driver Manager Driver Oracle Driver Access Driver Sybase SWEA 103
Connecting a DB to the Web
- Web Servers are Stateless
- DB Interactions Tend to be Stateful
- Invoking a CGI Script on Each DB Interaction is Very Expensive, Mainly Due to the Cost of DB Open
[Figure: Browser, Internet, Web Server, CGI Script Invocation or JDBC Invocation, DBMS]
SWEA 104
Connecting More Efficiently m CSE 5095 DBMS Helper Processes CGI Script or JDBC Invocation m Web Server Internet m To Avoid Cost of Opening Database, One can Use Helper Processes that Always Keep Database Open and Outlive Web Connection Newly Invoked CGI Scripts Connect to a Preexisting Helper Process System is Still Stateless Browser SWEA 105
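The helper-process idea can be sketched in Python. This is only an illustration under assumptions: SQLite stands in for the DBMS, the `DbHelper` class and the `visits` table are hypothetical names, and each call to `query` plays the role of a newly invoked CGI script attaching to the preexisting helper.

```python
import sqlite3


class DbHelper:
    """Hypothetical long-lived helper: opens the database once, serves many requests."""

    def __init__(self, path=":memory:"):
        self._path = path
        self._conn = None
        self.opens = 0  # how many times the (expensive) DB open actually happened

    def _connection(self):
        # Open lazily on first use, then keep the connection alive.
        if self._conn is None:
            self._conn = sqlite3.connect(self._path)
            self.opens += 1
        return self._conn

    def query(self, sql, params=()):
        # Each "CGI invocation" reuses the already-open connection.
        return self._connection().execute(sql, params).fetchall()


helper = DbHelper()
helper.query("CREATE TABLE visits (patient TEXT, bp INTEGER)")
helper.query("INSERT INTO visits VALUES (?, ?)", ("p1", 120))

# Three separate requests, one database open.
rows = [helper.query("SELECT bp FROM visits WHERE patient = ?", ("p1",))
        for _ in range(3)]
```

The web tier stays stateless in this sketch: no request depends on an earlier one, only the helper holds the open connection.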
DB-Internet Architecture CSE 5095 WWW Client (Netscape) WWW client (Info. Explore) WWW Client (Hot. Java) Internet HTTP Server DBWeb Gateway DBWeb Dispatcher DBWeb Gateway SWEA 106
Biomedical Architectures m CSE 5095 m Transcend Normal Two, Three, and Four Tier Solutions – Macro-Architecture An Architecture of Architectures! q Need to Integrate Systems that are Themselves Multi-Tier and Distributed q Need to Resolve Data Ownership Issues Ø State of Connecticut Agencies Don’t Share Ø Competing Hospitals Seek to Protect Market Share q T 1, T 2, and Clinical Research Requires Ø Interoperating Genomic Databases/Supercomputers Ø Integration of De-identified Patient Data from Multiple Sources to Allow Sufficient Study Samples Ø De-identified Data Repositories or Data Marts q Dealing with Ownership Issues (DNA Research) SWEA 107
Consider Team Project Architecture Providers Patients CSE 5095 PHR EMR Web-Based Portal(XML + HL 7) Open Source DB (XML or My. SQL) Feedback Repository Clinical Researchers Education Materials SWEA 108
Internet and the Web m CSE 5095 A Major Opportunity for Business q A Global Marketplace Ø Business Across State and Country Boundaries q A Way of Extending Services Ø Online Payment vs. VISA, Mastercard q A Medium for Creation of New Services Ø Publishers, Travel Agents, Teller, Virtual Yellow Pages, Online Auctions … m m A Boon for Academia q Research Interactions and Collaborations q Free Software for Classroom/Research Usage q Opportunities for Exploration of Technologies in Student Projects What are Implications for BMI? Where is the Adv? SWEA 109
WWW: Three Market Segments Server CSE 5095 Business to Business Corporate Network q q q Server Intranet q q Decision support Mfg. . System monitoring corporate repositories Workgroups Internet Corporate Server Network Internet q q Provider Network Information sharing Ordering info. /status Targeted electronic commerce Sales Marketing Information Services Server Provider Network Exposure to Outside SWEA 110
Information Delivery Problems on the Net
- Everyone can Publish Information on the Web Independently at Any Time
  - Consequently, there is an Information Explosion
  - Identifying Information Content is More Difficult
- There are too Many Search Engines but too Few Capable of Returning High Quality Data
- Most Search Engines are Useful for Ad-hoc Searches but Awkward for Tracking Changes
- What are Information Delivery Issues for BMI?
  - Publishing of Patient Education Materials
  - Publishing of Provider Education Materials
  - How Can Patients/Providers Find what They Need?
  - How do They Know if it is Relevant? Reputable?
SWEA 111
Example Web Applications
- Scenario 1: World Wide Wait
  - A Major Event is Underway and the Latest, Up-to-the-Minute Results are Being Posted on the Web
  - You Want to Monitor the Results for this Important Event, so you Fire up your Trusty Web Browser, Point it at the Result Posting Site, and Wait, and Wait …
- What is the Problem?
  - The Scalability Problems are the Result of a Mismatch Between the Data Access Characteristics of the Application and the Technology Used to Implement the Application
- May not be Relevant to BMI: Hard to Apply Scenario
SWEA 112
Example Web Applications CSE 5095 m m m Scenario 2: q Many Applications Today have the Need for Tracking Changes in Local and Remote Data Sources and Notifying Changes If Some Condition Over the Data Source(s) is Met q To Monitor Changes on Web, You Need to Fire Your Trusty Web Browser from Time to Time, Cache the Most Recent Result, and Difference Manually Each Time You Poll the Data Source(s) Issue: Pure Pull is Not the Answer to All Problems BMI: If a Patient Enters Data that Sets off a Chain Reaction, how Can Provider be Notified and in Turn the Provider Notify the Patient (Bad Health Event) SWEA 113
What is the Problem? m CSE 5095 m Applications are Asymmetric but the Web is Not q Computation Centric vs. Information Flow Centric Type of Asymmetry q Network Asymmetry Ø Satellite, CATV, Mobile Clients, Etc. q Client to Server Ratio Ø Too Many Clients can Swamp Servers q Data Volume Ø Mouse and Key Click vs. Content Delivery q Update and Information Creation Ø Clients Need to be Informed or Must Poll m Clearly, for BMI, Simple Web Environment/Browser is Not Sufficient – No Auto-Notification SWEA 114
What are Information Delivery Styles? m CSE 5095 m m Pull-Based System q Transfer of Data from Server to Client is Initiated by a Client Pull q Clients Determine when to Get Information q Potential for Information to be Old Unless Client Periodically Pulls Push-Based System q Transfer of Data from Server to Client is Initiated by a Server Push q Clients may get Overloaded if Push is Too Frequent Hybrid q Pull and Push Combined q Pull First and then Push Continually SWEA 115
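The difference between pull and push delivery can be sketched in a few lines of Python. The class names (`Server`, `PullClient`, `PushClient`) and the blood-pressure readings are hypothetical, chosen only to illustrate the two styles described above.

```python
class Server:
    """Holds one value; supports both pull (read) and push (registered callbacks)."""

    def __init__(self):
        self.value = None
        self.version = 0
        self._listeners = []

    def register(self, callback):
        # Push style: the client hands the server a callback once.
        self._listeners.append(callback)

    def update(self, value):
        self.value = value
        self.version += 1
        for callback in self._listeners:
            callback(value)  # transfer initiated by the server

    def read(self):
        # Pull style: transfer initiated by the client.
        return self.version, self.value


class PullClient:
    def __init__(self, server):
        self.server = server
        self.seen_version = 0
        self.cache = None

    def poll(self):
        """Fetch the current value; report whether it changed since the last poll."""
        version, value = self.server.read()
        changed = version != self.seen_version
        self.seen_version, self.cache = version, value
        return changed


class PushClient:
    def __init__(self, server):
        self.received = []
        server.register(self.received.append)


server = Server()
puller = PullClient(server)
pusher = PushClient(server)

server.update("BP 150/95")
server.update("BP 160/100")

first_poll = puller.poll()   # sees only the latest value
second_poll = puller.poll()  # nothing new since the last poll
```

The pull client misses the intermediate reading because it polled only after both updates, while the push client receives every update and could be overloaded if updates were frequent.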
Publish/Subscribe CSE 5095 m m m Semantics: Servers Publish/Clients Subscribe q Servers Publish Information Online q Clients Subscribe to the Information of Interest (Subscription-based Information Delivery) q Data Flow is Initiated by the Data Sources (Servers) and is Aperiodic q Danger: Subscriptions can Lead to Other Unwanted Subscriptions Applications q Unicast: Database Triggers and Active Databases q 1 -to-n: Online News Groups May work for Clinical Researcher to Provider Push SWEA 116
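A topic-based publish/subscribe broker can be sketched as follows. The `Broker` class, the topic names, and the messages are hypothetical; the sketch only illustrates the 1-to-n, source-initiated data flow described above, for example a clinical researcher publishing to subscribed providers.

```python
from collections import defaultdict


class Broker:
    """Hypothetical in-memory broker: servers publish, clients subscribe by topic."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Deliver message to every subscriber of topic; return how many received it."""
        receivers = self._subscribers[topic]
        for callback in receivers:
            callback(message)
        return len(receivers)


broker = Broker()
provider_a, provider_b = [], []

# Two providers subscribe to study results; only one follows drug alerts.
broker.subscribe("study-results", provider_a.append)
broker.subscribe("study-results", provider_b.append)
broker.subscribe("drug-alerts", provider_a.append)

delivered = broker.publish("study-results", "New statin dosage findings")
unheard = broker.publish("education", "No subscribers for this topic")
```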
Design Options for Nodes m CSE 5095 Three Types of Nodes: q Data Sources Ø Provide Base Data which is to be Disseminated q Clients Ø Who are the Net Consumers of the Information q Information Brokers Ø Acquire Information from Other Data Sources, Add Value to that Information and then Distribute this Information to Other Consumers Ø By Creating a Hierarchy of Brokers, Information Delivery can be Tailored to the Need of Many Users m Brokers may be Ideal Intermediaries for BMI! q Act on Behalf of Patients, Providers q Incorporate Secure Access SWEA 117
Research Challenges m CSE 5095 Ubiquitous/Pervasive Many computers and information appliances everywhere, networked together m Inherent Complexity: q Coping with Latency (Sometimes Unpredictable) q Failure Detection and Recovery (Partial Failure) q Concurrency, Load Balancing, Availability, Scale q Service Partitioning q Ordering of Distributed Events “Accidental” Complexity: q Heterogeneity: Beyond the Local Case: Platform, Protocol, Plus All Local Heterogeneity in Spades. q Autonomy: Change and Evolve Autonomously q Tool Deficiencies: Language Support (Sockets, rpc), Debugging, Etc. SWEA 118
Infosphere
- Problem: too many sources, too much information
[Figure: Internet: Information Jungle → Clean, Reliable, Timely Information, Anywhere. Labels: Sensors, Infopipes, Digital Earth, Continual Queries, Microfeedback, Personalized Filtering & Info. Delivery, specialization, Adaptation, Resource, Property Mgmt, Information Quality]
SWEA 119
Current State-of-Art CSE 5095 Web Server Mainframe Database Server Thin Client SWEA 120
Infosphere Scenario – for BMI CSE 5095 Infotaps & Fat Clients Sensors Variety of Servers Many sources Database Server SWEA 121
Heterogeneity and Autonomy m CSE 5095 Heterogeneity: q How Much can we Really Integrate? q Syntactic Integration Ø Different Formats and Models Ø Web/SQL Query Languages q Semantic Interoperability Ø Basic Research on Ontology, Etc m Autonomy q No Central DBA on the Net q Independent Evolution of Schema and Content q Interoperation is Voluntary q Interface Technology (Support for Isvs) Ø DCOM: Microsoft Standard Ø CORBA, Etc. . . SWEA 122
Security and Data Quality m CSE 5095 Security q System Security in the Broad Sense q Attacks: Penetrations, Denial of Service q System (and Information) Survivability Ø Security Fault Tolerance Ø Replication for Performance, Availability, and Survivability m Data Quality q Web Data Quality Problems Ø Local Updates with Global Effects Ø Unchecked Redundancy (Mutual Copying) Ø Registration of Unchecked Information Ø Spam on the Rise SWEA 123
Legacy Data Challenge m CSE 5095 m Legacy Applications and Data q Definition: Important and Difficult to Replace q Typically, Mainframe Mission Critical Code q Most are OLTP and Database Applications Evolution of Legacy Databases q Client-server Architectures q Wrappers q Expensive and Gradual in Any Case SWEA 124
Potential Value Added/Jumping on Bandwagon m CSE 5095 m m Sophisticated Query Capability q Combining SQL with Keyword Queries Consistent Updates q Atomic Transactions and Beyond But Everything has to be in a Database! q Only If we Stick with Classic DB Assumptions Relaxing DB Assumptions q Interoperable Query Processing q Extended Transaction Updates Commodities DB Software q A Little Help is Still Good If it is Cheap q Internet Facilitates Software Distribution q Databases as Middleware SWEA 125
Data Warehousing and Data Mining
- Data Warehousing
  - Provide Access to Data for Complex Analysis, Knowledge Discovery, and Decision Making
  - Underlying Infrastructure in Support of Mining
  - Provides Means to Interact with Multiple DBs
  - OLAP (On-Line Analytical Processing) vs. OLTP
- Data Mining
  - Discovery of Information in Vast Data Sets
  - Search for Patterns and Common Features
  - Discover Information not Previously Known
    - Medical Records Accessible Nationwide
    - Research/Discover Cures for Rare Diseases
  - Relies on Knowledge Discovery in DBs (KDD)
SWEA 126
Data Warehousing and OLAP
- A Data Warehouse
  - Database is Maintained Separately from an Operational Database
  - “A Subject-Oriented, Integrated, Time-Variant, and Non-Volatile Collection of Data in Support of Management’s Decision Making Process” [W. H. Inmon]
- OLAP (On-Line Analytical Processing)
  - Analysis of Complex Data in the Warehouse
  - Attempt to Attain “Value” through Analysis
  - Relies on Trained and Skilled Knowledge Workers who Discover Information
- Data Mart
  - Organized Data for a Subset of an Organization
  - Establish De-Identified Marts for BMI Research
SWEA 127
Building a Data Warehouse m CSE 5095 Option 1 q Leverage Existing Repositories q Collate and Collect q May Not Capture All Relevant Data m Option 2 q Start from Scratch q Utilize Underlying Corporate Data Corporate data warehouse Option 1: Consolidate Data Marts Option 2: Build from scratch Data Mart . . . Data Mart Corporate data SWEA 128
BMI – Partition/Excerpt Data Warehouse m CSE 5095 m Clinical and Epidemiological Research (and for T 2 and T 1) Each Study Submitted to Institutional Review Board (IRB) q For Human Subjects (Assess Risks, Protect Privacy) q See: http: //resadm. uchc. edu/hspo/irb/ To Satisfy IRB (and Privacy, Security, etc. ), Reverse Process to Create a Data Mart for each Approved Study q Export/Excerpt Study Data from Warehouse q May be Single or Multiple Sources BMI data warehouse Data Mart . . . Data Mart SWEA 129
Data Warehouse Characteristics
- Utilizes a “Multi-Dimensional” Data Model
- Warehouse Comprised of
  - Store of Integrated Data from Multiple Sources
  - Processed into Multi-Dimensional Model
- Warehouse Supports
  - Time Series and Trend Analysis
  - “Super-Excel” Integrated with DB Technologies
- Data is Less Volatile than Regular DB
  - Doesn’t Dramatically Change Over Time
  - Updates at Regular Intervals
  - Specific Refresh Policy Regarding Some Data
SWEA 130
Three Tier Architecture
[Figure: Operational databases and External data sources → Extract, Transform, Load, Refresh (with monitor, integrator, metadata) → Data Warehouse and Data marts → serve → OLAP Server → Summarization report, Query report, Data mining]
SWEA 131
Data Warehouse Design
- Most Data Warehouses use a Star Schema to Represent the Multi-Dimensional Data Model
- Each Dimension is Represented by a Dimension Table whose Columns are the Attributes of that Dimension
- A Fact Table Connects All Dimension Tables with a Multiple Join
  - Each Tuple in the Fact Table Represents One Recorded Fact
  - Each Tuple in the Fact Table Consists of a Pointer to Each of the Dimension Tables, which Provide its Multidimensional Coordinates, and Stores the Measures for those Coordinates
  - Links Between the Fact Table and the Dimension Tables Form a Shape Like a Star
SWEA 132
What is a Multi-Dimensional Data Cube? m CSE 5095 m m m Representation of Information in Two or More Dimensions Typical Two-Dimensional - Spreadsheet In Practice, to Track Trends or Conduct Analysis, Three or More Dimensions are Useful For BMI – Axes for Diagnosis, Drug, Subject Age SWEA 133
Multi-Dimensional Schemas
- Supporting Multi-Dimensional Schemas Requires Two Types of Tables:
  - Dimension Table: Tuples of Attributes for Each Dimension
  - Fact Table: Measured/Observed Variables with Pointers into the Dimension Tables
- Star Schema
  - Characterizes Data Cubes by having a Single Fact Table with One Dimension Table for Each Dimension
- Snowflake Schema
  - Dimension Tables from Star Schema are Organized into a Hierarchy via Normalization
- Both Represent Storage Structures for Cubes
SWEA 134
Example of Star Schema
- Sale Fact Table: Date, Product, Store, Customer, Unit_Sales, Dollar_Sales
- Date Dimension: Date, Month, Year
- Product Dimension: ProductNo, ProdName, ProdDesc, Category
- Store Dimension: StoreID, City, State, Country, Region
- Customer Dimension: CustID, CustName, CustCity, CustCountry
SWEA 135
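A cut-down version of the sales star schema can be tried out with SQLite from Python. This is a sketch under assumptions: only the Product and Store dimensions are included, column names are lower-cased variants of the slide's names, and all rows and amounts are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (product_no INTEGER PRIMARY KEY, prod_name TEXT, category TEXT);
CREATE TABLE store   (store_id   INTEGER PRIMARY KEY, city TEXT, region TEXT);
-- Fact table: one row per sale, with a pointer into each dimension table.
CREATE TABLE sale (
    product_no   INTEGER REFERENCES product(product_no),
    store_id     INTEGER REFERENCES store(store_id),
    sale_date    TEXT,
    unit_sales   INTEGER,
    dollar_sales REAL
);
""")
conn.executemany("INSERT INTO product VALUES (?, ?, ?)",
                 [(1, "pants", "clothing"), (2, "diapers", "baby")])
conn.executemany("INSERT INTO store VALUES (?, ?, ?)",
                 [(10, "Rolla", "Central"), (20, "NY", "East")])
conn.executemany("INSERT INTO sale VALUES (?, ?, ?, ?, ?)",
                 [(1, 10, "1999-07-03", 2, 50.0),
                  (1, 20, "1999-02-12", 1, 25.0),
                  (2, 10, "1999-07-30", 3, 30.0)])

# The "multiple join": the fact table joined to each dimension, then aggregated.
totals = conn.execute("""
    SELECT s.region, p.prod_name, SUM(f.dollar_sales)
    FROM sale f
    JOIN product p ON p.product_no = f.product_no
    JOIN store   s ON s.store_id   = f.store_id
    GROUP BY s.region, p.prod_name
    ORDER BY s.region, p.prod_name
""").fetchall()
```

A snowflake variant would normalize a dimension further, for example by moving `region` out of `store` into its own table referenced by a key.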
Example of Star Schema for BMI
- Patient Fact Table: Visit Date, Vitals, Symptoms, Patient, Medications, Etc.
- Date Dimension: Date, Month, Year
- Vitals Dimension: BP, Temp, Resp, HR (Pulse)
- Symptoms Dimension: Pulmonary, Heart, Mus-Skel, Skin, Digestive
- Patient Dimension: PatientID, PatientName, PatientCity, PatientCountry
- Medications: Reference another Star Schema for all Meds
SWEA 136
A Second Example of Star Schema … CSE 5095 SWEA 137
and Corresponding Snowflake Schema CSE 5095 SWEA 138
Data Warehouse Issues
- Data Acquisition
  - Extraction from Heterogeneous Sources
  - Reformatted into Warehouse Context - Names, Meanings, Data Domains Must be Consistent
  - Data Cleaning for Validity and Quality: Is the Data as Expected w.r.t. Content? Value?
  - Transition of Data into Data Model of Warehouse
  - Loading of Data into the Warehouse
- Other Issues Include:
  - How Current is the Data? Frequency of Update?
  - Availability of Warehouse? Dependencies of Data?
  - Distribution, Replication, and Partitioning Needs?
  - Loading Time (Clean, Format, Copy, Transmit, Index Creation, etc.)?
  - For CTSA – Data Ownership (Competing Hospitals)
SWEA 139
Knowledge Discovery
- Data Warehousing Requires Knowledge Discovery to Organize/Extract Information Meaningfully
- Knowledge Discovery
  - Technology to Extract Interesting Knowledge (Rules, Patterns, Regularities, Constraints) from a Vast Data Set
  - Process of Non-trivial Extraction of Implicit, Previously Unknown, and Potentially Useful Information from a Large Collection of Data
- Data Mining
  - A Critical Step in the Knowledge Discovery Process
  - Extracts Implicit Information from a Large Data Set
SWEA 140
Steps in a KDD Process m CSE m 5095 m m m m Learning the Application Domain (goals) Gathering and Integrating Data Cleaning Data Integration Data Transformation/Consolidation Data Mining q Choosing the Mining Method(s) and Algorithm(s) q Mining: Search for Patterns or Rules of Interest Analysis and Evaluation of the Mining Results Use of Discovered Knowledge in Decision Making Important Caveats q This is Not an Automated Process! q Requires Significant Human Interaction! SWEA 141
OLAP Strategies
- OLAP Strategies
  - Roll-Up: Summarization of Data
  - Drill-Down: from the General to the Specific (Details)
  - Pivot: Cross Tabulate the Data Cubes
  - Slice and Dice: Projection Operations Across Dimensions
  - Sorting: Ordering Result Sets
  - Selection: Access by Value or Value Range
- Implementation Issues
  - Persistent with Infrequent Updates (Loading)
  - Optimization for Performance on Queries is More Complex - Across Multi-Dimensional Cubes
  - Recovery Less Critical - Mostly Read Only
  - Temporal Aspects of Data (Versions) Important
SWEA 142
On-Line Analytical Processing
- Data Cube
  - A Multidimensional Array
  - Each Attribute is a Dimension
- In the Example Below, the Data Must be Interpreted so that it Can be Aggregated by Region/Product/Date

Product      Store      Date
acron        Rolla, MO  7/3/99
budwiser     LA, CA     5/22/99
large pants  NY, NY     2/12/99
3’ diaper    Cuba, MO   7/30/99
Sale amounts shown on the slide: 833.92, 771.24, 325.24, 81.99

[Figure: Data cube with axes Product (Pants, Diapers, Beer, Nuts), Region (West, East, Central, Mountain, South), Date (Jan, Feb, March, April)]
SWEA 143
On-Line Analytical Processing
- For BMI – Imagine a Data Table with Patient Data
  - Define Axis
  - Summarize Data
  - Create Perspective to Match Research Goal
  - Essentially De-identified Data Mart

Patient  Med      BirthDate  Dosage
Steve    Lipitor  1/1/45
John     Zocor    2/2/55     80 mg
Harry    Crestor  3/3/65     5 mg
Lois     Lipitor  4/4/66     20 mg
Charles  Crestor  7/1/59     10 mg
Unplaced cell from the slide: 10 mg

[Figure: Data cube with axes Medication (Lescol, Crestor, Zocor, Lipitor), Dosage (5, 10, 20, 40, 80), Decade (1940s, 1950s, 1960s, 1970s)]
SWEA 144
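Turning the patient table into a de-identified cube can be sketched in Python. The sketch assumes the two-digit birth years fall in the 1900s (consistent with the Decade axis) and leaves out Steve's row because his dosage cell is not clearly recoverable from the slide text; the helper name `decade` is hypothetical.

```python
from collections import Counter

# (patient, medication, birth year, dosage in mg) from the slide's table.
patients = [
    ("John", "Zocor", 1955, 80),
    ("Harry", "Crestor", 1965, 5),
    ("Lois", "Lipitor", 1966, 20),
    ("Charles", "Crestor", 1959, 10),
]


def decade(year):
    """Generalize a birth year to its decade, e.g. 1955 -> '1950s'."""
    return f"{year // 10 * 10}s"


# Cube cells keyed by (medication, dosage, decade); patient names are dropped.
cells = Counter((med, dose, decade(year)) for _name, med, year, dose in patients)

# Summarize along one axis: how many patients per decade.
by_decade = Counter(decade(year) for _name, _med, year, _dose in patients)
```

Each cell holds only a count, so a researcher sees the medication, dosage, and decade perspective without any patient identifiers.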
Examples of Data Mining
- The Slicing Action
  - A Vertical or Horizontal Slice Across the Entire Cube
[Figure: Multi-Dimensional Data Cube of Sales with axes Products, Cities, Months; Slice on City Atlanta leaves Products by Months]
SWEA 145
Examples of Data Mining
- The Dicing Action
  - A Slice First Identifies One Dimension
  - A Selection of Any Cube within the Slice then Constrains All Three Dimensions
[Figure: Sales cube with axes Products, Cities, Months; Dice on Electronics and Atlanta, e.g. Electronics, Atlanta, March 2000]
SWEA 146
Examples of Data Mining
- Roll Up: Combines Multiple Dimensions, e.g., From Individual Cities to State
- Drill Down: Takes a Facet (e.g., Q1) and Decomposes it into Finer Detail
[Figure: Sales cube by Products, Location (city, GA: Columbus, Atlanta, Gainesville, Savannah), and quarter (Q1, Q2, Q3, Q4); Roll Up on Location (State, USA: California, Arizona, Georgia, Iowa); Drill Down on Q1 (Jan, Feb, March)]
SWEA 147
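The slice, dice, and roll-up actions can be sketched on a tiny cube held in a Python dictionary. The sales figures are invented, and the function names `dice` and `roll_up` are hypothetical; only the city-to-state mapping (the four Georgia cities) comes from the slides.

```python
from collections import defaultdict

DIMS = ("product", "city", "month")

# Cube cells: (product, city, month) -> sales. Values are made-up sample data.
cube = {
    ("Electronics", "Atlanta", "Jan"): 100,
    ("Electronics", "Atlanta", "Feb"): 120,
    ("Electronics", "Savannah", "Jan"): 80,
    ("Clothing", "Atlanta", "Jan"): 60,
    ("Clothing", "Columbus", "Feb"): 40,
}

CITY_TO_STATE = {"Atlanta": "GA", "Savannah": "GA", "Columbus": "GA", "Gainesville": "GA"}


def dice(cube, **fixed):
    """Keep cells matching every fixed dimension value; fixing one dimension is a slice."""
    wanted = {DIMS.index(dim): value for dim, value in fixed.items()}
    return {key: sales for key, sales in cube.items()
            if all(key[i] == value for i, value in wanted.items())}


def roll_up(cube, dim, mapping):
    """Replace each value of dim by its coarser level and sum the merged cells."""
    i = DIMS.index(dim)
    out = defaultdict(int)
    for key, sales in cube.items():
        out[key[:i] + (mapping[key[i]],) + key[i + 1:]] += sales
    return dict(out)


atlanta_slice = dice(cube, city="Atlanta")
electronics_in_atlanta = dice(cube, product="Electronics", city="Atlanta")
by_state = roll_up(cube, "city", CITY_TO_STATE)
```

Drill-down is the reverse direction: it needs the finer-grained cells to still be stored, which is why this sketch keeps city-level data and derives the state level on demand.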
Mining Other Types of Data m CSE m 5095 Analysis and Access Dramatically More Complicated! Time Series Data for Glucose, BP, Peak Flow, etc. Spatial databases Multimedia databases World Wide Web Time series data Geographical and Satellite Data SWEA 148
Advantages/Objectives of Data Mining
- Descriptive Mining
  - Discover and Describe General Properties
  - 60% of People who Buy Beer on Friday also have Bought Nuts or Chips in the Past Three Months
- Predictive Mining
  - Infer Interesting Properties based on Available Data
  - People who Buy Beer on Friday usually also Buy Nuts or Chips
- Result of Mining
  - Order from Chaos
  - Mining Large Data Sets in Multiple Dimensions Allows Businesses, Individuals, etc. to Learn about Trends, Behavior, etc.
  - Impact on Marketing Strategy
SWEA 149
Data Mining Methods (1) m CSE 5095 Association q Discover the Frequency of Items Occurring Together in a Transaction or an Event q Example Ø 80% Customers who Buy Milk also Buy Bread Hence - Bread and Milk Adjacent in Supermarket Ø 50% of Customers Forget to Buy Milk/Soda/Drinks Hence - Available at Register m Prediction q Predicts Some Unknown or Missing Information based on Available Data q Example Ø Forecast Sale Value of Electronic Products for Next Quarter via Available Data from Past Three Quarters SWEA 150
Association Rules
- Motivated by Market Analysis
- Rules of the Form
  - Item_1 ^ Item_2 ^ … ^ Item_k → Item_k+1 ^ … ^ Item_n
- Example
  - “Beer ^ Soft Drink → Pop Corn”
- Problem: Discovering All Interesting Association Rules in a Large Database is Difficult!
  - Issues
    - Interestingness
    - Completeness
    - Efficiency
  - Basic Measurement for Association Rules
    - Support of the Rule
    - Confidence of the Rule
SWEA 151
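The two basic measurements can be computed directly. The sketch below uses the usual definitions, which the slide names but does not spell out: support is the fraction of transactions containing all items of the rule, and confidence is the support of the whole rule divided by the support of its left-hand side. The transactions are invented sample data.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in items."""
    items = set(items)
    return sum(items <= basket for basket in transactions) / len(transactions)


def confidence(transactions, lhs, rhs):
    """Of the transactions containing lhs, the fraction that also contain rhs."""
    return support(transactions, set(lhs) | set(rhs)) / support(transactions, lhs)


transactions = [
    {"beer", "soft drink", "popcorn"},
    {"beer", "soft drink", "popcorn"},
    {"beer", "soft drink"},
    {"beer", "chips"},
    {"milk", "bread"},
]

# Rule: beer ^ soft drink -> popcorn
rule_support = support(transactions, {"beer", "soft drink", "popcorn"})
rule_confidence = confidence(transactions, {"beer", "soft drink"}, {"popcorn"})
```

Here the rule holds in 2 of 5 transactions (support 0.4), and in 2 of the 3 transactions that contain both beer and a soft drink (confidence about 0.67).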
Data Mining Methods (2)
- Classification
  - Determine the Class or Category of an Object based on its Properties
  - Example
    - Classify Companies based on the Final Sale Results in the Past Quarter
- Clustering
  - Organize a Set of Multi-dimensional Data Objects in Groups to Minimize Inter-group Similarity and Maximize Intra-group Similarity
  - Example
    - Group Crime Locations to Find Distribution Patterns
SWEA 152
Classification
- Two Stages
  - Learning Stage: Construction of a Classification Function or Model
  - Classification Stage: Prediction of Classes of Objects Using the Function or Model
- Tools for Classification
  - Decision Tree
  - Bayesian Network
  - Neural Network
  - Regression
- Problem
  - Given a Set of Objects whose Classes are Known (Training Set), Derive a Classification Model which can Correctly Classify Future Objects
SWEA 153
An Example
- Attributes
- Class Attribute - Play/Don’t Play the Game
- Training Set
  - Values that Set the Condition for the Classification
  - What are the Patterns Below?

Attribute    Possible Values
outlook      sunny, overcast, rain
temperature  continuous
humidity     continuous
windy        true, false

Outlook   Temperature  Humidity  Windy  Play
sunny     85           85        false  No
overcast  83           78        true   Yes
sunny     80           90        false  No
sunny     72           95        …      No
sunny     72           70        …      Yes
…         …            …         …      …
SWEA 154
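The two classification stages can be illustrated on the five training rows above with a one-level decision tree (a decision stump). This is a deliberately small stand-in for a real decision-tree learner; the function names are hypothetical, and the windy column is left out because only part of it is legible in the slide text.

```python
# (outlook, temperature, humidity, play) for the five training rows on the slide.
rows = [
    ("sunny", 85, 85, "No"),
    ("overcast", 83, 78, "Yes"),
    ("sunny", 80, 90, "No"),
    ("sunny", 72, 95, "No"),
    ("sunny", 72, 70, "Yes"),
]
HUMIDITY = 2  # column index


def learn_stump(rows, col):
    """Learning stage: choose threshold t so that 'value <= t means Yes'
    misclassifies the fewest training rows. Returns (threshold, errors)."""
    best = None
    for t in sorted({row[col] for row in rows}):
        errors = sum(("Yes" if row[col] <= t else "No") != row[-1] for row in rows)
        if best is None or errors < best[1]:
            best = (t, errors)
    return best


def classify(value, threshold):
    """Classification stage: apply the learned rule to a new object."""
    return "Yes" if value <= threshold else "No"


threshold, training_errors = learn_stump(rows, HUMIDITY)
prediction = classify(75, threshold)
```

On these rows the pattern is that play happens when humidity is at most 78; a full decision tree would go on to test further attributes such as outlook within each branch.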
Data Mining Methods (3) m CSE 5095 Summarization q Characterization (Summarization) of General Features of Objects in the Target Class q Example Ø Characterize People’s Buying Patterns on the Weekend Ø Potential Impact on “Sale Items” & “When Sales Start” Ø Department Stores with Bonus Coupons m Discrimination q Comparison of General Features of Objects Between a Target Class and a Contrasting Class q Example Ø Comparing Students in Engineering and in Art Ø Attempt to Arrive at Commonalities/Differences SWEA 155
Summarization Technique
- Attribute-Oriented Induction
- Generalization using Concept Hierarchy (Taxonomy)

barcode  category    brand       content    size
14998    milk        dairyland   Skim       2L
12998    mechanical  MotorCraft  valve 23a  12in
…        …           …           …          …

[Figure: Concept hierarchy: food → milk (skim milk, 2% milk, …), bread (white bread, whole wheat, …); brands: Lucern, Dairyland, Wonder, Safeway, …]

Category  Content  Count
milk      skim     280
milk      2%       98
…         …        …
SWEA 156
Why is Data Mining Popular? m CSE 5095 Technology Push q Technology for Collecting Large Quantity of Data Ø Bar Code, Scanners, Satellites, Cameras q Technology for Storing Large Collection of Data Ø Databases, Data Warehouses Ø Variety of Data Repositories, such as Virtual Worlds, Digital Media, World Wide Web m m Corporations want to Improve Direct Marketing and Promotions - Driving Technology Advances q Targeted Marketing by Age, Region, Income, etc. q Exploiting User Preferences/Customized Shopping What is Potential for BMI? q How do you see Data Mining Utilized? q What are Key Issues to Worry About? SWEA 157
Requirements & Challenges in Data Mining
- Security and Social
  - What Information is Available to Mine?
  - Preferences via Store Cards/Web Purchases
  - What is Your Comfort Level with Trends?
- User Interfaces and Visualization
  - What Tools Must be Provided for End Users of Data Mining Systems?
  - How are Results for Multi-Dimensional Data Displayed?
- Performance Guarantees
  - Range from Real-Time for Some Queries to Long-Term for Other Queries
- Data Sources of Complex Data Types or Unstructured Data - Ability to Format, Clean, and Load Data Sets
SWEA 158
CSE 5095 An Initiative of the University of Connecticut Center for Public Health and Health Policy Robert H. Aseltine, Jr. , Ph. D. Cal Collins January 16, 2008 SWEA 159
What is CHIN?
- State of Connecticut Agencies Collect and Maintain Data in Separate Databases such as:
  - Vital Statistics: Birth, Death (DPH)
  - Surveillance data: Lead Screening and Immunization Registries (DPH)
  - Administrative services: LINK system (DCF), CAMRIS (DMR)
  - Benefit programs: WIC (DPH), Medicaid (DSS)
  - Educational achievement: (PSIS)
- Such Data is Un-Integrated
  - Impossible to Track and Assess Target Populations
  - Difficult to Develop Evidence-Based Practices
  - Limits Meaningful Interactions Among State Agencies
SWEA 160
What Do We Mean by “Integration?”

UCONN Health Center Low Birth Weight Infant Registry
Last Name  First Name  DOB         SSN           Birth Wt. (kg)
Appel      April       01/01/1999  016-000-9876  2.8
Berry      John        02/02/1997  216-000-4576  2.9
Carat      Colleen     03/03/1993  119-000-1234  1.9
Ernst      Max         04/04/1994  116-000-3456  2.7
Gomez      Gloria      05/05/1995  036-000-9999  2.6
Hurst      William     06/06/1996  016-000-5599  3.1
Keller     Helene      07/07/1997  017-000-2340  2.5
Martinez   Pedro       08/08/1998  018-000-9886  3.0
Rodriguez  Felix       09/09/1999  029-000-9111  2.8
Smith      Peggy       10/10/2000  016-000-8787  2.5

Dept. of Mental Retardation Birth to Three System
Last Name  First Name  DOB         Street   Town
Allen      Gwen        01/01/1999  Apple    Enfie
Buck       Jerome      07/01/1999  Burbank  West
Cleary     Jane        03/03/1993  Cedar    Tolla
Dory       Daniel      03/03/1993  Dogfish  Hartf
Ernst      Max         04/04/1994  Elm      Enfie
Friday     Joe         11/03/1999  Fruit    Wind
Glenn      Valerie     03/23/1998  Glen     Branf
Martinez   Pedro       08/08/1998  High     Hartf
Riley      Lily        03/03/1996  Ipswich  Bridg
Sanchez    Ramon       03/03/1993  Juniper  New

CT Dept. of Education PSIS System
Last Name  First Name  CMT Math  Polio Vac Date  Days in Attendance
Appel      April       134       01/05/1999      179
Carat      Colleen     256       05/01/1998      122
Cleary     Jane        268       01/28/2000      178
Ernst      Max         152       01/09/1999      145
Gomez      Gloria      289       01/01/1999      168
Friday     Joe         265       10/01/1999      170
Keller     Helene      309       11/01/2001      180
Martinez   Pedro       248       12/01/2003      180
Riley      Lily        201       01/01/1999      122
Sanchez    Ramon       249       01/01/1999      159

Integrated Result
Last Name  First Name  DOB         SSN           Birth Wt.  Street  Town      CMT Math Grade 3  Polio Vaccination Date  Days in Attendance
Ernst      Max         04/04/1994  116-000-3456  2.7        Elm     Enfield   152               01/09/1999              145
Martinez   Pedro       08/08/1998  018-000-9886  3.0        High    Hartford  248               12/01/2003              180
SWEA 161
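The integration shown on this slide amounts to keeping only the individuals present in all three registries and merging their fields. A minimal Python sketch, assuming an exact match on last and first name and using a few rows from each registry; the variable names and the `integrate` function are hypothetical, and a real system would match on more identifiers than the name.

```python
# A few rows from each registry, keyed by (last name, first name).
low_birth_weight = {
    ("Appel", "April"): {"dob": "01/01/1999", "birth_wt": 2.8},
    ("Carat", "Colleen"): {"dob": "03/03/1993", "birth_wt": 1.9},
    ("Ernst", "Max"): {"dob": "04/04/1994", "birth_wt": 2.7},
    ("Martinez", "Pedro"): {"dob": "08/08/1998", "birth_wt": 3.0},
}
birth_to_three = {
    ("Cleary", "Jane"): {"street": "Cedar"},
    ("Ernst", "Max"): {"street": "Elm"},
    ("Martinez", "Pedro"): {"street": "High"},
}
psis = {
    ("Appel", "April"): {"cmt_math": 134, "days_in_attendance": 179},
    ("Cleary", "Jane"): {"cmt_math": 268, "days_in_attendance": 178},
    ("Ernst", "Max"): {"cmt_math": 152, "days_in_attendance": 145},
    ("Martinez", "Pedro"): {"cmt_math": 248, "days_in_attendance": 180},
}


def integrate(*registries):
    """Merge the records of individuals that appear in every registry."""
    shared = set(registries[0])
    for registry in registries[1:]:
        shared &= set(registry)
    merged = {}
    for key in sorted(shared):
        record = {"last": key[0], "first": key[1]}
        for registry in registries:
            record.update(registry[key])
        merged[key] = record
    return merged


integrated = integrate(low_birth_weight, birth_to_three, psis)
```

As on the slide, only Ernst and Martinez survive the three-way match, and each integrated record carries fields from all three sources.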
Key Challenges to Integrating Data m CSE 5095 m m Security and Privacy q HIPAA q FERPA q WIC, Social Security (Medicaid/Medicare) regulations q State statutes Alteration/disruption of business practices Unique identification of individuals/cases Accuracy and reliability of data Disparate hardware/software platforms SWEA 162
The Solution: CHIN m CSE m 5095 m Connecticut Health Information Network A Federated Network That: q Allows Shared Access to “Health”-related Data From Heterogeneous Databases q Allows Agencies to Retain Complete Control Over Access to Data q Has Minimal Impact on Business Practices q Complies with Security and Privacy Statutes q Incorporates Cutting-edge Approaches to Case Matching Partnership of: q Early Partners: DPH, DCF, DDS, Do. E, DOIT, UConn, Akaza Research SWEA 164
CHIN Processes and Components Map data elements to source database Publish “metadata” to CHIN with security and privacy rules CHIN Metadata Registry CSE 5095 Define data elements in CHIN Contributor CHIN Metadata Registry and CHIN Trusted Broker Query Execution: Identifier Matching and Data Merge CHIN GRID and Trusted Broker Review Committee Approval Build Query CHIN Enterprise Administration CHIN Metadata Registry and CHIN Query Builder De-identify Data CHIN Trusted Broker and De-Identification Engine Integrated, De-identified Data SWEA 165
Original CHIN Architecture CSE 5095 http: //publichealth. uconn. edu/CHIN. php SWEA 166
Second CHIN Architecture: User Side CSE 5095 A & A Contributor SWEA 167
Second CHIN Architecture: Contributor Side CSE 5095 A & A Front End Trusted Broker SWEA 168
Current CHIN Architecture CSE 5095 SWEA 169
CHIN Architecture: Standards-based
- All data is mapped to Health Level Seven’s Clinical Document Architecture (CDA) in XML
  - Health Level Seven (HL7) is an ANSI-approved Standards Developing Organization
  - HL7 has its own XML Special Interest Group, responsible for developing XML implementations of its standards
  - HL7 is also an active participant in W3C, the organization responsible for the development of XML
  - CDA was approved as an ANSI standard in November of 2000
- Component Architecture communicates via Web Services and OGSA Grid standards
SWEA 170
CHIN Arch.: Proven, Open Components
- Components are based on open-source libraries
  - The grid-based servers Mako and Virtual Mako are part of the Mobius Project from Ohio State University’s Dept. of BioInformatics
  - The translation tools to get data into XML are provided by the XQuare and XBridge projects, hosted on the ObjectWeb website, an open source middleware community
  - The algorithm and code for identity management is FEBRL, Freely Extensible Biomedical Record Linkage, which was developed at Australian National University
  - NuSOAP Web Services Engine for component integration
SWEA 171
FEBRL
- Identifier matching in FEBRL proceeds in four steps:
- Data cleansing and standardization
  - Removes, to the degree possible, string discrepancies based on common misspellings, extra white space, or misplaced name or address components.
- Indexing
  - Reduces the number of record comparisons that must be performed, for scalability; blocking, sorting, and bigram indexing methods are all supported.
- Record comparison
  - Conducted using an arbitrary composition of exact or inexact string comparison methods over any combination of fields.
- Classification
  - Follows the Fellegi-Sunter model, with record pairs assigned a weight based on a set of probabilities and matches determined based on the record pair weights.
SWEA 172
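The four steps can be illustrated in plain Python. This sketch does not use FEBRL's own API, and its classification step is a simple similarity threshold rather than Fellegi-Sunter weights; the function names, the choice of date of birth as the blocking key, the 0.8 threshold, and the sample records are all assumptions made for illustration.

```python
def standardize(name):
    """Step 1, cleansing: lower-case and collapse extra white space."""
    return " ".join(name.lower().split())


def bigrams(text):
    return {text[i:i + 2] for i in range(len(text) - 1)}


def bigram_similarity(a, b):
    """Step 3, inexact comparison: Dice coefficient over character bigrams."""
    ba, bb = bigrams(a), bigrams(b)
    if not ba or not bb:
        return 1.0 if a == b else 0.0
    return 2 * len(ba & bb) / (len(ba) + len(bb))


def link(records_a, records_b, threshold=0.8):
    # Step 2, indexing: block on date of birth so only records that share
    # a DOB are ever compared.
    blocks = {}
    for rb in records_b:
        blocks.setdefault(rb["dob"], []).append(rb)
    matches = []
    for ra in records_a:
        for rb in blocks.get(ra["dob"], []):
            score = min(
                bigram_similarity(standardize(ra[field]), standardize(rb[field]))
                for field in ("first", "last")
            )
            # Step 4, classification: a plain threshold stands in for the weights.
            if score >= threshold:
                matches.append((ra["id"], rb["id"], round(score, 2)))
    return matches


agency_a = [
    {"id": "a1", "first": "Max", "last": " Ernst ", "dob": "1994-04-04"},
    {"id": "a2", "first": "Pedro", "last": "Martinez", "dob": "1998-08-08"},
]
agency_b = [
    {"id": "b1", "first": "max", "last": "Ernst", "dob": "1994-04-04"},
    {"id": "b2", "first": "Pedro", "last": "Martines", "dob": "1998-08-08"},
    {"id": "b3", "first": "Jane", "last": "Cleary", "dob": "1993-03-03"},
]

matches = link(agency_a, agency_b)
```

The misspelled "Martines" still links to "Martinez" because the two share six of their seven bigrams, which exact matching on names would miss.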
FEBRL
- The current prototype uses FEBRL to implement a simplistic method of linkage whereby record pairs are declared a match if the first and last name are exactly equal.
- Next Steps
  - Evaluate the accuracy of linking records over a rubric of five data fields: first name, last name, date of birth, social security number, and gender.
  - Exact and inexact matching (i.e., misspellings and slight discrepancies), including experimental variations of the service based on the blinded bigram matching algorithm.
  - Assess false positives and false negatives produced by each palette of field comparison algorithms.
  - Evaluate the accuracy of linking records using fabricated data sets with characteristics similar to real datasets.
  - Experiment with variations of the canopy cluster matching algorithm.
SWEA 173
Other CHIN Issues m CSE 5095 m Why Choose an Open Architecture? q Increased Accountability q Plenty of Documentation and Research q Greater Transparency q Ease of Installation, Maintenance, Dissemination How is Data Ported into CHIN? q CHIN is based on a Grid, with each organization supporting its own data through a Contributor server q Agency staff has complete control over access to data on CHIN by other users q Only one server faces to the outside network SWEA 174
Creating a Contributor Server
[Figure: Datasource → Generate XML → Contributor Server behind Firewall, with External IP Address and SSL Connection to CHIN Trusted Broker; Data Elements Published to MDR]
- Contributor Server Contains:
  - XML generated files
  - Mako service
  - Java files
  - *.xqy files
  - XML files to generate CDA compliant files
SWEA 175
Connecting to the Rest of the Network
- Diagram labels: Datasource; Data Elements (Published to MDR); Generate XML; Firewall; External IP Address; SSL Connection to CHIN Trusted Broker; Access to CHIN
- Metadata Registry takes information
  - About data elements
  - About data security information
- Contributor profile is registered with CHIN Network Admin
- Contributor Server Contains:
  - XML generated files
  - Mako service
  - Java files
  - *.xqy files
  - XML files to generate CDA-compliant files
How do we get data out?
- The Trusted Broker component:
  - Pulls XML from the Virtual Mako, which reaches out to all Contributors
  - Compares records from different Contributors using FEBRL
  - De-identifies data sets to generate a final data set for Investigators
- The Front End component:
  - Provides a central place for users to connect to the system
  - Connects to the Metadata Registry and the Trusted Broker via Web Services calls
  - Allows different users of the system to perform different actions
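The Trusted Broker's last step, de-identification, can be sketched in a few lines. This is an assumption-laden illustration, not the CHIN implementation: the XML layout, the element names, and the keyed-hash pseudonym are all hypothetical, and real de-identification involves more than dropping direct identifiers.

```python
import hashlib
import hmac
import xml.etree.ElementTree as ET

# Hypothetical element names treated as direct identifiers.
IDENTIFYING_FIELDS = {"first_name", "last_name", "ssn", "date_of_birth"}


def parse_records(xml_text):
    """Turn <records><record>...</record></records> into a list of dicts."""
    root = ET.fromstring(xml_text)
    return [
        {child.tag: (child.text or "") for child in record}
        for record in root.findall("record")
    ]


def pseudonym(record, key):
    """Keyed hash of the identifiers, so linked records share one opaque ID."""
    material = "|".join(record.get(f, "") for f in sorted(IDENTIFYING_FIELDS))
    digest = hmac.new(key, material.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def deidentify(record, key):
    """Drop identifying fields and attach the pseudonym instead."""
    cleaned = {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}
    cleaned["pseudo_id"] = pseudonym(record, key)
    return cleaned
```

Because the pseudonym is derived from the identifiers with a secret key (a `bytes` value held only by the broker in this sketch), two Contributors' records for the same person receive the same `pseudo_id` without exposing the underlying name or SSN to Investigators.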
Getting Data From CHIN
- Diagram flow: XML Files -> FEBRL -> Result Set -> De-identify -> Final Result Set
- CHIN also contains:
  - A Front-end server to take queries
  - A Trusted Broker to compare data, perform record linkage, and de-identify results
Progress to Date
- Needs assessment completed
- Technical and functional specifications identified
- MOUs with state agencies
- Expanding list of partners
- Prototype developed
- Funding for Model Network Development/Deployment/Evaluation 2008
Demo
EMR Architectures
- Provider-Based Systems have Two Variants
  - All Data In House
    - Larger Providers (Clinics)
    - Control All Own Data
    - Sizeable IT Staff for 24-7 Operations
    - Control of Own Backups
  - Limited In House, Off-Site Storage (Larger, Multi-Site Practices)
    - Smaller Providers: Limited IT Staff
    - Desire Out-of-Box Solution
    - Local Data for Ease of Access
    - Remote Storage: Promotes Off-Hours Access
  - Even 1st Variant: Service for “Backups”
EMR for Large Providers - Allscripts
EMR for Smaller Providers
- Diagram labels: Provider’s Office; Vendor’s Location; Server/Data Farm; Local EMR; Patient Data; Remote EMR; Remote Access
Integrating Clinical Repositories
- Provider/Hospital Relationship
  - Provider has Privileges at Hospital
  - Provider Chooses Office-Based EMR
  - More Easily Integrated with Hospital EMR
  - Emerging at Community Hospital Level
- Example:
  - Milford Hospital, MA
  - All Area Providers with Privileges Linked in
  - Ability to See Patient Records, Tests, at Hospital
  - Unclear on Uploads from Providers to Hospital
  - However, No Link to UMass Medical Center (with which Milford Hospital is Affiliated)
Integrating Clinical Repositories
- CTSA: Region-Wide Clinical/Translational Research
- Target Area Hospitals
  - St. Francis, Hartford, Hosp. Central CT, CCMC
  - Each Hospital has Own Clinical Repository (EMR)
- For Wider-Scoped T1, T2, and Clinical Research
  - Need to Integrate these Repositories at Some Level
  - What is Most Practical?
    - Setting up Centralized De-Identified Repository?
    - Creating Data Marts as you go?
    - What are Pros and Cons of Each?
  - Researcher Seeking CHF Patient Data Needs to have De-Identified Data Mart
Integrating Clinical Repositories: NHIN Prototype Phase I
Integrating Clinical Repositories: NHIN Prototype Phase II
Personal Health Record Integration
Concluding Remarks
- Only Scratched Surface on Architectures
  - Micro Architectures
  - Macro Architectures
  - Super-Macro Architectures (We’ll see …)
- What are Key Facets in the Discussion?
  - Role and Impact of Standards
  - Open Solutions
  - Architectural Variants: Reuse “Architecture”
    - Can we Reuse CHIN for Clinical Practice?
    - Are All Contributors Simply Each Hospital and EHR?
    - How do we Connect all of the Pieces?
- What are Next Steps?
  - Let’s Review Some other Work
  - Source: Wide Range of Presentations on Web