
QoS-driven Lifecycle Management of Service-oriented Distributed Real-time & Embedded Systems
Aniruddha Gokhale
a.gokhale@vanderbilt.edu
www.dre.vanderbilt.edu/~gokhale
Assistant Professor, ISIS, Dept. of EECS, Vanderbilt University, Nashville, Tennessee
February 16th, 2006
www.dre.vanderbilt.edu

Service-oriented Style of Distributed Real-time & Embedded Systems
– Regulating & adapting to (dis)continuous changes in runtime environments
  • e.g., online prognostics, dependable upgrades
– Satisfying tradeoffs between multiple (often conflicting) QoS demands
  • e.g., secure, real-time, reliable
– Satisfying QoS demands in the face of fluctuating and/or insufficient resources
  • e.g., mobile ad hoc networks (MANETs)

Characteristics of SOA-style DRE Systems
• Manifestation of Service-Oriented Architectures (SOA) in the distributed real-time & embedded (DRE) systems space
  – Applications composed of one or more “operational strings” of services
  – A service is a component or an assembly of components
  – Dynamic (re)deployment of services into operational strings is necessary
  – New class of QoS (performance + survivability) requirements
• Realized using enabling component middleware technologies, e.g., CCM, .NET, and J2EE

QoS Issues for SOA-style DRE Systems
• Per-component concern – choice of implementation
  – Depends on resources and on compatibility with other components in the assembly
• Communication concern – choice of communication mechanism used
• Assembly concerns – which components to assemble dynamically? In what order? Which end-to-end configurations are valid?
• Failure recovery concern – what is the unit of failover?
• Sharing concern – shared components need proactive survivability, since their failure affects several services simultaneously
• Availability concern – what is the degree of redundancy? Which replication styles to use? Do they apply to the whole assembly?
• Deployment concern – how to select resources? How to alleviate risk?

Tangled Concerns in SOA-style DRE Systems
• Demonstrates numerous tangled para-functional concerns
• Significant sources of variability that affect end-to-end QoS (performance + survivability)
Separation of Concerns & Managing Variability is the Key
(Figure: design-time, deployment-time, and run-time phases)

(1) Design-time Variability Management in SOA-style DRE Systems
• Focus on separation of concerns
• “What if” analysis
  – Analytical methods
  – Simulation methods
  – Model-driven generative programming for “what if”
• Understanding the impact of individual concerns
• Students involved:
  – Krishnakumar Balasubramanian, Jaiganesh Balasubramanian, Gan Deng, Amogh Kavimandan, James Hill, Sumant Tambe, Arundhati Kogekar, Dimple Kaul
Work partly supported by DARPA PCES program (PI), DARPA ARMS Program (PI on subcontracts from Lockheed Martin ATL), & NSF CSR-SMA Program (PI)

Separation of Concerns using CoSMIC
• Project lead and PI, DARPA PCES program
• CoSMIC project focuses on separation of deployment and configuration concerns
  – Model-driven generative programming framework
  – Complementary technology to CIAO and DAnCE middleware
  – www.dre.vanderbilt.edu/cosmic
• CoSMIC tools, e.g., PICML, used for separation of concerns in operational strings
  – Captures the data model of the OMG D&C specification
  – Synthesis of static deployment plans for DRE components
  – New capabilities being added for static deployment planning
Work supported by DARPA PCES Program, PI

Case Study for “What if” Analysis: Virtual Router
• Network services need support for efficient (de)multiplexing, dispatching, and routing/forwarding
  – e.g., VPN service provided by a virtual router (VR)
• Provides differentiated services to customers, e.g., prioritized service
• VPN setup messages must be efficiently (de)multiplexed, serviced, and forwarded
• Implemented using middleware
• Need to estimate the capacity of the system at design time
Problem boils down to capacity planning and estimating the performance of configured middleware

Performance Analysis of Reactor Pattern in VR
• Customers send VPN setup messages to the router
• VPN setup messages manifest as events at the VR
• VR must service these events (e.g., resource allocation) and honor the prioritized service, if any
• Accepted messages are forwarded
• Events could be dropped in overload conditions
The Reactor architectural pattern allows event-driven applications to demultiplex & dispatch service requests that are delivered to an application from one or more clients.
• Reactor pattern decouples the detection, demultiplexing, & dispatching of events from the handling of events
• Participants include the Reactor, event handle, event demultiplexer, and abstract and concrete event handlers
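The Reactor roles above map onto a few lines of Python. This is a minimal single-threaded sketch, not the middleware implementation analyzed in the talk: the standard-library `selectors` module plays the synchronous event demultiplexer, a socket is the event handle, and plain callables stand in for concrete event handlers. The names `Reactor`, `register_handler`, `handle_events`, and `demo` are illustrative.

```python
import selectors
import socket


class Reactor:
    """Single-threaded Reactor: a selector demultiplexes, handlers service events."""

    def __init__(self):
        # Synchronous event demultiplexer (the most efficient one the platform offers).
        self._selector = selectors.DefaultSelector()

    def register_handler(self, handle, handler):
        # Associate a concrete event handler with an event handle.
        self._selector.register(handle, selectors.EVENT_READ, handler)

    def remove_handler(self, handle):
        self._selector.unregister(handle)

    def handle_events(self, timeout=None):
        # Block until one or more handles are ready to read ...
        ready = self._selector.select(timeout)
        # ... then dispatch each ready handle to its registered handler.
        for key, _mask in ready:
            key.data(key.fileobj)
        return len(ready)

    def close(self):
        self._selector.close()


def demo():
    reactor = Reactor()
    client, server = socket.socketpair()
    received = []
    # Concrete event handler: here it only records the message it services.
    reactor.register_handler(server, lambda sock: received.append(sock.recv(1024)))
    client.sendall(b"VPN-setup")
    dispatched = reactor.handle_events(timeout=1.0)
    reactor.remove_handler(server)
    reactor.close()
    client.close()
    server.close()
    return dispatched, received


if __name__ == "__main__":
    print(demo())
```

`demo()` registers one handler, sends a single hypothetical VPN setup message over a local socket pair, and runs one demultiplex/dispatch cycle. Detection (the selector) and handling (the callable) never reference each other, which is the decoupling the pattern is after.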

Modeling VR Capabilities in a Reactor
Model of a single-threaded, select-based reactor implementation
• Consider VPN service for two customer classes
  – Reactor accepts and handles two types of input events
• Differentiated services for two classes
  – Events are handled in prioritized order
• Each event type has a separate queue to hold the incoming events; the buffer capacity is $N_1$ for events of type one and $N_2$ for events of type two
• Event arrivals are Poisson, with rates $\lambda_1$ and $\lambda_2$ for type one and type two events, respectively
• Event service times are exponential, with rates $\mu_1$ and $\mu_2$ for type one and type two events, respectively

Performance Metrics of Interest for Reactor
• Throughput:
  – Number of events that can be processed per unit time
  – Relevant for applications such as telecommunications call processing
• Queue length:
  – Number of events queued at the event handler queues
  – Guides appropriate scheduling policies for applications with real-time requirements
• Total number of events:
  – Total number of events in the system
  – Guides scheduling decisions and the resource provisioning required to sustain system demands
• Probability of event loss:
  – Probability that events are discarded due to lack of buffer space
  – Important for safety-critical systems; guides levels of resource provisioning
• Response time:
  – Time taken to service an incoming event
  – Real-time systems require bounded response time
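Each metric above is a function of the steady-state distribution of the number of events in the system. As a simpler stand-in for the two-class reactor model (an assumption, not the SRN analyzed on the following slides), this sketch computes all five metrics for a single event class modeled as an M/M/1/K queue that holds at most `k` events, with the response time obtained from Little's law.

```python
from dataclasses import dataclass


@dataclass
class QueueMetrics:
    throughput: float        # events served per unit time
    queue_length: float      # mean number of events waiting (not in service)
    total_events: float      # mean number of events in the system
    loss_probability: float  # fraction of arrivals dropped
    response_time: float     # mean time in system


def mm1k_metrics(lam: float, mu: float, k: int) -> QueueMetrics:
    """Steady-state metrics of an M/M/1/K queue holding at most k events."""
    if lam <= 0 or mu <= 0 or k < 1:
        raise ValueError("need lam > 0, mu > 0, k >= 1")
    rho = lam / mu
    weights = [rho ** n for n in range(k + 1)]
    norm = sum(weights)
    p = [w / norm for w in weights]          # p[n] = P(n events in system)
    loss = p[k]                              # arrivals that find k events are dropped
    throughput = lam * (1.0 - loss)          # only accepted events get served
    total = sum(n * pn for n, pn in enumerate(p))
    waiting = total - (1.0 - p[0])           # subtract the event in service, if any
    response = total / throughput            # Little's law: L = X * W
    return QueueMetrics(throughput, waiting, total, loss, response)
```

For example, `mm1k_metrics(0.5, 2.0, 1)` gives a loss probability of 0.2 and a throughput of 0.4 events/s; with room for only one event nothing ever waits, so the response time equals the mean service time 1/μ = 0.5 s.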

Performance Analysis using Stochastic Reward Nets
(Figure: SRN notation, showing a place, a transition, an immediate transition, an inhibitor arc, and a token, illustrated on (a) the event arrival and service subnet and (b) the snapshot subnet)
• Stochastic Reward Nets (SRNs) are an extension of Generalized Stochastic Petri Nets (GSPNs), which are in turn an extension of Petri nets
• SRNs extend the modeling power of GSPNs by allowing:
  – Guard functions
  – Marking-dependent arc multiplicities
  – General transition probabilities
  – Reward rates at the net level
• Allow model specification at a level closer to intuition
• Solved using tools such as SPNP (Stochastic Petri Net Package)

Modeling the Reactor using SRN (1/2)
(Figure: reactor SRN annotated with event arrival, service queue, servicing the event, service completion, drop events on overflow, and prioritized service)
• Models arrivals, queuing, and prioritized service of events
• Transitions A1 and A2: event arrivals
• Places B1 and B2: buffers/queues
• Places S1 and S2: service of the events
• Transitions Sr1 and Sr2: service completions
• Inhibitor arc from place B1 to transition A1 with multiplicity $N_1$ (likewise from B2 to A2 with multiplicity $N_2$)
  – Prevents firing of transition A1 when there are $N_1$ tokens in place B1
• Inhibitor arc from place S1 to transition Sr2
  – Offers prioritized service to an event of type one over an event of type two
  – Prevents firing of transition Sr2 when there is a token in place S1
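The enabling rules above (input arcs, and inhibitor arcs with multiplicities) can be made concrete with a small token-game sketch. It covers only transitions A1, A2, Sr1, and Sr2 and ignores firing rates, so it illustrates the structural rules rather than solving the SRN the way SPNP does; the `Transition` class and the `reactor_transitions` helper are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict

Marking = Dict[str, int]  # place name -> token count


@dataclass
class Transition:
    name: str
    inputs: Dict[str, int] = field(default_factory=dict)      # place -> arc multiplicity
    outputs: Dict[str, int] = field(default_factory=dict)     # place -> arc multiplicity
    inhibitors: Dict[str, int] = field(default_factory=dict)  # place -> arc multiplicity

    def enabled(self, m: Marking) -> bool:
        # Every input place needs at least as many tokens as its arc multiplicity ...
        if any(m.get(p, 0) < k for p, k in self.inputs.items()):
            return False
        # ... and every inhibitor place must hold fewer tokens than its multiplicity.
        return all(m.get(p, 0) < k for p, k in self.inhibitors.items())

    def fire(self, m: Marking) -> Marking:
        if not self.enabled(m):
            raise ValueError(f"{self.name} is not enabled")
        new = dict(m)
        for p, k in self.inputs.items():
            new[p] -= k
        for p, k in self.outputs.items():
            new[p] = new.get(p, 0) + k
        return new


def reactor_transitions(n1: int, n2: int) -> Dict[str, Transition]:
    return {
        # Arrivals are blocked (events dropped) once a buffer holds N tokens.
        "A1": Transition("A1", outputs={"B1": 1}, inhibitors={"B1": n1}),
        "A2": Transition("A2", outputs={"B2": 1}, inhibitors={"B2": n2}),
        "Sr1": Transition("Sr1", inputs={"S1": 1}),
        # Inhibitor arc S1 -> Sr2 gives type-one events priority.
        "Sr2": Transition("Sr2", inputs={"S2": 1}, inhibitors={"S1": 1}),
    }
```

With $N_1 = 1$, firing A1 once fills B1 and disables A1 until the buffer drains, while a token in S1 keeps Sr2 disabled.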

Modeling the Reactor using SRN (2/2)
(Figure: reactor SRN with the snapshot subnet, showing places StSnpSht and SnpShtInProg and transitions Sn1, Sn2, T_SrvSnpSht, and T_EndSnpSht)
• Process of taking successive snapshots
• Reactor waits for new events when the currently enabled events have been handled
• Sn1 enabled: token in StSnpSht & tokens in B1 & no token in S1
• Sn2 enabled: token in StSnpSht & tokens in B2 & no token in S2
• T_SrvSnpSht enabled: token in S1 and/or S2
• T_EndSnpSht enabled: no token in S1 and S2
• Sn1 and Sn2 have the same priority
• T_SrvSnpSht has lower priority than Sn1 and Sn2
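The snapshot enabling conditions above read naturally as guard functions over a marking. In this sketch the guards follow the slide and the priority levels encode that Sn1 and Sn2 outrank T_SrvSnpSht; placing T_EndSnpSht at the lower level is an assumption (its guard is mutually exclusive with that of T_SrvSnpSht, so their relative order never matters). `GUARDS`, `PRIORITY`, and `firable` are hypothetical names.

```python
from typing import Callable, Dict, List

Marking = Dict[str, int]  # place name -> token count

GUARDS: Dict[str, Callable[[Marking], bool]] = {
    "Sn1": lambda m: m["StSnpSht"] > 0 and m["B1"] > 0 and m["S1"] == 0,
    "Sn2": lambda m: m["StSnpSht"] > 0 and m["B2"] > 0 and m["S2"] == 0,
    "T_SrvSnpSht": lambda m: m["S1"] > 0 or m["S2"] > 0,
    "T_EndSnpSht": lambda m: m["S1"] == 0 and m["S2"] == 0,
}

# Sn1 and Sn2 share the top level; T_SrvSnpSht is lower.
# Assumption: T_EndSnpSht sits at the same lower level as T_SrvSnpSht.
PRIORITY = {"Sn1": 2, "Sn2": 2, "T_SrvSnpSht": 1, "T_EndSnpSht": 1}


def firable(m: Marking) -> List[str]:
    """Enabled transitions at the highest priority level present in marking m."""
    enabled = [t for t, guard in GUARDS.items() if guard(m)]
    if not enabled:
        return []
    top = max(PRIORITY[t] for t in enabled)
    return sorted(t for t in enabled if PRIORITY[t] == top)
```

When a snapshot starts with events waiting in both buffers, only Sn1 and Sn2 may fire, even though the guard of T_EndSnpSht also holds at that moment.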

VR SRN: Performance Estimates
• SRN model solved using the Stochastic Petri Net Package (SPNP) to obtain estimates of performance metrics
• Parameter values: $\lambda_1 = 0.5$/sec, $\lambda_2 = 0.5$/sec, $\mu_1 = 2.0$/sec, $\mu_2 = 2.0$/sec
• Two cases: $N_1 = N_2 = 1$ and $N_1 = N_2 = 5$
Perf. metric | $N_1 = N_2 = 1$ | $N_1 = N_2 = 5$ | #1 | #2
Throughput: 0.37/s, 0.40/s
Queue length: 0.065, 0.12
Total events / Loss probab.: 0.25, 0.065, 0.27, 0.065, 0.32, 0.35, 0.00026
Observations:
• Probability of event loss is higher when the buffer space is 1
• Total number of events of type two is higher than that of type one
• Events of type two stay in the system longer than events of type one
• May degrade the response time of event requests for class 2 customers compared to requests from class 1 customers

VR SRN: Sensitivity Analysis
• Analyze the sensitivity of performance metrics to variations in input parameter values
• Vary $\lambda_1$ from 0.5/sec to 2.0/sec
• Values of other parameters: $\lambda_2 = 0.5$/sec, $\mu_1 = 2.0$/sec, $\mu_2 = 2.0$/sec, $N_1 = N_2 = 5$
• Compute performance measures for each input value
Observations:
• Throughput of event requests from customer class #1 increases, but the rate of increase declines
• Throughput of event requests from customer class #2 remains unchanged
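A one-at-a-time sensitivity sweep is easy to script. This sketch varies the arrival rate over the same range as the slide, with μ = 2.0/sec and a system capacity of 5 events, but it substitutes a single-class M/M/1/K queue for the two-class SRN (an assumption). It therefore shows only the qualitative class #1 trend, throughput rising with shrinking increments as the server saturates, and cannot reproduce the class #2 result, which depends on the priority structure.

```python
def mm1k_throughput(lam: float, mu: float, k: int) -> float:
    """Throughput lam * (1 - p_k) of an M/M/1/K queue with capacity k."""
    rho = lam / mu
    weights = [rho ** n for n in range(k + 1)]
    p_full = weights[-1] / sum(weights)   # probability an arrival is dropped
    return lam * (1.0 - p_full)


def sweep(values, mu=2.0, k=5):
    """One-at-a-time sensitivity: vary lam, hold mu and k fixed."""
    return [(lam, mm1k_throughput(lam, mu, k)) for lam in values]


if __name__ == "__main__":
    rows = sweep([0.5, 1.0, 1.5, 2.0])
    for (lam, x), (_, x_prev) in zip(rows[1:], rows[:-1]):
        print(f"lam={lam:.1f}/s  throughput={x:.3f}/s  increase={x - x_prev:.3f}/s")
```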

Middleware Pattern Simulations in OMNeT++
• OMNeT++ is a discrete event simulator for networked systems
• Developers write C++ code for simulation
• www.omnetpp.org
(Figure: OMNeT++ architecture, with .ned files, module sources (Mod_n.h/.cpp, Submod1.h/.cpp, Submod2.h/.cpp), an initialization file, and a message file feeding the simulation kernel; a UI library for visualization and animation; statistics written to output vector and output scalar files)

The Simulation Model for Reactor
(Figure: an Event Generator, the Reactor with its Synchronous Event Demultiplexer, Event Handlers with queues, and a Statistics Collector)
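The same component roles can be mirrored in a small discrete-event simulation. This Python sketch is not OMNeT++ code: a `heapq` future-event list stands in for the simulation kernel, and the reactor is simplified to non-preemptive priority service of two event types with Poisson arrivals, exponential service times, and finite buffers, rather than the snapshot semantics of the SRN. The function name and return format are hypothetical.

```python
import heapq
import random
from collections import deque


def simulate_reactor(lam, mu, cap, horizon, seed=1):
    """Simulate a single-threaded reactor serving event types 0 and 1.

    lam, mu, cap: per-type arrival rates, service rates, and buffer sizes.
    Type 0 is dispatched first whenever both queues hold events
    (non-preemptive priority, a simplification of the snapshot SRN).
    """
    rng = random.Random(seed)
    queues = [deque(), deque()]       # event handlers' queues
    arrived, dropped, served = [0, 0], [0, 0], [0, 0]
    busy = False
    fel = []                          # future event list of (time, kind, type)

    # Event generator: schedule the first arrival of each type.
    for t in (0, 1):
        heapq.heappush(fel, (rng.expovariate(lam[t]), "arrival", t))

    def dispatch(now):
        # Demultiplex and dispatch the highest-priority waiting event, if any.
        nonlocal busy
        for t in (0, 1):
            if queues[t]:
                queues[t].popleft()
                busy = True
                heapq.heappush(fel, (now + rng.expovariate(mu[t]), "done", t))
                return

    while fel:
        now, kind, t = heapq.heappop(fel)
        if now > horizon:
            break
        if kind == "arrival":
            arrived[t] += 1
            if len(queues[t]) < cap[t]:
                queues[t].append(now)
            else:
                dropped[t] += 1       # buffer full: the event is lost
            heapq.heappush(fel, (now + rng.expovariate(lam[t]), "arrival", t))
        else:
            served[t] += 1
            busy = False
        if not busy:
            dispatch(now)

    # Statistics collector.
    return {
        "throughput": [s / horizon for s in served],
        "loss_probability": [d / a if a else 0.0 for d, a in zip(dropped, arrived)],
    }
```

Running it with the parameter values of the SRN study (λ = 0.5/sec and μ = 2.0/sec for both types, buffers of 5) over a long horizon yields per-type simulation estimates to set beside the analytical ones.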

Addressing Middleware Variability Challenges
Although middleware provides reusable building blocks that capture commonalities, these blocks and their compositions incur variabilities that impact performance in significant ways.
• Compositional variability
  – Incurred due to variations in the compositions of these building blocks
  – Need to address compatibility in the compositions and individual configurations
  – Dictated by needs of the domain
  – E.g., Leader-Follower makes no sense in a single-threaded Reactor
• Per-block configuration variability
  – Incurred due to variations in implementations & configurations for a patterns-based building block
  – E.g., the single-threaded versus thread-pool-based reactor implementation dimension crosscuts the event demultiplexing strategy (e.g., select, poll, WaitForMultipleObjects)
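One way to make compositional and per-block variability explicit is to enumerate the configuration space and filter it with compatibility rules. The dimensions below come from the slide; the encoding and the rule set are a hypothetical sketch. The only platform rule used reflects that WaitForMultipleObjects is a Win32 API.

```python
from itertools import product

# Dimensions taken from the slide; the rules below are illustrative.
THREADING = ("single_threaded", "thread_pool")
DEMUX = ("select", "poll", "WaitForMultipleObjects")
CONCURRENCY = ("none", "leader_follower")


def is_valid(threading, demux, concurrency, platform):
    """Hypothetical compatibility check for one reactor configuration."""
    # Compositional rule: Leader/Followers needs a pool of threads.
    if concurrency == "leader_follower" and threading != "thread_pool":
        return False
    # Per-block rule: WaitForMultipleObjects exists only on Windows.
    if demux == "WaitForMultipleObjects" and platform != "windows":
        return False
    return True


def valid_configurations(platform):
    return [cfg for cfg in product(THREADING, DEMUX, CONCURRENCY)
            if is_valid(*cfg, platform)]


if __name__ == "__main__":
    for cfg in valid_configurations("linux"):
        print(cfg)
```

Of the 12 raw combinations, 6 survive on a non-Windows platform and 9 on Windows; each survivor is a candidate for the "what if" performance models.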

Automation Goals for “What if” Analysis
Applying design-time performance analysis techniques to estimate the impact of variability in middleware-based DRE systems
• Build and validate performance models for invariant parts of middleware building blocks
• Weave variability concerns manifested in a building block into the performance models
• Compose and validate performance models of building blocks, mirroring the anticipated software design of DRE systems
• Estimate end-to-end performance of the composed system
• Iterate until the design meets performance requirements
(Figure: the invariant model of a pattern is woven with variability and workload models into refined models of patterns, which are composed into a system model under the system workload)
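The compose, estimate, and iterate loop can be sketched with deliberately simple building-block models. Here each block is assumed to behave as an M/M/1 station, blocks are composed in series so that their mean delays add, and each iteration upgrades the slowest block to its next hypothetical configuration. The greedy upgrade rule is an illustrative choice, not the method of the talk.

```python
def block_response_time(lam, mu):
    """Mean time in an M/M/1 station; infinite if the block is unstable."""
    return float("inf") if lam >= mu else 1.0 / (mu - lam)


def end_to_end(lam, rates):
    # Series composition: mean delays of the blocks add up.
    return sum(block_response_time(lam, mu) for mu in rates)


def what_if(lam, options, requirement):
    """Upgrade the slowest block until the end-to-end requirement is met.

    options: per-block lists of candidate service rates in ascending order
    (hypothetical alternative configurations of each building block).
    Returns the chosen rates, or None if no explored combination meets the bound.
    """
    choice = [0] * len(options)
    while True:
        rates = [opts[i] for opts, i in zip(options, choice)]
        if end_to_end(lam, rates) <= requirement:
            return rates
        # Blocks that still have a faster configuration available.
        upgradable = [b for b in range(len(options)) if choice[b] + 1 < len(options[b])]
        if not upgradable:
            return None
        worst = max(upgradable, key=lambda b: block_response_time(lam, rates[b]))
        choice[worst] += 1
```

For instance, with an arrival rate of 1.0/s, blocks offering rates `[1.5, 3.0]`, `[2.0, 5.0]`, and `[4.0]`, and a 2.0 s bound, one upgrade of the first block suffices.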

Automating & Scaling the “What if” Process
• Model-driven generative technologies
  – Developed the SRN Modeling Language (SRNML) in GME
  – Applied the C-SAW framework (from the University of Alabama at Birmingham) for model scalability
R&D supported by NSF CSR-SMA Program in collaboration with Dr. Jeff Gray (UAB) and Dr. Swapna Gokhale (UConn)
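Model scalability here means growing a model mechanically rather than by hand. C-SAW performs such transformations on GME models; the Python sketch below uses neither C-SAW nor SRNML, but generates the arrival and service part of the reactor SRN for any number of event classes, adding one priority inhibitor arc from each higher-priority class. The snapshot subnet is omitted, and the `(kind, source, target, multiplicity)` arc encoding is hypothetical.

```python
def scale_reactor_srn(n_classes, capacities):
    """Generate places, transitions, and arcs of an n-class reactor SRN.

    Arcs are (kind, source, target, multiplicity); class 1 has top priority.
    """
    if len(capacities) != n_classes:
        raise ValueError("need one buffer capacity per event class")
    places, transitions, arcs = [], [], []
    for i in range(1, n_classes + 1):
        b, s = f"B{i}", f"S{i}"
        places += [b, s]
        transitions += [f"A{i}", f"Sr{i}"]
        arcs.append(("output", f"A{i}", b, 1))                    # arrival fills buffer
        arcs.append(("inhibitor", b, f"A{i}", capacities[i - 1]))  # drop on overflow
        arcs.append(("input", s, f"Sr{i}", 1))                     # service completion
        for j in range(1, i):                                     # serve higher classes first
            arcs.append(("inhibitor", f"S{j}", f"Sr{i}", 1))
    return places, transitions, arcs
```

For two classes this reproduces the arcs named on the earlier slides, including the S1 to Sr2 inhibitor; for three classes the priority arcs grow to three without any manual editing.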

Analyzing Impact of Individual Concerns
Engineering Mechanics – Statics & Dynamics – for analyzing the impact of concerns?
• Borrow concepts from physical systems to analyze the impact of individual concerns on the end-to-end system
• Method of joints, method of sections, free body diagrams, equilibrium conditions

Engineering Mechanics for DRE Systems
A concern is viewed as a “force”
Challenges
• Directionality – are concerns vectors?
• Rigidity – are assemblies rigid or deformable?
• Force distribution – does a concern have components along Cartesian axes?
• Well-defined structures – do software components have properties like trusses?
• Second-order effects – transient effects showing up elsewhere
• Notion of friction – probably corresponds to the capacities of resources

(2) Deployment-time Intelligence
• Near-optimal deployment planning decisions
• Specialized middleware stacks
• Students involved:
  – Arvind Krishna (graduated), Jaiganesh Balasubramanian, Gan Deng, Dimple Kaul, Arundhati Kogekar, Amogh Kavimandan
Work partly supported by DARPA ARMS Program (PI on subcontracts from Lockheed Martin ATL)

Deployment Challenges
• Service workloads and resource capacity issues – service placement depends on workloads and available resources
• Component accessibility patterns – component survivability depends on its sharing degree
• Differentiated levels of service – affect resource provisioning and survivability strategies
• Service failover – different failover possibilities, e.g., the whole assembly, part of an assembly, or one component at a time
• Resource sharing – increases the risk of component(s) requiring a proactive survivability strategy
• No one-size-fits-all dependability strategy – cannot dictate one FT strategy for all services

Service Placement Problem
(Figure: example resource configuration with computation nodes C1, C2, C3, C4 and data access units A1, A2, A3, S1, S3, S4)
• A resource configuration is a tuple $RC = (C, D, HC, EC)$, where:
  – $C$ is a set of computation nodes, each attributed by:
    • PI(c): processing index (capacity)
    • MI(c): memory index
    • RI(c): reliability index
  – $D$ is a set of data access units of types in $\{A_i, S_j\}$
  – $HC: C \to \mathcal{P}(D)$ is a map associating each $c \in C$ with a set of data access units
  – $EC \subseteq C \times C$ is a set of communication links, each attributed by:
    • BI(e): bandwidth index
    • RI(e): reliability index
• System performance can be measured in a variety of ways. Consider a task assignment $TA: T \to C$:
  – Resource utilization: for processing, it is defined as the average of all task processing utilizations
  – Memory utilization MU(TA) and link utilization LU(TA) can be defined similarly
  – System utilization factor: the weighted sum of the utilization percentages of the system resources
• Reliability is trickier to measure. In general, the reliability of a given computation string is the product of the reliability indices of the underlying nodes and communication edges.
• The reliability factor RF(TA) for a given task assignment TA depends on:
  – The reliability of all its computation strings
  – The group reliability of the underlying nodes (taking into account their relative distances)
  – The resource utilization of the system: the more the system hardware is utilized, the less reliable it is
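The utilization and reliability measures above can be sketched in a few functions. The string-reliability product follows the slide; the per-node processing utilization (assigned demand divided by PI(c), averaged over nodes) and the equal default weights in the system utilization factor are assumptions, since the exact formulas are not reproduced here. Function and parameter names are hypothetical.

```python
from math import prod


def processing_utilization(ta, demand, pi):
    """Average per-node processing utilization of task assignment ta.

    Assumption: a node's utilization is its assigned demand divided by PI(c).
    """
    load = {c: 0.0 for c in pi}
    for task, node in ta.items():
        load[node] += demand[task]
    return sum(load[c] / pi[c] for c in pi) / len(pi)


def system_utilization(pu, mem_u, link_u, weights=(1 / 3, 1 / 3, 1 / 3)):
    # Weighted sum of processing, memory, and link utilization.
    wp, wm, wl = weights
    return wp * pu + wm * mem_u + wl * link_u


def string_reliability(string, ta, ri_node, ri_link):
    """Product of RI over the nodes and links a computation string uses.

    string: ordered task names; ri_link is keyed by sorted node-name pairs.
    """
    nodes = [ta[t] for t in string]
    links = {tuple(sorted(pair)) for pair in zip(nodes, nodes[1:]) if pair[0] != pair[1]}
    return prod(ri_node[c] for c in set(nodes)) * prod(ri_link[e] for e in links)
```

For a three-task string placed on nodes C1, C2, C2 with node indices 0.9 and 0.8 and a C1 to C2 link index of 0.95, the string reliability is 0.9 × 0.8 × 0.95 = 0.684.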

Specializations via Generative Programming
• GME-based POSAML language for the POSA2 pattern language
• Generative programming to synthesize FOCUS and AspectC++ rules
• Synthesize specialized middleware stacks for distributed deployment of operational strings

(3) Run-time QoS-aware Mechanisms
• Focus on autonomic mechanisms
• Survivability & fault tolerance
• Students involved:
  – Jaiganesh Balasubramanian, Sumant Tambe, Jules White, Nishanth Shankaran
Work supported by DARPA ARMS Program (PI on subcontracts from Lockheed Martin ATL, BBN Technologies, & Telcordia)

Distributed Virtual Container Approach
(Figure: a virtual container spanning a primary replica and secondary replicas)
• Virtual container concept for component middleware
• Salient features
  – Based on a virtualization idea
  – Spans boundaries across all the replicas, which could be placed on different physical nodes
  – Provides a single point for resource provisioning & component programming
  – Seamless environment for configuring fault tolerance (FT), load balancing (LB), and online swapping
  – Handles fine-grained checkpointing across all the replicas in the virtual container
  – Reliable multicast & state synchronization confined to a virtual container
  – Maintains information about how the replicas are connected to the external component assemblies
  – Provides an operating context for the components/assemblies requiring QoS
  – Relieves the programmer from having to configure the middleware for QoS support
• Clients are oblivious to replication
  – Normal container programming model
  – Middleware hides the virtualization details
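The client-side property that callers are oblivious to replication can be illustrated with a toy object. This sketch is hypothetical and single-process: the first replica acts as primary, every invocation is checkpointed to the secondaries, and a dead primary is dropped in favor of the next live replica. Reliable multicast, distribution across physical nodes, and real failure detection are all elided; `Replica` and `VirtualContainer` are illustrative names.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.state = {}
        self.alive = True


class VirtualContainer:
    """One logical container spanning a primary and its secondaries.

    Clients call invoke(); they never choose which replica serves them.
    """

    def __init__(self, replicas):
        self._replicas = list(replicas)   # first entry acts as primary

    @property
    def primary(self):
        return self._replicas[0]

    def invoke(self, key, value):
        if not self.primary.alive:
            self._failover()
        self.primary.state[key] = value
        self._checkpoint()
        return self.primary.name          # returned only to make the demo observable

    def _checkpoint(self):
        # State synchronization confined to this container's own replicas.
        for secondary in self._replicas[1:]:
            secondary.state = dict(self.primary.state)

    def _failover(self):
        survivors = [r for r in self._replicas if r.alive]
        if not survivors:
            raise RuntimeError("no live replica in the virtual container")
        self._replicas = survivors
```

After the primary is marked dead, the next `invoke` is served by the secondary, which already holds the checkpointed state; the caller's code does not change.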

Run-time QoS & Survivability Mechanisms
• A configurable approach to survivability, including micro- (infrastructure) & macro- (assembly & operational string) level strategies
• Micro-level strategies monitor infrastructure state to make proactive decisions at the
  – Component level (swapping & migration)
  – Middleware level (configurations)
  – Component server level (process resource allocations)
  – Node level (multiple components)
• Macro-level strategies monitor assembly health to make failover decisions
  – Failover based on type of failover unit
  – Affects service placement decisions
  – May involve load balancing
  – State synchronization issues
  – Replication styles (hidden by FT strategies)
• Initial prototype developed using Component-Integrated ACE ORB (CIAO) & Deployment & Configuration Engine (DAnCE) (www.dre.vanderbilt.edu)
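Macro-level failover decisions start from the failover unit. This hypothetical sketch shows only that first step: given a failed component, it returns the set of components to fail over when the unit is a single component, the enclosing assembly, or the whole operational string. The mappings `assembly_of` and `string_of` are assumed inputs, and the unit names are illustrative.

```python
from typing import Dict, Set


def failover_set(failed: str, unit: str,
                 assembly_of: Dict[str, str],
                 string_of: Dict[str, str]) -> Set[str]:
    """Components to fail over when `failed` crashes, by failover-unit granularity."""
    if unit == "component":
        return {failed}
    if unit == "assembly":
        a = assembly_of[failed]
        return {c for c, x in assembly_of.items() if x == a}
    if unit == "operational_string":
        s = string_of[failed]
        return {c for c, x in string_of.items() if x == s}
    raise ValueError(f"unknown failover unit: {unit}")
```

Coarser units restart more components at once, which is why the choice of unit feeds back into service placement and state synchronization.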

Research Summary
(Figure: layered stack of Applications, Middleware, OS & Protocols, and Hardware)
R&D in new, holistic approaches to end-to-end QoS management in services-enabled distributed real-time & embedded systems
Research Challenge
• Managing problem space variability
Research Approach & Benefits
• Model-driven generative approach to separation of concerns
  – Enhance the state of the art in MDD and AOSD technologies
• Design-time “what-if” analysis using generative programming
  – Variety of analysis techniques, including nontraditional mechanisms
  – Generative technologies for automated analysis
  – Application of engineering mechanics
• Deployment-time intelligent decisions
  – New applications of constraint optimization theory
  – Middleware specializations
  – Near-optimal deployment
  – Specialized middleware stacks
• Run-time mechanisms
  – Multilevel, proactive QoS management schemes
  – Virtualization ideas
  – Largely autonomic
  – Survivable systems