
  • Number of slides: 77

Outline
Part A: WF Specification and Verification
• What Is It All About?
• WF Specification Techniques
• Statecharts
• CTL and Model Checking
• Summary and Open Research Issues
Part B: WF System Architecture and Configuration
• WF Execution Infrastructure
• Failure Handling
• Stochastic Modeling
• WF System Configuration
• Summary and Open Research Issues
© Gerhard Weikum

WFMS Architecture for E-Services
[Figure: architecture diagram with clients, WF servers of type 1 and type 2, a Comm server, and App servers of type 1 through type n]

Interoperability between WF Systems
[Figure: WFMS 1 and WFMS 2 connected through a WF Mediator]
• wrap WFMS using an XML-based interface (e.g., WSDL/WSFL or ???)
• route activity and sub-WF invocations through the WF mediator
• same protocol for activities and sub-WFs
• additional functions for sub-WF monitoring

Some “AI-complete” Problems
Grand challenge: service discovery and matchmaking
State of the art: standardized syntax & protocols (à la UDDI) with queries on yellow pages (“business registry”)
Needed: semantics & interoperability, with automatic reasoning about process/activity interface, behavior, and outcome
Standardized ontologies are a step forward, but still far from the final goal; we need: Semantic Web + Intelligent Search

Outline, Part B: WF System Architecture and Configuration
Current topic: Failure Handling
Still to come: Stochastic Modeling; WF System Configuration; Summary and Open Research Issues

Important System Issues
• Scalability, Reliability, Availability, Manageability, ...
• Differentiated quality of service and performance guarantees (e.g., class-specific response time and workflow turnaround time); see, e.g., Mentor-lite, http://www-dbs.cs.uni-sb.de/~mlite/
• World-wide failure masking for exactly-once behavior with easy app development; see, e.g., Phoenix project, http://research.microsoft.com/db/phoenix/

The Need for Failure Masking
[Figure: screenshot of an e-shop order page ("Please review and place your order", button "Place your order") answered by the error message below]
Your server command (process id #20) has been terminated. Re-run your command (severity 13) in /export/home/WWW/your-reliable-eshop.biz/mb_1300_db.mb1

Long-Lived & Distributed Execution
[Figure: travel-planning statechart. Initial action: / Budget := 1000; Trials := 1. States: Select Conf, Check ConfFee, Check TrExpenses, Check Cost, Go, No. Transitions: [Found] / Cost := 0; [!Found]; [Fok & Eok] / Cost := ConfFee + TrExpenses]
Atomic (transactional) write of persistent state & context guarantees forward recovery

Digression: Two-Phase Commit Protocol (2PC) for Distributed Atomic Transactions
Message flow between the coordinator and agents 1 and 2 (commit case):
1. Coordinator: write “begin”; send “prepare” to agent 1 and agent 2
2. Each agent: force log entries & write “prepared”; send “yes”
3. Coordinator: write “commit”; send “commit” to agent 1 and agent 2
4. Each agent: write “commit”; send “ack”
5. Coordinator: write “end”
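The message flow of the 2PC slide can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions: the names `Agent`, `two_phase_commit`, and the `can_commit` switch are hypothetical, Python lists stand in for stable logs, and there is no real forcing of log records, no timeouts, and no network.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """A 2PC participant; `log` is an in-memory stand-in for its stable log."""
    name: str
    can_commit: bool = True              # hypothetical switch to simulate a "no" vote
    log: list = field(default_factory=list)

    def on_prepare(self) -> str:
        if not self.can_commit:
            self.log.append("abort")
            return "no"
        # A real agent would force its log entries to disk before voting "yes".
        self.log.append("prepared")
        return "yes"

    def on_decision(self, decision: str) -> str:
        # Record the coordinator's decision unless it is already the last entry.
        if self.log[-1:] != [decision]:
            self.log.append(decision)
        return "ack"


def two_phase_commit(agents: list, coordinator_log: list) -> str:
    coordinator_log.append("begin")
    votes = [agent.on_prepare() for agent in agents]            # phase 1: prepare
    decision = "commit" if all(v == "yes" for v in votes) else "abort"
    coordinator_log.append(decision)                             # the decision point
    acks = [agent.on_decision(decision) for agent in agents]    # phase 2: decide
    if all(a == "ack" for a in acks):
        coordinator_log.append("end")
    return decision
```

With two agents that both vote "yes", the coordinator log ends up as begin, commit, end, matching the five steps of the slide; a single "no" vote turns the decision into an abort for everyone.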

Statechart for 2PC Protocol
[Figure: three orthogonal statecharts.
Coordinator: states initial, collecting, C-pending, A-pending, committed, aborted, forgotten; transitions labeled T|F / prepare1; prepare2, yes1 & yes2 / commit1; commit2, sorry1 | sorry2 / abort1; abort2, and ack1 & ack2.
Agent i (i = 1, 2): states initial_i, prepared_i, committed_i, aborted_i; transitions labeled Ti|Fi, prepare_i / yes_i, prepare_i / sorry_i, commit_i / ack_i, abort_i / ack_i]

Long-lived & Distributed Execution
[Figure: travel-planning statechart with states Select Conf, Check ConfFee, Check TrExpenses, Check Cost, Go, No]
Queued transactions & 2PC guarantee consistency of distributed WFMS & exactly-once execution

From ACID To Recovery Guarantees
Problem: A client that does not receive a return code from a transactional server cannot easily find out the transaction outcome and may be tempted to re-initiate the (non-idempotent) transaction, thus producing unacceptable effects.
Approach: In addition to atomicity, the transactional server needs to guarantee the exactly-once execution of the transaction, where execution includes the server's reply message.
Result: (almost) perfect failure masking

Stateless Applications Based on Queues
Stateless application (running on client, app server, or data server):
• user sends input message
• app program sends request message to data server
• data server executes transaction and sends reply message to app
• app program sends output message to user
There are no conversations with the user within a transaction, and subsequent transactions are independent.
Solution: Queued Transactions
• message recovery by queue manager with persistent, recoverable message queues
• exactly-once execution by enclosing message dequeue and enqueue in a transaction

Illustration of 2-Tier Queued Transaction
[Figure: the user sends input to and receives output from the application process (client); the client enqueues a request and dequeues the reply; the database server dequeues the request and enqueues the reply within one server transaction]

Illustration of 3-Tier Queued Transaction
[Figure: the user sends input to and receives output from the client; the client enqueues a request and dequeues the reply; the application server dequeues the request and enqueues the reply, working with the database server within one distributed server transaction]

Correctness of Queued Transaction Protocol
Theorem: With the queued transaction protocol for stateless applications, the following guarantees hold:
1. Once the user-input transaction is committed, a request is executed by the server exactly once.
2. Once the user-input transaction is committed, the user output is delivered at least once.
3. If user output is testable, the user output is delivered exactly once, provided the user-input transaction has been committed.
Inherent (small window of) uncertainty:
• (last) user input may get lost
• (last) user output may be sent more than once; this can be eliminated with testable output (using special hardware)

Client During Normal Operation
user-input processing by client:
    begin transaction;
    enqueue (request);
    commit transaction;
user-output processing by client:
    wait until reply queue is not empty;
    begin transaction;
    dequeue (reply);
    while user has not acknowledged the reply or sent the next request do
        present reply to user;
    end /*while*/;
    commit transaction;

Server During Normal Operation
request-reply processing by data server:
    begin transaction;
    dequeue (request);
    perform data operations and generate reply;
    enqueue (reply);
    commit transaction;
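The server step above (dequeue, data operations, enqueue, all in one transaction) can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch, not the protocol implementation from the slides: the table names `request_q`, `reply_q`, and `account`, the deposit operation, and the `crash_before_commit` flag are all hypothetical, and an in-memory database stands in for the persistent queue manager that a real system would need.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    # isolation_level=None: BEGIN / COMMIT / ROLLBACK are issued explicitly below.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE request_q (id INTEGER PRIMARY KEY, amount INTEGER)")
    conn.execute("CREATE TABLE reply_q (id INTEGER PRIMARY KEY, body TEXT)")
    conn.execute("CREATE TABLE account (balance INTEGER)")
    conn.execute("INSERT INTO account VALUES (0)")
    return conn


def client_enqueue(conn, request_id: int, amount: int) -> None:
    """User-input processing: enqueue the request inside its own transaction."""
    conn.execute("BEGIN")
    conn.execute("INSERT INTO request_q VALUES (?, ?)", (request_id, amount))
    conn.execute("COMMIT")


def server_step(conn, crash_before_commit: bool = False) -> bool:
    """Dequeue one request, apply it, and enqueue the reply in ONE transaction."""
    conn.execute("BEGIN")
    try:
        row = conn.execute(
            "SELECT id, amount FROM request_q ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            conn.execute("COMMIT")
            return False
        request_id, amount = row
        conn.execute("DELETE FROM request_q WHERE id = ?", (request_id,))      # dequeue
        conn.execute("UPDATE account SET balance = balance + ?", (amount,))   # data operation
        conn.execute(
            "INSERT INTO reply_q VALUES (?, ?)", (request_id, f"deposited {amount}")
        )                                                                      # enqueue reply
        if crash_before_commit:
            raise RuntimeError("simulated server crash")
        conn.execute("COMMIT")
        return True
    except Exception:
        # Undo dequeue, update, and enqueue together: the request stays queued.
        conn.execute("ROLLBACK")
        raise
```

If the simulated crash hits before the commit, the request is still in `request_q` and the balance is unchanged, so re-running `server_step` after restart applies the deposit exactly once.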

Client and Server Restart
Client restart:
    check reply queue;
    if not empty then
        process reply like during normal operation;
    end /*if*/;
Server restart:
    check request queue;
    if not empty then
        initiate processing of requests like during normal operation;
    end /*if*/;

Pseudo-Conversational Transactions for Stateful Applications
• Queue-based message recovery for entire conversations
• Conversational “logical unit of work” broken down into a chain of stateless transactions, with (small) application state maintained in the queue (analogous to cookies, but more general and much more reliable)
• Dequeue of reply and enqueue of next request combined into one transaction for exactly-once execution guarantee
• Good for apps such as travel reservation, electronic shopping, etc.

Illustration of Pseudo-Conversational Transactions
[Figure: message flow among the user, the application process (client), and the database server]

Correctness of Pseudo-Conversational Transaction Protocol
Theorem: With the queue-based message recovery for conversational multi-step transaction chains, the following guarantees hold:
1. Once the initial user-input transaction that starts the entire conversation is committed, the entire transaction chain is executed by the server exactly once.
2. Once the initial user-input transaction is committed, each user-output message throughout the conversation is delivered at least once.
3. If user output is testable, each user-output message is delivered exactly once, provided the initial user-input transaction has been committed.

Queue-based Message Recovery for Exactly-Once Workflow Execution
At the end of an activity, execute a transaction that combines:
• writing the activity's modifications of workflow state and context to persistent store
• writing the state modifications that result from the firing of outgoing transitions to persistent store
• writing the context modifications that result from the actions of firing transitions to persistent store
• notifying the follow-up activities by enqueueing messages
A newly invoked activity executes a transaction that combines:
• dequeueing of the notification message
• writing the workflow state and context to persistent store

Use of Queued Transactions in Travel Planning Workflow
[Figure: travel-planning statechart. Initial action: / Budget := 1000; Trials := 1. States: Select Conference; CheckConfFee (with Select Tutorials, Compute Fee); CheckTravelCost (with Check Flight, Check Airfare, Check Hotel); Check Cost; Go; No. Transitions: [ConfFound] / Cost := 0; [!ConfFound]; / Cost = ConfFee + TravelCost; [Cost ≤ Budget]; [Cost > Budget & Trials ≥ 3]; [Cost > Budget & Trials < 3] / Trials++]
Queued transactions & 2PC guarantee consistency of distributed WFMS & exactly-once execution

Compensation of Invoked Applications
[Figure: travel-planning statechart with states Select Conf, Check ConfFee, Check TrExpenses, Check Cost, Go, No, extended by the compensating steps Cancel Travel and Cancel Conf]
Provide compensating steps & invoke steps (mostly) automatically

Meaningful Compensation Spheres (1)
[Figure: travel-planning statechart with an arbitrarily drawn compensation sphere, marked "?"]
Arbitrary compensation spheres may leave the workflow in a non-resumable configuration!

Meaningful Compensation Spheres (2)
[Figure: travel-planning statechart with a compensation sphere drawn around a single state, marked "!"]
Restrict atomicity spheres to a single state and its enclosed activities & apps

The Need for Multi-Tier Application Recovery
Realistic example: Expedia or Travelocity style multi-tier service
[Figure: Client; Expedia App Web Server; Expedia App Server; Sabre App Server; Amadeus App Server; Data Servers]

Need for Integrated & Application-transparent Data, Message, and Process Recovery for largely autonomous components
[Figure: users, Web app server, business portal server, data server, other clients]

Efficient Solution: Recovery Contracts
For each process:
• log all non-deterministic events (non-forced)
Upon interaction between sender and receiver:
• sender promises recoverable state and message (e.g., via replay) and resends message if necessary
• receiver promises duplicate elimination and recoverable state when releasing sender promise
+ low run-time overhead: one forced log write per multi-tier request/reply
+ fast restart (for high availability): rebuild process state & message table and replay
+ independent recovery of autonomous components
Prototype implementation for IE 6 / Apache / PHP / MySQL, plus COM+-based implementation work in the Phoenix project at MSR

Committed Interaction Contract (CIC)
• Sender Obligation S1: persistent state as of message time or later
• Sender Obligation S2: persistent message
  – S2a: resend message periodically until released by receiver
  – S2b: resend message upon explicit request until released
• Sender Obligation S3: unique messages
• Receiver Obligation R1: duplicate message elimination
• Receiver Obligation R2: persistent state
  – R2a: persistent state as of message time or later before releasing sender from S2a (stable interaction)
  – R2b: persistent state & message before releasing sender from S2b (installed interaction)
Immediately Committed Interaction (ICIC): receiver releases sender from S2a and S2b immediately (similar to optimized 2PC); crucial for autonomous recovery
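The interplay of the sender obligations (unique messages, resend until released) and the receiver obligation of duplicate elimination can be sketched in a few lines. This is a toy model under loud assumptions: the classes `Sender` and `Receiver`, the `lose_reply` flag, and the integer "state" are hypothetical, everything lives in memory rather than on a stable log, and the S2a/S2b and R2a/R2b distinctions are collapsed into a single release.

```python
class Receiver:
    def __init__(self):
        self.seen = {}     # message id -> release reply (stand-in for a persistent message table)
        self.state = 0     # application state changed by messages

    def deliver(self, msg_id, increment):
        # R1: duplicate elimination, a resent message must not be applied twice.
        if msg_id in self.seen:
            return self.seen[msg_id]
        self.state += increment
        # R2: a real receiver makes state (and message) persistent before releasing.
        self.seen[msg_id] = ("release", msg_id)
        return self.seen[msg_id]


class Sender:
    def __init__(self):
        self.next_id = 0
        self.unreleased = {}   # S2: messages are kept until the receiver releases them

    def send(self, receiver, increment, lose_reply=False):
        msg_id = self.next_id  # S3: unique message ids
        self.next_id += 1
        self.unreleased[msg_id] = increment
        reply = receiver.deliver(msg_id, increment)
        if not lose_reply:     # lose_reply simulates a release that never arrives
            self._on_release(reply)
        return msg_id

    def resend_unreleased(self, receiver):
        # S2a: resend every message that has not been released yet.
        for msg_id, increment in list(self.unreleased.items()):
            self._on_release(receiver.deliver(msg_id, increment))

    def _on_release(self, reply):
        kind, msg_id = reply
        if kind == "release":
            self.unreleased.pop(msg_id, None)
```

When a release is lost, the sender resends the message; the receiver recognizes the id, does not change its state again, and simply repeats the release.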

Statechart for CIC
[Figure: two orthogonal statecharts.
Sender: running; / make state and message persistent: S1, S2 promised; / message transfer; message sent; on stability notification: interaction (known to be) stable (S2a released); on install notification: interaction (known to be) installed (S2b released).
Receiver: running; on message transfer / log message arrival: message received; / make state persistent: interaction stable, R2a promised, / stability notification; interaction installed, R2b promised, / install notification]

Statechart for ICIC
[Figure: two orthogonal statecharts.
Sender: running; / make state and message persistent: S1, S2 promised; / message transfer; message sent; on stability and install notification: interaction (known to be) installed, S2 released.
Receiver: running; on message transfer: message received; / make state persistent: interaction installed, R2 promised, / stability and install notification]

External Interaction Contract (XIC) and Transactional Interaction Contract (TIC)
XIC:
• input from user: receiver promises ICIC, sender doesn't
• output to user: sender promises ICIC, receiver doesn't
Consequence: a crash may lead to lost input or duplicated output (for a small but inherently unavoidable window of vulnerability)
TIC:
Receiver of transactional request promises:
• atomic state transition
• faithful reply message
• persistent reply message
Sender of transactional request promises:
• persistent state and commit request message
• unique messages

Special Case: Client-Server Application Recovery
[Figure, two panels. During normal operation: input, request, reply, and output messages flow among the user, the application process (client), the database server, and a 2nd app process. During client restart: after a crash the client replays input, request, and reply events; the outcome of the last reply is marked "?"]

General Considerations for Client-Server Stateful Application Recovery
• Message logging for message recovery and deterministic program replay (of piecewise deterministic program)
• Installation points for process recovery and reduced program replay
• Server processes concurrent threads on behalf of many clients
• Server “commits state” upon sending a reply to a client
• Forced logging should be minimized
• Server should be able to perform independent recovery

Server Reply Logging Method
• Client and server each
  – maintain a message lookup table (MLT) and
  – write message log entries to a stable log
• Client performs lazy, non-forced logging, periodically creates an installation point, and force-logs user-input messages
• Server forces its log buffer before sending a reply message
• Server recovery rebuilds the message lookup table and replays incomplete requests to produce the reply; this may need logging of read/write interleaving among threads
• Client recovery rebuilds the MLT, reloads the app from the last installation point, and replays the application, intercepting message events and obtaining the contents of messages from the local MLT or the server
• Client sends stability notifications to facilitate server log truncation

Data Structures for Server Reply Logging
[Figure: client and server, each with a message lookup table and a stable log file]
Client (lazy logging, with installation point), message lookup table:
MSN  Type
15   input
20   request
40   reply
45   output
65   input
70   request
Server (force log upon reply), message lookup table:
MSN  Type
10   request
20   request
30   reply
40   reply
Stable log file excerpt: ... 70 request, 80 reply ...

Replaying Incomplete Requests with Server Reply Logging
[Figure: client message lookup table with entries (15, input) and (20, request) and client log 15, 20, ...; server message lookup table with entries (10, request), (20, request), (30, reply) and server log 10, 20, 30, ...; logged read/write interleaving R(x) W(x) R(y) W(y) R(y) ...]

Log Truncation with Server Reply Logging
[Figure: client c with message lookup table entries (15, input), (20, request), (40, reply), (45, output), (70, request) and its client log; message 70 carries a stability notification (70 + stability notification 45); the server log holds entries for client c and other clients, with RedoMSN for client c pointing into the server log]

Efficient Multi-tier Application Recovery and Failure Masking
[Figure: Client; Expedia App Web Server; Expedia App Server; Sabre App Server; Amadeus App Server; Data Servers]
Altogether 16 messages (8 requests + 8 replies) per user request
10 forced log writes:
• 1 user request at client
• 4 replies at data servers (transactional ICs)
• 3 replies at external app servers (ICICs)
• 2 app server replies at Web server (ICICs)
• no forced logging between Web server and app server in the same “recovery ensemble” (CIC)
as opposed to 32 forced log writes with 2PC for every sender-receiver pair

Additional System Guarantees for Workflows
Exactly-once execution guarantees, to preserve the guaranteed semantic properties in a failure-prone, distributed system environment
High availability through server and data replication
Scalable performance
Guaranteed performance, e.g.: response time < 5 seconds with probability 0.95 for 1000 concurrently active workflows
Auto-tuning and zero-admin

Outline, Part B: WF System Architecture and Configuration
Current topic: Stochastic Modeling
Still to come: WF System Configuration; Summary and Open Research Issues

The Need for Performance and QoS Guarantees
[Figure: screenshot of a "Check Availability (Look-Up Will Take 8-25 Seconds)" page answered by the error message below]
Internal Server Error. Our system administrator has been notified. Please try later again.

From Best Effort To Performance & QoS Guarantees
“Our ability to analyze and predict the performance of the enormously complex software systems ... are painfully inadequate” (Report of the US President's Technology Advisory Committee)
• Very slow servers are like unavailable servers
• Tuning for peak load requires predictability of the (workload, config) → performance function
• Self-tuning requires mathematical models
• Stochastic guarantees for huge #clients: P[response time ≤ 5 s] > 0.95

WFMS Architecture for E-Services
[Figure: architecture diagram with clients, WF servers of type 1 and type 2, a Comm server, and App servers of type 1 through type n]

Digression: Markov Chains
A discrete-time finite-state Markov chain is a pair $(\Sigma, p)$ with a state set $\Sigma = \{s_1, \ldots, s_n\}$ and a transition probability function $p: \Sigma \times \Sigma \to [0, 1]$ with the property $\sum_j p_{ij} = 1$ for all $i$, where $p_{ij} := p(s_i, s_j)$.
A Markov chain is called ergodic (stationary) if for each state $s_j$ the limit $p_j := \lim_{t \to \infty} p_{ij}^{(t)}$ exists and is independent of $s_i$, with $p_{ij}^{(t)} := \sum_k p_{ik}^{(t-1)} p_{kj}$ for $t > 1$ and $p_{ij}^{(t)} := p_{ij}$ for $t = 1$.
For an ergodic finite-state Markov chain, the stationary state probabilities $p_j$ can be computed by solving the linear equation system:
$p_j = \sum_i p_i \, p_{ij}$ for all $j$, and $\sum_j p_j = 1$.

Markov Chain Example
[Figure: state transition diagram with states 0: sunny, 1: cloudy, 2: rainy and edge labels 0.8, 0.2, 0.5, 0.3, 0.4]
$p_0 = 0.8\,p_0 + 0.5\,p_1 + 0.4\,p_2$
$p_1 = 0.2\,p_0 + 0.3\,p_2$
$p_2 = 0.5\,p_1 + 0.3\,p_2$
$p_0 + p_1 + p_2 = 1$
Solving this system gives $p_0 = 55/79 \approx 0.696$, $p_1 = 14/79 \approx 0.177$, $p_2 = 10/79 \approx 0.127$.
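The stationary probabilities of the weather example can be checked numerically. The sketch below uses plain power iteration (repeatedly applying the transition matrix to a start vector) instead of solving the linear system directly; the function name `stationary_distribution` and the iteration count are choices made here, and the matrix `P` is read off the coefficients of the balance equations above (row = current state, column = next state).

```python
def stationary_distribution(P, iterations=1000):
    """Approximate the stationary distribution of an ergodic chain by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n                      # any start distribution works for an ergodic chain
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


# Transition probabilities taken from the balance equations of the example.
P = [
    [0.8, 0.2, 0.0],   # 0: sunny
    [0.5, 0.0, 0.5],   # 1: cloudy
    [0.4, 0.3, 0.3],   # 2: rainy
]

pi = stationary_distribution(P)
print([round(x, 3) for x in pi])   # [0.696, 0.177, 0.127]
```

The values agree with the exact solution 55/79, 14/79, 10/79 of the four equations.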

Digression: Continuous Time Markov Chains
A finite-state continuous-time Markov chain (CTMC) is a pair $(\Sigma, q)$ with a state set $\Sigma = \{s_1, \ldots, s_n\}$ and transition rates $q: \Sigma \times \Sigma \to \mathbb{R}_{\ge 0}$.
A CTMC can be “factorized” into a discrete-time Markov chain with transition probabilities $p_{ij} = q_{ij} / \sum_{k \ne i} q_{ik}$ and exponentially distributed state residence times with mean $H_i = 1 / \sum_{k \ne i} q_{ik}$.
For an ergodic CTMC the stationary state probabilities $p_j$ can be computed by solving the system of linear flow balance equations:
$p_j \sum_{k \ne j} q_{jk} = \sum_{i \ne j} p_i \, q_{ij}$ for all $j$, and $\sum_j p_j = 1$.

CTMC Example 1: Stationary Availability
Only transient, repairable failures.
availability = P[system is operational at random time point]
Single server (states 1: up, 0: down; failure rate 1/MTTF, repair rate 1/MTTR):
$p_1 / \mathit{MTTF} = p_0 / \mathit{MTTR}$
$p_0 + p_1 = 1$
Mirrored server pair (states 2: both up, 1: 1 up and 1 down, 0: both down; failure rates 2/MTTF out of state 2 and 1/MTTF out of state 1; repair rate 1/MTTR out of states 1 and 0):
$2 p_2 / \mathit{MTTF} = p_1 / \mathit{MTTR}$
$2 p_2 / \mathit{MTTF} + p_0 / \mathit{MTTR} = p_1 / \mathit{MTTR} + p_1 / \mathit{MTTF}$
$p_1 / \mathit{MTTF} = p_0 / \mathit{MTTR}$
$p_0 + p_1 + p_2 = 1$
availability of server pair $= p_1 + p_2 = 1 - p_0$
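The balance equations of this example have simple closed-form solutions, sketched below. The function names are hypothetical, and the pair formula assumes what the diagram labels suggest: a repair rate of 1/MTTR out of both failure states (one repair at a time). With $r = \mathit{MTTR} / \mathit{MTTF}$, the pair equations give $p_1 = 2 r \, p_2$ and $p_0 = r \, p_1$.

```python
def single_server_availability(mttf: float, mttr: float) -> float:
    """p1 of the two-state chain: the fraction of time the server is up."""
    return mttf / (mttf + mttr)


def mirrored_pair_availability(mttf: float, mttr: float) -> float:
    """p1 + p2 of the three-state chain, assuming repair rate 1/MTTR in states 1 and 0."""
    r = mttr / mttf
    p2 = 1.0 / (1.0 + 2.0 * r + 2.0 * r * r)   # from p0 + p1 + p2 = 1
    p1 = 2.0 * r * p2                           # from 2 p2 / MTTF = p1 / MTTR
    return p1 + p2


if __name__ == "__main__":
    # Illustrative numbers only: MTTF = 99 hours, MTTR = 1 hour.
    print(single_server_availability(99.0, 1.0))
    print(mirrored_pair_availability(99.0, 1.0))
```

For these illustrative numbers the single server is available 99% of the time, while the mirrored pair reaches 9999/10001, about 99.98%.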

CTMC Example 2: Reliability
Some repairable, some non-repairable failures.
reliability = P[lifetime of system ≥ t] or E[lifetime]
Mirrored disk pair (states 2: both up, 1: 1 up and 1 down, 0: both down; rates 2/MTTF from state 2 to 1, 1/MTTF from state 1 to 0, 1/MTTR from state 1 to 2):
$E_{ij}$ := E[time between entering i and entering j]
$H_2 = \mathit{MTTF} / 2$
$H_1 = \mathit{MTTF} \cdot \mathit{MTTR} / (\mathit{MTTF} + \mathit{MTTR})$
$E_{21} = H_2$
$E_{12} = H_1$
$E_{20} = H_2 + E_{10}$
$E_{10} = H_1 + \mathit{MTTF} / (\mathit{MTTF} + \mathit{MTTR}) \cdot E_{20}$
$E_{20}$ = E[time until absorbing state is reached from initial state]
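Substituting $E_{10}$ into $E_{20} = H_2 + E_{10}$ leaves one linear equation in $E_{20}$, which the sketch below solves. The function name is hypothetical; the closed form in the usage note, $\mathit{MTTF}^2 / (2\,\mathit{MTTR}) + 3\,\mathit{MTTF}/2$, follows from the same equations by algebra.

```python
def mean_time_to_data_loss(mttf: float, mttr: float) -> float:
    """Solve E20 = H2 + H1 + P[1 -> 2] * E20 for the mirrored disk pair."""
    h2 = mttf / 2.0                           # mean residence time in "both up"
    h1 = mttf * mttr / (mttf + mttr)          # mean residence time in "1 up, 1 down"
    back_to_2 = mttf / (mttf + mttr)          # probability that repair beats the second failure
    return (h2 + h1) / (1.0 - back_to_2)


if __name__ == "__main__":
    # Illustrative numbers only: MTTF = 100 time units, MTTR = 1 time unit.
    print(mean_time_to_data_loss(100.0, 1.0))
```

For MTTF = 100 and MTTR = 1 the expected lifetime is 5150 time units, the same value the closed form gives (5000 + 150).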

Digression: Basics of Queuing Systems (1)
[Figure: customers (requests) arrive at a queue and are served by a service station, e.g., of type M/M/1/∞/FCFS. Labeled parts: probability distribution of interarrival time (e.g., M = exponential distribution); scheduling policy (e.g., FCFS); probability distribution of service time (e.g., M = exponential distribution). Timeline: arrival, waiting time, service time, departure time; response time (sojourn time) = waiting time + service time]

Digression: Basics of Queuing Systems (2)
Classification of queueing systems: A/B/m/K/Z with
A: distribution of interarrival times (type of arrival process)
B: distribution of service times
m: number of service stations with shared queue
K: capacity of the queue (often assumed to be ∞)
Z: service scheduling policy (e.g., FCFS, priority-based, etc.)
Measures of interest:
λ – arrival rate (1 / mean of interarrival time distribution)
X – throughput (departure rate): served requests per time unit
W – (mean) waiting time in queue
R – (mean) response time
S – (mean) service time (with higher moments S2, S3, ...)
ρ – utilization (probability of server being busy)
N – (mean) queue length, including request in service

Digression: Basics of Queuing Systems (3)
Operational Laws (queuing theory theorems):
1. Utilization law: $\rho = X \cdot S$
2. Forced flow law: $X = \lambda$ for $\rho < 1$
3. Little's law: $N = X \cdot R$ and $N - \rho = X \cdot W$
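The three operational laws can be applied directly to measured averages. A small sketch, assuming that the response time is the sum of waiting time and service time; the function name, the dictionary keys, and the sample numbers are made up for illustration.

```python
def operational_metrics(arrival_rate, mean_service_time, mean_response_time):
    """Apply the utilization law, the forced flow law, and Little's law to one server."""
    throughput = arrival_rate                          # forced flow law, needs utilization < 1
    utilization = throughput * mean_service_time       # utilization law
    if utilization >= 1.0:
        raise ValueError("server is saturated; throughput no longer equals arrival rate")
    mean_in_system = throughput * mean_response_time   # Little's law: N = X * R
    mean_waiting_time = mean_response_time - mean_service_time   # assumes R = W + S
    mean_waiting = throughput * mean_waiting_time      # Little's law for the queue: N - rho = X * W
    return {
        "throughput": throughput,
        "utilization": utilization,
        "mean_in_system": mean_in_system,
        "mean_waiting": mean_waiting,
    }


if __name__ == "__main__":
    # 20 requests/s, 20 ms mean service time, 50 ms mean response time (made-up numbers).
    print(operational_metrics(20.0, 0.02, 0.05))
```

For these numbers the server is 40% utilized and holds one request on average, of which 0.6 are waiting, consistent with N minus the utilization.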

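Digression: M/M/1 Queuing Systems
N(t): number of requests in queue (or in service)
λ: arrival rate
μ: service rate
[Figure: birth-death chain over the states 0, 1, 2, ... with rate λ to the right and rate μ to the left]
Flow balance equations: $\lambda p_{n-1} + \mu p_{n+1} = (\lambda + \mu) p_n$ for $n \ge 1$ and $\lambda p_0 = \mu p_1$
Solution: $p_n = (1 - \rho) \rho^n$ for $n \ge 0$, with $\rho = \lambda / \mu$
For the response time distribution: $P[\text{response time} \le t] = 1 - e^{-(\mu - \lambda) t}$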
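The formulas on this slide did not survive extraction, so the sketch below relies on the standard textbook results for M/M/1 rather than on the slide itself: state probabilities $(1 - \rho) \rho^n$, mean number in system $\rho / (1 - \rho)$, mean response time $1 / (\mu - \lambda)$, and an exponentially distributed response time. All function names are hypothetical.

```python
import math


def mm1_metrics(arrival_rate: float, service_rate: float):
    """Return (utilization, mean number in system, mean response time) for M/M/1."""
    rho = arrival_rate / service_rate
    if rho >= 1.0:
        raise ValueError("no stationary regime: arrival rate must be below service rate")
    mean_in_system = rho / (1.0 - rho)
    mean_response_time = 1.0 / (service_rate - arrival_rate)
    return rho, mean_in_system, mean_response_time


def mm1_state_probability(rho: float, n: int) -> float:
    """Stationary probability of having n requests in the system."""
    return (1.0 - rho) * rho ** n


def mm1_response_time_cdf(arrival_rate: float, service_rate: float, t: float) -> float:
    """P[response time <= t] under FCFS."""
    return 1.0 - math.exp(-(service_rate - arrival_rate) * t)


if __name__ == "__main__":
    # Made-up load: 8 requests/s offered to a server that handles 10 requests/s.
    print(mm1_metrics(8.0, 10.0))
    print(mm1_response_time_cdf(8.0, 10.0, 1.5))
```

The last call is the kind of check behind a stochastic guarantee such as P[response time ≤ t] > 0.95: at 80% utilization the bound holds for t = 1.5 s in this made-up setting.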

Digression: M/G/1 Queuing Systems
N(t) at request departure times forms an embedded Markov chain.
Laplace-Stieltjes transform of random variable X: $X^*(s) = E[e^{-sX}] = \int_0^\infty e^{-sx} \, dF_X(x)$
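For M/G/1 the mean waiting time depends on the service time distribution only through its first two moments. The sketch below uses the standard Pollaczek-Khinchine mean-value formula, $W = \lambda E[S^2] / (2 (1 - \rho))$; whether the slide showed exactly this form is an assumption, and the function name is hypothetical.

```python
def mg1_mean_waiting_time(arrival_rate: float, mean_service: float,
                          second_moment_service: float) -> float:
    """Pollaczek-Khinchine mean waiting time in queue for an M/G/1 system."""
    rho = arrival_rate * mean_service
    if rho >= 1.0:
        raise ValueError("no stationary regime: utilization must be below 1")
    return arrival_rate * second_moment_service / (2.0 * (1.0 - rho))


if __name__ == "__main__":
    lam, s = 8.0, 0.1                                  # made-up load: rho = 0.8
    print(mg1_mean_waiting_time(lam, s, 2 * s * s))    # exponential service: E[S^2] = 2 S^2
    print(mg1_mean_waiting_time(lam, s, s * s))        # deterministic service: E[S^2] = S^2
```

With exponential service the result equals the M/M/1 waiting time (0.4 s here); deterministic service of the same mean halves it, which is why the second moment matters for performance guarantees.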

Outline, Part B: WF System Architecture and Configuration
Current topic: WF System Configuration
Still to come: Summary and Open Research Issues

Stochastic Model of Workflow System
[Figure: model overview. Inputs: workload (workflow types, activity types) and system config (replicated server types: App server, WF server, ORB). Markov model of the workflows (example branching probabilities 0.5, 0.4, 0.1) yields service requests and load per server. Performance model (M/G/1 queues): max. throughput, E[waiting time]. Availability model (# replicas, failure rates, restart rates): E[downtime]. Performability model: P[degradation]]

Stochastic Modelling of Control Flow
[Figure: a workflow specification as a statechart and the resulting CTMC. Statechart: states S1 to S5 with transitions such as /st!(Act1), [C1] /st!(Act2), [not(C1)] /st!(Act3), [Act2_DONE] /st!(Act3), [Act3_DONE] /st!(Act4), [not(C4)] /st!(Act3), [C4] /st!(Act5). CTMC: states 1 to 5 plus absorbing state A, with transition probabilities P[C1=true], P[C1=false], P[C4=true], P[C4=false], and 1]

Modelling of Loop Iterations
• Assumption: # iterations uniformly distributed over {m..n}
• Expansion of loop states
• Modified transition probabilities
[Figure: the loop over states 3 and 4 is unrolled into states (3,1), (4,1), (3,2), (4,2), ..., (3,m), (4,m), ..., (3,n), (4,n); transitions within the first iterations have probability 1; from (4,m) onward the loop is left with probability p := 1/(n-m+1) and continued with probability 1-p; from (4,n) it is left with probability 1]

Stochastic Load Model: Some Detail
Continuous-time Markov chain (CTMC) with
• transition probabilities $p_{ij}$
• mean state residence times $H_i$
• state departure rates $v_i = 1 / H_i$
• transition rates $q_{ij} = v_i \, p_{ij}$
[Figure: CTMC with initial state $s_0$, intermediate states $s_i$, $s_j$, $s_k$, and absorbing state $s_A$]
Mean turnaround time $f_{0A}$ (expected first-passage time for $s_A$) derived by solving: $f_{iA} = H_i + \sum_{j \ne A} p_{ij} \, f_{jA}$ for all $i \ne A$
Expected generated load L derived from a Markov reward model for the uniformized CTMC

(Stationary) Availability Model
[Figure: CTMC over the system states (X1, X2, X3), each component ranging over 0, 1, 2, from (2, 2, 2) down to (0, 0, 0)]
(strong) availability = sum of the stationary state probabilities of the states in which the system is operational

Availability Example
• 3 server types:
  – communication server: one failure per month
  – workflow engine: one failure per week
  – application server: one failure per day
• repair rate for each server type:
• expected unavailability depending on configuration (Y1, Y2, Y3):

Non-exponentially Distributed Time-to-Failure and Downtime
Important to capture realistic behavior (e.g., planned maintenance):
Approximate more general distributions of state residence time by an $E_{1,n}$ distribution:
[Figure: state $s_j$ is replaced by the stages $s_{j0}, s_{j1}, s_{j2}, \ldots, s_{jn}$, each with mean residence time H; incoming rate $q_{ij}$, outgoing rate $q_{jk}$; branching probabilities q and 1-q after the first stage]
Special case q = 0: n exponential stages with mean H behave like an Erlang-n distributed state with mean nH
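The special case q = 0 can be illustrated by sampling: a sum of n independent exponential stages with mean H is Erlang-n distributed, with mean nH and variance nH². The sketch below covers only this special case (the general branching with probability q is not modelled), and the function names are hypothetical.

```python
import random


def sample_staged_residence_time(n: int, stage_mean: float, rng: random.Random) -> float:
    """Residence time of a state that is split into n exponential stages (case q = 0)."""
    return sum(rng.expovariate(1.0 / stage_mean) for _ in range(n))


def mean_and_variance(samples):
    mean = sum(samples) / len(samples)
    variance = sum((x - mean) ** 2 for x in samples) / (len(samples) - 1)
    return mean, variance


if __name__ == "__main__":
    rng = random.Random(42)                     # fixed seed for reproducibility
    samples = [sample_staged_residence_time(4, 1.0, rng) for _ in range(20000)]
    mean, variance = mean_and_variance(samples)
    print(mean, variance)                       # close to n*H = 4 and n*H^2 = 4
```

The squared coefficient of variation is 1/n, so more stages give a less variable residence time than a single exponential state with the same mean.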

Workflow System Configuration Tool
[Figure: tool components Mapping, Modeling, Monitoring, Calibration, Evaluation, and Recommendation, connected to the Workflow Repository, the Operational Workflow System, and the system config]
Evaluation mode: the admin supplies a hypothetical config; the tool returns max. throughput, avg. waiting time, and expected downtime

Workflow System Configuration Tool
[Figure: the configuration tool sits between the workflow repository and the operational workflow system; its components are configuration, mapping, modeling, monitoring, calibration, evaluation, and recommendation]
Admin input: goals, stated as bounds (minimum throughput, maximum waiting time, maximum downtime), plus constraints
Tool output: min-cost configuration
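
One simple way to picture the goal-driven mode is an exhaustive search over replica counts that keeps the cheapest configuration meeting a downtime goal. This is only a sketch under stated assumptions: the slides do not describe the tool's search strategy, the availability model (independent servers, a type is down only when all replicas are down) is a simplification, and the per-server unavailabilities, unit costs, and function names are hypothetical.

```python
from itertools import product


def unavailability(config, u):
    """System unavailability when server type i has config[i] replicas,
    each independently down with probability u[i]."""
    avail = 1.0
    for replicas, ui in zip(config, u):
        avail *= 1.0 - ui ** replicas
    return 1.0 - avail


def min_cost_config(u, unit_cost, max_unavailability, max_replicas=4):
    """Cheapest configuration whose unavailability meets the goal.

    Returns (config, cost), or None if no configuration with at most
    max_replicas per server type satisfies the goal.
    """
    best = None
    for config in product(range(1, max_replicas + 1), repeat=len(u)):
        if unavailability(config, u) > max_unavailability:
            continue
        cost = sum(c * y for c, y in zip(unit_cost, config))
        if best is None or cost < best[1]:
            best = (config, cost)
    return best


# Hypothetical per-server unavailabilities and unit costs.
print(min_cost_config(u=(0.0002, 0.001, 0.007), unit_cost=(1, 1, 1),
                      max_unavailability=1e-4))
```

Throughput and waiting-time goals would enter the same loop as additional feasibility checks; exhaustive enumeration is only practical for a handful of server types, so a real tool would likely use a greedy or heuristic search instead.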

Workflow System Configuration Tool
[Figure: the configuration tool sits between the workflow repository and the operational workflow system; its components are configuration, mapping, modeling, monitoring, calibration, evaluation, and recommendation]
Input: goals, stated as bounds (minimum throughput, maximum waiting time, maximum downtime), plus constraints
Result: automatic reconfiguration

Goliat: Goal-driven Auto-configuration Tool (for Mentor-lite)

Prediction Accuracy of Goliat
Benchmark: E-Commerce Order Processing Workflow, run on a Mentor-lite configuration [configuration details not preserved in the extraction]
[Figure: statechart EC_SC of the benchmark workflow, with states EC_INIT_S, NewOrder_S, CreditCardCheck_S, Shipment_S, Notify_INIT_S, Notify_S, Notify_EXIT_S, Delivery_INIT_S, FindStore_S, CheckStore_S, Delivery_EXIT_S, CreditCardCharge_S, Payment_S, EC_EXIT_S]
Results: [not preserved in the extraction]

Multi-class Workloads with Diff. QoS
What-if portfolio analyses ...
[Figure: customer types (guest, i.e. potential future customer; premium customer) send requests through a "channel" into middleware with class-specific e-service request queues, in front of backend servers (online brokerage, stock price info service)]
Open question: service class priorities?

HEART: Help for Ensuring Acceptable Response Time
Input:
• class-specific arrival rates, service time moments
• class-specific goals, e.g.:
  E[RT(class 1)] ≤ 5 s
  E[RT(class 2)] ≤ 2 s
  Var[RT(class 2)] ≤ 4 s²
  P[RT(class 3) ≤ 5 s] ≥ 0.95
  ...
Output: class-specific priorities for messaging middleware (MQ Series) for satisfying all goals
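
To make the input/output relationship concrete, here is a minimal sketch that handles only mean response-time goals. It assumes a single server modeled as an M/G/1 queue with non-preemptive priorities (Cobham's formula), which is not necessarily the model HEART uses; the function names and the example numbers are hypothetical.

```python
from itertools import permutations


def mean_response_times(order, arrival, s1, s2):
    """Mean response time per class in a non-preemptive priority M/G/1 queue.

    order: class indices, highest priority first.
    arrival[c]: arrival rate; s1[c], s2[c]: first and second moment of the
    service time of class c. Returns {class index: E[RT]}.
    """
    if sum(arrival[c] * s1[c] for c in order) >= 1.0:
        raise ValueError("total utilization must be below 1")
    # Mean residual service time seen by an arriving request.
    w0 = sum(arrival[c] * s2[c] for c in order) / 2.0
    result = {}
    sigma_prev = 0.0  # utilization of all strictly higher-priority classes
    for c in order:
        sigma = sigma_prev + arrival[c] * s1[c]
        waiting = w0 / ((1.0 - sigma_prev) * (1.0 - sigma))
        result[c] = waiting + s1[c]
        sigma_prev = sigma
    return result


def find_priorities(arrival, s1, s2, goals):
    """First priority order meeting all goals E[RT(c)] <= goals[c], or None."""
    for order in permutations(range(len(arrival))):
        rt = mean_response_times(order, arrival, s1, s2)
        if all(rt[c] <= goals[c] for c in order):
            return order
    return None


# Two hypothetical classes with exponential service (mean 1 s, E[S^2] = 2).
print(find_priorities(arrival=[0.25, 0.25], s1=[1.0, 1.0], s2=[2.0, 2.0],
                      goals=[3.0, 2.0]))
```

In this example the class with the tighter goal has to be served first. Variance and percentile goals, as on the slide, would need higher moments or the full response-time distribution and are not covered by this sketch.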

Autonomic Computing
Vision: all computer systems must be self-managed, self-organizing, and self-healing
Motivation:
• ambient intelligence (sensors in every room, your body, etc.)
• reducing complexity and improving manageability of very large systems
Role model: biological systems (really???)
Eight laws:
• know thy self
• configure thy self
• optimize thy self
• heal thy self
• protect thy self
• grow thy self
• know thy neighbor
• help thy users
My interpretation: need component design for predictability: self-inspection, self-analysis, self-tuning

Outline
Part A: WF Specification and Verification
• What Is It All About?
• WF Specification Techniques
• Statecharts
• CTL and Model Checking
• Summary and Open Research Issues
Part B: WF System Architecture and Configuration
• WF Execution Infrastructure
• Failure Handling
• Stochastic Modeling
• WF System Configuration
• Summary and Open Research Issues

Mentor-lite Prototype
[Figure: architecture of the Mentor-lite prototype. Labeled components: workflow engine with statechart interpreter, communication manager, and log manager; worklist management; history management; application wrappers; workflow log; workflow repository; worklist DB; an XML interface to other WF engines (SAP etc.); and a specification, verification, and configuration workbench that takes statecharts, event-process chains, etc.]

Summary and Open Research Issues
Dependable, self-organizing ("autonomic") systems require
• comprehensive data/message/process recovery with failure masking
• and (dynamic) configuration and tuning procedures based on tractable mathematical models
Interesting research topics for graduate students:
• rigorous verification of efficient data/message/process recovery algorithms
• comprehensive & efficient implementation of recovery contracts in WFMS / Web service environment
• guarantees about response time percentiles for multi-class workloads
• dynamic reconfiguration based on transient performability predictions for given time horizon
• comprehensive configuration tool for commercial WFMS / Web service suite

The Future
Workflow technology is successful
Commercial world is driven by time to market
Need further courageous steps towards
• Provably correct behavior
• Guaranteed quality of results
• Predictably good performance
• High reliability and availability
"Our ability to analyze and predict the performance of the enormously complex software systems ... are painfully inadequate" (Report of the US President's Technology Advisory Committee)
"Success is a lousy teacher" (Bill Gates)

Recommended Literature
• F. Leymann, D. Roller: Production Workflow: Concepts and Techniques, Prentice Hall, 2000
• W. van der Aalst, K. van Hee: Workflow Management: Models, Methods, and Systems, MIT Press, 2002
• A. Dogac, L. Kalinichenko, T. Özsu, A. Sheth (Eds.): Workflow Management Systems and Interoperability, Springer, 1998
• D. Harel, M. Politi: Modeling Reactive Systems with Statecharts: The Statemate Approach, McGraw-Hill, 1998
• E. M. Clarke, O. Grumberg, D. Peled: Model Checking, MIT Press, 2000
• G. Weikum: Towards Guaranteed Quality and Dependability of Information Services, German Database Conf. (BTW), 1999
• G. Weikum, G. Vossen: Transactional Information Systems: Theory, Algorithms, and the Practice of Concurrency Control and Recovery, Morgan Kaufmann, 2001
• R. Barga, D. Lomet, G. Weikum: Recovery Guarantees for General Multi-Tier Applications, IEEE CS Data Engineering Conf., 2002
• H. C. Tijms: Stochastic Models: An Algorithmic Approach, Wiley & Sons, 1994
• G. Haring, C. Lindemann, M. Reiser (Eds.): Performance Evaluation: Origins and Directions, Springer, 2000
• M. Gillmann, G. Weikum, W. Wonner: Workflow Management with Service Quality Guarantees, ACM SIGMOD Conf., 2002
• G. Weikum (Editor): Special Issue on Infrastructure for Advanced E-Services, IEEE CS Data Engineering Bulletin, March 2001