
- Number of slides: 36
Timeliness, Failure Detectors, and Consensus Performance Alex Shraer Joint work with Dr. Idit Keidar Technion – Israel Institute of Technology In PODC 2006
How do you survive failures and achieve high availability?
Replication
State Machine Replication • Replicas are identical deterministic state machines • Process operations in the same order, so they remain consistent
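A minimal sketch of the slide's point, assuming a hypothetical key-value `Replica` class that is not part of the talk: deterministic replicas that apply the same operations in the same order end in the same state, while a different order can diverge.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical deterministic state machine: a tiny key-value store."""
    state: dict = field(default_factory=dict)

    def apply(self, op):
        # op is a (key, value) write; the same op always has the same effect.
        key, value = op
        self.state[key] = value


def run(ops):
    """Apply ops in order to a fresh replica and return its final state."""
    replica = Replica()
    for op in ops:
        replica.apply(op)
    return replica.state


log = [("x", "a"), ("x", "b"), ("y", "c")]
same_order_1 = run(log)
same_order_2 = run(log)
# Swapping the first two writes leaves x with a different final value.
reordered = run([log[1], log[0], log[2]])
```

Agreeing on the order of the log is exactly what the replicas need a consensus building block for.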
Consensus • Building block for state machine replication • Each process has an input and should decide on an output so that: – Agreement: decisions are the same – Validity: decision is the input of one process – Termination: eventually all correct processes decide
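The three properties can be checked mechanically on a finished run. The sketch below assumes a hypothetical `check_consensus` helper and a dictionary-based run description, neither of which comes from the talk. Termination is an "eventually" property, so a finite run can only show that it has been reached, not that it never will be.

```python
def check_consensus(inputs, decisions, correct):
    """Check consensus properties on a finished run.

    inputs:    process id -> proposed value
    decisions: process id -> decided value (absent if the process never decided)
    correct:   set of process ids that did not fail
    """
    decided = list(decisions.values())
    return {
        # Agreement: no two processes decided differently.
        "agreement": len(set(decided)) <= 1,
        # Validity: every decision is some process's input.
        "validity": all(v in inputs.values() for v in decided),
        # Termination: every correct process has decided.
        "termination": all(p in decisions for p in correct),
    }


good = check_consensus({1: "a", 2: "b", 3: "a"}, {1: "a", 2: "a"}, {1, 2})
split = check_consensus({1: "a", 2: "b", 3: "a"}, {1: "a", 2: "b"}, {1, 2})
```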
Basic Model • Message passing • Links between every pair of processes – do not create, duplicate or alter messages (integrity) • Process and link failures Keidar & Shraer, Technion, Israel PODC 2006
Synchronous Model • Known bound Δ on message delay, processing • Very convenient for algorithms • Requires very conservative timeouts – in practice: avg. latency < max. latency / 100 [Cardwell, Savage, Anderson 2000], [Bakr, Keidar 2002] – Computation might be too sloooow!
Asynchronous Model • Unbounded message delay • Much more practical • Fault-tolerant consensus impossible [FLP 85]
Eventually Stable (Indulgent) Models • Initially asynchronous – for unbounded period of time • Eventually reach stabilization – GST (Global Stabilization Time) – following GST certain assumptions hold • Examples – ES (Eventual Synchrony) – starting from GST all links have a bound on message delay [Dwork, Lynch, Stockmeyer 88] – failure detectors • Example: Ω (leader) failure detector – Outputs one trusted process – From some point, all correct processes trust the same correct process [Chandra, Toueg 96], [Chandra, Hadzilacos, Toueg 96]
Indulgent Models: Research Trend • Weaken post-GST assumptions as much as possible [Guerraoui, Schiper 96], [Aguilera et al. 03, 04], [Malkhi et al. 05] Weaker = better?
Indulgent Models: Research Trend You only need ONE machine with eventually ONE timely link. Buy the hardware to ensure it, set the timeout accordingly, and EVERYTHING WILL WORK.
Consensus with Weak Assumptions • “Why isn’t anything happening???” • “Don’t worry! It will eventually happen!” [Figure: network]
Consensus with Weak Assumptions [Figure: network]
What’s Going On? • In practice, bounds just need to hold “long enough” for the algorithm to finish (time T_A) • But T_A depends on our synchrony assumptions – with weak assumptions, T_A might be unbounded • For practical systems, eventual completion of the job is not enough!
Our Goal • Understand the relationship between: – assumptions (1 timely link, failure detectors, etc.) that eventually hold – performance of algorithms that exploit these assumptions, and only them • Challenge: How do we understand the performance of asynchronous algorithms that make very different assumptions?
Typical Metric: Count “Rounds” • Algorithms normally progress in rounds, though rounds are not synchronized among processes – at process p_i: forever do { send messages; receive messages while (!some conditions); compute… } • Previous work: – look at synchronous runs (every message takes exactly Δ time) – count rounds or “Δs” [Keidar, Rajsbaum 01], [Dutta, Guerraoui 02], [Guerraoui, Raynal 04], [Dutta et al. 03], etc.
Are All “Rounds” the Same? • Algorithm 1 waits for messages from a majority that includes a pre-defined leader in each round – takes 3 rounds • Algorithm 2 waits for messages from all (unsuspected) processes in each round – e.g., group membership – takes 2 rounds
Do All “Rounds” Cost the Same? • LAN Market: Oranges $1.00, Apples $1.00
Do All “Rounds” Cost the Same? • On the Internet, n² timely links can be a rarity [Bakr, Keidar 02] • Timely communication with a leader or with a majority requires timeouts that are orders of magnitude smaller • WAN Market: Oranges $100.00, Apples $1.00
GIRAF: General Round-based Algorithm Framework • Inspired by Gafni’s RRFD, generalizes it • Organize algorithms into rounds • Separate algorithm logic from waiting condition • Waiting condition defines model • Allows reasoning about lower and upper bounds for rounds of different types
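The separation of algorithm logic from the waiting condition can be illustrated with a toy simulator. This is only a sketch under simplifying assumptions, not the framework's actual definition: `run_rounds`, `min_flood`, `all_timely`, and `only_self` are hypothetical names, and the processes advance in lock-step, whereas real rounds are not synchronized.

```python
def run_rounds(n, inputs, compute, delivers, rounds):
    """Drive n processes through lock-step rounds.

    compute(state, received) -> new state : the algorithm logic
    delivers(k, sender, receiver) -> bool : which round-k messages arrive
                                            in round k (this is the model)
    """
    states = list(inputs)
    for k in range(1, rounds + 1):
        outgoing = list(states)  # every process sends its current state to all
        states = [
            compute(states[i],
                    [outgoing[j] for j in range(n) if delivers(k, j, i)])
            for i in range(n)
        ]
    return states


def min_flood(state, received):
    """Toy algorithm logic: adopt the smallest value seen so far."""
    return min([state] + received)


def all_timely(k, sender, receiver):
    """Model in which every message arrives in its round."""
    return True


def only_self(k, sender, receiver):
    """Model in which a process hears only from itself."""
    return sender == receiver


converged = run_rounds(3, [5, 2, 9], min_flood, all_timely, rounds=1)
stuck = run_rounds(3, [5, 2, 9], min_flood, only_self, rounds=3)
```

The same `min_flood` logic is run unchanged under two different `delivers` predicates, so any difference in how many rounds it needs is attributable to the model alone.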
Defining Properties in GIRAF • Environment can have – perpetual properties – eventual properties • In every run r, there exists a round GSR(r) • GSR(r) – the first round from which: – no process fails – all eventual properties hold in each round
Defining Properties • Timeliness of incoming, outgoing and bidirectional links. • Some known failure detector properties • Use properties to clearly define models
Some Results: Context • Consensus problem • Global decision time metric – Time until all correct processes decide • Message passing • Crash failures – t < n/2 potential failures out of n > 1 processes
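The bound t < n/2 means the correct processes always form a majority, and any two majorities share at least one process. A small sketch of that arithmetic, using hypothetical helper names that are not from the talk:

```python
from itertools import combinations


def max_tolerated_failures(n):
    """Largest integer t with t < n/2."""
    return (n - 1) // 2


def majorities_intersect(n):
    """Brute-force check that every two minimal majorities of n share a member."""
    majority = n // 2 + 1
    quorums = [set(q) for q in combinations(range(n), majority)]
    return all(a & b for a in quorums for b in quorums)
```

For example, with n = 5 at most 2 processes may crash, and the 3 that remain are still a majority.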
◊LM Model: Leader and Majority • Nothing required before GSR • In every round k ≥ GSR – Every correct process receives a round k message from a majority of processes, one of which is the Ω-leader. • Practically requires much shorter timeouts than Eventual Synchrony [Bakr, Keidar]
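The round condition of ◊LM can be written as a predicate over who heard from whom. This is a sketch with a hypothetical function name and data layout; it assumes a process's own message counts toward its majority, which the slide does not state.

```python
def satisfies_lm_round(n, correct, leader, received_from):
    """Check the ◊LM condition for one round k >= GSR.

    n:             total number of processes
    correct:       set of correct process ids
    leader:        the process output by Ω
    received_from: process id -> set of senders whose round-k message arrived
    """
    majority = n // 2 + 1
    return all(
        len(received_from[p]) >= majority and leader in received_from[p]
        for p in correct
    )


ok = satisfies_lm_round(
    5, {0, 1, 2}, leader=0,
    received_from={0: {0, 1, 2}, 1: {0, 1, 3}, 2: {0, 2, 4}},
)
no_leader = satisfies_lm_round(
    5, {0, 1, 2}, leader=0,
    received_from={0: {0, 1, 2}, 1: {0, 1, 3}, 2: {1, 2, 4}},
)
```

Note that different processes may hear from different majorities; only the leader has to be common to all of them.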
◊LM: Previous Work • Most Ω-based algorithms wait for majority in each round (not ◊LM) • Paxos [Lamport 98] works for ◊LM – Takes constant number of rounds in Eventual Synchrony (ES) – But how many rounds without ES?
Paxos Run in ES [Figure: message diagram. The Ω leader sends (“prepare”, 2), then (“prepare”, 21), followed by (Commit, 21, v1); processes decide v1. BallotNum: number of attempts to decide initiated by leaders.]
Paxos in ◊LM (w/out ES) [Figure: message diagram over rounds GSR+1, GSR+2, GSR+3. The Ω leader sends (“prepare”, 2), (“prepare”, 9), (“prepare”, 14) and receives replies including no (5), no (8), no (13), tracked against BallotNum values. Commit takes O(n) rounds!]
What Can We Hope For? • Tight lower bound for ES: 3 rounds from GSR [DGK 05] • ◊LM weaker than ES • One might expect it to take a longer time in ◊LM than in ES
Result 1: Don't Need ES • Leader and majority can give you the same performance! • Algorithm that matches lower bound for ES!
Our ◊LM Algorithm in a Nutshell • Commit with increasing ballot numbers, decide on value committed by majority – like Paxos, etc. • Challenge: Don’t know all ballots, so how to choose the new one to be the highest one? • Solution: Choose it to be the round number • Challenge: rounds are wasted if a prepare/commit fails. • Solution: pipeline prepares and commits: try in each round • Challenge: do they really need to say no? • Solution: support leader’s prepare even if they have a higher ballot number – challenge: higher number may reflect later decision! Won’t agreement be compromised? – solution: new field “trustMe” ensures supported leader doesn't miss real decisions
Example Run: GSR=100
Question 2: ◊S and Ω Equivalent? • ◊S and Ω equivalent in the “classical” sense [Chandra, Hadzilacos, Toueg 96] – Weakest for consensus • ◊S: eventually (from GSR onward), – all faulty processes are suspected by every correct process – there exists one correct process that is not suspected by any correct process. • Can we substitute Ω with ◊S in ◊LM?
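The two ◊S clauses can be checked on a snapshot of the failure detector outputs taken after GSR. The function name and the dictionary layout below are assumptions made for illustration only.

```python
def satisfies_eventual_s(correct, faulty, suspects):
    """Check the two ◊S clauses on one snapshot of detector outputs.

    correct:  set of correct process ids
    faulty:   set of faulty process ids
    suspects: correct process id -> set of processes it currently suspects
    """
    # Every faulty process is suspected by every correct process.
    completeness = all(faulty <= suspects[p] for p in correct)
    # Some correct process is suspected by no correct process.
    accuracy = any(
        all(q not in suspects[p] for p in correct) for q in correct
    )
    return completeness and accuracy


holds = satisfies_eventual_s(
    {0, 1, 2}, {3}, {0: {3, 1}, 1: {3}, 2: {3, 0}}
)  # process 2 is trusted by all correct processes
```

Unlike Ω, the snapshot above does not name a single leader: processes 0 and 1 are each suspected by someone, and no process is told that 2 is the one everybody trusts.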
Result 2: ◊S and Ω not that Equivalent • Consensus takes linear time from GSR • By reduction to mobile failure model [Santoro, Widmayer 89]
Result 3: Do We Need Oracles? • Timely communication with majority suffices! • ◊AFM (All-From-Majority) simplified: – In every round k ≥ GSR, every correct process p receives round k message from a majority of processes, and p’s message reaches a majority of processes. • Decision in 5 rounds from GSR – 1st constant time algorithm w/out oracle or ES – idea: information passes to all nodes in 2 rounds
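The simplified ◊AFM condition constrains both directions of each correct process's links. A sketch of it as a predicate over the set of messages delivered in a round follows; the function name and the (sender, receiver) encoding are hypothetical, and a process's message to itself is assumed to count toward both majorities.

```python
def satisfies_afm_round(n, correct, delivered):
    """Check the simplified ◊AFM condition for one round k >= GSR.

    n:         total number of processes
    correct:   set of correct process ids
    delivered: set of (sender, receiver) pairs whose round-k message
               arrived in round k
    """
    majority = n // 2 + 1
    for p in correct:
        incoming = {s for (s, r) in delivered if r == p}
        outgoing = {r for (s, r) in delivered if s == p}
        if len(incoming) < majority or len(outgoing) < majority:
            return False
    return True


# Three processes in a ring, each also hearing from itself.
ring = {(i, i) for i in range(3)} | {(i, (i + 1) % 3) for i in range(3)}
ring_ok = satisfies_afm_round(3, {0, 1, 2}, ring)
```

No oracle appears anywhere in the predicate, which is the point of the result: the guarantee is purely about link timeliness.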
Result 4: Can We Assume Less? • ◊MFM: Majority from Majority – The rest receive a message from a minority • Only a little short of ◊AFM • Stronger than models in literature [Aguilera et al. 03, 04], [Malkhi et al. 05] • Bounded time from GSR impossible!
Conclusions • Which guarantees should one implement? – weaker ≠ better • some previously suggested assumptions are too weak – sometimes a little stronger = much better • worth longer timeouts / better hardware – ES is not essential • not worth longer timeouts / better hardware – future: more models, bounds to explore • GIRAF