Clock Synchronization Ken Birman

Why do clock synchronization?
- Time-based computations on multiple machines
  - Applications that measure elapsed time
  - Agreeing on deadlines
  - Real-time processes may need accurate timestamps
- Many applications require that clocks advance at similar rates
  - Real-time scheduling of events based on the processor clock
  - Setting timeouts and measuring latencies
  - Ability to infer potential causality from timestamps

Famous example
- Scud rockets launched by Iraq towards Israel
- Ground-based Patriot missiles fire back
- But missiles always missed the warhead!
- Why?
  - After 72 hours of waiting, the control system was out of sync relative to the Patriot guidance system
  - “Be at (x, y, z) at time t” was misinterpreted!

Goals for clock synchronization?
- We might be concerned with
  - Clock accuracy relative to real time
  - Clock precision, or the degree to which correct clocks agree with one another
  - Rate of possible clock drift
- Would we want the Patriot system to be optimally accurate, or optimally precise, if we can’t have both?

The System Model
- Hardware clocks
  - Physical clock of process p designated $R_p(t)$
  - Clocks have a drift rate $\rho$:
    - $(1+\rho)^{-1}(t_2 - t_1) \le R_p(t_2) - R_p(t_1) \le (1+\rho)(t_2 - t_1)$
    - Implies that the rate of drift between two correct clocks is bounded by $dr = \rho(2+\rho)/(1+\rho)$
  - For the Byzantine model, assume nothing about a faulty clock
    - May increase or decrease or return a random number
    - May get “stuck” (surprisingly common in real systems)
    - Cannot necessarily be modeled by functions
- There is a limit $t_{del}$ on message latency
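As a small numeric check of the hardware clock model above, the sketch below computes the envelope a correct clock must stay within and the resulting bound on how fast two correct clocks can drift apart. The function names are illustrative assumptions, not part of the original model.

```python
from typing import Tuple


def hardware_clock_envelope(rho: float, t1: float, t2: float) -> Tuple[float, float]:
    """Return (least, greatest) clock time a correct clock may show over [t1, t2]."""
    elapsed = t2 - t1
    return elapsed / (1.0 + rho), elapsed * (1.0 + rho)


def relative_drift_bound(rho: float) -> float:
    """Bound dr = rho(2 + rho)/(1 + rho) on the drift rate between two correct clocks.

    It is the gap between the fastest allowed rate (1 + rho) and the
    slowest allowed rate 1/(1 + rho).
    """
    return rho * (2.0 + rho) / (1.0 + rho)
```

For small $\rho$ the bound is close to $2\rho$: one clock running as fast as allowed and the other as slow as allowed.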

Clock synchronization goals
- A clock synchronization protocol implements a virtual clock function mapping real time t to $C_p(t)$
- Agreement condition:
  - $|C_p(t) - C_q(t)| \le D_{max}$ for all correct p, q
  - $D_{max}$ bounds the difference between two virtual clocks running on different processors
- Accuracy condition:
  - $(1+\gamma)^{-1} t + a \le C_p(t) \le (1+\gamma) t + b$, for constants a, b, $\gamma$
  - Says that p’s clock must be within a linear envelope of “real time”
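The two conditions can be read as simple predicates over observed clock values. The following is a minimal sketch; the function names and the parameter name `gamma` for the envelope slope are assumptions made for illustration.

```python
from typing import Sequence


def satisfies_agreement(clock_values: Sequence[float], d_max: float) -> bool:
    """True if every pair of correct virtual clocks differs by at most d_max."""
    return max(clock_values) - min(clock_values) <= d_max


def satisfies_accuracy(t: float, c_p: float, gamma: float, a: float, b: float) -> bool:
    """True if virtual clock value c_p lies inside the linear envelope at real time t."""
    lower = t / (1.0 + gamma) + a
    upper = (1.0 + gamma) * t + b
    return lower <= c_p <= upper
```

Agreement is a statement about the clocks relative to each other (precision), while accuracy ties each clock to real time.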

[Figure: Clocks and True Time. Clock time is plotted against true time; the virtual clock $C_p(t)$ stays inside the linear envelope given by the accuracy condition, with the ideal clock shown for comparison.]

Authenticated Algorithm
- Solution for a system of n processes, at most f of which are faulty
- Let P be the logical time between resynchronizations
  - A process expects the k’th resynchronization at time $kP$
  - When $C_p(t) = kP$, p broadcasts a signed message of the form “round k”
  - When a process q receives f+1 such messages, it sets its logical clock $C_q(t) = kP + \alpha$, for some constant $\alpha$ greater than the increase in $C_q$ since q sent its own round k message. Also, q relays the round k messages it receives
- Srikanth and Toueg give proofs of correctness. Insight: at least one of the f+1 round k messages is from a correct process
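The per-process rules above can be sketched as a small state machine. This is an illustrative sketch under simplifying assumptions, not the authors' code: signatures are modeled only by the signer's id (verification is assumed to happen elsewhere), the network is left to the caller, and the names `Process`, `advance`, `on_round_message`, and `alpha` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Message = Tuple[int, int]  # (round number k, id of the signing process)


@dataclass
class Process:
    pid: int
    f: int          # maximum number of faulty processes tolerated
    period: float   # P, logical time between resynchronizations
    alpha: float    # adjustment constant added on resynchronization (assumed name)
    clock: float = 0.0
    next_round: int = 1
    announced: Set[int] = field(default_factory=set)
    signers: Dict[int, Set[int]] = field(default_factory=dict)

    def advance(self, dt: float) -> List[Message]:
        """Advance the logical clock; return messages to broadcast, if any."""
        self.clock += dt
        k = self.next_round
        if self.clock >= k * self.period and k not in self.announced:
            self.announced.add(k)
            # Our own signed "round k" message also counts toward the f+1.
            relayed = self.on_round_message(k, self.pid)
            return sorted(set([(k, self.pid)] + relayed))
        return []

    def on_round_message(self, k: int, signer: int) -> List[Message]:
        """Record a signed "round k" message; return messages to relay."""
        if k < self.next_round:
            return []  # this round has already been completed locally
        self.signers.setdefault(k, set()).add(signer)
        if len(self.signers[k]) < self.f + 1:
            return []  # not yet sure a correct process signed
        # f+1 distinct signers: at least one is correct, so resynchronize.
        self.clock = k * self.period + self.alpha
        self.next_round = k + 1
        return sorted((k, s) for s in self.signers[k])
```

Relaying the collected signed messages is what lets every other correct process reach its own f+1 threshold within one message delay.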

Overview of proof
- Lemma 1: The k’th resynchronization is bounded in size by some constant $d_{min}$, such that for $k \ge 1$, $end_k - begin_k \le d_{min}$
- Lemma 2: After the k’th resynchronization, correct clocks differ by at most $d_{min}(1+\rho)$
- Lemma 3: No correct process starts its k’th clock until at least some correct process is ready to do so: for $k \ge 1$, $begin_k \ge ready_k$
- Lemma 4: All correct processes start their k’th clock soon after one correct process is ready to do so: $end_k - ready_k \le (1+\rho)D_{max} + t_{del}$
- Lemmas 5, 6, 7: The periods between resynchronizations and the maximum deviations between clocks are bounded, and the resynchronization periods do not overlap
- Theorem: the algorithm achieves agreement and accuracy

Optimality
- Bound on accuracy: Srikanth and Toueg show that for any synchronization algorithm, accuracy cannot exceed that of the underlying hardware clocks
- And they show that their simple algorithm achieves optimal accuracy
- Proof is remarkably tricky!

Unauthenticated algorithm
- The algorithm relies on properties of the message system:
  - Correctness: If at least f+1 correct processes broadcast round k messages by time t, then every correct process accepts the message by time $t + t_{del}$
  - Unforgeability: If no correct process broadcasts a round k message by time t, then no correct process accepts the message by time t or earlier
  - Relay: If a correct process accepts the round k message at time t, then every correct process does so by time $t + t_{del}$

Simulating Authentication
- Here they reference a different paper:
  - T. K. Srikanth and S. Toueg. Simulating authenticated broadcasts to derive simple fault-tolerant algorithms. Distributed Computing 2(2): 80-94 (1987).
  - Based on an echoing scheme where witnesses to a broadcast effectively “sign it”
    - Cost is $O(n^3)$ messages per broadcast round, hence per clock synchronization round
    - Paper claims cost is $O(n^2)$, but this assumes a built-in way of sending one message to n processes in one step
- Realistic cost of resynchronization is something like $O(n^4)$, since each process needs to do one of these broadcasts
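The echoing idea can be sketched as threshold rules at each process. The sketch below assumes n > 3f and uses the thresholds f+1 (to echo) and 2f+1 (to accept); these follow the commonly cited form of the echo primitive and should be checked against the paper. The class and method names are hypothetical, and message delivery is left to the caller.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple


class EchoBroadcast:
    """One process's view of the echo scheme for "round k" messages."""

    def __init__(self, pid: int, f: int) -> None:
        self.pid = pid
        self.f = f
        self.inits: DefaultDict[int, Set[int]] = defaultdict(set)
        self.echoes: DefaultDict[int, Set[int]] = defaultdict(set)
        self.echoed: Set[int] = set()    # rounds we have already echoed
        self.accepted: Set[int] = set()  # rounds we have accepted

    def on_init(self, k: int, sender: int) -> List[Tuple[str, int]]:
        self.inits[k].add(sender)
        return self._react(k)

    def on_echo(self, k: int, sender: int) -> List[Tuple[str, int]]:
        self.echoes[k].add(sender)
        return self._react(k)

    def _react(self, k: int) -> List[Tuple[str, int]]:
        """Return the messages this process should now send to everyone."""
        out: List[Tuple[str, int]] = []
        # f+1 distinct senders guarantee at least one correct witness.
        witnessed = (len(self.inits[k]) >= self.f + 1
                     or len(self.echoes[k]) >= self.f + 1)
        if witnessed and k not in self.echoed:
            self.echoed.add(k)
            out.append(("echo", k))
        # 2f+1 echoes include f+1 correct echoers, so all others will follow.
        if len(self.echoes[k]) >= 2 * self.f + 1:
            self.accepted.add(k)
        return out
```

Each process sends its echo to all n processes, so a single round costs on the order of $n^2$ point-to-point messages when no hardware broadcast is available.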

Other ways to think about resynchronization
- Cristian: probabilistic clock synchronization
  - Starts with an observation about RPC
  - If I “ping” you in a network
    - Most round-trip times will be small
    - But the distribution may have a heavy tail
  - Expressed in terms of expectation: “with probability p, a reply to a ping will be received within time $\delta$”

Cristian’s scheme
- His idea: the system contains some number of time “authorities” that everyone trusts
  - i.e. they have a GPS receiver, which is cheap and common
- Periodically, client machine a pings authority b asking “what time is it?”
- If the round-trip time is less than a threshold $\delta$, then a replaces $C_a(t)$ with $(C_a(t) + (C_b(t) - \delta/2))/2$
- With high probability this scheme gives very good clock synchronization. It is not tolerant of faults but can be extended into a fault-tolerant solution
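A minimal sketch of the ping-and-threshold step follows. It uses the commonly taught form of Cristian's estimate (the authority's reported time plus half the measured round trip) rather than the averaged update written on the slide, and the names `cristian_estimate`, `ask_authority`, and `max_rtt` are hypothetical.

```python
import time
from typing import Callable, Optional


def cristian_estimate(
    ask_authority: Callable[[], float],
    max_rtt: float,
    local_clock: Callable[[], float] = time.monotonic,
) -> Optional[float]:
    """Ping the authority once; return an estimate of its current time.

    Returns None when the round trip was not shorter than max_rtt, in
    which case the caller should simply try again later.
    """
    sent_at = local_clock()
    authority_time = ask_authority()
    rtt = local_clock() - sent_at
    if rtt >= max_rtt:
        return None  # slow reply: the uncertainty is too large to use
    # Assume the reply spent about half of the round trip in flight.
    return authority_time + rtt / 2.0
```

Discarding slow replies is what makes the scheme probabilistic: a tighter threshold gives a better estimate but a higher chance that a given attempt has to be retried.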

Verissimo and Rodrigues
- They notice that clock synchronization is really bounded not by actual latencies but by the uncertainty in latency
- Instead of a latency bound $\delta$, think of $\delta_{min} + \varepsilon$, for some $\varepsilon \ge 0$
- Leads to a solution where accuracy is limited by $\varepsilon$ rather than by $\delta$

Other practical considerations
- Real systems have
  - Hardware from multiple vendors
  - Operating systems from multiple sources
- Tends to limit our ability to synchronize clocks
  - Several widely supported standards but no single solution that everyone uses
  - Hence when crossing machine boundaries, expect problems!

Real-world clocks
- Real systems
  - Sometimes stop the clock
  - Sometimes even run the clock backwards!
- Better approach?
  - Pick a constant $T$ and synchronize over periods of time $T$ long
  - If the clock needs to be adjusted by $\Delta$, adjust at rate $\Delta/T$; over the course of a period, the value catches up
  - Avoids sudden discontinuities or stopping the clock
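The gradual adjustment can be sketched as a clock that spreads a correction evenly over one period instead of jumping. This is an illustrative sketch; the class `SlewingClock` and its method names are assumptions, and any unapplied remainder of an earlier correction is discarded when a new one is scheduled, purely to keep the example short.

```python
class SlewingClock:
    """Virtual clock that amortizes corrections over a period."""

    def __init__(self) -> None:
        self.offset = 0.0    # corrections that have been fully applied
        self.pending = 0.0   # correction currently being amortized
        self.period = 1.0    # length of the amortization period
        self.started = 0.0   # hardware time when the current slew began

    def _applied(self, hw_now: float) -> float:
        """Portion of the pending correction applied by hardware time hw_now."""
        elapsed = min(max(hw_now - self.started, 0.0), self.period)
        return self.pending * elapsed / self.period

    def schedule(self, hw_now: float, correction: float, period: float) -> None:
        """Start applying `correction` at rate correction/period."""
        self.offset += self._applied(hw_now)  # keep what was already applied
        self.pending = correction
        self.period = period
        self.started = hw_now

    def read(self, hw_now: float) -> float:
        """Virtual clock value at hardware time hw_now."""
        return hw_now + self.offset + self._applied(hw_now)
```

As long as the correction rate is greater than -1, the value returned by `read` keeps increasing, so the clock never stops or runs backwards.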

Summary
- We often assume synchronized clocks
- In practice, quality of synchronization remains relatively poor
- At best, synchronization will be limited by the quality of physical clocks, rates of physical clock drift, and uncertainty in latencies
- Cristian’s probabilistic scheme makes these uncertainties explicit and also works very well