Synchronization
Tanenbaum Chapter 5


Synchronization
• Multiple processes sometimes need to agree on the order of a sequence of events. This requires synchronization, which is more elaborate in distributed systems.
• Synchronization may be based on time (absolute or relative) or on leader election. The aim is to make the order of events global…

Clock Synchronization
• Execution of the make utility in a distributed system: the edited source file is actually created later than the object file, but according to the local clocks it appears older, purely because of the discrepancy between the clocks. As a result, make does not recompile it.
  – When each machine has its own clock, an event that occurred after another event may nevertheless be assigned an earlier time. 2143 looks earlier than 2144, although it is the other way around in actual time.
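The make scenario can be reproduced in a few lines. This sketch assumes hypothetical clock offsets (the compiling machine runs 2 time units ahead of the editing machine) and make's basic "rebuild if the source is newer" rule; the helper `needs_rebuild` is illustrative.

```python
def needs_rebuild(source_mtime, object_mtime):
    # make's basic rule: rebuild when the source is newer than the object file
    return source_mtime > object_mtime

# Hypothetical clocks: the compiling machine runs 2 units ahead of the editing machine.
COMPILER_OFFSET = 2
EDITOR_OFFSET = 0

object_real = 2142  # real time the object file is written (compiling machine)
source_real = 2143  # real time the source is edited, one unit later (editing machine)

object_mtime = object_real + COMPILER_OFFSET  # stamped 2144
source_mtime = source_real + EDITOR_OFFSET    # stamped 2143

print(source_real > object_real)                  # True: the edit really is newer
print(needs_rebuild(source_mtime, object_mtime))  # False: make skips the rebuild
```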

Physical Clocks
• Maintaining a physical clock on a single machine is no problem: every event is stamped by the same clock, so events appear in the right order.
• Time matters only when more than one computer is involved in a distributed application.
• Real time is based on the solar day: the interval between two consecutive transits of the Sun (when it is at its highest point in the sky, at noon), divided into 24 hours, or 1440 minutes, or 86400 seconds.

Physical Clocks: International Atomic Time
• Computation of the mean solar day.
• The period of the Earth's rotation is not constant.
• Since 1958, International Atomic Time (TAI) has been used. It counts transitions of cesium 133: 9,192,631,770 transitions = 1 second, chosen to equal the mean solar second. One solar second is 1/86400 of a solar day, the time between two consecutive solar noons. TAI is averaged over 50 labs.
• The length of the solar day changes because of atmospheric drag and tidal friction on the Earth; this was detected in the 1940s.

Physical Clocks: Leap Seconds
• TAI seconds are of constant length, unlike solar seconds. To stay in phase with the solar clock, a leap second is introduced whenever the discrepancy between TAI and solar time grows to 800 msec.
• So far, 30 leap seconds have been introduced. The resulting time scale is UTC (Universal Coordinated Time), which takes into account the irregular rotation of the Earth…
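The 800 msec rule can be simulated. This sketch assumes a hypothetical constant lag of 2 msec per day between solar time and TAI (the real lag is irregular); the function name `count_leap_seconds` is illustrative.

```python
def count_leap_seconds(days, drift_ms_per_day, threshold_ms=800):
    """Count leap seconds inserted when solar time lags TAI by drift_ms_per_day.

    A leap second is inserted whenever the accumulated discrepancy reaches
    threshold_ms; inserting it removes 1000 ms of discrepancy.
    """
    discrepancy = 0
    inserted = 0
    for _ in range(days):
        discrepancy += drift_ms_per_day
        if discrepancy >= threshold_ms:
            inserted += 1
            discrepancy -= 1000
    return inserted


print(count_leap_seconds(399, 2))  # 0: discrepancy is still 798 ms
print(count_leap_seconds(400, 2))  # 1: discrepancy reached 800 ms on day 400
```

After a leap second the discrepancy becomes -200 msec, so the next one is due only after another 1000 msec of drift.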

Clock Synchronization Algorithms
• The relation between clock time and UTC when clocks in a distributed environment tick at different rates.
  – In a perfect world, $C_p(t) = t$ on all machines p, where t is UTC and $C_p(t)$ is the value of the local clock on machine p.
  – Each clock has a maximum drift rate $\rho$; with modern timer chips, $\rho$ is about $10^{-5}$.
  – Two clocks need to be synchronized according to the maximum drift rate of each clock.
  – For a clock to work within its specification, the first derivative of local time must satisfy $1 - \rho \le \frac{dC}{dt} \le 1 + \rho$.

Clock Synchronization Algorithms
• If two clocks may never differ by more than $\delta$, they must be resynchronized at least every $\delta / 2\rho$ seconds, because two clocks drifting in opposite directions can move apart at a rate of up to $2\rho$.
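The drift bound translates directly into a resynchronization period. A minimal sketch, using $\rho = 10^{-5}$ from the previous slide and a hypothetical requirement $\delta$ = 1 msec; the function names are illustrative.

```python
def max_skew(rho, elapsed):
    """Worst-case distance between two clocks drifting in opposite
    directions, each at the maximum rate rho, `elapsed` seconds after sync."""
    return 2 * rho * elapsed


def resync_interval(delta, rho):
    """Longest time between resynchronizations that keeps skew <= delta."""
    return delta / (2 * rho)


rho = 1e-5     # drift rate quoted for modern timer chips
delta = 1e-3   # hypothetical requirement: clocks never more than 1 ms apart
print(resync_interval(delta, rho))  # about 50 seconds
```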

GPS: Global Positioning System
• With three satellites it is possible to compute the position of a station on Earth.
• A fourth satellite makes it possible to compute the deviation of the station's clock from the actual time.
• Because of various imperfect conditions, the distance can be in error by 1-5 meters, and the time by 20-35 nanoseconds.

Network Time Protocol (NTP)
• Machines in a distributed environment probe each other to compute clock offsets and propagation delays.
• Cristian's algorithm is one such algorithm.
• In NTP the server is passive: other machines periodically ask it for the time.
• In the Berkeley algorithm, the server is active, polling every machine from time to time to ask what time it is there.

Formal Network Time Protocol
• RFC 1305 defines NTP.
• Recent implementations provide accuracy of up to 1 microsecond.
• It is designed to run on top of IP and UDP.
• NTP is organized into multiple tree structures, with primary servers at the roots and secondary servers at the internal nodes.
• NTP design goals: accurate synchronization to UTC, survival despite losses of connectivity, frequent resynchronization, and protection against malicious interference.
• NTP communicates the clock offset (difference between two clocks), the round-trip delay, and the dispersion (maximum error).
• A statistical technique is used, based on multiple comparisons of the timing information exchanged.
• It may operate in three modes: multicast, client/server, and symmetric.
• SNTP (Simple NTP) is defined in RFC 1769, with no fault tolerance.
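The clock offset and round-trip delay can be computed from the four timestamps of one request/reply exchange. This sketch uses the standard four-timestamp formulas and assumes both directions of the network path take equally long. `best_estimate` is a simplified, hypothetical stand-in for NTP's statistical filtering (it only keeps the lowest-delay sample), not the protocol's actual algorithm.

```python
def ntp_offset_delay(t1, t2, t3, t4):
    """Offset and round-trip delay from one NTP-style exchange.

    t1: client sends request (client clock)
    t2: server receives it (server clock)
    t3: server sends reply (server clock)
    t4: client receives reply (client clock)
    The offset estimate assumes both directions take equally long.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def best_estimate(exchanges):
    """Use the offset of the exchange with the smallest delay: short round
    trips leave the least room for error from asymmetric paths."""
    samples = [ntp_offset_delay(*e) for e in exchanges]
    return min(samples, key=lambda s: s[1])[0]


# Server 14 units ahead, 1 unit each way, 1 unit of server processing.
print(ntp_offset_delay(0, 15, 16, 3))  # (14.0, 2)
```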

Cristian's Algorithm
Getting the current time from a time server.
• Time should never be set to a smaller value, as this causes consistency problems. A large discrepancy should therefore be absorbed gradually, by adjusting the number of msec added per clock interrupt.
• $(T_1 - T_0 - I)/2$ is the one-way propagation time, where $T_0$ and $T_1$ are the times the request was sent and the reply received, and I is the server's request (interrupt) handling time. Cristian suggests averaging the delays over several measurements…
• Note that the time server is passive.
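A minimal sketch of the client side of Cristian's algorithm, with hypothetical helper names. `cristian_estimate` applies the $(T_1 - T_0 - I)/2$ correction; `slewed_increments` shows the gradual adjustment, assuming a nominal 10 msec per clock interrupt (an illustrative value).

```python
def cristian_estimate(t0, t1, server_time, handling=0.0):
    """Estimate the current time from one request/reply exchange.

    t0, t1: client clock when the request was sent / the reply arrived.
    server_time: time value carried in the reply.
    handling: the server's request-handling time I (assumed known).
    """
    one_way = (t1 - t0 - handling) / 2
    return server_time + one_way


def slewed_increments(correction_ms, ticks, nominal_ms=10.0):
    """Spread a correction over `ticks` clock interrupts instead of jumping.

    Each interrupt adds nominal_ms plus an equal share of the correction,
    so a negative correction slows the clock down but never runs it backward.
    """
    share = correction_ms / ticks
    if nominal_ms + share <= 0:
        raise ValueError("correction too large for this many ticks")
    return [nominal_ms + share] * ticks


print(cristian_estimate(100, 120, 500, handling=4))  # 508.0
print(slewed_increments(-20, 4))  # [5.0, 5.0, 5.0, 5.0]: clock runs at half speed
```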

The Berkeley Algorithm
The time server is active and polls the clients.
• The time daemon sends its time and asks all the other machines for their clock discrepancy values.
• The machines answer, and an average time discrepancy is computed for each computer…
• The time daemon then tells everyone how to adjust their clock.
• The daemon's time may be set periodically by the operator or from radio time servers; if external communication is not possible, no harm is done, since the machines still agree on a common time…
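The averaging step can be sketched as follows, using hypothetical clock readings in minutes (daemon at 3:00 = 180, clients at 2:50 = 170 and 3:25 = 205). The daemon sends each machine an adjustment rather than a new time; propagation delays and outlier filtering are ignored here for brevity, and the function name is illustrative.

```python
def berkeley_adjustments(daemon_time, client_times):
    """Return the adjustment for the daemon followed by one per client."""
    offsets = [0] + [t - daemon_time for t in client_times]  # daemon's own offset is 0
    avg = sum(offsets) / len(offsets)
    return [avg - off for off in offsets]  # what each machine must add


print(berkeley_adjustments(180, [170, 205]))  # [5.0, 15.0, -20.0]
```

Every machine ends up at 3:05; a negative adjustment would in practice be applied by slowing the clock down, as in Cristian's algorithm.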

Distributed Clock Synchronization
• Cristian's and Berkeley's algorithms are centralized.
• In decentralized algorithms, every machine periodically broadcasts its time and collects the times of its peers.
• Every peer computes the average time by running the same algorithm, taking into account the communication latencies…
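One round of the decentralized scheme, as a sketch. It assumes each peer has an estimate of the one-way latency to every other peer and, as an optional refinement, discards the m highest and m lowest values before averaging to tolerate faulty clocks; the function name and parameters are hypothetical.

```python
def resync_time(own_time, received, m=0):
    """One round of decentralized averaging.

    received: list of (reported_time, estimated_one_way_delay) pairs from peers.
    m: number of highest and lowest values discarded before averaging.
    """
    # A reported time is already `delay` old when it arrives, so add the delay.
    values = sorted([own_time] + [t + d for t, d in received])
    if m:
        values = values[m:-m]
    if not values:
        raise ValueError("m too large for the number of values")
    return sum(values) / len(values)


print(resync_time(100, [(98, 1), (103, 1), (150, 1)], m=1))  # 102.0
```

Because every peer runs the same computation on (ideally) the same inputs, they converge on the same value without a central daemon.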

Use of Synchronized Clocks
• Used in the implementation of at-most-once message delivery:
  – Every message is sent with a connection number and a timestamp.
  – For each connection, the most recent timestamp is recorded.
  – If a message on a connection has a timestamp lower than the recorded one, it is a duplicate and is discarded.
• To remove old timestamps:
  – The server removes all recorded timestamps older than G = CurrentTime − MaxLifeTime − MaxClockSkew.
  – MaxLifeTime is the maximum time a message can live in the system…
  – MaxClockSkew is the maximum distance of a clock from UTC.
• To recover from a crash, every T the current value of G is written to disk, to be used later during the recovery phase…
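The duplicate filter and the horizon G can be sketched as a small class. This is an illustrative in-memory version with hypothetical names; it treats a repeated timestamp as a duplicate (an assumption) and leaves out the periodic write of G to disk needed for crash recovery.

```python
class AtMostOnceFilter:
    """Duplicate filter for at-most-once delivery using synchronized clocks."""

    def __init__(self, max_lifetime, max_clock_skew):
        self.max_lifetime = max_lifetime
        self.max_clock_skew = max_clock_skew
        self.latest = {}  # connection id -> most recent accepted timestamp

    def horizon(self, now):
        # G = CurrentTime - MaxLifeTime - MaxClockSkew
        return now - self.max_lifetime - self.max_clock_skew

    def accept(self, conn, ts, now):
        last = self.latest.get(conn)
        if last is None:
            # No entry (never seen, or purged): only messages newer than G can be new.
            if ts <= self.horizon(now):
                return False
        elif ts <= last:
            return False  # not newer than the recorded timestamp: duplicate
        self.latest[conn] = ts
        return True

    def purge(self, now):
        # Entries older than G can be dropped: their messages can no longer arrive.
        g = self.horizon(now)
        self.latest = {c: t for c, t in self.latest.items() if t > g}


f = AtMostOnceFilter(max_lifetime=10, max_clock_skew=2)
print(f.accept("c1", 100, now=100))  # True: first message on c1
print(f.accept("c1", 100, now=101))  # False: duplicate
```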

Logical Clocks
• Timestamp-based coordination to achieve a consistent global state: to be covered later…
• Use of clocks to elect coordinators, which is important in distributed systems.

Coordinator or Leader Election Algorithms
• Bully Algorithm, Fig. 5.11
  – A process holds an election for the coordinator if it thinks the coordinator has failed:
    • It sends an election message to all processes with higher ID numbers.
    • If no one responds, the process declares itself coordinator.
    • If one of the higher-numbered processes answers, it withdraws from the contest.
• Ring Algorithm, Fig. 5.12
  – The processes are logically or physically ordered in a ring:
    • The process that detects the missing coordinator sends an election message down the ring, and each live process appends its own ID. When the message comes back to the sender, the process with the highest ID in the list is announced as the coordinator…

The Bully Algorithm (1)
The bully election algorithm:
a) Process 4 holds an election.
b) Processes 5 and 6 respond, telling 4 to stop.
c) Now 5 and 6 each hold an election.

The Bully Algorithm (2)
d) Process 6 tells 5 to stop.
e) Process 6 wins and tells everyone.
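The bully election can be simulated without real messages, as a sketch with hypothetical names: `alive` is the set of processes that are up (the crashed coordinator is simply missing), and each live higher-numbered process that answers goes on to hold its own election. Assuming processes 0 to 6 are alive and the former coordinator 7 has crashed, the run reproduces steps a) to e).

```python
def bully_election(initiator, alive):
    """Simulate the bully algorithm started by `initiator`.

    alive: set of IDs of processes that are up.
    Returns (coordinator, processes that held an election, in order).
    """
    held = []
    pending = [initiator]
    coordinator = None
    while pending:
        p = pending.pop(0)
        if p in held:
            continue  # this process has already held its election
        held.append(p)
        higher = sorted(q for q in alive if q > p)  # ELECTION goes to these
        if not higher:
            coordinator = p  # nobody higher answered: p wins and tells everyone
        pending.extend(higher)  # each one answers OK and holds its own election
    return coordinator, held


print(bully_election(4, set(range(7))))  # (6, [4, 5, 6])
```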

A Ring Algorithm
• Election algorithm using a ring. Processes 5 and 2 both detect the failure of the coordinator at about the same time. Both messages make a full trip around the ring, and both lead to the same coordinator.
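A sketch of the ring election with hypothetical names: the message travels once around the ring, skipping crashed processes, and each live process appends its ID; the originator then announces the highest ID. Running it from both 5 and 2 (with process 7 crashed, an assumed setup) gives the same coordinator.

```python
def ring_election(ring, alive, initiator):
    """Simulate one ELECTION message circulating the ring.

    ring: process IDs in ring order; crashed processes are skipped.
    Returns (coordinator, IDs collected in the message).
    """
    n = len(ring)
    members = [initiator]
    i = (ring.index(initiator) + 1) % n
    while ring[i] != initiator:
        if ring[i] in alive:
            members.append(ring[i])  # each live process adds its own ID
        i = (i + 1) % n
    return max(members), members


ring = list(range(8))
alive = set(range(7))  # process 7, the old coordinator, has crashed
print(ring_election(ring, alive, 5))  # (6, [5, 6, 0, 1, 2, 3, 4])
print(ring_election(ring, alive, 2))  # (6, [2, 3, 4, 5, 6, 0, 1])
```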

Mutual Exclusion
• Mutual exclusion means that critical sections are executed one at a time.
• In centralized systems this is achieved with semaphores, monitors, and similar constructs…
• How to establish mutual exclusion in distributed systems:
  – Centralized approach
  – Distributed approach

Mutual Exclusion: A Centralized Algorithm
a) Process 1 asks the coordinator for permission to enter a critical region. Permission is granted.
b) Process 2 then asks permission to enter the same critical region. The coordinator does not reply.
c) When process 1 exits the critical region, it tells the coordinator, which then replies to 2…
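The coordinator's behavior in a) to c) fits in a few lines. A sketch with hypothetical names, assuming waiting requests are served in FIFO order and that "no reply" is modeled by returning False:

```python
from collections import deque


class Coordinator:
    def __init__(self):
        self.holder = None    # process currently in the critical region
        self.queue = deque()  # requests that got no reply yet

    def request(self, pid):
        """Grant (True) if the region is free; otherwise queue pid, no reply."""
        if self.holder is None:
            self.holder = pid
            return True
        self.queue.append(pid)
        return False

    def release(self, pid):
        """Holder leaves; return the process that now gets its grant, or None."""
        if pid != self.holder:
            raise ValueError("only the holder may release")
        self.holder = self.queue.popleft() if self.queue else None
        return self.holder


c = Coordinator()
print(c.request(1))  # True: permission granted
print(c.request(2))  # False: no reply, 2 is queued
print(c.release(1))  # 2: the coordinator now replies to process 2
```

This gives the 3 messages per entry/exit (request, grant, release) listed in the comparison table.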

MX: A Distributed Algorithm
a) Two processes want to enter the same critical region at the same moment: processes 0 and 2 contend for the CR, so each sends a timestamped request for the resource to everyone else.
b) Process 0 has the lowest timestamp, so it wins.
c) When process 0 is done, it also sends an OK, so 2 can now enter the critical region.
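The decision each process makes when a request arrives can be sketched as below (this is the scheme usually attributed to Ricart and Agrawala). Names are hypothetical, and the timestamps 8 and 12 in the usage are example values.

```python
from enum import Enum


class State(Enum):
    RELEASED = 1  # not interested in the critical region
    WANTED = 2    # request sent, waiting for OKs from everyone else
    HELD = 3      # inside the critical region


def reply_now(state, own_request, incoming):
    """Decide whether to send OK right away (True) or defer it (False).

    Requests are (timestamp, process_id) tuples; comparing tuples breaks
    timestamp ties by process ID, so all processes order requests alike.
    Deferred requests get their OK when this process leaves the region.
    """
    if state is State.HELD:
        return False
    if state is State.WANTED:
        return incoming < own_request  # the older request wins
    return True


# Process 0 requested at time 8, process 2 at time 12.
print(reply_now(State.WANTED, (12, 2), (8, 0)))  # True: 2 lets 0 go first
print(reply_now(State.WANTED, (8, 0), (12, 2)))  # False: 0 defers its OK to 2
```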

MX: A Token Ring Algorithm
a) An unordered group of processes on a network.
b) A logical ring constructed in software: the processes are logically numbered, and a token is released by one of the nodes, initially node 0.
  – Token loss must be handled properly, with a token regeneration algorithm.
  – Node failure must be handled too…
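A sketch of token circulation over a logical ring of n processes, with hypothetical names. It models only the normal case: token loss and node failures, which must be handled as noted above, are not covered.

```python
def token_ring_order(n, wants, start=0):
    """Order in which processes enter the critical region as the token
    circulates over processes 0..n-1 (successor of i is (i + 1) mod n).

    wants: IDs that want to enter once. A process may enter only while it
    holds the token; afterwards it passes the token on.
    """
    if any(not 0 <= p < n for p in wants):
        raise ValueError("process IDs must be in range(n)")
    pending = set(wants)
    order = []
    holder = start
    while pending:
        if holder in pending:
            order.append(holder)   # enter and leave the critical region
            pending.discard(holder)
        holder = (holder + 1) % n  # pass the token to the successor
    return order


print(token_ring_order(8, {2, 5, 6}, start=3))  # [5, 6, 2]
```

The wait before entry is between 0 and n - 1 token passes, matching the comparison table.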

Comparison
Number of messages per process to enter/exit a critical region.

Algorithm     Messages per entry/exit                   Delay before entry (in message times)  Problems
Centralized   3 (request, grant, release)               2                                      Coordinator crash
Distributed   2(n – 1) (requests and grants)            2(n – 1)                               Crash of any process
Token ring    1 to ∞ (the token may circulate forever)  0 to n – 1                             Lost token, process crash

A comparison of three mutual exclusion algorithms for n nodes, regarding complexity and failure or loss situations.

The Transaction Model
• The transaction model is an all-or-nothing model.
• An analogy can be made with the negotiations on a project that lead to signing a contract: until the contract is signed, any party can withdraw with no harm.
• Programming with transactions requires special primitives supplied by the OS, the language, or middleware. The exact list of primitives may differ between applications and system environments.

The Transaction Model (1)
Updating a daily master inventory tape is fault tolerant: if something goes wrong, everything is redone from the beginning, i.e., the tapes are rewound and the process is restarted (all or nothing).

The Transaction Model (2)

Primitive            Description
BEGIN_TRANSACTION    Mark the start of a transaction
END_TRANSACTION      Terminate the transaction and try to commit
ABORT_TRANSACTION    Kill the transaction and restore the old values
READ                 Read data from a file, a table, or otherwise
WRITE                Write data to a file, a table, or otherwise

Typical examples of primitives for transactions. Either everything between the begin and the end is executed, or nothing is.

The Transaction Model (3)
Reserving flight seats from White Plains, NY, via JFK Airport to Malindi in Kenya, via the capital city Nairobi.

(a)
    BEGIN_TRANSACTION
        reserve WP -> JFK;
        reserve JFK -> Nairobi;
        reserve Nairobi -> Malindi;
    END_TRANSACTION

(b)
    BEGIN_TRANSACTION
        reserve WP -> JFK;
        reserve JFK -> Nairobi;
        reserve Nairobi -> Malindi full =>
    ABORT_TRANSACTION

a) The transaction to reserve three flights commits, covering three separate operations.
b) The transaction aborts when the third flight is unavailable; the whole booking is undone, as if nothing had happened.
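The all-or-nothing behavior of (a) and (b) can be imitated in memory. The sketch below (hypothetical names, made-up seat counts) books every leg on a private copy and installs it only if all legs succeed.

```python
class SeatsFull(Exception):
    pass


def book_trip(seats, legs):
    """Reserve one seat on every leg, or none at all.

    seats: dict leg -> free seats (a stand-in for the reservation database).
    """
    working = dict(seats)             # BEGIN_TRANSACTION: private copy
    try:
        for leg in legs:
            if working[leg] == 0:
                raise SeatsFull(leg)  # leg full: ABORT_TRANSACTION
            working[leg] -= 1         # WRITE to the private copy only
    except SeatsFull:
        return False                  # original dict untouched
    seats.update(working)             # END_TRANSACTION: install all updates together
    return True


seats = {"WP->JFK": 5, "JFK->Nairobi": 3, "Nairobi->Malindi": 0}
legs = list(seats)
print(book_trip(seats, legs))  # False: Nairobi->Malindi is full
print(seats)                   # {'WP->JFK': 5, 'JFK->Nairobi': 3, 'Nairobi->Malindi': 0}
```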

The Transaction Model (4)
Transaction properties:
a) Atomicity: the transaction is indivisible.
b) Consistency: the transaction does not violate system invariants.
c) Isolation: concurrent transactions do not interfere with each other.
d) Durability: once a transaction commits, its changes are permanent.
Together these are the ACID properties of transactions.

Classification of Transactions
a) Flat transactions: transactions with the ACID properties discussed so far; not practical for most distributed transaction applications…
b) Nested transactions: a number of logically related, complementary subtransactions form one nested transaction. One problem is the scope of the ACID properties: if the top-level parent aborts, every child that has already completed must be undone…
c) Distributed transactions: a flat, indivisible transaction that operates on data distributed across multiple computers.

Nested and Distributed Transactions
a) A nested transaction: a transaction logically decomposed into a hierarchy of subtransactions.
b) A distributed transaction: a flat, indivisible transaction that works on distributed data; problems such as distributed commit and distributed locking must be solved.

Implementation
How can the all-or-nothing principle be implemented for distributed transactions?
a) Private workspace: implemented so that individual updates can be undone without affecting the original data, depending on whether the transaction commits or aborts.
b) Write-ahead log: a log of changes is created throughout execution, so that commit or abort can be taken care of…

Private Workspace
a) The file index and disk blocks for a three-block file.
b) The situation after a transaction has modified block 0 and appended block 3.
c) After committing.
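A minimal in-memory sketch of the private-workspace idea, with hypothetical names: the transaction writes changed or appended blocks to fresh disk blocks and edits only its private copy of the file index, so committing is a single index swap. Reclaiming blocks left behind by an abort is not shown.

```python
class PrivateWorkspace:
    """Shadow-block sketch of a private workspace for one file."""

    def __init__(self, disk, index):
        self.disk = disk                    # block number -> contents
        self.committed_index = list(index)  # the file's visible index
        self.private_index = list(index)    # the transaction's private copy

    def write_block(self, i, data):
        new_block = max(self.disk) + 1      # write to a fresh block; original untouched
        self.disk[new_block] = data
        if i == len(self.private_index):
            self.private_index.append(new_block)  # append a new block
        else:
            self.private_index[i] = new_block     # shadow an existing block

    def contents(self, index):
        return [self.disk[b] for b in index]

    def commit(self):
        self.committed_index = list(self.private_index)  # swap in the new index

    def abort(self):
        self.private_index = list(self.committed_index)  # drop the private copy


ws = PrivateWorkspace({0: "A", 1: "B", 2: "C"}, [0, 1, 2])
ws.write_block(0, "A'")  # modify block 0
ws.write_block(3, "D")   # append block 3
print(ws.contents(ws.committed_index))  # ['A', 'B', 'C']: others still see the old file
ws.commit()
print(ws.contents(ws.committed_index))  # ["A'", 'B', 'C', 'D']
```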

Write-Ahead Log

(a)
    x = 0;
    y = 0;
    BEGIN_TRANSACTION;
        x = x + 1;
        y = y + 2;
        x = y * y;
    END_TRANSACTION;

(b) Log: [x = 0/1]
(c) Log: [x = 0/1] [y = 0/2]
(d) Log: [x = 0/1] [y = 0/2] [x = 1/4]

a) An example transaction that changes x and y.
b)–d) The log before each statement is executed. The first value is the value before the change, the second is the value after the change.
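The log in (b) to (d) can be reproduced with a small undo log. In this sketch (hypothetical names), each update is logged as (variable, old value, new value) before the variable is changed; on abort, the log is replayed backward to restore the old values.

```python
def run_with_wal(store, updates, abort_at=None):
    """Apply updates to `store`, logging each one before it is applied.

    updates: list of (variable, function computing the new value).
    abort_at: index of the update at which to abort (None = commit).
    Returns (log, committed).
    """
    log = []
    for step, (var, compute) in enumerate(updates):
        if step == abort_at:
            for name, old, _new in reversed(log):  # undo, newest record first
                store[name] = old
            return log, False
        old = store[var]
        new = compute(store)
        log.append((var, old, new))  # write-ahead: the log record first...
        store[var] = new             # ...then the actual update
    return log, True


transaction = [
    ("x", lambda s: s["x"] + 1),
    ("y", lambda s: s["y"] + 2),
    ("x", lambda s: s["y"] * s["y"]),
]
store = {"x": 0, "y": 0}
log, committed = run_with_wal(store, transaction)
print(log)    # [('x', 0, 1), ('y', 0, 2), ('x', 1, 4)]
print(store)  # {'x': 4, 'y': 2}
```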

Concurrency Control (1)
General organization of managers for handling transactions. The top level (transaction manager) ensures atomicity, the middle level (scheduler) ensures consistency, and the bottom level (data manager) ensures execution.

Concurrency Control (2) General organization of managers for handling distributed transactions.

Concurrency Control Methods
How can the operations be synchronized?
• Two-phase locking, a pessimistic approach to synchronization
• Pessimistic timestamp ordering
• Optimistic timestamp ordering

Two-Phase Locking (1)
• In two-phase locking (2PL), all locks are acquired during the growing phase and released during the shrinking phase.
  – On a conflict, the operation is delayed.
  – A lock is never released before the operation on the data item it protects is complete.
  – Once a lock has been released on behalf of a transaction, no further lock can be granted to that transaction.
• In strict 2PL, all acquired locks are released at the same time, when the transaction commits or aborts. This avoids cascaded aborts, but not deadlocks.
• 2PL can easily cause deadlocks.
• Centralized and distributed versions of 2PL are possible.
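A sketch of one transaction obeying the two-phase rule, with hypothetical names and, for simplicity, exclusive locks only (no shared read locks). A conflicting `lock` returns False so the caller can delay the operation; `finish` releases everything at once, as strict 2PL does.

```python
class TwoPhaseLockingTx:
    """One transaction obeying two-phase locking (exclusive locks only)."""

    def __init__(self, name, lock_table):
        self.name = name
        self.lock_table = lock_table  # shared: data item -> owning transaction
        self.held = set()
        self.shrinking = False        # becomes True at the first release

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: no new locks after a release")
        owner = self.lock_table.get(item)
        if owner is not None and owner != self.name:
            return False  # conflict: the caller must delay the operation
        self.lock_table[item] = self.name
        self.held.add(item)
        return True

    def unlock(self, item):
        if item in self.held:
            self.shrinking = True     # the growing phase is over
            self.held.remove(item)
            del self.lock_table[item]

    def finish(self):
        """Strict 2PL: release all locks together at commit or abort."""
        for item in list(self.held):
            self.unlock(item)


table = {}
t1, t2 = TwoPhaseLockingTx("T1", table), TwoPhaseLockingTx("T2", table)
print(t1.lock("x"))  # True
print(t2.lock("x"))  # False: T2 must wait for T1
```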

Two-Phase Locking (2) Two-phase locking.

Two-Phase Locking (3) Strict two-phase locking.

Pessimistic time-stamp ordering-1
• Every operation of a transaction is timestamped with ts by an appropriate algorithm (Lamport's algorithm).
• Every data item in the system carries the timestamps of the last read (ts_R) and last write (ts_W) operations.
• If two operations on a data item x conflict, the data manager grants the operation of the transaction with the earlier timestamp first.

Pessimistic time-stamp ordering-2
• Read operation of a transaction with timestamp ts:
  – If ts < ts_W, abort the transaction.
  – If ts > ts_W, allow execution and set ts_R to max(ts, ts_R).
• Write operation of a transaction with timestamp ts:
  – If ts < ts_R, abort the transaction.
  – If ts > ts_R, allow execution and set ts_W to max(ts, ts_W).
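The read and write rules can be written as two checks per data item. This is a sketch with hypothetical names. Equality is treated as allowed, since the slide leaves it open. Unlike the write rule above, this version also aborts a write older than ts_W, as basic timestamp ordering does, so that an older write can never overwrite a newer value.

```python
class DataItem:
    def __init__(self, value=None):
        self.value = value
        self.ts_r = 0  # timestamp of the last read
        self.ts_w = 0  # timestamp of the last write


def to_read(item, ts):
    """Return (True, value), or (False, None) if the transaction must abort."""
    if ts < item.ts_w:
        return False, None  # a younger transaction already wrote the item
    item.ts_r = max(ts, item.ts_r)
    return True, item.value


def to_write(item, ts, value):
    """Return True if the write is performed, False if the transaction aborts."""
    if ts < item.ts_r or ts < item.ts_w:
        return False  # a younger transaction already read or wrote the item
    item.ts_w = max(ts, item.ts_w)
    item.value = value
    return True


x = DataItem(0)
print(to_read(x, 5))      # (True, 0): ts_R becomes 5
print(to_write(x, 4, 1))  # False: 4 < ts_R, the transaction aborts
```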

Pessimistic Timestamp Ordering-3 Concurrency control using timestamps.

Optimistic time-stamp ordering
• Go ahead and do whatever you want; if there is a conflict, handle it at commit time. If conflicts are rare, most transactions commit without any problem.
• This requires recording the read and write timestamps of the data items, to check at commit time whether any of the items has been changed…
• Abort if a change is detected; commit otherwise.
• This scheme has not been used much in distributed systems…
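A minimal, single-threaded sketch of commit-time validation with hypothetical names. Each transaction remembers the write timestamp of every item it read and buffers its writes; at commit it aborts if any of those items has been rewritten since, otherwise it installs its writes with a new timestamp. This is one common form of optimistic validation, not necessarily the exact scheme on the slide.

```python
class Store:
    def __init__(self):
        self.values = {}
        self.ts_w = {}  # item -> timestamp of the last committed write
        self.clock = 0


class OptimisticTx:
    def __init__(self, store):
        self.store = store
        self.read_set = {}  # item -> ts_w observed when it was read
        self.writes = {}    # buffered writes, invisible until commit

    def read(self, item):
        if item in self.writes:
            return self.writes[item]
        self.read_set[item] = self.store.ts_w.get(item, 0)
        return self.store.values.get(item)

    def write(self, item, value):
        self.writes[item] = value

    def commit(self):
        s = self.store
        for item, seen in self.read_set.items():
            if s.ts_w.get(item, 0) != seen:
                return False  # item changed since it was read: abort
        s.clock += 1
        for item, value in self.writes.items():
            s.values[item] = value
            s.ts_w[item] = s.clock
        return True


store = Store()
t1, t2 = OptimisticTx(store), OptimisticTx(store)
t1.read("x")
t2.read("x")
t1.write("x", 1)
t2.write("x", 2)
print(t1.commit())  # True
print(t2.commit())  # False: x changed after t2 read it
```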