CSC 8420 Advanced Operating Systems Georgia State University

Скачать презентацию CSC 8420 Advanced Operating Systems Georgia State University

397232bbb01258a4fc6087bf1d0b2e40.ppt

Количество слайдов: 166

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. Goal 2. Transparency 3. Service CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. Efficiency 2. Flexibility 3. Consistency 4. Robustness CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. Access transparency 2. Location transparency 3. Migration transparency 4. Concurrency transparency 5. Replication transparency 6. Parallelism transparency 7. Failure transparency 8. Performance transparency 9. Size transparency 10. Revision transparency CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. Primitive Services 2. Services by System Servers 3. Value added services. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. Workstation-server model Servers Communication Network Workstations 2. Processor-pool model Processor Pool Server Communication Network Servers Intelligent Terminals CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. LAN – Local Area Network 2. MAN – Metropolitan Area Network 3. WAN – Wide Area Network 4. On board interconnection networks CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. Physical Layer 2. Data Link Control (DLC) Layer 3. Network Layer 4. Transport Layer 5. Session Layer 6. Presentation Layer 7. Application Layer Note: TCP/IP Protocol Suite Correspondence. 1, 2, as it is, 3 is IP, 4 and parts of 5 is TCP/UDP. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. Object models and naming schemes. 2. Distributed Coordination. 3. Interprocess Communication. 4. Distributed Resources. 5. Fault tolerance and security. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

• Everything in a computer system are objects. Objects are characterized by operations. • Object identification schemes • By name. • By address. • By service. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. Barrier Synchronization 2. Condition Coordination 3. Mutual Exclusion Related Problems: 1. Deadlock handling 2. Agreement protocols CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. Point-to-point communication (unicast) 2. Group communication (multicast or broadcast) 1. Client/server model (socket) 2. Remote Procedure Call (RPC) CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. Processing Power (processors) – load sharing, multiprocessor scheduling 2. Distributed Data (distributed shared memory and distributed file system) - synchronization, replica management CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. Transparency 2. Recovery 3. Confidentiality, authentication and authorization CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

• Processes are programs in execution --- Each process has its own PCB, page table and address space. • Threads are lightweight processes --- Multiple threads run on a process sharing the address space of the process. --- Each thread has its own TCB consisting of program counter, stack pointers and register set. --- Further each thread has its own stack. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

ü Ease of creation – no need to copy address space ü Context switching is cheaper, particularly for threads implemented in user space ü Allow concurrency as well as blocking system calls making the programming easier ü Simpler and more efficient resource sharing CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Ø Dispatcher-workers model. Ø Team model. Ø Pipeline model. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Requests CSC 8420 Advanced Operating Systems Worker thread Dispatcher thread Georgia State University Yi Pan

Requests Thread o Identical threads o Static o Not identical threads o Dynamic CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Requests Thread CSC 8420 Advanced Operating Systems Thread Georgia State University Yi Pan

q Thread creation -- Static vs. Dynamic. q Thread termination q Thread synchronization q Thread scheduling -- Priority assignment, quantum size variation, handoff and affinity scheduling. q Signal handling -- Signal handler, signals must not be lost. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

§ Implementing in user space § Implementing in kernel space § Hybrid approach CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

§ As an add-on library § OS schedules the process, not threads, limited scheduling choice – non-preemptive priority. § Low context switching overhead. § One thread may block all other. § Possible unfair process scheduling. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

§ All scheduling policies may be implemented. § OS is aware of threads and schedule them. § Context switching overhead is higher. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

User space threads LWP LWP Kernel space threads Multiprocessor system CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

To analyze the performance requirements, the processes should be represented in abstract ways, hiding unnecessary detail. Models for process representation: Ø Synchronous process graph. Ø Asynchronous process graph. Ø Timing diagram. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Represents the precedence relation between communicating processes using Directed Acyclic Graph (DAG). Can be used for total completion time analysis etc. Fork/join or cobegin/coend constructs may be used to implement the precedence relation. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Represents communicating processes without any reference to the precedence relation. Used for processor allocation etc. ü One-way communication ü Client/server communication ü Peer to peer communication CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

The events alongwith their time of occurrence is shown in this method. Provides greater detail and both the synchronous and asynchronous process graphs may be derived from the timing diagram. P 1 P 2 P 3 Time CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Fact: Many applications need the timing information. Ex: Two processes updating a file. The file server received two simultaneous write requests. Without a time stamp it is not possible for the file server to determine which request is older. Fact: It is not possible to perfectly synchronize clocks in a distributed system. Fact: Even if the clocks may be synchronized, physical clocks will not run at same pace. Therefore, clock skew will develop over time. Hence, we need some protocol to ensure that the clocks run in reasonable agreement. Such protocols are called clock synchronization protocols. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Two types of clocks: • Physical Clock (a close approximation to the real life clock). • Logical Clock (used to provide ordering of the events). Different algorithms are used to synchronize physical and logical clocks. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Many Universal Coordinated Time (UTC) sources are available for use by computers. Ø NIST short wave radio Ø Automated Computer Time Service (via modem) Ø GPS satellites Using these time services are costly and/or complicated. Therefore, a few time servers (TS) access the UTC sources directly. Other computers obtain the timing information from one or more TS. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Fact: Accessing a TS via network requires a non-deterministic message exchange time. Therefore, the following methods are used to compensate the delay. 1. Cristian’s Algorithm 2. The Berkeley Algorithm 3. Averaging Algorithm CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Sender Request Time Server T 0 I, time to process interrupt T 1 Time, T If d is the estimate of delay from the TS to the sender, the sender should set its clock to T+d. If nothing is known about I, then estimated d=(T 1 -T 0)/2. If I is known, then estimated d=(T 1 -T 0 -I)/2. Or d may be estimated using average or minimum of multiple requests. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

The time server periodically gathers time data from all other machines and determines the average time. Then it communicates the average time (in fact the difference) to all machines. * Suitable for systems which can not access any UTC server. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

The Berkeley algorithm is centralized and therefore has a single point of failure. Every computer broadcasts its time information at a regular interval known systemwide. Every machine averages the broadcast information to compute its own time. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Note that the clock of the system can not be set backwards. Doing so may result in an inconsistent system state. To solve the problem the clock is slowed down for a certain period of time. Once the system clock reaches the desired state, the rate of change of the clock may be restored to the old state. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

It is possible to put an upper bound (D) on the rate at which the clock skew develops. That means, the clock in a system may deviate at most Dt from the real clock over time t. Therefore, if two clocks are synchronized at time 0, then they will differ by at most 2 Dt at the end of time period t. If the system specification allows the clocks to differ by at most T, then every computer will have to synchronize its clock at least once every T/2 D time. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Ø Lamport’s algorithm Ø Vector logical clocks Ø Matrix logical clocks CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Lamport’s algorithm is based on two facts. 1. The events within a processor may be consistently ordered using the system clock provided that the system clock is monotonically increasing. 2. The ordering of the events occurring in different processors do not matter as long as it is consistent with the communication pattern. In other words, If processor Pj sends a messsage to processor Pk at time T, then Pk can not receive it on or before T. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Let T be the universal physical time and Cj(T) be the value of the logical clock in processor j at time T. Further, Cj(T) is monotonically non-decreasing. 1. Stamp each internal event by the logical clock value of that processor. 2. Whenever a message is sent by processor k, stamp it by Ck(T). 3. When a message is received by processor k with a time stamp C’ set the logical clock of k to max(C’+1, Ck(T)). CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Note that the version of the Lamport’s algorithm provides only a partial ordering of the events across the system. If a total ordering of the events is necessary, then the time stamp may be appended with the processor id and the process id. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

One problem with Lamport’s algorithm is it does not help to recognize the concurrent events in the sense that two concurrent events may have different times stamps. Therefore, if T(a)

The problem may be solved using vector logical clock. Each processor (k) maintains an array of logical clocks Ck(T)=(Ck 1 (T), …, Ckn(T)) where the ith entry corresponds to the logical clock of processor I. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Definition: ck(t)<cj(t) if all coordinates of ck(t) are less than or equal to the Definition: ck(t)

The processor k maintains its vector clock as follows. • The clock Ck(T) is initiated at (0, …, 0). • The kth coordinate (Ckk(T)) changes as the local clock changes. • Each message sent out is stamped by the vector clock. • Whenever a message with time stamp Ci(T) is received, coordinate p of the logical clock (Ckp(T)) is replaced by max(Ckp(T), Cip(T)) for p=1, …, n. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Instead of maintaining an array of logical clocks, an array of vector clock is maintained. The matrix thus obtained is used to stamp messages. Upon receipt of a message, the matrix clock is updated by taking the pair-wise maximum. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. Message Passing (lowest level of communication, unstructured) 2. Request/Reply (Client/server, RPC) 3. Transactions (Satisfies ACID property) CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

• Send(destination, message) • Receive(source, message) Source/destination may be described in four different ways 1. Process name 2. Link 3. Mailbox 4. port CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. Unique global process id. required. May be obtained by adding machine address to process id. 2. Allows only one logical communication link between two processes. 3. Process identifiers need to be known at coding time. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. Allows multiple logical communication path between communicating processes. 2. The communicating processes need to know about each other. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Mailboxes are global data structures shared by some sender and some receiver processes. 1. Allows indirect communication between sender and receiver. 2. Allows multipoint and multipath communication. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Ports are finite mailboxes with first-in-first-out (FIFO) message queues. 1. Created by user processes using system calls. 2. May have restrictions in terms of access rights. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Send/receive primitives may be blocking or non-blocking. Blocking receive implies the process cannot continue till the message is received. Blocking send may be of different types. • Ordinary blocking send • Reliable blocking send • Explicit blocking send • Request and reply CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Used in both Windows and UNIX. • Pipes: Byte streams shared by parent process and children. • Named pipes: FIFO files shared by unrelated processes • Sockets: two-way communication links shared by processes in different domains. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Provides Privacy, Integrity and Authenticity. Authentication is done by third-party certification authority. Privacy and integrity are maintained by handshake protocol and cryptography. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

• Reliability (Best effort vs. Reliable) • Delivery Order (FIFO, Causal Order, Total Order) • Failure of recipient(s) vs. Failure of originator • Overlapping groups CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

The algorithm is very similar to the vector logical clock. 1. Each message is time stamped by a vector where each entry in the vector is the number of messages received by the sender from that group member. 2. Accept a message from process i if a) you have received all previous messages from i and b) you have received all messages seen by i. Otherwise, delay accepting the message. 3. Reject any duplicated message. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Atomic multicast alongwith total order delivery provided by two-phase, total-order multicast. Originator: • Send the message, collect acknowledgements with time stamp. • Send the commit message with the highest logical ack time stamp (commit stamp). Recipient: • Send acknowledgement with the logical clock value as time stamp (local ack stamp). • Do not deliver a message with commit stamp t until the commit message for all messages with local ack stamp < t has been committed. • Deliver messages in the order of the commit stamp. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

In request/reply communication the sender is blocked (or the message is considered not delivered) until it receives a reply. RPC (Remote Procedure Call) is a language-level abstraction to support request/reply communication. RPC is implemented by stub procedures at both the client end and the server end. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Implementation issues: • Parameter passing and data conversion - Call-by-value and call-by-copy/restore are favorite choices. - Data typing, data representation, data transfer syntax problems. Use a canonical data representation. • Binding: directory server and port mapper. • RPC compilation • RPC exception and failure in-band or out-band signaling, idempotent services, request id. , reliable transport layer, handling server crash, handling client crash CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Transactions are communications with ACID property: • Atomicity: all or nothing • Consistency: interleaving results in serial execution in some order • Isolation: Partial results are not visible outside • Durability: After committing, the results are permanent CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

There is one coordinator and multiple participants. Each of them have access to some stable storage. Activity log is maintained in the stable storage. Coordinator: 1. Prepare to commit the transaction by writing every update in activity log. 2. Write a precommit message in the activity log. Send a voting message to all participants asking whether they are ready to commit. 3. If all participants vote yes within the time-out period, multicast a commit message. Otherwise, multicast an abort message. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Participant: 1. Prepare to commit the transaction by writing every update in activity log. 2. Write a precommit message in the activity log. Wait for request to vote from coordinator. 3. Wait for commit message from the coordinator. If received, commit the transaction. If abort message is received, abort the transaction. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Coordinator: 1. If the processor crashes, it will check the activity log for the transaction. 2. If the precommit message is not in the log, abort the transaction. 3. If the commit message is not in the log, retake the vote. 4. If the commit message is there in the log, finish the transaction. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

• Contention based Lamport’s algorithm, Ricart & Agrawala’s algorithm, Voting algorithm • Token based Ring structure, tree structure, broadcast structure CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Every process maintains a queue of pending requests for entering critical section ordered according to the logical timestamp. Requestor: 1. Send a request to every process. 2. After all replies are collected, enter its own request in its own queue 3. If own request is at the head of the queue, enter critical section. 4. Upon exiting the critical section, send a release message to every process. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Other processes: 1. After receiving a request, send a reply and enter the request in the queue 2. After receiving release message, remove the corresponding request from the queue. 3. If own request is at the head of the queue, enter critical section. Problems: • 3(N-1) messages per requests • Multiple points of failure CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Requestor: 1. Send a request to all other process. 2. Enter critical section upon receipt of reply from all processes. Other Processes: 1. Upon receipt of a request check whether it has any older (by logical clock) pending request of its own or whether it is executing in critical section. 2. If neither of the above conditions hold, send reply. Otherwise delay the reply message till both the conditions are false. Problems: 1. 2(N-1) message exchange per request. 2. Multiple points of failure. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Requestor: 1. Send a request to all other process. 2. Enter critical section once REPLY from a majority is received 3. Broadcast Other processes: RELEASE upon exit from the critical section. 1. REPLY to a request if no REPLY has been sent. Otherwise, hold the request in a queue. 2. If a REPLY has been sent, do not send another REPLY till the RELEASE is received. Observations: 1. No single point of failure, but possibility of deadlock. 2. O(N) messages per request. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Deadlock prevention method is used by ensuring no hold-and-wait Requestor: 1. Send a request with time stamp to all other process. 2. Enter critical section once REPLY from a majority is received 3. Return the REPLY message upon receipt of an INQUIRY if not currently executing in the critical section. 4. Broadcast RELEASE upon exit from the critical section. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Other processes: 1. If no REPLY has been sent, REPLY to any REQUEST. 2. If a REPLY has been sent but the RELEASE has not been received, compare the time stamp of the new REQUEST with the time stamp of the replied REQUEST. If the new REQUEST has an earlier time stamp, try to retract the old REPLY sending INQUIRY message. If the older REPLY is returned, send REPLY to the new REQUEST. Otherwise, wait for the RELEASE. Observation: 1. High (O(N)) messages per critical section entry. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

To reduce message overhead, the set of (N) processes is divided into N sets S 1 to Sn such that Si I Sk is non-null for all i and k. Si is called the quorum of process i. Requestor: 1. Need to secure the REPLY messages from all members of its quorum only. Observations: 1. It is possible to reduce number of messages significantly. 2. Depending on the structure of the quorum and the exact algorithm, single point of failure may exist. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Control-token circulates in the system in some fixed order. Possession of the token grants permission to enter critical section. • Ring structure – simple, deadlock-free, fair. The token circulates even in the absence of any request (unnecessary traffic). Long path (O(N)) – the wait for token may be high. • Tree structure • Broadcast structure CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

The processes are organized in a logical tree structure, each node pointing to its parent. Further, each node maintains a FIFO list of token requesting neighbors. Each node has a variable Tokenholder initialized to false for everybody except for the first token holder (token generator). Entry section: If not Tokenholder If the request queue empty request token from parent; put itself in request queue; block self until Tokenholder is true; CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Exit section: If the request queue is not empty parent = dequeue(request queue); send token to parent; set Tokenholder to false; if the request queue is still not empty, request token from parent; CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Upon receipt of a request: If Tokenholder If in critical section put the requestor in the queue else parent = requestor; Tokenholder = false; send token to parent; endif else if the queue is empty send a request to the parent; put the requestor in queue; CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Upon receipt of a token: Parent = Dequeue(request queue); if self is the parent Tokenholder = true else send token to the parent; if the queue is not empty request token from parent; Note that all requests/token receipts must be processed in FIFO order and the routines must execute atomically. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Data Structure: The token contains Token vector T(. ) – number of completion of the critical section for every process. Request queue Q(. ) – queue of requesting processes. Every process (i) maintains the following seq_no – how many times i requested critical section. Si(. ) – the highest sequence number from every process i heard of. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Entry Section (process i): Broadcast a REQUEST message stamped with seq_no. Enter critical section after receiving token. Exit Section (process i): Update the token vector T by setting T(i) to Si(i). If process k is not in request queue Q and there are pending requests from k (Si(k)>T(k)), append process k to Q. If Q is non-empty, remove the first entry from Q and send the token to the process indicated by the top entry. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Processing a REQUEST (process j): Set Sj(k) to max(Sj(k), seq_no) after receiving a REQUEST from process k. If holds an idle token, send it to k. Requires broadcast. Therefore message overhead is high. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Used to elect a centralized controller if the older one fails. The election of a new controller helps to alleviate some of the problems arising from having a single point of failure. Two types of election algorithm • Extrema- finding – based on global priority • Preference-based – based on local preferences. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Election algorithms and mutual-exclusion algorithms are similar in the sense that they try to find one process that will be leader or enter the critical section. However, there are significant differences between two. 1. Leader election is one time. A process may yield to another process. In mutual exclusion algorithm, a process competes until it succeeds. 2. In leader election starvation is not an issue. 3. Once the leader is elected, all other processes must know who is the leader. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Election algorithms are designed on the basis of the logical topology of the group of the processes. 1. Complete topology 2. Ring 3. Tree CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Assumptions: 1. A process may reach all other processes in one logical hop. 2. Communication is reliable, only processes may fail. 3. Time to handle a message is bounded. 4. Process faults are benign, i. e. , a faulty process stops to generate messages. In other words, a process never generates confusing wrong messages. 5. Upon recovery, a process knows that it experienced a failure. 6. A failed process may rejoin the group upon recovery. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Our algorithm is an extrema-finding algorithm. It is called the bully algorithm. Upon detecting that the leader has failed, a process sends an inquiry message to all higher-priority processes. If no reply is received within the time-out period, the initiator sends an enter-election message to all lower priority processes. After the initiator receives a reply from all lower-priority processes or after the time-out, it declares itself as new leader and broadcasts that information to all processes. If a process receives an inquiry message from a lower priority process, it send inquiry message to all higher priority process and enters election. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Our algorithm is an extrema-finding algorithm. The processes are organized in a logical ring. The initiator sends a message along ring with its own id. Requesting everybody to elect it the leader. A process compares its own priority with that of the id. on an incoming message. If its own priority is higher, and it never forwarded any other messages, replace the message with its own id. and forward it. If its own priority is lower, forward the message. If the message carries its own id. then elect itself the leader and forward the election message. Message complexity O(N 2) in the worst-case, O(N) in the best-case. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

There are two types of election algorithms those use the logical tree topology. 1. The algorithm runs after the spanning tree is constructed. 2. The algorithm runs before the spanning tree is constructed and constructs the spanning tree in the process. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Note that the tree is an undirected tree without any parent-child relation. When the election begins there are possible more than one contender. Contender: 1. If receives an elect-me message from some other process before step 2 is completed, stop contending. 2. Send an elect-me message to all neighbors with a timestamp. 3. If receives reply from all neighbors, declare itself as leader by broadcasting I-am-leader message. 4. If receives I-am-leader message from some other process, stop contending. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Leaf Nodes: 1. Among possible multiple elect-me messages pick the one with lowest time-stamp and reply to it. Do not send multiple replies. Internal nodes: 1. Upon receiving an elect-me message from neighbor u, compare its time stamp with other such messages received earlier (if any). If this one is the oldest elect-me message, forward it to all neighbors except u. Designate u as parent. 2. Upon receiving reply from all neighbors except parent, forward it to the parent. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. The effect of communication overhead 2. The effect of underlying architecture 3. Dynamic behavior of the system 4. Resource utilization 5. Turnaround time 6. Location and performance transparency 7. Task and data migration CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. Static or off-line 2. Dynamic or on-line 3. Real-time CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. Precedence graph 2. Communication graph 3. Disjoint process model 4. Statistical load modeling CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. Communication system model 2. Processor pool model 3. Isolated workstation model 4. Migration workstation model CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. Speedup 2. Resource utilization 3. Makespan or completion time 4. Load sharing and load balancing CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

m l Turnaround time=1/(m-l) l m CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

m 2 l m Turnaround time=m/((m-l)(m+l)) CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

The general problem is NP-complete. 1. Stone’s algorithm (2 heterogeneous processors, arbitrary communication process graph). 2. Bokhari’s algorithm (n homogeneous processors, linear communication process graph, linear communication system) CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Assumptions: 1. Two heterogeneous processors. Therefore, the execution cost of each process depends on the processor. Further, the execution cost for each process is known for both processors. 2. Communication cost between each pair of processes is known. 3. Interprocess communication incurs negligible cost if both the processes are in the same processor. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Objective Function: Stone’s Algorithm minimizes the total execution and communication cost by properly allocating the processors among the processes. Let, G=(V, E) be the communication process model. Let A and B be two processors. Let w. A(u) and w. B(u) be the cost of executing process u on processors A and B respectively. Let, c(u, v) be the cost of interprocess communication between processes u and v if they are allocated different processors. Further, S be the set of processes to be executed on processor A and V-S be the set of process to be executed on processor B. Stone’s algorithm computes S such that the objective function Su e S w. A(u)+Sv e V-S w. B(v)+Su e S, v e V-S c(u, v) is minimized. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Example: Processes 1 2 3 4 Cost at Processor A Processor B 3 1 2 3 2 5 4 3 Computation Costs 1 1 2 3 4 4 5 0 1 2 2 3 2 4 Communication Costs If process 1 and 2 are allocated to processor A, and the rest in processor B, then the total cost is 3+1+4+3+5+0+1+2=19. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Commodity-flow problem: Stone’s algorithm reduces the scheduling problem to the commodity-flow problem described below. Let G=(V, E) be a graph with two special nodes u (source) and v (sink). Each edge has a maximum capacity to carry some commodity. What is the maximum amount of the commodity that can be carried from the source to the sink. 1. There is a known polynomial time algorithm to solve the commodity-flow problem 2. Let S be a subset of V such that the source is in S and the sink is in V-S. A set of edges (say C) with one end in S and the other end in V-S is called a cut and the sum of the capacities of the edges in C is called the weight of the cut. It can be shown that the maximum-flow is equal to the minimum possible cut-weight. This is called max-flow, min-cut theorem. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Reduction to the commodity-flow problem: Stone’s algorithm reduces the scheduling problem to the commodity-flow problem as follows. Let G=(V, E) be the communication process graph. Construct a new graph G’=(V’, E’) by adding two new nodes a (source, corresponding to processor A) and b (sink, corresponding to processor B). For every node u in G, add the edges (a, u) and (b, u) in G’. The weights (capacities) of the new edges will be w. B(u) and w. A(u) respectively. 1. Note that a cut in G’ gives a processor allocation for the job. 2. Further, the weight of the cut is same as the cost (objective function) of execution + cost of communication for the processor allocation given by the cut. 3. Therefore, if we compute the max-flow on G’, the corresponding min-cut gives the best processor allocation. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

5 Example: 3 2 4 1 1 2 2 1 A B 5 4 2 4 3 CSC 8420 Advanced Operating Systems 3 2 3 Georgia State University Yi Pan

Assumptions: 1. n homogeneous processors, connected in a linear topology. 2. k>n processes, linear communication graph. 3. Communication links are homogeneous. 4. Computation and communication costs are known. 5. Two processes not communicating directly will not be allocated the same processor. 6. Two communicating processes, if not allocated same processor, must be in the adajacent processors. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Objective function: Bokhari’s algorithm minimizes the sum of the total communication cost and the maximum execution cost in a processor. Note the difference with the objective function in Stone’s algorithm. Let P 1, …, Pn be processors in linear order and p 1, …, pk be processes in linear order. Let the execution cost of pi be wi and the communication cost between pi and pi+1 be ci. If processes pj(j) to pi(j+1)-1 be allocated to processor Pi, then the objective function for Bokhari’s algorithm will be maxi. Wj+Si=1 ncj(i) where Wj is the execution cost for processor j. Wj=wi(j)+wj(i+1)-1. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Example: 3 4 2 P 1 2 4 P 2 1 1 3 2 2 1 P 3 The value of the objective function max(3+2, 4, 1+2+1)+(2+1)=8. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

To solve the problem Bokhari constructed the a layered graph. The graph has n+2 layers, where n is the number of processors. The top and bottom layer has 1 node each. All other layers have k nodes where k is the number of processes. We number the layers from 0 thru k+1 and denote the ith node of jth layer by v(i, j). The node in the top (0 th) layer is adjacent to all nodes in layer 1. The node in the bottom layer is adjacent to all nodes in the kth layer. Other v(i, j)s are adjacent to v(i, j+1), …, v(k, j+1). Example of a layered graph constructed for n=2 and k=3. Note that any path from the top layer to the bottom layer gives a processor allocation subject to the restrictions described earlier and vice versa. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Each edge e in the graph has two weights, w 1(e) and w 2(e). If e is an edge connecting v(i, j) and v(r, j+1) then w 1(e) is the total execution time of the processes pi to pr-1 and w 2(e) is the communication cost between pr-1 and pr. Note that, a shortest path from the top to bottom on the basis of w 2 will minimize the total communication cost. Any known shortest path algorithm may be used to do that. Further, w 1 may take only certain values. To be precise, there are k+1 C 2 possible values of w 1. We can compute those values and sort them. Then we can restrict the w 1 by deleting all edges with w 1(e) greater than the restricted value. A shortest path in the new layered graph will minimize the total communication cost subject to the restriction that the maximum of the execution cost in any processor will not exceed the threshold value. Using this method we can compute the objective function for each of the possible k+1 C 2 threshold values. The best processor allocation can thus be computed. Note that a clever binary search over the possible values of w 1 may reduce the computational complexity of the algorithm significantly. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

The Goal of dynamic scheduling is to load share and balance dynamically. This goal is achieved by migrating tasks from heavily loaded processors to lightly loaded processors. Two types of task migration algorithms: • Sender-initiated - Transfer policy (when to transfer? ) - Selection policy (which task to transfer? ) - Location policy (where to transfer? ) Potentially unstable. • Receiver-initiated CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

A real-time task consists of Si, the earliest start time, Di, the deadline to finish the task and Ci, the worst-case completion time. A set of real-time tasks is called a real-time task set or a real-time system. A system of real-time tasks is called hard real-time system if all of the tasks in the system must complete execution within their respective deadlines. Even if one task misses its deadline, the whole system is considered incorrectly executed. A system of real-time tasks is called soft real-time system if the performance is judged by how many tasks missed deadline and by how much. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

A schedule for a real-time system is a CPU assignment to the tasks of the system such that a CPU is assigned to no more than one task at one time. A schedule for a real-time system is called valid if no task is assigned CPU before its earliest-start time and all tasks complete execution within their respective deadlines. A real-time system is called feasible if there is a valid schedule for the system. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. Dispersed files and clients - login transparency, access transparency - location transparency, location independence 2. Multiple users and files - concurrency transparency - replication transparency CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Basic Concepts 1. Files and file systems 2. Services and servers 3. File mounting and server registration 4. Stateful and stateless file servers 5. File access and semantics od sharing 6. Version control CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. Files have names, attributes and data. 2. File organization – flat or hierarchical 3. File access – sequential, direct, indexed/indexed sequential 4. Key components of file service – directory, authorization, file service (basic and transaction) and system services. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. A service may be provided by one/many servers. Similarly one process may provide multiple services. 2. Client/server relationship is relative. A schematic is given below. Directory services Clients Auth. services File services System services CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. Mounting is method to build users’ directory structure by using files and directories dispersed over multiple systems. 2. Different types of mounting – Explicit, Boot and Auto-mounting. 3. Mounting may or may not provide uniform global view. 4. Mounting is not a location transparent protocol, hence server registration is useful CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. A stateful server maintains state information for each open file. Staeless server does not maintain state information. Examples of state information – list of open files and clients, file descriptors and handles, file position pointers, mounting information, lock status, session keys, cache or buffer etc. 2. For stateless servers there are certain issues to be taken care of - Idempotency requirement - File locking mechanism - Session key management - Cache consistency CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

space time Remote Access Cache Access Down/upload acc. Simple R/W No true Sharing Coherency control Coherency Control Transaction Concurrency Control Coherency and Concurrency Control Session Not applicable Ignore sharing CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. UNIX Semantics - Can be implemented using only server-side cache. - Closely approximated using client-side cache with write-through and writeinvalidate/update - As a compromise write-back (file updated and cache invalidated only at the end of the batch) cache coherence policy may be used. 2. Transaction Semantics - File at server is updated only at the end of a successful transaction 3. Session Semantics - File at server is updated only at the end of the session. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

One solution of the file sharing/replication problem is to create a new version of the file upon every write or upon every close. Problem: What if two applications open an older version of a file, then application one closes the file (and thus creates a newer version) before application 2. Then application 2 closes the file. Three solutions: 1. Ignore Conflict 2. Resolve version conflict 3. Resolve serializability conflict. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Transaction requirements: 1. The execution of transactions are all or none. 2. The interleaving of multiple transactions is serializable. 3. Update is atomic. Clients Transaction manager Scheduler Object manager Objects CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

A serializable schedule is a legal schedule such that its execution will produce result equivalent to some serial execution of all transactions. A simple, inefficient method to achieve serializability: Complete transactions in private space, then update distributed objects using total order multicast. Three concurrency control protocol for transaction management 1. Two-phase locking 2. Timestamp ordering 3. Optimistic concurrency control CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

An object must be locked before accessing it. No new lock can be acquired after releasing a lock. Problems: 1. Potential deadlock. May be solved by rollback or abort. 2. Commit dependence resulting in probable rolling aborts if locks are released as early as possible. May be solved using strict two-phase locking – locks released only at commit or abort point. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Each transaction has a time-stamp. Transactions are serialized according to this time-stamp. In other words, older transactions abort and restart if they have a conflict with an younger transaction. Each object has two time stamps RD and WR – times of the transactions which read (wrote) that object last. Further each object will have a list of tentative transaction times for pending writes. Let Tmin be the minimum of these tentative times. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

The following actions are taken with each event: • READ: Does not conflict with other reads. If the time-stamp of this read is smaller than WR, this transaction is aborted. If it is allowed to proceed if it’s time stamp is in between WR and Tmin. It is kept in a queue otherwise for some other writes to commit. • WRITE: Allowed to proceed only if it’s time stamp is greater than both RD and WR. Then the write is marked tentative and put in the list. • ABORT: Aborted read has no effect. Aborted write is removed from the tentative list and if a pending read reaches the head of the queue, it is performed. • COMMIT: A commit may not involve any pending read. If it has any pending write then all preceding tentative writes aborted and this transaction is committed. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Transactions are allowed to proceed freely in the private work space. Before committing it is validated. Once validated, the objects may be updated in updation phase. Each transaction ti has two time stamps, TSi and TVi for the start of the execution and the validation phase respectively. Each object Oj records the times for its last read and write commits, RDj and WRj, respectively. Ri and Wi are read set and write set for ti. The transactions are serialized in the order of TVi. VALIDATION: If tk is already in validation phase, and TVi < TVk, then ti can not be validated and must be aborted. If ti has no overlap with any other transaction, it is validated. If ti execution phase overlaps with update phase of tk, then Ri and Wk needs to be disjoint to validate ti. If ti execution phase overlaps with validate and update of tk, then Ri and Wk needs to be disjoint as well as Wi and Wk need to be disjoint. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Reading operation: 1. Read-one-primary 2. Read-one 3. Read-quorum 4. Writing operation: 5. Write-one-primary 6. Write-all 7. Write-all-available 8. Write-quorum 9. Write-Gossip CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. 2*write-quorum > number of replicas 2. Read-quorum+write_quorum > number of replicas 3. Usually read-quorum is chosen to be smaller than write-quorum 4. Voting by witnesses 5. Weighted voting schemes. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Each File service agent (FSA) f maintains a time stamp of last updation – TSf Each Replica manager (RM) i maintains a time stamp of last updation – TSi Basic Gossip Protocol: Read: If TSf > TSi the replica manager do not have current data. Either wait or contact a different replica manager. Otherwise proceed with reading. Write: Increase TSf. If TSf > TSi update and ask the replica manager to propagate the update by gossip. Otherwise, depending on the application, either overwrite or update and read. In both cases increase TSf. Gossip message: If the gossip message carries a newer time-stamp, update data, TSi and depending on application, repeat the gossip with Tsi. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. Uniform Memory Access (UMA) Cache MM PE Network Cache 2. PE MM Non-Uniform Memory Access (NUMA) MM PE Network MM PE CSC 8420 Advanced Operating Systems Cache Georgia State University Cache Yi Pan

Virtual Memory Space MM MM MM Frame MM Offset Page Table Entry CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. Temporal and Spatial locality of references may be exploited using cache 2. Scalability may be improved by using improved network and/or by using cache. 3. DSM helps the application programmer by providing communication transparency. 4. Programmers are more familiar with shared memory. Further many software available for single processor and multiprocessor shared memory systems. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

The performance of a DSM may be improved by 1. Placing a data block in appropriate memory module, 2. By migrating data block to an appropriate memory module and 3. by replicating block. The performance of migration and replication strategies are measured by hit ratio. Block-migration strategies suffer from block-bouncing problem. Both block-migration and block-replication strategies suffer from the problem of false-sharing. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

In a distributed system, it is not possible to enforce uniprocessor-like coherency for shared data items. 1. There is no global clock. Therefore, it is not possible to determine latest write. 2. There will always be delays (non-deterministic). Hence we use some weaker consistency model for shared data in distributed environment. The programmer is supposed to know what kind of consistency is guaranteed and act accordingly. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

If the user can not give any synchronization information, then general access consistency models are used. The general access consistency models are 1. Atomic consistency (usual uniprocessor consistency, a read never gets stale data) 2. Sequential consistency 3. Causal consistency 4. Processor consistency 5. Slow memory consistency. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

The operation of all processors are executed in some sequential order and the operations of each individual processor are executed in the order specified by its program. P 1: W(X)1 P 2: P 3: W(Y)2 R(Y)2 CSC 8420 Advanced Operating Systems R(X)0 R(X)1 Georgia State University Yi Pan

Only causally related writes must appear in same order to all processors. P 1: W(X)1 W(X)3 P 2: RX(1) W(X)2 P 3: R(X)1 R(X)3 R(X)2 P 4: R(X)1 R(X)2 R(X)3 Causally consistent but not sequentially consistent. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Writes from same processor must appear in the order they were issued. P 1: W(X)1 P 2: RX(1) W(X)2 P 3: R(X)1 R(X)2 P 4: R(X)2 R(X)1 Processor consistent, but neither causally consistent nor sequentially consistent. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Writes from the same processor only in the same location (memory address) must appear in the order they were issued. P 1: W(X)1 W(Y)0 W(Y)2 P 2: R(X)1 W(X)3 R(Y)0 Slow memory consistent, but not processor consistent. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

The user controls access to the ordinary shared variable by using semaphore or other explicit synchronizing variables. The system provides some consistency only for synchronizing variables. Synchronization access consistency models are: 1. Weak consistency 2. Release consistency 3. Entry consistency. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Access to a synchronization variable is synch(S). 1. Access to all synchronizing variables are sequentially consistent 2. A processor must not access a synchronizing variable with pending access to any data 3. A processor must not access any variable with pending access to a synchronizing variable. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Access to a synchronization variable are acquire(S) and release(S). 1. Access to all synchronizing variables are processor consistent 2. Access to all shared variables between acquire and release are exclusive. 3. Changes made to shared variables are made visible to the outside world after release. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Access to a synchronization variable are acquire(X) and release(X). 1. Access to all synchronizing variables are processor consistent 2. Access to shared variable X between acquire and release are exclusive. 3. Changes made to X are made visible to the outside world after release. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Master copy E < Replicated block V E Replicated block Processor- bits > V E CSC 8420 Advanced Operating Systems Georgia State University V – valid/invalid E - exclusive Yi Pan

Any cache manager faces the following events, and must take appropriate actions to maintain functionality 1. Read-hit 2. Read-miss 3. Write-hit 4. Write-miss The performance of a cache management algorithm is judged by both hit-ratios and the overhead that occurs at each of the four events. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Event Action by processor Action by directory Read hit Use data from local cache Do nothing Read miss Get the block from master copy. Set V bit. May set E bit also. Send the block to the requestor. Set P bit, may set E bit also. Write hit Change own cache, send data and invalidate message to the directory. Update master copy, send invalidate message to other copies. Write miss Same as write hit. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Event Action by processor Action by directory Read hit Use data from local cache Do nothing Read miss Get the block from master copy. Set V bit. May set E bit also. Send the block to the requestor. Set P bit, may set E bit also. Write hit Change own cache, send data and invalidate message to the directory. Update master copy, send update message to other copies. Write miss Same as write hit. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

If the communication medium is a broadcast medium like bus, there is no need of the directory, as all processors can monitor all cache related messages and take appropriate actions. Clearly, such systems can implement sequential consistency. However, bus is a potential bottleneck and a single point of failure. Several approaches are taken to alleviate the problem. 1. Cache traffic is carried on a separate bus. 2. Multiple bus in hierarchical configuration. 3. Multistage interconnection networks (MIN) made of bus. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

In a DSM system, memory modules are strictly local and communication is only via high-latency links. Therefore, such systems simulates/emulates shared memory by message passing. Obviously, a process in such a system finds its address-space distributed in the memories of different processors. Therefore, it has three options while accessing a non-local memory page. 1. Remote access 2. Migration 3. Replication CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

With three access mechanisms and two access types (read and write), there are nine possible DSM management algorithms. Out of nine, the following four are more interesting. 1. Read-remote-write-remote 2. Read-migrate-write-migrate 3. Read-replicate-write-migrate 4. Read-replicate-write-replicate CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

One or multiple server serves the memory pages. 1. Simple to implement 2. If implemented with a single server, the system is always sequentially consistent provided that the server serializes request and response services. 3. High latency 4. Servers are potential bottlenecks. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

A memory page migrates to the new processors memory upon access 1. Coherence problem need not be handled separately 2. Exploits program localities well 3. Upon migration, all processors sharing the page need to update their virtual page to physical block mapping 4. Ping-pong effect is a problem that becomes more severe by false sharing. May be handled by smaller page size (associated with high overhead), or a combination of migrate-replicate-remote access strategy. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Same as the write-invalidate strategy for cache management. 1. Popular because many software use the multiple-read/exclusive-write semantic. 2. Performs well when reads are dominant operation. 3. In the absence of broadcast support in hardware, write becomes a costly operation. 4. Maintains strong consistency. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Same as the write-update strategy for cache management. 1. Performs well when reads are not dominant operation. 2. In the absence of broadcast support in hardware, write becomes a very costly operation. 3. Maintains strong consistency only when used with a suitable protocol like two-phase commit. CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Two problems need to be solved: 1. Finding the current owner of a page (if there is migration). 2. Finding the set of processors with a copy of the page (if there is replication. ) CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

1. If there is a master owner, that may keep track of the current owner 2. Otherwise, the following data structure may be used. Block probable owner Block probable owner CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

From To’s Spanning Tree From NIL Forwarded Head master next Update Invalidate Request Head master next Acknowledgement Requestor Linked list CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Distributed Shared Memory systems are classified on the basis of the following criteria 1. Implementation – Hardware, Software or Hybrid 2. Architecture Configuration or Interconnection Topology – Bus, Ring, Cube, MIN etc. 3. Shared Data Organization – Structured (Objects, higher level types etc. ) or Non-structured. 4. Granularity/Coherence unit – Word, Cache, Block, Page, Object etc. 5. DSM Algorithms – SRSW, MRMW. 6. Management Responsibility – Distributed or centralized 7. Consistency Model 8. Coherence Protocol CSC 8420 Advanced Operating Systems Georgia State University Yi Pan

Processor Cache Local Bus Read/Write LB to VB Interface Node Local Memory I/O CSC 8420 Advanced Operating Systems I/O Lock Space Control Spc Data Spc Write Network Interconnect RM BUS Write VME Bus Georgia State University Yi Pan

1. Implementation – Hybrid, Reflective memory + library routines 2. Architecture Configuration or Interconnection Topology – Hierarchical bus 3. Shared Data Organization – Structured (Data structure) 4. Granularity/Coherence unit – Data structure 5. DSM Algorithms – MRMW. 6. Management Responsibility - Distributed 7. Consistency Model – Entry consistency 8. Coherence Protocol – Write-update CSC 8420 Advanced Operating Systems Georgia State University Yi Pan