
Concurrent Systems and Parallelism

Final Exam Schedule • CS 1311 Sections L/M/N, Tuesday/Thursday 10:00 A.M. • Exam scheduled for 8:00 Friday, May 5, 2000 • Physics L1

Final Exam Schedule • CS 1311 Sections E/F, Tuesday/Thursday 2:00 P.M. • Exam scheduled for 2:50 Wednesday, May 3, 2000 • Physics L1

Concurrent Systems

Sequential Processing • All of the algorithms we’ve seen so far are sequential: – They have one “thread” of execution – One step follows another in sequence – One processor is all that is needed to run the algorithm

A Non-sequential Example • Consider a house with a burglar alarm system. • The system continually monitors: – The front door – The back door – The sliding glass door – The door to the deck – The kitchen windows – The living room windows – The bedroom windows • The burglar alarm is watching all of these “at once” (at the same time).

Another Non-sequential Example • Your car has an onboard digital dashboard that simultaneously: – Calculates how fast you’re going and displays it on the speedometer – Checks your oil level – Checks your fuel level and calculates consumption – Monitors the heat of the engine and turns on a light if it is too hot – Monitors your alternator to make sure it is charging your battery

Concurrent Systems • A system in which: – Multiple tasks can be executed at the same time – The tasks may be duplicates of each other, or distinct tasks – The overall time to perform the series of tasks is reduced

Advantages of Concurrency • Concurrent processes can reduce duplication in code. • The overall runtime of the algorithm can be significantly reduced. • More real-world problems can be solved than with sequential algorithms alone. • Redundancy can make systems more reliable.

Disadvantages of Concurrency • Runtime is not always reduced, so careful planning is required • Concurrent algorithms can be more complex than sequential algorithms • Shared data can be corrupted • Communication between tasks is needed

Achieving Concurrency • Many computers today have more than one processor (multiprocessor machines). [Diagram: CPU 1 and CPU 2 connected by a bus to a shared Memory]

Achieving Concurrency • Concurrency can also be achieved on a computer with only one processor: – The computer “juggles” jobs, swapping its attention to each in turn – “Time slicing” allows many users to get CPU resources – Tasks may be suspended while they wait for something, such as device I/O [Diagram: one CPU running task 2 while tasks 1 and 3 sleep]
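
A minimal sketch of this idea in Python (my own example, not from the slides): three tasks share the processor, and while a task is blocked on simulated device I/O the CPU is free to turn its attention to the others.

import threading
import time

def task(name, steps):
    for i in range(steps):
        print(f"{name}: step {i}")
        # Simulated device I/O: while this task sleeps, the CPU is free
        # to run the other tasks.
        time.sleep(0.1)

threads = [threading.Thread(target=task, args=(f"task {n}", 3)) for n in range(1, 4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("all tasks finished")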

Concurrency vs. Parallelism • Concurrency is the execution of multiple tasks at the same time, regardless of the number of processors. • Parallelism is the use of multiple processors to execute a single task.

Types of Concurrent Systems • Multiprogramming • Multiprocessing • Multitasking • Distributed Systems

Multiprogramming • Share a single CPU among many users or tasks. • May have a time-shared algorithm or a priority algorithm for determining which task to run next. • Give the illusion of simultaneous processing through rapid swapping of tasks (interleaving).
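
To make the interleaving concrete, here is a toy round-robin “scheduler” in Python (a sketch of the idea only, not how a real operating system is built): each task is a generator, and the loop hands the single “CPU” to each task in turn.

from collections import deque

def make_task(name, steps):
    # Each yield is a point where the CPU is taken away from this task.
    for i in range(steps):
        print(f"{name}: step {i}")
        yield

def round_robin(tasks):
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)           # run the task for one time slice
            ready.append(task)   # not finished: back to the end of the line
        except StopIteration:
            pass                 # finished: drop it

round_robin([make_task("A", 3), make_task("B", 2), make_task("C", 4)])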

Multiprogramming [Diagram: the programs of User 1 and User 2 both resident in Memory, taking turns on a single CPU]

Multiprogramming [Chart: tasks/users 1-4 all mapped onto a single CPU]

Multiprocessing • Executes multiple tasks at the same time • Uses multiple processors to accomplish the tasks • Each processor may also timeshare among several tasks • Has a shared memory that is used by all the tasks

Multiprocessing [Diagram: tasks for User 1 and User 2 held in a shared Memory and executed by multiple CPUs]

Multiprocessing [Chart: tasks/users 1-4 spread across CPUs 1-4, all attached to a shared memory]

Multitasking • A single user can have multiple tasks running at the same time. • Can be done with one or more processors. • Used to be rare and found only on expensive multiprocessing systems, but now most modern operating systems can do it.

Multitasking [Diagram: three tasks belonging to User 1 in Memory, sharing one CPU]

Multitasking [Chart: several tasks of a single user mapped onto the available CPUs]

Distributed Systems • Multiple computers working together with no central program “in charge.” [Diagram: ATMs at Buford, Student Ctr, Perimeter, and North Ave, all connected to a Central Bank]

Distributed Systems • Advantages: – No bottlenecks from sharing processors – No central point of failure – Processing can be localized for efficiency • Disadvantages: – Complexity – Communication overhead – Distributed control

Questions?

Parallelism

Parallelism • Using multiple processors to solve a single task. • Involves: – Breaking the task into meaningful pieces – Doing the work on many processors – Coordinating and putting the pieces back together

Parallelism [Diagram: multiple nodes, each with its own CPU and Memory, connected through a Network Interface]

Parallelism [Chart: a single task spread across CPUs 1-4]

Pipeline Processing • Repeating a sequence of operations or pieces of a task. • Allocating each piece to a separate processor and chaining them together produces a pipeline, completing tasks faster. [Diagram: input → A → B → C → D → output]

Example • Suppose you have a choice between a washer and a dryer, each having a 30-minute cycle, or • A washer/dryer combo with a one-hour cycle • The correct answer depends on how much work you have to do.

One Load [Timeline: wash then dry, with a transfer overhead in between, versus a single combo cycle]

Three Loads [Timeline: washer and dryer pipelined across three loads, versus three back-to-back combo cycles]
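
To put rough numbers on it (assuming the transfer overhead is negligible): for one load, both options take about an hour, either 30 minutes washing plus 30 minutes drying, or one 60-minute combo cycle. For three loads, the combo needs 3 × 60 = 180 minutes, while the pipelined washer and dryer finish load 1 at 60 minutes, load 2 at 90 minutes, and load 3 at 120 minutes, because the washer works on the next load while the dryer works on the previous one.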

Examples of Pipelined Tasks • Automobile manufacturing • Instruction processing within a computer [Diagram: instructions 1-5 flowing through pipeline stages A-D over time steps 0-7]
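
As a small illustration of chaining stages together (my own sketch; the stage functions are invented), each stage below runs in its own thread and passes items to the next stage through a queue, so the first stage can start on item 2 while the later stages are still working on item 1.

import queue
import threading

def stage(work, inbox, outbox):
    # Repeatedly take an item from the previous stage, process it,
    # and hand the result to the next stage. None means "no more input".
    while True:
        item = inbox.get()
        if item is None:
            outbox.put(None)
            return
        outbox.put(work(item))

q_in, q_ab, q_bc, q_out = (queue.Queue() for _ in range(4))
threads = [
    threading.Thread(target=stage, args=(lambda x: x + 1, q_in, q_ab)),   # stage A
    threading.Thread(target=stage, args=(lambda x: x * 2, q_ab, q_bc)),   # stage B
    threading.Thread(target=stage, args=(lambda x: x - 3, q_bc, q_out)),  # stage C
]
for t in threads:
    t.start()
for item in [1, 2, 3, 4]:
    q_in.put(item)
q_in.put(None)
while (result := q_out.get()) is not None:
    print(result)
for t in threads:
    t.join()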

Task Queues • A supervisor processor maintains a queue of tasks to be performed in shared memory. • Each processor queries the queue, dequeues the next task and performs it. • Task execution may involve adding more tasks to the task queue. [Diagram: processors P1, P2, P3, …, Pn and a Supervisor all attached to a shared Task Queue]
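
A rough sketch of this pattern in Python (the task contents are invented for illustration): the main thread plays the supervisor and seeds a shared queue, each worker thread repeatedly dequeues the next task and performs it, and performing a task may enqueue further tasks.

import queue
import threading

tasks = queue.Queue()

def worker(worker_id):
    while True:
        task = tasks.get()
        if task is None:                 # shutdown signal for this worker
            tasks.task_done()
            return
        print(f"worker {worker_id}: handling {task}")
        if task > 1:
            # Performing a task may add more tasks to the queue.
            tasks.put(task // 2)
            tasks.put(task - task // 2)
        tasks.task_done()

workers = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for w in workers:
    w.start()

tasks.put(8)         # the supervisor seeds the queue with one big task
tasks.join()         # wait until every task (and sub-task) is finished
for _ in workers:
    tasks.put(None)  # then tell each worker to stop
for w in workers:
    w.join()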

Parallelizing Algorithms • How much gain can we get from parallelizing an algorithm?

Parallel Bubblesort • We can use N/2 processors to do all the comparisons at once, “flopping” the pair-wise comparisons. [Diagram: the first passes over the list 93 87 74 65 57 45 33 27, each pass swapping adjacent pairs simultaneously]

Runtime of Parallel Bubblesort [Diagram: the remaining passes; after 8 passes the list is fully sorted as 27 33 45 57 65 74 87 93]

Completion Time of Bubblesort • Sequential bubblesort finishes in N² time. • Parallel bubblesort finishes in N time. [Chart: O(N) for parallel bubblesort versus O(N²) for sequential bubblesort]
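
The “flopping” pair-wise scheme above is usually called odd-even transposition sort. Below is a small sequential simulation of it in Python (my own sketch): each pass performs all the independent comparisons that the N/2 processors would do at once, and N passes suffice to sort N values.

def parallel_bubblesort(values):
    # Simulation of odd-even transposition sort. On a real parallel
    # machine every comparison within one pass runs simultaneously on
    # its own processor, so each pass costs O(1) time and the whole
    # sort costs O(N) time.
    a = list(values)
    n = len(a)
    for pass_number in range(n):
        start = pass_number % 2          # alternate even and odd pairs
        for i in range(start, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a

print(parallel_bubblesort([93, 87, 74, 65, 57, 45, 33, 27]))
# [27, 33, 45, 57, 65, 74, 87, 93]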

Product Complexity • Got done in O(N) time, better than O(N²) • Each time “chunk” does O(N) work • There are N time chunks • Thus, the amount of work is still O(N²) • Product complexity is the amount of work per “time chunk” multiplied by the number of “time chunks” – the total work done.

Ceiling of Improvement • Parallelization can reduce time, but it cannot reduce work. The product complexity cannot change or improve. • How much improvement can parallelization provide? – Given an O(N log N) algorithm and log N processors, the algorithm will take at least O(N) time. – Given an O(N³) algorithm and N processors, the algorithm will take at least O(N²) time.
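
A rough way to see these lower bounds (my own phrasing, ignoring all coordination overhead): with P processors, each time chunk can accomplish at most P units of work, so the running time is at least the total work divided by P. That gives O(N log N) / log N = O(N) time in the first case and O(N³) / N = O(N²) time in the second.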

Number of Processors • Processors are limited by hardware. • Typically, the number of processors is a power of 2. • Usually: the number of processors is a constant factor, 2^K. • Conceivably: networked computers joined as needed (à la Borg?).

Adding Processors • A program on one processor – Runs in X time • Adding another processor – Ideally, runs in X/2 time – Realistically, it will run in a bit more than X/2, because of overhead • At some point, adding processors will not help and could degrade performance.
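
As a rough illustration (the numbers are invented for this example): suppose a job takes X = 100 seconds on one processor and every extra processor adds about one second of coordination overhead. The time with P processors is then roughly 100/P + (P − 1) seconds: 100 for one processor, 51 for two, 28 for four, 19.5 for eight, but 21.25 for sixteen; beyond roughly ten processors the overhead term outweighs the savings.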

Overhead of Parallelization • Parallelization is not free. • Processors must be controlled and coordinated. • We need a way to govern which processor does what work; this involves extra work. • Often the program must be written in a special programming language for parallel systems. • Often, a parallelized program for one machine (with, say, 2^K processors) doesn’t work on other machines (with, say, 2^L processors).

What We Know about Tasks • Relatively isolated units of computation • Should be roughly equal in duration • Duration of the unit of work must be much greater than overhead time • Policy decisions and coordination required for shared data • Simpler algorithms are the easiest to parallelize

Questions?

More?

Matrix Multiplication

Inner Product

procedure inner_prod(a, b, c isoftype in/out Matrix, i, j isoftype in Num)
// Compute the inner product of a[i][*] and b[*][j] and store it in c[i][j]
   sum isoftype Num
   k isoftype Num
   sum <- 0
   k <- 1
   loop
      exitif(k > N)
      sum <- sum + a[i][k] * b[k][j]
      k <- k + 1
   endloop
   c[i][j] <- sum
endprocedure // inner_prod

N is                // Declare constant defining size of arrays
Matrix definesa Array[1..N][1..N] of Num

algorithm P_Demo
   a, b, c isoftype Matrix Shared
   server isoftype Num
   Initialize(NUM_SERVERS)
   // Input a and b here
   // (code not shown)
   i, j isoftype Num

   i <- 1
   loop
      exitif(i > N)
      server <- (i * NUM_SERVERS) DIV N
      j <- 1
      loop
         exitif(j > N)
         RThread(server, inner_prod(a, b, c, i, j))
         j <- j + 1
      endloop
      i <- i + 1
   endloop
   Parallel_Wait(NUM_SERVERS)
   // Output c here
endalgorithm // P_Demo
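
For comparison, here is a rough modern equivalent of the same idea in Python (a sketch only: the thread pool stands in for the RThread servers, and names such as inner_prod and num_workers are mine, not from the slides).

import concurrent.futures

N = 4

def inner_prod(a, b, c, i, j):
    # Inner product of row i of a and column j of b, stored in c[i][j].
    # Each (i, j) cell is written by exactly one task, so no locking is
    # needed on c.
    total = 0
    for k in range(N):
        total += a[i][k] * b[k][j]
    c[i][j] = total

def parallel_matrix_multiply(a, b, num_workers=4):
    c = [[0] * N for _ in range(N)]
    # One task per cell of the result, spread over a pool of worker
    # threads; waiting on the futures plays the role of Parallel_Wait.
    with concurrent.futures.ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [pool.submit(inner_prod, a, b, c, i, j)
                   for i in range(N) for j in range(N)]
        concurrent.futures.wait(futures)
    return c

identity = [[1 if i == j else 0 for j in range(N)] for i in range(N)]
m = [[i * N + j for j in range(N)] for i in range(N)]
print(parallel_matrix_multiply(m, identity))   # prints m unchanged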

Questions?