
Cluster Computing: An Introduction 金仲達 國立清華大學資訊工程學系 (Department of Computer Science, National Tsing Hua University) king@cs.nthu.edu.tw
Clusters Have Arrived 1
What is a Cluster? A collection of independent computer systems working together as if they were a single system o Coupled through a scalable, high-bandwidth, low-latency interconnect o The nodes can exist in a single cabinet or be separated and connected via a network o Faster, closer connection than a network (LAN) o Looser connection than a symmetric multiprocessor 2
Outline o Motivations of Cluster Computing o Cluster Classifications o Cluster Architecture & its Components o Cluster Middleware o Representative Cluster Systems o Task Forces on Cluster Computing o Resources and Conclusions 3
Motivations of Cluster Computing 4
How to Run Applications Faster? o There are three ways to improve performance: m Work harder m Work smarter m Get help o Computer analogy m Use faster hardware: e.g. reduce the time per instruction (clock cycle) m Optimized algorithms and techniques m Use multiple computers to solve the problem => the techniques of parallel processing are mature and can be exploited commercially 5
Motivation for Using Clusters o Performance of workstations and PCs is rapidly improving o Communications bandwidth between computers is increasing o Vast numbers of under-utilized workstations with a huge number of unused processor cycles o Organizations are reluctant to buy large, high-performance computers, due to the high cost and short useful life span 6
Motivation for Using Clusters o Workstation clusters are thus a cheap and readily available approach to high performance computing m Clusters are easier to integrate into existing networks m Development tools for workstations are mature l Threads, PVM, MPI, DSM, C, C++, Java, etc. o Use of clusters as a distributed compute resource is cost effective --- incremental growth of the system! m Individual node performance can be improved by adding additional resources (new memory blocks/disks) m New nodes can be added or nodes can be removed m Clusters of Clusters and Metacomputing 7
Key Benefits of Clusters o High performance: running cluster-enabled programs o Scalability: adding servers to the cluster, adding more clusters to the network as the need arises, or adding CPUs to an SMP o High throughput o System availability (HA): clusters offer inherent high system availability due to the redundancy of hardware, operating systems, and applications o Cost effectiveness 8
Why Cluster Now? 9
Hardware and Software Trends o Important advances have taken place in the last five years m Network performance increased with reduced cost m Workstation performance improved l Average number of transistors on a chip grows 40% per year l Clock frequency growth rate is about 30% per year l Expect 700-MHz processors with 100 M transistors in early 2000 m Availability of powerful and stable operating systems (Linux, FreeBSD) with source code access 10
Why Clusters NOW? o Clusters gained momentum when three technologies converged: m Very high performance microprocessors l workstation performance = yesterday's supercomputers m High speed communication m Standard tools for parallel/distributed computing & their growing popularity o Time to market => performance o Internet services: huge demand for scalable, available, dedicated internet servers m big I/O, big compute 11
Efficient Communication o The key enabling technology: from killer micro to killer switch m Single-chip building block for scalable networks l high bandwidth l low latency l very reliable m Challenges for clusters l greater routing delay and less than complete reliability l constraints on where the network connects into the node l UNIX has a rigid device and scheduling interface 12
Putting Them Together... o Building block = complete computers (HW & SW) shipped in 100,000s: Killer micro, Killer DRAM, Killer disk, Killer OS, Killer packaging, Killer investment o Leverage billion-$-per-year investment o Interconnecting building blocks => Killer Net c High bandwidth c Low latency c Reliable c Commodity (ATM, Gigabit Ethernet, Myrinet) 13
Windows of Opportunity o The resources available in typical clusters offer a number of research opportunities, such as m Parallel processing: use multiple computers to build MPP/DSM-like systems for parallel computing m Network RAM: use the memory associated with each workstation as an aggregate DRAM cache m Software RAID: use the arrays of workstation disks to provide cheap, highly available, and scalable file storage m Multipath communication: use the multiple networks for parallel data transfer between nodes 14
Windows of Opportunity o Most high-end scalable WWW servers are clusters m end services (data, web, enhanced information services, reliability) o Network mediation services are also cluster-based m Inktomi traffic server, etc. m Clustered proxy caches, clustered firewalls, etc. m => These object-web applications are increasingly compute intensive m => These applications are an increasing part of "scientific computing" 15
Classification of Cluster Computers 16
Clusters Classification 1 o Based on Focus (in Market) m High performance (HP) clusters l Grand challenge applications m High availability (HA) clusters l Mission critical applications 17
HA Clusters 18
Clusters Classification 2 o Based on Workstation/PC Ownership m Dedicated clusters m Non-dedicated clusters l Adaptive parallel computing l Can be used for CPU cycle stealing 19
Clusters Classification 3 o Based on Node Architecture m Clusters of PCs (CoPs) m Clusters of Workstations (COWs) m Clusters of SMPs (CLUMPs) 20
Clusters Classification 4 o Based on Node Components Architecture & Configuration: m Homogeneous clusters l All nodes have similar configurations m Heterogeneous clusters l Nodes based on different processors and running different OS 21
Clusters Classification 5 o Based on Levels of Clustering: m Group clusters (# nodes: 2-99) l A set of dedicated/non-dedicated computers, mainly connected by a SAN like Myrinet m Departmental clusters (# nodes: 99-999) m Organizational clusters (# nodes: many 100s) m Internet-wide clusters = Global clusters (# nodes: 1000s to many millions) l Metacomputing 22
Clusters and Their Commodity Components 23
Cluster Computer Architecture 24
Cluster Components...1a Nodes o Multiple high performance components: m PCs m Workstations m SMPs (CLUMPs) m Distributed HPC systems leading to Metacomputing o They can be based on different architectures and run different OSes 25
Cluster Components...1b Processors o There are many (CISC/RISC/VLIW/Vector...) m Intel: Pentiums, Xeon, Merced m Sun: SPARC, UltraSPARC m HP PA m IBM RS6000/PowerPC m SGI MIPS m Digital Alphas o Integrating memory, processing and networking into a single chip m IRAM (CPU & Mem): http://iram.cs.berkeley.edu m Alpha 21366 (CPU, Memory Controller, NI) 26
Cluster Components...2 OS o State of the art OS: m Tend to be modular: can easily be extended and new subsystems can be added without modifying the underlying OS structure m Multithreading has added a new dimension to parallel processing m Popular OSes used on nodes of clusters: l Linux (Beowulf) l Microsoft NT (Illinois HPVM) l SUN Solaris (Berkeley NOW) l IBM AIX (IBM SP2) 27
Cluster Components...3 High Performance Networks o Ethernet (10 Mbps) o Fast Ethernet (100 Mbps) o Gigabit Ethernet (1 Gbps) o SCI (Dolphin - MPI - 12 usec latency) o ATM o Myrinet (1.2 Gbps) o Digital Memory Channel o FDDI 28
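As a rough back-of-the-envelope illustration (the message size here is assumed, not from the slides): the delivery time of a message is roughly T ≈ latency + size / bandwidth. A 1 MB message therefore takes about 1 MB / 12.5 MB/s ≈ 80 ms over Fast Ethernet (100 Mbps) but only about 1 MB / 150 MB/s ≈ 7 ms over Myrinet (1.2 Gbps); for small messages the fixed latency term dominates, which is why interconnects such as SCI quote latency (12 usec) as well as bandwidth.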
Cluster Components...4 Network Interfaces o Dedicated processing power and storage embedded in the Network Interface o An I/O card today o Tomorrow on chip? [Diagram: Myricom NIC on the Myricom Net (160 MB/s), attached over the I/O bus (S-Bus, 50 MB/s) to a Sun Ultra 170 node] 29
Cluster Components...4 Network Interfaces o Network interface card m Myrinet has its own NIC m User-level access support: VIA m Alpha 21364 processor integrates processing, memory controller, and network interface into a single chip 30
Cluster Components...5 Communication Software o Traditional OS-supported facilities (but heavyweight due to protocol processing) m Sockets (TCP/IP), Pipes, etc. o Lightweight protocols (user-level): minimal interface into the OS m Users transmit directly into and receive from the network without OS intervention m Communication protection domains established by the interface card and the OS m Treat message loss as an infrequent case m Active Messages (Berkeley), Fast Messages (UI), ... 31
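To make the contrast concrete, below is a minimal C sketch of the traditional sockets path that the slide calls heavyweight: every send and receive traps into the kernel and passes through the full TCP/IP stack, which is exactly the per-message cost that user-level schemes such as Active Messages try to avoid. The peer address and port are made-up values for illustration only.

```c
/* Minimal sketch of the "traditional" sockets path: each send/recv is a
 * system call and the data goes through full TCP/IP protocol processing.
 * The peer 192.168.1.2:5000 is an assumed address for illustration. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);      /* kernel-managed endpoint */
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_in peer;
    memset(&peer, 0, sizeof peer);
    peer.sin_family = AF_INET;
    peer.sin_port   = htons(5000);
    inet_pton(AF_INET, "192.168.1.2", &peer.sin_addr);

    if (connect(fd, (struct sockaddr *)&peer, sizeof peer) < 0) {
        perror("connect"); return 1;
    }

    const char msg[] = "hello from node 0";
    send(fd, msg, sizeof msg, 0);       /* trap into OS: copy to kernel, TCP/IP processing */

    char buf[128];
    ssize_t n = recv(fd, buf, sizeof buf, 0);       /* another trap into the OS */
    if (n > 0) printf("got %zd bytes back\n", n);

    close(fd);
    return 0;
}
```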
Cluster Components...6a Cluster Middleware o Resides between the OS and applications and offers an infrastructure for supporting: m Single System Image (SSI) m System Availability (SA) o SSI makes a collection of computers appear as a single machine (globalized view of system resources) o SA supports checkpointing and process migration, etc. 32
Cluster Components...6b Middleware Components o Hardware m DEC Memory Channel, DSM (Alewife, DASH), SMP techniques o OS / gluing layers m Solaris MC, Unixware, Glunix o Applications and Subsystems m System management and electronic forms m Runtime systems (software DSM, PFS, etc.) m Resource management and scheduling (RMS): l CODINE, LSF, PBS, NQS, etc. 33
Cluster Components...7a Programming Environments o Threads (PCs, SMPs, NOW, ...) m POSIX Threads m Java Threads o MPI m Linux, NT, and many supercomputers o PVM o Software DSMs (Shmem) 34
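A minimal MPI sketch of the "get help from multiple computers" idea, assuming a generic MPI implementation (MPICH, LAM, etc.); the problem being solved (summing integers) and the value of N are illustrative only.

```c
/* Minimal MPI sketch: each process sums its share of 0..N-1 and the
 * partial sums are combined with MPI_Reduce on rank 0. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* which process am I? */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* how many processes in the job? */

    const long N = 1000000;                 /* illustrative problem size */
    long local = 0, total = 0;
    for (long i = rank; i < N; i += size)   /* cyclic distribution of the work */
        local += i;

    MPI_Reduce(&local, &total, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum = %ld (computed by %d processes)\n", total, size);

    MPI_Finalize();
    return 0;
}
```

Such a program is typically compiled with mpicc and launched across the cluster nodes with something like mpirun -np 4 ./sum, though the exact launch command depends on the MPI implementation and the resource manager (LSF, PBS, ...).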
Cluster Components...7b Development Tools o Compilers m C/C++/Java o RAD (rapid application development) tools: GUI-based tools for parallel processing modeling o Debuggers o Performance monitoring and analysis tools o Visualization tools 35
Cluster Components...8 Applications o Sequential o Parallel/distributed (cluster-aware applications) m Grand challenge applications l Weather Forecasting l Quantum Chemistry l Molecular Biology Modeling l Engineering Analysis (CAD/CAM) l ... m Web servers, data-mining 36
Cluster Middleware and Single System Image 37
Middleware Design Goals o Complete transparency m Let users see a single cluster system l Single entry point, ftp, telnet, software loading, ... o Scalable performance m Easy growth of the cluster l no change of API and automatic load distribution o Enhanced availability m Automatic recovery from failures l Employ checkpointing and fault-tolerant technologies m Handle consistency of data when replicated 38
Single System Image (SSI) o A single system image is the illusion, created by software or hardware, that a collection of computers appears as a single computing resource o Benefits: m Transparent usage of system resources m Improved reliability and higher availability m Simplified system management m Reduction in the risk of operator errors m Users need not be aware of the underlying system architecture to use these machines effectively 39
Desired SSI Services o Single entry point m telnet cluster.my_institute.edu m telnet node1.cluster.my_institute.edu o Single file hierarchy: AFS, Solaris MC Proxy o Single control point: manage from a single GUI o Single virtual networking o Single memory space - DSM o Single job management: Glunix, Codine, LSF o Single user interface: like a workstation/PC windowing environment 40
SSI Levels o Single system support can exist at different levels within a system; one level can be built on top of another m Application and Subsystem Level m Operating System Kernel Level m Hardware Level 41
Availability Support Functions o Single I/O space (SIO): m Any node can access any peripheral or disk device without knowledge of its physical location o Single process space (SPS): m Any process can create processes on any node, and they can communicate through signals, pipes, etc., as if they were on a single node o Checkpointing and process migration: m Save the process state and intermediate results in memory or on disk; process migration for load balancing o Reduction in the risk of operator errors 45
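To illustrate the checkpointing idea, here is a toy application-level sketch in C (not the mechanism of any particular cluster package; the state file name and the saved state are assumptions): the job periodically writes its intermediate results to disk, so after a node failure, or after migration to another node sharing the file system, it can resume from the last saved point rather than from the beginning.

```c
/* Toy application-level checkpoint sketch.  The "state" is just a loop
 * index and a running sum; the checkpoint file name is an assumption. */
#include <stdio.h>

#define CKPT_FILE "job.ckpt"

int main(void)
{
    long i = 0, sum = 0;

    /* Resume from the last checkpoint if one exists. */
    FILE *f = fopen(CKPT_FILE, "r");
    if (f) {
        if (fscanf(f, "%ld %ld", &i, &sum) != 2) { i = 0; sum = 0; }
        fclose(f);
    }

    for (; i < 100000000L; i++) {
        sum += i;                       /* the "real work" of this job */

        if (i % 10000000L == 0) {       /* periodically save state to disk */
            f = fopen(CKPT_FILE, "w");
            if (f) { fprintf(f, "%ld %ld\n", i + 1, sum); fclose(f); }
        }
    }

    printf("done, sum = %ld\n", sum);
    remove(CKPT_FILE);                  /* job finished: discard the checkpoint */
    return 0;
}
```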
Relationship among Middleware Modules 46
Strategies for SSI o Build as a layer on top of the existing OS (e.g. Glunix) m Benefits: l Makes the system quickly portable, tracks vendor software upgrades, and reduces development time l New systems can be built quickly by mapping new services onto the functionality provided by the layer beneath, e.g. Glunix/Solaris-MC o Build SSI at the kernel level (true cluster OS) m Good, but can't leverage OS improvements from the vendor m e.g. Unixware and Mosix (built using BSD Unix) 47
Representative Cluster Systems 48
Research Projects of Clusters o Beowulf: CalTech, JPL, and NASA o Condor: University of Wisconsin o DQS (Distributed Queuing System): Florida State U. o HPVM (High Performance Virtual Machine): UIUC & UCSB o Gardens: Queensland U. of Technology, AU o NOW (Network of Workstations): UC Berkeley o PRM (Prospero Resource Manager): USC 49
Commercial Cluster Software o Codine (Computing in Distributed Network Environment): GENIAS GmbH, Germany o LoadLeveler: IBM Corp. o LSF (Load Sharing Facility): Platform Computing o NQE (Network Queuing Environment): CraySoft o RWPC: Real World Computing Partnership, Japan o Unixware: SCO o Solaris-MC: Sun Microsystems 50
Comparison of 4 Cluster Systems 54
Task Forces on Cluster Computing 55
IEEE Task Force on Cluster Computing (TFCC) http://www.dgs.monash.edu.au/~rajkumar/tfcc/ http://www.dcs.port.ac.uk/~mab/tfcc/ 56
TFCC Activities o Mailing list, workshops, conferences, tutorials, web resources, etc. o Resources for introducing the subject at senior undergraduate and graduate levels o Tutorials/workshops at IEEE Chapters o ... and so on o Visit the TFCC page for more details: m http://www.dgs.monash.edu.au/~rajkumar/tfcc/ 57
Efforts in Taiwan o PC Farm Project at Academia Sinica Computing Center: http://www.pcf.sinica.edu.tw/ o NCHC PC Cluster Project: http://www.nchc.gov.tw/project/pccluster/ 58
NCHC PC Cluster o A Beowulf class cluster 59
System Hardware 5 Fast Ethernet switching hubs 60
System Software 61
Conclusions o Clusters are promising and fun m Offer incremental growth and match funding patterns m New trends in hardware and software technologies are likely to make clusters more promising m Cluster-based HP and HA systems can be seen everywhere! 62
The Future o Cluster systems using idle cycles from computers will continue o Individual nodes will have multiple processors o Widespread usage of Fast and Gigabit Ethernet; they will become the de facto networks for clusters o Cluster software will bypass the OS as much as possible o Unix-based OSes are likely to be the most popular, but the steady improvement and acceptance of NT will not be far behind 63
The Challenges o Programming m enable applications, reduce programming effort, distributed object/component models? o Reliability (RAS) m programming effort, reliability with scalability to 1000's o Heterogeneity m performance, configuration, architecture and interconnect o Resource Management (scheduling, performance prediction) o System Administration/Management o Input/Output (both network and storage) 64
Pointers to Literature on Cluster Computing 65
Reading Resources...1a Internet & WWW o Computer architecture: m http://www.cs.wisc.edu/~arch/www/ o PFS and parallel I/O: m http://www.cs.dartmouth.edu/pario/ o Linux parallel processing: m http://yara.ecn.purdue.edu/~pplinux/Sites/ o Distributed shared memory: m http://www.cs.umd.edu/~keleher/dsm.html 66
Reading Resources...1b Internet & WWW o Solaris-MC: m http://www.sunlabs.com/research/solaris-mc o Microprocessors: recent advances m http://www.microprocessor.sscc.ru o Beowulf: m http://www.beowulf.org o Metacomputing m http://www.sis.port.ac.uk/~mab/Metacomputing/ 67
Reading Resources...2 Books o In Search of Clusters m by G. Pfister, Prentice Hall (2nd ed.), 1998 o High Performance Cluster Computing m Volume 1: Architectures and Systems m Volume 2: Programming and Applications l Edited by Rajkumar Buyya, Prentice Hall, NJ, USA o Scalable Parallel Computing m by K. Hwang & Z. Xu, McGraw-Hill, 1998 68
Reading Resources...3 Journals o "A Case for NOW (Networks of Workstations)", IEEE Micro, Feb 1995 m by Anderson, Culler, Patterson o "Fault Tolerant COW with SSI", IEEE Concurrency m by Kai Hwang, Chow, Wang, Jin, Xu o "Cluster Computing: The Commodity Supercomputing", Journal of Software Practice and Experience m by Mark Baker & Rajkumar Buyya 69