
Parallel, Cluster and Grid Computing
By P. S. Dhekne, BARC (dhekne@barc.gov.in)
August 23, 2006, Talk at SASTRA

High Performance Computing
• Branch of computing that deals with extremely powerful computers and the applications that use them
• Supercomputers: the fastest computers at any given point in time
• HPC applications: applications that cannot be solved by conventional computers in a reasonable amount of time

Supercomputers
• Characterized by very high speed and very large memory
• Speed measured in number of floating point operations per second (FLOPS)
• Fastest computer in the world: "Earth Simulator" (NEC, Japan), 35 TeraFLOPS
• Memory on the order of hundreds of gigabytes or terabytes

HPC Technologies
• Different approaches for building supercomputers
  – Traditional: build faster CPUs
    • Special semiconductor technology for increasing clock speed
    • Advanced CPU architecture: pipelining, vector processing, multiple functional units, etc.
  – Parallel processing: harness a large number of ordinary CPUs and divide the job between them

Traditional Supercomputers
• E.g. CRAY
• Very complex architecture
• Very high clock speed results in very high heat dissipation and requires advanced cooling techniques (liquid Freon / liquid nitrogen)
• Custom built or produced to order
• Extremely expensive
• Advantage: program development is conventional and straightforward

Alternative to Supercomputer
• Parallel computing: the use of multiple computers or processors working together on a single problem; harness a large number of ordinary CPUs and divide the job between them
  – each processor works on its section of the problem
  – processors are allowed to exchange information with other processors via a fast interconnect path
[Figure: a sequential run covers items 1 to 10000 on one CPU; the parallel run splits them as cpu 1: 1 to 2500, cpu 2: 2501 to 5000, cpu 3: 5001 to 7500, cpu 4: 7501 to 10000]
• Big advantages of parallel computers:
  1. Total computing performance is a multiple of the number of processors used
  2. Very large total memory to fit very large programs
  3. Much lower cost, and can be developed in India
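The block split shown in the figure (items 1 to 10000 over 4 CPUs) can be sketched in a few lines. This is only an illustration: the function name `split_range` and the rule for spreading a remainder over the first CPUs are assumptions, not something the talk specifies.

```python
def split_range(first, last, n_cpus):
    """Split the inclusive range [first, last] into n_cpus contiguous blocks.

    Hypothetical helper: any remainder is given, one item each,
    to the lowest-numbered CPUs.
    """
    total = last - first + 1
    base, extra = divmod(total, n_cpus)
    blocks = []
    start = first
    for cpu in range(n_cpus):
        size = base + (1 if cpu < extra else 0)
        blocks.append((start, start + size - 1))
        start += size
    return blocks


if __name__ == "__main__":
    for cpu, (lo, hi) in enumerate(split_range(1, 10000, 4), start=1):
        print(f"cpu {cpu}: {lo} to {hi}")
```

Each CPU then works only on its own block and exchanges boundary or result data with the others over the interconnect.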

Types of Parallel Computers
• Parallel computers are classified as
  – shared memory
  – distributed memory
• Both shared and distributed memory systems have:
  1. processors: now generally commodity processors
  2. memory: now generally commodity DRAM/DDR
  3. network/interconnect: between the processors or memory

Interconnect Method
There is no single way to connect a bunch of processors
• The manner in which the nodes are connected: network and topology
• The best choice would be a fully connected network (every processor to every other). This is infeasible for cost and scaling reasons; instead, processors are arranged in some variation of a grid, torus, tree, bus, mesh or hypercube.
[Figure: 3-d hypercube, 2-d mesh, 2-d torus]
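To see why full connectivity does not scale, one can count the point-to-point links each topology needs. The sketch below uses standard counting formulas; the function names and the 64-node comparison are illustrative choices, not figures from the talk.

```python
def links_fully_connected(n):
    # Every pair of nodes gets its own link.
    return n * (n - 1) // 2


def links_hypercube(d):
    # 2**d nodes, each with d neighbours; each link is shared by two nodes.
    return d * 2 ** (d - 1) if d > 0 else 0


def links_mesh_2d(k):
    # k x k grid without wraparound: k rows and k columns of (k - 1) links.
    return 2 * k * (k - 1)


def links_torus_2d(k):
    # k x k grid with wraparound links; the formula holds for k >= 3.
    if k < 3:
        raise ValueError("torus formula assumes k >= 3")
    return 2 * k * k


if __name__ == "__main__":
    # All four layouts below connect 64 nodes.
    print("fully connected:", links_fully_connected(64))
    print("6-d hypercube:  ", links_hypercube(6))
    print("8 x 8 mesh:     ", links_mesh_2d(8))
    print("8 x 8 torus:    ", links_torus_2d(8))
```

The fully connected count grows with the square of the node count, while the mesh, torus and hypercube grow roughly linearly (with a logarithmic factor for the hypercube).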

Block Diagrams …
[Figure: A Shared Memory Parallel Computer. Processors P1 to P5 reach a common memory through an interconnection network.]

Block Diagrams …
[Figure: A Distributed Memory Parallel Computer. Processors P1 to P5, each with its own memory M1 to M5, are linked by an interconnection network.]

Performance Measurements
• Speed of a supercomputer is generally denoted in FLOPS (Floating Point Operations per Second)
  – MegaFLOPS (MFLOPS): million (10^6) FLOPS
  – GigaFLOPS (GFLOPS): billion (10^9) FLOPS
  – TeraFLOPS (TFLOPS): trillion (10^12) FLOPS

Sequential vs. Parallel Programming
• Conventional programs are called sequential (or serial) programs since they run on one CPU only, as in a conventional (or sequential) computer
• Parallel programs are written such that they get divided into multiple pieces, each running independently and concurrently on multiple CPUs
• Converting a sequential program to a parallel program is called parallelization

Terms and Definitions
• Speedup of a parallel program = time taken on 1 CPU / time taken on n CPUs
• Ideally, speedup should be n

Terms and Definitions
• Efficiency of a parallel program = speedup / number of processors
• Ideally, efficiency should be 1 (100%)
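The two definitions translate directly into code. As a worked case, the 2-D atmospheric transport run reported later in the talk used 10 processors and cut the run time by a factor of 6; the absolute timings below (60 and 10 time units) are made-up values chosen only to reproduce that ratio.

```python
def speedup(time_on_1_cpu, time_on_n_cpus):
    """Speedup = time taken on 1 CPU / time taken on n CPUs."""
    return time_on_1_cpu / time_on_n_cpus


def efficiency(time_on_1_cpu, time_on_n_cpus, n_cpus):
    """Efficiency = speedup / number of processors (1.0 means 100%)."""
    return speedup(time_on_1_cpu, time_on_n_cpus) / n_cpus


if __name__ == "__main__":
    # Hypothetical timings with the 6x reduction reported for 10 processors.
    t1, t10, n = 60.0, 10.0, 10
    print("speedup:   ", speedup(t1, t10))
    print("efficiency:", efficiency(t1, t10, n))
```

A speedup of 6 on 10 processors corresponds to an efficiency of 0.6, that is 60%.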

Problem areas in parallel programs
• In practice, speedup is less than n and efficiency is less than 100%
• Reason 1: some portions of the program cannot be run in parallel (cannot be split)
• Reason 2: data needs to be communicated among the CPUs. This involves time for sending the data and time spent waiting for the data
• The challenge in parallel programming is to split the program into pieces such that speedup and efficiency approach the maximum
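Reason 1 is commonly quantified with Amdahl's law, which the talk does not name but which follows from the speedup definition: if a fraction p of the run time can be parallelized and the rest cannot, the speedup on n CPUs is at most 1 / ((1 - p) + p / n). The sketch below ignores communication time (Reason 2), so real speedups would be lower still; the value p = 0.9 is an arbitrary example.

```python
def amdahl_speedup(parallel_fraction, n_cpus):
    """Upper bound on speedup when only part of the program can be split.

    parallel_fraction: share of the single-CPU run time that parallelizes (0..1).
    Communication and waiting time are not modelled.
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cpus)


if __name__ == "__main__":
    p = 0.9  # assumed example: 90% of the work can run in parallel
    for n in (1, 2, 4, 8, 16, 64, 1024):
        s = amdahl_speedup(p, n)
        print(f"n = {n:5d}  speedup <= {s:6.2f}  efficiency <= {s / n:5.1%}")
```

With any serial portion at all, the bound levels off at 1 / (1 - p) no matter how many CPUs are added, which is why the split of the program matters so much.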

Parallelism
• Property of an algorithm that makes it amenable to parallelization
• Parts of the program that have inherent parallelism can be parallelized (divided into multiple independent pieces that can execute concurrently)

Types of parallelism
• Control parallelism (algorithmic parallelism):
  – Different portions (or subroutines/functions) can execute independently and concurrently
• Data parallelism:
  – Data can be split up into multiple chunks and processed independently and concurrently
  – Most scientific applications exhibit data parallelism
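The difference between the two kinds of parallelism can be shown with Python's standard `concurrent.futures` module. This is a structural sketch only: the function names are hypothetical, and threads in CPython illustrate the decomposition rather than deliver real CPU speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares(chunk):
    return sum(x * x for x in chunk)


def data_parallel_sum_of_squares(data, n_workers):
    """Data parallelism: the same operation applied to separate chunks."""
    size = max(1, -(-len(data) // n_workers))  # ceiling division, at least 1
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return sum(pool.map(sum_of_squares, chunks))


def control_parallel_min_max(data):
    """Control parallelism: two different functions run concurrently."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        lowest = pool.submit(min, data)
        highest = pool.submit(max, data)
        return lowest.result(), highest.result()


if __name__ == "__main__":
    values = list(range(1, 101))
    print(data_parallel_sum_of_squares(values, 4))
    print(control_parallel_min_max(values))
```

In the data-parallel case every worker runs the same code on its own chunk and the partial results are combined at the end; in the control-parallel case the workers run different code on the same data.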

Parallel Programming Models
• Different approaches are used in the development of parallel programs
• Shared variable model: best suited for shared memory parallel computers
• Message passing model: best suited for distributed memory parallel computers

Message Passing
• Most commonly used method of parallel programming
• Processes in a parallel program use messages to transfer data between themselves
• Also used to synchronize the activities of processes
• Typically consists of send/receive operations
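The send/receive pattern can be imitated with standard-library queues. This is a stand-in, not MPI, PVM or ANULIB: threads play the role of processes, `put` plays the role of send, `get` plays the role of a blocking receive, and all names are hypothetical.

```python
import queue
import threading


def worker(inbox, outbox):
    """Receive chunks until a None sentinel arrives; send back partial sums."""
    while True:
        message = inbox.get()        # blocking receive
        if message is None:          # sentinel: no more work
            break
        outbox.put(sum(message))     # send the partial result to the master


def master_sum(data, n_workers):
    """Master distributes strided slices of data and collects partial sums."""
    inboxes = [queue.Queue() for _ in range(n_workers)]
    outbox = queue.Queue()
    threads = [threading.Thread(target=worker, args=(inbox, outbox))
               for inbox in inboxes]
    for thread in threads:
        thread.start()
    for rank, inbox in enumerate(inboxes):
        inbox.put(data[rank::n_workers])   # send this worker its share
        inbox.put(None)                    # then tell it to stop
    # Receiving exactly n_workers results also synchronizes with the workers.
    total = sum(outbox.get() for _ in range(n_workers))
    for thread in threads:
        thread.join()
    return total


if __name__ == "__main__":
    print(master_sum(list(range(1, 101)), 4))
```

The blocking receive on the master side shows the second role of messages mentioned above: the master cannot proceed until every worker has reported in.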

How we started
In the absence of any standardization, the initial parallel machines were designed with varied architectures and different network topologies.
BARC started supercomputing development in 1990-91 to meet the computing demands of in-house users, with the aim of providing inexpensive high-end computing, and has built several models since.

Selection of main components
• Architecture
  – Simple
  – Scalable
  – Processor independent
• Interconnecting network
  – Scalable bandwidth
  – Architecture independent
  – Cost effective
• Parallel software environment
  – User friendly
  – Portable
  – Comprehensive debugging tools

Single Cluster of ANUPAM
[Figure: Node 0 (master) and Nodes 1 to 15 (slaves), each an 860/XP at 50 MHz with 128 KB to 512 KB cache and 64 MB to 256 MB memory, on a MULTIBUS II backplane; SCSI disks; Ethernet and X.25 links to other systems; TC to terminals]

64-Node ANUPAM Configuration
[Figure: Multibus II (MB II) buses numbered 0 to 15, interconnected by wide SCSI (wscsi) links along the X and Y directions]

8-Node ANUPAM Configuration
[Figure: 8-node ANUPAM configuration]

ANUPAM Applications
[Figure panels: Finite Element Analysis; Protein Structures; 64-Node ANUPAM; Pressure Contour in LCA Duct; 3-D Plasma Simulations]

2-D Atmospheric Transport Problem
Estimation of neutron-gamma dose up to 8 km from the source
Problem specification (simulations on BARC computer system):
• Cylindrical geometry: radius = 8 km, height = 8 km
• No. of mesh points: 80,000
• No. of energy groups: 42
• SN order: 16
Conclusion: use of 10 processors of the BARC computer system reduces the run time by a factor of 6.

Other Applications of ANUPAM System
* Protein Structure Optimization
* Ab Initio Electronic Structure Calculations
* Neutron Transport Calculations
* Ab Initio Molecular Dynamics Simulations
* Computational Structure Analysis
* Computational Fluid Dynamics (ADA, LCA)
* Computational Turbulent Flow
* Simulation Studies in Gamma-Ray Astronomy
* Finite Element Analysis of Structures
* Weather Forecasting

Key Benefits
• Simple to use
• ANUPAM uses the familiar Unix environment with large memory and specially designed parallelizing tools
• No parallel language needed
• PSIM, a parallel simulator, runs on any Unix based system
• Scalable and processor independent

Bus based architecture
• Dynamic interconnection network providing full connectivity, high speed and TCP/IP support
• Simple and general purpose industry backplane bus
• Easily available off-the-shelf, low cost
• Multibus, VME Bus, Futurebus … many solutions
Disadvantages
• One communication at a time
• Limited scalability of applications in bus based systems
• Lengthy development cycle for specialized hardware
• i860 and Multibus-II were reaching end of line, so a radical change in architecture was needed

Typical Computing, Memory & Device Attachment
[Figure: CPU and memory attached to the memory bus; device cards attached to the input/output bus]

Memory Hierarchy
[Figure: CPU, cache, local memory, remote memory]

Ethernet: The Unibus of the 80s (UART of the 90s)
[Figure: clients, compute server, print server, file server and comm server on an Ethernet spanning 2 km]

Ethernet: The Unibus of the 80s
• Ethernet designed for
  – DEC: interconnect VAXen, terminals
  – Xerox: enable distributed computing (SUN Micro)
• Ethernet evolved into a hodgepodge of nets and boxes
• Distributed computing was very hard, evolving into
  – expensive, asymmetric, hard to maintain client-server for a VendorIX
  – apps are bound to a configuration and VendorIX!
  – network is NOT the computer
• Internet model is less hierarchical, more democratic

Networks of workstations (NOW)
• New concept in parallel computing and parallel computers
• Nodes are full-fledged workstations having CPU, memory, disks, OS, etc.
• Interconnection through commodity networks like Ethernet, ATM, FDDI, etc.
• Reduced development cycle, mostly restricted to software
• Switched network topology

Typical ANUPAM x86 Cluster
[Figure: NODE-01 to NODE-16 and a file server connected by CAT-5 cable to a Fast Ethernet switch with an uplink]

ANUPAM-Alpha
• Each node is a complete Alpha workstation with 21164 CPU, 256 MB memory, Digital UNIX OS, etc.
• Interconnection through an ATM switch with fiber optic links at 155 Mbps

PC Clusters: Multiple PCs
• Over the last few years, the computing power of Intel PCs has gone up considerably (from 100 MHz to 3.2 GHz in 8 years), with fast, cheap, built-in network and disk
• Intel processors are beating conventional RISC chips in performance
• PCs are freely available from several vendors
• Emergence of free Linux as a robust, efficient OS with plenty of applications
• Linux clusters (use of multiple PCs) are now rapidly gaining popularity in academic/research institutions because of low cost, high performance and availability of source code

Trends in Clustering
Clustering is not a new idea, but it has become affordable, and clusters can now be built easily (plug and play). Even small colleges have them.

Cluster based Systems
Clustering is replacing all traditional computing platforms and can be configured depending on the method and application area:
• LB cluster: network load distribution and load balancing
• HA cluster: increases the availability of systems
• HPC cluster (scientific cluster): computation-intensive work
• Web farms: increase HTTP requests per second
• Rendering cluster: increases graphics speed
HPC: High Performance Computing; HA: High Availability; LB: Load Balancing

Computing Trends
• It is fully expected that the substantial and exponential increases in performance of IT will continue for the foreseeable future (at least the next 50 years) in terms of
  – CPU power (2x every 18 months)
  – Memory capacity (2x every 18 months)
  – LAN/WAN speed (2x every 9 months)
  – Disk capacity (2x every 12 months)
• It is expected that all computing resources will continue to become cheaper and faster, though not necessarily faster than the computing problems we are trying to solve.
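The doubling periods above imply a growth factor of 2^(t/T) after t months for a resource that doubles every T months. The sketch below simply evaluates that expression for the four quoted periods; the 36-month horizon and all names are arbitrary choices for illustration.

```python
# Doubling periods in months, as quoted on the slide.
DOUBLING_MONTHS = {
    "CPU power": 18,
    "Memory capacity": 18,
    "LAN/WAN speed": 9,
    "Disk capacity": 12,
}


def growth_factor(months, doubling_months):
    """Growth after `months` for a quantity doubling every `doubling_months`."""
    return 2 ** (months / doubling_months)


if __name__ == "__main__":
    horizon = 36  # months; an arbitrary example horizon
    for resource, period in DOUBLING_MONTHS.items():
        print(f"{resource:16s} x{growth_factor(horizon, period):.0f} "
              f"after {horizon} months")
```

Under these rates, network speed pulls ahead of CPU power quickly, which is part of the case the talk goes on to make for clusters and grids.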

Processor Speed Comparison
Sr. No.  Processor                          SPECint_base2000  SPECfp_base2000
1        Pentium-III, 550 MHz               231               191
2        Pentium-IV, 1.7 GHz                574               591
3        Pentium-IV, 2.4 GHz                852               840
4        Alpha, 833 MHz (64 bit)            511               571
5        Alpha, 1 GHz (64 bit)              621               776
6        Intel Itanium-2, 900 MHz (64 bit)  810               1356

Technology Gaps
• Sheer CPU speed is not enough
• Matching of processing speed, compiler performance, cache size and speed, memory size and speed, disk size and speed, network size and speed, and interconnect and topology is also important
• Poor application and middleware software also degrades performance

Interconnect-Related Terms
• The most critical component of HPC remains interconnect technology and network topology
• Latency:
  – Networks: how long does it take to start sending a "message"? Measured in microseconds (startup time)
  – Processors: how long does it take to output results of some operations, such as floating point add, divide, etc., which are pipelined?
• Bandwidth: what data rate can be sustained once the message is started? Measured in Mbytes/sec or Gbytes/sec
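Latency and bandwidth combine in a common first-order model: time to deliver a message = latency + size / bandwidth. The sketch below applies that model to the across-machine figures from the talk's Interconnect Comparison slide; the model itself, the function name and the assumption that 1 Mbyte = 10^6 bytes are mine, not the talk's.

```python
# (latency in microseconds, bandwidth in Mbytes/sec), across machines,
# taken from the Interconnect Comparison slide of this talk.
INTERCONNECTS = {
    "Fast Ethernet": (88.07, 11.0),
    "Gigabit Ethernet": (44.93, 90.0),
    "SCI": (5.55, 250.0),
}


def transfer_time_us(size_bytes, latency_us, bandwidth_mbytes_per_s):
    """First-order model: startup latency plus size divided by bandwidth.

    With 1 Mbyte taken as 10**6 bytes, bytes / (Mbytes/sec) is already
    a time in microseconds.
    """
    return latency_us + size_bytes / bandwidth_mbytes_per_s


if __name__ == "__main__":
    for size in (8, 1024, 1_000_000):
        print(f"message of {size} bytes")
        for name, (latency, bandwidth) in INTERCONNECTS.items():
            t = transfer_time_us(size, latency, bandwidth)
            print(f"  {name:17s} {t:12.2f} microseconds")
```

For small messages the latency term dominates, so a low-latency interconnect matters most to programs that exchange many short messages; for large messages the bandwidth term takes over.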

High Speed Networking
• Network bandwidth is improving
  – LANs run at 10, 1000, 10000 Mbps
  – WANs are based on ATM at 155, 622, 2500 Mbps
• With constant advances in Information & Communication technology
  – Processors and networks are merging into one infrastructure
  – System Area or Storage Area Networks (Myrinet, Craylink, Fibre Channel, SCI, etc.): low latency, high bandwidth, scalable to large numbers of nodes

Processor Interconnect Technology
Sr. No.  Communication Technology  Bandwidth (Mbits/sec)  Latency (microseconds)
1        Fast Ethernet             100                    60
2        Gigabit Ethernet          1000                   100
3        SCI based Wulfkit         2500                   1.5 - 14
4        InfiniBand                10,000                 < 400 nsec
5        Quadrics Switch           2500                   2 - 10
6        10-G Ethernet             10,000                 < 100

Interconnect Comparison
Feature    Fast Ethernet  Gigabit              SCI
Latency    88.07 µs       44.93 µs (16.88 µs)  5.55 µs (1.61 µs)
Bandwidth  11 Mbytes/sec  90 Mbytes/sec        250 Mbytes/sec
Plain values: across machines. Values in parentheses: two processes in a given machine.

Switched Network Topology
• Interconnection networks such as ATM, Ethernet, etc. are available as switched networks
• A switch implements a dynamic interconnection network providing all-to-all connectivity on demand
• A switch allows multiple independent communications simultaneously
• Full duplex mode of communication
• Disadvantages: single point of failure, finite capacity, limited scalability, and cost at higher node counts

Scalable Coherent Interface (SCI)
• High bandwidth, low latency SAN interconnect for clusters of workstations (IEEE 1596)
• Standard for point-to-point links between computers
• Various topologies possible: ring, tree, switched rings, torus, etc.
• Peak bandwidth: 667 MB/s; latency < 5 microseconds

ANUPAM-Xeon Performance
Interconnect      Total latency (µs)  Latency within node (µs)
Gigabit Ethernet  44.93               16.88
SCI               5.55                1.61

ANUPAM P-Xeon Parallel Supercomputer
• No. of compute nodes: 128
• Processor: dual Intel Pentium Xeon @ 2.4 GHz
• Memory: 2 GB per node
• File server: dual Intel based with RAID 5, 360 GB
• Interconnection networks: 64 bit Scalable Coherent Interface (2D torus connectivity)
• Software: Linux, MPI, PVM, ANULIB
• Anulib tools: PSIM, FFLOW, SYN, PRE, S_TRACE
• Benchmarked performance: 362 GFLOPS for High Performance Linpack

ANUPAM clusters
• ANU64
• ASHVA: sustained speed on 84 P-III processors: 15 GFLOPS; year of introduction: 2001
• ARUNA: sustained speed on 64 P-IV CPUs: 72 GFLOPS; year: 2002
• Sustained speed on 128 Xeon processors: 365 GFLOPS; year of introduction: 2003

ANUPAM series of supercomputers after 1997: ANUPAM-Pentium
Date       Model          Node microprocessor   Inter. comm.  Mflops
Oct/98     4-node PII     Pentium PII/266 MHz   Ethernet/100  248
Mar/99     16-node PII    Pentium PII/333 MHz   Ethernet/100  1300
Mar/00     16-node PIII   Pentium PIII/550 MHz  Gigabit Eth.  3500
May/01     84-node PIII   Pentium PIII/650 MHz  Gigabit Eth.  15000
June/02    64-node PIV    Pentium PIV/1.7 GHz   Giga & SCI    72000
August/03  128-node Xeon  Pentium/Xeon 2.4 GHz  Giga & SCI    362000

Table of comparison (with precise (64 bit) computations)
Program name: T-80 (24 hr forecast)
System              Time
1 + 4 Anupam Alpha  14 minutes
1 + 8 Anupam Alpha  11 minutes
Cray XMP 216        12.5 minutes
All timings are wall clock times

BARC's New Supercomputing Facility
[Figure: external view of the new supercomputing facility; 512-node ANUPAM-AMEYA]
• BARC's new supercomputing facility was inaugurated by the Honorable PM, Dr. Manmohan Singh, on 15th November 2005.
• A 512-node ANUPAM-AMEYA supercomputer was developed with a speed of 1.7 TeraFLOPS on the HPC benchmark.
• A 1024-node supercomputer (~5 TeraFLOPS) is planned during 2006-07.
• Being used by in-house users.

Support equipment
• Terminal servers
  – Connect serial consoles from 16 nodes onto a single Ethernet link
  – Consoles of each node can be accessed using the terminal servers and management network
• Power distribution units
  – Network controlled 8-outlet power distribution unit
  – Facilities such as power sequencing and power cycling of each node
  – Current monitoring
• Racks
  – 14 racks of 42U height, 1000 mm depth, 600 mm width

Software components
• Operating system on each node of the cluster is Scientific Linux 4.1 for 64 bit architecture
  – Fully compatible with Red Hat Enterprise Linux
  – Kernel version 2.6
• ANUPRO parallel programming environment
• Load sharing and queuing system
• Cluster management

ANUPRO Programming Environment
• ANUPAM supports the following programming interfaces
  – MPI
  – PVM
  – Anulib
  – BSD sockets
• Compilers
  – Intel Fortran Compiler
  – Portland Fortran Compiler
• Numerical libraries
  – BLAS (ATLAS and MKL implementations)
  – LAPACK (Linear Algebra Package)
  – ScaLAPACK (parallel LAPACK)
• Program development tools
  – MPI performance monitoring tools (Upshot, Nupshot, Jumpshot)
  – ANUSOFT tool suite (FFLOW, S_TRACE, ANU2MPI, SYN)

Load Sharing and Queuing System
• Torque based system resource manager
• Keeps track of available nodes in the system
• Allots nodes to jobs
• Maintains job queues with job priority and reservations
• User level commands to submit jobs, delete jobs, find out job status and find out the number of available nodes
• Administrator level commands to manage nodes, jobs and queues, priorities and reservations
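The bookkeeping described above (track free nodes, queue jobs by priority, allot nodes) can be sketched with a small priority queue. This is a toy model under stated assumptions, not Torque's actual scheduling algorithm or API: the class `NodeScheduler`, its methods and the strict-priority rule with no backfilling are all hypothetical.

```python
import heapq
import itertools


class NodeScheduler:
    """Toy resource manager: strict priority order, no backfilling."""

    def __init__(self, total_nodes):
        self.free_nodes = total_nodes
        self.running = {}                 # job id -> nodes held
        self._queue = []                  # heap of (-priority, seq, id, nodes)
        self._sequence = itertools.count()

    def submit(self, job_id, nodes, priority=0):
        # Higher priority first; submission order breaks ties.
        heapq.heappush(self._queue,
                       (-priority, next(self._sequence), job_id, nodes))

    def dispatch(self):
        """Start queued jobs in priority order while the head job fits."""
        started = []
        while self._queue and self._queue[0][3] <= self.free_nodes:
            _, _, job_id, nodes = heapq.heappop(self._queue)
            self.free_nodes -= nodes
            self.running[job_id] = nodes
            started.append(job_id)
        return started

    def finish(self, job_id):
        """Release the nodes held by a completed job."""
        self.free_nodes += self.running.pop(job_id)


if __name__ == "__main__":
    scheduler = NodeScheduler(total_nodes=16)
    scheduler.submit("a", nodes=8, priority=1)
    scheduler.submit("b", nodes=12, priority=5)
    scheduler.submit("c", nodes=4, priority=1)
    print(scheduler.dispatch(), scheduler.free_nodes)
    scheduler.finish("b")
    print(scheduler.dispatch(), scheduler.free_nodes)
```

A production resource manager adds reservations, per-queue limits and backfilling of small jobs into idle nodes, none of which this sketch attempts.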

ANUNETRA: Cluster Management System
• Management and monitoring of one or more clusters from a single interface
• Monitoring functions:
  – Status of each node and different metrics (load, memory, disk space, processes, processors, traffic, temperature and so on)
  – Jobs running on the system
  – Alerts to the administrators in case of malfunctions or anomalies
  – Archival of monitored data for future use
• Management functions:
  – Manage each node or groups of nodes (reboots, power cycling, online/offline, queuing and so on)
  – Job management

[Figure: metric view on Ameya; node view on Ameya]

SMART: Self Monitoring And Rectifying Tool
• Service running on each node which keeps track of things happening in the system
  – Hanging jobs
  – Services terminated abnormally
• SMART takes corrective action to remedy the situation and improve availability of the system

Accounting System
• Maintains database entries for each and every job run on the system
  – Job ID, user name, number of nodes
  – Queue name, API (MPI, PVM, Anulib)
  – Submit time, start and end time
  – End status (finished, cancelled, terminated)
• Computes system utilization
• User-wise and node-wise statistics for different periods of time

[Figure: accounting system, utilization plot]

Other tools
• Console logger
  – Logs all console messages of each node into a database for diagnostic purposes
• Sync tool
  – Synchronizes important files across nodes
• Automated backup
  – Scripts for taking periodic backups of user areas onto the tape libraries
• Automated installation service
  – Non-interactive installation of a node and all required software

Parallel File System (PFS)
• PFS gives a different view of the I/O system with its unique architecture and hence provides an alternative platform for development of I/O intensive applications
• Data (file) striping in a distributed environment
• Supports collective I/O operations
• Interface as close to a standard Linux interface as possible
• Fast access to file data in a parallel environment irrespective of how and where the file is distributed
• Parallel file scatter and gather operations
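File striping and the matching gather step can be illustrated on an in-memory byte string. This is a minimal sketch of round-robin striping in general, assuming a fixed stripe size; it is not the PFS implementation, and the function names are hypothetical.

```python
def stripe(data, n_nodes, stripe_size):
    """Scatter: deal fixed-size stripes of data to nodes in round-robin order."""
    parts = [bytearray() for _ in range(n_nodes)]
    for offset in range(0, len(data), stripe_size):
        node = (offset // stripe_size) % n_nodes
        parts[node] += data[offset:offset + stripe_size]
    return [bytes(part) for part in parts]


def gather(parts, stripe_size, total_length):
    """Gather: rebuild the original byte string from the per-node parts."""
    out = bytearray()
    offsets = [0] * len(parts)
    node = 0
    while len(out) < total_length:
        chunk = parts[node][offsets[node]:offsets[node] + stripe_size]
        if not chunk:
            raise ValueError("parts hold less data than total_length")
        out += chunk
        offsets[node] += stripe_size
        node = (node + 1) % len(parts)
    return bytes(out)


if __name__ == "__main__":
    original = b"abcdefghij"
    pieces = stripe(original, n_nodes=3, stripe_size=2)
    print(pieces)
    print(gather(pieces, stripe_size=2, total_length=len(original)))
```

Because consecutive stripes live on different nodes, a large sequential read can be served by several I/O nodes at once, which is where the speed advantage of striping comes from.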

Architecture of PFS
[Figure: a PFS manager daemon on the server holding the MIT and DIT; an I/O manager; I/O daemons with a LIT on each of Node 1 to Node N; MIT request and data paths between them]

Complete solution to scientific problems by exploiting parallelism for
• Processing (parallelization of computation)
• I/O (parallel file system)
• Visualization (parallelized graphic pipeline / Tile Display Unit)

Domain-specific Automatic Parallelization (DAP)
• Domain: a class of applications, such as FEM applications
• Experts use domain-specific knowledge
• DAP is a combination of expert system and parallelizing compiler
• Key features: interactive process, experience based heuristic techniques, and visual environment

Operation and Management Tools
• Manual installation of all nodes with OS, compilers, libraries, etc. is not only time consuming but also tedious and error prone
• Constant monitoring of hardware, networks and software is essential to report the health of the system during 24/7 operation
• Debugging and communication measurement tools are needed
• Tools are also needed to measure load and free CPU, predict load, checkpoint and restart, replace a failed node, etc.
We have developed all these tools to enrich the ANUPAM software environment.

Limitations of Parallel Computing
• Programming so many nodes concurrently remains a major barrier for most applications
  – Source code must be available and parallelizable
  – Scalable algorithm development is not an easy task
  – All resources are allotted to a single job
  – The user has to worry about message passing, synchronization and scheduling of the job
  – Only 15% of users require these solutions; the rest can manage with normal PCs
Fortunately, many free MPI codes and even parallel solvers are now available.
Still, there is a large gap between technology and usage, as parallel tools are not very user friendly.

Evolution in Hardware
• Compute nodes:
  – Intel i860
  – Alpha 21x64
  – Intel x86
• Interconnection network:
  – Bus: Multibus-II, Wide SCSI
  – Switched network: ATM, Fast Ethernet, Gigabit Ethernet
  – SAN: Scalable Coherent Interface

Evolution in Software
• Parallel program development API:
  – ANULIB (proprietary) to MPI (standard)
• Runtime environment:
  – I/O restricted to master only (860) to full I/O (Alpha and x86)
  – One program at a time (860) to multiple programs to batch operations
• Applications:
  – In-house parallel to ready-made parallel applications
  – Commercially available parallel software

Issues in building large clusters
• Scalability of interconnection network
• Scalability of software components
  – Communication libraries
  – I/O subsystem
  – Cluster management tools
  – Applications
• Installation and management procedures
• Troubleshooting procedures

Other issues in operating large clusters
• Space management
  – Node form factor
  – Layout of the nodes
  – Cable routing and weight
• Power management
• Cooling arrangements

P2P Computing
• Computing based on P2P architecture allows distributed resources to be shared with each other, with or without support from a server.
• How do you manage under-utilized resources?
  – Utilization of a desktop PC is typically <10%, and this percentage is decreasing further as PCs become more powerful
  – Large organizations have more than a thousand PCs, each delivering >20 MFLOPS, and this power is growing with every passing day. The trick is to use cycle stealing mode.
  – Each PC now has about 20 Gbyte disk capacity; 80 GB x 1000 = 80 Terabyte storage space is available: very large file storage
  – How do you harness the power of so many PCs in a large organization? The "ownership" hurdle has to be resolved.
  – Latency and bandwidth of the LAN environment are quite adequate for P2P computing. Space management is no problem: use PCs wherever they are!
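The arithmetic behind cycle stealing can be made explicit. The sketch below uses the slide's round numbers (1000 PCs, 20 MFLOPS each, under 10% utilization, and the 80 GB per PC used in the slide's own multiplication); treating the whole idle share as harvestable is an optimistic assumption, and the function names are hypothetical.

```python
def harvestable_mflops(n_pcs, mflops_per_pc, utilization):
    """Compute power left idle by the PC owners, assuming all of it is usable."""
    return n_pcs * mflops_per_pc * (1.0 - utilization)


def pooled_storage_tb(n_pcs, gb_per_pc):
    """Total disk capacity across all PCs, with 1 TB taken as 1000 GB."""
    return n_pcs * gb_per_pc / 1000.0


if __name__ == "__main__":
    idle = harvestable_mflops(n_pcs=1000, mflops_per_pc=20, utilization=0.10)
    print(f"idle compute: {idle / 1000:.0f} GFLOPS")
    print(f"pooled disk:  {pooled_storage_tb(1000, 80):.0f} TB")
```

On these figures a thousand desktop PCs leave roughly as much compute idle as the 84-node ASHVA cluster sustained, which is the motivation for resolving the ownership hurdle.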

INTERNET COMPUTING
• Today you can't run your jobs on the Internet
• Computing using idle PCs is becoming an important computing platform (SETI@home, Napster, Gnutella, Freenet, KaZaA)
– The WWW is now a promising candidate for the core component of a wide-area distributed computing environment
– Efficient client/server models and protocols
– Transparent networking, navigation and GUI, with multimedia access and dissemination for data visualization
– Mechanisms for distributed computing such as CGI and Java
• With improved price/performance and the availability of Linux, Web Services (SOAP, WSDL, UDDI, WSFL) and COM technology, it is easy to develop loosely coupled distributed applications

Difficulties in present systems
– As technology is constantly changing, there is a need for regular upgrades and enhancements
– Clusters and servers are not fail-safe or fault-tolerant
– Many systems are dedicated to a single application and so sit idle when that application has no load
– Many clusters in the organization remain idle
– In operating a computer centre, 75% of the cost comes from environment upkeep, staffing, operation and maintenance
– Computers, networks, clusters, parallel machines and visual systems are not tightly coupled by software, which makes them difficult for users to use

Analysis – a very general model
Can we tie all components tightly by software?
[Diagram labels: PCs, SMPs; Clusters; RAID Disks; Visual Data Server; High Speed Network; Problem Solving Environment (Menu, Template, Solver, Pre & Post, Mesh); Computer Assisted Science & Engineering (CASE)]

GRID CONCEPT
[Diagram labels: User, Access Point, Resource Broker, Grid Resources, Result]

Are Grids a Solution?
“Grid Computing” means different things to different people.
Goals of Grid Computing | Technology Issues
Reduce computing costs | Clusters
Increase computing resources | Internet infrastructure
Reduce job turnaround time | MPP solver adoption
Enable parametric analyses | Administration of desktop
Reduce complexity to users | Use middleware to automate
Increase productivity | Virtual Computing Centre
“Dependable, consistent, pervasive access to resources”

What is needed?
[Diagram labels: Client (RPC-like; Java GUI; Matlab, Mathematica; C, Fortran; Java, Perl); Request; Reply; ISP; Gatekeeper; Broker; Scheduler; Database; Choice; Computational Resources (Clusters, MPP, Workstations; MPI, PVM, Condor ...)]

Why Migrate Processes?
• LOAD BALANCING
– Reduce average response time
– Speed up individual jobs
– Gain higher throughput
• MOVE PROCESS CLOSER TO ITS RESOURCES
– Use resources effectively
– Reduce network traffic
• INCREASE SYSTEM RELIABILITY
• MOVE PROCESS TO A MACHINE HOLDING CONFIDENTIAL DATA
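The load-balancing motive can be illustrated with a toy migration policy. This is a minimal sketch under stated assumptions, not the scheduler used in any system described in the talk: node names, the `threshold` parameter, and the use of "number of runnable processes" as the load metric are all hypothetical.

```python
def pick_migration(loads, threshold=2):
    """Return (source, target) if moving one process helps, else None.

    loads maps a node name to its number of runnable processes.
    A move is proposed only when the busiest node exceeds the idlest
    node by at least `threshold` processes.
    """
    source = max(loads, key=loads.get)
    target = min(loads, key=loads.get)
    if loads[source] - loads[target] >= threshold:
        return source, target
    return None


def balance(loads, threshold=2):
    """Apply one-process migrations until no move is worthwhile."""
    loads = dict(loads)  # work on a copy; the caller's dict is untouched
    moves = []
    while True:
        move = pick_migration(loads, threshold)
        if move is None:
            return loads, moves
        source, target = move
        loads[source] -= 1
        loads[target] += 1
        moves.append(move)


final_loads, moves = balance({"node1": 5, "node2": 1, "node3": 0})
print(final_loads)
print(len(moves), "migrations")
```

A real system would also weigh the cost of moving a process (memory image size, open files, network traffic) against the expected gain, which this sketch ignores.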

[Diagram: processes PR-JOB 1, PR-JOB 2, PR-JOB 3 and PR-PARL distributed across nodes served by FILE-SERVER]

What does the Grid do for you?
• You submit your work
• And the Grid
– Finds convenient places for it to be run
– Organises efficient access to your data (caching, migration, replication)
– Deals with authentication to the different sites that you will be using
– Interfaces to local site resource allocation mechanisms and policies
– Runs your jobs, monitors progress, recovers from problems, and tells you when your work is complete
• If there is scope for parallelism, it can also decompose your work into convenient execution units based on the available resources and data distribution
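The last point, decomposing work into execution units according to available resources, can be sketched as a proportional split. This is an illustration only: the site names, the `free_cpus` metric, and the largest-remainder rounding are assumptions, and real middleware would also account for where the data lives.

```python
def decompose(num_tasks, free_cpus):
    """Split num_tasks across sites in proportion to their free CPUs.

    free_cpus maps a site name to its count of free CPUs. Integer
    shares are computed by flooring, then leftover tasks go to the
    sites with the largest remainders so the shares sum to num_tasks.
    """
    total = sum(free_cpus.values())
    if total <= 0:
        raise ValueError("no free CPUs available")
    shares = {site: num_tasks * cpus // total for site, cpus in free_cpus.items()}
    leftover = num_tasks - sum(shares.values())
    by_remainder = sorted(
        free_cpus,
        key=lambda site: (num_tasks * free_cpus[site]) % total,
        reverse=True,
    )
    for site in by_remainder[:leftover]:
        shares[site] += 1
    return shares


print(decompose(100, {"siteA": 8, "siteB": 4, "siteC": 4}))
```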

Main components
User Interface (UI): The place where users log on to the Grid
Resource Broker (RB): Matches the user requirements with the available resources on the Grid
Information System: Characteristics and status of CE and SE (uses “GLUE schema”)
Computing Element (CE): A batch queue on a site’s computers where the user’s job is executed
Storage Element (SE): Provides (large-scale) storage for files
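The Resource Broker's matching role can be sketched as a filter over the information the Information System publishes. This is a toy model: the `ComputingElement` and `Job` classes, their fields, and the "most free CPUs" ranking rule are hypothetical and do not reflect the GLUE schema or any real middleware API.

```python
from dataclasses import dataclass


@dataclass
class ComputingElement:
    """Hypothetical status record for a CE, as the broker might see it."""
    name: str
    free_cpus: int
    memory_gb: int


@dataclass
class Job:
    """Hypothetical user requirements submitted through the UI."""
    name: str
    cpus: int
    memory_gb: int


def match(job, elements):
    """Return the CE that satisfies the job and has the most free CPUs.

    Returns None when no CE meets the requirements.
    """
    candidates = [
        ce for ce in elements
        if ce.free_cpus >= job.cpus and ce.memory_gb >= job.memory_gb
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda ce: ce.free_cpus)


elements = [
    ComputingElement("ce-small", free_cpus=4, memory_gb=8),
    ComputingElement("ce-large", free_cpus=64, memory_gb=128),
]
chosen = match(Job("simulation", cpus=16, memory_gb=32), elements)
print(chosen.name if chosen else "no matching CE")
```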

Typical current grid
• Virtual organisations negotiate with sites to agree access to resources
• Grid middleware runs on each shared resource to provide
– Data services
– Computation services
– Single sign-on
• Distributed services (both people and middleware) enable the grid
E-infrastructure is the key!
[Diagram label: INTERNET]

Biomedical applications

Earth sciences applications
• Earth observations by satellite
– Ozone profiles
• Solid Earth physics
– Fast determination of mechanisms of important earthquakes
• Hydrology
– Management of water resources in the Mediterranean area (SWIMED)
• Geology
– Geocluster: R&D initiative of the Compagnie Générale de Géophysique
A large variety of applications is the key!
EGEE tutorial, Seoul

GARUDA
• The Department of Information Technology (DIT), Govt. of India, has funded C-DAC to deploy a computational grid named GARUDA as a proof-of-concept project.
• It will connect 45 institutes in 17 cities in the country at 10/100 Mbps bandwidth.

Other Grids in India
• EU-IndiaGrid (ERNET, C-DAC, BARC, TIFR, SINP, Pune Univ, NBCS)
• Coordination with GÉANT for education and research
• DAE/DST/ERNET MoU for Tier II LHC Grid (10 universities)
• BARC MoU with INFN, Italy, to set up a Grid research hub
• C-DAC’s GARUDA Grid
• Talk about a Bio-Grid and a Weather-Grid

Summary
• There have been three generations of ANUPAM, all with different architectures, hardware and software
• Usage of ANUPAM has increased due to standardization in programming models and availability of parallel software
• Parallel processing awareness has increased among users
• Building parallel computers is a learning experience
• Development of Grid Computing is equally challenging

THANK YOU