- Number of slides: 34
Grid
Introduction to Grid Computing
- What is a Grid? An integrated, advanced cyberinfrastructure that delivers:
  - Computing capacity
  - Data capacity
  - Communication capacity
- Why? Many applications share the following characteristics:
  - Large, varied, distributed collaborations need to work together
  - They need lots of cycles and storage (on the order of teraflops and terabytes)
  - They need to share results, codes, parameter files, …

Grid Motivation
- Grid computing was originally about extending scientific computing from single machines to distributed systems.
- Despite improvements in raw computing power, storage capacity, and communication, it is difficult to keep up with the growing demand from the types of applications being developed.

Grid Motivation
- Scientific applications
  - Analysis of large data volumes from different sources
  - Lots of computation needed to model an aspect of the natural world
  - Often require substantially different types of computational resources
  - Projected data volumes are measured in petabytes, so lots of storage is needed

Grid Motivation
- Astronomy
  - Digital sky surveys
- Medical data
  - X-ray, mammography data, etc. (many petabytes)
  - Digitizing patient records (also many petabytes)
- Molecular genomics and related disciplines
  - Human Genome, other genome databases
  - Proteomics (protein structure, activities, …)
  - Protein interactions, drug delivery
- Virtual Population Laboratory (proposed)
  - Simulate the likely spread of disease outbreaks
- Brain scans (3-D, time dependent)
- Climate studies

Grid Motivation
- In the business world, companies want to integrate, manage, and analyze large volumes of data
  - Example: an insurance company mines data from partner hospitals for fraud detection

Grid Motivation
- One could buy additional machines
- But a lot of existing computing power is unused or underused most of the time
- How can applications take advantage of the multiple available resources effectively?
- A grid is intended to allow the sharing, selection, and aggregation of a wide variety of geographically dispersed resources owned by different organizations (virtual organizations)

Emergence of the Virtual Organization
“Resource sharing & coordinated problem solving in dynamic … virtual organizations”
Source: “The Anatomy of the Grid”, Foster, Kesselman, Tuecke, 2001

Other Distributed Infrastructures
- Road, rail, telephones, power, banking, water, electricity
- All started locally, then grew regionally, then nationally, and then internationally
- Provide reliable, relatively low-cost access to a standardized service
- Available to the masses

Electrical Power Grid
- Single entity providing power
- Relatively efficient, low cost, reliable
- US grid links 10K generators
- Complex physical connections and trading mechanisms
- Components are heterogeneous and operated/owned by different companies
- Consumers differ in the amount of power they use, the quality of service they require, and the price they will pay
- Economics important: the grid is driven by economic factors (reserve capacities, trading power)
- Politics important: success depended on regulatory, political, and institutional developments as much as on technical innovation
- Control important: infrastructure for monitoring, management, and control

Emergence of the Virtual Organization
- Commonalities
  - Need to discover and share resources
  - Participants do not necessarily trust all other participants
  - Not just about document exchange, but also about remote software, computers, data, sensors, etc.
  - Resource sharing is conditional, and the conditions are dynamic
    - For example, resources can be used only for a limited class of problems or at certain times of the day

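The conditional sharing described above (a limited class of problems, certain times of the day) can be sketched as a small policy check. This is only an illustrative sketch: the `SharingPolicy` class, its field names, and the example problem classes and hours are assumptions, not part of any Grid toolkit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharingPolicy:
    """Hypothetical policy: who may use a shared resource, and when."""
    allowed_problem_classes: frozenset
    start_hour: int  # inclusive, 0-23
    end_hour: int    # exclusive, 1-24

    def permits(self, problem_class: str, hour: int) -> bool:
        # Both conditions must hold: right kind of problem, right time of day.
        in_window = self.start_hour <= hour < self.end_hour
        return in_window and problem_class in self.allowed_problem_classes


# Assumed example: the owner shares the machine only at night,
# and only for two classes of problems.
night_only = SharingPolicy(frozenset({"climate", "genomics"}), start_hour=0, end_hour=6)

print(night_only.permits("climate", 3))   # inside the window, allowed class
print(night_only.permits("climate", 12))  # outside the window
print(night_only.permits("finance", 3))   # class not allowed
```

Because the conditions are dynamic, a real system would re-evaluate such a policy at request time rather than cache the answer.
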
What is a Grid? Checklist (Foster)
- Coordinates distributed resources using non-centralized control mechanisms
  - A grid integrates and coordinates resources and users that live within different administrative domains
    - E.g., different administrative units of the same company, or different companies
  - Addresses the issues of security, policy, payment, and membership

What is a Grid? Checklist (Foster)
- Uses standard, open, general-purpose protocols and interfaces
  - A grid is built from multi-purpose protocols and interfaces that address such fundamental issues as authentication, authorization, resource discovery, and resource access
- Delivers nontrivial qualities of service
  - Resources should be used in a coordinated fashion to deliver various qualities of service
  - Quality of service is usually defined by metrics such as response time, throughput, availability, etc.

Grid vs. Internet?
- We’ve had computers connected by networks for 20 years
- The Grid brings additional notions
  - Virtual Organizations
  - Infrastructure to enable computation to be carried out across these
    - Authentication, monitoring, information, resource discovery, status, coordination, etc.
- Can I just plug my application into the Grid?
  - No! There is much work to do to get there!

Are these Grids?
- Cluster management systems
  - Examples: Sun’s Sun Grid Engine, Platform’s Load Sharing Facility (LSF)
  - These can be installed on a parallel computer or in a local area network
  - Can deliver a quality of service
  - Each may be an important component of a Grid, but by itself does not constitute a Grid

Are these Grids?
- Multi-site scheduler
  - Example: Platform’s multicluster scheduler
  - Yes: not terribly sophisticated, but it is a grid
- Gnutella
  - Maybe: is it too specialized?
  - Is it open, or is it a standard?
- WWW
- Foster’s checklist more clearly applies to large-scale Grid deployments:
  - Data Grid: GriPhyN, PPDG, EU DataGrid, iVDGL, DataTAG, NASA’s Information Power Grid
  - TeraGrid: used to link major US academic sites

Advantages of Grid Computing
- Uses resources scattered across the world
  - Access to more computing power
  - Better access to data
  - Utilizes unused cycles
- Facilitates Virtual Organizations (VOs)
  - Groups of organizations that use the Grid to share resources

Online Access to Scientific Instruments (www.globus.org)
[Figure: Advanced Photon Source. Diagram labels: real-time collection, tomographic reconstruction, archival storage, wide-area dissemination, desktop & VR clients with shared controls]
DOE X-ray grand challenge: ANL, USC/ISI, NIST, U. Chicago

Data Grids for High Energy Physics (www.globus.org)
[Figure: tiered data grid. Image courtesy Harvey Newman, Caltech. Recoverable labels:]
- Detector to Online System: ~PBytes/sec
  - There is a “bunch crossing” every 25 nsecs
  - There are 100 “triggers” per second
  - Each triggered event is ~1 MByte in size
- Offline Processor Farm: ~20 TIPS, fed at ~100 MBytes/sec
- Tier 0: CERN Computer Centre, fed at ~100 MBytes/sec
- Tier 1: France Regional Centre, Germany Regional Centre, Italy Regional Centre, FermiLab (~4 TIPS), linked at ~622 Mbits/sec or by air freight (deprecated)
- Tier 2: Tier 2 Centres at ~1 TIPS each (e.g., Caltech), linked at ~622 Mbits/sec
- Institutes: ~0.25 TIPS each, with a physics data cache, linked at ~622 Mbits/sec
- Tier 4: physicist workstations, fed at ~1 MBytes/sec
- 1 TIPS is approximately 25,000 SpecInt95 equivalents
- Physicists work on analysis “channels”. Each institute will have ~10 physicists working on one or more channels; data for these channels should be cached by the institute server

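The figures quoted on this slide can be checked against each other with some quick arithmetic. The sketch below uses only the numbers from the slide; the variable names and the decimal unit conversions (8 bits per byte, 10^6 MBytes per TByte) are assumptions made for illustration.

```python
# Numbers taken from the slide.
TRIGGERS_PER_SECOND = 100
EVENT_SIZE_MBYTES = 1
BUNCH_CROSSING_NS = 25
LINK_MBITS_PER_SECOND = 622

# 100 triggers/sec at ~1 MByte each gives the ~100 MBytes/sec stream.
event_stream_mbytes_per_s = TRIGGERS_PER_SECOND * EVENT_SIZE_MBYTES

# One bunch crossing every 25 ns means 40 million crossings per second,
# so only about one crossing in 400,000 is kept as a triggered event.
crossings_per_second = 1_000_000_000 // BUNCH_CROSSING_NS
crossings_per_trigger = crossings_per_second // TRIGGERS_PER_SECOND

# A ~622 Mbits/sec link carries roughly 78 MBytes/sec.
link_mbytes_per_s = LINK_MBITS_PER_SECOND / 8

# Sustained for a full day, the event stream is several TBytes.
stream_tbytes_per_day = event_stream_mbytes_per_s * 86_400 / 1_000_000

print(event_stream_mbytes_per_s, crossings_per_second, crossings_per_trigger)
print(link_mbytes_per_s, stream_tbytes_per_day)
```

Under these assumptions a single ~622 Mbits/sec link carries somewhat less than the full ~100 MBytes/sec event stream.
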
NEES (Network for Earthquake Engineering Simulation) Collaboration
U. Nevada Reno
www.neesgrid.org

Home Computers Evaluate AIDS Drugs (www.globus.org)
- Community =
  - 1000s of home computer users
  - Philanthropic computing vendor (Entropia)
  - Research group (Scripps)
- Common goal = advance AIDS research

SHARCNET
SHARCNET is a high-performance scientific computing project involving the University of Western Ontario, the University of Guelph, McMaster University, the University of Windsor, and Wilfrid Laurier University. SHARCNET provides UWO researchers with world-class computing capabilities. As of November 2001, the computer cluster at the University of Western Ontario was the fastest computer at a Canadian university and the 12th fastest at any university in North America.
http://www.sharcnet.ca
[Figure: SHARCNET, the South Western Ontario cluster of clusters or “Super Cluster”. Member sites: University of Guelph, Wilfrid Laurier University, University of Windsor, McMaster University, and the University of Western Ontario, connected by ultra-high-speed fiber optic networking. Map labels: Waterloo, London, Guelph, Hamilton]

Example Grids
- NSF PACI Grid
- NorduGrid

Ideal Grid-based Scientific Computation
- User submits a request through a GUI
  - Application
  - Operating system and other requirements
  - Input data
- Grid finds and allocates resources to satisfy the request
- Grid monitors request processing
  - Moves the job when resources fail or are too busy
- Grid notifies the user when results are available

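The "moves the job when resources fail or are too busy" step can be sketched as a simple retry loop over candidate resources. This is a toy sketch under stated assumptions: `run_with_failover`, `demo_run_on`, and the site names are hypothetical, and a failed or busy resource is modelled as a `RuntimeError`.

```python
def run_with_failover(job, resources, run_on):
    """Try each candidate resource in turn; move the job when one fails."""
    attempts = []
    for resource in resources:
        attempts.append(resource)
        try:
            result = run_on(resource, job)
        except RuntimeError:
            # Resource failed or was too busy: move the job to the next one.
            continue
        return {"status": "done", "resource": resource,
                "result": result, "attempts": attempts}
    # No resource could run the job; the user still gets notified.
    return {"status": "failed", "resource": None,
            "result": None, "attempts": attempts}


def demo_run_on(resource, job):
    """Hypothetical executor: siteA is always too busy, other sites succeed."""
    if resource == "siteA":
        raise RuntimeError("siteA is too busy")
    return f"{job} finished on {resource}"


outcome = run_with_failover("simulation", ["siteA", "siteB"], demo_run_on)
print(outcome["status"], outcome["resource"], outcome["attempts"])
```

The returned dictionary stands in for the notification to the user; a real grid would also have to move input data and any checkpointed state along with the job.
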
Example
- Assume a source file Main.F on machine A and an input file on machine B.
- Main.F is written using MPI; it will need around 4 GB of core memory to run, will take several hours to complete, and will produce a large output file.
- What functionality is needed?

Issues
- How to select a machine to run it on?
- How to provide an executable that can run on that machine?
- How to move the input file?
- How to start the executable?
- How to monitor the job? When does it start? When does it finish?
- How to move the output file back?
- What about security?
- How do we know if it didn’t work, and how it failed?

How to Select a Machine
- What properties of a machine are we interested in?
  - What resources does my executable require?
    - 4 GB memory, “several hours of compute time”
    - Enough disk space for the output
  - What kind of environment do I need on the machine?
    - OS limitations?
    - MPI (which version?), Fortran?
  - What resources am I authorized to run on?
  - How quickly will it run?
  - How much will it cost, and what is my allocation there?
  - How to find all this information?
  - What should the user provide?

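The questions above amount to a matchmaking step: filter machines by hard requirements, then rank the survivors. The sketch below is one possible way to express that, not how any particular Grid scheduler works. All class names, fields, machine names, and numbers other than the 4 GB memory requirement are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    memory_gb: float
    free_disk_gb: float
    software: set
    authorized_users: set
    est_hours: float       # assumed runtime estimate for this job
    cost_per_hour: float   # assumed charging rate


@dataclass
class JobRequest:
    user: str
    memory_gb: float
    disk_gb: float
    software: set


def eligible(machine: Machine, job: JobRequest) -> bool:
    """Hard requirements: memory, disk, environment, authorization."""
    return (machine.memory_gb >= job.memory_gb
            and machine.free_disk_gb >= job.disk_gb
            and job.software <= machine.software
            and job.user in machine.authorized_users)


def select_machine(machines, job):
    """Pick the cheapest eligible machine; break ties by shorter runtime."""
    candidates = [m for m in machines if eligible(m, job)]
    if not candidates:
        return None
    return min(candidates, key=lambda m: (m.est_hours * m.cost_per_hour, m.est_hours))


machines = [
    Machine("small", 2, 500, {"mpi", "fortran"}, {"alice"}, 3, 1.0),
    Machine("cluster-a", 8, 200, {"mpi", "fortran"}, {"alice", "bob"}, 6, 2.0),
    Machine("cluster-b", 16, 100, {"mpi", "fortran", "python"}, {"alice"}, 4, 2.5),
    Machine("cluster-c", 32, 1000, {"mpi", "fortran"}, {"bob"}, 2, 1.0),
]
# The 50 GB output size is an assumed figure; the slide only says "large".
job = JobRequest(user="alice", memory_gb=4, disk_gb=50, software={"mpi", "fortran"})
chosen = select_machine(machines, job)
print(chosen.name if chosen else "no machine satisfies the request")
```

Ranking by total cost is only one choice; ranking by expected completion time or by remaining allocation would fit the same structure.
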
More Complicated
- What if the program needs to read data kept on machine C while it is running?
- What about distributing across processors on different machines?
- What if I have a lot of interconnected programs?
- How do I find the output file afterwards?
- What if it doesn’t work?

Common Features Needed by Grid Systems
- A resource registry is an information source that allows entities to publish and update information about the resources they wish to share
Figure from Sean Norman’s reading course presentation

Common Features Needed by Grid Systems
- The client is typically an agent acting on behalf of the user
  - Finds the resources requested by the user by consulting resource registries
  - Submits an allocation request to the resource manager(s) responsible for the desired resources

Common Features Needed by Grid Systems
- If the request can be accommodated, the resource manager(s) update the status information for the acquired resources in the resource registries
- The client then sends the appropriate executables and input data to the allocated resources and receives a reference to the execution in return

Common Features Needed by Grid Systems
- The reference allows the client to monitor the execution of a job and inquire about its status
- The client may also receive the results of the job once its execution is complete

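The interaction between registry, resource manager, and client described on these slides can be sketched in a few lines. This is a toy in-process model under stated assumptions: the class and method names are hypothetical, resources are reduced to a CPU count, and the job runs synchronously, so its status is already "done" when the reference is returned.

```python
class ResourceRegistry:
    """Information source where resource owners publish and update status."""

    def __init__(self):
        self._entries = {}

    def publish(self, name, free_cpus):
        self._entries[name] = free_cpus

    def free_cpus(self, name):
        return self._entries[name]

    def find(self, cpus_needed):
        return [n for n, free in self._entries.items() if free >= cpus_needed]


class ResourceManager:
    """Controls one resource and keeps its registry entry up to date."""

    def __init__(self, name, total_cpus, registry):
        self.name, self.free, self.registry = name, total_cpus, registry
        self._jobs = {}
        registry.publish(name, self.free)

    def allocate(self, cpus):
        if cpus > self.free:
            return False
        self.free -= cpus
        self.registry.publish(self.name, self.free)  # update status
        return True

    def execute(self, executable, input_data):
        reference = f"{self.name}/job-{len(self._jobs) + 1}"
        self._jobs[reference] = {"status": "done", "result": executable(input_data)}
        return reference  # handle the client uses to monitor the job

    def status(self, reference):
        return self._jobs[reference]["status"]

    def result(self, reference):
        return self._jobs[reference]["result"]


class Client:
    """Agent acting on behalf of the user."""

    def __init__(self, registry, managers):
        self.registry = registry
        self.managers = {m.name: m for m in managers}

    def submit(self, cpus_needed, executable, input_data):
        for name in self.registry.find(cpus_needed):
            manager = self.managers[name]
            if manager.allocate(cpus_needed):
                return manager, manager.execute(executable, input_data)
        return None


registry = ResourceRegistry()
managers = [ResourceManager("siteA", 4, registry), ResourceManager("siteB", 16, registry)]
client = Client(registry, managers)
manager, reference = client.submit(8, sum, [1, 2, 3])
print(reference, manager.status(reference), manager.result(reference))
```

In a real deployment each of these would be a separate networked service with authentication between them; the sketch only shows the order of the calls.
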
Some Solutions
- Middleware toolkits (not all speak, or spoke, Globus):
  - Condor
  - Globus Toolkit
  - Legion/Avaki
  - Codine (now Sun Grid Engine)
- Higher-level toolkits (built on Globus)
  - Java CoG
  - GridPortal Toolkit, Grid Portal Development Toolkit (GPDK)
  - Condor-G
  - SGE


