
From Clusters to Grids. October 2003, Linköping, Sweden. Andrew Grimshaw, Department of Computer Science, University of Virginia; CTO & Founder, Avaki Corporation
Agenda • Grid Computing Background • Legion • Existing Systems & Standards • Summary
Grid Computing
First: What is a Grid System?
A Grid system is a collection of distributed resources connected by a network. Examples of distributed resources:
• Desktop and handheld hosts
• Devices with embedded processing resources, such as digital cameras and phones
• Tera-scale supercomputers
What is a Grid?
A grid is all about gathering resources together and making them accessible to users and applications. A grid enables users to collaborate securely by sharing processing, applications, and data across heterogeneous systems and administrative domains, for faster application execution and easier access to data.
• Compute Grids
• Data Grids
What are the characteristics of a Grid system?
• Numerous resources
• Connected by heterogeneous, multi-level networks
• Ownership by mutually distrustful organizations & individuals
• Different security requirements & policies required
• Different resource management policies
• Potentially faulty resources
• Geographically separated
• Resources are heterogeneous
Technical Requirements of a Successful Grid Architecture
• Simple • Secure • Scalable • Extensible • Site autonomy • Persistence & I/O • Multi-language • Legacy support • Single namespace • Transparency • Heterogeneity • Fault-tolerance & exception management
Success requires an integrated solution AND flexible policy. Manage complexity!!
Implication: Complexity is THE Critical Challenge. How should complexity be addressed?
High-Level versus Low-Level Solutions
As application complexity increases, differences between the systems increase dramatically.
[Chart: moving from "sockets & shells" toward an integrated solution, both time & cost and robustness rise from low to high.]
A low-level "sockets & shells" approach is low in time and cost to develop, but low in robustness; an integrated approach is high in robustness, but high in cost to develop.
The Importance of Integration in a Grid Architecture
• If separate pieces are used, then the programmer must integrate the solutions.
• If all the pieces are not present, then the programmer must develop enough of the missing pieces to support the application.
Bottom line: both raise the bar by putting the cognitive burden on the programmer.
Misconceptions about Grids
• Grids are simple cycle aggregation
• The state of the art is essentially scheduling and queuing for CPU cluster management
• These definitions sell short the promise of grid technology
• AVAKI believes grids are not just about aggregating and scheduling CPU cycles, but also about…
• Virtualizing many types of resources, internally and across domains
• Empowering anyone to have secure access to any and all resources through easy administration
Compute Grid Categories
• Sons of SETI@home (United Devices, Entropia, Data Synapse): low-end desktop cycle aggregation; a hard sell in corporate America
• Cluster load management (LSF, PBS, SGE): high-end, great for management of local clusters but not well proven in multi-cluster environments
As soon as you go outside the local cluster to cross-domain, multi-cluster operation, the game changes dramatically with the introduction of three major issues: data, security, and administration. To address these issues, you need a fully integrated solution, or a toolkit to build one.
Typical Grid Scenarios
• Global Grids: multiple enterprises, owners, platforms, domains, file systems, locations, and security policies (Legion, Avaki, Globus)
• Enterprise Grids: single enterprise; multiple owners, platforms, domains, file systems, locations, and security policies (Sun SGE EE, Platform Multi-cluster)
• Cluster & Departmental Grids: single owner, platform, domain, file system, and location (Sun SGE, Platform LSF, PBS)
• Desktop Cycle Aggregation: desktop only (United Devices, Entropia, Data Synapse)
What are grids being used for today?
• Multiple sites with multiple data sources (public and private)
• Need secure access to data and applications for sharing
• Partnership relationships with other organizations: internal, partners, or customers
• Computationally challenging applications
• Distributed R&D groups across company, networks, and geographies
• Staging large files
• Want to utilize and leverage heterogeneous compute resources
• Need for accounting of resources
• Need to handle multiple queuing systems
• Considering purchasing compute cycles for spikes in demand
Legion
Legion Grid Software
Legion Grid capabilities:
• Wide-area data access
• Distributed processing
• Global naming
• Policy-based administration
• Resource accounting
• Fine-grained security
• Automatic failure detection and recovery
[Diagram: users and applications, through the Legion grid, get wide-area access to data, processing, and application resources (load management & queuing, servers and data at a partner, Department A, and Department B, desktops, clusters, and an application vendor) in a single, uniform operating environment that is secure and easy to administer.]
Legion Combines Data and Compute Grids
[Diagram: users and applications connected via the Legion grid to load management & queuing, servers, data, and desktops across a partner, Department A, Department B, and an application vendor.]
The Legion Data Grid
Data Grid Capabilities
• Federates multiple data sources
• Provides global naming
• Works with local and virtual file systems (NFS, XFS, CIFS)
• Accesses data in DAS, NAS, SAN
• Uses standard interfaces
• Caches data locally
[Diagram: wide-area access to data at its source location based on business policies, eliminating manual copying and errors caused by accessing out-of-date copies.]
Data Grid Share
The Legion Data Grid transparently handles client and application requests, maps them to the global namespace, and returns the data.
[Diagram: data mapped into the grid namespace via Legion export: a Linux directory at an informatics partner, NT at headquarters, Solaris at a research center, and Solaris at a tools vendor.]
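The global-namespace mapping this slide describes can be sketched roughly as follows. This is an illustrative toy, not the Legion API; `GridNamespace`, `export_dir`, `resolve`, and all host and path names are invented for the example.

```python
class GridNamespace:
    """Maps grid-wide paths onto (host, local path) pairs."""

    def __init__(self):
        self.exports = {}  # grid path prefix -> (host, local root)

    def export_dir(self, grid_prefix, host, local_root):
        """Publish a local directory under a grid-wide name."""
        self.exports[grid_prefix] = (host, local_root)

    def resolve(self, grid_path):
        """Translate a grid path into the host and local path that hold it."""
        for prefix, (host, root) in self.exports.items():
            if grid_path.startswith(prefix + "/"):
                return host, root + "/" + grid_path[len(prefix) + 1:]
        raise KeyError(f"{grid_path} is not mapped into the grid")

# Two sites export local directories into one shared namespace;
# clients then name data by grid path, not by machine.
ns = GridNamespace()
ns.export_dir("/grid/informatics", "pm-1.partner.example", "/data/seq")
ns.export_dir("/grid/hq", "hq-1.example", "/export/research")
print(ns.resolve("/grid/informatics/sequence_a"))
# -> ('pm-1.partner.example', '/data/seq/sequence_a')
```

Clients never learn which machine holds the data; the namespace does the mapping, which is what lets files move or be re-exported without breaking applications.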
Data Grid Access
• Access files using the standard NFS protocol or Legion commands (NFS security issues eliminated; caches exploit semantics)
• Access files using a global name
• Access based on specified privileges
[Diagram: users and applications, via an access point with fine-grained security, reach sequence_a on the PM-1 cluster at an informatics partner, sequence_b on the HQ-1 server at headquarters, sequence_c and App_A on the RD-2 cluster at a research center, and BLAST on a cluster at a tools vendor.]
Data Grid Access using virtual NFS (Complexity = Servers + Clients)
• Clients mount the grid
• Servers share files to the grid
• Clients access data using the NFS protocol
• Wide-area access to data outside the administrative domain
[Diagram: Legion-NFS with fine-grained security, connecting sequence_a at a partner and sequence_c across Department A and Department B.]
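The fine-grained, per-file security these slides mention can be illustrated with a minimal access-control check. This is a conceptual sketch only; Legion's actual mechanism is richer, and every name here (`ACLS`, `check_access`, the users and paths) is hypothetical.

```python
# Per-file access-control lists: grid path -> {user: set of permitted ops}.
# All names are invented for illustration.
ACLS = {
    "/grid/partner/sequence_a": {"alice": {"read"}, "bob": {"read", "write"}},
    "/grid/deptA/sequence_c":   {"alice": {"read", "write"}},
}

def check_access(user, grid_path, op):
    """Allow an operation only if the file's ACL grants it to this user."""
    return op in ACLS.get(grid_path, {}).get(user, set())

assert check_access("bob", "/grid/partner/sequence_a", "write")
assert not check_access("bob", "/grid/deptA/sequence_c", "read")
```

The point of checking at the grid layer, rather than trusting the client's NFS identity, is that access decisions survive crossing administrative domains.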
Keeping Data in the Grid
• Legion storage servers: data is copied into Legion storage servers that execute on a set of hosts
• The particular set of hosts used is a configuration option; here, five hosts are used
• Access to the different files is completely independent and asynchronous
• Very high sustained read/write bandwidth is possible using commodity resources
[Diagram: files a through h spread across the local disks of five hosts.]
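One simple way to realize the placement described above, with files assigned to a configured set of storage hosts so that accesses to different files proceed independently, is to hash each file name onto a host. This is a sketch under that assumption, not actual Legion code; the host names and the `storage_host` helper are invented.

```python
from zlib import crc32

HOSTS = ["host1", "host2", "host3", "host4", "host5"]  # configuration option

def storage_host(filename, hosts=HOSTS):
    """Deterministically pick the storage server holding a file."""
    return hosts[crc32(filename.encode()) % len(hosts)]

# Files a..h land across the five hosts; reads and writes to different
# files hit different servers, so they can proceed in parallel and the
# aggregate bandwidth scales with the number of commodity hosts.
placement = {name: storage_host(name) for name in "abcdefgh"}
print(placement)
```

Hash-based placement needs no central table, but note that adding a host remaps most files; systems that care about that use consistent hashing instead.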
I/O Performance
Read performance with NFS, Legion-NFS, and the Legion I/O libraries. The x-axis indicates the number of clients that simultaneously perform 1 MB reads on 10 MB files; the y-axis indicates total read bandwidth. All results are the average of multiple runs. All clients ran on 400 MHz Intel machines; the NFS server ran on an 800 MHz Intel server.
Data Grid Benefits
• Easy, convenient, wide-area access to data, regardless of location, administrative domain, or platform
• Eliminates time-consuming copying and obtaining accounts on machines where data resides
• Provides access to the most recent data available
• Eliminates confusion and errors caused by inconsistent naming of data
• Caches remote data for improved performance
• Requires no changes to legacy or commercial applications
• Protects data with fine-grained security and limits access privileges to those required
• Eases data administration and management
• Eases migration to new storage technologies
The Legion Compute Grid
Compute Grid Capabilities
• Job scheduling and priority-based queuing
• Easy integration with third-party load management and queuing software
• Automatic staging of data and applications
• Efficient processing of both sequential and parallel applications
• Failure detection and recovery
• Usage accounting
[Diagram: wide-area access to processing resources based on business policies, managing utilization of processing resources for fast, efficient job completion, across servers, data, and desktops at a partner, Department A, Department B, and an application vendor.]
Compute Grid Access
The grid:
• Locates resources
• Authenticates and grants access privileges
• Stages applications and data
• Detects failures and recovers
• Writes output to the specified location
• Accounts for usage
[Diagram: login/submission flows through scheduling, queuing, usage management, accounting, and recovery, with fine-grained security, to an NT server (PM-1) at an informatics partner, a cluster (HQ-1) at headquarters, a Solaris server (RD-2) with App_A at a research center, and a Linux cluster with BLAST at a tools vendor.]
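The six steps listed above can be sketched as one job-lifecycle function. Everything here is an invented stand-in (the stub functions simulate the grid's services); the sketch only shows the order of operations and the retry-on-failure loop, not any real Legion interface.

```python
# Invented stand-ins simulating grid services for the sketch.
def locate_resource(app): return "hq-1.example"
def authenticate(user, host): return True
def stage(app, inputs, host): pass
def execute(app, inputs, host): return f"ran {app} on {inputs}"
def write_output(output, path): pass
def account(user, host): pass

def run_grid_job(user, app, inputs, retries=2):
    host = locate_resource(app)                  # 1. locate a resource
    if not authenticate(user, host):             # 2. authenticate & authorize
        raise PermissionError(user)
    stage(app, inputs, host)                     # 3. stage application and data
    for attempt in range(retries + 1):
        try:
            output = execute(app, inputs, host)  # 4. run the job
            break
        except RuntimeError:                     #    failure detected:
            host = locate_resource(app)          #    reschedule elsewhere
            stage(app, inputs, host)             #    and re-stage
    else:
        raise RuntimeError("job failed on all attempts")
    write_output(output, "/grid/results")        # 5. write output where asked
    account(user, host)                          # 6. account for usage
    return output

print(run_grid_job("alice", "BLAST", "sequence_a"))
```

The value of the grid layer is that the user calls only the top-level submit; locating, staging, retrying, and accounting all happen behind that single entry point.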
Tools (all are cross-platform)
• MPI
• P-space studies (multi-run)
• Parallel C++
• Parallel object-based Fortran
• CORBA binding
• Object migration
• Accounting
• legion_make (remote builds)
• Fault-tolerant MPI libraries
• Post-mortem debugger
• "Console" objects
• Parallel 2D file objects
• Collections
One Favorite
Related Work
Related Work
• Avaki
• All distributed systems literature
• Globus
• AFS/DFS
• LSF, PBS, …
• Global Grid Forum (OGSA)
Avaki Company Background
• Grid pioneers: a Legion spin-off
• Over $20M capitalization
• The only commercial grid software provider with a solution that addresses data access, security, and compute power challenges
• Standards efforts leader
[Logos: customers, partners, standards organizations.]
AFS/DFS Comparison with the Legion Data Grid
• AFS presumes that all files are kept in AFS (no federation with other file systems); Legion allows data to be kept in Legion or in an NFS, XFS, PFS, or Samba file system
• AFS presumes all sites use Kerberos and that realms "trust" each other; Legion assumes nothing about the local authentication mechanism, and there is no need for cross-realm trust
• AFS semantics are fixed (copy on open); Legion can support multiple semantics, with Unix semantics as the default
• AFS is volume-oriented (sub-trees); Legion can be volume-oriented or file-oriented
• AFS caching semantics are not extensible; Legion caching semantics are extensible
Legion & Globus GT2
Projects with many common goals:
• Metacomputing (or the "Grid")
• Middleware for wide-area systems
• Heterogeneous resource sets
• Disjoint administrative domains
• High-performance, large-scale applications
Legion-Specific Goals
• Shared collaborative environment, including a shared file system
• Fault-tolerance and high availability
• Both HPC applications and distributed applications
• Complete security model, including access control
• Extensible
• Integrated: create a meta-operating system
Many "Similar" Features
• Resource management support
• Message-passing libraries (e.g., MPI)
• Distributed I/O facilities (Globus GASS/remote I/O vs. the Avaki Data Grid)
• Security infrastructure
Globus
The "toolkit" approach: provide services as separate libraries (e.g., Nexus, GASS, LDAP).
Pros:
• Decoupled architecture: easy to add new services into the mix
• Low buy-in: use only what you like! (In practice, all the pieces use each other.)
Cons:
• No unifying abstractions: a very complex environment to learn in full, and composition of services becomes difficult as the number of services grows
• Interfaces keep changing due to an ever-evolving design
• Does not cover the space of problems
Standards: GGF
Background:
• Grid standards are now being developed at the Global Grid Forum (GGF)
• The in-development Open Grid Services Infrastructure (OGSI) standard will extend Web Services (SOAP/XML, WSDL, etc.) with:
• Names and a two-level naming scheme
• Factories and lifetime management
• A mandatory set of interfaces, e.g., discovery interfaces
• OGSA (Open Grid Services Architecture): the over-arching architecture, still in development
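The OGSI ideas named here (factories that create service instances with explicit lifetimes, and a two-level naming scheme in which a stable abstract handle resolves to a concrete reference) can be sketched conceptually. This is not the OGSI interface definition; all class, attribute, and host names below are invented for illustration.

```python
import itertools
import time

class ServiceInstance:
    """A transient grid service instance with an explicit lifetime."""
    def __init__(self, handle, terminate_at):
        self.handle = handle                # stable, abstract name (handle-like)
        self.reference = f"http://node7.example/{handle}"  # concrete address
        self.terminate_at = terminate_at    # lifetime management

class Factory:
    """Creates service instances and resolves handles to references."""
    _ids = itertools.count()

    def __init__(self):
        self.registry = {}  # level 1 (handle) -> level 2 (instance)

    def create(self, lifetime_s):
        """Create an instance that lives for lifetime_s seconds."""
        handle = f"svc-{next(Factory._ids)}"
        self.registry[handle] = ServiceInstance(handle, time.time() + lifetime_s)
        return handle

    def resolve(self, handle):
        """Map a stable handle to the instance's current concrete reference."""
        inst = self.registry[handle]
        if time.time() > inst.terminate_at:  # lifetime expired: reclaim it
            del self.registry[handle]
            raise KeyError(handle)
        return inst.reference

factory = Factory()
h = factory.create(lifetime_s=60.0)  # client asks the factory for an instance
print(factory.resolve(h))            # resolve the stable handle to an address
```

The two-level scheme is what lets an instance move or restart at a new address while clients keep the same handle; the explicit lifetime is soft state that lets the grid reclaim abandoned services.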
Summary
• Grids are about resource federation and sharing
• Grids are here today: they are being used in production computing in industry to solve real problems and provide real value
• Compute grids
• Data grids
• We believe that users want high-level abstractions and don't want to think about the grid
• Need low activation energy and legacy support
• There are a number of challenges to be solved, and different applications and organizations want to solve them differently
• Policy heterogeneity
• Strong separation of policy and mechanism
• Several areas where really good policies are still lacking:
• Scheduling
• Security and security policy interactions
• Failure recovery (and the interaction of different policies)