- Number of slides: 67
“Politehnica” University of Bucharest (Universitatea “Politehnica” din Bucuresti)
Faculty of Automatic Control and Computers (Facultatea de Automatica si Calculatoare)
Current trends in Grid computing
Dobre Ciprian Mihai, cipsm {at} cs.pub.ro
“The Internet is about getting computers to talk together; grid computing is about getting computers to work together.” (from IBM’s Grid definition)
Outline of the presentation
- What is Grid computing: sorting out the alphabet soup.
- Impact of Grid computing on science.
- CERN as a driving force in Grid computing.
- Grids: where to?
What is Grid?
- There are many definitions of Grid computing.
- The term was coined as an analogy to the electrical power grid.
- According to Ian Foster, the “father of grid computing”, the term grid has been hijacked to “embrace everything from advanced networking to artificial intelligence”.
- Marketers are applying grid labels to all sorts of products and services, adding to the confusion and hype.
“From the wide ranging definitions of Grid, to the volume of standards bodies and organizations -- it can be a real challenge to distinguish the significant developments from the hype.” (Ian Foster, 2005)
Electrical power grid vs. computing grid:
- Pervasive. Electrical: everywhere, wall socket. Computing: everywhere, any “net-thing”.
- Transparent. Power “just happens”.
- Infrastructure. Electrical: power stations, transformers, powerlines, transmission hubs. Computing: CPUs, servers, storage, networks, archives, middleware.
- Utility. Electrical: pay-for-use, accounting, reporting, settlement. Computing: pay-for-use, accounting, reporting, metering.
Ian Foster’s evolving definitions
- Ian Foster and Carl Kesselman (editors), “The GRID: Blueprint for a New Computing Infrastructure”, Morgan-Kaufman, 1999: “A computational grid is a hardware and software infrastructure that provides dependable, pervasive, and inexpensive access to high-end computational capabilities.”
- 2001: Foster, Kesselman, Tuecke, “The Anatomy of the Grid: Enabling Scalable Virtual Organizations”, http://www.globus.org/research/papers.html
- 2002: Foster, Kesselman, Nick, Tuecke, “The Physiology of the Grid”, http://www.globus.org/research/papers/ogsa.pdf: “The grid integrates services across distributed, heterogeneous, dynamic ‘virtual organizations’ formed from the disparate resources within a single enterprise and/or from external resource sharing and service provider relationships in both e-business and e-science.”
Other definitions:
- GGF: “A system that is concerned with the integration, virtualization, and management of services and resources in a distributed, heterogeneous environment that supports collections of users and resources (virtual organizations) across traditional administrative and organizational domains (real organizations).”
- SUN: “A way of managing and dynamically sharing disparate sets of resources.”
- IBM: “The ability, using a set of open standards and protocols, to gain access to applications and data, processing power, storage capacity and a vast array of other computing resources over the Internet.”
- IDC: “Set of independent computers combined into a unified system through systems software and networking technologies.”
- Gartner: “A collection of resources owned by multiple organizations coordinated in such a way as to allow them to solve a single common problem.”
- CoreGRID: “A fully distributed, dynamically reconfigurable, scalable and autonomous infrastructure to provide location independent, pervasive, reliable, secure and efficient access to a coordinated set of services encapsulating and virtualizing resources (computing power, storage, instruments, data, etc.) in order to generate knowledge.”
- “Grid computing is a network of computation: tools and protocols for coordinated resource sharing and problem solving among pooled assets.”
- “A Grid is a heterogeneous system spread over a wide geographical area, which allows multiple entities to share and use resources, under various administrative policies, offering a transparent access to the user, through the use of consistent access protocols and interfaces.”
Why so many definitions?
- Computer science and software engineering sometimes lack definitions as strict as those in physics or mathematics. This “lack of definitions” leads many Grid researchers and people working with Grid technology to hold different views on what a Grid is.
- Hardware discrepancies: for some, a local cluster with a middleware system on top is a Grid, whereas others believe that a wide-area network connection has to be involved.
- Software problems: what actually makes a piece of software “Grid software”? Is any kind of middleware using Grid security already Grid software? Given the recent advances in Web and Grid service technologies, where should the line be drawn between Web services and Grid services?
So what is Grid after all?
In this soup of Grid definitions, two have been widely accepted by the community:
- I. Foster, research view: “A Grid is a system that (1) coordinates resources that are not subject to centralized control (2) using standard, open, general-purpose protocols and interfaces (3) to deliver nontrivial qualities of service.”
- A. Grimshaw, industry view: “From a hardware perspective, a Grid is a collection of distributed resources connected by a network. From a user perspective a Grid gathers together resources and makes them accessible in a secure manner to users and applications.”
Describing the elephant
A Grid infrastructure must provide a set of technical capabilities:
1. Resource modeling: describes the available resources, their capabilities, and the relationships between them, to facilitate discovery, provisioning, and quality of service management.
2. Monitoring and notification: provides visibility into the state of resources to enable discovery and maintain quality of service. Logging of significant events and state transitions is also needed to support accounting and auditing functions.
3. Allocation: assures quality of service across an entire set of resources for the lifetime of their use by an application. This is enabled by negotiating the required level(s) of service and ensuring the availability of appropriate resources through some form of reservation, essentially the dynamic creation of a service-level agreement.
4. Provisioning, life-cycle management, and decommissioning: enables an allocated resource to be configured automatically for application use, manages the resource for the duration of the task at hand, and restores the resource to its original state for future use.
5. Accounting and auditing: tracks the usage of shared resources and provides mechanisms for transferring cost among user communities and for charging for resource use by applications and users.
6. In addition, security is an important aspect.
Foster, Tuecke, “Describing the elephant: the different faces of IT as services”, ACM Queue, 2005.
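Capabilities 3 to 5 can be pictured as a small state machine over one shared resource. The C sketch below is illustrative only. The `resource` struct, the state names and the charging rule (CPU-hours for the whole reserved resource) are assumptions, not part of any Grid middleware API. Monitoring and security are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lifecycle states of one shared resource. */
typedef enum { RES_FREE, RES_RESERVED, RES_PROVISIONED } res_state;

typedef struct {
    const char *name;
    int         cpus;       /* capacity described by the resource model */
    res_state   state;
    const char *holder;     /* user or VO currently holding the resource */
    double      cpu_hours;  /* accounting: total usage charged so far */
} resource;

/* Allocation: reserve the whole resource if it is free and large enough. */
int allocate(resource *r, const char *user, int cpus_needed) {
    assert(r != NULL && user != NULL);
    if (r->state != RES_FREE || cpus_needed > r->cpus)
        return 0;                       /* request cannot be guaranteed */
    r->state = RES_RESERVED;
    r->holder = user;
    return 1;
}

/* Provisioning: configure a reserved resource for the application. */
int provision(resource *r) {
    assert(r != NULL);
    if (r->state != RES_RESERVED)
        return 0;
    r->state = RES_PROVISIONED;
    return 1;
}

/* Decommissioning plus accounting: charge the usage and restore the
   resource to its original (free) state. Returns the CPU-hours charged. */
double release(resource *r, double hours_used) {
    double charge;
    assert(r != NULL && hours_used >= 0.0);
    if (r->state != RES_PROVISIONED)
        return 0.0;
    charge = hours_used * r->cpus;      /* the whole resource was reserved */
    r->cpu_hours += charge;
    r->state = RES_FREE;
    r->holder = NULL;
    return charge;
}
```

Because `allocate` reserves the entire resource, a second request fails until `release` returns it to `RES_FREE`. This matches the slide's idea of restoring the resource "to its original state for future use".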
The two key Grid computing groups
- The Globus Alliance (www.globus.org)
  - Composed of people from Argonne National Laboratory, the University of Chicago, the University of Southern California Information Sciences Institute, the University of Edinburgh and others.
  - The OGSA/OGSI standards were initially proposed by the Globus group.
  - Based on the papers “The Anatomy of the Grid” and “The Physiology of the Grid”.
- The Global Grid Forum (www.ggf.org)
  - First meeting in June 1999, based on the IETF charter.
  - Heavy involvement of academic groups and industry (e.g. IBM Grid Computing, HP, United Devices, Oracle, UK e-Science Programme, US DOE, US NSF, Indiana University, and many others).
  - Meets three times annually.
  - Solicits involvement from industry, research groups, and academics.
More on Grids
- The Grid relies on advanced software, called middleware, which ensures seamless communication between different computers in different parts of the world.
- The Grid “search engine” finds not only the data the scientist needs, but also the data processing techniques and the computing power to carry them out. It then distributes the computing task to wherever in the world there is spare capacity, and sends the result to the scientist.
- Why use Grids?
  - Industrial and academic partners form an “extended enterprise” in which resources are intrinsically distributed and only partially shared.
  - Partners may be prepared to share data, but not the hardware and proprietary software that produces the data.
Why Grid computing now? Let us look at the evolution of ICT
Grid-like Vision
- In 1969, Leonard Kleinrock, one of the chief scientists of the original ARPA project which seeded the Internet, wrote:
  - “As of now, computer networks are still in their infancy, but as they grow up and become sophisticated, we will probably see the spread of ‘computer utilities’, which, like present electric and telephone utilities, will service individual homes and offices across the country.”
- Despite major advances in hardware and software systems over the past 35 years, we have yet to realize this vision. How far are we still from delivering computing as a utility?
  - Let us look into the ICT evolution and project the future.
Computing and Communication Technologies Evolution: 1960-2010
[Figure: timeline from centralised to decentralised control. Computing: mainframes, minicomputers, Crays, PCs, workstations, MPPs, PC clusters, WS clusters, HTC, XEROX PARC worm, P2P, Grids, PDAs, computing as utility. Communication: Sputnik, ARPANET, Ethernet, email, TCP/IP, IETF, Internet era, WWW era, W3C, HTML, Mosaic, XML, Web services, e-Business, e-Science, SocialNet.]
Computing is Scaling: Towards Inter-Planetary Level
[Figure: performance and services grow as administrative barriers widen (individual, group, department, campus, state, national, globe, inter-planet, universe), from personal devices to SMPs or supercomputers, local clusters, enterprise clusters/grids, global grids and inter-planet grids; timeline reaching 2100.]
A little bit more…
- Benefits for science:
  - More effective and seamless collaboration of dispersed communities, both scientific and commercial.
  - Ability to run large-scale applications comprising thousands of computers, for a wide range of applications.
  - Transparent access to distributed resources from your desktop, or even your mobile phone.
  - The term “e-Science” has been coined to express these benefits: “Science” as the application domain of Grid and Web technologies.
- Impact: e-Science. From the EPSRC e-Science web site: “In the future, e-Science will refer to the large-scale science that will increasingly be carried out through distributed global collaborations enabled by the Internet. Typically, a feature of such collaborative scientific enterprises is that they will require access to very large data collections, very large scale computing resources and high performance visualisation back to the individual user scientists.”
Healthy, Wealthy, and Wise?
- e-Health: electronic patient records, distributed and/or remote diagnosis, collaborative surgical planning.
- e-Business: streamline, distribute, and enhance business processes.
- e-Commerce: use the Grid as a marketplace for both traditional and innovative goods and services.
- e-Learning: remove barriers to education and training.
- Grid applications for science:
  - Medical/healthcare (imaging, diagnosis and treatment).
  - Bioinformatics (study of the human genome and proteome to understand genetic diseases).
  - Nanotechnology (design of new materials from the molecular scale).
  - Engineering (design optimization, simulation, failure analysis and remote instrument access and control).
  - Natural resources and the environment (weather forecasting, earth observation, modeling and prediction of complex systems).
Grid and Web Services Convergence
- Grid and Web started far apart in applications and technology, and have been converging.
- The definition of the Web Services Resource Framework (WSRF) makes an explicit distinction between the “service” and the stateful entities that the service acts upon, i.e. the resources.
- This means that the Grid and Web communities can move forward on a common base!
[Figure: the Grid stack (GT1, GT2, OGSI) and the Web stack (HTTP, SOAP, XML, WSDL, WS-*, BPEL, WS-I compliant technology stack) converging on WSRF.]
Grid and Web Services
- In 2004, the Global Grid Forum (GGF) standard was divided into:
- Open Grid Services Architecture (OGSA)
  - Defines standard mechanisms for creating, naming, and discovering Grid service instances.
  - Addresses architectural issues relating to interoperable Grid services.
  - An open, service-oriented architecture (SOA): resources as first-class entities, dynamic service/resource creation and destruction.
  - Built on a Web service infrastructure.
  - Resource virtualization at the core.
  - Builds grids from a small number of standards-based components (replaceable, coarse-grained).
  - Customizable: support for dynamic, domain-specific content… within the same standardized framework.
  - Described in “The Physiology of the Grid”, http://www.globus.org/research/papers/ogsa.pdf
- Open Grid Services Infrastructure (OGSI)
  - It was based upon the Grid Service specification. It specifies the way clients interact with a grid service (service invocation management, data interface, security interface, ...).
  - In the new draft (2005-06), some mandatory specifications of OGSI are merged with OGSA and the new WSRF is introduced (GT4).
- WSRF (Web Services Resource Framework): defines a generic and open framework for modeling and accessing stateful resources using Web services.
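The WSRF split between a stateless service and the stateful resources it acts upon can be mimicked in memory. In this C sketch the client holds a reference made of a service address plus a resource key. The service operations look up all state through that key. The names `endpoint_ref`, `create_resource`, `get_status`, `set_status` and `destroy_resource` are hypothetical. No SOAP or WS-Addressing is involved, and this is not the Globus WS Core API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_RESOURCES 16
#define STATUS_LEN    16

/* Stateful part of a WS-Resource: properties identified by a key. */
typedef struct {
    int  in_use;
    int  key;
    char status[STATUS_LEN];   /* e.g. "Pending", "Active", "Done" */
} ws_resource;

/* What a client holds: the service address plus the resource key
   (loosely modelled on an endpoint reference). */
typedef struct {
    const char *service_address;
    int         resource_key;
} endpoint_ref;

static ws_resource table[MAX_RESOURCES];
static int next_key = 1;

static ws_resource *lookup(int key) {
    for (int i = 0; i < MAX_RESOURCES; i++)
        if (table[i].in_use && table[i].key == key)
            return &table[i];
    return NULL;
}

/* Factory operation: create a resource and return a reference to it
   (resource_key stays -1 if the table is full). */
endpoint_ref create_resource(const char *service_address) {
    endpoint_ref epr = { service_address, -1 };
    for (int i = 0; i < MAX_RESOURCES; i++) {
        if (!table[i].in_use) {
            table[i].in_use = 1;
            table[i].key = next_key++;
            strcpy(table[i].status, "Pending");
            epr.resource_key = table[i].key;
            break;
        }
    }
    return epr;
}

/* The service operations keep no state of their own: every call
   works on whatever resource the endpoint reference names. */
const char *get_status(endpoint_ref epr) {
    ws_resource *r = lookup(epr.resource_key);
    return r ? r->status : NULL;
}

int set_status(endpoint_ref epr, const char *status) {
    ws_resource *r = lookup(epr.resource_key);
    assert(status != NULL);
    if (r == NULL)
        return 0;
    strncpy(r->status, status, STATUS_LEN - 1);
    r->status[STATUS_LEN - 1] = '\0';
    return 1;
}

int destroy_resource(endpoint_ref epr) {
    ws_resource *r = lookup(epr.resource_key);
    if (r == NULL)
        return 0;
    r->in_use = 0;
    return 1;
}
```

Two references to the same service address can name different resources. Changing one leaves the other untouched, which is the point of separating the service from its state.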
The core elements of the Open Grid Services Architecture
[Figure: OGSA core elements; one layer is annotated as eliminated in the recent version of the standard.]
Pre-GT 4
GT 4
Virtualizing Resources
[Figure: access through type-specific interfaces and common interfaces (Web services), layered over resource-specific interfaces to the resources: computers, storage, sensors, applications, information.]
A Service-Oriented Grid
[Figure: virtualized resources (e.g. a CPU resource) exposed as Grid middleware services (compute, data, application, printer and job-submit services) that advertise themselves to a registry service and are located through a brokering service, with advertise and notify interactions.]
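The advertise and brokering interactions in this picture can be reduced to a tiny registry. In the C sketch below, services publish a type, an address and a load figure, and a broker picks the least-loaded service of the requested type. The `advert` layout, the load metric and the example addresses are assumptions for illustration. Notification is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ADS 32

/* One advertisement held by a hypothetical registry service. */
typedef struct {
    char type[16];      /* "compute", "data", "printer", ... */
    char address[64];   /* where the service can be invoked */
    int  load;          /* advertised load; lower is better */
} advert;

static advert registry[MAX_ADS];
static size_t n_ads = 0;

static void copy_str(char *dst, size_t cap, const char *src) {
    strncpy(dst, src, cap - 1);
    dst[cap - 1] = '\0';
}

/* Advertise: a service publishes its type, address and load. */
int advertise(const char *type, const char *address, int load) {
    assert(type != NULL && address != NULL);
    if (n_ads == MAX_ADS)
        return 0;                         /* registry full */
    copy_str(registry[n_ads].type, sizeof registry[n_ads].type, type);
    copy_str(registry[n_ads].address, sizeof registry[n_ads].address, address);
    registry[n_ads].load = load;
    n_ads++;
    return 1;
}

/* Broker: discover services of the requested type in the registry
   and pick the least loaded one; NULL if none is advertised. */
const char *broker_select(const char *type) {
    const advert *best = NULL;
    assert(type != NULL);
    for (size_t i = 0; i < n_ads; i++) {
        if (strcmp(registry[i].type, type) == 0 &&
            (best == NULL || registry[i].load < best->load))
            best = &registry[i];
    }
    return best ? best->address : NULL;
}
```

Keeping selection in the broker rather than in the client means the selection policy can change without touching the services that advertise.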
Global Grid Community
CERN?
- CERN is the world's largest particle physics centre.
- Particle physics is about:
  - the elementary particles of which all matter in the Universe is made;
  - the fundamental forces which hold matter together.
- Particle physics requires:
  - special tools to create and study new particles.
- With its 27 km circumference, the LHC accelerator will be the largest superconducting installation in the world.
CERN is:
- ~2500 staff scientists (physicists, engineers, …);
- some 6500 visiting scientists (half of the world's particle physicists), who come from 500 universities representing 80 nationalities.
Computing @ CERN
- The latest trend is to federate national Grids to achieve a global Grid infrastructure; High Energy Physics is a driving force behind this.
- High-throughput computing based on reliable “commodity” technology.
- LHC data analysis requires computing power equivalent to ~100,000 of today’s fastest PC processors!
  - At CERN: more than 2500 dual-processor PCs, and about 3 million gigabytes of data on disk and tape.
  - PROBLEM: nowhere near enough!
  - SOLUTION: use the Grid to unite the computing resources of particle physics institutes around the world.
- CERN leads two major global Grid projects:
  - WLCG: Worldwide LHC Computing Grid Collaboration.
  - EGEE: Enabling Grids for E-sciencE, a project for all sciences.
- WLCG: all the institutions participating in the provision of the Worldwide LHC Computing Grid with a Tier-1 and/or Tier-2 computing centre form the WLCG Collaboration.
- The LHC Computing Grid project launched a service with 12 sites in 2003. Today: 200 sites in 30 countries with 16,000 PCs.
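The “nowhere near enough” verdict follows from the slide's own figures. The calculation below assumes, as a rough approximation, that each processor in CERN's dual-processor PCs counts as one of “today's fastest PC processors”. The in-house share of the requirement is then:

```latex
\underbrace{2500}_{\text{PCs}} \times \underbrace{2}_{\text{CPUs/PC}} = 5000 \text{ processors},
\qquad
\frac{5000}{100\,000} = 0.05 = 5\%
```

Under that assumption, roughly 95% of the required capacity has to come from outside CERN. That is the case for uniting the institutes' resources through the Grid.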
Computing @ CERN
- The LCG architecture consists of an agreed set of services and applications running on the Grid infrastructures provided by the LCG partners.
  - At present, these infrastructures are those provided by the Enabling Grids for E-sciencE (EGEE) project in Europe, the Open Science Grid (OSG) project in the U.S.A. and the Nordic Data Grid Facility in the Nordic countries.
    - Grid3 was the start-up of OSG.
    - The LCG Project builds and maintains computing infrastructure for the LHC experiments.
- Original ('02) LCG plan: “The LCG is not a middleware project”.
  - Middleware was to be delivered... too little, too late.
  - Feature set, performance and scalability were disappointing.
- New ('04) plan: middleware “re-engineering” as part of the LCG program, in collaboration with EGEE.
EGEE-II: a quick description of the project
- EGEE, launched in 2004, already supports 20 applications in six scientific domains (biomedicine, geophysics, quantum chemistry…).
- EGEE brings together scientists and engineers from 90 institutions in over 30 countries worldwide.
- Its goal is to provide a seamless Grid infrastructure for e-Science, available 24 hours/day, 7 days/week.
- Funded by the EU (European Commission).
- Two original scientific fields, HEP and Life Sciences; but it integrates many other fields, from geology to computational chemistry.
- Infrastructure: 30,000 CPUs, 5 PB of storage, 200 sites in 39 countries, 60 Virtual Organizations.
- Maintains 10,000 concurrent jobs on average.
Computing @ CERN
Three Generations of Grid
- First generation:
  - Local “metacomputers”: distributed file systems, site-wide single sign-on.
  - “Metacenters” explore inter-organizational integration.
  - Totally custom-made, top-to-bottom: proofs of concept.
- Second generation (slide annotation: “We are here!”):
  - Utilize software services and communications protocols developed by grid projects: Condor, Globus, UNICORE, Legion, etc.
  - Need significant customization to deliver a complete solution.
  - Interoperability is still very difficult!
- Third generation:
  - Common interface specifications support interoperability of discrete, independently developed services.
  - Competition and interoperability among applications, toolkits, and implementations of key services.
- Standardization is key for third-generation grids!
Source: Charlie Catlett
Grids – Where to?
- The commercial interest in Grid systems and related technologies is increasing.
- Companies such as Sun Microsystems, IBM, Oracle, Intel, Microsoft and HP show particular interest in getting a piece of the $12 billion market predicted by IDC for 2007.
Grids – Where to?
- After 2007, the business popularity of Grid computing is expected to accelerate.
  - In particular, financial services and ERP services are expected to account for a major share of the spending (source: Insight Research Corp.).
[Chart: projected Grid spending, in billions.]
Grids – Where to?
- An interesting prediction (by analysts at the 451 Group) is that grid technology will be slowly absorbed into enterprise fabrics…
- One consequence for grid computing might be that the term grid computing “will become both more relevant and less used […] It will be more relevant as grids are used to support far more than HPC tasks, but less used as vendors seek to be associated with far more activity, and far higher up the stack, than grid computing.” IBM and Oracle could drop “grid” from their products in favour of a broader term, while Microsoft has made it very clear that it will not use the term “grid”.
- In the new era of Grid computing, grids must support automated data, storage and service activities just as capably as they handle computational tasks. These challenges are being addressed by a new paradigm called “Grid 2.0”.
Grids – Where to?
- Grid 1.0: concerned with the virtualization, aggregation and sharing of compute resources.
- Grid 2.0: focused on the virtualization, aggregation and sharing of all compute, storage, network and data resources.
- The key term is “virtualization” (the encapsulation of diverse implementations behind a common interface). It is being driven by the need of various enterprises to create a virtual resource market that allocates resources based on business demand. Virtualization introduces a layer of abstraction: instead of having to snoop out what resources are available and try to adapt a problem to use them, a user can describe a resource environment (a virtual workspace) and expect it to be deployed on the grid. The mapping between the physical resources and the virtual workspace will be handled using virtual machines, virtual appliances, distributed storage facilities and network overlays (“virtual grids”). The promise is that in Grid 2.0, resources will be easier to define, test, install, transport and adjust on demand.
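The "describe a workspace, let the grid map it" idea can be sketched as a placement step. In the C code below, the user states CPU and memory needs, and a first-fit search maps the request onto a physical host, reserving that capacity. The `workspace_req` and `host` types and the first-fit rule are assumptions. Real virtual-workspace services also deal with images, networking and storage, which this sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>

/* A virtual workspace as the user describes it (hypothetical fields). */
typedef struct {
    int cpus;
    int mem_gb;
} workspace_req;

/* A physical host able to run virtual machines. */
typedef struct {
    const char *name;
    int         free_cpus;
    int         free_mem_gb;
} host;

/* Map a workspace onto the first host with enough free capacity
   (first-fit), reserving that capacity. Returns the host or NULL. */
host *deploy_workspace(host *hosts, size_t n, workspace_req req) {
    assert(hosts != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (hosts[i].free_cpus >= req.cpus &&
            hosts[i].free_mem_gb >= req.mem_gb) {
            hosts[i].free_cpus -= req.cpus;
            hosts[i].free_mem_gb -= req.mem_gb;
            return &hosts[i];
        }
    }
    return NULL;   /* no single host can hold the workspace */
}
```

The user never names a host. The request says only what the environment needs, and the mapping layer decides where it lands.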
Web 2.0: Evolution Towards a Read/Write Platform
- Web 1.0 (1993-2003): pretty much HTML pages viewed through a browser.
- Web 2.0 (2003-beyond): Web pages, plus a lot of other “content” shared over the web, with more interactivity; more like an application than a “page”.
Web 1.0 vs. Web 2.0:
- Mode: “read” vs. “write” and contribute.
- Primary unit of content: “page” vs. “post/record”.
- State: “static” vs. “dynamic”.
- Viewed through…: web browser vs. browsers, RSS readers, anything.
- Architecture: “client-server” vs. “Web services”.
- Content created by…: web coders vs. everyone.
- Domain of…: “geeks” vs. “mass amateurization”.
Web 2.0 By Example (Tim O’Reilly)
- Web 1.0: Napster, Britannica Online, Akamai, MP3.com, DoubleClick, content management.
- Web 2.0: Google, Wikipedia, BitTorrent, iTunes or Napster, AdSense, wikis.
Google Earth™: a Mega API for Web 2.0
Illustrates the benefits of SOA and Grid with a Web 2.0 delivery model:
- Distributed, re-usable core services on shared infrastructure
- Shared data
- Exposed interfaces
- Application is streamed to the client and works offline
Google Earth™: a Mega API for Web 2.0
Wikipedia is a Collaborative Encyclopedia Being Edited in Real Time by Anyone
Grid 2.0
- Grid 1.0: virtualization; compute intensive; cycle aggregation; consolidation of resources.
- Emerging Grid 2.0*: SOA; software services with SLA and QoS metrics; virtualized compute, storage, network, data; service oriented; policy-driven automation; distributed across firewalls; parallel, stateless, stateful and transactional apps.
*The 451 Group: “grid 2.0” is focused on the virtualization, aggregation and sharing of all compute, storage, network and data resources. It is both service-oriented and automated.
Virtualization
- Virtualization covers both data (flat files, databases, etc.) and computing resources.
- Grid as workflow virtualization: the Grid computing services are used to execute and manage processes across multiple compute platforms.
- Data Grid as data virtualization: the management of shared collections independently of the remote storage systems where the data is stored.
- Semantic Grid as information virtualization: the ability to reason on inferred attributes from multiple independent information repositories.
- Name space virtualization: logical names for resources, users, files, and metadata that are independent of the name spaces used on the remote resource.
- Trust virtualization: the ability to manage authentication and authorization independently of the remote resource.
- Constraint virtualization: the ability to manage access controls independently of the remote resource.
- Access virtualization: the ability to port an arbitrary access mechanism on top of the Grid middleware. For Data Grids, this is the ability to support access through multiple loadable libraries, Java, digital libraries, workflow actors, Web browsers, etc.
- Network virtualization: the ability to manage transport in the presence of network devices such as firewalls, load levelers and private virtual networks. This typically requires multiple protocols to support client-initiated versus server-initiated I/O, and bulk operations versus single-file operations.
- Latency management: the ability to minimize the number of messages sent over wide area networks. Examples include executing procedures at the remote resource when the complexity (ratio of operations to bytes transmitted) is sufficiently small. The standard case is data filtering or sub-setting.
- Federation: the ability to interoperate across multiple grid environments. This requires the ability to share logical name spaces, and Shibboleth-style authentication.
Grids establish trust mechanisms to allow assertions about the authenticity of an individual to be verified from the “home” Grid.
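Name space virtualization and latency management can be illustrated together with a toy replica catalog. Users refer to a logical file name, and the middleware resolves it to one of several physical copies. In the C sketch below, the catalog layout, the `lfn:`/`gsiftp:`/`srm:` style names and the "lowest latency wins" rule are assumptions for illustration. They are not the interface of any particular Data Grid product.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One catalog entry: a logical name bound to one physical copy. */
typedef struct {
    const char *logical;     /* name users see, independent of storage */
    const char *physical;    /* site-specific location of this replica */
    int         latency_ms;  /* estimated access latency from the client */
} replica;

/* Resolve a logical name to its lowest-latency physical replica,
   or NULL if the name is unknown. */
const char *resolve(const replica *cat, size_t n, const char *logical) {
    const char *best = NULL;
    int best_latency = 0;
    assert((cat != NULL || n == 0) && logical != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(cat[i].logical, logical) != 0)
            continue;
        if (best == NULL || cat[i].latency_ms < best_latency) {
            best = cat[i].physical;
            best_latency = cat[i].latency_ms;
        }
    }
    return best;
}
```

Applications keep using the logical name even when replicas are added, moved or removed. Only the catalog entries change.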
So, are we there yet?
- Will the Grid be available to all of you? Hard to predict…
- Jules Piccard, a professor at the University of Basel, installed the first telephone in the city, around 1880, between his home and his institute. He showed it proudly to other scientists and got the comment: “Looks very good, but I doubt it will ever have any practical use”.
- “The world will only need five computers”, attributed to Thomas J. Watson, IBM.
- “There is absolutely no need for a computer in the home”, attributed to Ken Olsen, DEC (once a leading minicomputer manufacturer).
- “640 kilobytes is all the memory you will ever need”, attributed to Bill Gates, Microsoft.
So, are we there yet?
- The complete success of the Grid hype depends on at least three conditions:
  - The Grid can be considered a success when there are no more “Grid papers”, only a footnote in the work stating, “This work was achieved using the Grid”.
  - The Grid can be considered a success when supercomputer centres don't give users a choice between using their machines and using the Grid; they just use the Grid.
  - The Grid can be considered a success when a SuperComputing demo can be run at any time of the year.
- We are not yet there…
What’s holding us back?
- Organizational politics act very much as a barrier to implementing Grid computing:
  - “server-hugging”: organizations have a sense of ownership over the resources bought or allocated for their use;
  - unrealistic expectations from Grid computing: marketing departments have run amok and have marketed the grid “nirvana”, not the grid that exists and is possible today;
  - perceived loss of control or access over resources;
  - loss or reduction of budget dollars;
  - lack of data security among departments;
  - fear of external data leaks;
  - reduced priority of projects: sometimes users believe that they need dedicated IT resources to complete their work accurately and efficiently;
  - risks associated with enterprise-wide deployment: how do different geographies and cultures come together to agree on global priorities, configurations, standards, and policies?
In the end…
- One of the biggest fears for Grid computing is that it might be seen as today’s sexy technology that will quickly be replaced by tomorrow’s sexy technology.
- Grid researchers and technologists have to start pointing to results and applications that use the Grid to solve problems, or that enable new applications that would have been unachievable without the Grid.
- Contemporary Grid implementations are still far from the initially described vision and from being widely adopted.
Grid computing in pictures
- Thanks to GridCafe (http://gridcafe.web.cern.ch/gridcafe; I strongly recommend that you also visit this link), it is now MOVIE time.
Thank you! Questions? Observations?
Additional slides
Grid characteristics
- Collaboration: Grid is the sharing of resources in a distributed fashion. A Grid spans multiple administrative domains seamlessly.
- Aggregation: a Grid is more than the sum of its parts. A Grid aggregates many resources and therefore combines the capacity of the individual resources into a higher-capacity virtual resource. The capability of individual resources is preserved. As a consequence, from a global standpoint the Grid enables running larger applications faster (aggregated capacity), while from a local standpoint the Grid enables running new applications.
- Virtualization: Grid services are often provided with an interface that hides the complexity of the underlying resources. Virtualization provides an abstract “layer” between clients and resources. Therefore, a Grid provides the ability to virtualize the sum of its parts into a single wide-area programming model.
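Aggregated capacity and preserved individual capability can be told apart with a little arithmetic. In this hypothetical C sketch, `node`, `virtual_resource` and `fits` are illustrative names. The aggregate core count grows with every node. A task that needs more cores than the largest single node still cannot run, and independent tasks must each fit within one node.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int cores; } node;

/* Aggregated view: capacity adds up, individual capability does not. */
typedef struct {
    int total_cores;    /* aggregate capacity of all nodes */
    int largest_node;   /* capability of the biggest single node */
} virtual_resource;

virtual_resource aggregate(const node *nodes, size_t n) {
    virtual_resource v = { 0, 0 };
    assert(nodes != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        v.total_cores += nodes[i].cores;
        if (nodes[i].cores > v.largest_node)
            v.largest_node = nodes[i].cores;
    }
    return v;
}

/* Can `tasks` independent tasks of `cores_per_task` cores each run at once?
   Each task must fit inside one node, so count the per-node slots. */
int fits(const node *nodes, size_t n, int tasks, int cores_per_task) {
    int slots = 0;
    assert(cores_per_task > 0 && (nodes != NULL || n == 0));
    for (size_t i = 0; i < n; i++)
        slots += nodes[i].cores / cores_per_task;
    return slots >= tasks;
}
```

With nodes of 4, 8 and 16 cores, the virtual resource offers 28 cores. Seven 4-core tasks fit, but a single 20-core task does not, because no individual node has that capability.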
Grid characteristics
- Service orientation: Grids provide services, following the concept of a service-oriented architecture. In the widest sense, all large-scale collections of services can be viewed as Grids.
- Heterogeneity: a Grid typically consists of heterogeneous computing resources, i.e. a variety of hardware and software components with different performance and latency characteristics.
- Decentralized control: components are under the control of multiple entities, i.e. the key difficulties in Grids lie exactly in not having a single ‘owner’ of the whole system. One of the requirements of a Grid is the use of distributed control mechanisms.
- Standardization and interoperability: a Grid promotes standard interface definitions for services that need to interoperate to create a general distributed infrastructure that fulfils users’ tasks and provides user-level utilities. The Grid is exposing the need for increased levels of integration of distinct technologies and for increased agreement on the standardization of services. The success of a Grid implementation very much depends on these aspects. Furthermore, the Grid should provide uniform access to heterogeneous resources through virtualization.
Grid characteristics
- Access transparency: the Grid should allow its users to access the computing infrastructure without having to be intimately aware of the underlying architecture or network topology. This is sometimes considered the most distinctive aspect of Grid computing, that is, the level of transparency provided to the end user through the virtualization of resources.
- Scalability: even if Grid implementations and infrastructures sometimes do not solve a new problem, it is often the scale of data, resources and users that contributes to the additional complexity of a Grid.
- Reconfigurability: a Grid should be “dynamically reconfigurable” (CoreGRID definition).
- Security: Grid security is one of the first things that real Grid users have to deal with, and is therefore essential for any Grid software system that spans multiple administrative domains.
- Application support: applications should also be part of the Grid, and the whole Grid environment (where by environment I mean the hardware, middleware, and applications) should be data-driven. In particular, it should be able to react to changes in system and application behavior captured by application and system data.
Grid characteristics
- Computing model: a Grid supports several computational models (e.g. batch, interactive, distributed and parallel computing...).
- Licensing model: since Grids originate from the academic community, there is a global emphasis on open source software, which is also followed by several companies involved in Grid development.
- Procedures and policies: Grid users and service providers interact with each other much as on an open market, where certain rules have to be followed. Therefore, procedures and policies need to be in place to allow for (coordinated) sharing of resources.
- Auditing: tracking the usage of shared resources and providing mechanisms for transferring cost among user communities and for charging for resource use by applications and users.
Comparison of Middleware Technologies (UNICORE, Globus, Legion, Gridbus)
- Focus. UNICORE: high-level programming models; Globus: low-level services; Legion: high-level programming models; Gridbus: abstractions and market models.
- Category. UNICORE: mainly uniform job submission and monitoring; Globus, Legion and Gridbus: generic computational.
- Architecture. UNICORE: vertical multi-tiered system; Globus: layered and modular toolkit; Legion: vertically integrated system; Gridbus: layered component and utility model.
- Implementation model. UNICORE: abstract job object; Globus: hourglass model at system level; Legion: object-oriented metasystem; Gridbus: hourglass model at user level.
- Implementation technologies. UNICORE: Java; Globus: C and Java; Legion: C++; Gridbus: C, Java, C# and Perl.
- Runtime platform. UNICORE, Globus and Legion: Unix; Gridbus: Unix and Windows with .NET.
- Programming environment. UNICORE: workflow environment; Globus: replacement libraries for Unix and C libraries, a special MPI library (MPICH-G), and CoG (Commodity Grid) kits in Java, Python, CORBA, Matlab, Java Server Pages, Perl and Web services; Legion: Legion application programming interfaces (API) and command-line utilities; Gridbus: Broker Java API, XML-based parameter-sweep language, and Grid Thread model via Alchemi.
- Some users and applications. UNICORE: EuroGrid, Grid Interoperability Project, OpenMolGrid and Japanese NAREGI; Globus: AppLeS, Ninf, Nimrod-G, NASA IPG, Condor-G, Gridbus Broker, UK eScience Project, GriPhyN, and EU DataGrid; Legion: NPACI Testbed, Nimrod-L, and NCBioGrid, and it has additionally been used in the study of axially symmetric steady flow and protein folding applications; Gridbus: ePhysics, Belle Analysis Data Grid, NeuroGrid, Natural Language Engineering, HydroGrid, and Amsterdam Private Grid.
Globus Toolkit Components
Globus Common Runtime
- Python Web Services Core: lets developers create WSRF-compliant web services in Python
- C Web Services Core: lets developers create WSRF-compliant web services in C
- Java Web Services Core: WSRF APIs in Java
- C common libraries: C abstraction layer for Globus data types, libc, etc.
- XIO (Extensible I/O):
  - Superset of the basic file I/O library (open/close/read/write, etc.)
  - Supports multiple wire protocols transparently: TCP, UDP, file, HTTP, GSI, GSSAPI_FTP, telnet, queuing
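The transparency promised by XIO rests on a driver abstraction: application code calls one set of I/O operations, and a driver chosen at open time implements them for a particular protocol. The C sketch below shows that idea with a single toy in-memory driver. All names (`io_driver`, `io_read`, `io_write`, `mem_driver`) are hypothetical; the real Globus XIO API (functions prefixed `globus_xio_`) is different and also lets several drivers be stacked on one handle.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical driver interface: every transport (file, TCP, HTTP, ...)
   provides the same operations, so application code never changes. */
typedef struct io_driver {
    const char *name;
    size_t (*write)(void *state, const char *buf, size_t len);
    size_t (*read)(void *state, char *buf, size_t len);
} io_driver;

typedef struct {
    const io_driver *driver;  /* transport chosen at open time */
    void *state;              /* driver-private state */
} io_handle;

/* Toy "memory" transport standing in for a real wire protocol. */
typedef struct {
    char data[256];
    size_t used;  /* bytes written so far */
    size_t pos;   /* next byte to read */
} mem_state;

static size_t mem_write(void *s, const char *buf, size_t len)
{
    mem_state *m = s;
    size_t room = sizeof m->data - m->used;
    size_t n = len < room ? len : room;  /* truncate if the buffer is full */
    memcpy(m->data + m->used, buf, n);
    m->used += n;
    return n;
}

static size_t mem_read(void *s, char *buf, size_t len)
{
    mem_state *m = s;
    size_t avail = m->used - m->pos;
    size_t n = len < avail ? len : avail;
    memcpy(buf, m->data + m->pos, n);
    m->pos += n;
    return n;
}

const io_driver mem_driver = { "memory", mem_write, mem_read };

/* Application-facing calls: identical for every driver. */
size_t io_write(io_handle *h, const char *buf, size_t len)
{
    assert(h != NULL && h->driver != NULL);
    return h->driver->write(h->state, buf, len);
}

size_t io_read(io_handle *h, char *buf, size_t len)
{
    assert(h != NULL && h->driver != NULL);
    return h->driver->read(h->state, buf, len);
}
```

Adding a protocol means supplying another `io_driver` with its own `write` and `read`; callers of `io_write` and `io_read` do not change.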
Globus XIO
- Open the file for reading/writing: inFile = fopen(file, "w+");
- Do some reading: fread(buffer, 1, sizeof(buffer), inFile);
- Do some writing: fprintf(inFile, "%s\n", "HELLO!");
- Close the file: fclose(inFile);
[Figure: Student Workstation (Denton, TX) accessing Disk Storage (Mountain View, CA)]
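The four calls above can be assembled into a runnable C function using the standard library on a local file; XIO's aim is to keep this familiar pattern when the storage is remote. Two details differ from the slide order: mode "w+" truncates the file, so reading first would return nothing, and the C standard requires a positioning call such as `rewind` between a write and a following read on the same stream. The function name and file handling are illustrative assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes a greeting to `path`, then reads it back into `buffer`
   (always NUL-terminated). Returns the number of bytes read, or 0 on error. */
size_t write_then_read(const char *path, char *buffer, size_t size)
{
    assert(buffer != NULL && size > 0);
    FILE *inFile = fopen(path, "w+");   /* create/truncate, read + write */
    if (inFile == NULL) {
        return 0;
    }
    fprintf(inFile, "%s\n", "HELLO!");  /* do some writing */
    rewind(inFile);                     /* required before switching to reading */
    size_t n = fread(buffer, 1, size - 1, inFile);  /* do some reading */
    buffer[n] = '\0';
    fclose(inFile);                     /* close the file */
    return n;
}
```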
Globus Information Services
- WebMDS: lets users view monitoring information about grid resources from a web browser
- Index:
  - Collects monitoring and discovery information from grid resources
  - Publishes the information at a single point so that other resources and people can discover resources
- Trigger:
  - Collects various pieces of data from grid resources
  - Can be configured to perform actions, e.g., when a disk is 80% full, send an email to the administration staff
- Monitoring and Discovery (MDS2):
  - Provides a method to publish and discover resources on the grid
  - Also allows the collection of resource status and configuration information
  - Deprecated component
Globus Indexing Service
[Figure: a resource query is answered by the Globus Index, which aggregates information about computational, archival, database, and storage resources]
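The indexing diagram can be read as a lookup over entries that resources register with the index. In this hypothetical C sketch each entry records a name, one of the four resource categories from the diagram, and an availability flag; a query returns the available entries of the requested category. The entry layout and function names are assumptions, not the Globus Index Service's data model.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Resource categories taken from the indexing diagram. */
typedef enum { COMPUTATIONAL, ARCHIVAL, DATABASE, STORAGE } resource_kind;

/* Hypothetical entry that a resource registers with the index. */
typedef struct {
    const char *name;
    resource_kind kind;
    int available;  /* 1 if the last status update reported it up */
} index_entry;

/* Answers a resource query from the single aggregation point: stores
   pointers to up to `max` matching, available entries in `out` and
   returns how many were found. */
size_t index_query(const index_entry *entries, size_t n, resource_kind kind,
                   const index_entry **out, size_t max)
{
    assert(entries != NULL || n == 0);
    size_t found = 0;
    for (size_t i = 0; i < n && found < max; i++) {
        if (entries[i].kind == kind && entries[i].available) {
            out[found++] = &entries[i];
        }
    }
    return found;
}
```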
Globus Execution Management
- Grid Resource Allocation and Management (GRAM):
  - WSRF-compliant service to submit, monitor, and cancel jobs on grid computing resources
  - Not a scheduler; rather, it communicates with other, local schedulers
- Pre-WS Grid Resource Allocation and Management: same as above, but not a web service
- Community Scheduler Framework:
  - WSRF-compliant meta-scheduler
  - Schedules jobs onto other batch schedulers
  - Supports advance reservations and advanced scheduling policies
- Grid Telecontrol Protocol:
  - WSRF-based protocol for telecontrol, e.g., focusing an electron microscope remotely
- Workspace Management: dynamically creates and manages workspaces
Globus GRAM
[Figure: a user job submission goes through GRAM to geographically disparate computational resources at Sites A to D, each running its own local scheduler (PBS, SGE, LSF, Condor)]
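The GRAM diagram shows one submission interface in front of different local schedulers. A minimal C sketch of that adapter idea maps each scheduler to its usual command-line submit tool (`qsub` for PBS and SGE, `bsub` for LSF, `condor_submit` for Condor). The function name and the string-building approach are hypothetical; GRAM's actual scheduler adapters are more involved, and for Condor the argument is a submit description file rather than a shell script.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Local batch systems shown behind GRAM in the diagram. */
typedef enum { SCHED_PBS, SCHED_SGE, SCHED_LSF, SCHED_CONDOR } local_scheduler;

/* One submission front end, many back ends: builds the submit command for
   the given scheduler into `out`. Returns the snprintf result (the length
   the full command needs), or -1 for an unknown scheduler. */
int build_submit_command(local_scheduler s, const char *job_file,
                         char *out, size_t size)
{
    assert(job_file != NULL && out != NULL);
    switch (s) {
    case SCHED_PBS:
    case SCHED_SGE:
        return snprintf(out, size, "qsub %s", job_file);
    case SCHED_LSF:
        /* bsub reads the job script from standard input */
        return snprintf(out, size, "bsub < %s", job_file);
    case SCHED_CONDOR:
        /* job_file is a Condor submit description file */
        return snprintf(out, size, "condor_submit %s", job_file);
    }
    return -1;
}
```

The user-facing request stays the same; only the adapter knows which local tool a site runs.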
Globus Data Management
- Data Replication:
  - Allows pertinent data to be replicated locally across grid environments
  - Commonly used files are replicated locally to reduce transfer delays
- OGSA-DAI:
  - Exposes data resources such as relational or XML databases on the grid
  - Provides a single point of query for multiple databases
- Reliable File Transfer:
  - Issues the third-party control messages that drive GridFTP transfers
  - Accepts submission of transfer requests
- GridFTP:
  - High-performance, reliable data transfer protocol for high-bandwidth, wide-area networks
  - Used to perform the data transfer test that set the LAN Speed Record
- Replica Location:
  - Allows discovery and registration of data replicas on the grid
  - Maintains the mapping between logical names and target names
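Replica Location's mapping from logical names to target names can be sketched as a small catalog lookup that prefers a nearby copy, which is the point of replicating commonly used files locally. The C sketch below is hypothetical: the entry layout, the function name, and the prefix-based notion of "local" are assumptions, not the Replica Location Service's API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical catalog entry: a logical file name mapped to one physical
   copy (the "target name"). One logical name may have several replicas. */
typedef struct {
    const char *logical;
    const char *target;
} replica_entry;

/* Returns a replica of `logical`, preferring one whose target name starts
   with `local_prefix` (a copy close to the job); falls back to the first
   replica found, or NULL if the file is not registered. */
const char *locate_replica(const replica_entry *catalog, size_t n,
                           const char *logical, const char *local_prefix)
{
    assert((catalog != NULL || n == 0) && logical != NULL && local_prefix != NULL);
    const char *fallback = NULL;
    size_t prefix_len = strlen(local_prefix);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(catalog[i].logical, logical) != 0) {
            continue;  /* a different logical file */
        }
        if (strncmp(catalog[i].target, local_prefix, prefix_len) == 0) {
            return catalog[i].target;  /* nearby copy: use it */
        }
        if (fallback == NULL) {
            fallback = catalog[i].target;
        }
    }
    return fallback;
}
```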
Traditional FTP
- Single client/server connection
- Single data stream
- Limited by computational resources and network bandwidth
- Very inefficient
[Figure: one data channel between a client and a server over Gigabit Ethernet]
Globus GridFTP
[Figure: single server/client versus multiple server/client transfer of a single file between storage systems over Gigabit Ethernet data channels; in the multiple-server case, each machine transfers ¼ of the file]
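The GridFTP diagram has four machines, each moving a quarter of one file. The partitioning step can be sketched in C as splitting the file into contiguous byte ranges, one per server or stream. Contiguous, near-equal ranges are an assumption made for illustration, and the names are hypothetical; GridFTP's own striped and parallel modes have their own data layouts and may assign data differently.

```c
#include <assert.h>
#include <stddef.h>

/* Byte range [offset, offset + length) assigned to one server/stream. */
typedef struct {
    unsigned long long offset;
    unsigned long long length;
} byte_range;

/* Splits a file of `file_size` bytes into `parts` contiguous ranges.
   The first (file_size % parts) ranges get one extra byte so that the
   ranges cover the whole file with no gaps or overlaps. */
void split_file(unsigned long long file_size, size_t parts, byte_range *out)
{
    assert(parts > 0 && out != NULL);
    unsigned long long base = file_size / parts;
    unsigned long long extra = file_size % parts;
    unsigned long long offset = 0;
    for (size_t i = 0; i < parts; i++) {
        unsigned long long len = base + (i < extra ? 1 : 0);
        out[i].offset = offset;
        out[i].length = len;
        offset += len;
    }
}
```

Each range can then be sent over its own data channel, so throughput is no longer bound by a single stream as in traditional FTP.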
Globus Security
- Community Authorization:
  - Allows a virtual organization to express policy regarding resources across sites
  - Lets access to resources be granted and revoked at the VO level, on top of local authorization
- Delegation:
  - Allows a single credential to be shared across multiple service invocations
  - Example: a user who needs to submit multiple jobs can use the same certificate for each one
- Authentication/Authorization:
  - Message- and transport-level security: SSL/TLS with X.509 certificates for message traffic
  - Authorization framework: provides several authorization mechanisms (gridmap, SAML, NIS, PAM, LDAP)
- Pre-WS Authentication/Authorization
- Credential Management:
  - SimpleCA: simplified certificate authority
  - MyProxy: online credential repository for X.509 proxy credentials
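The gridmap authorization mechanism maps a certificate subject (distinguished name, DN) to a local account, one mapping per line of a grid-mapfile, written as a quoted DN followed by the account name. Below is a minimal C sketch of parsing one such line under that format assumption; it handles only the first account and no escape sequences, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Parses one grid-mapfile-style line:
       "/O=Grid/OU=Example/CN=Jane Doe" jdoe
   Copies the quoted certificate subject (DN) into `dn` and the local account
   into `user`. Returns 1 on success, 0 if the line is malformed or a field
   does not fit in its buffer. */
int parse_gridmap_line(const char *line, char *dn, size_t dn_size,
                       char *user, size_t user_size)
{
    assert(line != NULL && dn != NULL && user != NULL);
    const char *start = strchr(line, '"');
    if (start == NULL) return 0;
    start++;                                /* skip opening quote */
    const char *end = strchr(start, '"');
    if (end == NULL) return 0;
    size_t dn_len = (size_t)(end - start);
    if (dn_len + 1 > dn_size) return 0;
    memcpy(dn, start, dn_len);
    dn[dn_len] = '\0';

    const char *u = end + 1;
    while (*u == ' ' || *u == '\t') u++;    /* skip separator */
    size_t user_len = strcspn(u, " \t\r\n,");  /* first account only */
    if (user_len == 0 || user_len + 1 > user_size) return 0;
    memcpy(user, u, user_len);
    user[user_len] = '\0';
    return 1;
}
```

An authorization check would then look up the DN from the user's certificate and run the request under the mapped local account.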
Globus Authentication
[Figure: a student login (user authentication) passes through Globus Authentication, which relies on site-local mechanisms: NIS at Site A (Washington, DC), LDAP at Site B (UNT), PAM or custom authentication at Site C, plus a credential repository]
[Figure: grid service capabilities in eight groups (Execution Management, Data, Resource Management, Security, Self-Management, Information, Context, and Infrastructure Services). Capabilities shown include execution planning, workflow, workload, and job management, reservation, configuration, deployment, provisioning, resource management, replication, transfer, access, integration, VO and policy management, event, discovery, logging, monitoring, authentication, authorization, integrity, boundary traversal, heterogeneity management, optimization, service-level attainment, QoS management, and naming; the infrastructure layer is built on WSRF, WSN, and WSDM]
CERN
CERN