- Number of slides: 41
Web Service Grids for iSERVO
International Workshop on Geodynamics: Observation, Modeling and Computer Simulation
University of Tokyo, Japan, October 14 2004
Geoffrey Fox, Community Grids Lab, Indiana University
gcf@indiana.edu
e-Infrastructure
e-Infrastructure builds on the inevitable increasing performance of networks and computers, linking them together to support new flexible linkages between computers, data systems and people
• Grids and peer-to-peer networks are the technologies that build e-Infrastructure
• e-Infrastructure is called CyberInfrastructure in the USA
We imagine a sea of conventional local or global connections supported by the “ordinary Internet”
• Phones, web page accesses, plane trips, hallway conversations
• Conventional Internet technology manages billions of low multiplicity (one client to server) or broadcast links
On this we superimpose high value multi-way organizations (linkages) supported by Grids with optimized resources and system support, and supporting virtual (electronic) enterprises
• Low multiplicity fully interactive real-time sessions
• Resources such as databases supporting (larger) communities
Web services • Web Services build loosely-coupled, distributed applications, (wrapping existing codes and databases) based on the SOA (service oriented architecture) principles. • Web Services interact by exchanging messages in SOAP format • The contracts for the message exchanges that implement those interactions are described via WSDL interfaces.
What is a Grid?
• You won’t find a clear description of what a Grid is and how it differs from a collection of Web Services
– I see no essential reason that Grid Services have different requirements than Web Services
– Geoffrey Fox, David Walker, e-Science Gap Analysis, June 30 2003. Report UKeS-2003-01, http://www.nesc.ac.uk/technical_papers/UKeS-2003-01/index.html
– Notice “service-building model” is like programming language – very personal!
• Grids were once defined as “Internet Scale Distributed Computing” but this isn’t good, as Grids depend on data as much as, if not more than, on simulations
• So Grids can be termed “Internet Scale Distributed Services” and represent a way of collecting services together to solve problems where special features and quality of service are needed.
Community Resources Grid Community databases have analogy to Television and the News Web that allow individuals to communicate instantly with each other via Web Pages and Headline News acting as proxies N resources deposit information and N can view – Complexity O(N)
Large and Small Grids
N resources in a community (N is billions for the world and 1000-10000 for many scientific fields)
Communities are arranged hierarchically with real work being done in “groups” of M resources – M could be 10-100 in e-Science
Metcalfe’s law: value of a network grows like the square of the number of nodes M – we call Grids where this is true Metcalfe or M² Grids
Nature of interaction depends on size of M or N
• Shared Information O(N) Complexity Grids for largish N
• Complexity M² Metcalfe Grids for smaller M < N
Grids must merge with peer-to-peer networks to support both Complexity O(N) and M² systems
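A minimal sketch of the two complexity regimes named on this slide. The function names and the exact link counts (2N for deposit-plus-view through a shared repository, M(M-1)/2 for all pairs) are illustrative assumptions; the slide only states the O(N) and M-squared scalings.

```python
def shared_information_links(n: int) -> int:
    """Assumed model: N resources deposit and N view via a shared store, so about 2N links, O(N)."""
    return 2 * n


def metcalfe_links(m: int) -> int:
    """Assumed model: every pair among M members interacts directly, M(M-1)/2 links, O(M^2)."""
    return m * (m - 1) // 2


if __name__ == "__main__":
    # Community sizes and group sizes taken from the ranges quoted on the slide.
    for n in (1000, 10000):
        print(f"community N={n}: {shared_information_links(n)} shared-information links")
    for m in (10, 100):
        print(f"group M={m}: {metcalfe_links(m)} pairwise links")
```

Under these assumptions a closely knit group of M = 100 already needs more pairwise links than a shared-information community of N = 1000, which is why the two cases call for different support.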
M² Interactions
• Superimpose M² “Grids” on the sea (heatbath) of O(N) “ordinary” interactions
[Figure: SERVOGrid, Geoscience Research and Education Grids. Labels: Repositories and Federated Databases; Sensors and Streaming Data; Field Trip Data; Database Grid; Sensor Grid; Compute Grid; Data Filter Services; Research Simulations; GIS; Discovery Services; Grid Services; Analysis and Visualization; Portal; Customization Services; Research Grid; Education Grid; Computer Farm; From Research to Education]
Grids and Earthquake Science
• Complexity N ≈ 1000 to 10000 Community resources building
– Thousands of Data Servers of raw and curated data
– Services filtering and mining data
– Simulation Services
– Visualization Services
– Geographical Information Services
– Registry and metadata Services
• These services can support several communities
– National and International earth science researchers
– Emergency response and critical infrastructure planning and management
• Web Services will harmonize different countries (SERVO to iSERVO)
• Web Services will harmonize members of a community and between communities with common resources
– Curation will bring data to interoperable certified form
• National and International research collaborations analyzing particular ideas with many M² Complexity Grids
– Typically many closely knit groups of say around M = 10-100 people and services
(i)SERVO Web (Grid) Services
• Programs: All applications wrapped as Services using proxy strategy
• Job Submission: supports remote batch and shell invocations
– Used to execute simulation codes (VC suite, GeoFEST, etc.), mesh generation (Akira/Apollo) and visualization packages (RIVA, GMT).
• File management:
– Uploading, downloading, backend crossloading (i.e. move files between remote servers)
– Remote copies, renames, etc.
• Job monitoring
• Workflow: Apache Ant-based remote service orchestration (NCSA)
– Move towards a BPEL framework (can still implement with ANT)
• Database services: support SQL queries
– Expect simpler version of OGSA-DAI (“Web Service-DAI”) Grid Database
• Data services: support interactions with XML-based fault and surface observation data.
– For simulation generated faults (i.e. from Simplex)
– XML data model being adopted for common formats with translation services to “legacy” formats.
– Migrating to Geography Markup Language (GML) descriptions.
Integration of Services
Use OGCE Grid Portal Architecture to allow importing of existing Grid Services and their user interfaces
Can expect GGF activities like OGSA to define/refine interfaces, and projects around the world to produce more powerful services which can easily be added, replacing existing services
Geoscience Education Grid by transformations on research grid
Emergency Response and Planning Grids by adding real-time control/collaboration and GIS tools
• These additions common to all crises
[Figure: Service-1 to Service-N, each with its own GUI-1 to GUI-N, combined by an Aggregation Portal]
Each Service has its own portlet Individual portlet for the Proxy Manager Use tabs or choose different portlets to navigate through interfaces to different services 2 Other Portlets OGCE Consortium
Key Grid Features of iSERVO
• The service model avoids a lot of the security complications that have caused trouble in other simulation based Grids
– We don’t support general computer logins from the portal – you can run GeoFEST and not rm -r *
• Geographical Information Systems are a key set of generally useful services
• Currently largely file based but streams will become more important
– Data moves directly between services and is not necessarily written to and read from files
– Must support high performance (fast) streams
[Figure: file based versus stream based linkage through a Filter Service]
Data Deluged Science Computing Architecture
[Figure labels: Data; Filter; Grid; Data Assimilation; Analysis; Control; Visualize; HPC Simulation; OGSA-DAI Grid Services; Other Grid and Web Services; “Distributed Filters massage data for simulation”]
Which is the better use of money: more compute nodes or more sensors?
Geographical Information Service (GIS) Data Formats and Services
OpenGIS Consortium (OGC) is an international group for defining GIS data formats and services.
Main data format language is the XML-based GML.
• Subdivided into schemas for drawing maps, representing features, observations, …
First Step: design GML schemas and build specialized Web Services for GPS and Earthquake data.
OGC also defines services.
• Services include Web Feature Services, Web Map Services, …
Next Step: Implement OGC compatible Web Services for this problem, i.e. build a GIS Grid
• Also build services to interact with QuakeTables Fault DB.
QuakeTables + OGC Web Map Service Demo
Intend to build OGC compatible map and feature services supporting high performance simulations
Grid Information Service Integrating GIS Web and Feature Services
• Need to support dynamic feature services with different access restrictions (especially in iSERVO) and with high performance streams
[Figure labels: WMS; UDDI IS; WFS california fault data; california river data; WFS california boundary data; @gridnode1, @gridnode2, @gridnode3]
Different Performance Issues for iSERVO
• All systems are built of interlinked entities
– Nature, Society, Grids and Parallel computing all link entities by messages
• Most (all) complex systems have a hierarchical architecture
– Grids link large macroscopic systems including sensors, databases, parallel computers
– Parallel Computers consist of many desktop size nodes
– Nodes have hierarchical memory structure with many cache levels
• Systems have dimension d ≈ 2 to 3
• Communication bandwidth into a system of complexity C is proportional to C^(1-1/d) (Bandwidth/C ∝ C^(-1/d))
– C(Grid Resource) = M C(Desktop) where M ≈ 1 to 1000 is typical number of nodes in simulation resource
• Parallel Computers need gigabit or better internal node bandwidth and node to node latency of around a microsecond
• Grids will have terabit bandwidth but latency is AT BEST a millisecond (nodes next to each other) and is better considered as 100 milliseconds or greater across countries
– Need to improve Web Service technology as science needs more bandwidth than business!
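The bandwidth scaling above can be worked through for the slide's own numbers. This is a sketch under the stated proportionality only; the symbol B for bandwidth and the choice of M = 1000 as the worked case are assumptions made for illustration.

```latex
% Bandwidth B into a system of complexity C with dimension d (surface-to-volume scaling)
B(C) \propto C^{\,1 - 1/d}
\qquad\Longrightarrow\qquad
\frac{B(C)}{C} \propto C^{-1/d}

% Grid resource built from M desktop-sized nodes: C_{\mathrm{grid}} = M\, C_{\mathrm{desktop}}
\frac{\left(B/C\right)_{\mathrm{grid}}}{\left(B/C\right)_{\mathrm{desktop}}} = M^{-1/d}

% Worked case M = 1000:
%   d = 3:\; 1000^{-1/3} = 0.1
%   d = 2:\; 1000^{-1/2} \approx 0.032
```

So a resource 1000 times more complex needs only 10 to 30 times less bandwidth per unit of complexity, not 1000 times less, which is consistent with the slide's point that science Grids still demand high bandwidth.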
Two ways of Linking Modules
• Method based linkage of classic programming: Module A and Module B linked by method calls, 0.001 to 1 millisecond
• Message based Grid and Service linkage: Service A and Service B linked by messages, 0.1 to 1000 millisecond latency
Grid Programming Model
All SERVOGrid capabilities are built as Web Services (WS 1, WS 2, WS 3, WS 4) on the basic WS-* infrastructure, with a 3 level programming model
• Level 1 programming: Application – Fortran, C++, Java (method based)
• Level 2 “programming”: Application Semantics (Metadata, Ontology) and Systems Metadata (Context, State) – Semantic Web (message based)
• Level 3 programming: Workflow of Services AND Streams – BPEL, HPSearch (message based)
What is a Simple Service?
• Take any system – it has multiple functionalities
– We can implement each functionality as an independent distributed service
– Or we can bundle multiple functionalities in a single service
• Whether a functionality is an independent service or one of many method calls into a “glob of software”, we can always expose it as a Web service by converting its interface to WSDL
• Simple services are obtained by taking functionalities and making them as small as possible subject to the “rule of millisecond”
– Distributed services incur messaging overhead of one (local) to 100’s (far apart) of milliseconds when using a message rather than a method call
– Use compiled integration of functionalities ONLY when <1 millisecond interaction latency is required
– Latency, not bandwidth, is the criterion
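The “rule of millisecond” can be sketched as a small decision helper. The function and constant names are hypothetical; only the 1 millisecond threshold and the method-versus-message choice come from the slide.

```python
RULE_OF_MILLISECOND_MS = 1.0  # threshold stated on the slide


def choose_linkage(required_latency_ms: float) -> str:
    """Pick the linkage style for one functionality.

    Returns "method" (compiled integration inside one service) only when the
    interaction must complete in under one millisecond; otherwise returns
    "message" (an independent distributed service).
    """
    if required_latency_ms < RULE_OF_MILLISECOND_MS:
        return "method"
    return "message"


if __name__ == "__main__":
    # Illustrative latency requirements, not measurements.
    for need in (0.01, 1.0, 100.0):
        print(f"needs {need} ms -> link by {choose_linkage(need)}")
```

Note that the helper takes only latency as input, reflecting the slide's statement that latency rather than bandwidth is the criterion.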
Grids of Simple Services
• Link via methods, messages, streams
• Services and Grids are linked by messages
• Internally to a service, functionalities are linked by methods
• A simple service is the smallest Grid
• We are familiar with the method-linked hierarchy: Lines of Code, Methods, Objects, Programs, Packages
[Figure labels: Methods; CPUs; Services; Clusters; MPPs; Databases; Sensor; Federated Databases; Sensor Nets; Component Grids; Compute Resource Grids; Data Resource Grids; Overlay and Compose Grids of Grids]
Component Grids? • So we build collections of Web Services which we package as component Grids – Visualization Grid – Sensor Grid – Utility Computing Grid – Person (Community) Grid – Earthquake Simulation Grid – Control Room Grid – Crisis Management Grid • We build bigger Grids by composing component Grids using the Service Internet and Service Programming
Critical Infrastructure (CI) Grids built as Grids of Services
[Figure labels: Earthquake CIGrid; Electricity CIGrid; Flood CIGrid; Earthquake Services; Flood Services and Filters; Collaboration Grid; Sensor Grid; GIS Grid; Visualization Grid; Compute Grid; Core Grid Services; Registry; Security; Portals; Data Access/Storage; Notification; Workflow; Metadata; Messaging; Physical Network]
iSERVO Strategy
• Agree on what (type of) resources and capabilities need to be put on the iSERVO Grid
– Computers, instruments, databases, visualization, maps, job submittal …
• Agree on interfaces to resources, from OGSA-DAI (databases) to particular data structures (GML/OpenGIS) – specify in XML
• Implement Resources and Capabilities as Services
– User Interface should be a portlet that can be integrated by the portal into web interface
• Make certain overarching Grid capabilities such as workflow, federation and metadata are sufficient
• SERVO Grid is a prototype of this strategy using several US sites rather than several countries
– Can be naturally extended to iSERVO, education, emergency response by extending resources
• Web Service Architecture ensures continued interoperability and extensibility
Further iSERVO Challenges
• Make everything a Service
• Understand algorithms and implementation for data assimilation
• Agree on security and access control policies
• Think about Data Curation
– Set up policies for observational data and criteria for inclusion in iSERVO data repositories
• Think about Data Provenance
– Generate and maintain metadata describing ownership, origins and transformations
– Applies to both “experimental data” and results from simulations (visualizations)
• Curation and Provenance are a change in research methodologies and require funding!
• Education and Emergency Response/Planning are interesting offshoots of iSERVO
Architecture of (Web Service) Grids
Grids are built from Web Services communicating through an overlay network built in SOFTWARE on the “ordinary internet” at the application level
• A new Internet built with SOAP messages replacing TCP packets
Grids provide the special quality of service (security, performance, fault-tolerance) and customized services needed for “distributed complex enterprises”
• Developing Web Service compatible high bandwidth streaming transports
We need to work with the Web Service community as they debate the 60 or so proposed Web Service specifications
• Use Web Service Interoperability WS-I as “best practice”
• Must add further specifications to support high performance
– Database “Grid Services” for N plus N case
– Streaming support for M² case
Importance of SOAP
• SOAP defines a very obvious message structure with a header and a body
• The header contains information used by the “Internet operating system”
– Destination, Source, Routing, Context, Sequence Number …
• The message body is only used by the application and will never be looked at by “operating system” except to encrypt, compress it etc.
• Much discussion in field revolves around what is in header!
– e.g. WSRF adds a lot to header
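A minimal sketch of the header/body split using Python's standard `xml.etree.ElementTree`. The SOAP 1.1 envelope namespace is real; the `urn:example:header` namespace and the element names `Destination`, `Source`, `SequenceNumber` and `Payload` are hypothetical stand-ins for the kinds of field the slide lists, not elements of any WS-* specification.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
EX_NS = "urn:example:header"  # hypothetical namespace, for illustration only


def build_envelope(destination: str, source: str, sequence_number: int, payload: str) -> ET.Element:
    """Build an envelope whose header carries routing data and whose body carries the payload."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    header = ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
    fields = (
        ("Destination", destination),
        ("Source", source),
        ("SequenceNumber", str(sequence_number)),
    )
    for name, value in fields:
        ET.SubElement(header, f"{{{EX_NS}}}{name}").text = value
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    ET.SubElement(body, f"{{{EX_NS}}}Payload").text = payload
    return envelope


def header_fields(envelope: ET.Element) -> dict:
    """What the 'operating system' layer reads: header entries only, never the body."""
    header = envelope.find(f"{{{SOAP_NS}}}Header")
    return {child.tag.split("}")[1]: child.text for child in header}


if __name__ == "__main__":
    env = build_envelope("serviceB", "serviceA", 7, "fault data")
    print(header_fields(env))
    print(ET.tostring(env, encoding="unicode"))
```

The routing helper never touches the `Body` element, mirroring the slide's separation between infrastructure and application concerns.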
Web Services
• Java is very powerful partly due to its many “frameworks” that generalize libraries, e.g.
– Java Media Framework
– Java Database Connectivity JDBC
• Web Services have a corresponding collection of specifications that represent critical features of the distributed operating systems for “Grids of Simple Services”
– Some 60 active WS-* specifications for areas such as
– a. Core Infrastructure Specifications
– b. Service Discovery
– c. Security
– d. Messaging
– e. Notification
– f. Workflow and Coordination
– g. Characteristics
– h. Metadata and State
– i. User Interfaces
WS-I Interoperability
• Critical underpinning of Grids and Web Services is the gradually growing set of specifications in the Web Service Interoperability Profiles
• Web Services Interoperability (WS-I) Interoperability Profile 1.0a (http://www.ws-i.org) gives us XSD, WSDL 1.1, SOAP 1.1, UDDI in the basic profile and parts of WS-Security in their first security profile.
• We imagine the “60 Specifications” being checked out and evolved in the cauldron of the real world, and occasionally best practice identifies a new specification to be added to WS-I, which gradually increases in scope
– Note only 4.5 out of 60 specifications have “made it” in this definition
Web Services Grids and WS-I+
• WS-I Interoperability doesn’t cover all the capabilities needed to support Grids
• WS-I+ is designed as a minimal extension of WS-I to support “most current” Grids: it adds support for
– Enhanced SOAP Addressing (WS-Addressing)
– Fault tolerant (reliable) messaging
– Workflow as in IBM-Microsoft standard BPEL
• Security and Notification best practice and support will probably get added soon
– There are Web Service frameworks here but various IBM v Microsoft v Globus differences to be resolved
• UK OMII Open Middleware Infrastructure Institute is adopting this approach to support UK e-Science program
– http://www.omii.ac.uk/
Layered Architecture for Web Services and Grids
• Higher Level Services: Application Specific Grids; Generally Useful Services and Grids; Workflow WSFL/BPEL
• Service Context: Service Management (“Context etc.”); Service Discovery (UDDI) / Information
• Service Internet: Service Internet Transport Protocol; Service Interfaces WSDL
• Base Hosting Environment
• Bit level Internet (OSI Stack):
– Protocol: HTTP FTP DNS …
– Presentation: XDR …
– Session: SSH …
– Transport: TCP UDP …
– Network: IP …
– Data Link / Physical
Working up from the Bottom
We have the classic (CISCO, Juniper …) Internet routing the flood of ordinary packets in OSI stack architecture
Web Services build the “Service Internet” or IOI (Internet on Internet) with
• Routing via WS-Addressing not IP header
• Fault Tolerance (WS-RM not TCP)
• Security (WS-Security/SecureConversation not IPSec/SSL)
• Information Services (UDDI/WS-Context not DNS/Configuration files)
• At message/web service level and not packet/IP address level
Software-based Service Internet is possible as computers are “fast”
Familiar from peer-to-peer networks and built as a software overlay network defining the Grid (analogy is VPN)
SOAP Header contains all information needed for the “Service Internet” (Grid Operating System), with SOAP Body containing information for the Grid application service
NaradaBrokering
[Figure: NaradaBrokering broker network. Labels: Computer; Minicomputer; Workstation; Laptop computer; PDA; Modem; Server; Peers; Audio/Video Conferencing Client; Web Service B; Queues; Firewall; Stream; Server-enhanced Messaging]
NB supports messages and streams
NaradaBrokering and IOI
• “Software Overlay Network” features
• Support for multiple transport protocols
• Support for multiple delivery mechanisms
– Reliable Delivery
– Exactly-once Delivery
– Ordered Delivery
– Optional Delivery optimization modules for different modes
• Compression/Decompression of payloads with optional module
• Coalescing/Fragmentation of payloads with optional module
• NTP Time Service
• Security Service
• Performance Monitoring
• Performance optimized routing with optional module
• Support for WS-Reliability, WS-ReliableMessaging and their Federation
Performance Monitoring
Every broker incorporates a Monitoring Service that monitors links originating from the node.
Every link measures and exposes a set of metrics
• Average delays, jitters, loss rates, throughput.
Individual links can disable measurements for individual metrics or for the entire set. Measurement intervals can also be varied.
The Monitoring Service returns measured metrics to the Performance Aggregator.
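A minimal sketch of the four per-link metrics named above. This is not the NaradaBrokering API: the function name, its arguments, and the jitter definition (mean absolute difference between consecutive delays) are all assumptions chosen for illustration.

```python
from statistics import mean


def link_metrics(delays_ms, messages_sent, window_s, payload_bytes):
    """Summarise one link over one measurement interval.

    delays_ms:     delays of the messages that arrived, in arrival order
    messages_sent: how many messages were sent during the interval
    window_s:      length of the measurement interval in seconds
    payload_bytes: size of each message (assumed constant for simplicity)
    """
    received = len(delays_ms)
    # Jitter here is the mean change in delay between consecutive messages (assumed definition).
    deltas = [abs(b - a) for a, b in zip(delays_ms, delays_ms[1:])]
    return {
        "average_delay_ms": mean(delays_ms) if delays_ms else None,
        "jitter_ms": mean(deltas) if deltas else 0.0,
        "loss_rate": 1 - received / messages_sent if messages_sent else 0.0,
        "throughput_bytes_per_s": received * payload_bytes / window_s,
    }


if __name__ == "__main__":
    # Illustrative numbers only, not measurements.
    print(link_metrics([10.0, 12.0, 11.0], messages_sent=4, window_s=2.0, payload_bytes=500))
```

A monitoring service in this style would call the helper once per interval and forward the resulting dictionary to an aggregator.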
Fast Web Service Communication I
• IOI Application level Internet allows one to optimize message streams at the cost of “startup time”. Web Services can deliver the fastest possible interconnections with or without reliable messaging
• Typical results from Grossman (UIC) comparing slow SOAP over TCP with binary and UDP transport (latter gains a factor of 1000)
Pure SOAP 7020 SOAP over UDP Binary over UDP 5.60
Fast Web Service Communication II
• Mechanism only works for streams – sets of related messages
• SOAP header in streams is constant except for sequence number (Message ID), time-stamp …
• One needs two types of new Web Service Specification
• “WS-StreamNegotiation” to define how one can use WS-Policy to send messages at start of a stream to define the methodology for treating remaining messages in stream
• “WS-FlexibleRepresentation” to define new encodings of messages
Fast Web Service Communication III
• Then use “WS-StreamNegotiation” to negotiate stream in Tortoise SOAP – ASCII XML over HTTP and TCP
– Deposit basic SOAP header through connection – it is part of context for stream (linking of 2 services)
– Agree on firewall penetration, reliability mechanism, binary representation and fast transport protocol
– Naturally transport UDP plus WS-RM
• Use “WS-FlexibleRepresentation” to define encoding of a Fast transport (on a different port) with messages just having “FlexibleRepresentationContextToken”, Sequence Number, Time stamp if needed
– RTP packets have essentially this structure
– Could add stream termination status
• Can monitor and control with original negotiation stream
• Can generate different streams optimized for different end-points
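A minimal sketch of the compact per-message framing described above, using Python's standard `struct` module. The field widths (32-bit context token, 64-bit sequence number, 64-bit floating-point timestamp) and the function names are assumptions; the proposed specifications are only named on the slides, not defined.

```python
import struct

# Assumed layout, network byte order: uint32 context token, uint64 sequence number, float64 timestamp.
FAST_HEADER = struct.Struct("!IQd")


def encode_fast_message(context_token: int, sequence_number: int, timestamp: float, payload: bytes) -> bytes:
    """Frame one stream message: a fixed 20-byte header followed by the binary payload.

    The context token stands in for the full SOAP header that was deposited
    once during the (slow, XML-based) negotiation phase.
    """
    return FAST_HEADER.pack(context_token, sequence_number, timestamp) + payload


def decode_fast_message(packet: bytes):
    """Split a framed message back into (context_token, sequence_number, timestamp, payload)."""
    token, seq, ts = FAST_HEADER.unpack_from(packet)
    return token, seq, ts, packet[FAST_HEADER.size:]


if __name__ == "__main__":
    packet = encode_fast_message(42, 1, 1097712000.5, b"gps-sample")
    print(len(packet), decode_fast_message(packet))
```

Every message in the stream then carries 20 bytes of framing instead of a repeated XML header, which is the saving the negotiation step is meant to buy.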
IU SERVO Grid Contributions
• NaradaBrokering provides streaming support
– Fault Tolerance
– Support for High Performance Streams
– Basic Dynamic Information Environment
– Notification
• Good progress with GIS Grid with OGC compatible Web Map and Web Feature Services linked to pervasive Grid Information and workflow services
• HPSearch provides programming and management model for streams and services
– Supports multi-scale iterations (moving between different models implemented as different services), workflow and data assimilation