  • Number of slides: 35

Data Grid Services and Pipelines
Arun Jagatheesan
Architect & Technical Lead, SDSC Matrix
arun@sdsc.edu
NPACI Summer Computing Institute
August 18, 2003, San Diego
National Partnership for Advanced Computational Infrastructure
University of Florida
San Diego Supercomputer Center

Credit / Acknowledgements
  • Participants
    • Allen Ding
    • Lucas Gilbert
    • Reena Mathew
    • Erik Vandiekieft (IBM)
    • Xi Cynthia Sheng
  • Well Wishers
    • Reagan Moore & SRB Team
    • Kim Baldridge
    • YOU!
  • Sponsors
    • NSF GriPhyN, NSF SCEC, NPACI REU, NIH BIRN

Lecture Outline
  • Concepts
    • Distributed Data Management
    • Process Flow Pipelines
    • Web Services; Grid Services
  • Theory
    • Data Grid Language (DGL)
  • Practice (Hands-on)
    • SDSC Matrix
    • Web Demo
    • Matrix Java API

Grid as Utility Computing

Logical Layers (bits, data, information, ...)
  • Semantic data organization (with behavior): myActiveNeuroCollection, patientRecordsCollection
  • Virtual data transparency: image.cgi, image.wsdl, image.sql
  • Data replica transparency: image_0.jpg … image_100.jpg
  • Data identifier transparency: E:\srbVault\image.jpg, /users/srbVault/image.jpg, Select … from srb.mdas.td where ...
  • Storage location transparency
  • Storage resource transparency
Side label: Interorganizational Information Storage Management

Is that all? We need more
Hey, who is this guy?

Data Discovery
  • New data updates relationships among data in collections
  • Services are invoked to analyze new relationships
  • Applications get notified of state updates
  • Diagram labels: Digital entities, Meta-data, Services, DGMS, State

Distributed Data Management
  • Data collecting
    • Sensor systems, object ring buffers
  • Data organization
    • Collections, manage data context
  • Data sharing
    • Data grids, manage heterogeneity
  • Data publication
    • Digital libraries, support discovery
  • Data preservation
    • Persistent archives, manage technology evolution
  • Data analysis
    • Processing pipelines, choreograph data and knowledge extraction
  • Data mediation
    • Semantic data, mappings between data, information, knowledge
Services, data flow pipeline management

Data process-flow pipelines
  • Coordinated execution amongst flows
  • Diagram labels: Input, Compute, Research, Archive, Digital Library

Web Services
  • Web Page (HTML)
    • Searched and used by human beings
    • Any computer
    • Useful for dissemination of information on any topic
    • HTML – describes data layout
    • HTTP – transports data
    • Google – discovers data
  • Web Service
    • Searched and used by computer programs
    • Any programming language, OS, etc.
    • Useful for dissemination of services for any topic
    • XML/WSDL – Web Service description
    • SOAP (HTTP/SMTP) – transport/access
    • UDDI – discovery

Lecture Outline
  • Concepts
    • Distributed Data Management
    • Process Flow Pipelines
    • Web Services; Grid Services
  • Theory
    • Data Grid Language (DGL)
  • Practice (Hands-on)
    • SDSC Matrix
    • Web Demo
    • Matrix Java API

Need for Standard DGL
  • SQL: DDL, DML, DQL; used with a database (DBMS)
  • DGL: XML based, invoke operations, subset of XQuery, process flow; used with a DGMS
  • Diagram labels: Hits.sql, 121.Event, Thit.xml, University of Gators, National Lab

Data Grid Language
  • XML-based asynchronous protocol
  • Describes data sets, collections, datagrid operations, ...
  • Accesses and manages data grids and data-flow pipelines
  • Queries data resources (based on W3C XQuery)
  • Facilitates Grid workflow
    • Sharing of granular state information about the execution of each datagrid operation amongst different processes or services

Data Grid Request (DReq)
  • Datagrid Request
    • Asynchronous requests for data/process-flow in datagrids
    • Requests are either a Transaction or a Status Query
      • Each Transaction consists of one or more Flows
      • Each Flow consists of one or more datagrid operations
      • Datagrid operation = data transformation or data query
      • A flow can be executed sequentially or in parallel
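The request structure on this slide can be sketched as a small Java data model. This is an illustration only: the class names (`DataGridRequestSketch`, `Transaction`, `StatusQuery`, `Flow`, `Operation`) and the operation names are assumptions made for the example, not the Matrix Java API or the DGL schema.

```java
import java.util.List;

// Illustrative model only: every name here is hypothetical,
// not the Matrix Java API or the DGL schema.
class DataGridRequestSketch {

    enum FlowType { SEQUENTIAL, PARALLEL }

    enum OperationKind { TRANSFORMATION, QUERY }

    /** A request is either a Transaction or a Status Query. */
    interface Request { }

    /** One datagrid operation: a data transformation or a data query. */
    static final class Operation {
        final String name;
        final OperationKind kind;

        Operation(String name, OperationKind kind) {
            this.name = name;
            this.kind = kind;
        }
    }

    /** A flow holds one or more operations, run sequentially or in parallel. */
    static final class Flow {
        final FlowType type;
        final List<Operation> operations;

        Flow(FlowType type, List<Operation> operations) {
            if (operations.isEmpty()) {
                throw new IllegalArgumentException("a flow needs at least one operation");
            }
            this.type = type;
            this.operations = List.copyOf(operations);
        }
    }

    /** A transaction holds one or more flows. */
    static final class Transaction implements Request {
        final List<Flow> flows;

        Transaction(List<Flow> flows) {
            if (flows.isEmpty()) {
                throw new IllegalArgumentException("a transaction needs at least one flow");
            }
            this.flows = List.copyOf(flows);
        }

        int operationCount() {
            int count = 0;
            for (Flow flow : flows) {
                count += flow.operations.size();
            }
            return count;
        }
    }

    /** A status query refers to an earlier transaction by its ID. */
    static final class StatusQuery implements Request {
        final String transactionId;

        StatusQuery(String transactionId) {
            this.transactionId = transactionId;
        }
    }

    /** Builds a transaction with one sequential and one parallel flow (made-up operation names). */
    static Transaction example() {
        Flow flow0 = new Flow(FlowType.SEQUENTIAL, List.of(
                new Operation("createCollection", OperationKind.TRANSFORMATION),
                new Operation("createContainer", OperationKind.TRANSFORMATION)));
        Flow flow1 = new Flow(FlowType.PARALLEL, List.of(
                new Operation("renameCollection", OperationKind.TRANSFORMATION),
                new Operation("listCollection", OperationKind.QUERY)));
        return new Transaction(List.of(flow0, flow1));
    }

    public static void main(String[] args) {
        Transaction tx = example();
        System.out.println(tx.flows.size() + " flows, " + tx.operationCount() + " operations");
    }
}
```

The constructors reject empty lists so that the "one or more" rules from the slide hold by construction.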

Data Grid Request

Data Grid Response
  • Datagrid Response
    • Either Transaction Acknowledgement or Status Response
    • Status Response contains the results of a Transaction
    • Response could be received at any granular level
    • Status response is used for coordination of flows and interprocess notifications
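One way to picture a response "at any granular level" is a status tree in which a transaction, a flow, or a single step can each be looked up by ID. The sketch below is an assumption-laden illustration: the `StatusTreeSketch` class, the three states, and the slash-separated IDs are all invented for the example and do not reflect the actual DGL response format.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical status tree: transaction -> flows -> steps.
// Names, states and ID format are illustrative assumptions only.
class StatusTreeSketch {

    enum State { PENDING, RUNNING, COMPLETE }

    static final class Node {
        final String id;
        final List<Node> children = new ArrayList<>();
        private State own = State.PENDING; // only meaningful for leaf nodes (steps)

        Node(String id) {
            this.id = id;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }

        void set(State state) {
            own = state;
        }

        /** A leaf reports its own state; a parent summarizes its children. */
        State state() {
            if (children.isEmpty()) {
                return own;
            }
            boolean allComplete = true;
            boolean allPending = true;
            for (Node child : children) {
                State s = child.state();
                if (s != State.COMPLETE) allComplete = false;
                if (s != State.PENDING) allPending = false;
            }
            if (allComplete) return State.COMPLETE;
            if (allPending) return State.PENDING;
            return State.RUNNING;
        }

        /** Depth-first lookup, so a status can be requested at any level. */
        Node find(String wanted) {
            if (id.equals(wanted)) {
                return this;
            }
            for (Node child : children) {
                Node hit = child.find(wanted);
                if (hit != null) return hit;
            }
            return null;
        }
    }

    /** One transaction with one flow: the first step is done, the second has not started. */
    static Node example() {
        Node step0 = new Node("tx/flow0/step0");
        Node step1 = new Node("tx/flow0/step1");
        step0.set(State.COMPLETE);
        Node flow0 = new Node("tx/flow0").add(step0).add(step1);
        return new Node("tx").add(flow0);
    }

    public static void main(String[] args) {
        Node tx = example();
        System.out.println("transaction: " + tx.state());
        System.out.println("step0: " + tx.find("tx/flow0/step0").state());
    }
}
```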

Data Grid Response (DRes)

Lecture Outline
  • Concepts
    • Distributed Data Management
    • Process Flow Pipelines
    • Web Services; Grid Services
  • Theory
    • Data Grid Language (DGL)
  • Practice (Hands-on)
    • SDSC Matrix
    • Web Demo
    • Matrix Java API

“Let's play who wants to be a coder”
Now it's your turn to take the red pill from Matrix
It gets interesting from here; let us all do some coding

SDSC Matrix Architecture
  • SOAP Service Wrapper Abstraction: JAXM Wrapper; OGSA RPC-Style for SOAP
  • Event Publish/Subscribe, Notification: JMS Messaging System
  • Matrix Data Grid Request Processor
    • Transaction Handler
    • Flow Handler and Execution Manager
    • Status Query Handler
    • XQuery Processor
    • Termination Handler
  • Matrix Agent Abstraction: SRB Agents; OGSA Agent; WSDL Agent
  • Pipeline Query Processor
  • Data flow pipeline Meta data Manager
  • Persistence (Store) Abstraction: JDBC; In-Memory Store

Lesson – 1 : Data Grid Request
Create a Data Grid Request and its components

Learn it yourself : Task - 1
  • Create Flow(0) in a Data Grid Request [DGREQ]
    • Create a simple Data Grid Request using the Web Demo
    • Add Flow; make it Sequential
    • Add Step: Create Collection, Name :
    • Click on Flow 0 again to add one more step in this Flow 0
    • Create Container, Name :
    • Click on the DGRequest link to see Flow 0 with 2 steps

Learn it yourself : Task - 2
  • Create Flow(1) in a Data Grid Request [DGREQ]
    • Click on the DGRequest link to see Flow 0 with 2 steps
    • Click on Add Flow; make it of type parallel
    • Add Step: Rename Collection, Old Collection : to new name
    • Click on the Flow 1 link to add one more step in this Flow 1
    • Create Collection, Name :
    • Click on the DGRequest link to see 2 Flows with 2 steps each

Learn it yourself : Task - 3
  • Add Doc Meta for [DGREQ]
    • Click on DOCMETA
    • Fill in your name (optional)
    • Press >> to save details
  • Doc Meta is just for reference.
  • The Author is the process which created the request. The Author could have created the request for another user.

Learn it yourself : Task - 4
  • Add USERINFO for [DGREQ]
    • Click on USERINFO
    • Add user id : Organization: Challenge Response: Home Directory
    • Press >> to save

Learn it yourself : Task - 5
  • Add VOINFO for [DGREQ]
    • Click on VOINFO
    • Add Server : Port: <5544>
    • Click >> to save this in our demo
  • VO Info is for Virtual Organization Information

Learn it yourself : Task - 6
  • Send Data Grid Request
    • First check that all components are ready
      • We just learnt the components of a DReq.
      • They must all show [Y] in the demo, indicating they are ready
    • Click Send
      • If all the components are OK, the Data Grid Request is shown in XML
    • Click Send DGReq

Lesson – 2 : Data Grid Acknowledgement, Status
Get Data Grid Acknowledgement, Send Status Request, Receive Status Response

Data Grid Acknowledgement
  • Data Grid Requests are answered asynchronously
  • Data Grid Acknowledgement
    • Carries the Transaction ID used to get the status and result of the DGReq
    • All valid requests are answered with this acknowledgement before they are processed
  • Clients use this Acknowledgement Transaction ID
    • The ID may be passed to third parties, which can subscribe to these events (Grid Process Pipelines)

Data Grid Status Req and Response
  • Transaction ID used to find status
  • Later versions can use publish/subscribe
  • Third party subscription also possible
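The acknowledge-then-poll pattern described in this lesson can be sketched in plain Java. This is an in-memory stand-in under stated assumptions, not Matrix itself: `AckAndStatusSketch`, its `submit` and `status` methods, the status names, and the use of a random UUID as the transaction ID are all hypothetical.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// In-memory illustration of "acknowledge now, query status later".
// All names are hypothetical; this is not the Matrix server or its API.
class AckAndStatusSketch {

    enum Status { PENDING, RUNNING, DONE, FAILED, UNKNOWN }

    private final Map<String, Status> statusById = new ConcurrentHashMap<>();
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    /** Accepts a request and returns the transaction ID at once, before the work runs. */
    String submit(Runnable work) {
        String transactionId = UUID.randomUUID().toString();
        statusById.put(transactionId, Status.PENDING);
        worker.submit(() -> {
            statusById.put(transactionId, Status.RUNNING);
            try {
                work.run();
                statusById.put(transactionId, Status.DONE);
            } catch (RuntimeException e) {
                statusById.put(transactionId, Status.FAILED);
            }
        });
        return transactionId;
    }

    /** Anyone holding the ID (the client or a third party) can ask for the status. */
    Status status(String transactionId) {
        return statusById.getOrDefault(transactionId, Status.UNKNOWN);
    }

    /** Stops accepting work and waits briefly for queued requests to finish. */
    void shutdownAndWait() {
        worker.shutdown();
        try {
            worker.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        AckAndStatusSketch service = new AckAndStatusSketch();
        String id = service.submit(() -> { /* stand-in for a datagrid operation */ });
        System.out.println("acknowledged with transaction ID " + id);
        service.shutdownAndWait();
        System.out.println("status: " + service.status(id));
    }
}
```

Polling by ID matches the slide's first bullet; a publish/subscribe variant would push the same status changes to subscribers instead of waiting to be asked.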

Lesson – 3 : Query Data
XQuery

XQuery
  • W3C's long-awaited answer: the next SQL?
  • As always, SDSC and our group lead the way
    • A subset of XQuery on Data Grid has been implemented
    • Built our own XQuery parser
  • Demo: CDL (in-house project for the NPACI Chemistry Digital Library)

Lesson – 4 : Java API for Matrix

Demo Java Program
  • Remember, it's for programmatic exchange of state information for coordinated execution of data flow pipelines
  • Java API sample program
    • Just download this zip file
    • Unzip the file
    • Type rundemo.bat
    • Type rundemoquery.bat

Summary
  • Coordinated execution of process-flow pipelines is necessary in a Grid environment
  • Data Grid Language is to a data grid what SQL is to databases
  • SDSC Matrix: process-flow pipelines
  • Dynamic control of SRB and other services
  • Discovery of process based on the data
  • Check out our latest release
  • Imagine what we can do for your project