

  • Number of slides: 15

Data Grids
Reagan W. Moore
San Diego Supercomputer Center
9500 Gilman Drive, La Jolla, CA 92093-0505
Phone: 858 534-5073
FAX: 858 534-5152
E-mail: moore@sdsc.edu
http://www.npaci.edu/DICE/
National Partnership for Advanced Computational Infrastructure
San Diego Supercomputer Center

Topics
• Data Grid Requirements
  – Data management
  – Automation
  – Latency hiding
• Current technology
  – Distributed collections / digital libraries / data grids
• State-of-the-art systems
  – Virtual data grids / persistent archives
  – Emerging standards

Data Management Environments
• Code development
  – Collaboration, check-out, versioning
• Run-time execution
  – High-performance access, locking, latency hiding, automation, archival storage
• Publication
  – Discovery, consistency, persistent archives
• Are the capabilities required by all three environments compatible?

Data Requirements are Met by Collection Technology
• Provide three levels of abstraction for data, information, and knowledge management (bits, tagged attributes, relationships)
• Automate access through information discovery on logical collections that span storage systems
• Manage latency by streaming, caching, replication, aggregation, remote proxies, and staging
• Provide a persistent environment by building a consistent environment over evolving technology
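A minimal sketch of two of these ideas, assuming a toy in-memory model: attribute-based discovery on a logical collection whose objects live on several storage systems, and latency hiding by reading from the fastest replica and caching the result. The classes `Replica` and `LogicalCollection`, the storage labels, the latency figures, and `fake_fetch` are all hypothetical and do not reflect any SRB interface.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    storage: str        # hypothetical storage-system label, e.g. "HPSS" or "unix-fs"
    latency_ms: float   # assumed access latency, used only to choose a replica


@dataclass
class LogicalCollection:
    """Hypothetical logical collection spanning several storage systems."""
    attributes: dict = field(default_factory=dict)  # logical name -> {attribute: value}
    replicas: dict = field(default_factory=dict)    # logical name -> [Replica, ...]
    cache: dict = field(default_factory=dict)       # logical name -> cached bytes

    def register(self, name, attrs, replicas):
        self.attributes[name] = dict(attrs)
        self.replicas[name] = list(replicas)

    def discover(self, **query):
        """Attribute-based discovery: names whose tags match every query term."""
        return sorted(
            name for name, attrs in self.attributes.items()
            if all(attrs.get(k) == v for k, v in query.items())
        )

    def read(self, name, fetch):
        """Hide latency: serve from cache, else fetch from the fastest replica and cache it."""
        if name in self.cache:
            return self.cache[name], "cache"
        best = min(self.replicas[name], key=lambda r: r.latency_ms)
        data = fetch(best.storage, name)
        self.cache[name] = data
        return data, best.storage


def fake_fetch(storage, name):
    # Stand-in for a real storage driver (hypothetical).
    return f"{name}@{storage}".encode()


coll = LogicalCollection()
coll.register("sky/tile-001", {"survey": "2MASS", "band": "J"},
              [Replica("HPSS", 5000.0), Replica("unix-fs", 20.0)])
coll.register("sky/tile-002", {"survey": "2MASS", "band": "K"},
              [Replica("HPSS", 5000.0)])

for name in coll.discover(survey="2MASS", band="J"):
    print(coll.read(name, fake_fetch))  # (b'sky/tile-001@unix-fs', 'unix-fs')
    print(coll.read(name, fake_fetch))  # (b'sky/tile-001@unix-fs', 'cache')
```

The application only names attributes and logical names; which storage system answers, and whether it is answered at all, is decided underneath.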

Current Technology
• Logical data collections
  – Storage Resource Broker / Metadata Catalog
• Abstract data management by building a data handling system that interoperates with storage systems (file systems, archives, databases)
• Abstract information management by building an information catalog that interoperates with information repositories (databases)
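A minimal sketch of the two abstractions, assuming a toy design rather than the actual SRB/MCAT interfaces: one uniform driver interface hides whether an object sits in a file system or a database, and a catalog maps logical names to (resource, physical path) plus user metadata. `StorageDriver`, `FileSystemDriver`, `DatabaseDriver`, and `Broker` are hypothetical names; SQLite stands in for a production database.

```python
import os
import sqlite3
import tempfile
from abc import ABC, abstractmethod


class StorageDriver(ABC):
    """Hypothetical uniform interface the data handling system uses for every storage type."""

    @abstractmethod
    def put(self, path: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, path: str) -> bytes: ...


class FileSystemDriver(StorageDriver):
    def __init__(self, root: str):
        self.root = root

    def put(self, path, data):
        full = os.path.join(self.root, path)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        with open(full, "wb") as f:
            f.write(data)

    def get(self, path):
        with open(os.path.join(self.root, path), "rb") as f:
            return f.read()


class DatabaseDriver(StorageDriver):
    """Stores objects as BLOBs in a table (in-memory SQLite as a stand-in)."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE objects (path TEXT PRIMARY KEY, data BLOB)")

    def put(self, path, data):
        self.db.execute("INSERT OR REPLACE INTO objects VALUES (?, ?)", (path, data))

    def get(self, path):
        row = self.db.execute("SELECT data FROM objects WHERE path = ?", (path,)).fetchone()
        return row[0]


class Broker:
    """Catalog maps logical names to (resource, physical path) plus user metadata."""

    def __init__(self, resources):
        self.resources = resources  # resource name -> StorageDriver
        self.catalog = {}           # logical name -> catalog entry

    def ingest(self, logical, data, resource, **metadata):
        physical = logical.strip("/")
        self.resources[resource].put(physical, data)
        self.catalog[logical] = {"resource": resource, "path": physical, "meta": metadata}

    def retrieve(self, logical):
        entry = self.catalog[logical]
        return self.resources[entry["resource"]].get(entry["path"])


with tempfile.TemporaryDirectory() as root:
    broker = Broker({"unix-fs": FileSystemDriver(root), "db": DatabaseDriver()})
    broker.ingest("/collections/embryo/slice-01", b"image-bytes", "unix-fs", stage="E12")
    broker.ingest("/collections/embryo/notes", b"annotations", "db")
    print(broker.retrieve("/collections/embryo/slice-01"))  # b'image-bytes'
    print(broker.retrieve("/collections/embryo/notes"))     # b'annotations'
```

Adding a new storage type means writing one more driver; applications and catalog entries keep using logical names.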

SDSC Storage Resource Broker & Meta-data Catalog
[Architecture diagram]
• Application access interfaces: C, C++, Linux I/O, Unix shell, Java, NT, Web browsers, Prolog predicates
• SRB server with the MCAT meta-data catalog (Dublin Core, resource, user-defined, and application meta-data)
• Storage systems
  – Archives: HPSS, ADSM, HRM, UniTree, DMF
  – File systems: Unix, NT, Mac OSX
  – Databases: DB2, Oracle, Postgres
• Additional services: third-party copy, remote proxies, DataCutter

Information Management Projects
• Digital Libraries
  – NSF Digital Library Initiative, Phase II - UCSB, Stanford
  – NLM Digital Embryo digital library - GMU
  – NPACI Digital Sky - Caltech 2MASS sky survey
  – California Digital Library - AMICO
  – NSF National SMETE Digital Library - UCAR / DLESE
• Grid Environments
  – NASA Information Power Grid - NASA Ames
  – DOE Data Visualization Corridor - LLNL
  – DOE Particle Physics Data Grid - BaBar
  – NSF Grid Physics Network - U Fl
• Persistent Archives
  – NARA Persistent Archive
  – NHPRC - Scalable archives

Data Grids
• Data grid - links multiple data collections with
  – Separate name spaces
  – Separate administration domains
  – Heterogeneous database instances
• Stage data from a collection into the data grid
[Diagram: Database A and Database B linked through the data grid]
• The data grid is itself a collection that provides mechanisms to hide latency and provide a global namespace
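A minimal sketch, assuming a toy in-memory model, of a data grid that is itself a collection: member collections keep their own local name spaces, the grid prefixes each name with its administration domain to form a global namespace, and objects are staged into the grid on first access. `Collection`, `DataGrid`, and the `/grid/<domain>/<name>` naming scheme are hypothetical.

```python
class Collection:
    """One independently administered collection with its own local name space."""

    def __init__(self, domain):
        self.domain = domain
        self.objects = {}  # local name -> data


class DataGrid:
    """Hypothetical data grid: itself a collection built over member collections."""

    def __init__(self):
        self.members = {}  # administration domain -> Collection
        self.staged = {}   # global name -> copy staged into the grid

    def join(self, collection):
        self.members[collection.domain] = collection

    @staticmethod
    def global_name(domain, local):
        # Prefixing with the domain keeps identical local names distinct.
        return f"/grid/{domain}/{local}"

    def names(self):
        return sorted(
            self.global_name(domain, local)
            for domain, coll in self.members.items()
            for local in coll.objects
        )

    def get(self, gname):
        # Stage on first access; later reads are served from the grid's copy.
        if gname not in self.staged:
            _, _, domain, local = gname.split("/", 3)
            self.staged[gname] = self.members[domain].objects[local]
        return self.staged[gname]


db_a = Collection("databaseA")
db_a.objects["results.dat"] = b"A-results"
db_b = Collection("databaseB")
db_b.objects["results.dat"] = b"B-results"

grid = DataGrid()
grid.join(db_a)
grid.join(db_b)
print(grid.names())
# ['/grid/databaseA/results.dat', '/grid/databaseB/results.dat']
print(grid.get("/grid/databaseB/results.dat"))
# b'B-results'
```

Both databases use the local name `results.dat`, yet the global names never collide, and neither member collection had to change its own naming.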

State-of-the-art Data Management
• Provide a knowledge management abstraction
  – Abstract the processes that create the derived data products (virtual data grid)
  – Abstract the collection formation used to organize the derived data products (persistent archive)
• A persistent archive is a virtual data grid in which the derived data products are data collections
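A minimal sketch of the virtual data grid idea, assuming a toy in-memory model: a derived product is registered as the process plus its inputs, and is materialized only when first requested. The last definition produces a collection rather than a single value, mirroring the persistent-archive case. `VirtualDataGrid`, `define`, and the sample data are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


class VirtualDataGrid:
    """Hypothetical sketch: derived products are described by the process that creates them."""

    def __init__(self):
        self.raw: Dict[str, object] = {}
        self.recipes: Dict[str, Tuple[Callable, List[str]]] = {}
        self.materialized: Dict[str, object] = {}

    def add_raw(self, name, value):
        self.raw[name] = value

    def define(self, name, process, inputs):
        """Record how to derive `name` instead of storing the product itself."""
        self.recipes[name] = (process, list(inputs))

    def get(self, name):
        if name in self.raw:
            return self.raw[name]
        if name not in self.materialized:
            process, inputs = self.recipes[name]
            # Recursively resolve inputs, which may themselves be derived products.
            self.materialized[name] = process(*(self.get(i) for i in inputs))
        return self.materialized[name]


vdg = VirtualDataGrid()
vdg.add_raw("counts", [3, 1, 4, 1, 5])
vdg.define("total", sum, ["counts"])
vdg.define("mean", lambda total, counts: total / len(counts), ["total", "counts"])
# Persistent-archive style: the derived product is itself a collection.
vdg.define("index", lambda counts: {f"item-{i}": c for i, c in enumerate(counts)}, ["counts"])

print(vdg.get("mean"))             # 2.8
print(vdg.get("index")["item-2"])  # 4
```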

Standards
• Object Management Group (OMG)
  – Model Driven Architecture for platform-independent models of services
    • Platform-dependent models transform an abstract representation into CORBA, Java, C, …
    • Builds upon the Unified Modeling Language (UML)
    • Manages the life cycle of software services
  – Common Warehouse Metamodel
    • Provides an abstract representation for collections that can be used to migrate collections to alternate databases
    • Builds upon a subset of UML

Standards
• World Wide Web Consortium (W3C)
  – Semantic Web for natural language queries to collections
  – Builds upon the DARPA Agent Markup Language for services, and logic manipulation languages (DAML-L, OIL)
  – Uses the Resource Description Framework and XML

Standards
• ISO
  – Topic maps manage relationships between concept spaces and collection attributes
  – Provide mechanisms to manage semantic interoperability
• Global Grid Forum
  – Provides authentication systems, data handling systems, execution environments
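A minimal sketch of the semantic-interoperability idea, assuming a greatly simplified map rather than the ISO topic map data model: each concept is linked to the attribute that expresses it in each collection's own schema, so one concept-level query can be translated into per-collection attribute queries. `TOPIC_MAP`, `COLLECTIONS`, `query_by_concept`, and all attribute names and records are hypothetical.

```python
# Hypothetical concept -> per-collection attribute mapping (names are invented).
TOPIC_MAP = {
    "creator": {"library": "dc:creator", "survey": "observer"},
    "date": {"library": "dc:date", "survey": "obs_date"},
}

# Two toy collections describing similar records with different attribute names.
COLLECTIONS = {
    "library": [
        {"id": "doc-1", "dc:creator": "Smith", "dc:date": "2001"},
        {"id": "doc-2", "dc:creator": "Jones", "dc:date": "1999"},
    ],
    "survey": [
        {"id": "obs-7", "observer": "Smith", "obs_date": "2000"},
    ],
}


def query_by_concept(concept, value):
    """Translate one concept-level query into each collection's attribute query."""
    hits = []
    for collection, attribute in TOPIC_MAP[concept].items():
        for record in COLLECTIONS[collection]:
            if record.get(attribute) == value:
                hits.append((collection, record["id"]))
    return hits


print(query_by_concept("creator", "Smith"))
# [('library', 'doc-1'), ('survey', 'obs-7')]
```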

Knowledge Based Data Grids
[Architecture diagram: ingest services, management, and access services across three layers]
• Knowledge: relationships between concepts; knowledge repository for rules (Rules - KQL); XTM DTD; knowledge- or topic-based query / browse
• Information: attributes and semantics; information repository; XML DTD; SDLIP; attribute-based query (model-based access)
• Data: fields, containers, folders; storage (replicas, persistent IDs); data grids, MCAT/HDF (data handling system - SRB); feature-based query

Data Intensive Computing Environment Group
Staff / Students - GSRA
• Reagan Moore, Chaitan Baru, Sheau Yen, Charles Cowart, Amarnath Gupta, George Kremenek, Bertram Ludäscher, Richard Marciano, Arcot Rajasekar, Abe Singer, Michael Wan, Ilya Zaslavsky, Bing Zhu, Martin Kuhl, Liying Sui, Yang Yu, Valter Crescenzi
Students - Undergrad Interns
• Peter Shin, Roman Olshanowsky, Shabbar Tambawala, Pratik Mukhopadhyay
+/- NN

Further Information
http://www.npaci.edu/DICE