- Number of slides: 45
GEON IT Advances: Data Integration, GEON Workbench, Scientific Workflows. Bertram Ludäscher, Kai Lin, Ilkay Altintas, Efrat Jaeger. San Diego Supercomputer Center, University of California, San Diego. CYBERINFRASTRUCTURE FOR THE GEOSCIENCES www.geongrid.org
The Problem: Scientific Data Integration, or: … from Questions to Queries …
Information Integration Challenges: S4 Heterogeneities
• Systems Integration
– platforms, devices, data & service distribution, APIs, protocols, …
Grid middleware technologies
+ e.g. single sign-on, platform independence, transparent use of remote resources, …
• Syntax & Structure
– heterogeneous data formats (one for each tool...)
– heterogeneous data models (RDBs, OODBs, XMLDBs, flat files, …)
– heterogeneous schemas (one for each DB...)
Database mediation technologies
+ XML-based data exchange, integrated views, transparent query rewriting, …
• Semantics
– fuzzy metadata, terminology, “hidden” semantics, implicit assumptions, …
Knowledge representation & semantic mediation technologies
+ “smart” data discovery & integration
+ e.g. ask about X (‘mafic’); find data about Y (‘diorite’); be happy anyway!
Information Integration Challenges: S5 Heterogeneities
• Synthesis of analysis pipelines, integrated apps & data products, …
– How to make use of these wonderful things & put them together to solve a scientist’s problem?
Scientific Problem Solving Environments
GEON Portal and Workbench (“scientist’s view”)
+ ontology-enhanced data registration, discovery, manipulation
+ creation and registration of new data products from existing ones, …
GEON Scientific Workflow System (“engineer’s view”)
+ for designing, re-engineering, deploying analysis pipelines and scientific workflows; a tool to make new tools …
+ e.g. creation of new datasets from existing ones, dataset registration, …
Ontology-Enabled Application Example: Geologic Map Integration
Figure labels: domain knowledge; knowledge representation; AGE ONTOLOGY; Nevada
• Show formations where AGE = ‘Paleozoic’ (without age ontology)
• Show formations where AGE = ‘Paleozoic’ (with age ontology)
+/- a few hundred million years
Querying by Geologic Age …
Querying by Geologic Age: Result
Querying by Chemical Composition … (GSC)
Querying by Chemical Composition: Results
Note the fine differences in shades of gray:
• DO know: it’s NOT there!
• DON’T know! (not registered)
OK, we’ve got to work on the color coding ;-)
Querying w/ British Rock Classification (BRC). Uses an inter-ontology articulation mapping between GSC and BRC.
British Rock Classification Query: Results. Uses an inter-ontology articulation mapping between GSC and BRC.
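A minimal sketch of how an inter-ontology articulation mapping could translate query terms from one rock classification into another. The term names and the mapping table are hypothetical placeholders, not entries from the actual GSC or BRC vocabularies, and the function is not GEON's implementation.

```python
# Hypothetical articulation: each source-ontology term maps to a set of
# target-ontology terms. The terms below are illustrative only.
ARTICULATION = {
    "brc:sedimentary_rock": {"gsc:sedimentary"},
    "brc:sandstone": {"gsc:sandstone", "gsc:arenite"},
}


def translate(terms, articulation):
    """Rewrite source-ontology terms into target-ontology terms.

    Returns (translated, unmapped) so a caller can tell "nothing matched"
    apart from "no mapping was registered for this term".
    """
    translated, unmapped = set(), set()
    for term in terms:
        if term in articulation:
            translated |= articulation[term]
        else:
            unmapped.add(term)
    return translated, unmapped
```

Returning the unmapped terms separately mirrors the distinction drawn on the results slides between data known to be absent and data that was never registered.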
The Query: Show sedimentary rocks. The Puzzle: Find the 17 differences in the results … but first: what states are we looking at?
Sedimentary Rocks: BGS Ontology
Sedimentary Rocks: GSC Ontology
Need for Knowledge-enabled Integration
• A geologist analyzing chemical data from a pluton finds no recognizable correlation between variables.
– What possible scenarios can he examine to understand this heterogeneity?
• Measured ages also show a scatter.
– What is the significance of the observed spread in measured time?
Knowledge Representation Research:
• concept maps & ontologies
• process maps & ontologies
• semantic types
• … to facilitate (even) “smarter” tools
Figure labels: DataTables; GeoChemDB; GeolAgeDB
A Prerequisite: Resource Registration
(1a) Register ontologies
– geologic age; rock classifications (GSC, BGS); seismology; …
(1b) Optionally: register inter-ontology articulations
– e.g. between the GSC ontology and the BGS ontology
(2a) Item-level dataset registration
– ADN metadata; other controlled vocabularies & ontologies (e.g. geologic age timescale (USGS), SWEET (NASA), …)
(2b) Item-detail registration
– e.g. associate values in a column with a concept
(3) Use ontology-based query UI / application
– e.g. query by geologic age and chemical composition
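The registration steps above can be sketched with a small in-memory catalog. This is an illustration only: the `Catalog` class, its method names, and the sample dataset and concept names are assumptions made for the example and do not describe the GEON catalog's real interface.

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """Hypothetical in-memory stand-in for a resource catalog."""
    ontologies: dict = field(default_factory=dict)    # name -> set of concepts
    datasets: dict = field(default_factory=dict)      # name -> metadata dict
    associations: list = field(default_factory=list)  # (dataset, column, value, concept key)

    def register_ontology(self, name, concepts):
        # Step (1a): make the ontology's concepts known to the catalog.
        self.ontologies[name] = set(concepts)

    def register_dataset(self, name, metadata):
        # Step (2a): item-level registration with descriptive metadata.
        self.datasets[name] = dict(metadata)

    def associate(self, dataset, column, value, ontology, concept):
        # Step (2b): link one value of a column to a registered concept.
        if dataset not in self.datasets:
            raise KeyError(f"unregistered dataset: {dataset}")
        if concept not in self.ontologies.get(ontology, ()):
            raise KeyError(f"unknown concept: {ontology}:{concept}")
        self.associations.append((dataset, column, value, f"{ontology}:{concept}"))

    def find(self, ontology, concept):
        # Step (3): concept-based lookup over the registered associations.
        key = f"{ontology}:{concept}"
        return sorted({(d, c, v) for d, c, v, k in self.associations if k == key})


catalog = Catalog()
catalog.register_ontology("geologicAge", ["Tertiary", "Quaternary"])
catalog.register_dataset("myShapeFiles", {"state": "Arizona"})
catalog.associate("myShapeFiles", "PERIOD", "Tertiary", "geologicAge", "Tertiary")
hits = catalog.find("geologicAge", "Tertiary")
```

Rejecting associations to unregistered datasets or unknown concepts is a design choice of this sketch; it keeps concept-based search from returning items the catalog cannot describe.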
Demonstration Preview
NOTE: A technology demonstration, not a content demonstration (vocabulary, ontology, maps, …)
1. Ontology Registration (geologicAge.owl)
2. Dataset Registration (myShapeFiles.zip)
3. Item-Level Association (of 1 and 2)
4. GEONsearch
• metadata, spatial, temporal, concept-based
5. GEONworkbench
• use of workspace, e.g. composing new maps from existing ones
… resume with GEON workflow overview
Demonstration Preview (architecture diagram). Figure labels: Client Access (via web services): other distributed apps (Kepler, DLESE, …); User Access (via Portal); myOntology.owl with metadata; myDataset.foo with metadata; ResourceRegistration; GEON Catalog; Search condition(s): spatial, temporal, concept; GEONsearch; GEONworkbench; GEON Workspace (user); SRB; GEONmiddleware; User actions: add, delete, manipulate; Log; external services (Gazetteer, DLESE, …; Geologic Age, Chronos, …)
Dataset to Ontology Registration (Item-level). Figure labels: Domain Knowledge; Ontologies; Arizona
GEON Search: Concept-based Querying. Portal Demonstration
Scientific Problem Solving Environments
• GEON Portal and Workbench (“scientist’s view”) (previous demonstration)
– a workbench for using existing/integrated tools
• Kepler Workflow System (“engineer’s view”)
– for (semi-)automating “scientific workflows” and “analysis pipelines”
– a tool for making and deploying new tools
– some features:
• low-level plumbing to high-level conceptual flows
• connect reusable components (“actors”, “boxes”) to form apps
• abstraction via nesting of subworkflows into composite actors
• deploy automated workflows on the Grid and/or with custom UIs
– demonstrations available (“Kepler 2 Go-1. ” CD for Summer Institute)
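The idea of connecting reusable actors and nesting subworkflows into composite actors can be illustrated with plain function composition. This is a conceptual sketch in Python, not Kepler's actual actor API; the `compose` helper and the three sample actors are hypothetical.

```python
from functools import reduce


def compose(*actors):
    """Chain actors so each output feeds the next; the result is itself an actor."""
    def composite(token):
        return reduce(lambda value, actor: actor(value), actors, token)
    return composite


# Three hypothetical single-purpose actors.
def parse(text):
    return [float(item) for item in text.split(",")]


def double(values):
    return [2.0 * value for value in values]


def total(values):
    return sum(values)


# A subworkflow nested as a composite actor inside a larger workflow.
clean = compose(parse, double)
workflow = compose(clean, total)
result = workflow("1,2,3")
```

Because a composite is called exactly like a single actor, a subworkflow can be swapped for a different implementation without changing the workflow that contains it.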
A Kepler Scientific Workflow. Figure labels: component (actor) libraries; canvas for design and execution monitoring; inline documentation
GEON Dataset Extraction & Processing. Figure labels: translating query XML response to web service worldImage XML input format; XML SOAP response; Look Inside; Sample
GEON Dataset Registration: Annotation form
GEON Dataset Registration. Figure labels: ADN metadata; Metadata display; Registering; validation
Putting it all together …
GEON Workflows & KEPLER. HPC workflow. http://kepler-project.org
Using Kepler for Geological Data Integration Workflows. Ilkay Altintas, presenting joint GEON work of: Efrat Jaeger, Bertram Ludäscher, Kai Lin, Ashraf Memon. San Diego Supercomputer Center, University of California, San Diego
Some Requirements for a Scientific Workflow System (1/2)
• … it should work … (No kidding!)
USER REQUIREMENTS:
• Design tools, especially for non-expert users
• Ease of use: a fairly simple user interface with the more complex features hidden in the background
• Reusable generic features
– Generic enough to serve different communities but specific enough to serve one domain (e.g. geosciences)
• Extensibility for the expert user: almost a visual programming interface
• Registration and publication of data products and “process products” (= workflows); provenance
Some Requirements for a Scientific Workflow System (2/2)
TECHNICAL REQUIREMENTS:
• Error detection and recovery from failure
– Logging information for each workflow
• Allow data-intensive and compute-intensive tasks (maybe at the same time)
– HPC+X (from Dr. Berman’s last GSM talk)
• Allow status checks and on-the-fly updates
• Visualization …
• Semantics and metadata … Ask the experts in this room
• Certification, trust, security …
Kepler is …
• … a scientific workflow system
• … a cross-project collaboration
New contributing partners:
– Cheminformatics: Resurgence (Kim Baldridge et al.)
– Life Sciences: EOL (Mark Miller et al.)
– Data Mining: SKIDL (Tony Fountain et al.)
– Neuroinformatics: BIRN (coming …)
• … an emerging open source tool for “scientific discovery workflows”
Kepler 1.0 alpha release, Summer Institute
Some Recent Actor Additions
• Queries & Transformations
• Generic WS Invocation
• SQL Queries
• CommandLine Execution
• SMTP-based messaging
• Browser-based user interface
• File Transfer
• SRB Access
• Globus Job Execution
• Real-time data streaming
Web Services Actors (WS Harvester) (figure with numbered steps 1 to 4)
“Minute-made” (MM) WS-based application integration
• Similarly: MM workflow design & sharing w/o implemented components
GEON Contributions to Kepler
• System demonstration
– Using Kepler Features
• GEON workflows in detail
– Dataset Registration Model
– Processing Datasets on the Fly and Registering with the GEONworkbench
Conclusions
• Evolving system
– GEON is a significant contributor
– Plans for new generic and project-specific extensions
• Second alpha release available as CD
– Installers for Windows, Linux, Mac OS X
– Daily version tests and JWS installer generation
• User manuals and developer documentation are coming soon!
• More: next week during the Summer Institute …
Kepler project website: http://kepler-project.org
Thanks!
END. GEON IT Advances: Data Integration, GEON Workbench, Scientific Workflows. Bertram Ludäscher, Kai Lin, Ilkay Altintas, Efrat Jaeger. San Diego Supercomputer Center, UC San Diego
Related Publications
• Semantic Data Registration and Integration
– On Integrating Scientific Resources through Semantic Registration, S. Bowers, K. Lin, and B. Ludäscher, 16th International Conference on Scientific and Statistical Database Management (SSDBM'04), 21-23 June 2004, Santorini Island, Greece.
– A System for Semantic Integration of Geologic Maps via Ontologies, K. Lin and B. Ludäscher. In Semantic Web Technologies for Searching and Retrieving Scientific Data (SCISW), Sanibel Island, Florida, 2003.
– Towards a Generic Framework for Semantic Registration of Scientific Data, S. Bowers and B. Ludäscher. In Semantic Web Technologies for Searching and Retrieving Scientific Data (SCISW), Sanibel Island, Florida, 2003.
– The Role of XML in Mediated Data Integration Systems with Examples from Geological (Map) Data Interoperability, B. Brodaric, B. Ludäscher, and K. Lin. In Geological Society of America (GSA) Annual Meeting, volume 35(6), November 2003.
– Semantic Mediation Services in Geologic Data Integration: A Case Study from the GEON Grid, K. Lin, B. Ludäscher, B. Brodaric, D. Seber, C. Baru, and K. A. Sinha. In Geological Society of America (GSA) Annual Meeting, volume 35(6), November 2003.
• Query Planning and Rewriting
– Processing First-Order Queries under Limited Access Patterns, Alan Nash and B. Ludäscher, Proc. 23rd ACM Symposium on Principles of Database Systems (PODS'04), Paris, France, June 2004.
– Processing Unions of Conjunctive Queries with Negation under Limited Access Patterns, Alan Nash and B. Ludäscher, 9th Intl. Conference on Extending Database Technology (EDBT'04), Heraklion, Crete, Greece, March 2004, LNCS 2992.
– Web Service Composition Through Declarative Queries: The Case of Conjunctive Queries with Union and Negation, B. Ludäscher and Alan Nash. Research abstract (poster), 20th Intl. Conference on Data Engineering (ICDE'04), Boston, IEEE Computer Society, April 2004.
Related Publications
• Scientific Workflows
– Kepler: An Extensible System for Design and Execution of Scientific Workflows, I. Altintas, C. Berkley, E. Jaeger, M. Jones, B. Ludäscher, S. Mock, 16th International Conference on Scientific and Statistical Database Management (SSDBM'04), 21-23 June 2004, Santorini Island, Greece.
– Kepler: Towards a Grid-Enabled System for Scientific Workflows, Ilkay Altintas, Chad Berkley, Efrat Jaeger, Matthew Jones, Bertram Ludäscher, Steve Mock, Workflow in Grid Systems (GGF10), Berlin, March 9th, 2004.
– An Ontology-Driven Framework for Data Transformation in Scientific Workflows, S. Bowers and B. Ludäscher, Intl. Workshop on Data Integration in the Life Sciences (DILS'04), March 25-26, 2004, Leipzig, Germany, LNCS 2994.
– A Web Service Composition and Deployment Framework for Scientific Workflows, I. Altintas, E. Jaeger, K. Lin, B. Ludäscher, A. Memon. In the 2nd Intl. Conference on Web Services (ICWS), San Diego, California, July 2004.
Additional Material (for questions etc.)
Multi-Hierarchical Rock Classification System (GSC): … a target ontology (after conversion to OWL) for geologic map registration … Hierarchies: Genesis; Fabric; Composition; Texture
Inside Ontology-Enabled Map Integration
• User: “Show formations from Cenozoic!”
• Age Ontology: Cenozoic, with Quaternary and Tertiary beneath it
• Query Rewriting: select FORMATION where AGE=“Tertiary” or AGE=“Quaternary”
• Source table rows shown (other cells elided on the slide): Qg, Quaternary; Twp, Tertiary; Twl, Tertiary; also Tkgm and Q
• Source column names shown: FORMATION, ABBREV, PERIOD, LITHOLOGY (sources: Arizona, Montana West)
• Map Rendering with a Color Definition for Tkgm, Q, Qg, Twp, Twl, …
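The rewriting step on this slide can be sketched as an expansion of the query term into the leaf ages beneath it. Only the Cenozoic fragment of the age ontology comes from the slide; the function names and the dictionary representation are assumptions made for this illustration, and the query string simply follows the pseudo-SQL shown on the slide.

```python
# Parent -> children. Only this fragment appears on the slide.
AGE_ONTOLOGY = {"Cenozoic": ["Tertiary", "Quaternary"]}


def leaf_ages(term, ontology):
    """Return the most specific ages under a term (the term itself if it has no children)."""
    children = ontology.get(term)
    if not children:
        return [term]
    leaves = []
    for child in children:
        leaves.extend(leaf_ages(child, ontology))
    return leaves


def rewrite(term, ontology):
    """Rewrite a concept-level age query into a disjunction over leaf ages."""
    condition = " or ".join(f'AGE="{age}"' for age in leaf_ages(term, ontology))
    return f"select FORMATION where {condition}"


query = rewrite("Cenozoic", AGE_ONTOLOGY)
```

Here `query` is the same selection as on the slide, with straight quotes. One limitation of this sketch: a source that stores the literal value "Cenozoic" in its AGE column would not match, so a fuller version might also keep the original term in the disjunction.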
Data Source Wrapping and Integration. Figure: state geologic map sources (Arizona, Colorado, Idaho, Montana East, Montana West, Nevada, New Mexico, Utah, Wyoming, …) whose differing column names (ABBREV, NAME, TYPE, FORMATION, FMATN, PERIOD, AGE, TIME_UNIT, LITHOLOGY) are mapped to the common attributes Formation, Age, Composition, Fabric, Texture. Example values shown: Formation “Livingston formation”; Age “Tertiary-Cretaceous”; LITHOLOGY “andesitic sandstone”.
Gravity Modeling Design Workflow
• Idea: Comparing observed & synthetic gravity models
• Steps:
– Extracting and merging gravity depths from heterogeneous data sources (databases, web services) for a Lat/Lon bounding box.
– Projecting and interpolating data sources into the same coordinate system.
– Differencing observed and synthetic models.
– Displaying the differential raster image.
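The differencing step above can be sketched as a cell-by-cell subtraction of two grids that already share the same shape and coordinate system. The function name, the list-of-lists grid representation, and the use of `None` for missing cells are assumptions made for this illustration.

```python
def difference(observed, synthetic, nodata=None):
    """Subtract the synthetic grid from the observed grid, cell by cell.

    Both grids are lists of rows. A cell that is None in either input is
    written as `nodata` in the output instead of being subtracted.
    """
    same_shape = len(observed) == len(synthetic) and all(
        len(row_o) == len(row_s) for row_o, row_s in zip(observed, synthetic)
    )
    if not same_shape:
        raise ValueError("grids must have the same shape")
    return [
        [nodata if o is None or s is None else o - s for o, s in zip(row_o, row_s)]
        for row_o, row_s in zip(observed, synthetic)
    ]


residual = difference([[1.0, 2.0], [3.0, None]], [[0.5, 2.0], [1.0, 4.0]])
```

Checking the shapes first makes a missed projection or interpolation step fail loudly rather than produce a misaligned difference image.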
Grid Interpolation
• Interpolating queried gravity data on the grid and displaying it using a color scheme.
• Currently the IDW interpolation algorithm is supported. Future plans: Minimum Curvature, TIN, Kriging and Spline.
• Output: either ASCII x, y, z, p or ESRI ASCII grid format.
• Display: using the Global Mapper service.
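For readers unfamiliar with IDW (inverse distance weighting), a minimal sketch follows, together with a writer for the ESRI ASCII grid header layout. The function names, the power parameter default of 2, the cell-centre sampling, and the three-decimal formatting are assumptions for illustration; the slide does not say how the GEON workflow parameterizes the algorithm.

```python
import math


def idw(points, x, y, power=2.0):
    """Inverse-distance-weighted estimate at (x, y) from (px, py, value) samples."""
    numerator = denominator = 0.0
    for px, py, value in points:
        dist = math.hypot(x - px, y - py)
        if dist == 0.0:
            return value  # exactly on a sample point
        weight = 1.0 / dist ** power
        numerator += weight * value
        denominator += weight
    if denominator == 0.0:
        raise ValueError("no sample points")
    return numerator / denominator


def idw_grid(points, xll, yll, cellsize, ncols, nrows, power=2.0):
    """Interpolate at cell centres; rows run north to south, as in ESRI ASCII grids."""
    grid = []
    for r in range(nrows):
        y = yll + (nrows - r - 0.5) * cellsize
        grid.append([idw(points, xll + (c + 0.5) * cellsize, y, power)
                     for c in range(ncols)])
    return grid


def to_esri_ascii(grid, xll, yll, cellsize, nodata=-9999):
    """Serialize a grid as ESRI ASCII grid text."""
    header = [f"ncols {len(grid[0])}", f"nrows {len(grid)}",
              f"xllcorner {xll}", f"yllcorner {yll}",
              f"cellsize {cellsize}", f"NODATA_value {nodata}"]
    rows = [" ".join(f"{value:.3f}" for value in row) for row in grid]
    return "\n".join(header + rows) + "\n"


samples = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]  # made-up sample values
grid = idw_grid(samples, xll=0.0, yll=0.0, cellsize=1.0, ncols=2, nrows=1)
text = to_esri_ascii(grid, 0.0, 0.0, 1.0)
```

A point midway between the two samples receives equal weights, so `idw(samples, 1.0, 0.0)` is the plain average of the two values.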
Gravity Modeling Design Workflow