- Number of slides: 33
The NERC Metadata Gateway: a product of the NERC DataGrid
Bryan Lawrence (on behalf of a big team) [BADC, BODC, CCLRC, PML and SOC]
Outline
• Introduction to NERC, the NERC Data Centres, and NCAS
• The NERC DataGrid Project
  – Key Components:
    • Data Tools, Data Discovery, {Access Control}
  – NDG Information Environment
• Key Standards Structures: the ISO Family
• From CSML, {MOLES}, DIF to ISO 19139 (NumSim)
• Distributed Content Search
  – Why we did it this way
  – Our Discovery Architecture
• NDG Discovery
  – Now … and
  – The Future
  – The “New NERC Metadata Gateway”
• ISO 19139 Best Practice
• Summary
TECO-WIS, Nov 2006
Some Introductions
• NERC: The Natural Environment Research Council
  – The major player in UK environmental research
  – Is both a funding agency and a conglomeration of “centres”:
    • internal “research” institutes (the British Oceanographic Data Centre, BODC, is part of one of these), and
    • external “collaborative” centres, which include:
      • the Plymouth Marine Laboratory
      • the National Oceanographic Centre, Southampton
      • the National Centre for Atmospheric Science (NCAS), mostly embedded in universities; part of NCAS is the British Atmospheric Data Centre (BADC), which is embedded in CCLRC, the Council for the Central Laboratories of the Research Councils
        – CCLRC is about to be replaced by a new entity, which might be called the “Large Facilities Research Council”
• NERC has seven discipline-based designated data centres (including the BODC and BADC), and requires as much integration of data access as possible.
  – From discovery to utilisation, from genomics to ecology, from oceanography to atmospheric science, from Antarctic science to British geology …
Complexity + Volume + Remote Access = Grid Challenge
[Diagram: British Atmospheric Data Centre, British Oceanographic Data Centre, NCAR; http://ndg.nerc.ac.uk]
If it’s not obvious
• Lots of organisations
  – Membership varies, and trust is not consistent either within or between them.
• Lots of priorities
  – Not all organisations are “about” data.
• Different internal storage structures
  – Data stored in a variety of databases and filesystems.
  – Some things well documented, but not automated.
  – Some things automated, but information content is sparse …
• Integrating data access is non-trivial.
And none of that includes the important relationships with customers and collaborators!
Key Components
Discovery Tools
• Discovery Portal
  – Metadata Search
  – Direct Links to Data and Services
Data Tools
• Slice and Dice
• Visualisation
• Manipulation
Access Control
• Systems are resource limited
• Data access may be restricted by licence
Metadata Structures to support all the above
Standards Landscape (or two)
• ISO TC 211 Standards, e.g.
  – ISO 19101: Geographic information – Reference model
  – ISO 19103: Geographic information – Conceptual schema language
  – ISO 19107: Geographic information – Spatial schema
  – ISO 19108: Geographic information – Temporal schema
  – ISO 19109: Geographic information – Rules for application schema
  – ISO 19111: Geographic information – Spatial referencing by coordinates
  – ISO 19115: Geographic information – Metadata
• Open Geospatial Consortium Specs
  – Geography Markup Language (GML), a toolkit for building data descriptions
  – WMS, WCS, WFS, WPS: the Web (Map, Coverage, Feature, and Processing) services
Standards
• ISO 19101: Geographic information – Reference model
A geospatial dataset… …consists of features and related objects… …in a defined logical structure… …delivered through services… …and described by metadata.
Data Description Standards
• Geographic ‘features’
  – “abstraction of real world phenomena” [ISO 19101]
  – Type or instance
  – Encapsulate important semantics in universe of discourse
  – “Something you can name”
• Application schema
  – Defines semantic content and logical structure
  – ISO standards provide toolkit:
    • spatial/temporal referencing
    • geometry (1-, 2-, 3-D)
    • topology
    • dictionaries (phenomena, units, etc.)
  – GML: canonical encoding
[from ISO 19109 “Geographic information – Rules for Application Schema”]
CSML: Climate Science Modelling Language
• Fully featured GML application schema, with extensions for
  – External binary data (GRIB, netCDF, etc.)
  – Irregular grids and “proper” vertical coordinate systems (both activities now on OGC and ISO standards tracks)
• V1.0 included seven feature types and provided only “data” modelling.
• V1.0 CSML tooling includes a scanner (which creates CSML from netCDF files) and a parser (which instantiates, from the XML CSML documents, Python objects that can be manipulated scientifically).
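A minimal sketch of the parser idea, in Python since that is the language of the CSML tooling: read a GML-style feature description and instantiate Python objects from it. The `urn:example:csml-sketch` namespace, the element layout and the `PointSeriesFeature` class are simplified hypotheticals, not the real CSML schema or the CSML parser's API; only the GML namespace URI is the real one.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

GML = "http://www.opengis.net/gml"   # real GML namespace
CSK = "urn:example:csml-sketch"      # hypothetical stand-in for a CSML namespace


@dataclass
class PointSeriesFeature:
    """Hypothetical Python object built from one feature element."""
    gml_id: str
    description: str
    phenomenon: str
    times: list = field(default_factory=list)
    values: list = field(default_factory=list)

    def mean(self) -> float:
        # A trivial "scientific" manipulation on the parsed values.
        return sum(self.values) / len(self.values)


def parse_features(xml_text: str) -> list:
    """Instantiate one PointSeriesFeature per csk:PointSeriesFeature element."""
    root = ET.fromstring(xml_text)
    features = []
    for el in root.iter(f"{{{CSK}}}PointSeriesFeature"):
        features.append(PointSeriesFeature(
            gml_id=el.get(f"{{{GML}}}id"),
            description=el.findtext(f"{{{GML}}}description", default=""),
            phenomenon=el.findtext(f"{{{CSK}}}phenomenon", default=""),
            times=el.findtext(f"{{{CSK}}}times", default="").split(),
            values=[float(v) for v in el.findtext(f"{{{CSK}}}values", default="").split()],
        ))
    return features


EXAMPLE = f"""<csk:Dataset xmlns:csk="{CSK}" xmlns:gml="{GML}">
  <csk:PointSeriesFeature gml:id="station1">
    <gml:description>Air temperature at a single station</gml:description>
    <csk:phenomenon>air_temperature</csk:phenomenon>
    <csk:times>2006-11-01 2006-11-02 2006-11-03</csk:times>
    <csk:values>280.0 282.0 284.0</csk:values>
  </csk:PointSeriesFeature>
</csk:Dataset>"""

if __name__ == "__main__":
    for feature in parse_features(EXAMPLE):
        print(feature.gml_id, feature.phenomenon, feature.mean())
```

The design point is that the XML is only an encoding: once parsed, code works with typed objects (here, calling `mean()`) rather than with elements.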
MarineXML Testbed
• Data from different parts of the marine community conform to a variety of schemas (XSD), e.g. biological species, Chl-a from satellite, measured hydrodynamics, modelled hydrodynamics.
• For each XSD (for the source data) there is an XSLT to translate the data to the Feature Types (FTs) defined by CSML. The FTs and XSLTs are maintained in a ‘MarineXML registry’.
• Data dictionary: features in the source XSD must be present in the data dictionary.
• Phenomena in the XSD must have an associated portrayal (S-52 Portrayal Library).
• The FTs can then be translated (by XSLT) to equivalent FTs for display in the ECDIS system. The result of the translation is an encoding (S-57 v3 GML) that contains the Features described using the S-57 v3.1 Application Schema, equivalent to the same features in CSML; marine data can be imported as weakly typed (i.e. generic) Features.
• ECDIS acts as an example client for the data (XSLT to SENC, viewed in SeeMyDENC).
Slide adapted from Kieran Millard (AUKEGGS, 2005)
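The registry pattern can be sketched in Python. The standard library has no XSLT processor, so this sketch swaps each XSLT stylesheet for a plain translator function; the schema identifiers, record layouts and data-dictionary entries are all hypothetical.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Translator = Callable[[Record], Record]

# Hypothetical data dictionary: the phenomena the registry knows about.
DATA_DICTIONARY = {"chlorophyll_a", "sea_water_temperature"}


def satellite_chla_to_feature(rec: Record) -> Record:
    # Hypothetical source layout: {"lat", "lon", "chl"} from a satellite product.
    return {"type": "PointFeature", "phenomenon": "chlorophyll_a",
            "location": (rec["lat"], rec["lon"]), "value": rec["chl"]}


def measured_hydro_to_feature(rec: Record) -> Record:
    # Hypothetical source layout: {"position": (lat, lon), "temp_c"} from a mooring.
    return {"type": "PointFeature", "phenomenon": "sea_water_temperature",
            "location": rec["position"], "value": rec["temp_c"]}


# The registry: one translator per source schema (the role each XSLT plays per XSD).
REGISTRY: Dict[str, Translator] = {
    "urn:example:satellite-chla.xsd": satellite_chla_to_feature,
    "urn:example:measured-hydro.xsd": measured_hydro_to_feature,
}


def to_common_features(source_schema: str, records: List[Record]) -> List[Record]:
    """Translate records from a registered source schema into common features."""
    try:
        translate = REGISTRY[source_schema]
    except KeyError:
        raise ValueError(f"no translator registered for {source_schema}") from None
    features = [translate(r) for r in records]
    for f in features:
        # Mirror the testbed rule: every phenomenon must be in the data dictionary.
        if f["phenomenon"] not in DATA_DICTIONARY:
            raise ValueError(f"unknown phenomenon {f['phenomenon']!r}")
    return features
```

Supporting a new community then means registering one more translator rather than changing every client.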
The Concept of re-using Features
• Here structured XML is converted to plain ASCII text in the form required by a numerical model.
• HTML warning-service pages are generated ‘on the fly’.
• XML can also be converted to SVG to display data graphically.
• Here the same XML is converted to the SENC format used in a proprietary tool for viewing electronic navigation charts.
All this requires agreement on standards.
Slide adapted from Kieran Millard (AUKEGGS, 2005)
CSML Round Tripping – 1: Managing semantics
[Diagram: a new dataset (binary data) conforms to a conceptual model; UGAS produces the application schema; parser V1.0 (Python, complete)]
CSML Round Tripping – 2: Managing data – 1
[Diagram: a CF dataset (binary data) is read by the scanner, which produces XML conforming to the GML application schema, read back by the application parser; scanner and parser V1.0, V2 in development]
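The scanner side of the round trip can be sketched the same way: describe each CF variable as a feature in XML while leaving the values in the binary file. A real scanner reads the netCDF header; here a plain dictionary stands in for it, and the namespace, element names and `fileReference` layout are hypothetical rather than actual CSML.

```python
import xml.etree.ElementTree as ET
from typing import Dict

CSK = "urn:example:csml-sketch"      # hypothetical namespace, not the real CSML one
GML = "http://www.opengis.net/gml"   # real GML namespace


def scan(file_name: str, variables: Dict[str, dict]) -> str:
    """Describe each CF data variable as a feature; the values stay in the file.

    `variables` is a hypothetical stand-in for what a real scanner would read
    from a netCDF header: {name: {"standard_name": ..., "units": ..., "dims": [...]}}.
    """
    ET.register_namespace("csk", CSK)
    ET.register_namespace("gml", GML)
    root = ET.Element(f"{{{CSK}}}Dataset")
    for name, var in variables.items():
        feature = ET.SubElement(root, f"{{{CSK}}}GridFeature", {f"{{{GML}}}id": name})
        ET.SubElement(feature, f"{{{CSK}}}phenomenon").text = var["standard_name"]
        ET.SubElement(feature, f"{{{CSK}}}units").text = var["units"]
        ET.SubElement(feature, f"{{{CSK}}}dimensions").text = " ".join(var["dims"])
        # Only a pointer to the binary data is recorded, not the values themselves.
        ref = ET.SubElement(feature, f"{{{CSK}}}fileReference", {"variable": name})
        ref.text = file_name
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(scan("run1.nc", {"tas": {"standard_name": "air_temperature",
                                   "units": "K", "dims": ["time", "lat", "lon"]}}))
```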
CSML 2: Structure “Affords” Behaviour
Moving beyond GML, but staying in the ISO frame!
[Diagram: ISO 19123 coverage class; ‘affordance’ modelled with UML]
CSML 2: Related to new OGC Observations and Measurements Spec
An Observation is an Event whose result is an estimate of the value of some Property of the Feature-of-interest, obtained using a specified Procedure.
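The O&M definition maps naturally onto a small record type. This Python sketch is illustrative only: the field names loosely follow the O&M terms in the definition, and the identifiers are hypothetical; it is not the schema defined by the OGC specification.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any


@dataclass(frozen=True)
class Observation:
    """An event whose result estimates a property of a feature of interest."""
    feature_of_interest: str   # what was observed, e.g. a station or ship track
    observed_property: str     # the Property whose value is being estimated
    procedure: str             # the sensor, method or model that produced the result
    event_time: datetime       # when the observation event happened
    result: Any                # the estimate of the property's value


obs = Observation(
    feature_of_interest="urn:example:station:1",    # hypothetical identifier
    observed_property="sea_water_temperature",
    procedure="urn:example:sensor:ctd-01",          # hypothetical identifier
    event_time=datetime(2006, 11, 1, 12, 0, tzinfo=timezone.utc),
    result=12.5,
)
```

Making the record immutable (`frozen=True`) reflects that an observation describes a past event; a new estimate would be a new observation.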
Managing Data – 2
[Diagram: CF dataset (binary data) and scanner]
The Most Important Decision
What is a dataset?
• Granularity too coarse: can’t find what you want – not enough information exposed.
• Granularity too fine: can’t find what you want – buried in unordered results.
Distributed Query
Options:
• Harvest or crawl
• Distribute query to known targets versus harvest from known targets and do local query
  – Timeliness versus responsiveness
Decision:
• NDG Discovery based on the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
  – Additional partners include NCAR, MPI-WDCC, TPAC, UK-MDIP
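Harvesting means each partner only has to expose an OAI-PMH endpoint, and the harvester follows `resumptionToken`s through `ListRecords` responses. A minimal Python sketch of that loop follows; the base URL is a placeholder, `oai_dc` is the default prefix only because every OAI-PMH repository must support it (the prefix NDG uses for DIF is not assumed here), and `fetch` is injectable so the loop can be tested without a network.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterator, Optional

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def http_get(url: str) -> str:
    """Default fetcher: a plain HTTP GET with no retries or error handling."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


def harvest(base_url: str, metadata_prefix: str = "oai_dc",
            since: Optional[str] = None,
            fetch: Callable[[str], str] = http_get) -> Iterator[ET.Element]:
    """Yield every <record> element from an OAI-PMH ListRecords sequence."""
    params: Dict[str, str] = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if since:
        params["from"] = since  # incremental harvest: records changed since this date
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urllib.parse.urlencode(params)))
        list_records = root.find(f"{OAI}ListRecords")
        if list_records is None:
            return  # e.g. an OAI-PMH <error code="noRecordsMatch"> response
        yield from list_records.findall(f"{OAI}record")
        token = list_records.find(f"{OAI}resumptionToken")
        if token is None or not (token.text or "").strip():
            return  # no token, or an empty one, marks the end of the list
        # resumptionToken is an exclusive argument: send it with the verb only.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

The harvested records then go into the local metadata store, where the query runs; this is the "harvest from known targets and do local query" option.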
Discovery Metadata Usage
XML metadata store: can support a limited variety of different XML schemas, provided the web-service (WS) interface understands them (a unique XQuery is needed for each (method, schema) pair).
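The one-query-per-(method, schema) rule amounts to a dispatch table. This Python sketch uses ElementTree paths in place of XQuery and a single hypothetical `title` method; the DIF and ISO 19139 paths reflect the usual element names, but treat them as assumptions to check against the schema versions actually harvested.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Tuple

NS = {
    "dif": "http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/",
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

Query = Callable[[ET.Element], str]


def path_query(path: str) -> Query:
    """Build a query returning the text at `path` (or '' if it is absent)."""
    return lambda root: root.findtext(path, default="", namespaces=NS).strip()


# One query per (method, schema) pair, as the metadata store requires.
QUERIES: Dict[Tuple[str, str], Query] = {
    ("title", "DIF"): path_query("dif:Entry_Title"),
    ("title", "ISO19139"): path_query(
        "gmd:identificationInfo/gmd:MD_DataIdentification/gmd:citation/"
        "gmd:CI_Citation/gmd:title/gco:CharacterString"),
}


def detect_schema(root: ET.Element) -> str:
    if root.tag == f"{{{NS['dif']}}}DIF":
        return "DIF"
    if root.tag == f"{{{NS['gmd']}}}MD_Metadata":
        return "ISO19139"
    raise ValueError(f"unsupported schema: {root.tag}")


def run(method: str, xml_text: str) -> str:
    """Run `method` against a record, choosing the query by the record's schema."""
    root = ET.fromstring(xml_text)
    schema = detect_schema(root)
    try:
        query = QUERIES[(method, schema)]
    except KeyError:
        raise NotImplementedError(f"no query for ({method}, {schema})") from None
    return query(root)
```

Adding a schema means adding one entry per method, which is why the store can only support a limited variety of schemas.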
Metadata Formats
Currently supporting:
• NASA Global Change Master Directory: Directory Interchange Format (DIF)
Experimenting with:
• Vanilla ISO 19139
• Dublin Core
• UK Gemini V1 format
Will support the following ISO profiles for harvest:
• (eventually) UK Gemini profile
• WMO profile
• IOC profile
• (whenever) US FGDC profile
ALL SIMULTANEOUSLY: XML database plus appropriate XQueries
Simulation in the context of ISO 19139: NumSim
NDG Products: NumSim
NumSim Example
Firefox Search Plugin
International Discovery – Climate
NDG “New Interface”
Within Record: Scrolling Down
New Interfaces
[Simple and Advanced interface views]
Issues:
• Times (forecast, paleo, etc.)
• BBOX (near poles and dateline)
• Semantic vocabulary matching (exploiting a new NDG web service providing thesaurus content and ontology mapping)
(No CSS as yet)
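The dateline part of the BBOX issue is easy to get wrong: a box whose west edge is greater than its east edge crosses 180°. A sketch of an intersection test that splits such boxes into two longitude intervals follows; the `west > east` convention is an assumption, and boxes reaching the poles (where longitude becomes degenerate) are not handled.

```python
from typing import List, NamedTuple, Tuple


class BBox(NamedTuple):
    """Geographic box in degrees; west > east means it crosses the 180° meridian."""
    west: float
    south: float
    east: float
    north: float


def _lon_intervals(b: BBox) -> List[Tuple[float, float]]:
    # Split a dateline-crossing box into two ordinary longitude intervals.
    if b.west <= b.east:
        return [(b.west, b.east)]
    return [(b.west, 180.0), (-180.0, b.east)]


def intersects(a: BBox, b: BBox) -> bool:
    """True if the two boxes overlap, including boxes that cross the dateline."""
    if a.south > b.north or b.south > a.north:
        return False  # no overlap in latitude
    return any(lo1 <= hi2 and lo2 <= hi1
               for lo1, hi1 in _lon_intervals(a)
               for lo2, hi2 in _lon_intervals(b))
```

A naive `west <= lon <= east` test would treat a box from 170° to -170° as empty, so a search over the central Pacific would silently return nothing.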
ISO
• Metadata extensions and profiles
ISO 19139
Background:
• Designed to exploit as much as possible of the XML Schema machinery
• Not designed for humans!
Advice:
• Use in conjunction with a clear concept of why it’s being used:
• Decide on dataset granularity, and use other metadata schemas to describe how to use content (“A” metadata; e.g. an application schema of GML).
• Devise a profile with utility, then: restrict, restrict. Document. Register.
On Restriction
ISO 19139 is also about INTEROPERABILITY!
• Don’t follow the ISO 19139 advice and produce a new schema!
• Ensure that your profile instances are valid vanilla ISO 19139.
• Restrict content out-of-band, e.g. Schematron, etc.
• Agree on how you’re going to deploy ISO 19139.
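Out-of-band restriction keeps the instance plain ISO 19139 and puts the profile's extra rules in a separate check. Schematron is the usual tool; as a stand-in, this Python sketch expresses two hypothetical profile rules as namespace-aware paths. It does not validate against the ISO 19139 XSDs; in practice schema validation would run first and the profile rules second.

```python
import xml.etree.ElementTree as ET
from typing import List

NS = {"gmd": "http://www.isotc211.org/2005/gmd",
      "gco": "http://www.isotc211.org/2005/gco"}

# Hypothetical profile rules: (path relative to gmd:MD_Metadata, message).
# The record stays vanilla ISO 19139; these rules only narrow what is accepted.
PROFILE_RULES = [
    ("gmd:fileIdentifier/gco:CharacterString",
     "fileIdentifier is mandatory in this profile"),
    ("gmd:identificationInfo/gmd:MD_DataIdentification/gmd:abstract/gco:CharacterString",
     "abstract is mandatory in this profile"),
]


def check_profile(xml_text: str) -> List[str]:
    """Return the list of profile violations (empty means the record conforms)."""
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{NS['gmd']}}}MD_Metadata":
        return ["root element is not gmd:MD_Metadata"]
    problems = []
    for path, message in PROFILE_RULES:
        # findtext gives '' both for a missing element and for an empty one.
        if not root.findtext(path, default="", namespaces=NS).strip():
            problems.append(message)
    return problems
```

Because the rules live outside the schema, any ISO 19139 tool can still read records that pass, and records that fail are still valid ISO 19139 for other communities.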
On Extension
ISO 19139 is also about INTEROPERABILITY!
Do follow the ISO 19139 advice and produce a new schema!
• Do what you need for your community, but:
• Design so that code expecting ISO 19139 instances can parse yours!
• Make it easy for third-party code to ignore your content!
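One way to make extension content easy to ignore is sketched below. Some community profiles mark an extension element with a `gco:isoType` attribute naming the ISO 19139 class it stands in for; assuming that convention (and the conventional `gmd`/`gco` prefixes in its value), a third-party reader can map such elements back to their ISO names, drop unknown elements, and then run ordinary ISO 19139 code on the result. The `urn:example:profile` namespace in the usage is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
ISO_NS = {"gmd": GMD, "gco": GCO}
ISO_TYPE = f"{{{GCO}}}isoType"


def _iso_tag(el: ET.Element) -> Optional[str]:
    """The ISO tag this element should be read as, or None to ignore it."""
    ns = el.tag[1:].partition("}")[0] if el.tag.startswith("{") else ""
    if ns in (GMD, GCO):
        return el.tag
    iso_type = el.get(ISO_TYPE)  # assumed convention, e.g. "gmd:MD_DataIdentification"
    if iso_type and ":" in iso_type:
        prefix, local = iso_type.split(":", 1)
        if prefix in ISO_NS:
            return f"{{{ISO_NS[prefix]}}}{local}"
    return None


def to_iso_view(el: ET.Element) -> Optional[ET.Element]:
    """Copy the tree, renaming extension elements and dropping unknown ones."""
    tag = _iso_tag(el)
    if tag is None:
        return None
    out = ET.Element(tag, el.attrib)
    out.text = el.text
    for child in el:
        mapped = to_iso_view(child)
        if mapped is not None:
            out.append(mapped)
    return out
```

The reader never needs to understand the community namespace; it only needs each extension element to say which ISO class it extends.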
Summary
• NDG is dealing with a heterogeneous environment.
• Successful deployment of OAI with discovery metadata (there are some issues differentiating between model simulations and ordering response sets).
• Directly linking to and exploiting GML application schemas.
• Web service backends make deployment easier.
• Communities need to be very careful how they deploy ISO 19139.


