- Number of slides: 29
Metadata and Data Management activities at CSIRO Marine Research, Australia
Kim Finney & Tony Rees
Divisional Data Centre, CSIRO Marine Research, Hobart
http://www.marine.csiro.au/datacentre/
The Australian MarLIN Connection
• Great Minds Think Alike!
– Almost simultaneous emergence of the UK and Australian MarLIN projects.
– Different emphases but many overlapping problems.
• Why Are We Here?
– To exchange ideas, make some new data friends and, hopefully, build on UK developments that can also address Australian marine data issues.
• Who Are We?
– CSIRO Division of Marine Research (CMR), an Australian Commonwealth Government research agency with approximately 300 staff. One of a number of such agencies (others include AIMS and GBRMPA).
Orientation information...
[Map slide. Labels: CMR; 16 million km² ocean territory; RV Franklin (oceanographic research vessel); FRV Southern Surveyor (fisheries research vessel)]
CMR - Data Centre
• Established in 1997
– 12 staff (multidisciplinary),
– services the Division and two ships,
– focal point for promoting a data management culture within CMR.
• Data Management Strategy
– developed in 1997,
– outlines actions that CMR must take to move its data management practices into the 21st century,
– covers policy, technology issues, data handling procedures and standards development/adoption; available on the Data Centre web site.
What Are Some Of The Issues We Face?
• Corporate knowledge of datasets held (internally & externally sourced).
• Access & re-use of data generated by individuals.
• Data archiving for re-use.
• Data pricing policies.
• Conformance with national & international standards (data exchange, data processing, data documentation).
• Contribution to national data management issues & activities.
• Data management tools (availability, development for re-use, divisional software libraries).
• Integration of data, records, publications and financial systems.
• Purchase & sharing of externally sourced data.
• Coordination of external data exchange/data provision.
• Divisional use of WWW & database technology.
What Is Our Approach?
[Diagram slide. Components: Divisional Data Policies; Basic WWW Metadata Directory; Standards; Hyperlinked Publications; Hyperlinked Data Files; Hyperlinked Databases; E-Commerce Module; Data Licensing Module]
Development Of CMR’s Research Database
[Diagram slide. The approach components from the previous slide are repeated: Divisional Data Policies; Basic WWW Metadata Directory; Standards; Hyperlinked Publications; Hyperlinked Data Files; Hyperlinked Databases; E-Commerce Module; Data Licensing Module]
Access components: Client (Java applet or browser); Network Protocol (RMI, HTTP); Server: Servlet (database access program); JDBC; ORACLE Database.
Database contents: Project Information; Device Sources; GIS Sources; Model Sources; Conceptual/Physical Deployment; Data Catalogue; Time Series Data Types; Profile Data Types; Video Data; Photo Data; Catch Data; Model Data; Image Data; Meteorological Data; Sample Data; Sediment Data.
Other labels on the diagram: Spatial Option; Yet to be included.
[Diagram slide: database structure]
Project Information; Device Sources; GIS Sources; Model Sources; Conceptual/Physical Deployment
Data Catalogue {long table indexing all features in the database}
Time Series Data Types; Profile Data Types; Video Data; Photo Data; Catch Data; Model Data; Image Data; Meteorological Data; Sample Data; Sediment Data
Concluding Remarks
MarLIN - Marine Laboratories Information Network and CAAB - Codes for Australian Aquatic Biota
Situation at CMR pre-MarLIN
[Diagram slide. Components: Externally sourced data; Scientific publications (numerous dispersed resources); Derived products; CMR-produced reference works & guides; Indexes and catalogues; Project/voyage/person details; Supporting information; Dispersed data; Centrally-held data; CAAB taxonomic database]
MarLIN metadatabase as at July 1999: showing pointers/links to, or information sourced from:
[Diagram slide. Components: Externally sourced data; Scientific publications; CMR-produced reference works & guides; Derived products; Indexes and catalogues; Project/voyage/person details; Supporting information; Dispersed data; Centrally-held data; CAAB taxonomic database]
MarLIN design questions...
• How to make data querying, entry and maintenance easily user-accessible (but maintain metadata standards)?
– use WWW interfaces, but moderate user entries and updates
• What information to store, in what manner?
– use ANZLIC and “Blue Pages” elements, plus additional ones as deemed useful for Divisional needs
• What metadata standards, thesauri, etc. to follow?
– mostly follow ANZLIC & “Blue Pages”, with some extensions & replacements
• How to handle taxon-level information?
– store taxonomic codes in MarLIN, referenced to scientific and common names from the Division’s “CAAB” taxonomic database
• What about subject-based searching?
– use “MarLIN subject categories”, developed from the ASFA (R) scheme
MarLIN metadatabase implementation
• Oracle database, with WWW front end and HTML forms/Java interfaces
– WWW used for searching and for metadata submission/update, also for most administrative functions
• Relational design
– aspects common to numerous records (e.g. project, voyage, person information) stored in separate tables
• Data entry and update is via user logon (restricted to users on the CMR computer domain)
– enterer details, time, etc. are automatically logged and added to the record on submission
• “Submitted” records reside in separate (parallel) tables until approved by the database administrator
• Nightly script runs to generate CMR’s “Blue Pages” entries from MarLIN metadata records
MarLIN metadata elements
# = “Blue Pages” extension to ANZLIC standard, * = new element added for MarLIN

Dataset...
– Title
– * Identifier/Short Title
– # Data Type
– Custodian Organisation
– * Contributors
– * Acknowledgements
– # References
– * Publication Date
– Abstract
– * Author's Comments
– On-Line Links (Data, Graphics, Documentation)
– Location Keywords
– Bounding Coordinates

Subject Categories and Search Words
– * MarLIN Subject Categories
– # Habitat Keywords
– # Taxonomy Keywords
– * CAAB Species Codes
– # Parameters Measured
– # Equipment Used
– # Blue Pages Themes
– ANZLIC Search Words

Project, vessel and voyage details
– # Originating Project Name
– * Project Details
– # Platform/Vessel Name
– * Voyage Identifier
– * Voyage Details

Data Currency and Status
– Date range (Beginning and End Dates)
– Progress
– Maintenance

Data Access
– Stored Data Format(s)
– * Stored Data Volume
– * Stored Data Location
– * Specific Software Requirements
– * Stored Data Documentation
– Available Format Type(s)
– Access Constraints

Data Quality
– Data Source, Processing, and Quality Control
– * GIS Datum and scale used (if relevant)
– Logical Consistency Report
– Positional Accuracy
– Parameter Accuracy
– Completeness

Contact point
– Contact Person and Details

Metadata Information
– * Related MarLIN Datasets
– Additional Metadata
– * Metadata Availability
– Metadata Created On/By... (date, person)
– * Metadata Last Updated On/By... (date, person)
Aspects of MarLIN “Search” interface...
Example search results
[Screenshot slide. Labels: Lists of titles; Summary information; Links to voyage tracks]
External MarLIN linkages (July 1999)
[Diagram slide. Components: MarLIN database (CMR’s records); MarLIN search facility; Hyperlinks to documents, data, etc.; Selected details exported to... “Blue Pages” HTML documents (many organisations’ records); Blue Pages search facility; Internet search engines; Online link back to... MarLIN]
MarLIN continuing development...
• Incorporate “live” links to other databases, e.g. CAAB, CMR corporate databases, library systems
• Increase data coverage, try to maintain currency and consistency of entries
• Continue to “sell the concept” for users to document their own data
• Make a “view” of MarLIN records visible to ASDD
• MarLIN v. 2 to be developed in c. 12 months, closely integrated with the new Divisional data storage system (with parallel development of interfaces etc., automated retrieval of data as well as metadata)
• Possible future links with metadata systems based on other standards, using “crosswalks”
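A “crosswalk” is essentially a table that maps element names in one metadata standard onto their nearest equivalents in another. The sketch below is illustrative only: the left-hand names are taken from the MarLIN element list, the right-hand names are Dublin Core element names chosen here as an assumed target standard, and the pairings themselves are this example's guesses rather than an agreed mapping.

```python
# Hypothetical crosswalk: MarLIN element name -> Dublin Core element name.
# The pairings are assumptions made for illustration.
MARLIN_TO_DC = {
    "Title": "title",
    "Abstract": "description",
    "Custodian Organisation": "publisher",
    "Publication Date": "date",
    "Location Keywords": "coverage",
    "ANZLIC Search Words": "subject",
}

def crosswalk(record, mapping):
    """Translate a metadata record's element names using `mapping`.

    Returns the converted record plus a sorted list of source elements
    that have no equivalent in the target standard, so that dropped
    information is visible rather than silently lost.
    """
    converted = {mapping[name]: value
                 for name, value in record.items() if name in mapping}
    unmapped = sorted(name for name in record if name not in mapping)
    return converted, unmapped

if __name__ == "__main__":
    example = {
        "Title": "Example voyage dataset",
        "Abstract": "Illustrative record only.",
        "CAAB Species Codes": "37 001001",
    }
    print(crosswalk(example, MARLIN_TO_DC))
```

Reporting the unmapped elements matters in practice, because extensions such as the CAAB species codes have no counterpart in a more general standard.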
MarLIN present and future operation
[Diagram slide. Components: Externally sourced data; Scientific publications; CMR-produced reference works & guides; Derived products; Indexes and catalogues; Project/voyage/person details; Supporting information; Dispersed data; Centrally-held data; CAAB taxonomic database]
CAAB: Codes for Australian Aquatic Biota
http://www.marine.csiro.au/caab/
Example CAAB codes (hammerhead sharks) (dogfishes)
CAAB rationale/historic reasons for existence
• Taxonomists needed a tool for organising specimen collections and supporting information
• Field biologists needed a tool for rapid data entry (to include categories corresponding to “non-orthodox groups”)
• Data custodians needed a system for storing taxon-related information in a long-term, stable form (independent of future name changes)
• Use of “intelligent” codes permits rapid human- or computer-based sorting of taxa, and retrieval of supporting information
CAAB implementation
• CAAB has 47 “major categories” (e.g. fish, mammals, Algae Phaeophyta, angiosperms), each with up to 999,999 available codes for allocation to Australian aquatic taxa
• Coverage of Australian fish species (c. 4,500) is essentially complete, as is coverage of some smaller groups (marine reptiles and mammals)
• Other categories are populated on an “as needs” basis (e.g. 300 molluscs, 350 crustaceans, 60 angiosperms, plus ongoing additions)
• 2-digit prefix (category code) and 3-digit family code are machine-sortable, e.g.:
– 37 = fish
– 37 001 = fish family 1
– 37 001001 = fish family 1 species 1
– families are in contiguous blocks, e.g. families 37 005 to 37 024 are all types of sharks
• Numeric code is attached to the taxon, independent of changes of scientific or common name (gives relative stability for data storage)
• Master CAAB database stores taxon/voucher specimen details, present and any previous scientific names, common names, comments and other information
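The code layout described above (2-digit category, 3-digit family, 3-digit species) can be sketched in a few lines. This is a minimal illustration, not CAAB's own software: the names CaabCode, parse_caab and is_shark are hypothetical, and the species-level codes used in the example are made up to show the format.

```python
from typing import NamedTuple

class CaabCode(NamedTuple):
    category: str  # 2-digit major category, e.g. "37" = fish
    family: str    # 3-digit family code within the category
    species: str   # 3-digit species code within the family

def parse_caab(code: str) -> CaabCode:
    """Split an 8-digit code such as '37 001001' into its three parts."""
    digits = code.replace(" ", "")
    if len(digits) != 8 or not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"expected 8 digits, got {code!r}")
    return CaabCode(digits[:2], digits[2:5], digits[5:])

def is_shark(code: str) -> bool:
    # Families 37 005 to 37 024 form one contiguous block of sharks.
    # Zero-padded fixed-width strings compare correctly as text.
    parsed = parse_caab(code)
    return parsed.category == "37" and "005" <= parsed.family <= "024"

if __name__ == "__main__":
    # Made-up codes: a plain text sort groups them by category, then family.
    codes = ["37 019004", "37 001001", "37 005002"]
    print(sorted(codes))
    print([c for c in codes if is_shark(c)])
```

Because every field is fixed-width and zero-padded, an ordinary string sort or a simple range test is enough to group related taxa, which is what makes the codes “machine-sortable”.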
Present usage of CAAB information
[Diagram slide. CAAB taxonomic database:]
– used in... CMR databases (including MarLIN); other organisations’ databases
– quoted in... CMR-produced reference works & guides
– generates... CAAB-generated species lists
Intended future CAAB operation
[Diagram slide. Components: CAAB taxonomic database; CAAB WWW interface; Users’ databases; Links to online information; CAAB taxon-level report; CAAB species lists (on-line generation); Additional search facilities, e.g. MarLIN, other CMR databases, ITIS, WWW, etc.]
CAAB continuing tasks...
• Taxon-level information from other local databases to be incorporated into CAAB (coverage will gradually be extended to most groups of aquatic organisms)
• Database structure will be improved to suit external WWW user access to the database
• Species common names to be handled in a structured way, permitting user-definable output formats, more comprehensive searching, etc.
• Hyperlinks will be incorporated to electronic versions of maps, images, etc., as available
• On-line links to other databases from CAAB will be enabled (and vice versa)
Selected data and metadata developments elsewhere in Australia
• On-line data, data products, and summaries
• Other metadata systems
• On-line references
• Collection-based information


