On-line biological data concepts at CSIRO Marine Research, Australia
Tony Rees & Kim Finney
Divisional Data Centre, CSIRO Marine Research, Hobart, Australia
http://www.marine.csiro.au/datacentre/
Our website: http://www.marine.csiro.au/datacentre/
Pre-existing situation at CMR (before 1997)
• Data in a variety of databases and flat files
• No metadata or digital documentation
• No web access to any data or metadata
• CAAB (taxon coding system) in existence, but coverage patchy and compliance variable
Our implementation path
Stage 1 (1997-2000)...
• Construct a searchable, web-accessible metadata system and start populating it with information - MarLIN v1
• Upgrade CAAB to form a comprehensive taxon dictionary for MarLIN (also accessible by SQuID)
• Build a pilot data store and visualisation system with a web-driven GUI (Java applet) - SQuID v1
Stage 2 (2000-)...
• Build SQuID v2 (onwards) to become a comprehensive data store, with upgraded links to MarLIN and CAAB
• Implement linkage between MarLIN and an Australia-wide, distributed metadata search system
Stage 3… ???
Our system overview
• Master data storage (includes index layer) - holds info at the atomic data level
• Data directory (metadatabase) - holds info at “dataset” level (e.g. survey, species range)
• Taxon dictionary
Diagram labels: Entry point to data; Display relevant metadata; Subsets of information shared with other metadata directory systems
Digression #1: Taxon matching
• Simplistic view:
– text match on one field (“scientific name”) or two (genus + species)
• More comprehensive approach:
– 10 or more fields required, e.g. in CAAB we define the following: Genus, Subgenus, Species, Qualifier, Subspecies, Variety, Original Author/s, Original Date, Revising Author/s, Revision Date, Authority Addendum
– also need to flag:
  - Is botanical or zoological code applicable?
  - Species name Latin or informal (“sp. A”, etc.)?
  - Has name changed from original? (even if no revising author/date stored)
Examples from our database:
• Chlamys (Belchlamys) aktinos (Petterd, 1886) … a scallop
• Ophiaster hydroideus (Lohmann) Lohmann, 1913 emend. Manton & Oates, 1983 … a coccolithophorid
• Heteroclinus sp. 1 [in Gomon et al, 1994] … Kuiter's weedfish
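The difference between the simplistic and the comprehensive view can be sketched in a few lines of Python. This is only an illustration: the field names follow the list on the slide, but the class, the function names and the way the scallop example is split across fields are assumptions, not the CAAB schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaxonName:
    """One name record; field names are illustrative, not CAAB column names."""
    genus: str
    species: str
    subgenus: Optional[str] = None
    qualifier: Optional[str] = None
    subspecies: Optional[str] = None
    variety: Optional[str] = None
    original_authors: Optional[str] = None
    original_date: Optional[int] = None
    revising_authors: Optional[str] = None
    revision_date: Optional[int] = None
    authority_addendum: Optional[str] = None
    botanical_code: bool = False  # botanical rather than zoological code applies
    informal_name: bool = False   # e.g. "sp. A" rather than a Latin epithet
    name_changed: bool = False    # name differs from the original combination


def simple_match(a: TaxonName, b: TaxonName) -> bool:
    """Simplistic view: case-insensitive match on genus + species only."""
    return (a.genus.lower(), a.species.lower()) == (b.genus.lower(), b.species.lower())


def full_match(a: TaxonName, b: TaxonName) -> bool:
    """Comprehensive view: every stored field and flag must agree."""
    return a == b


# "Chlamys (Belchlamys) aktinos (Petterd, 1886)" from the slide; the
# parenthesised authority means the name has changed from the original.
scallop = TaxonName(genus="Chlamys", subgenus="Belchlamys", species="aktinos",
                    original_authors="Petterd", original_date=1886,
                    name_changed=True)
# A hypothetical incoming record carrying only genus and species.
bare = TaxonName(genus="chlamys", species="aktinos")

print(simple_match(scallop, bare))  # True
print(full_match(scallop, bare))    # False
```

The two-field match accepts the bare record, while the full comparison shows how much information the bare record is missing.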
Taxon matching … continued
• We have standardised on taxon codes, rather than names, for data storage and matching … names are stored as an attribute of the code (and can be updated in the future as needed)
• Our “CAAB” coding system has evolved over 20+ years; earlier generations of codes are maintained on the system
• New web-based access facility for retrieving the latest name for a code, searching for a taxon, etc.
• The same CAAB codes are also used by other marine science/fisheries agencies around Australia
• Facility newly implemented in CAAB to hold ITIS codes, for cross-reference to international systems in the future
CAAB services available
Application-level requests
• Generate scientific name, common name, current code (if applicable) for a given taxon code
• Call a CAAB taxon report
• Translate an ITIS number to a CAAB code (or vice versa)
CAAB user interface
User searches by scientific name, common name or taxon code (or portion thereof)
• List taxa matching query
• Retrieve current sci. name, common name(s), taxon code, taxon report
• Initiate a MarLIN search, ITIS report, FishBase report
• List taxa by CAAB category or family
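The services listed above amount to a code-keyed dictionary with a few lookup paths. The Python sketch below is one possible shape for it, not the CAAB implementation: the class and method names are invented, the obsolete code "00 000001" and the ITIS number 999999 are placeholders made up for the example, and only the three real code/name pairs are taken from the MarLIN species listing in this deck.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Taxon:
    code: str
    scientific_name: str
    common_name: str
    superseded_by: Optional[str] = None  # set when the code is obsolete


class TaxonDictionary:
    """Names are stored as attributes of the code, never the other way round."""

    def __init__(self, taxa: Iterable[Taxon]) -> None:
        self._by_code: Dict[str, Taxon] = {t.code: t for t in taxa}
        self._itis_to_caab: Dict[int, str] = {}

    def current_code(self, code: str) -> str:
        """Follow 'superseded by' links from an old code to the current one."""
        seen = set()
        while self._by_code[code].superseded_by is not None:
            if code in seen:
                raise ValueError(f"circular supersession at {code}")
            seen.add(code)
            code = self._by_code[code].superseded_by
        return code

    def lookup(self, code: str) -> Taxon:
        """Current name record for any code, including obsolete ones."""
        return self._by_code[self.current_code(code)]

    def search(self, text: str) -> List[Taxon]:
        """Match part of a scientific name, common name or code (current taxa only)."""
        needle = text.lower()
        return [t for t in self._by_code.values()
                if t.superseded_by is None
                and (needle in t.scientific_name.lower()
                     or needle in t.common_name.lower()
                     or needle in t.code)]

    def link_itis(self, itis_number: int, caab_code: str) -> None:
        self._itis_to_caab[itis_number] = self.current_code(caab_code)

    def itis_to_caab(self, itis_number: int) -> str:
        return self._itis_to_caab[itis_number]

    def caab_to_itis(self, caab_code: str) -> Optional[int]:
        target = self.current_code(caab_code)
        for itis_number, code in self._itis_to_caab.items():
            if code == target:
                return itis_number
        return None


caab = TaxonDictionary([
    Taxon("37 118001", "Saurida undosquamis", "brushtooth lizardfish"),
    Taxon("37 255004", "Gephyroberyx darwinii", "Darwin's roughy"),
    Taxon("23 636004", "Nototodarus gouldi", "Gould's squid"),
    # Hypothetical obsolete code, invented to show the supersession path.
    Taxon("00 000001", "Saurida undosquamis", "brushtooth lizardfish",
          superseded_by="37 118001"),
])
caab.link_itis(999999, "37 118001")  # 999999 is a placeholder, not a real ITIS number

print(caab.lookup("00 000001").code)             # 37 118001
print([t.code for t in caab.search("squid")])    # ['23 636004']
```

Because data records store only the code, renaming a taxon means updating one dictionary entry rather than every stored observation.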
CAAB web interface (current version)
Digression #2: taxonomy keywords
• CAAB uses “major categories” (mostly = phyla)
• NASA GCMD keywords would be an OBIS option (maybe with additions to suit OBIS) - c. 50 currently relevant … could also cross-map to GEMET (EC) list (c. 200)
• MarLIN uses Australian “Blue Pages” keywords (c. 100 terms) - independent of CAAB codes (in current implementation)
EARTH SCIENCE >> Biosphere >> Zoology >> Amphibians
EARTH SCIENCE >> Biosphere >> Zoology >> Anemones
EARTH SCIENCE >> Biosphere >> Zoology >> Arachnids
EARTH SCIENCE >> Biosphere >> Zoology >> Arthropods
EARTH SCIENCE >> Biosphere >> Zoology >> Birds
EARTH SCIENCE >> Biosphere >> Zoology >> Centipedes
EARTH SCIENCE >> Biosphere >> Zoology >> Corals
EARTH SCIENCE >> Biosphere >> Zoology >> Crustaceans
EARTH SCIENCE >> Biosphere >> Zoology >> Echinoderms
EARTH SCIENCE >> Biosphere >> Zoology >> Fish
EARTH SCIENCE >> Biosphere >> Zoology >> Flatworms
EARTH SCIENCE >> Biosphere >> Zoology >> Insects
EARTH SCIENCE >> Biosphere >> Zoology >> Invertebrates
EARTH SCIENCE >> Biosphere >> Zoology >> Jellyfish
EARTH SCIENCE >> Biosphere >> Zoology >> Mammals
EARTH SCIENCE >> Biosphere >> Zoology >> Millipedes
EARTH SCIENCE >> Biosphere >> Zoology >> Mollusks
EARTH SCIENCE >> Biosphere >> Zoology >> Reptiles
EARTH SCIENCE >> Biosphere >> Zoology >> Roundworms
EARTH SCIENCE >> Biosphere >> Zoology >> Segmented Worms
EARTH SCIENCE >> Biosphere >> Zoology >> Sponges
EARTH SCIENCE >> Biosphere >> Zoology >> Vertebrates
EARTH SCIENCE >> Biosphere >> Zoology >> Zooplankton
EARTH SCIENCE >> Biosphere >> Microbiota >> Amoebae
EARTH SCIENCE >> Biosphere >> Microbiota >> Bacteria
EARTH SCIENCE >> Biosphere >> Microbiota >> Blue-green Algae
EARTH SCIENCE >> Biosphere >> Microbiota >> Ciliates
EARTH SCIENCE >> Biosphere >> Microbiota >> Coccolithophore
EARTH SCIENCE >> Biosphere >> Microbiota >> Diatoms
EARTH SCIENCE >> Biosphere >> Microbiota >> Flagellates
EARTH SCIENCE >> Biosphere >> Microbiota >> Foraminifers
EARTH SCIENCE >> Biosphere >> Microbiota >> Microalgae
EARTH SCIENCE >> Biosphere >> Microbiota >> Microphyte
EARTH SCIENCE >> Biosphere >> Microbiota >> Phytoplankton
EARTH SCIENCE >> Biosphere >> Microbiota >> Protist
EARTH SCIENCE >> Biosphere >> Microbiota >> Radiolarians
EARTH SCIENCE >> Biosphere >> Microbiota >> Zooplankton
EARTH SCIENCE >> Biosphere >> Vegetation >> Algae
EARTH SCIENCE >> Biosphere >> Vegetation >> Flowering Plants
EARTH SCIENCE >> Biosphere >> Vegetation >> Lichens
EARTH SCIENCE >> Biosphere >> Vegetation >> Macroalgae
EARTH SCIENCE >> Biosphere >> Vegetation >> Macrophyte
EARTH SCIENCE >> Biosphere >> Vegetation >> Phytoplankton
Taxonomy keyword cross-mapping (examples)
GCMD list:
Invertebrates, Sponges, Jellyfish, Anemones, Corals, Flatworms, Roundworms, Segmented Worms, Mollusks, Arthropods, Insects, Arachnids, Echinoderms, Crustaceans, Vertebrates, Fish, Amphibians, Reptiles, Birds, Mammals
GEMET list:
invertebrate … S 709
poriferan … S 744
coelenterate … S 737
coral … S 738
nematode … S 743
annelid … S 711 ++
mollusc … S 740
cephalopod … S 741
gastropod … S 742
arthropod … S 713
insect … S 719 ++
chelicerate … S 714 ++
echinoderm … S 739
crustacean … S 717
vertebrate … S 649
fish … S 754
amphibian … S 650 ++
reptile … S 691 ++
bird … S 654 ++
mammal … S 664 ++
MarLIN - used for data discovery
• MarLIN - based on an Oracle database containing dataset, project, and survey descriptions, plus on-line links to data and web resources
• Holds metadata according to regional (ANZLIC and “Blue Pages”) standards, with additional agency-constructed fields (“extended ANZLIC”)
• Web interface for searching and metadata contribution/update, using HTML, Oracle Web Server and custom PL/SQL application
• Produces lists of datasets, or dataset reports, as requested
• Includes links to pre-formatted data “packets” (now) and to SQuID (in future), for access to the data
NB: no data visualising capability, apart from “thumbnails” showing data extent
MarLIN - behind the scenes
• Some 25+ tables, holding the following:
– text-based fields (e.g. title, abstract, contributors, references, etc.)
– keywords, handled as numeric IDs (including taxonomic keywords)
– species/species groups, handled as CAAB codes
– spatial extent, handled as bounding coordinates (max. and min. latitude and longitude)
– time extent, handled as earliest and latest collection date for items in the dataset
– originator organisation, present custodian, survey, contact person, etc., handled as numeric IDs
• Initial search set up by keyword/ID type, spatial coordinates, time period (if desired)
• Then search/browse by subject categories, keywords, taxon names, contributing project, vessel/voyage identifier, location of data, etc.
• Free text search also supported
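The initial search described above (bounding coordinates plus time period, then an optional narrowing by CAAB code) can be sketched as follows. This is a simplified illustration in Python rather than the PL/SQL application itself: the record layout, function names and the three sample records are hypothetical, and the overlap test ignores boxes that cross the 180° meridian. Only the North West Shelf coordinates and the CAAB code come from the example search in this deck.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class BoundingBox:
    north: float
    south: float
    west: float
    east: float

    def overlaps(self, other: "BoundingBox") -> bool:
        """True if the two boxes share any area (no antimeridian handling)."""
        return (self.south <= other.north and other.south <= self.north
                and self.west <= other.east and other.west <= self.east)


@dataclass(frozen=True)
class DatasetRecord:
    title: str
    extent: BoundingBox
    earliest: date            # earliest collection date in the dataset
    latest: date              # latest collection date in the dataset
    caab_codes: FrozenSet[str] = frozenset()


def search(records: Iterable[DatasetRecord], region: BoundingBox,
           start_year: int, end_year: int,
           caab_code: Optional[str] = None) -> List[DatasetRecord]:
    """Return records overlapping the region and period, optionally for one taxon."""
    hits = []
    for record in records:
        if not record.extent.overlaps(region):
            continue
        if record.latest.year < start_year or record.earliest.year > end_year:
            continue
        if caab_code is not None and caab_code not in record.caab_codes:
            continue
        hits.append(record)
    return hits


# Stored coordinates for the Australian North West Shelf, as shown in the deck.
north_west_shelf = BoundingBox(north=-17, south=-24, west=114, east=122)

# Hypothetical metadata records, invented for the example.
records = [
    DatasetRecord("Hypothetical survey A", BoundingBox(-18, -21, 115, 119),
                  date(1991, 3, 1), date(1991, 4, 15), frozenset({"37 118001"})),
    DatasetRecord("Hypothetical survey B", BoundingBox(-40, -44, 145, 149),
                  date(1992, 1, 10), date(1992, 2, 1), frozenset({"37 118001"})),
    DatasetRecord("Hypothetical survey C", BoundingBox(-19, -22, 116, 118),
                  date(1985, 6, 1), date(1985, 7, 1), frozenset({"23 636004"})),
]

for hit in search(records, north_west_shelf, 1990, 1995, caab_code="37 118001"):
    print(hit.title)  # Hypothetical survey A
```

Storing each extent as four numbers and two dates keeps this first filter cheap, which suits a design where all processing happens on the server.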
MarLIN search interface
Example MarLIN search result - by taxonomic group
subject categories | custodian organisations | vessels | voyages | projects | taxonomic groups | species | habitats | parameters | equipment
The following choices are presently available for MarLIN records in the selected region and/or time period:
Start year: 1990
End year: 1995
Selected region: Australian North West Shelf (stored coordinates used: North=-17, West=114, South=-24, East=122)
Click on any hyperlink to see the full listing for that item.
Invertebrates 4
.. Cephalopods 1
... Squids 1
.. Crustaceans 2
.. Prawns & Shrimps 2
Fishes 4
.. Breams 1
.. Dories 1
.. Leatherjackets 1
.. Perches 3
.. Redfishes 1
.. Roughies 1
.. Snappers 4
.. Whales 1
Example MarLIN search result - by species
subject categories | custodian organisations | vessels | voyages | projects | taxonomic groups | species | habitats | parameters | equipment
The following choices are presently available for MarLIN records in the selected region and/or time period:
Start year: 1990
End year: 1995
Selected region: Australian North West Shelf (stored coordinates used: North=-17, West=114, South=-24, East=122)
Click on any hyperlink to see the full listing for that item.
23 636004 Nototodarus gouldi .. Gould's squid 1
28 786002 Metanephrops boschmai .. Boschma's scampi 1
28 786005 Metanephrops velutinus .. velvet scampi 1
28 821001 Ibacus alticrenatus .. deepwater bug 1
28 821002 Ibacus pubescens .. [a shovel-nosed/slipper lobster] 1
37 118001 Saurida undosquamis .. brushtooth lizardfish 3
37 118016 Saurida sp. 2 [in Sainsbury et al, 1985] .. grey lizardfish 3
37 255004 Gephyroberyx darwinii .. Darwin's roughy 1
37 258002 Beryx splendens .. alfonsino 1
(etc.)
Example MarLIN search result - dataset titles
You searched on the following criteria:
Start year: 1990
End year: 1995
Selected region: Australian North West Shelf
CAAB Species: 37 118001 - Saurida undosquamis
There are 3 datasets matching your criteria in MarLIN at this time. Click on the dataset title to view the metadata record for any dataset.
Southern Surveyor Voyage SS 02/90 - Biological Data Overview
Southern Surveyor Voyage SS 04/91 - Biological Data Overview
Southern Surveyor Voyage SS 08/95 - Biological Data Overview
SQuID - data repository and visualisation tool
• Oracle relational database containing c. 45 tables (present version)
• Holds point, poly-line, and polygon based, geo-referenced data (also time and depth referenced)
• Client runs as Java applet, connects to Oracle data store by Remote Method Invocation (RMI) and JDBC
• Search by spatial coordinates, time period, data “stream” … can subset by survey if desired
• Retrieve atomic-level data for inspection or upload to user’s system
• Basic plotting routines provided, such as:
– geographic distribution of data (sampling points, vessel tracks)
– vertical plots (e.g. temperature, salinity, oxygen vs depth)
– time-based plots (e.g. water temperature measurement through a voyage)
– pie charts for catch composition by number or weight
– length-frequency data, aggregated or by sex of individual
• Taxon handling using CAAB codes (system includes legacy data with obsolete codes)
• Links to MarLIN to display relevant metadata
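As an illustration of the kind of aggregation behind a catch-composition pie chart, the Python sketch below totals atomic catch records per CAAB code and returns each taxon's share by number or by weight. The record layout, the function name and the sample figures are all invented for the example; SQuID itself is a Java applet over Oracle, so this shows only the calculation, not its code.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# One atomic catch record: (CAAB code, number of individuals, weight in kg).
CatchRecord = Tuple[str, int, float]


def catch_composition(records: Iterable[CatchRecord],
                      by: str = "weight") -> Dict[str, float]:
    """Return each taxon's fraction of the total catch, by 'number' or 'weight'."""
    if by not in ("number", "weight"):
        raise ValueError("by must be 'number' or 'weight'")
    totals: Dict[str, float] = defaultdict(float)
    for code, count, weight_kg in records:
        totals[code] += count if by == "number" else weight_kg
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {code: total / grand_total for code, total in totals.items()}


# Hypothetical catch records from two stations, invented for the example.
sample = [
    ("37 118001", 10, 2.0),
    ("23 636004", 30, 6.0),
    ("37 118001", 20, 2.0),
]

print(catch_composition(sample, by="number"))
print(catch_composition(sample, by="weight"))
```

With these figures the two taxa are equal by number (30 individuals each) but split 40/60 by weight, which is why the slide lists both views.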
SQuID user interface - version 1.0
Example SQuID search result
SQuID atomic level data - example
Time series data in SQuID
SQuID vs MarLIN / CAAB - two different approaches
SQuID - a data-rich browser environment
• Large files uploaded to the browser to allow interactive functions (zoomable maps, on-demand display of sample details, cursor tracking, browser-generated plots)
• Disadvantages: more complex applet to load, longer waits for queries to be serviced, performance on user’s machine may be limiting
MarLIN & CAAB - a minimal browser environment
• No reliance on Java version control, browser plugins etc., no load time at startup
• All processing takes place on the server (can maximise performance there) - less stringent requirements for users in hardware terms
• Disadvantage: less real-time interactivity provided (although some workarounds possible)
… May look at a hybrid solution for SQuID v2 - prioritise what level of interactivity/data upload is really needed, handle more at server level
Some considerations for OBIS...
• For agency-specific reasons, we have arrived at separate metadata/data systems. OBIS might want to integrate these two aspects more fully
• Automated generation/maintenance of metadata might be possible (at least in part) and is certainly desirable
• Where would OBIS metadata reside? (centrally, replicated, or fully distributed?) - Australian “ASDD” is an example of a fully distributed system, NASA “GCMD” is a centralised one
• Need to decide on taxon handling for OBIS (names or codes), plus standard(s) for higher level searching
• OBIS software should aim to tolerate a diversity of agency-level systems, while encouraging/facilitating “best practice” data management
The End
CAAB web search