82c4cff9f8ce307564725d480484a9fd.ppt
- Number of slides: 35
The OBIS Index
Where we are – as at October 2003
Tony Rees – CSIRO Marine Research, Hobart
for: OBIS IC meeting, Washington DC
Advance information
Subject of this talk is...
  - New (mostly created within the last 8 weeks, some within the last 8 days)
  - Innovative (uses special components available only from CSIRO, plus others custom-created for this project)
  - Powerful (offers a major ramp-up of OBIS functionality, for modest additional complexity)
  - Exciting (opens up the possibility of many new features)
  - so – worth a look!
OBIS: A Distributed System
Strengths of this approach...
  • Data sources remain under the custodianship of OBIS contributors (no IP issues, good for community building, owners do their own QA and updates)
  • Portal concerns itself with technical issues; it is not a data manager
  • Portal size and resource requirements don’t increase as OBIS membership and content grow
  • No problems with version control
OBIS: A Distributed System
Weaknesses of this approach...
  • Availability, speed of links, and speed of responses to/from contributors are critical to proper functioning of the system (compounds with increasing number of contributors), i.e., system response depends on factors outside OBIS’ control
  • Portal has no knowledge of OBIS provider content (has to do a live distributed query for every piece of information); also, a user may search repeatedly on taxa for which no data are held in the system
  • No opportunity to add value, such as search by taxonomic group (as contributors do not provide this information in any enforced way)
  • No opportunities for advanced search functions, e.g. “near match” (would be difficult to do in a real-time distributed query)
One example: The “Zero Records” Problem...
  • Incorrect spelling?
  • No data available via OBIS for this taxon?
  • Data exist, but are in one of the sources which are off-line?
NB: these responses are actually the slowest to generate, as well!
A solution - the OBIS Index
= a reduced subset of OBIS data, stored in a standardized format, in a convenient location
  • Single record per species, with relevant summary information, i.e., number of records, date range, depth range, plus “c-squares” spatial index (sufficient distribution information for “quick maps” and spatial searches)
  • Master genus list, with cross-references to a simple taxonomic hierarchy
  • Degree of QA, i.e., masking informal/unresolved taxa and freshwater/terrestrial species
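As a rough illustration of the "single record per species" idea, the sketch below models one Index row in Python. The class name, field names and the `add_observation` helper are all hypothetical: the slides do not describe the actual table layout, and the square codes are treated here as opaque strings.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class SpeciesSummary:
    """One summary row per species (hypothetical layout, not the real schema)."""
    name: str
    record_count: int = 0
    first_year: Optional[int] = None
    last_year: Optional[int] = None
    min_depth_m: Optional[float] = None
    max_depth_m: Optional[float] = None
    squares: Set[str] = field(default_factory=set)  # opaque spatial cell codes

    def add_observation(self, year: int, depth_m: float, square: str) -> None:
        """Fold one point record into the summary; the point itself is not kept."""
        self.record_count += 1
        self.first_year = year if self.first_year is None else min(self.first_year, year)
        self.last_year = year if self.last_year is None else max(self.last_year, year)
        self.min_depth_m = depth_m if self.min_depth_m is None else min(self.min_depth_m, depth_m)
        self.max_depth_m = depth_m if self.max_depth_m is None else max(self.max_depth_m, depth_m)
        self.squares.add(square)


# Made-up observations, purely to exercise the class
summary = SpeciesSummary("Balaenoptera physalus")
summary.add_observation(1998, 0.0, "cell-A")
summary.add_observation(2001, 10.0, "cell-A")
summary.add_observation(1995, 5.0, "cell-B")
```

Only the counts, ranges and set of occupied squares survive, which is what keeps the record small regardless of how many point observations sit behind it.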
C-squares spatial indexing...
  - Doesn’t store the point data, just a list of the squares in which data are present, for each taxon
  - Efficient for data reduction
  - Easy to store and query
Choice of square size is a design decision (this index uses 0.5 x 0.5 deg. squares, approx. 50 km)
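The data-reduction idea can be sketched with a simplified grid. Note that this is a stand-in: it labels each square with an integer (row, column) pair and does not reproduce the actual c-squares code strings. The function names are hypothetical, and the coordinates are made up.

```python
import math
from typing import Iterable, Set, Tuple

CELL_DEG = 0.5  # square size quoted on the slide


def cell_for(lat: float, lon: float, size: float = CELL_DEG) -> Tuple[int, int]:
    """Return the integer (row, col) of the grid square containing a point."""
    return (math.floor(lat / size), math.floor(lon / size))


def reduce_to_cells(points: Iterable[Tuple[float, float]]) -> Set[Tuple[int, int]]:
    """Collapse any number of points into the set of squares they occupy."""
    return {cell_for(lat, lon) for lat, lon in points}


def cell_height_km(size: float = CELL_DEG) -> float:
    """North-south extent of a square, using roughly 111 km per degree of latitude."""
    return size * 111.0


# Three made-up points; the first two fall in the same 0.5-degree square
points = [(-42.88, 147.33), (-42.90, 147.40), (-42.40, 147.33)]
cells = reduce_to_cells(points)
```

Here three points reduce to two squares. A 0.5-degree square is about 55 km north to south, in line with the rough 50 km figure above; its east-west width shrinks towards the poles.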
Index benefits...
  - Initial taxon searches and mapping take place by querying the index, not the remote data sources:
    • rapid response time
    • always complete (irrespective of whether any data sources are off-line)
    • can return lists of multiple taxa as desired (no longer need to search for taxa sequentially)
    • limits user selection to a picklist of species represented in the system - no more “zero records” responses
    • correct user spelling not required (enter part of a name, or browse a category, or ask for “near matches”)
    • can return information for user’s desired taxonomic group(s) only
  - Use as “pre-filter” to answer many queries directly from the index, without needing to do a distributed search until actual data are required, i.e., a 2-stage process.
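The 2-stage process might look something like the sketch below. Everything here is an assumption for illustration: the function names, the in-memory dictionary standing in for the Index, the invented record counts, and the `fake_remote` stand-in for a distributed OBIS query.

```python
from typing import Callable, Dict, List

# Toy index: species name -> record count (all counts are invented)
INDEX: Dict[str, int] = {
    "Balaenoptera physalus": 500,
    "Raja clavata": 120,
    "Raja batis": 45,
}


def stage1_picklist(index: Dict[str, int], fragment: str) -> List[str]:
    """Stage 1: answer from the index only; no remote source is contacted."""
    frag = fragment.lower()
    return sorted(name for name in index if frag in name.lower())


def stage2_fetch(index: Dict[str, int], name: str,
                 fetch_remote: Callable[[str], list]) -> list:
    """Stage 2: run the slow distributed query only for taxa the index knows."""
    if name not in index:
        return []  # index says no data are held, so skip the remote query
    return fetch_remote(name)


calls: List[str] = []


def fake_remote(name: str) -> list:
    """Stand-in for a distributed query; records that it was called."""
    calls.append(name)
    return [f"record for {name}"]


picklist = stage1_picklist(INDEX, "raja")
hits = stage2_fetch(INDEX, picklist[0], fake_remote)
misses = stage2_fetch(INDEX, "Raja nonexistens", fake_remote)  # made-up name
```

The partial name "raja" yields a picklist without touching any provider, and the unknown name never triggers a remote call, which is how the "zero records" round trip is avoided.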
Index Development So Far...
  • Nov. 2002 – July 2003: initial concept development and refinement (Tony, Rainer, Phoebe), incl. endorsement by OBIS IC, Mar. ’03
  • Aug. – Sept. 2003:
    – design/build initial prototype Index, plus partially populate with summary data (Tony)
    – construct master genus list and taxonomic hierarchy (= “OBIS categories”), and tag most genera with relevant category (Tony)
  • Sept. 2003: circulate URL and background information to OBIS IC, TWG for comment
  • Sept. – Oct. 2003:
    – refine prototype index (Tony)
    – construct “crawler” and finish first-pass population of the index (Tony, Pamela)
    – tag remaining genera with taxonomic attribution (Tony)
    – build spatial search module (Tony)
Reality check – what do users need...
Key OBIS functions:
  • Show/get distribution data for a desired species
  • Show/get species information for an area
(preceded by...)
  • List species for which data are available! (e.g. by organism type)
  • Show areas for which data are available! (e.g. by organism type)
Current (prototype) OBIS Index Search Interface - as at October 2003 (www.marine.csiro.au/datacentre/obis/quicksearch1.htm)
Current OBIS Categories (Oct. 2003) - page 1 of 2 (approx. 140 in total)
Current OBIS Categories (Oct. 2003) - page 2 of 2
Example Possible Index Searches...
(Y/N = previously offered?)
“Generate List...” function:
  • All fishes beginning with “B...”, or “Bathy...”: N
  • All whales, or decapods, or bryozoans: N
  • All species of the genus “Raja”: Y/N
  • All “near matches” to “Coelorhynchus”: N
“Spatial Search...” function:
  • All fishes, or hexacorals, or “any invertebrates”, or any OBIS taxa, in any 10 x 10 degree square: N
  • All species of “Raja” in a given 10 x 10 degree square: Y/N
  • Global distribution map for any OBIS taxonomic category (e.g. can use to identify data gaps): N
(Note: could also offer searching by 5 x 5 degree square or smaller, but data are probably too patchy for this to be useful at present)
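The “Generate List...” searches above could be approximated as follows. This is only a sketch: the slides do not say which matching algorithm the Index uses, so Python's standard `difflib` is swapped in for the “near match” step, and the small genus table, its category labels and the function names are illustrative assumptions.

```python
import difflib
from typing import Dict, List

# Illustrative master genus list: genus -> category label (labels are made up)
GENERA: Dict[str, str] = {
    "Bathyraja": "fishes",
    "Raja": "fishes",
    "Coelorinchus": "fishes",
    "Balaenoptera": "whales",
}


def genera_starting_with(prefix: str, category: str) -> List[str]:
    """List genera in one category whose names begin with the given prefix."""
    p = prefix.lower()
    return sorted(g for g, cat in GENERA.items()
                  if cat == category and g.lower().startswith(p))


def near_matches(query: str, cutoff: float = 0.6) -> List[str]:
    """Return genera whose spelling is close to the query (case-insensitive)."""
    by_lower = {g.lower(): g for g in GENERA}
    close = difflib.get_close_matches(query.lower(), list(by_lower),
                                      n=5, cutoff=cutoff)
    return [by_lower[c] for c in close]


bathy = genera_starting_with("Bathy", "fishes")
suggestions = near_matches("Coelorhynchus")
```

With this toy list, the variant spelling “Coelorhynchus” resolves to the stored name “Coelorinchus”, while unrelated genera fall below the similarity cutoff.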
Live OBIS Index Search Interface...
Costs associated with the Index...
  • Design, build costs (i.e., person hours)
    – mostly done, although will be refined further (CSIRO contribution)
  • Hosting costs
    – CSIRO is happy to host, at least for the present; access via web can be seamless, once integrated into the portal
  • Refreshing/content maintenance costs
    – some person time needed, in addition to the automated “crawler”
    – upload taxon lists from new data contributors, check for bad data, flag new genera with relevant taxonomic group as needed
    – crawling ideally should be repeated frequently, to keep the index current
  • Continued development and integration into OBIS Portal
    – ??
Recap – what’s new...
  • Speed, consistency, reliability
    – includes no more 0 records, or “try later” messages (at least on “Stage 1” searches)
  • Many new functions, including
    – User need only enter part of a name
    – Can automatically correct for spelling errors
    – Report on multiple taxa simultaneously (tens to thousands)
    – Spatial searches from clickable maps
    – Introduction of “OBIS categories”
    – OBIS content available at a glance (summary statistics, spatial coverage by category)
    – Screening of irrelevant and/or bad data
  • Expansion of ease-of-use, from expert to increasingly non-expert users, without compromising integrity of the system.
Future tasks...
  • Include common names in search results, search interface
  • Auto-resolution of synonyms, variants...
  • Quick Images? Quick Species Pages?
  • How to embed seamlessly into Portal
  • Further development of CSIRO mapper, and/or c-squares enabling for other mappers? (KGS, SEAMAP...)
  • Think about replication, system load issues
  • How to manage development process from here
  • Any overlap with GBIF activities? (OBIS is “marine component of GBIF”; GBIF has indicated interest in indexing)
  • Other ??
Summary...
  - Interesting challenge thus far!
  - Reasonably complex package (database, software, content building and maintenance)
  - Personal opinion: major step forward in OBIS functionality
  - Close to deployment in “production” version
  - How to integrate with OBIS work plan?
A peek behind the scenes – the master genus table (portion)
Search for “all whales” via OBIS Index Search Interface...
List for “all whales”... - result in < 4 secs.
“Quick map” for Balaenoptera physalus (38,000+ OBIS records) - in < 6 secs.
Full OBIS search on any species is 1 click away...
Also: maps are all now “active maps” – clicking on or near any red square initiates a “live” OBIS spatial search for the relevant base data.
Another example: “near match...” for a genus name – where the user is unsure of the correct spelling...
Coelorhynchus X
Search result in 2-3 seconds...
Example “Spatial Search...” for a category...
Pre-generated map presented, showing all records for category...
Search result in < 6 secs.
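A spatial search by 10 x 10 degree square can be answered from stored squares alone, roughly as sketched here. The sketch assumes each taxon's squares are held as integer (row, column) pairs on a 0.5-degree grid, which is a simplification rather than the real c-squares notation. The taxon names "A", "B", "C" and the function names are placeholders.

```python
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, int]
CELLS_PER_TEN_DEG = 20  # 10 degrees divided by 0.5-degree cells


def ten_degree_square(cell: Cell) -> Cell:
    """Map a 0.5-degree cell to the 10-degree square that contains it."""
    row, col = cell
    # Floor division keeps southern/western (negative) cells in the right square
    return (row // CELLS_PER_TEN_DEG, col // CELLS_PER_TEN_DEG)


def taxa_in_square(index: Dict[str, Set[Cell]], square: Cell) -> List[str]:
    """List taxa with at least one occupied cell inside the given square."""
    return sorted(name for name, cells in index.items()
                  if any(ten_degree_square(c) == square for c in cells))


# Placeholder taxa with made-up occupied cells
toy_index: Dict[str, Set[Cell]] = {
    "A": {(-86, 294)},
    "B": {(10, 10)},
    "C": {(-81, 299), (50, 50)},
}
found = taxa_in_square(toy_index, (-5, 14))
```

Because only cell membership is tested, no point data and no remote sources are needed until the user asks for the underlying records.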
Another feature: stored record counts can be used to generate OBIS summary statistics per category, e.g.:
OBIS Statistics – from master Index table
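Per-category statistics of this kind could be derived from the stored counts with a simple aggregation. The following is a minimal sketch; the row layout, function name, category labels and every number are invented for illustration and are not taken from the actual Index table.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (species, category, record_count); every value here is invented
ROWS = [
    ("Balaenoptera physalus", "whales", 500),
    ("Raja clavata", "fishes", 120),
    ("Raja batis", "fishes", 45),
]


def stats_by_category(rows: Iterable[Tuple[str, str, int]]) -> Dict[str, Dict[str, int]]:
    """Sum species and record counts per category from index rows."""
    totals: Dict[str, Dict[str, int]] = defaultdict(lambda: {"species": 0, "records": 0})
    for _name, category, count in rows:
        totals[category]["species"] += 1
        totals[category]["records"] += count
    return dict(totals)


stats = stats_by_category(ROWS)
```

No distributed query is involved: the totals come entirely from counts already stored against each species.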