
  • Number of slides: 32

Open Archives Initiative: Where we are, Where we are going
Carl Lagoze
4th OAF Workshop, September 2003

Where we are now
• De facto standard for Internet information exchange
• Deployed extensively and internationally
  – (Digital) libraries
  – Museums
  – Eprint repositories
  – Research projects

Protocol Stability
• OAI-PMH has been stable since release
  – No functional changes, just typographic edits
  – Validation of leadership/participation model
• No plans for a 3.0 release
  – Core protocol will not be extended
  – Minor 2.x release could occur (more later)
  – Additional implementation guidelines (more later)

NSDL and OAI-PMH

The NSDL Context
• National STEM (Science, Technology, Engineering, Mathematics, Medicine) Digital Library
• Major National Science Foundation project targeted at applying the web and the Internet to STEM education
• $25M over six years to over 100 projects, in four tracks
  – Collections
  – Services
  – Targeted Research
  – Core Integration

NSDL technical guidelines
• Aggregation rather than collection
  – Core integration team will not manage any collections
• Spectrum of interoperability
  – Accommodate diversity of participation models
  – Open interfaces and standards permitting plug-in of an array of value-added services
• One library, many portals
  – Accommodate multiple quality and selection metrics
  – Tailor presentation of content and nature of services to audience needs

Spectrum of interoperability

| Level | Agreements | Example |
|---|---|---|
| Federation | Strict use of standards (syntax, semantic, and business) | AACR, MARC, Z39.50 |
| Harvesting | Digital libraries expose metadata; simple protocol and registry | Open Archives metadata harvesting |
| Gathering | Digital libraries do not cooperate; services must seek out information | Web crawlers and search engines |

Translating to initial goals
• This is a big task that no one has done before!
• Work on the priorities
  – Focus on one point on the spectrum of interoperability
    • Metadata harvesting
    • Incorporate NSF-funded collections and selected other collections
  – Leverage existing (or at least emerging) technologies and protocols
    • OAI, uPortal, Shibboleth, SDLIP, InQuery
  – Provide reliable base-level services
    • Search and Discovery, Access Management, User Profiles, Exemplary Portals, Persistence
• Plant some seeds for the future
  – Machine-assisted metadata generation
  – Automated collection aggregation
  – Web gathering strategies

Metadata Repository
• Central storage of all metadata about all resources in the NSDL
  – Defines the extent of the NSDL collection
  – Metadata includes collections, items, annotations, etc.
• MR main functions
  – Aggregation
  – Normalization
  – Redistribution
• Ingest of metadata by various means
  – Harvesting, manual, automatic, crosswalking
• Open access to MR contents for service builders via OAI-PMH
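
Service builders reach MR contents through ordinary OAI-PMH ListRecords requests, paging through large result lists with resumptionTokens. The following is a minimal sketch in Python using only the standard library; the base URL in the usage is a hypothetical placeholder, and the network fetch is passed in as a function so the parsing logic can be exercised offline.

```python
import urllib.parse
import xml.etree.ElementTree as ET

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", token=None):
    """Build a ListRecords request URL; a resumed request carries only the token."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_page(xml_text):
    """Return (record identifiers, resumptionToken or None) for one response page."""
    root = ET.fromstring(xml_text)
    ids = [header.findtext("oai:identifier", namespaces=NS)
           for header in root.iterfind(".//oai:record/oai:header", NS)]
    token_el = root.find(".//oai:resumptionToken", NS)
    # A missing or empty resumptionToken marks the last page of the list.
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token


def harvest(base_url, fetch):
    """Follow resumptionTokens until exhausted; fetch(url) must return response text."""
    identifiers, token = [], None
    while True:
        ids, token = parse_page(fetch(list_records_url(base_url, token=token)))
        identifiers.extend(ids)
        if token is None:
            return identifiers
```

In a real harvester, `fetch` would wrap `urllib.request.urlopen` and handle OAI-PMH error responses such as `noRecordsMatch`; both are left out to keep the sketch short.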

Metadata Strategy
• Collect and redistribute any native (XML) metadata format
• Provide crosswalks to Dublin Core from standard formats
  – DC-GEM, LTSC (IMS), ADL (SCORM), MARC, FGDC, EAD
• Concentrate on collection-level metadata
• Use automatic generation to augment item-level metadata
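
At its core, a crosswalk is a field-to-field mapping. The sketch below maps a few MARC tags to unqualified Dublin Core elements. The record is modelled as a plain dict of tag to list of values, and the mapping table is a small, simplified subset chosen for illustration; it is neither the full Library of Congress MARC-to-DC crosswalk nor the one NSDL actually used.

```python
from collections import defaultdict

# Illustrative subset of a MARC -> unqualified DC mapping (not a complete crosswalk).
MARC_TO_DC = {
    "245": "title",
    "100": "creator",
    "700": "contributor",
    "650": "subject",
    "520": "description",
}


def crosswalk(marc_record):
    """Map {marc_tag: [values]} to {dc_element: [values]}; unmapped tags are dropped."""
    dc = defaultdict(list)
    for tag, values in marc_record.items():
        element = MARC_TO_DC.get(tag)
        if element is None:
            continue  # no DC equivalent in this simplified table
        cleaned = [v.strip() for v in values if v.strip()]
        if cleaned:
            dc[element].extend(cleaned)
    return dict(dc)
```

Any tag without a mapping is silently discarded. That loss is one reason to keep redistributing the native format alongside the DC view.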

Importing metadata into the MR
[Figure: Collections → Harvest → Staging area → Cleanup and crosswalks → Database load → Metadata Repository]
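
The import flow on this slide (harvest from collections into a staging area, cleanup and crosswalks, then a database load into the MR) can be read as a staged pipeline. The sketch below is a minimal in-memory model. The function names `harvest_into_staging`, `clean` and `load` are hypothetical, a dict stands in for the MR database, and cleanup is reduced to trimming whitespace, dropping empty fields and rejecting records that have no identifier.

```python
def harvest_into_staging(collections):
    """Copy raw records from each collection into a staging list, tagging their source."""
    staging = []
    for name, records in collections.items():
        for rec in records:
            staging.append({"source": name, **rec})
    return staging


def clean(record):
    """Trim string values and drop empty ones; return None if no identifier survives."""
    cleaned = {}
    for key, value in record.items():
        if isinstance(value, str):
            value = value.strip()
        if value:
            cleaned[key] = value
    return cleaned if "identifier" in cleaned else None


def load(staging, repository):
    """Load cleaned records into `repository` keyed by identifier; return how many loaded."""
    loaded = 0
    for rec in staging:
        rec = clean(rec)
        if rec is not None:
            repository[rec["identifier"]] = rec  # a later harvest overwrites an earlier copy
            loaded += 1
    return loaded
```

A crosswalk step would naturally sit inside `clean`, converting each staged record to the MR's normalized form before it is loaded.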

Exporting metadata from the MR

NSDL and OAI-PMH: Two years later
• Concepts are good, practice is hard
• Issues
  – Metadata is hard
    • http://www.well.com/~doctorow/metacrap.htm
  – XML is hard
  – Protocols are hard
    • Static repositories (more later)
  – IP is relevant (more later)

Some Essential Metadata Questions
• Review original (DC) metadata assumptions
  – Metadata is essential for good resource discovery
  – “Joe Sixpack” could create metadata
• Account for current realities
  – 2003 is not 1994
  – Google, etc. keep getting better

Metadata Space

Metadata Triage

Reconsidering the Dublin Core Requirement
• Questions about utility of unqualified DC
  – The conundrum…
    • Specification too loose to serve intended interoperability goal
    • But more complex metadata may be too hard
• Limited energy for interoperability
  – Data providers implement required DC at expense of better metadata
• Use of protocol for purposes other than resource discovery

Rethinking the record-oriented model
• Implications for record-oriented harvesting?

Topology Evolution: Simple Data Provider, Service Provider Topology

Topology Evolution (cont.): Metadata Aggregator

Topology Evolution (cont.): OAI-PMH P2P network

OAI-P2PMH Issues
• Document (metadata) location
  – Exploit unique identifiers; use efficient key-based location mechanisms (distributed hash tables)
• Provenance-based queries
  – Metadata records may go through refinement and/or translation phases as they move through value-added aggregators
  – Exploit provenance guidelines
• Network harvesting
  – Broadcast query (Gnutella) is inefficient
  – Exploit techniques for efficient routing of queries (P-trees)

OAI-PMH and Intellectual Property
• Protocol exists in a context where information providers have concerns about use of intellectual property
• OAI-PMH is nominally about metadata, but…
  – Rich metadata is an intellectual product
  – The protocol can be used to transmit anything (e.g., content) that can be encoded in XML
  – Generally metadata leads to content, so…

OAI-rights effort
• Goal is to investigate and develop means of expressing rights about metadata and resources in the OAI framework
• The result will be an addition to the OAI implementation guidelines that specifies mechanisms for rights expressions within OAI-PMH
  – No changes to core protocol

OAI-rights Effort (cont.)
• Extensible, providing a general framework for expressing rights statements within OAI-PMH
  – Not an effort to develop a new rights expression language
• Use Creative Commons licenses as a motivating and deployable example
• Release of specification by 2nd quarter ’04
• Invited OAI-rights group
  – Standard OAI development model

Dimensions of OAI-PMH and rights: Entity Association
• Metadata: concern in NSDL for (re)use of rich metadata
• Content: important because the protocol is predominantly applied to resource discovery and, ultimately, access

Dimensions of OAI-PMH and rights: Aggregation Association
• OAI-PMH aggregations
  – Repository
  – Set
  – Item
• Rights association with an aggregation may provide a shortcut (e.g., the rights for all resources in a repository/set…)
• Cost of the shortcut is pseudo-statefulness and possibly complex overriding rules
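
The "possibly complex overriding rules" can be made concrete. One plausible rule, assumed here rather than taken from any OAI-rights specification, is most-specific-wins: an item-level statement overrides set-level ones, which override the repository default. Items can belong to several sets, so even this simple rule has to say what happens when sets disagree. The function name and the license URLs in the usage are hypothetical placeholders.

```python
def effective_rights(item_id, item_sets, repo_rights, set_rights, item_rights):
    """Resolve rights for one item under an assumed item > set > repository rule.

    item_sets maps item id -> list of setSpecs; set_rights and item_rights map
    setSpec / item id -> rights statement. If an item's sets disagree, every
    candidate statement is returned (sorted) so the conflict stays visible.
    """
    if item_id in item_rights:
        return [item_rights[item_id]]
    from_sets = sorted({set_rights[s] for s in item_sets.get(item_id, []) if s in set_rights})
    if from_sets:
        return from_sets
    return [repo_rights] if repo_rights else []
```

The conflict case is where the pseudo-statefulness shows. A harvester cannot interpret an item's rights without also knowing the item's set memberships and the repository default.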

Dimensions of OAI-PMH and rights: Binding
• Choices
  – Exploit mechanisms in metadata formats, e.g., DC-rights
  – Restrict the rights statements to some more specific protocol mechanism
  – Allow some mixture of these methods
• DC-rights problems
  – Semantics is restricted to rights about the resource
  – Can’t embed XML in a DC value
  – What if DC is not required?
• Burden on harvesters if rights embedding is not explicit but scattered across several locations
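
The harvester's burden from scattered rights statements shows up directly in code. The sketch below gathers candidate rights statements from two places in a single OAI-PMH record: `dc:rights` elements inside the metadata payload, and any element named `rights` inside the record's `about` containers. The `about` binding is a hypothetical convention chosen for illustration. These slides predate the OAI-rights guidelines, so nothing here reflects their final form.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def local_name(tag):
    """Strip the '{namespace}' part from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def collect_rights(record_xml):
    """Gather rights statements from dc:rights and from <about> containers of one record."""
    record = ET.fromstring(record_xml)
    found = []
    metadata = record.find(OAI + "metadata")
    if metadata is not None:
        found += [el.text.strip() for el in metadata.iter(DC + "rights") if el.text]
    for about in record.findall(OAI + "about"):
        for el in about.iter():
            # Hypothetical binding: any element named 'rights', in any namespace.
            if local_name(el.tag) == "rights" and el.text:
                found.append(el.text.strip())
    return found
```

A harvester that also had to consult set-level or repository-level statements would need further lookups on top of these two.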

OAI-PMH Static Repositories
• Provide a lightweight mechanism for data provider participation
• Intended for relatively small and static collections
• Two components
  – Static Repository XML format
    • Semantically equivalent to Identify and ListRecords
    • Invisible to harvester
  – Static Repository Gateway
    • Virtual data provider for static repository data
    • Unique baseURL for each “contained” static repository
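
A gateway turns a single static XML file into a virtual data provider, and it has to do on the harvester's behalf the work a static file cannot do. The sketch below illustrates two such pieces. The first derives a unique baseURL per contained repository, assuming the convention (as we understand the Static Repository guidelines) that it is the gateway URL followed by the static file's URL with the `http://` prefix removed. The second applies `from`/`until` selective harvesting to the file's records. Gateway and file URLs in the usage are hypothetical, and datestamps are assumed to use day granularity.

```python
from datetime import date


def gateway_base_url(gateway_url, static_repo_url):
    """Derive the per-repository baseURL (assumed convention: strip 'http://')."""
    prefix = "http://"
    if not static_repo_url.startswith(prefix):
        raise ValueError("static repository URL must use http://")
    return gateway_url.rstrip("/") + "/" + static_repo_url[len(prefix):]


def list_records(records, from_=None, until=None):
    """Filter (identifier, YYYY-MM-DD datestamp) pairs as ListRecords from/until would.

    Both bounds are inclusive, matching OAI-PMH selective harvesting semantics.
    """
    selected = []
    for identifier, stamp in records:
        day = date.fromisoformat(stamp)
        if from_ and day < date.fromisoformat(from_):
            continue
        if until and day > date.fromisoformat(until):
            continue
        selected.append(identifier)
    return selected
```

For example, `gateway_base_url("http://gateway.example.org/oai/", "http://www.example.edu/ma/mini.xml")` yields `http://gateway.example.org/oai/www.example.edu/ma/mini.xml`, so each static file appears to harvesters as its own repository.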

Static Repositories and Static Repository Gateway

Static Repositories: Open Issue
• Relationship to RSS?

Conclusions
• Interoperability and lowest common denominator
• Rapid advances in automated methods
  – Moore’s law
  – Smart algorithms
  – Benefits of scale
• Combining human effort and automated methods
  – Extracting order from chaos
  – Learning from order
• Move beyond resource discovery