- Number of slides: 32
Open Archives Initiative: Where we are, Where we are going. Carl Lagoze, 4th OAF Workshop, September 2003
Where we are now • De facto standard for Internet information exchange • Deployed extensively and internationally – (digital) libraries – Museums – Eprint repositories – Research projects
Protocol Stability • OAI-PMH has been stable since release – No functional changes, just typographic edits – Validation of leadership/participation model • No plans for a 3.0 release – Core protocol will not be extended – Minor 2.x release could occur (more later) – Additional implementation guidelines (more later)
NSDL and OAI-PMH
The NSDL Context • National STEM (Science, Technology, Engineering, Mathematics, Medicine) Digital Library • Major National Science Foundation project targeted at the application of web and Internet to (STEM) education • $25M over six years to over 100 projects – Collections – Services – Targeted Research – Core Integration
NSDL technical guidelines • Aggregation rather than collection – Core integration team will not manage any collections • Spectrum of interoperability – Accommodate diversity of participation models – Open interfaces and standards permitting plug in of array of value-added services • One library many portals – Accommodate multiple quality and selection metrics – Tailor presentation of content and nature of services to audience needs
Spectrum of interoperability
Level | Agreements | Example
Federation | Strict use of standards (syntax, semantic, and business) | AACR, MARC, Z39.50
Harvesting | Digital libraries expose metadata; simple protocol and registry | Open Archives metadata harvesting
Gathering | Digital libraries do not cooperate; services must seek out information | Web crawlers and search engines
Translating to initial goals • This is a big task that no one has done before! • Work on the priorities – Focus on one point on spectrum of interoperability • Metadata harvesting • Incorporate NSF funded collections and selected other collections – Leverage existing (or at least emerging) technologies and protocols • OAI, uPortal, Shibboleth, SDLIP, InQuery – Provide reliable base level services • Search and Discovery, Access Management, User Profiles, Exemplary Portals, Persistence • Plant some seeds for the future – Machine-assisted metadata generation – Automated collection aggregation – Web gathering strategies
Metadata Repository • Central storage of all metadata about all resources in the NSDL – Defines the extent of NSDL collection – Metadata includes collections, items, annotations, etc. • MR main functions – Aggregation – Normalization – Redistribution • Ingest of metadata by various means – Harvesting, manual, automatic, cross-walking • Open access to MR contents for service builders via OAI-PMH
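A minimal sketch of how a service builder might form harvest requests against the MR through OAI-PMH. The verb and argument names (`ListRecords`, `metadataPrefix`, `set`, `from`, `resumptionToken`) come from the protocol; the base URL and the function names are hypothetical placeholders, not the NSDL's actual endpoint or code.

```python
from urllib.parse import urlencode

# Hypothetical base URL (assumption); a real repository publishes its own.
BASE_URL = "http://repository.example.org/oai"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL for an initial harvest."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec      # selective harvesting by set
    if from_date is not None:
        params["from"] = from_date    # selective harvesting by datestamp
    return base_url + "?" + urlencode(params)


def resume_url(base_url, token):
    """Build the follow-up request for an incomplete list.

    resumptionToken is an exclusive argument, so only the verb accompanies it.
    """
    return base_url + "?" + urlencode({"verb": "ListRecords", "resumptionToken": token})
```

A harvester would fetch the first URL, read any resumptionToken in the response, and keep requesting `resume_url(...)` until the list is complete.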
Metadata Strategy • Collect and redistribute any native (XML) metadata format • Provide crosswalks to Dublin Core from standard formats – DC-GEM, LTSC (IMS), ADL (SCORM), MARC, FGDC, EAD • Concentrate on collection-level metadata • Use automatic generation to augment item-level metadata
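A crosswalk can be pictured as a field-to-element mapping. The sketch below is illustrative only: the native field names in `CROSSWALK` are invented for the example, and real crosswalks from formats such as MARC or EAD are far more involved. Only the target names (`title`, `creator`, `description`, `subject`) are actual Dublin Core elements.

```python
# Hypothetical native field names (assumption), mapped to Dublin Core elements.
CROSSWALK = {
    "headline": "title",
    "author": "creator",
    "summary": "description",
    "keywords": "subject",
}


def to_dublin_core(native):
    """Map a native record (dict) to unqualified DC: element -> list of values."""
    dc = {}
    for field, value in native.items():
        element = CROSSWALK.get(field)
        if element is None:
            continue  # no DC equivalent in this mapping: the field is dropped
        values = value if isinstance(value, list) else [value]
        dc.setdefault(element, []).extend(values)  # DC elements are repeatable
    return dc
```

Fields with no mapping are silently dropped here, which shows why a crosswalk to unqualified DC tends to lose information carried by richer native formats.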
Importing metadata into the MR [Diagram labels: Collections; Harvest; Staging area; Cleanup and crosswalks; Database load; Metadata Repository]
Exporting metadata from the MR
NSDL and OAI-PMH: Two years later • Concepts are good, practice is hard • Issues – Metadata is hard • http://www.well.com/~doctorow/metacrap.htm – XML is hard – Protocols are hard • Static repositories (more later) – IP is relevant (more later)
Some Essential Metadata Questions • Review original (DC) metadata assumptions – Metadata is essential for good resource discovery – “Joe Sixpack” could create metadata • Account for current realities – 2003 is not 1994 – Google, etc. keeps getting better
Metadata Space
Metadata Triage
Reconsidering the Dublin Core Requirement • Questions about utility of unqualified DC – The conundrum…. • Specification too loose to serve intended interoperability goal • But more complex metadata may be too hard • Limited energy for interoperability – Data providers implement required DC at expense of better metadata • Use of protocol for purposes other than resource discovery
Rethinking record-oriented model Implications for record-oriented harvesting??
Topology Evolution Simple Data Provider, Service Provider Topology
Topology Evolution (cont.) Metadata Aggregator
Topology Evolution (cont.) OAI-PMH p2p network
OAI-P2pMH Issues • Document (metadata) location – Exploit unique identifiers, use efficient key-based location mechanisms (distributed hash tables) • Provenance-based queries – Metadata records may go through refinement and/or translation phases as they move through value-added aggregators. – Exploit provenance guidelines • Network harvesting – Broadcast query (Gnutella) inefficient – Exploit techniques for efficient routing of queries (Ptrees)
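Key-based location can be illustrated with a toy consistent-hashing ring, the basic idea behind distributed hash tables. This is a sketch under assumptions, not the design of any particular p2p system: the peer names, the `Ring` class, and the choice of SHA-1 are all illustrative.

```python
import bisect
import hashlib


def _key(text):
    """Hash a peer name or record identifier onto one shared integer key space."""
    return int(hashlib.sha1(text.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy consistent-hashing ring mapping unique identifiers to peers."""

    def __init__(self, peers):
        self._points = sorted((_key(p), p) for p in peers)
        self._keys = [k for k, _ in self._points]

    def locate(self, identifier):
        """Return the peer responsible for an identifier.

        The responsible peer is the first one clockwise from the identifier's
        key, wrapping around at the end of the ring.
        """
        i = bisect.bisect_right(self._keys, _key(identifier)) % len(self._keys)
        return self._points[i][1]
```

Any node that knows the peer list computes the same answer for a given identifier, so a record can be located without broadcasting a query to the whole network.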
OAI-PMH and Intellectual Property • Protocol exists in a context where information providers have concerns about use of intellectual property • OAI-PMH is nominally about metadata, but… – Rich metadata is an intellectual product – The protocol can be used to transmit anything (e.g., content) that can be encoded in XML – Generally metadata leads to content so….
OAI-rights effort • Goal is to investigate and develop means of expressing rights about metadata and resources in the OAI framework. • The result will be an addition to the OAI implementation guidelines that specifies mechanisms for rights expressions within OAI-PMH. – No changes to core protocol
OAI-rights Effort (cont.) • Extensible, providing a general framework for expressing rights statements within OAI-PMH. – Not an effort to develop a new rights expression language • Use Creative Commons licenses as a motivating and deployable example. • Release of specification by 2nd quarter ’04 • Invited OAI-rights group – Standard OAI development model
Dimensions of OAI-PMH and rights Entity Association • Metadata: concern in NSDL for (re)use of rich metadata • Content: predominant application of the protocol to resource discovery and ultimate access makes this important
Dimensions of OAI-PMH and rights: Aggregation Association • OAI-PMH aggregations – Repository – Set – Item • Rights association with an aggregation may provide shortcut (e.g., the rights for all resources in a repository/set…) • Cost of shortcut is pseudo-statefulness, possibly complex overriding rules
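One possible overriding rule, sketched under assumptions: the most specific statement wins (item over set over repository). Nothing here is taken from an OAI-rights specification; the function name, the precedence order, and the tie-break among multiple sets are all hypothetical choices made for illustration.

```python
def resolve_rights(item_rights, set_specs, set_rights, repository_rights):
    """Pick the rights statement that applies to one item.

    item_rights:       statement attached to the item itself, or None
    set_specs:         the sets the item belongs to, in listed order
    set_rights:        dict mapping a set to its statement (may be partial)
    repository_rights: repository-wide default, or None
    """
    if item_rights is not None:
        return item_rights             # item-level statement overrides everything
    for spec in set_specs:
        if spec in set_rights:
            return set_rights[spec]    # assumption: first listed set with a statement wins
    return repository_rights           # fall back to the repository-wide shortcut
```

The harvester has to know the set-level and repository-level statements before it can interpret a single record, which is the pseudo-statefulness noted above. An item in several sets with conflicting statements shows how quickly the overriding rules get complicated.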
Dimensions of OAI-PMH and rights: Binding • Choices – exploit mechanisms in metadata formats, e.g., DC-rights – restrict the rights statements to some more specific protocol mechanism – allow some mixture of these methods. • DC-rights problems – Semantics is restricted to rights about resource – Can’t embed XML in DC value – What if DC is not required? • Burden on harvesters if rights embedding is not explicit but scattered across several locations
OAI-PMH Static Repositories • Provide a lightweight mechanism for data provider participation • Intended for relatively small and static collections • Two components – Static Repository XML format • Semantically equivalent to Identify and ListRecords • Invisible to harvester – Static Repository Gateway • Virtual data provider for static repository data • Unique baseURL for each “contained” static repository
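A sketch of how a gateway could give each contained static repository its own baseURL. The convention assumed here, appending the static file's URL minus its scheme to the gateway URL, is an illustration rather than a quotation of the specification, and both URLs in the usage note are hypothetical.

```python
def gateway_base_url(gateway_url, static_repo_url):
    """Derive a unique baseURL for one static repository behind a gateway.

    Assumed convention: gateway URL + "/" + static repository URL without scheme.
    """
    for scheme in ("http://", "https://"):
        if static_repo_url.startswith(scheme):
            static_repo_url = static_repo_url[len(scheme):]
            break
    return gateway_url.rstrip("/") + "/" + static_repo_url
```

For example, `gateway_base_url("http://gateway.example.org/oai", "http://library.example.edu/static/repo.xml")` gives `http://gateway.example.org/oai/library.example.edu/static/repo.xml`. A harvester sends ordinary OAI-PMH requests to that address and never sees the static XML file.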
Static Repositories and Static Repository Gateway
Static Repositories Open Issue: Relationship to RSS???
Conclusions • Interoperability and lowest common denominator • Rapid advances in automated methods – Moore’s law – Smart algorithms – Benefits of issues of scale • Combining human effort and automated methods – Extracting order from chaos – Learning from order • Move beyond resource discovery


