IFLA/DELOS/NSF Workshop: Standards and Metadata • EVA 2000, Moscow, November 2, 2000 • Thomas Baker (GMD) • Carl Lagoze (Cornell Univ.)
Introductions • Thomas Baker – GMD Library, Bonn, Germany – Dublin Core Executive Committee – EU DELOS Network of Excellence • Carl Lagoze – Digital Library Research Group, Faculty of Computing and Information, Cornell University, Ithaca, NY, USA – Dublin Core Advisory Committee – NSF Digital Library Initiative
Workshop Roadmap • Introduction to Metadata (30 min.) • Dublin Core Metadata Initiative (60 min.) • Break • Simplicity and Complexity (45 min.) • Metadata Infrastructure (45 min.) • Lunch • Deploying and Using Metadata (90 min.) • Metadata Landscape (30 min.)
Introduction to Metadata
Haven’t we done metadata already?
What’s wrong with this model? • Expensive – Complex (even for its original goal?) – Professional intervention (assumes single community of expertise) • Monolithic – One-size-fits-all approach – Reflects its centralized system origins • Bias towards physical artifacts – Fixed resources – Incomplete handling of resource evolution and other resource relationships
Internet Commons Includes Multiple Communities [diagram: Home Pages, Scientific Data, Commerce, Geo, Library, Museums, Whatever... around the Internet Commons]
Web Challenge to Traditional Cataloging • Scale • Permanence • Authenticity • Organizational Context • Variety
State of the Web as an Information System • Search systems are motivated by advertising • Index coverage is unpredictable and limited (1/3) • Too much recall, too little precision • Index spam abounds • Resources (and their names) are volatile • What about versions, editions, back issues? • Archiving is presently unsolved • Authority and quality of service are spotty • Managing Intellectual Property Rights is hard
Metadata: Part of a Solution • Structured data about data – helps to impose order on chaos – enables automated discovery/manipulation • Variety across several dimensions: – specialization – decentralization – democratization
Metadata Takes Many Forms
Metadata Challenges • Accommodate multiple varieties of metadata • Tension: functionality and simplicity • Tension: extensibility and interoperability • Human and machine creation and use • Community-specific functionality, creation, administration, access
Warwick Framework: Containing Chaos • Conceptual architecture for metadata from the Warwick Metadata Workshop (DC-2) • Conceptual architecture to support the specification, collection, encoding, and exchange of modular metadata • Provide context for metadata efforts (including Dublin Core) – avoids the “black hole” of comprehensive element sets – focuses interoperability issues at package level
Modularization Allows Distributed Management • Communities of expertise (not software vendors) are responsible for: – Semantics – Registration – Administration – Access management – Authority of data – Sharing and Distribution
Interoperability requires conventions about: • Semantics – The meaning of the elements • Structure – human-readable – machine-parseable • Syntax – grammars to convey semantics and structure
Dublin Core Metadata Initiative
History of the Dublin Core • 1994: "Do we have a simple set of tags for ordinary people to describe their Web pages?" • 1995: The Dublin Core: 13 elements, later 15 • 1996: The Dublin Core is but one of many vocabularies needed ("Warwick Framework") • 1997: "WF needs formal expression in a Resource Description Framework (RDF)" • 2000: Dublin Core Metadata Initiative recommends qualifiers, broadens its organizational scope beyond the Core
A pidgin for digital tourists • Metadata is language. • Dublin Core is a small and simple language -- a pidgin -- for finding resources across domains. • Speakers of different languages naturally "pidginize" to communicate – E.g., tourists using simple phrases to order beer ("zwei Bier bitte", "dva pivo", "biru o san bai"...) • We are all "tourists" on the global Internet.
A grammar of Dublin Core • http://www.dlib.org/dlib/october00/baker/10baker.html • By design not as subtle as mother tongues, but easy to learn and extremely useful in practice • Pidgins: small vocabularies (Dublin Core: fifteen special nouns and lots of optional adjectives) • Simple grammars: sentences (statements) follow a simple fixed pattern...
Example Dublin Core statements • Resource has Title 'Grammar of Dublin Core'. • Resource has Creator 'Tom Baker'. • Resource has Subject 'Metadata'. • Resource has Relation http://foo.org/file.htm.
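The fixed sentence pattern can be illustrated with a small data structure. This is only a sketch: the `Statement` class and its `sentence` method are hypothetical names introduced here, not part of any Dublin Core specification.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """One Dublin Core statement: 'Resource has <property> <value>'."""
    prop: str   # one of the fifteen elements, e.g. "Title"
    value: str  # a literal or a URL

    def sentence(self) -> str:
        # URLs are shown bare, literals in single quotes, as on the slide.
        if self.value.startswith("http://"):
            shown = self.value
        else:
            shown = f"'{self.value}'"
        return f"Resource has {self.prop} {shown}."


statements = [
    Statement("Title", "Grammar of Dublin Core"),
    Statement("Creator", "Tom Baker"),
    Statement("Subject", "Metadata"),
    Statement("Relation", "http://foo.org/file.htm"),
]

for s in statements:
    print(s.sentence())
```

The subject ("Resource") and the verb ("has") never vary, so only the property and the value need to be stored.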
[Diagram: the statement pattern. Implied subject "Resource", implied verb "has", then [optional qualifier] property (one of 15 properties: DC:Creator, DC:Title, DC:Subject, DC:Date...), then [optional qualifier] property value X (an appropriate literal). The qualifiers act as adjectives.]
The fifteen special nouns (properties)
Resource has Subject "Languages -- Grammar" (qualifier: LCSH) • Resource has Date "2000-06-13" (qualifiers: Revised, ISO 8601)
Dumb-Down Principle for qualifiers • The fifteen elements should be usable and understandable with or without the qualifiers • Like saying that nouns can stand on their own without adjectives • If your software encounters an unfamiliar qualifier, look it up -- or just ignore it!
To test whether qualifiers are "good", cover them with your hand and ask: -- Does the statement still make sense? -- Is it still correct? • Resource has Subject "Languages -- Grammar" (qualifier: LCSH) • Resource has Date "2000-06-13" (qualifiers: Revised, ISO 8601)
Element Refinements • Make the meaning of an element narrower or more specific. – a Date Created versus a Date Modified – an IsReplacedBy Relation versus a Replaces Relation • If your software does not understand the qualifier, you can safely ignore it.
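The "safely ignore it" rule can be sketched in a few lines. The `QualifiedStatement` class and `dumb_down` function are hypothetical names chosen for illustration; the sketch assumes a statement carries at most one refinement and one encoding scheme.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QualifiedStatement:
    element: str                      # one of the fifteen elements, e.g. "Date"
    value: str
    refinement: Optional[str] = None  # element refinement, e.g. "Revised"
    scheme: Optional[str] = None      # value encoding scheme, e.g. "ISO 8601"


def dumb_down(stmt, known_qualifiers):
    """Drop any qualifier the software does not recognise; keep element and value."""
    refinement = stmt.refinement if stmt.refinement in known_qualifiers else None
    scheme = stmt.scheme if stmt.scheme in known_qualifiers else None
    return QualifiedStatement(stmt.element, stmt.value, refinement, scheme)


revised = QualifiedStatement("Date", "2000-06-13", "Revised", "ISO 8601")
# Software that knows no qualifiers still gets a usable Date statement.
print(dumb_down(revised, known_qualifiers=set()))
# Software that knows only the encoding scheme keeps that and drops the rest.
print(dumb_down(revised, known_qualifiers={"ISO 8601"}))
```

Whatever is dropped, the remaining statement ("Resource has Date 2000-06-13") is still true, which is the test the slides propose for a "good" qualifier.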
Value Encoding Schemes • Say that the value is – a term from a controlled vocabulary (e.g., Library of Congress Subject Headings) – a string formatted in a standard way (e.g., "2000-05-03" means May 3, not March 5) • Even if a scheme is not known by software, the value should be "appropriate" and usable for resource discovery.
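A short sketch of how software might use an encoding scheme when it knows one and fall back to the raw string when it does not. The function name `parse_date_value` and the scheme label "ISO 8601" are assumptions made for this example.

```python
from datetime import date


def parse_date_value(value, scheme=None):
    """Return a date if the scheme is known and the value parses; otherwise the raw string."""
    if scheme == "ISO 8601":
        try:
            return date.fromisoformat(value)
        except ValueError:
            return value  # malformed value: keep it as a plain string
    return value  # unknown or missing scheme: the string is still usable for discovery


parsed = parse_date_value("2000-05-03", "ISO 8601")
print(parsed.month, parsed.day)        # month 5, day 3: May 3, not March 5
print(parse_date_value("2000-05-03"))  # no scheme given: the string is returned unchanged
```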
Peer review of proposals for new terms • DCMI Usage Committee reviews proposals for new qualifiers (and perhaps elements) • Evaluates proposals in light of grammatical principles (are the qualifiers ignorable?) • Tiered model of approval status (tentative): proposed, conforming, recommended, obsolete • First qualifiers "recommended" in July 2000 • http://purl.org/DC/documents/rec/dcmes-qualifiers-20000711.htm
Open questions in Dublin Core • What are "appropriate values" for the fifteen properties? How can they be used for cross-domain searching? • How can DCMI control the evolution of Dublin Core as it is adapted in practice? • How can an application use DC as a pidgin while describing resources with more complex metadata? • Can we keep the Core simple?
Search buckets versus description • Think of DC elements as fuzzy search buckets – Different types of data appropriate for different buckets: URLs, date strings, word strings, names – Separate books about Sigmund Freud versus books by Sigmund Freud into different buckets • Search bucket: for discovering resources • But general, fuzzy categories may not be sufficient for describing resources – After searching, display more detailed descriptions on screen
DCMI broadens its mission (Oct 2000) • The mission of the DCMI is to make it easier to find resources using the Internet through the following activities: – Developing metadata standards for discovery across domains (example: the Dublin Core) – Defining frameworks for the interoperation of metadata sets – Facilitating the development of community or disciplinary specific metadata sets that are consistent with items 1 and 2
A context for the Core • If "the Dublin Core" is the core of DCMI, what is the surrounding context? • If "the Dublin Core" is the simple pidgin, what is the broader landscape of metadata language? • How do pidgins relate to more complex models or "application profiles"? • Do we need pidgins for describing other things, such as "people" and "events"?
Using DC with other vocabularies • Specialized application profiles [government information, education, mathematics] may need to: – Use general-purpose Dublin Core elements – Use elements from another, more domain-specific standard – Narrow standard definitions of DC elements for specific local uses – Invent local elements outside the scope of existing standards
Example: adapting DC:Title to local uses • As defined in the official Dublin Core "namespace": – "Title: A name given to the resource" • As defined in a UK "application profile": – "Title: A name given to the collection" • Definition is narrower
Namespaces in translation • Dublin Core has been translated into 26 languages – machine-readable tokens are shared by all – human-readable labels are defined in different languages – translations are distributed, maintained in many countries
One token, labels in many languages [diagram: dc:creator has rdfs:label “Creator” (DCMI server), rdfs:label “Verfasser” (server in Germany), and rdfs:label “Pencipta” (server in Jakarta)]
RDF -- a more powerful sentence pattern • Dublin Core statements: – Resource has Creator "Tom Baker". – Resource has Identifier http://foo.org/bar.html. • Resource Description Framework "triples" are a more powerful way to say the same thing: – http://foo.org/bar.html has Creator "Tom Baker".
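The step from Dublin Core statements to triples can be sketched as follows. This is an illustration under a simplifying assumption: the Identifier value is taken as the subject of every other statement. The function name `to_triples` is hypothetical.

```python
def to_triples(statements):
    """Turn (property, value) pairs about one implied resource into
    (subject, property, value) triples, using the Identifier as subject."""
    subject = next(value for prop, value in statements if prop == "Identifier")
    return [(subject, prop, value)
            for prop, value in statements
            if prop != "Identifier"]


dc_statements = [
    ("Creator", "Tom Baker"),
    ("Identifier", "http://foo.org/bar.html"),
]
for triple in to_triples(dc_statements):
    print(triple)
```

The implied subject "Resource" becomes explicit, so statements about different resources can be merged into one collection without ambiguity.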
DCMI Re-organization • Expanded mission – Core metadata elements for Agents (or Events)? – Frameworks for integrating multiple standards • Re-organization model – Membership organization like W3C or Unicode Consortium? – Retain open consensus model – International perspective – Better training, documentation, outreach
DCMI Open Metadata Registry • Managing vocabularies defined by the DCMI – Languages – Versioning – Controlled vocabularies • Foundation for modular, incremental integration and evolution • Collaboration with European SCHEMAS Project and ULIS in Tsukuba, Japan • http://wip.dublincore.org/registry/
Official recognition of the Dublin Core • CEN Workshop Agreement – endorse Dublin Core elements as CWA 13874 – provide usage guidelines for European industry • NISO Z39.85 – National Information Standards Organization, an ANSI affiliate – Balloting concluded in August 2000
DCMI Activities • Standards development and maintenance • Metadata registry • Technical working groups and periodic workshops • Tutorial materials and user guides • Education and training • Access to software • Liaisons with other standards or user communities
DC-9 Workshop in Tokyo, 2001 • DC-8 Workshop was at the National Library of Canada (Ottawa) – emphasis on application profiles, longer-term organizational mission, and domain-specific adaptations of Dublin Core • DC-9 in Tokyo: well-defined tracks – implementation reports and research papers – ongoing technical working group meetings – general introduction and tutorials for nonexperts
Simplicity and Complexity
Warwick Framework • Container/Package approach to metadata • Rejection of universal ontology • Recognition of individual community needs • Provide scope for metadata efforts
Warwick Framework Design • Containers for aggregating packages of typed metadata sets [diagram: a metadata container holding packages: Dublin Core; MARC; Indirect Reference (URI); Terms and Conditions]
Warwick Framework Implementation and Research • Packaging, linking, storing, and transmitting component/package framework • Semantic interactions and interoperability among multiple metadata packages/vocabularies
Interoperability among Metadata Vocabularies [diagram: Dublin Core, MARC, IMS, and INDECS around the ABC core classes]
Harmony Project • Project Investigators – Dan Brickley - ILRT, Bristol (U.K.) – Jane Hunter - DSTC, Brisbane (Australia) – Carl Lagoze - Computer Science, Cornell (U.S.) • More Information – http://www.ilrt.bris.ac.uk/discovery/harmony/
Attribute/Value approaches to metadata… • "The playwright of Hamlet was Shakespeare" [diagram: subject Hamlet, implied verb "has a", metadata adjective "playwright", metadata noun "creator", literal "Shakespeare"; as a graph, R1 has dc:creator.playwright “Shakespeare” and dc:title “Hamlet”]
…run into problems for richer descriptions… • "The playwright of Hamlet was Shakespeare, who was born in Stratford" [diagram: Hamlet has a creator Shakespeare with birthplace Stratford; as a graph, R1 has dc:creator.playwright “Shakespeare” and dc:creator.birthplace “Stratford”]
…because of their failure to model entity distinctions [diagram: R1 has title “Hamlet” and creator R2; R2 has name “Shakespeare” and birthplace “Stratford”]
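The Hamlet example can be written out as data to show the difference. This is a rough sketch: the dictionary keys, the node names `R1` and `R2`, and the helper `describe` are illustrative choices, not taken from any standard.

```python
# Attribute/value style: every fact hangs off the one resource R1,
# so the birthplace looks like a property of the play.
flat = {
    "dc:title": "Hamlet",
    "dc:creator.playwright": "Shakespeare",
    "dc:creator.birthplace": "Stratford",
}

# Entity style: the creator is its own node (R2) with its own properties.
triples = [
    ("R1", "title", "Hamlet"),
    ("R1", "creator", "R2"),
    ("R2", "name", "Shakespeare"),
    ("R2", "birthplace", "Stratford"),
]


def describe(node, triples):
    """Collect the properties attached directly to one node."""
    return {prop: value for subject, prop, value in triples if subject == node}


print(describe("R1", triples))  # the play: a title and a link to its creator
print(describe("R2", triples))  # the person: a name and a birthplace
```

With a second creator, the flat form cannot say whose birthplace "Stratford" is, while the entity form simply adds another node.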
Applying a Model-Centric Approach • Formally define common entities and relationships underlying multiple metadata vocabularies • Describe them (and their interrelationships) in a simple logical model • Provide the framework for extending these common semantics to domain and application-specific metadata vocabularies.
Applications of the ABC Model • Guidance for communities developing vocabularies • Foundation for understanding existing vocabularies • Basis for mappings among vocabularies using formalisms such as RDF
Harmony/ABC Workshop • January 27-28, 2000, CNI, Washington • Representatives from – Dublin Core, INDECS, MPEG-7, IFLA – Archives, Museums, Libraries, Audiovisual • Result: Importance of processes, events, and states in understanding and describing resources
Conceptual Basis: Evolution of Content over Time • IFLA Entity Model • From Bearman et al., D-Lib Magazine, January 1999.
Events help metadata relationships? • Recognizing inherent lifecycle aspects of digital content - transformation of “input” resources to “output” resources and of their descriptions (e.g., IFLA model) • Modeling implied events as first-class objects provides attachment points for common entities – e.g., agents, contexts (times & places), roles. • Clarifying attachment points facilitates mapping across common entities in different vocabularies.
Content, Events, & Descriptions
ABC Event Model
A Simple Example: Live At Lincoln Performance • Performance at The Lincoln Center for the Performing Arts • On April 7, 1998 at 8 pm Eastern time • Orchestra is New York Philharmonic • Musical score – “Concerto for Violin” • 130-minute MP3 audio recording • Rights held by Lincoln Center
Example in ABC Model
Derivation of Multiple Views • Dublin Core in XML/RDF • ABC Description in XML • ID3 tags embedded in MP3 • MPEG-7 description in DDL • CIDOC CRM Model
Step 1 – Structural Mapping [diagram: event-aware model mapped to resource-centric model]
Structural Mapping Rules • Event attributes transferred to output: • Context/Date, /Time, /Place -> Date.Performance, Time.Performance, Place.Performance • Act/Role -> Agent.Role, e.g. Orchestra • Event Type -> Relation between input & output, e.g. Performance -> Relation.isPerformanceOf • Output Description generated from event Type and input Title, e.g. “Performance of Concerto for Violin”
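The mapping rules can be tried out on the Lincoln Center example. This is a minimal sketch, not the Harmony project's implementation: the dictionary layout of `event`, the function name `flatten_event`, the reading of "Agent.Role" as a key such as `Agent.Orchestra`, and the ISO form of the date are all assumptions made for illustration.

```python
def flatten_event(event, input_title):
    """Transfer event attributes onto a flat, resource-centric description of the output."""
    out = {}
    # Context/Date, /Time, /Place -> Date.<Type>, Time.<Type>, Place.<Type>
    for key in ("Date", "Time", "Place"):
        if key in event["context"]:
            out[f"{key}.{event['type']}"] = event["context"][key]
    # Act/Role -> Agent.<Role>
    for act in event["acts"]:
        out[f"Agent.{act['role']}"] = act["agent"]
    # Event type -> relation between input and output
    out[f"Relation.is{event['type']}Of"] = input_title
    # Description generated from event type and input title
    out["Description"] = f"{event['type']} of {input_title}"
    return out


event = {
    "type": "Performance",
    "context": {
        "Date": "1998-04-07",
        "Time": "8 pm Eastern",
        "Place": "The Lincoln Center for the Performing Arts",
    },
    "acts": [{"role": "Orchestra", "agent": "New York Philharmonic"}],
}

for key, value in flatten_event(event, "Concerto for Violin").items():
    print(f"{key}: {value}")
```

The event itself disappears from the output; its attributes survive only as qualified properties of the recording, which is what makes the result resource-centric.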
Step 2 – Semantic Mapping
XSLT for Transformations • Works well for structural and syntactic mapping between metadata descriptions • Semantic mappings need to be hardcoded • Unsuitable for loosely constrained or variable input
A More General Solution • Flexible semantic mappings require additional knowledge: – Metadata Term Ontology – MetaNet • Methods for using that context knowledge for mapping – Some combination of procedural language (Java) and XSLT – Investigating more general mapping rule language (analogies to compiler technology)
Planned Experimental Context • CIMI Experiments – Dublin Core for basic resource descriptions – Richer descriptions derived from ABC model – Mapping among descriptions – Understanding relationship between ABC and CIDOC CRM • Connecting with Recordkeeping Metadata Issue - SPIRT Project
Metadata Infrastructure
Metadata is language • Metadata schemas are languages for making statements about resources: – Book has Title "Gone with the Wind". – Web page has Publisher "Springer Verlag". • Vocabulary terms (elements) are defined in standards like Dublin Core • Metadata grammars constrain the statements and data models one can form
But languages evolve with use • Inevitably, languages resist stability • People stretch official definitions • Implementers misunderstand the intended meaning or use of elements • Implementers coin local terms and extensions • If the application does not fit the standard, the standard is often "customized" to fit the application
Metadata languages are "multilingual" • Metadata is not a spoken language • The words of metadata -- "elements" -- are symbols that stand for concepts expressible in multiple natural languages • Standards may have dozens of translations • Are concepts like "title", "author", or "subject" used the same way in English, Finnish, and Korean?
What metadata languages lack • Comprehensive dictionaries – Where can one get an overview of vocabulary terms used in metadata languages? • A publication context for implementers – Where can you see how they are using metadata? • Standard grammars – How do we understand the principles of metadata?
Can we manage this evolution? • How can we (scalably) monitor the usage of a language that is: – Never spoken? – Rarely published in a way that can be harvested? • How can dictionary editors help a metadata language evolve and grow in response to usage? • How can this evolution occur across (human) languages?
RDF Schemas (RDFS) -- W3C standard • A dictionary format for metadata terms: – Simple XML format for terms and definitions • Example: "Title" (Dublin Core) – Human-readable label and definition: • Title: A name given to the resource. – Unique, machine-readable identifiers • dc:title • Support for cross-references – between terms in related standards – between local adaptations and related standards
Print world versus the Web • Traditional print world – Standards are currently defined and published as paper documents or Web pages in HTML – Metadata implementers rarely publish their local extensions and adaptations • RDF Schemas (RDFS) – Web-based publication format – Explicit cross-references between implementation schemas and the standards on which they are based
EOR -- an RDF Schema Browser • Harvests RDF Schemas – Schemas distributed on multiple Web servers – Creates huge database of schemas for searching – Web interface functions as a "metadata browser" – Click on cross-references between linked terms • Downloadable as open source software – http://eor.dublincore.org/index.html – Authors: Eric Miller (OCLC, RDF Working Group, DCMI) and Tod Matola
Hyperlink Metadata Terms over the Web • Index of metadata terms searchable as one huge database • Click on cross-references to follow term-to-term links between vocabularies • Point-to-point, like the Web itself – In 1992, Gopher located the right file within directory trees (but not points within the file) – HTML enabled point-to-point links between documents
"Editor" -- a MARC relator -- refines "Contributor"
Follow the link to MARC Relator Terms
...the source of which looks like this:
...or to Contributor [here, in English, French, German]
Or view the schema of MyRDF itself...
...itself an RDF schema like the others
Registries can function as dictionaries • Historically, dictionaries of English, French, etc. recorded variants, prescribed forms, and helped standardize (national) languages • Metadata dictionaries can help metadata vocabularies evolve more like other human languages – Not just top-down, like traditional standards – Also bottom-up, in response to usage
Dictionaries prescribe and describe • Prescribe definitions and recommend usage • Describe how terms are actually used – Monitor usage through collecting examples • Editors and usage boards must strike a balance between prescription and description.
SCHEMAS Project -- a Thin Registry • http://www.schemas-forum.org, an EU Project • Pointers to resources elsewhere (a "thin" registry or portal) • Short descriptions of metadata standards activities • Critical commentaries by domain experts • Promote the publication of schemas (in RDF) • Goal: help implementors discover how others (e.g. EU Projects) are using standards in order to harmonize usage
DCMI -- a Thick Registry • A thick registry: stores official metadata element definitions in a central database or repository • Managing a namespace (as a standards agency): publish qualifiers as available, with version control – Managing translations of the standard in multiple languages • Eventually: – User guide interface – Support for standardisation processes (peer review) – Downloadable input to software tools for generating, editing, validating DC metadata
Dictionaries as a tool for harmonization • Knowledge of how other projects are using standards will avoid "reinventing the wheel" • To help information providers harmonize their schemas for improved access within domains: – Between countries (Nordic Metadata Project) – Preprint repositories (Open Archives Initiative) – Subject gateways (Renardus) – Theses and dissertations (NDLTD) – Mathematics and physics (MathNet, PhysNet)
A global registry infrastructure? • Analogously to HTML for text, RDF Schema format suggests a scalable ecology of metadata vocabularies on the Web • Sharing machine-readable elements translated into many languages suggests a global (multilingual) metadata language for digital libraries • Can a well-managed registry infrastructure allow this language to evolve -- with flexible innovation in usage alongside more stable standards?
The scope of registries • Anything "semantic" (terms and definitions) is potentially an RDF schema: – controlled vocabularies – namespaces, application profiles, annotations – the "schema" of the registry itself • Application constraints can be modelled in XML Schemas – "title is mandatory"; "date must be after 1980" • Will XML and RDF Schemas merge?
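The two example constraints are structural rules about a record, separate from the meaning of its terms. The sketch below expresses them in Python rather than XML Schema, purely to show what such a check does; the record layout, the function name `check_record`, and the reading of "after 1980" as "later than 31 December 1980" are assumptions.

```python
from datetime import date


def check_record(record):
    """Return a list of constraint violations (empty if the record passes)."""
    problems = []
    # Constraint 1: title is mandatory
    if not record.get("title"):
        problems.append("title is mandatory")
    # Constraint 2: date must be after 1980
    value = record.get("date")
    if value is not None:
        try:
            if date.fromisoformat(value) <= date(1980, 12, 31):
                problems.append("date must be after 1980")
        except ValueError:
            problems.append("date is not a valid ISO 8601 date")
    return problems


print(check_record({"title": "Grammar of Dublin Core", "date": "2000-06-13"}))
print(check_record({"date": "1979-01-01"}))
```

Neither rule says anything about what "title" or "date" mean; that part belongs to the term definitions a registry would hold.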
Deploying and Using Metadata
Syntax Alternatives: HTML • Advantages: – Simple Mechanism – META tags embedded in content – Widely deployed tools and knowledge • Disadvantages – Limited structural richness (won’t support hierarchical, tree-structured data or entity distinctions). – Limited formalisms (parsing and schema definition)
Dublin Core in HTML
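As a rough illustration of embedding Dublin Core in HTML, the sketch below generates META tags from element/value pairs. It assumes the common `DC.` name prefix convention for the `name` attribute; the function name `dc_meta_tags` is hypothetical, and the output is a flat list because HTML META tags cannot express nesting.

```python
from html import escape


def dc_meta_tags(pairs):
    """Render (element, value) pairs as HTML META tags, one per line."""
    return "\n".join(
        f'<meta name="DC.{escape(element, quote=True)}" '
        f'content="{escape(value, quote=True)}">'
        for element, value in pairs
    )


print(dc_meta_tags([
    ("Title", "Grammar of Dublin Core"),
    ("Creator", "Tom Baker"),
    ("Subject", "Metadata"),
]))
```

Each tag is an independent name/value pair, which is why this syntax cannot distinguish entities or carry tree-structured descriptions.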
Syntax Alternatives: XML • The standard for networked text and data • Wide-spread tool support – Parsers (DOM and SAX) – Extensibility (namespaces) – Type definition (XML Schema) – Transformation and Rendering (XSLT) – Rich linking semantics (XLink)
XML Schema • Rich XML-based language for expressing type semantics • Replaces arcane and limited DTD (origin in SGML) • Facilities – Data typing (both complex and primitive) – Constraints – Defaults
Syntax Alternatives: RDF • RDF (Resource Description Framework) • The instantiation of the Warwick Framework on the Web • Provides enabling technology for richly structured metadata • Rich data model supporting notions of distinct entities and properties • Syntax expressed in XML
RDF Components • Formal data model • Syntax for interchange of data • Schema Type system (schema model)
RDF Data Model • Directed labeled graphs • Model elements – Resource – Property – Value – Statement – Containers
RDF Model Primitives [diagram: a Statement links a Resource through a Property to a Value, which may itself be a Resource]
RDF Syntax Example [diagram: URI:R has dc:Title “CIMI Presentation” and dc:Creator “Eric Miller”]
RDF Model Example #2 [diagram: URI:R has dc:Title “CIMI Presentation” and oa:Creator URI:ERIC; URI:ERIC has bib:Name “Eric Miller”, bib:Email “emiller@oclc.org”, and bib:Aff URI:OCLC]
RDF Syntax Example #2
RDF Containers • Permit the aggregation of several values for a property • Express multiple aggregation semantics – unordered – sequential or priority order – alternative
RDF Schemas • Declaration of vocabularies – properties defined by a particular community – characteristics of properties and/or constraints on corresponding values • Schema Type System - Basic Types – Property, Class, SubClassOf, Domain, Range – Minimal (but extensible) at this time – minimize significant clashes with typing system designed for XML Schema WG • Expressible in the RDF model and syntax
Relationships among vocabularies [diagram: dc:Creator, marc:100, ms:director, bib:Author]
Bringing it together • RDF Metadata transmission – Embedded (e.g. ), Transmitted with resource (HTTP), Trusted 3rd Party (HTTP GET) • RDF Data Model – Support consistent encoding, exchange and processing of metadata… critical when aggregating data from multiple sources • RDF Schema – Declare, define, reuse vocabularies
Open Archives Initiative • http://www.openarchives.org
What is Interoperability? • Naming? – Handles – PURLs • Metadata? – MARC – Dublin Core • Document models? – WebDAV • Federated searching? – Z39.50? – DASL? • Services and Protocols? – Dienst
Partitioning Interoperability [diagram layers: Mediator Services (Linking, Searching, Summarizing); Metadata Harvesting; Document Models]
The World According to OAI [diagram: Service Providers (Searching, Current Awareness, Summarization) harvesting from Data Providers]
UPS Meeting Results • Establishment of Open Archives Initiative – Loose coalition to experiment with interoperability solutions • Santa Fe Convention – Organizational and technical framework to support metadata harvesting for ePrint archives
Metadata Harvesting is not New • Harvest Project (1992-1995) – DARPA-funded – Mike Schwartz (U. Colorado), Mic Bowman (Penn State), Udi Manber (U. Arizona)
“Open” Archives • Political Agenda? – Author self-archiving of E-Prints – “Mission” to reformulate scholarly publishing framework • Technical? – Infrastructure to facilitate interoperability across multiple domains
Other communities of interest • “Cambridge” digital library federation meetings – research library community has many materials for which they’d like to ‘expose’ metadata • San Antonio OAI workshop – librarians, publishers (some), others
Technical Umbrella for Practical Interoperability… [diagram: E-Print Archives, Publishers, Reference Libraries] …that can be exploited by different communities
Acting mission statement • "Supply and promote an application-independent technical framework – a supportive infrastructure that empowers different scholarly communities to pursue their own interests in interoperability in the technical, legal, business, and organizational contexts that are appropriate to them." (Dan Greenstein, Director, DLF)
What does this REALLY Mean? • Keep the bar low enough to make widespread adoption possible • Provide enough back-doors to make true “disruption” possible (e.g., ePrint community): – refine record notion to mandate full-content connection – refine metadata to mandate linkage to full content
Organizational Stability • Institutional backing of CNI (Coalition for Networked Information) and DLF (Digital Library Federation) • Formation of steering committee – first steps towards international involvement
Framework for Partitioning Tasks • Steering Committee – policy guidance • Technical Committee – technical specifications • Workshops – public dissemination, feedback, community-building
Ithaca Technical Meeting • Input – experiences gained with implementing & discussing the current SFc specs – emerging interest for the application of SFc concepts as a general interoperability framework in a scholarly environment
Ithaca Technical Meeting • Output – guidelines for an in-depth revised technical spec to be issued early 2001 – stable for experimentation; not definitive – minimize risk for early adopters – maximize chances for future interoperability across communities
Components of OAI Model • underlying concepts • abstract principles • concrete implementation of principles
OAI Underlying Concepts • managed archives (data providers) • records in an archive • open interface to archives • service providers
Building on Underlying Concepts • abstract principle -> implementation of principle – metadata harvesting -> OAI harvesting protocol – identifiers -> URIs (community schemes) – metadata set formats -> DC & XML container (parallel sets) – acceptable use -> Flow Control (usage restrictions) – registration -> (community specific)
What is a record? • A record in an archive is a metadata record. • The metadata record describes, and can contain an entry point to, full content.
Metadata: Interoperability & Extensibility • We recognize that archives will use specific metadata sets and formats that suit the needs of their communities and the types of data they handle. However, interoperability depends on a shared format for exchanging metadata and therefore archives should implement the basic Open Archives Metadata Set.
Metadata Solutions • Adoption of unqualified Dublin Core Element Set as required metadata. • Support for parallel metadata sets maintained – EPMS (e-print community) – Others • Research library community • Museum community
Metadata XML Container
Identifier Issues • Basic identifier constraints based on URI specifications – A key for requesting a record from a repository – Key and metadata format ID uniquely identify a record • Individual communities may develop URN registration schemes
Identifier Solutions • full-identifier = oai:archive-identifier:record-identifier – oai: registered URI scheme – archive-identifier: registered within OAI – record-identifier: unique ID within archive (syntax is archive-specific) • example = oai:ncstrl.cornellcs/TR94-1418
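The three-part identifier pattern can be taken apart with a short function. This is a sketch under assumptions: the function name `parse_oai_identifier` and the sample identifier `oai:example-archive:TR94-1418` are hypothetical, and the record part is allowed to contain further colons because its syntax is left to each archive.

```python
def parse_oai_identifier(full_identifier):
    """Split 'oai:archive-identifier:record-identifier' into its two variable parts."""
    parts = full_identifier.split(":", 2)  # at most three pieces
    if len(parts) != 3 or parts[0] != "oai" or not parts[1] or not parts[2]:
        raise ValueError(f"not a full OAI identifier: {full_identifier!r}")
    _, archive_id, record_id = parts
    return archive_id, record_id


# Hypothetical identifier following the pattern above.
print(parse_oai_identifier("oai:example-archive:TR94-1418"))
```

Because only the archive identifier is registered centrally, a harvester needs no knowledge of how an individual archive names its records.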
Repositories, Identifiers, and Records [diagram: a record with an Identifier, a Datestamp, and MF1, MF2, MF3, MF4]
Selective harvesting • Recognized need for light-weight facility for selective harvesting – By Date • Sets – A low-cost means of selective harvesting – NOT a general tool for defining global categories – Attribution of meanings to sets can be done within communities and in bilateral fashion
Protocol Solutions • Normalized and Enhanced Verb Set – GetRecord – Identity – ListIdentifiers – ListMetadataFormats – ListRecords – ListSets
Protocol Solutions • CGI-script friendly syntax – baseurl?verb=verbname&argname=argval... – verbname is the name of the verb – argname is the name of the attribute – argval is the value of the attribute • Example: http://foo/blaz?verb=ListRecords&set=S1
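The request syntax maps directly onto ordinary query-string encoding. A minimal sketch using the Python standard library; the helper name `build_request` is hypothetical, and the base URL is the placeholder from the slide.

```python
from urllib.parse import urlencode


def build_request(base_url, verb, **args):
    """Build baseurl?verb=verbname&argname=argval... for one protocol request."""
    query = urlencode({"verb": verb, **args})  # 'verb' first, then the arguments in order
    return f"{base_url}?{query}"


# Reproduces the example request from the slide.
print(build_request("http://foo/blaz", "ListRecords", set="S1"))
```

Since every request is a plain HTTP GET with name/value arguments, a simple CGI script is enough to implement the archive side.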
Registration Solutions • Automation through: – On-line registration of: • Archive identifier (uniqueness enforcement) • base-url of the archive's OAI protocol implementation – Identity verb that exposes archive characteristics – Use of protocol for registration of metadata formats and validity checking • Registration of service providers is still an open issue
EVA 2000 Release Schedule • October 15 – normalized meeting notes distributed to meeting group • November 1 – beta specification to steering committee and limited distribution • Early January – stabilization of specification and public meeting
EVA 2000 Moscow Metadata Landscape
EVA 2000 Conferences • ACM Digital Libraries 2001, San Antonio, June 2001, http: //www. dl 00. org/ • European Conference on Digital Libraries, Darmstadt, Sep 2001 http: //www. ecdl 2001. org • Asian Digital Library Conference, Seoul, December 2000, http: //ADL 2000. kaist. ac. kr • Tenth International WWW Conference, Hong Kong, May 2001, http: //www 10. org
NSF Digital Library Initiative • Phase I (1994-1998): six large-scale testbeds involving research universities, industrial partners, and next-generation technologies • Phase II (1999+): expanded scope, smaller projects as well as large testbeds, emphasis on making accessible new types of content
Distributed National Electronic Resource (UK) • A managed environment for Internet access to scholarly journals and other materials relevant to higher education in the UK • Uses international standards (e.g., Dublin Core) • National purchase and licensing agreements for best value to UK education community • eLib research funding since mid-1990s emphasized incremental improvement of standards and services
Global Info (Germany) • "The German Digital Library Project" • Since 1996, integrating access to scientific information among libraries, publishers, learned societies, and individual scientists • Emphasis on open standards (e.g., Dublin Core) and open-standard formats (e.g., XML, RDF, MPEG)
European Union • Fifth Framework Programme, 1998-2002 – several dozen projects with several countries each – Digital Heritage, Cultural Content – Interactive Electronic Publishing – Multimedia Content and Tools • DELOS Network of Excellence – http://www.ercim.org/delos/ – Communication within European digital library research community and international networking
MathNet • German Mathematical Societies index math pre-prints and home pages of mathematicians – Encourages use of Dublin-Core-based metadata by distributing free metadata editor; displays hits "with metadata" separately from hits "without metadata" • International Mathematical Union (IMU) planning international Web service based on German MathNet model • Seeking international agreement on simple metadata profiles for types of math materials
IMS Global Learning Consortium, Inc. • Teachers seeking appropriate classroom materials on Web may want to know: – for which age-group? – has it already been used successfully in classrooms? – will it work on my equipment? • IMS: Rich descriptions of learning resources in a standard record format
Federal Geographic Data Committee • (US) FGDC Content Standard for Digital Geospatial Metadata: integrate access to resources about a particular area found in diverse repositories • Government, education, and business needs – Emergency management – Integrated databases and comprehensive maps – City planning – Environmental control
Visual Resources Association • VRA Core Categories in a two-level model for describing objects such as paintings and buildings • "Works" described separately from "images" of those works (One-to-One Principle) • Conceptual clarity of One-to-One Principle implies more complex work-flow and processing for catalogers and software
Nordic Metadata Project • Cooperation between Scandinavian countries (since circa 1996) • Pioneered idea of metadata-based distributed index across national boundaries • NetLab (Lund University) maintains SAFARI, which harvests Dublin-Core-based metadata embedded in documents on Web servers
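Harvesting embedded metadata can be sketched as follows. This is not a description of how SAFARI works: it assumes the metadata is embedded as HTML meta elements whose name attribute begins with "DC." (a common convention for Dublin Core in HTML), and the names DublinCoreMetaParser and extract_dublin_core are hypothetical.

```python
from html.parser import HTMLParser
from typing import Dict, List


class DublinCoreMetaParser(HTMLParser):
    """Collects <meta name="DC.xxx" content="..."> pairs from one page."""

    def __init__(self) -> None:
        super().__init__()
        self.elements: Dict[str, List[str]] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = attr.get("name") or ""
        content = attr.get("content")
        # Assumption: Dublin Core elements are marked by a "DC." prefix.
        if name.lower().startswith("dc.") and content is not None:
            # Elements are repeatable, so keep a list per element name.
            self.elements.setdefault(name[3:].lower(), []).append(content)


def extract_dublin_core(html: str) -> Dict[str, List[str]]:
    parser = DublinCoreMetaParser()
    parser.feed(html)
    parser.close()
    return parser.elements
```

A harvester built this way would fetch each page, call extract_dublin_core on its text, and index only the returned elements, which is what allows hits "with metadata" to be treated separately from full-text hits.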
Renardus Project (EU) • http://www.konbib.nl/coop/reynard – National libraries (Netherlands coordinates) – NDR: National Digital Resource in UK – Die Deutsche Bibliothek • Goal: integrated access to subject gateways in Europe • High-level agreement on simple, Dublin-Core-based schema as common denominator
Networked Digital Library of Theses and Dissertations (NDLTD) • http://www.ndltd.org • International consortium of projects putting dissertations online • Difficult to agree on single unified metadata schema -- national, legal, and disciplinary requirements differ significantly • NDLTD agreement on a small Dublin-Core-based set of metadata elements?
CIDOC • International Council of Museums: object-oriented model (CIDOC) designed for describing multiple entities that may be – physical (e.g., museum objects) – conceptual (e.g., works) – temporal (e.g., historical periods) – spatial (e.g., places) • Implies an integrated information space of "encyclopedic" scope
Rich Site Summary (RSS) • Metadata for content syndication (news feeds) • Used in developing media content portals • Built on established vocabularies (DC), uses RDF syntax • Layers of application-specific semantics: syndication vocabularies, annotation vocabularies, etc.
Moving Picture Experts Group (MPEG) • MPEG-4: encoding and interacting with audio-visual objects • MPEG-7: multimedia content description interface for such objects • MPEG-21: ambitious "umbrella" framework describing the infrastructure for delivering and consuming multimedia content
More... • INDECS - Uses an event-based model to describe intellectual property rights for commercial transactions • DOI - Uses the INDECS framework with a Digital Object Identifier for content description and management of references between scientific, technical, and medical journals • BSR - Basic Semantic Registry as a universal interlingua of concepts • GILS - Government Information Locator Service
...and more... • PDS - Planetary Data System • IEEE Learning Object Metadata - an elaborate, hierarchical scheme for describing multiple facets of educational material • MARC 21 - Machine Readable Cataloging format and related vocabularies for libraries • EPICS Data Dictionary, a subset of which -- ONIX -- describes books in a specific XML format (pushed by Amazon.com)
For further information... • "Metadata Watch Reports" of SCHEMAS Project, http://www.schemas-forum.org – Critical overview (with expert commentary) on the metadata landscape as it evolves – Related database of individual activity reports • D-Lib Magazine, http://www.dlib.org/dlib/ • Ariadne, http://www.ariadne.ac.uk
Why the Web won • Tim Berners-Lee's original model was very simple, and it was easy to implement • Real-world experience with simple HTML led iteratively to better understanding of priorities – As with bicycles and airplanes, there was no "theory" for design -- design was perfected iteratively, starting simple • Complex standards impose significant costs, especially if legacy data must be converted
Learning from experience • People are only human: the most perfect language is always subject to interpretation • By design, metadata languages must allow for innovation and evolution • Physics and art history, Chinese and Finnish -- different languages will continue in real life • Likewise, a diversity of metadata languages is inevitable • Interoperability over "everything" can only be via a simple and general pidgin
thomas.baker@gmd.de


