b4a873775c110981d0c504ffaaa1e8ae.ppt
- Number of slides: 58
Metadata 101
Amy Benson, NELINET, Inc.
November 7, 2005
Standards
- Increase interoperability
- Lower use and participation barriers
- Build larger communities of users, which can drive creation of a wider range of relevant services and tools (Windows vs. Mac)
- Improve chances of long-term survival of materials
- Prefer open over proprietary
Categories
- Metadata containers
  - XML, RDF
- Metadata standards
  - MARC, MODS, DC, EAD, TEI, ONIX, FGDC, GILS
- Metadata content standards
- Transmission standards and protocols
  - METS, OAI, SOAP, Z39.50, SRW
- Identifiers
  - URI, URL, PURL, URN, DOI, ISTC
Metadata - What is it?
- Data about data
- Information about any aspect of a resource
  - Size, location, attributes, topic, origin, use, audience, creator, quality, access rights, reviews… the list is endless
- An aid to the discovery, identification, assessment, and management of described entities
Types of Metadata
- Descriptive
  - What is it?
- Discovery
  - How can I find it?
- Structural
  - What files comprise it?
- Administrative
  - When was it created?
Types of Metadata
- Identifiers
  - How can I get to it?
- Terms & conditions
  - Can I use it?
- Preservation
  - Which key characteristics of the resource need to be maintained?
MARC
- Advantages
  - Rich set of descriptive elements
  - Highly interoperable within library community
  - Long, established history
- Disadvantages
  - Low extensibility
  - As is, not interoperable beyond the library world
  - Weak on administrative, rights, and other kinds of metadata important for digital resources
MARC
- Future of MARC
  - Must MARC die? No.
- New life through XML
- MARC XML from the Library of Congress (LC)
- MODS: a version of MARC encoded in XML, developed by the Library of Congress
- Crosswalks between MARC and many other metadata schemas already exist
MARC XML
- LC has developed a MARC XML schema, stylesheets, and tools
- The schema allows representation of a complete MARC record in XML
  - Lossless conversion
- Will support new transformations to new uses of MARC data
  - MARC to MARCXML to Dublin Core and MODS
Metadata Object Description Schema (MODS)
- Set of 20 bibliographic elements: a subset of the MARC 21 Format for Bibliographic Data
- Not as complete as the full MARC format, but richer than Dublin Core (for example)
- Highly interoperable with existing MARC records
- Uses language-based tags, rather than numbers like MARC 21 (245, 650, etc.)
- Under development by the LC Network Development and MARC Standards Office
MODS
- XML-based
  - Intended to work with/complement other metadata formats
- Can be used for conversion of existing MARC records or to create new resource description records
- Useful particularly for library applications that want to go beyond the OPAC
- Shares features of MARC and Dublin Core
MODS Elements
- TitleInfo
- Name
- TypeOfResource
- Genre
- PublicationInfo
- Language
- PhysicalDescription
- Abstract
- TableOfContents
- TargetAudience
- Note
- Cartographics
- Subject
- Classification
- RelatedItem
- Identifier
- Location
- AccessCondition
- Extension
- RecordInfo
MODS Elements
- Title element is mandatory; all others are optional
- Elements can have subelements and attributes that provide refining detail for the element
- Elements and subelements are repeatable, except in certain cases
- Elements display in any order
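The element rules on this slide can be sketched in a few lines of Python. This is only an illustration, assuming the MODS version 3 namespace and the lowerCamelCase tag names used in MODS XML (`titleInfo`, `subject`, `topic`); the helper `build_mods` and the sample values are hypothetical, and the output is not validated against the schema.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"  # MODS version 3 namespace


def build_mods(title, topics=()):
    """Return a minimal MODS record as an ElementTree element (hypothetical helper)."""
    ET.register_namespace("mods", MODS_NS)
    mods = ET.Element(f"{{{MODS_NS}}}mods")
    # The title is the one element the slide describes as mandatory.
    title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
    ET.SubElement(title_info, f"{{{MODS_NS}}}title").text = title
    # Subject is repeatable: emit one <subject> with a <topic> subelement per topic.
    for topic in topics:
        subject = ET.SubElement(mods, f"{{{MODS_NS}}}subject")
        ET.SubElement(subject, f"{{{MODS_NS}}}topic").text = topic
    return mods


record = build_mods("The Sound of Music", ["Musicals", "Austria"])
print(ET.tostring(record, encoding="unicode"))
```

The repeated `subject` element shows repeatability, and the nested `title` and `topic` show subelements refining their parent.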
MODS Example [figure omitted]
MODS Implementation
- MODS User Guidelines
- MODS Implementation Registry
- Contains descriptions of MODS projects planned, in progress, and fully implemented
  - http://www.loc.gov/standards/mods/registry.html
Dublin Core (DC)
- A method of describing resources intended to facilitate the discovery of electronic resources
- Designed to allow simple description of resources by non-catalogers as well as specialists
- National and international standard
  - ANSI/NISO standard Z39.85-2001
  - ISO standard 15836
- Includes 15 "core" elements
Dublin Core Elements
- Title
- Creator
- Subject
- Description
- Publisher
- Contributor
- Date
- Type
- Format
- Identifier
- Source
- Language
- Relation
- Coverage
- Rights
Dublin Core
- All elements optional and repeatable
- Elements display in any order
- Authority control not required
- Simple and Qualified DC
- Extensible
- Flexible
- International
Dublin Core
- Simple
  - Lowest common denominator
  - Less rich
  - Discovery role: leads to the resource or to a more complete description of the resource
- Qualified
  - More precise
  - Less interoperable
Dublin Core Examples
- Generic
    Title="The sound of music"
- HTML
    <meta name="DC.Title" content="The sound of music">
- XML
    <?xml version="1.0"?>
    <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>The Sound of Music</dc:title>
    </metadata>
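To show how the XML form might be consumed, here is a small sketch that reads the `dc:title` value back out using Python's standard library. The wrapper element `metadata` comes from the slide; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Prefix-to-namespace map for the Dublin Core element set, version 1.1.
DC_NS = {"dc": "http://purl.org/dc/elements/1.1/"}

xml_text = """<?xml version="1.0"?>
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>The Sound of Music</dc:title>
</metadata>"""

root = ET.fromstring(xml_text)
# DC elements are repeatable, so collect every title rather than just one.
titles = [el.text.strip() for el in root.findall("dc:title", DC_NS)]
print(titles)
```

Because every Dublin Core element is optional and repeatable, code that reads DC records should generally expect zero, one, or many values per element.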
Dublin Core Examples - HTML [figure omitted]
Dublin Core Examples - XML [figure omitted]
Other Metadata Standards
- Encoded Archival Description (EAD)
- Text Encoding Initiative (TEI)
- Visual Resources Association (VRA)
- Global Information Locator Service (GILS)
- Online Information Exchange (ONIX)
- Content Standard for Digital Geospatial Metadata (CSDGM), aka FGDC
- Data Documentation Initiative (DDI)
Crosswalks
- Crosswalks map an element from one scheme to its closest equivalent in another scheme
  - Example: MARC 1XX field is mapped to DC 'creator'
- Instrumental for converting data in one format to another format, one that is potentially more widely accessible
- Support the demand for cross-domain searching and interoperability
Crosswalks
- There is rarely a one-to-one correlation between elements of different schemes
  - One to many: DC to MARC
  - Many to one or none: MARC to DC
  - None to one or many
- MARC to DC
  - http://www.loc.gov/marc2dc.html#unqualif
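A crosswalk can be pictured as a lookup table from source fields to target elements. The sketch below is a deliberately simplified illustration, not the Library of Congress crosswalk: the table `MARC_TO_DC`, the function `marc_to_dc`, and the sample record are all assumptions made for the example, and subfields are ignored.

```python
# Illustrative, simplified mapping; not the full Library of Congress crosswalk.
MARC_TO_DC = {
    "100": "creator",  # 1XX main entries map to DC creator, as on the slide
    "110": "creator",
    "111": "creator",
    "245": "title",
    "650": "subject",
}


def marc_to_dc(fields):
    """Convert (tag, value) pairs into a dict of DC element -> list of values.

    Several MARC tags collapse into one DC element ("many to one"), and
    tags with no mapping are dropped ("many to none").
    """
    record = {}
    for tag, value in fields:
        element = MARC_TO_DC.get(tag)
        if element is None:
            continue  # no DC equivalent in this sketch
        record.setdefault(element, []).append(value)
    return record


marc_fields = [
    ("100", "Rodgers, Richard"),
    ("245", "The sound of music"),
    ("650", "Musicals"),
    ("650", "Motion picture music"),
    ("040", "DLC"),  # cataloging source: unmapped here, so it is lost
]
dc = marc_to_dc(marc_fields)
print(dc)
```

The dropped `040` field illustrates why converting from a rich scheme to a simple one is lossy, and why the reverse direction cannot restore the original record.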
Content Standards
- AACR (Anglo-American Cataloguing Rules)
  - "The rules cover the description of, and the provision of access points for, all library materials commonly collected at the present time."
  - The current text is the 2nd ed., 2002 Revision (with 2003, 2004, and 2005 updates)
  - The Joint Steering Committee for Revision of AACR (JSC) is working on a new code, "RDA: Resource Description and Access", scheduled to be published in 2008
Content Standards
- International Standard Bibliographic Description (ISBD)
  - A family of standards to regularize the form and content of bibliographic descriptions
  - Available for different material types: monographs, computer files, etc.
  - Designed to promote record sharing and exchange
Content Standards
- Describing Archives: A Content Standard (DACS)
  - Designed to facilitate consistent, appropriate, and self-explanatory description of archival materials and creators of archival materials
  - Replaces Archives, Personal Papers, and Manuscripts (APPM)
Metadata Encoding & Transmission Standard (METS)
- A system for packaging metadata necessary for both the management of digital library objects within a repository and the exchange of such objects between repositories, or between repositories and their users
- Used for: digital collection repositories
- Developed by the Digital Library Federation (DLF) and Library of Congress (LC)
Metadata Encoding & Transmission Standard (METS)
- METS can be understood as a binder that unites metadata about a particular resource
- A METS record includes six parts:
  - Header
  - Descriptive metadata
  - Administrative metadata
  - File groups
  - Structural map
  - Behavior section
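The "binder" idea can be made concrete with an empty skeleton that has one container per part listed on the slide. This is a rough sketch, assuming the METS namespace and the element names `metsHdr`, `dmdSec`, `amdSec`, `fileSec`, `structMap`, and `behaviorSec`; the result omits required attributes and children, so it is not schema-valid.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"

# One container element per part named on the slide, in the same order.
SECTIONS = ["metsHdr", "dmdSec", "amdSec", "fileSec", "structMap", "behaviorSec"]


def build_mets_skeleton():
    """Return an empty METS wrapper (illustrative only, not schema-valid)."""
    ET.register_namespace("mets", METS_NS)
    root = ET.Element(f"{{{METS_NS}}}mets")
    for name in SECTIONS:
        ET.SubElement(root, f"{{{METS_NS}}}{name}")
    return root


skeleton = build_mets_skeleton()
# Strip the "{namespace}" prefix to get the local element names back.
section_names = [child.tag.split("}")[1] for child in skeleton]
print(section_names)
```

In a real record, the descriptive section would typically wrap or point to a MODS, Dublin Core, or MARC XML description, which is how METS unites metadata from other standards.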
METS Schema [figure omitted]
Open Archives Initiative (OAI)
- A tool that supports interoperability among multiple databases
- OAI goal: coarse-granularity resource discovery
- OAI handles simple discovery from multiple community-specific repositories with metadata crosswalked to unqualified Dublin Core
OAI
- Roots are in the science community interested in locating and searching multiple repositories of pre- and e-prints of scientific papers
- Not really an archive, the way we traditionally think of the word
OAI
- Data providers expose (make available) the metadata for their collections
- Service providers harvest the exposed metadata and aggregate it (so that one search does it all) and/or provide additional services related to the harvested metadata, such as easy access to recent additions, updated materials, pre-set searches, etc.
OAI
- OAI Protocol for Metadata Harvesting
  - Metadata content must be encoded in XML and have a corresponding XML schema for validation
  - Metadata must be supplied in unqualified Dublin Core format, at least
  - Other metadata formats are optional
  - Metadata may optionally include a link to the actual content/resource
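A harvest request under this protocol is an ordinary HTTP request with a `verb` parameter. The sketch below only builds the URL for a `ListRecords` request in unqualified Dublin Core (`metadataPrefix=oai_dc`); the base URL `https://repository.example.org/oai` and the helper name `list_records_url` are hypothetical, and no network call is made.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec  # optional: restrict the harvest to one set
    if from_date:
        params["from"] = from_date  # optional: only records changed since this date
    return f"{base_url}?{urlencode(params)}"


# The repository address is a placeholder, not a real data provider.
url = list_records_url("https://repository.example.org/oai", from_date="2005-01-01")
print(url)
# https://repository.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2005-01-01
```

A service provider would issue requests like this against each data provider and aggregate the XML responses into its own index.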
OAI Infrastructure [diagram: repositories, DC records, Harvester, Service Provider]
OAI Infrastructure [diagram: user, search, Repository]
OAI Infrastructure [diagram: user, search, Repository, repository]
Z39.50
- Z39.50 is a search and retrieval protocol, maintained by LC, capable of operating over TCP/IP
- Negotiates queries with multiple, separate databases; it does not harvest records and create a new database
- Built in to some library software systems
- OAI is not intended to replace other approaches, but to provide an easy-to-use alternative for different constituencies and purposes
Search/Retrieve Web Service
- The primary function of SRW is to allow a user to search remote databases of records
- Protocol uses easily available technologies (XML, SOAP, HTTP, URI) to perform tasks traditionally done using proprietary solutions, such as database queries and responses
- Builds on Z39.50 and moves it forward
  - ZING: Z39.50 International: Next Generation
Functional Requirements for Bibliographic Records (FRBR)
- A study by IFLA (International Federation of Library Associations) of the full range of functions performed by the bibliographic record
  - What do we use bibliographic records for?
    - Description, access, location, identification, annotations...
- The report provides a framework for the nature of and uses for bibliographic records
- A conceptual model that can be used as a means to meet user needs and expectations
Functional Requirements for Bibliographic Records (FRBR)
- Tasks we use bibliographic records for:
  - Finding
  - Identifying
  - Selecting
  - Obtaining access to resources
- FRBR should allow systems to handle bibliographic data in new, useful ways that fulfill these tasks
Functional Requirements for Bibliographic Records (FRBR)
- Conceptual model of relationships between bibliographic entities
- Hierarchical relationships
  - Work
    - The intellectual product
  - Expression
    - An 'expression' of the parent work, such as a translation, edition, revision, annotated text, etc.
    - Expressions entail additional intellectual effort
Functional Requirements for Bibliographic Records (FRBR)
- Hierarchical relationships
  - Manifestation
    - Published runs of each expression in multiple formats over time
    - The level at which we traditionally create a catalog record
  - Item
    - Each copy of a specific manifestation
    - Circulation records track items
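The four-level hierarchy maps naturally onto nested data structures. The following sketch is one possible modeling, not something prescribed by the FRBR report: the class fields, the `all_items` method, and the sample data (titles, formats, barcodes) are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    barcode: str  # a single physical or digital copy


@dataclass
class Manifestation:
    description: str  # one published run, the usual catalog-record level
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    description: str  # e.g. a translation or a revised edition
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    title: str  # the intellectual product
    expressions: List[Expression] = field(default_factory=list)

    def all_items(self):
        """Walk Work -> Expression -> Manifestation -> Item and collect every copy."""
        return [
            item
            for expression in self.expressions
            for manifestation in expression.manifestations
            for item in manifestation.items
        ]


# Invented sample data for illustration.
work = Work("Hamlet", [
    Expression("English text", [
        Manifestation("1992 paperback", [Item("B001"), Item("B002")]),
        Manifestation("2003 e-book", [Item("E001")]),
    ]),
    Expression("German translation", [
        Manifestation("1999 hardcover", [Item("B003")]),
    ]),
])
print([item.barcode for item in work.all_items()])
```

Grouping at the Work level lets a catalog answer "which copies of this work, in any edition or language, do we hold?" with a single traversal.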
Functional Requirements for Bibliographic Records (FRBR)
- OCLC is researching the application of FRBR to WorldCat
  - "FRBRization"
- They have created an algorithm that groups records automatically based on the Work/Expression/Manifestation/Item model
  - http://www.oclc.org/research/projects/frbr/algorithm.htm
Identifiers
- Four potential purposes
  - Locator
    - Where is the document I seek?
  - Identifier
    - Unique label for a resource
  - Gatherer
    - Groups like resources, similar to a uniform title
  - Differentiator
    - Helps identify different versions of the same resource
Identifiers
- Uniform Resource Identifiers (URI)
  - Generic set of all names/addresses that refer to resources on the Web, including:
    - Uniform Resource Locator (URL)
    - Persistent Uniform Resource Locator (PURL)
    - Uniform Resource Name (URN)
    - OpenURL
    - DOI
    - ISTC
Uniform Resource Locator (URL)
- Web address or location at which a resource is held, not an identifier for the resource itself
- Most common way to locate documents/items on the Web (http, ftp, mailto, etc.)
- Not particularly stable or permanent
  - Error 404: File not Found
- No metadata, but an important starting point as we look at some of the related technologies
Persistent Uniform Resource Locator (PURL)
- PURL Service is managed by OCLC
- Functionally, a PURL is a URL
- The PURL remains constant even if the URL changes; its function is to automatically redirect a user to the current URL
- PURL system/resolver is updated by the resource manager to reflect any changes to the location of the file, or URL
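The redirect behavior described here amounts to one level of indirection. As a toy illustration only (the class `PurlResolver`, its methods, the status codes chosen, and the example.org addresses are all assumptions, not the OCLC service's implementation):

```python
class PurlResolver:
    """Toy in-memory resolver: a stable PURL path points at a changeable URL."""

    def __init__(self):
        self._targets = {}

    def register(self, purl_path, target_url):
        """Create or update the mapping; the resource manager calls this on a move."""
        self._targets[purl_path] = target_url

    def resolve(self, purl_path):
        """Return (status, location), mimicking an HTTP redirect or a 404."""
        target = self._targets.get(purl_path)
        if target is None:
            return 404, None
        return 302, target


resolver = PurlResolver()
resolver.register("/docs/report", "http://old-server.example.org/report.html")
first = resolver.resolve("/docs/report")

# The file moves; only the resolver entry changes, never the published PURL.
resolver.register("/docs/report", "http://new-server.example.org/archive/report.html")
second = resolver.resolve("/docs/report")
print(first, second)
```

Catalog records and citations keep pointing at `/docs/report` throughout, which is where the maintenance savings come from.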
PURLs
- PURLs can be used both in documents and in cataloging systems
- PURLs increase the probability of correct resolution and long-term access to resources
- Use of PURLs can reduce the burden and expense of catalog maintenance (and business card printing)
PURL - Example
- US Government is a big user of PURLs
  - http://www.ccny.cuny.edu/library/Divisions/Government/iraqbib.html
Uniform Resource Name (URN)
- Uniform Resource Names (URNs) are intended to serve as persistent, location-independent resource identifiers
- Globally unique
- Never change
- Format
  - urn:<namespace identifier>:<namespace specific string>
- Use a resolver system to indicate current location of resource
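The three-part format can be checked with a few lines of string handling. This is a minimal sketch, assuming only the `urn:<namespace identifier>:<namespace specific string>` shape given on the slide; the function `parse_urn` is hypothetical and does not enforce the full character rules of the URN syntax.

```python
def parse_urn(urn):
    """Split a URN into (namespace identifier, namespace specific string)."""
    scheme, sep1, rest = urn.partition(":")
    nid, sep2, nss = rest.partition(":")
    if scheme.lower() != "urn" or not sep1 or not sep2 or not nid or not nss:
        raise ValueError(f"not a URN: {urn!r}")
    # The namespace identifier is case-insensitive, so normalize it;
    # the namespace specific string is left untouched and may contain colons.
    return nid.lower(), nss


print(parse_urn("urn:isbn:0451450523"))  # ('isbn', '0451450523')
```

Note that parsing yields a name only; finding the resource still requires a resolver that knows the namespace.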
Digital Object Identifier (DOI)
- Overseen by the International DOI Foundation
- DOIs are persistent, location-independent identifiers of resources
- Developed to enable management of copyrightable materials in an electronic environment (locate, buy, sell, track, license)
- Specific type/implementation of a URN
DOI
- A two-part number with a prefix identifying the original publisher and a suffix identifying the specific work
  - Similar to the ISBN
- A DOI resolution request for a specific resource would return one or more URLs: *locations* where a user could obtain access to the resource
  - Appropriate copy: online, text, free, illustrated, etc.
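The two-part structure is easy to see in code. The sketch below assumes the common `prefix/suffix` form in which the prefix starts with `10.`, and it forms a resolver link through `https://doi.org/`; the helper names are hypothetical, the sample DOI `10.1000/182` is used only as an example, and characters that need URL escaping are not handled.

```python
def split_doi(doi):
    """Split a DOI into (prefix, suffix) at the first slash."""
    prefix, sep, suffix = doi.partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def resolver_url(doi):
    """Return a link that asks the DOI resolver for the current location."""
    split_doi(doi)  # validate the shape first
    return "https://doi.org/" + doi


print(split_doi("10.1000/182"))     # ('10.1000', '182')
print(resolver_url("10.1000/182"))  # https://doi.org/10.1000/182
```

As with PURLs and URNs, the identifier itself stays fixed while the resolver supplies whatever location is current.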
DOI
- Applications of the DOI will require metadata
- The basis of the DOI metadata scheme is a minimal "kernel" of elements
- DOI minimal kernel elements of metadata:
  - DOI, DOI genre, identifier, title, type, origination, primary agent, agent role, and administrative data such as registrant and date of registration
Questions?
Amy Benson, Program Director, NELINET Digital Services
NELINET, Inc.
benson@nelinet.net
508.597.1937
800.635.4638 x1937