537c9eccba098da4b3efa549ac2f08fa.ppt
- Number of slides: 29
Metadata for Digital Libraries: A Functional Approach
Cornell Digital Imaging Workshop, October 21, 1998
Sandra Payette, Digital Library Research Group, Cornell University
payette@cs.cornell.edu
Metadata
Metadata is structured data about data that imposes order on a disordered information universe.
[Figure: a descriptive record (CREATOR: Plato; TITLE: The Republic) shown with other metadata about the work: image files (Image 1, Image 2, Image 3), storage (cdrom 1, cdrom 2), and an access control list.]
Many Types of Metadata
• Descriptive
• Structural
• Terms and conditions
• Administrative
• Content ratings
• Provenance
• Relationship
Basic Functions We Must Support
• Resource Discovery
• Access and Use
• Preservation and Administration
Resource Discovery: Focus on Descriptive Metadata
Metadata for Resource Discovery
• Catalogs
  – OPAC / MARC Records
• Indexes
  – Structured descriptive records (e.g., Dublin Core)
  – Abstracts
  – Full-text surrogates (e.g., via OCR)
Challenges
• Large-scale traditional cataloging is impractical
  – time-consuming, labor-intensive, requires special skills
  – limited coverage: only "selected" items
• Problems with resource discovery
  – full-text indexing is ineffective (false hits, irrelevant results, information overload)
  – full-text approaches are not useful for non-textual data (e.g., audio, video, executable programs)
One Solution: Simple Descriptive Surrogates
• Easy to create
• Applicable across domains
• Applicable to different genres of objects
• Allow interoperability among robots, indexers, and search clients
Dublin Core Element Set
• Good baseline descriptive record
• Can exist alongside other specialized metadata
• Common ground for discovery across disparate resources
• No specialized skills required
• Flexibility through qualifiers
Source: http://www.purl.org/Metadata/dublin_core/
Dublin Core: 15 Elements
• Title: name given to the work by the author
• Author or Creator: person(s) responsible for the intellectual content
• Subject and Keywords: the topic of the work, keywords, or formal classification schemes
• Description: textual description of the content (abstract, prose describing an image, etc.)
• Publisher: the organization making the work available in its present form
• Other Contributor: person(s) other than the author who have made significant contributions to the intellectual content
• Date: the date the work was made available
• Resource Type: category of the resource
• Format: data representation of the resource
• Resource Identifier: unique identification string (e.g., URL, URN, ISBN...)
• Source: object from which this object is derived (if applicable)
• Language: language of the intellectual content of the object
• Relation: relationship of the object to other objects or collections
• Coverage: spatial location and temporal duration characteristics
• Rights Management: a pointer to a copyright notice, a rights management statement, or a rights server
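Here is a minimal Python sketch of checking a record against the element set. It assumes a record is a dict that maps lowercase element names to lists of values. The short names (title, creator, subject, ...) are modeled on the labels in RFC 2413. `validate_dc_record` is a hypothetical helper and is not part of any Dublin Core software. Dublin Core elements are optional and repeatable, so the check only flags unknown or empty elements.

```python
# Short element names modeled on the RFC 2413 labels (an assumption).
DC_ELEMENTS = frozenset({
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
})


def validate_dc_record(record: dict[str, list[str]]) -> list[str]:
    """Hypothetical helper: return problems found; empty means the record passes."""
    problems = []
    for name, values in record.items():
        if name not in DC_ELEMENTS:
            problems.append(f"unknown element: {name}")
        elif not values:
            problems.append(f"empty element: {name}")
    # All elements are optional and repeatable, so nothing is required.
    return problems


record = {
    "title": ["The Republic"],
    "creator": ["Plato"],
}
print(validate_dc_record(record))                 # []
print(validate_dc_record({"author": ["Plato"]}))  # ['unknown element: author']
```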
Dublin Core in HTML META Tags
<html>
  <head>
    <title>Cornell Digital Library Research Group</title>
    <META name="DC.subject" content="digital library research">
    <META name="DC.subject" content="networked object description">
    <META name="DC.publisher" content="Cornell University">
    <META name="DC.creator" content="Lagoze, Carl, lagoze@cs.cornell.edu.">
    <META name="DC.creator" content="Payette, Sandra, payette@cs.cornell.edu.">
    <META name="DC.title" content="Cornell Digital Library Research Group">
    <META name="DC.date" content="1998-05-15">
    <META name="DC.form" scheme="IMT" content="text/html">
    <META name="DC.language" scheme="ISO 639" content="en">
    <META name="DC.identifier" scheme="URL" content="http://www2.cs.cornell.edu/NCSTRL/CDLRG/cdlrg.htm">
  </head>
  <IMG SRC="/mydir/mysubdir/mypicture.gif" WIDTH=208 HEIGHT=216>
</html>
Source: http://www.w3.org/TR/REC-html40/
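The sketch below shows how an indexer could harvest these tags with Python's standard `html.parser` module. It assumes Dublin Core elements are marked by a `name` attribute that starts with `DC.`, as in the example above. The `scheme` attribute is ignored for brevity, and `DCMetaParser` is a hypothetical class name.

```python
from html.parser import HTMLParser


class DCMetaParser(HTMLParser):
    """Collect Dublin Core META tags (name="DC.xxx") from an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.elements: dict[str, list[str]] = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so META matches "meta".
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        content = attributes.get("content")
        if name.lower().startswith("dc.") and content is not None:
            element = name[3:].lower()
            # DC elements are repeatable, so keep every value in document order.
            self.elements.setdefault(element, []).append(content)


page = """<html><head>
<META name="DC.subject" content="digital library research">
<META name="DC.subject" content="networked object description">
<META name="DC.publisher" content="Cornell University">
<META name="DC.language" scheme="ISO 639" content="en">
</head></html>"""

parser = DCMetaParser()
parser.feed(page)
print(parser.elements["subject"])
```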
Warwick Framework
• Developed by Dublin Core community
• Broader framework to accommodate diverse metadata schemes
• Encourages community-specific definition and administration of metadata
• Modularity supports interoperability among:
  – content providers
  – catalogers and indexers
  – automated resource discovery systems
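Warwick Framework
[Figure: a container holding several packages, where a simple package is a typed metadata set. The packages shown are a Dublin Core package, another descriptive package, and a reference-to-MARC package whose URI points to an external MARC record.]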
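The container-and-package structure can be sketched with Python dataclasses. This is a hypothetical model, not a published Warwick Framework implementation. `MetadataPackage` carries a typed metadata set by value, and `ReferencePackage` points to an external record (such as a MARC record) by URI. The package type labels and the example URI are placeholders.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class MetadataPackage:
    """Simple package: a typed metadata set carried by value."""
    package_type: str                  # illustrative label, e.g. "dublin-core"
    fields: dict[str, list[str]]


@dataclass
class ReferencePackage:
    """Package that refers to metadata stored elsewhere, by URI."""
    package_type: str                  # illustrative label, e.g. "marc"
    uri: str


Package = Union[MetadataPackage, ReferencePackage]


@dataclass
class Container:
    """Aggregates independently typed packages for one resource."""
    packages: list[Package] = field(default_factory=list)

    def add(self, package: Package) -> None:
        self.packages.append(package)

    def of_type(self, package_type: str) -> list[Package]:
        """Return only the packages of one type, e.g. those a community can read."""
        return [p for p in self.packages if p.package_type == package_type]


container = Container()
container.add(MetadataPackage("dublin-core", {"creator": ["Plato"], "title": ["The Republic"]}))
container.add(MetadataPackage("other-descriptive", {"note": ["local description"]}))
container.add(ReferencePackage("marc", "http://example.org/marc/12345"))

for package in container.of_type("marc"):
    print(package.uri)  # http://example.org/marc/12345
```

The packages stay independent, so a client can use the types it understands and skip the rest. This is the modularity the framework aims for.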
WWW Infrastructure Evolving in this Direction
• Dublin Core submitted to the IETF as an RFC
  – ftp://ftp.isi.edu/in-notes/rfc2413.txt
• Resource Description Framework (RDF)
  – http://www.w3.org/RDF/
• Extensible Markup Language (XML)
  – http://www.w3.org/XML/
Resource Description Framework (RDF)
• Influenced by the Warwick Framework, among others
• Enables interoperability between applications that exchange metadata
• Allows metadata elements from different schemas to be mixed and matched
• An application of XML (XML provides the transfer syntax)
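A Simple RDF Model
[Figure: the resource www2.cs.cornell.edu/CDLRG/doc1 has DC:Creator and DC:Publisher properties and a QCSchema:Rating property whose value is the resource www.xxx.org/rate. That resource in turn has MyRating = A and YourRating = B.]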
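The model amounts to a small set of (subject, property, value) statements. The minimal Python sketch below assumes plain strings for resources and property names. The Creator and Publisher values come from the "RDF Expressed in XML" slide, and `properties_of` is a hypothetical helper. The DC and QCSchema properties sit side by side in one graph.

```python
DOC = "http://www2.cs.cornell.edu/CDLRG/doc1"
RATING = "http://www.xxx.org/rate"

# Each statement is a (subject, property, value) triple; properties from
# two vocabularies (DC and QCSchema) are mixed in one graph.
triples = [
    (DOC, "DC:Creator", "Sandy Payette"),
    (DOC, "DC:Publisher", "Cornell DLRG"),
    (DOC, "QCSchema:Rating", RATING),  # the value is itself a resource
    (RATING, "MyRating", "A"),
    (RATING, "YourRating", "B"),
]


def properties_of(subject: str, graph: list[tuple[str, str, str]]) -> dict[str, list[str]]:
    """Hypothetical helper: group the values asserted about one subject."""
    result: dict[str, list[str]] = {}
    for s, p, o in graph:
        if s == subject:
            result.setdefault(p, []).append(o)
    return result


# Follow the rating arc from the document to the rating resource.
rating = properties_of(DOC, triples)["QCSchema:Rating"][0]
print(properties_of(rating, triples))  # {'MyRating': ['A'], 'YourRating': ['B']}
```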
RDF Expressed in XML
<?xml:namespace name="http://www.purl.org/Metadata/dublin_core/" as="DC"?>
<?xml:namespace name="http://www.w3.org/Schemas/RDF/" as="RDF"?>
<RDF:Serialization>
  <RDF:Assertions href="http://www2.cs.cornell.edu/CDLRG/doc1">
    <DC:Creator>Sandy Payette</DC:Creator>
    <DC:Publisher>Cornell DLRG</DC:Publisher>
  </RDF:Assertions>
</RDF:Serialization>
(The DC: elements are drawn from the Dublin Core Element Set.)
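The syntax above follows an early draft that predates the 1999 RDF Recommendation. For comparison, the same two statements can be generated with Python's `xml.etree.ElementTree`. This sketch uses the namespace URIs that were standardized later rather than the draft URIs on the slide: `http://www.w3.org/1999/02/22-rdf-syntax-ns#` for RDF and `http://purl.org/dc/elements/1.1/` for Dublin Core 1.1. `to_rdf_xml` is a hypothetical helper, and the lowercase element names follow DC 1.1.

```python
import xml.etree.ElementTree as ET

# Namespace URIs from the later W3C RDF and Dublin Core 1.1 specifications,
# not the draft URIs used on the slide.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def to_rdf_xml(about: str, creator: str, publisher: str) -> str:
    """Serialize two Dublin Core statements about one resource as RDF/XML."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(root, f"{{{RDF_NS}}}Description",
                                {f"{{{RDF_NS}}}about": about})
    ET.SubElement(description, f"{{{DC_NS}}}creator").text = creator
    ET.SubElement(description, f"{{{DC_NS}}}publisher").text = publisher
    return ET.tostring(root, encoding="unicode")


xml_text = to_rdf_xml("http://www2.cs.cornell.edu/CDLRG/doc1",
                      "Sandy Payette", "Cornell DLRG")
print(xml_text)
```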
RDF: Why is it important?
• Market demand for metadata deployment
• Software infrastructure will be ubiquitous (e.g., free in browsers, servers, proxies, editors, etc.)
• RDF is a general-purpose framework that provides structured, human-readable, and machine-understandable metadata for the Web
• Allows stakeholder communities to independently develop, maintain, and reuse vocabularies
Access and Use: Focus on Structural Metadata
Structural Metadata
• What is it? Data that:
  – Defines structure within documents
  – Aggregates images into meaningful entities
  – Correlates document components to image files
  – Organizes a collection of objects
• Where is it?
  – ASCII text files in directories
  – Relational databases
  – Embedded in documents or surrogates (e.g., SGML)
First, a Data Model
[Figure: a data model for a book relating front matter, table of contents, chapters, index, and pages, with cardinalities (0:1, 1:N) on the relationships.]
Data models mirror the natural attributes and relationships of real-world objects.
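The Python sketch below shows how such cardinalities can be enforced in code. It reads the figure as follows, and that reading is an assumption:

- a book has at most one front section (0:1), one or more chapters (1:N), and at most one index (0:1);
- each part consists of one or more pages (1:N).

The class names and image file names are hypothetical, and the table of contents is left out for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Part:
    """Front matter, a chapter, or an index: one or more page images (1:N)."""
    label: str
    pages: list[str]  # hypothetical image file names, in reading order

    def __post_init__(self) -> None:
        if not self.pages:
            raise ValueError(f"{self.label}: needs at least one page (1:N)")


@dataclass
class Book:
    chapters: list[Part]            # 1:N
    front: Optional[Part] = None    # 0:1
    index: Optional[Part] = None    # 0:1

    def __post_init__(self) -> None:
        if not self.chapters:
            raise ValueError("a book needs at least one chapter (1:N)")

    def page_sequence(self) -> list[str]:
        """All page images in reading order: front, chapters, then index."""
        parts = [self.front] if self.front is not None else []
        parts += self.chapters
        if self.index is not None:
            parts.append(self.index)
        return [page for part in parts for page in part.pages]


book = Book(
    chapters=[Part("chapter 1", ["p003.gif", "p004.gif"]), Part("chapter 2", ["p005.gif"])],
    front=Part("front", ["p001.gif", "p002.gif"]),
)
print(book.page_sequence())
# ['p001.gif', 'p002.gif', 'p003.gif', 'p004.gif', 'p005.gif']
```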
"Binding" Document Images with SGML
<!DOCTYPE EBIND PUBLIC "-//UC Berkeley//DTD ebind.dtd (Electronic Binding (Ebind))//EN" [
  <!ENTITY % birch PUBLIC "-//UC Berkeley//ENTITIES Birch-tree fairy book (Page Images)//EN">
  %birch;
]>
<ebind type="book">
<front>
  <page><image entityref="birch001" seqno="1" nativeno="i"></page>
  <page><image entityref="birch002" seqno="2" nativeno="ii"></page>
  <page><image entityref="birch003" seqno="3" nativeno="iii"></page>
  <page><image entityref="birch004" seqno="4" nativeno="iv"></page>
  <div0 type="titlepage">
    <page><image entityref="birch005" seqno="5" nativeno="v"></page>
    <page><image entityref="birch006" seqno="6" nativeno="vi"></page>
  </div0>
  <div0 type="introduction">
    <head>Introductory note</head>
    <page><image entityref="birch007" seqno="7" nativeno="vii"></page>
  </div0>
Source: http://sunsite.berkeley.edu/Ebind/
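Binding data like this lets software map between image files, scan order (`seqno`), and the page numbers printed in the book (`nativeno`). The Python sketch below assumes the excerpt has first been converted to well-formed XML. SGML permits the unclosed `<image>` tags and the external entity declarations, but `xml.etree.ElementTree` would reject them. The fragment is trimmed, and `page_index` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# A trimmed, XML-compatible version of the Ebind excerpt (empty <image/> tags).
EBIND = """
<ebind type="book">
  <front>
    <page><image entityref="birch001" seqno="1" nativeno="i"/></page>
    <page><image entityref="birch002" seqno="2" nativeno="ii"/></page>
    <div0 type="introduction">
      <head>Introductory note</head>
      <page><image entityref="birch007" seqno="7" nativeno="vii"/></page>
    </div0>
  </front>
</ebind>
"""


def page_index(xml_text: str) -> dict[str, tuple[int, str]]:
    """Map each printed page number to (sequence number, image entity)."""
    root = ET.fromstring(xml_text.strip())
    index = {}
    # iter() walks the whole tree, so pages nested inside <div0> are found too.
    for image in root.iter("image"):
        index[image.get("nativeno")] = (int(image.get("seqno")), image.get("entityref"))
    return index


pages = page_index(EBIND)
print(pages["vii"])  # (7, 'birch007')
```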
Finding Aids in SGML
• Encoded Archival Description (EAD)
  – SGML markup of descriptive access tools (inventories, registers, indexes, and guides)
  – provides more detail about a collection than a typical catalog record
  – facilitates access: "drill down" into a collection
  – potential international standard
  – maintained jointly by the Library of Congress and the Society of American Archivists (SAA)
Source: http://www.loc.gov/rr/eadhome.html
Preservation and Administration: Focus on Administrative Metadata and Persistent Identifiers
Administrative Metadata
• Information for managing images over time
  – relocation
  – migration (new formats)
  – copyright tracking
  – archiving of objects and services
• Where is it?
  – File headers (to help prevent orphaned images)
  – External databases (e.g., relational databases)
  – Separate files stored with images
Create a Preservation Audit Trail
Image File Attributes:
• formats
• versions
• compression
Image Attributes:
• resolution
• bit depth
• orientation
Process Data:
• creation date/time
• equipment used
Rights Management Data:
• expiration dates
• copyright info
• source statements
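One way to keep such a trail is an append-only list of events attached to each image's administrative record. In this minimal Python sketch the fields mirror the categories on the slide. The record layout, method names, and sample values are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ImageRecord:
    """Administrative metadata for one image, grouped as on the slide."""
    file_format: str          # image file attributes
    compression: str
    resolution_dpi: int       # image attributes
    bit_depth: int
    copyright_info: str       # rights management data
    events: list[dict] = field(default_factory=list)  # process data / audit trail

    def record_event(self, action: str, equipment: str = "", note: str = "") -> None:
        """Append an event with a UTC timestamp; earlier entries are never changed."""
        self.events.append({
            "when": datetime.now(timezone.utc).isoformat(),
            "action": action,
            "equipment": equipment,
            "note": note,
        })

    def migrate(self, new_format: str, new_compression: str) -> None:
        """Change the file format and log both the old and new values."""
        old = f"{self.file_format}/{self.compression}"
        self.file_format, self.compression = new_format, new_compression
        self.record_event("migration", note=f"{old} -> {new_format}/{new_compression}")


image = ImageRecord("TIFF", "none", 600, 8, "hypothetical copyright statement")
image.record_event("capture", equipment="hypothetical flatbed scanner")
image.migrate("TIFF", "LZW")
for event in image.events:
    print(event["action"], event["note"])
```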
Persistent Identifiers
• Globally unique names
• Persistent: names are permanent, lasting
• Used in resolution services to locate the object (locations change over time)
[Figure: the unique identifier cnri.dlib/april97-payette, made up of a naming authority (cnri.dlib) and an item name (april97-payette), resolves to a URL such as http://www.somewebserver.org/somedirectory/somefile.]
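The identifier on the slide splits at its first slash into a naming authority (`cnri.dlib`) and an item name (`april97-payette`). The Python sketch below shows that split and a resolution step. It assumes an in-memory table stands in for a real resolution service. `RESOLVER`, `resolve`, and the replacement URL are hypothetical; the initial URL is the placeholder from the slide.

```python
def split_identifier(identifier: str) -> tuple[str, str]:
    """Split 'authority/item' at the first slash; in this sketch items may contain '/'."""
    authority, sep, item = identifier.partition("/")
    if not sep or not authority or not item:
        raise ValueError(f"not of the form authority/item: {identifier!r}")
    return authority, item


# Toy resolution table: the name stays fixed while the location can change.
RESOLVER = {
    "cnri.dlib/april97-payette": "http://www.somewebserver.org/somedirectory/somefile",
}


def resolve(identifier: str) -> str:
    split_identifier(identifier)  # reject malformed names before lookup
    try:
        return RESOLVER[identifier]
    except KeyError:
        raise LookupError(f"no location registered for {identifier!r}") from None


print(split_identifier("cnri.dlib/april97-payette"))  # ('cnri.dlib', 'april97-payette')
# Moving the object only means updating the table; the identifier is unchanged.
RESOLVER["cnri.dlib/april97-payette"] = "http://www.example.org/new/location"
print(resolve("cnri.dlib/april97-payette"))  # http://www.example.org/new/location
```

Clients store only the name, so moving the object means updating one table entry rather than every link that cites it.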
Identifiers: Current Initiatives
• IETF Uniform Resource Names (URN)
  – specification of URN framework
  – requirements for resolution systems
  – syntax definition
• Existing Systems
  – CNRI's Handle System
  – OCLC PURLs
  – DOI Initiative
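RFC 2141 defines the URN syntax as `urn:<NID>:<NSS>`. The namespace identifier (NID) is a letter or digit followed by up to 31 letters, digits, or hyphens. The Python sketch below splits a URN into those two parts. It does not validate the characters allowed in the namespace-specific string (NSS). `parse_urn` and the `example` namespace are illustrative.

```python
import re

# RFC 2141: "urn:" <NID> ":" <NSS>; the NID is a letter or digit followed by
# up to 31 letters, digits, or hyphens.
URN_PATTERN = re.compile(r"urn:([A-Za-z0-9][A-Za-z0-9-]{0,31}):(.+)", re.IGNORECASE)


def parse_urn(urn: str) -> tuple[str, str]:
    """Split a URN into (NID, NSS); NSS characters are not validated here."""
    match = URN_PATTERN.fullmatch(urn)
    if match is None:
        raise ValueError(f"not a URN: {urn!r}")
    nid, nss = match.groups()
    # The "urn:" prefix and the NID are case-insensitive, so normalize the NID;
    # the NSS is kept as-is because it is compared case-sensitively by default.
    return nid.lower(), nss


print(parse_urn("URN:Example:april97-payette"))  # ('example', 'april97-payette')
```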
Further reading
• IFLA: A Good List, http://www.nlc-bnc.ca/ifla/II/metadata.htm
• Lynch et al.: CNI Resource Discovery White Paper, http://www.cni.org/projects/nidr.html
• Lagoze: Resource Discovery in the Digital Age, http://www.dlib.org/dlib/june97/06lagoze.html
• Payette: Persistent Identifiers, RLG DigiNews, http://www.rlg.org/preserv/diginews22.html
• W3C: Metadata Overview, http://www.w3.org/Metadata