- Number of slides: 61
Digital Libraries Lillian N. Cassel
A digital library • An informal definition of a digital library is a managed collection of information, with associated services, where the information is stored in digital formats and accessible over a network. – Wm. Arms, Digital Libraries, 1999 • A focused collection of digital objects, including text, video, and audio, along with methods for access and retrieval, and for selection, organization, and maintenance of the collection. – Witten and Bainbridge, How to Build a Digital Library, 2003
What is a library? • An active exercise to explore what we know about, and think about, traditional libraries. • How do we translate these characteristics to the digital world? – Is that the right model? Are we unnecessarily constraining the digital environment? Are there things that do not translate?
Vannevar Bush • “As We May Think” • (http://www.theatlantic.com/doc/194507/bush) • Reflecting after WWII – The value of collaboration – The sad use of scientific expertise to invent the atomic bomb – The need for organization and access to information.
memex • Vannevar Bush’s vision Image source: kelty.rice.edu/375/images/memex/camera.jpg http://www.knowledgesearch.org/presentations/etcon/images/memex.gif
MyLifeBits • Gordon Bell and Microsoft • http://www.guardian.co.uk/science/story/0,3605,1674359,00.html “Gordon Bell doesn't need to remember, but has no chance of forgetting. At the age of 71, he is recording as much of his life as modern technology will allow, storing it all on a vast database: a digital facsimile of a life lived. If he goes for a walk, a miniature camera that dangles from his neck snaps pictures every minute or so, immediately committing the scene to a memory built not of neurons but ones and noughts. If he wanders into a cafe, sensors note the change in light, the shift of temperature and squirrel the information away. Conversations are recorded and steps logged thanks to a GPS receiver carried with him.”
Related work • Walden’s Path – http://www.csdl.tamu.edu/walden/ – System used by itself or as a service within a digital library – Allows a user to make a path through a set of related resources and save the path for reuse at a later time. • Used to allow a teacher to “blaze a trail” through a collection of materials to help students find their way from a starting point to a goal. • Also for recording personal trips through a collection of material to be revisited. How does that compare to a set of bookmarks?
Moving Forward • We have looked at what a library is • Now – How do we translate that to a digital entity? • Information resources, including digital libraries, are very complex systems. – A formal model helps to capture the essence of the system and give special attention to specific areas – The model also gives developers of digital libraries a checklist of areas to consider and develop well.
The 5S model • Streams – The flow of information in various formats • Structures – Organizational aspects of the DL • Spaces – Views of components; real or abstract images • Scenarios – Services and behaviors • Societies – Communities and relationships among them
5S summary

| Model | Primitives | Formalisms | Objectives |
| --- | --- | --- | --- |
| Stream | Text; video; audio; software program | Sequences; types | Describes properties of the DL content, encoding and textual material or particular forms of multimedia data. |
| Structure | Collection; catalog; hypertext; document; metadata; organizational tools | Graphs; nodes; links; labels; hierarchies | Specifies organizational aspects of the DL content |
| Space | User interface; index; retrieval model | Sets; operations; vector space; measure space; probability space | Defines logical and presentational views of several DL components |
| Scenarios | Service; event; condition; action | Sequence diagrams; collaboration diagrams | Details the behavior of DL services |
| Societies | Community; managers; actors; classes; relationships; attributes; operators | Object-oriented modeling constructs; design patterns | Defines managers responsible for running DL services; actors that use those services, and relationships among them |

Source: http://www.dlib.vt.edu/projects/5S-Model/
Etana - A DL for archeology
An example application of 5S • Etana: A DL for an archaeological site • [Figure: diagram of the scenario, society, space, structure, and stream models for Etana. Labels recoverable from the figure: Archaeologist; General public; Service Manager; Services (Value added, Domain specific, Repository building, Information Satisfaction); Geographic space; Metric space; User interface; Site, Partition, Sub-partition, Locus, Container, Artifact; Spatial Temporal Region; Taxonomies; Metadata; Artifact-specific; Text, Video, Audio, Drawing, Photo, 3D] Source: E. A. Fox http://feathers.dlib.vt.edu/
Applying the model, informally • Personal Photos; Movie, TV, media • Stream - What types of data? GIF, JPG, AVI? • Structure - How are the elements organized? Is there a hierarchy? Are there multiple structures? • Spaces - How would you index the items? How would you divide them into related groups? • Scenarios - What services would you provide? What information do we need to provide those services? • Societies - Who is the library intended to serve? Remember to include agents and other processes as well as users. • In your group, choose one or the other (photos or movie/TV/media). Start with stream, scenarios, societies.
More formally: Definitions • Definition: A stream is a sequence whose co-domain is a non-empty set. • Definition: A structure is a tuple (G, L, F) where G = (V, E) is a directed graph with vertex set V and edge set E, L is a set of label values, and F is a labeling function.
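The two definitions above can be sketched in code. This is a minimal illustration, not part of the 5S material itself: the class name `Structure`, its field names, and the tiny catalog hierarchy are all hypothetical, and the labeling function F is assumed to map vertices or edges to labels.

```python
from dataclasses import dataclass


@dataclass
class Structure:
    """A structure (G, L, F): directed graph G = (V, E), label set L, labeling F."""
    vertices: frozenset   # V
    edges: frozenset      # E, a set of (source, target) pairs
    labels: frozenset     # L
    labeling: dict        # F, assumed here to map vertices or edges to labels

    def is_well_formed(self) -> bool:
        # Every edge must join known vertices.
        edges_ok = all(u in self.vertices and v in self.vertices
                       for u, v in self.edges)
        # F may only label known vertices/edges, and only with labels from L.
        domain = self.vertices | self.edges
        labeling_ok = all(key in domain and label in self.labels
                          for key, label in self.labeling.items())
        return edges_ok and labeling_ok


# A stream as a finite sequence over a non-empty set (here, characters).
text_stream = list("digital library")

# Hypothetical example: a three-level catalog hierarchy.
catalog = Structure(
    vertices=frozenset({"collection", "book", "chapter"}),
    edges=frozenset({("collection", "book"), ("book", "chapter")}),
    labels=frozenset({"contains", "root"}),
    labeling={
        ("collection", "book"): "contains",
        ("book", "chapter"): "contains",
        "collection": "root",
    },
)
```

Calling `catalog.is_well_formed()` checks that the edges and labels are consistent with V and L; an edge to an unknown vertex makes the check fail.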
Definitions, cont’d • Definition: A space is a measurable space, measure space, probability space, vector space, topological space, or metric space – A vector space is a representation for the set of elements in a collection. The vector representing each element encodes the characteristics of that element; it both connects the element to similar elements and distinguishes it from different ones. – We will do an exercise to illustrate.
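As a rough illustration of the vector space idea, the sketch below represents each item by a vector of characteristics and compares items with cosine similarity. The choice of cosine similarity, the item names, and the term counts are assumptions made for this example; the slides do not prescribe a particular measure.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(name, collection):
    """Return the key of the item closest to `name` in the collection."""
    target = collection[name]
    others = (key for key in collection if key != name)
    return max(others, key=lambda key: cosine_similarity(target, collection[key]))


# Hypothetical items; each vector counts the terms ["library", "map", "copyright"].
items = {
    "dl_intro": [3, 0, 1],
    "map_catalog": [1, 4, 0],
    "dl_survey": [2, 0, 1],
}
```

Here `most_similar("dl_intro", items)` picks `"dl_survey"`, because their vectors point in nearly the same direction, while `"map_catalog"` is dominated by a different characteristic.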
Definitions - 3 • Definition: A scenario is a sequence of related transition events (e_1, e_2, …, e_n) on state set S such that e_k = (s_k, s_{k+1}) for 1 <= k <= n. – More easily visualized, a scenario is a path in a directed graph, G = (S, Σ_e), where vertices correspond to states in the state set S and directed edges are equivalent to events in a set of events, Σ_e, and correspond to transitions between states. – Scenarios must be implemented to make a working system.
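The path view of a scenario can be checked mechanically. The sketch below assumes each event is a (state, next_state) pair and that the allowed events form the edge set of the graph; the state names for a search service are hypothetical.

```python
def is_scenario(events, allowed_events):
    """True if `events` is a non-empty path through the graph of allowed events.

    Each event is a (state, next_state) pair; consecutive events must chain,
    i.e. the end state of one event is the start state of the next.
    """
    if not events:
        return False
    if any(event not in allowed_events for event in events):
        return False
    return all(events[k][1] == events[k + 1][0] for k in range(len(events) - 1))


# Hypothetical search service: states are strings, events are edges.
allowed = {
    ("idle", "searching"),
    ("searching", "results"),
    ("results", "viewing"),
    ("results", "searching"),
}

search_and_view = [("idle", "searching"), ("searching", "results"), ("results", "viewing")]
broken_path = [("idle", "searching"), ("results", "viewing")]
```

`is_scenario(search_and_view, allowed)` holds, while `broken_path` fails because its second event does not start where the first one ended.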
Definitions - 4 • Definition: A society is a tuple (C, R) where – C = {c_1, c_2, …, c_n} is a set of conceptual communities, each community referring to a set of individuals of the same class or type (e.g. actors, activities, components, hardware, software, data); – R = {r_1, r_2, …, r_m} is a set of relationships, each relationship being a tuple r_j = (e_j, i_j) where e_j is a Cartesian product c_{k_1} × c_{k_2} × … × c_{k_{n_j}}, 1 <= k_1 < k_2 < … < k_{n_j} <= n, which specifies the communities involved in the relationship, and i_j is an activity.
The Digital Library Content • Essential elements for a digital library – Users – Content – Services
Content - requirements • Store – Organize – Describe • Find • Deliver
Describing the content • How to describe content – Metadata • Machine-readable description of anything • What description – Machine-readable requires standard descriptive elements • Dublin Core (http://dublincore.org/) – International standard – “a standard for cross-domain information resource description.” – 15 descriptive elements • Other metadata schemes – IEEE-LOM
Metadata • What does metadata look like? • Metadata is data about data – Information about a resource, encoded in the resource or associated with the resource. • The language of metadata: XML – eXtensible Markup Language
Google Books Project • Michael A. Keller, Closing Keynote – Ida M. Green University Librarian at Stanford, Director of Academic Information Resources, Publisher of HighWire Press, and Publisher of the Stanford University Press: • "One good turn deserves another; how the Google Book Search project is benefiting everyone".
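To give a feel for what XML metadata looks like, the sketch below builds a small Dublin Core record with Python's standard `xml.etree.ElementTree`. The namespace URI is the Dublin Core element set namespace; the wrapper element name `record`, the helper `build_record`, and the `type` value chosen for the map are assumptions made for illustration. The title is the one used for the map later in these slides.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_record(fields):
    """Build an XML element holding one dc:* child per (name, value) pair."""
    record = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields.items():
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return record


record = build_record({
    "title": "Novi Belgii Novæque Angliæ: nec non partis Virginiæ "
             "tabula multis in locis emendata",
    "type": "StillImage",  # illustrative value, not taken from the source record
})

# Serialize to a string for storage or exchange.
xml_text = ET.tostring(record, encoding="unicode")
```

Printing `xml_text` shows one `dc:title` and one `dc:type` element inside the wrapper, which is the kind of machine-readable description a harvester or search service can parse.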
Google Books demo • Full text - Life of Miguel de Cervantes • Limited Preview - The Life of Miguel de Cervantes Saavedra • Snippet View - "Discreción" in the Works of Cervantes: A Semantic Study
What has been accomplished • As of September 2006 • Nearly 30,000 Stanford books digitized – ~1M books from all partner libraries • Over 4,000 books identified as needing preservation treatment (& so not digitized) • A great debate about copyright has started – Orphan works – What can an archive do to provide access – Defense of fair use underway This slide is taken from the presentation by Michael A. Keller at ECDL 2006
Original Principles • If legally possible, digitize every book (9M volumes) in the Stanford libraries – Now digitizing with imprint dates up to 1963 • Partner libraries (*added recently) – University of Michigan (similar to Stanford) – Harvard (public domain (?), maybe > 1M) – NYPL (public domain, unusual collections) – Oxford - Bodleian (earlier than 1885, ~1M titles) – University of California (similar to Stanford, >6M) – (more to follow) This slide is taken from the presentation by Michael A. Keller at ECDL 2006
Purposes • Digital preservation – Virtual Bookshelves under construction as part of the Stanford Digital Repository – For Stanford use only • Other searching and research functions – Subtle searching (as in Socrates & HighWire) – Taxonomic (LCSH & HighWire) & Associative Searching (Takano) – Citation linking (HighWire & “InforTools” (Ebrary)) – Better navigation (through visualization?) (Grokker) • Digitized books from all sources as test bed for new research; combine with articles, datasets, etc. for data mining & other transformative uses. This slide is taken from the presentation by Michael A. Keller at ECDL 2006
Some Conclusions • Google Book Search – Is an indexing, not a publishing project – Offers substantial increases in access to contents of books in library collections by keyword searching – Offers publishers global marketing of their publications – Offers several useful services to readers • Offers participating libraries – Digital copies of books on their shelves for preservation – New possibilities for services to local readers – New possibilities for research for local faculty & students This slide is taken from the presentation by Michael A. Keller at ECDL 2006
Google statement • “Many of the books in Google Book Search come from authors and publishers who participate in our Partner Program. For these books, our partners decide how much of the book is browsable -- anywhere from a few sample pages to the whole book. • For books that enter Book Search through the Library Project, what you see depends on the book's copyright status. We respect copyright law and the tremendous creative effort authors put into their work. If the book is in the public domain and therefore out of copyright, you can page through the entire book and even download it and read it offline. But if the book is under copyright, and the publisher or author is not part of the Partner Program, we only show basic information about the book, similar to a card catalog, and, in some cases, a few snippets -- sentences of your search terms in context. The aim of Google Book Search is to help you discover books and learn where to buy or borrow them, not read them online from start to finish. It's like going to a bookstore and browsing - with a Google twist.” http://books.google.com/support/bin/answer.py?answer=43729&topic=9259&hl=en
Other projects • Open Content Alliance (Yahoo and the Internet Archive) • The Internet Archive www.archive.org • The European Digital Library (Growing number of countries) • others Comments? Discussion?
A DL example • Library of Congress American Memory project – http://memory.loc.gov/ammem/index.html – “American Memory provides free and open access through the Internet to written and spoken words, sound recordings, still and moving images, prints, maps, and sheet music that document the American experience. It is a digital record of American history and creativity. These materials, from the collections of the Library of Congress and other institutions, chronicle historical events, people, places, and ideas that continue to shape America, serving the public as a resource for education and lifelong learning.”
Dublin Core for a map • Map found in the LOC American Memory collection – Map at http://memory.loc.gov/ammem/gmdhtml/gmdhome.html • Dublin Core metadata illustration found at http://webapp.slis.ua.edu/smmweb/DLib/Metadata/OrganizingInternetResources_files/v3_document.htm – Part of a DL course at U. of Alabama
Go to the web site to explore what is there, including copyright information, title, history, etc.
Dublin Core: Title • Name given, usually by the creator or publisher <META name="DC.Title" content="Novi Belgii Novæque Angliæ: nec non partis Virginiæ tabula multis in locis emendata" lang="la"> Source: webapp.slis.ua.edu/smmweb/DLib/Metadata/OrganizingInternetResources_files/v3_document.htm
Dublin Core: Subject • What the work is about, possibly keywords, terms from classification scheme if available. Source: webapp.slis.ua.edu/smmweb/DLib/Metadata/OrganizingInternetResources_files/v3_document.htm
Dublin Core: Description • Free text description, abstract, etc. Source: webapp.slis.ua.edu/smmweb/DLib/Metadata/OrganizingInternetResources_files/v3_document.htm
Dublin Core: Source • Is this object derived from another? Is this map a part of a larger map? Is this text a variation or revision of another piece of text?
Dublin Core: Language • Language of the content of the resource • For the map, there is no language content Source: webapp.slis.ua.edu/smmweb/DLib/Metadata/OrganizingInternetResources_files/v3_document.htm
Dublin Core: Relation • To what other object(s) or collection is this object related? Does it also exist in another collection? Is it derived from another document or image? How is it related? Source: webapp.slis.ua.edu/smmweb/DLib/Metadata/OrganizingInternetResources_files/v3_document.htm
Dublin Core: Creator • Person or organization responsible for the Intellectual Content of this object Source: webapp.slis.ua.edu/smmweb/DLib/Metadata/OrganizingInternetResources_files/v3_document.htm
Dublin Core: Publisher • Entity responsible for making the resource available in its present form • Not shown in the example, but should be something like this: Source: webapp.slis.ua.edu/smmweb/DLib/Metadata/OrganizingInternetResources_files/v3_document.htm
Dublin Core: Contributor • Any entity making a contribution to this object. • Example: someone who added some information to the original document or image • No entry for this map.
Dublin Core: Rights • A pointer to a copyright notice, a rights management statement, or a rights server.
Dublin Core: Date • Date on which this object was made available in its present form, possibly the date it was entered into this digital collection. Source: webapp.slis.ua.edu/smmweb/DLib/Metadata/OrganizingInternetResources_files/v3_document.htm
Dublin Core: Type or Category • What sort of thing is this? Some examples: home page, novel, poem, working paper, technical report, essay, dictionary, … • Type should be selected from a controlled list. For example, see the DCMI Type Vocabulary: • http://dublincore.org/documents/2006/08/28/dcmi-type-vocabulary/ • Why is this recommended as a controlled vocabulary field?
DCMI Type Vocabulary • Collection • Dataset • Event • Image • InteractiveResource • MovingImage • PhysicalObject • Service • Software • Sound • StillImage • Text • See the official page for explanations of the categories. Note that Image is a broad category and MovingImage and StillImage are more restricted subcategories.
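One practical reason for a controlled list is that software can validate and match values exactly. The sketch below uses the twelve terms listed above; the function names `validate_type` and `matches`, and the idea of expanding a query for Image to its two subcategories, are illustrative assumptions rather than part of the DCMI specification.

```python
# The twelve terms of the DCMI Type Vocabulary as listed on the slide.
DCMI_TYPES = frozenset({
    "Collection", "Dataset", "Event", "Image", "InteractiveResource",
    "MovingImage", "PhysicalObject", "Service", "Software", "Sound",
    "StillImage", "Text",
})

# Image is the broad category; these two terms are its narrower subcategories.
BROADER = {"MovingImage": "Image", "StillImage": "Image"}


def validate_type(value):
    """Return the value if it is a vocabulary term, otherwise raise ValueError."""
    if value not in DCMI_TYPES:
        raise ValueError(f"{value!r} is not a DCMI Type Vocabulary term")
    return value


def matches(record_type, query_type):
    """True if a record of `record_type` should be found by a query for `query_type`."""
    return record_type == query_type or BROADER.get(record_type) == query_type
```

With free text, "map", "Map", and "still image" would all be distinct values; with the controlled list, `validate_type("still image")` is rejected at entry time and a search for `Image` can reliably include `StillImage` records.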
Dublin Core: Type • Category of this resource Source: webapp.slis.ua.edu/smmweb/DLib/Metadata/OrganizingInternetResources_files/v3_document.htm
Dublin Core: Format • The way the content is encoded. This tells what resource is needed to access this content. http://www.graphcomp.com/info/specs/mime.html
Dublin Core: Unique ID • The key for this object in the collection. • I cannot find one for the map we are looking at, but the ID for the map of which it is a part is g3715 ct000001 • The Metadata specification for that would be Source: http://memory.loc.gov/cgi-bin/query/r?ammem/gmd:@filreq(@field(NUMBER+@band(g3715+ct000001))[email protected](COLLID+dsxpmap))
Dublin Core: Coverage • The time, space or other measurement of the scope or completeness of the object. • No coverage entry specified, but might be this: • Would a controlled vocabulary be better?
International Consensus • Recognition of International Scope of Resource Discovery on Web • 17 Countries Currently Involved in DC Working Groups • 50+ Implementation Projects in 10 Countries Source: webapp.slis.ua.edu/smmweb/DLib/Metadata/OrganizingInternetResources_files/v3_document.htm
Guide to Good Practice • The NINCH Guide to Good Practice in the Digital Representation and Management of Cultural Heritage Materials • http://www.nyu.edu/its/humanities/ninchguide/index.html
Legal and Technical Issues • Legal: When may a resource be digitized and made available? What requirements exist for controlling access? • Technical: How do we control access to a resource that is stored online? – Policies – Encoding – Distribution limitations
| Date of work | Protected from | Term |
| --- | --- | --- |
| Created 1-1-78 or after | When work is fixed in tangible medium of expression | Life + 70 years (or if work of corporate authorship, the shorter of 95 years from publication, or 120 years from creation) |
| Published before 1923 | In public domain | None |
| Published 1923-63 | When published with notice | 28 years + could be renewed for 47 years, now extended by 20 years for a total renewal of 67 years. If not so renewed, now in public domain |
| Published from 1964-77 | When published with notice | 28 years for first term; now automatic extension of 67 years for second term |
| Created before 1-1-78 but not published | 1-1-78, the effective date of the 1976 Act which eliminated common law copyright | Life + 70 years or 12-31-2002, whichever is greater |
| Created before 1-1-78 but published between then and 12-31-2002 | 1-1-78, the effective date of the 1976 Act which eliminated common law copyright | Life + 70 years or 12-31-2047, whichever is greater |

Chart created by Lolly Gasaway. Updates at http://www.unc.edu/~unclng/public-d.htm
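The arithmetic in the three "published" rows of the chart can be written out as a small function. This sketch encodes only those rows as stated above (28 years plus a 67-year renewal, so 95 years in total); the function name and the `renewed` parameter are hypothetical, and rows that depend on the author's life or on corporate authorship are deliberately left out.

```python
def last_protected_year(publication_year, renewed=True):
    """Last calendar year of protection per the chart, or None if public domain.

    Covers only works published with notice before 1978, as in the chart rows.
    """
    if publication_year < 1923:
        return None  # chart: published before 1923 is in the public domain
    if 1923 <= publication_year <= 1963:
        if not renewed:
            return None  # chart: if not so renewed, now in public domain
        return publication_year + 28 + 67
    if 1964 <= publication_year <= 1977:
        return publication_year + 28 + 67  # second term is automatic
    raise ValueError("rows for 1978 and later depend on life of the author "
                     "or corporate authorship and are not modeled here")
```

For example, a renewed work published in 1950 comes out as protected through 1950 + 95 = 2045, while the same work without renewal is already in the public domain.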
Works for hire • Usual case -- works created by faculty are not the property of the university. – Faculty surrender copyright to publishers of journals and books – Some publishers allow faculty to retain copyright, giving the publisher specific limited rights to reproduce and distribute the work.
Fair use • No clear, easy answers. • Checksheet provided in the article is a good guide to the issues. • Link to the checksheet: http://www.copyright.iupui.edu/checklist.htm
Moral rights • Fair to the creator – Keep the identity of the creator of the work – Do not cut the work – Generally, be considerate of the person (or institution) that created the work.
Getting Permission • With the best will in the world, getting the appropriate permissions is not always easy. – Identify who holds the rights – Get in touch with the rights holder – Get a suitable agreement to cover the needs of your use. • Useful links: http://www.loc.gov/copyright/ http://www.utsystem.edu/OGC/IntellectualProperty/PERMISSN.HTM – Connections to various ways to discover and contact the rights holder of a work.
Checking copyright status Source: NINCH Guide to Good Practice. Chapter 4: Rights Management
Considering people depicted in the work Source: NINCH Guide to Good Practice. Chapter 4: Rights Management Copyright: Lauryn G. Grant
Technical issues • Link the resource to the copyright statements • Maintain that link when the resource is copied or used • Approaches: – Steganography – Encryption – Digital Wrappers – Digital Watermarks
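A very simple form of the digital wrapper approach can be sketched as follows: the resource and its rights statement travel together in one package, with a digest over both so that a copy can be checked for separation or corruption. The function names and the JSON layout are assumptions for illustration. A plain digest does not stop deliberate tampering, since anyone can recompute it; that would need a keyed signature.

```python
import hashlib
import json


def _digest(content: bytes, rights: str) -> str:
    # Hash the rights statement and the content together, so changing
    # either one invalidates the stored digest.
    return hashlib.sha256(rights.encode("utf-8") + b"\x00" + content).hexdigest()


def wrap(content: bytes, rights: str) -> str:
    """Bundle a resource with its rights statement as a JSON string."""
    package = {
        "rights": rights,
        "content": content.hex(),
        "sha256": _digest(content, rights),
    }
    return json.dumps(package)


def unwrap(package_text: str):
    """Return (content, rights) if the package is intact, else raise ValueError."""
    package = json.loads(package_text)
    content = bytes.fromhex(package["content"])
    if _digest(content, package["rights"]) != package["sha256"]:
        raise ValueError("rights statement or content does not match the digest")
    return content, package["rights"]
```

Because the wrapper is a single string, copying the resource copies the rights statement with it, which is the property the slide asks for.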
Issues in Encryption • General cases for protection of controlled content: Concern for passive listening, active interference. – Listening: intruder gains information, may not be detected. Effects indirect. – Active interference • Intruder may prevent delivery of the message to the intended recipient. • Intruder may substitute a fake message for the intended one • Effects are direct and immediate • Less likely in the case of digital library content
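The substitution case under active interference can be illustrated with a message authentication code: sender and recipient share a secret key, and the recipient rejects any message whose tag does not match. This sketch uses Python's standard `hmac` and `hashlib` modules; the key and messages are made up for the example. It addresses only substitution, since protecting against passive listening would require encrypting the message, and preventing delivery cannot be fixed by cryptography alone.

```python
import hashlib
import hmac


def make_tag(key: bytes, message: bytes) -> str:
    """Compute an HMAC-SHA256 tag for the message under the shared key."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def is_authentic(key: bytes, message: bytes, tag: str) -> bool:
    """Check the tag in constant time to avoid leaking where it differs."""
    return hmac.compare_digest(make_tag(key, message), tag)


# Hypothetical shared key and messages.
shared_key = b"example-shared-secret"
original = b"license: single-user access to item 42"
tag = make_tag(shared_key, original)

# An intruder without the key substitutes a different message but reuses the tag.
substituted = b"license: unlimited access to item 42"
```

`is_authentic(shared_key, original, tag)` succeeds, while the substituted message fails the check because the intruder cannot produce a valid tag without the key.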