
  • Number of slides: 44

HATHI TRUST: A Shared Digital Repository
Digital Preservation, HathiTrust, and the Reimagination of the Library Landscape
Jeremy York, Iceland, August 5, 2010

Outline
• Digital Preservation in U.S.
• HathiTrust
  – About HathiTrust
  – Content
  – What we do (services)
  – Governance
  – Partnership & Resources
• Google Settlement
• Publishing
• Changing Library Landscape

Books and Journals
• Portico: Centralized; Journals; Source files, mainly focused on XML, highly controlled transformation
• LOCKSS: Distributed; Journals; Web files, not source images or XML
• HathiTrust: Centralized; Books and Journals; Master image and OCR files

Archives
• Internet Archive: Centralized; Web files
• MetaArchive (NDIIPP): Distributed; Private LOCKSS Network; Web files
• International Internet Preservation Consortium: Distributed; Harvesting tools, Access, Preservation strategies

Data
• ICPSR: Centralized; Social science data
• Data-PASS (NDIIPP): Distributed; Social science data
• GeoMAPP (NDIIPP): Distributed; Geospatial data; State governments

OCLC – Digital Archive: Centralized; Master files, web archiving; CONTENTdm, custom repository

LOCKSS, DuraCloud, DSpace, Fedora

NDIIPP
Mission: Develop a national strategy to collect, preserve and make available significant digital content, especially information that is created in digital form only, for current and future generations.
• Since 2000
• Broad collaborations with institutions and organizations (e.g., OCLC, Portico)
• Funding (Establishing a network, Preserving Creative America, Preserving State Government Information)
• Standards/Best Practices
• Tools
  o JHOVE2 (validation)
  o Chronopolis (data grid framework)
  o Dataverse (management, dissemination, exchange, and citation of virtual collections (dataverses) of quantitative data)
  o BagIt (transfer utilities: creation, manipulation and validation of bags)
  o Hub and Spoke (repository interoperability)
  o FITS (bundle of identification, validation and metadata extraction tools)

About

HathiTrust Digital Library
• Digital Repository
  – Initial focus on digitized book and journal content
  – “Light” archive
• Collections and Collaboration
  – Comprehensive collection
  – Shared strategies
  – Local services
  – Public Good

Current Partners
– Columbia University
– New York Public Library
– University of California system
– CIC (Committee on Institutional Cooperation)
  • University of Chicago
  • University of Illinois
  • Indiana University
  • University of Iowa
  • University of Michigan
  • Michigan State University
  • University of Minnesota
  • Northwestern University
  • Ohio State University
  • Pennsylvania State University
  • Purdue University
  • University of Wisconsin-Madison
– University of Virginia
– Yale University

Content Distribution
6,383,209 – Total
1,234,088 – Public Domain
* As of August 5, 2010

Language Distribution (1)
* As of July 25, 2010

Language Distribution (2)
The next 40 languages make up ~13% of total
* As of July 25, 2010

Dates
* As of July 25, 2010

Originating Institution
* As of July 25, 2010

Content over time
* As of July 25, 2010

Content Growth

What we do

Services (1)
• Ingest
  – Google, Internet Archive
  – Working toward sustainable model for ingest of content from diverse sources
• Long-term preservation
  – Bit-level, migration
  – Standard and open formats (ITU G4 TIFF, JPEG 2000, JPG, Unicode)
  – OAIS, TRAC
  – Validation, integrity, redundancy
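
The "validation, integrity" item can be illustrated with a fixity check: record a checksum when a file is ingested, then recompute it later to detect bit-level change. This is only a minimal sketch under stated assumptions, not a description of HathiTrust's actual procedure; the choice of SHA-256 and the names `file_digest`, `verify_fixity` and `demo` are hypothetical.

```python
import hashlib
import os
import tempfile


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path, expected_digest):
    """True if the file still matches the digest recorded at ingest."""
    return file_digest(path) == expected_digest


def demo():
    """Write a file, record its digest, then simulate silent corruption."""
    with tempfile.TemporaryDirectory() as workdir:
        page = os.path.join(workdir, "00000001.txt")  # hypothetical file name
        with open(page, "wb") as handle:
            handle.write(b"OCR text for page 1")
        recorded = file_digest(page)
        intact = verify_fixity(page, recorded)
        with open(page, "ab") as handle:
            handle.write(b"\x00")  # one appended byte stands in for bit rot
        damaged = verify_fixity(page, recorded)
        return intact, damaged
```

Calling `demo()` checks the file once before and once after the simulated damage. In a redundant setup, a failed check on one copy would be the trigger to restore from another copy.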

Services (2)
• Preservation…with Access
• Brings concerns of research libraries to bear on the way the scholarly record is cared for and made available
  – Scholarly Resource
  – Bibliographic Search
  – Full text search
  – Collections
  – Full PDF download of public domain

Services (4)
• Rights Management
  – Rights Database
  – Copyright review
    • US 1923–1963
    • 188k candidates, 85k reviewed
    • 60% in public domain
• Data Distribution
  – Metadata files, Bib API, Data API
• Print on Demand

Services (5)
• Community Development Environment
• Non-Google Ingest
• Non-Book/Non-Journal Ingest
• Computational Research

Outlook
• Leverage partner resources and input to create and maintain the library of the future
• This is our library
• The more we use it, the better it will become

Governance

Governance
Budget/Finances; Decision making; Strategic Advisory Board; Executive Committee; HathiTrust; Guidance on Policy, Planning

Partnership & Resources

Funding
• Funded for an initial 5 years with base funding from partners
• 3-year review of governance and sustainability
• Budget – separately held within UMich budget system
• Cost Models
  – Per-GB cost of storage per year, with a one-time fee on new content to build a capital fund
  – Volume overlap

Cost Model 1
Reasonable costs of sustaining the archive, including cost of replacement and a capital fund

Cost Model 1
• Economies of scale keep costs low
  – $0.145/volume/year for Google digitized
  – about $0.45/volume/year for IA digitized
• Advantages not fully known until you jump in
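
As a rough illustration of what these per-volume rates imply, the sketch below estimates a yearly bill for a mix of Google-digitized and Internet Archive-digitized volumes. It assumes cost scales linearly with volume count, which the slide does not state, and the function name `annual_storage_cost` is hypothetical. `Decimal` is used so the quoted rates are not distorted by binary floating point.

```python
from decimal import Decimal

# Per-volume annual rates quoted on the slide; the Internet Archive
# figure is approximate ("about $0.45").
GOOGLE_RATE = Decimal("0.145")
IA_RATE = Decimal("0.45")


def annual_storage_cost(google_volumes, ia_volumes):
    """Estimated yearly cost in dollars, assuming simple linear scaling."""
    return GOOGLE_RATE * google_volumes + IA_RATE * ia_volumes
```

For example, `annual_storage_cost(1000, 100)` combines 1,000 Google-digitized volumes at $0.145 each with 100 IA-digitized volumes at about $0.45 each.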

Cost Model 2
• Shared space to deal with shared problems
  – Use HathiTrust as part of broader library strategies
• Beginning to see benefits of aggregating this body of materials together
  – Overlap, collection development
  – Coordinated print management
  – Begin to ask “What is missing?”

Cost Model 2
For public domain volumes: (PD*X*C)/N
For a given in-copyright volume: IC = (C*X)/H
• Share in costs of curation
• Share in uses of relevant materials
• Voice in future directions
• Free riders?
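
The two formulas can be written as small functions. The slide does not define its symbols, so the reading used here is an assumption: PD is the number of public domain volumes, C a per-volume cost, X a multiplier applied to that cost, N the number of partners sharing public domain costs, and H the number of partners holding a given in-copyright volume. The function names are hypothetical.

```python
def public_domain_share(pd_volumes, cost_per_volume, multiplier, partners):
    """(PD * X * C) / N: one partner's share of public domain costs.

    Symbol meanings are assumed, not taken from the slide.
    """
    return (pd_volumes * multiplier * cost_per_volume) / partners


def in_copyright_share(cost_per_volume, multiplier, holding_partners):
    """IC = (C * X) / H: one holder's share for a single in-copyright volume.

    Symbol meanings are assumed, not taken from the slide.
    """
    return (cost_per_volume * multiplier) / holding_partners
```

Under this reading, the cost of a widely held in-copyright volume falls for each holder as H grows, while public domain costs are spread evenly across all N partners.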

Staff
• Staff/Expertise – highly integrated
  – Project managers, IT and communications staff, copyright experts, administrators (UM, Indiana and UC taking the lead)
• Working groups
• Shared development space

HathiTrust Functional Framework
• Enterprise Management: Governance; Communication and Coordination with partner institutions; Budget, Finances; Decision making; Project management; Policy; Planning
• Repository Administration; Hardware configuration and maintenance; Data management (content storage, backup, integrity checks, deletion); Rights Management; Bibliographic Data Management; Security; Hardware selection and replacement; Content and Metadata specifications; Permissions; Entity description (record level); Copyright review; Web and application server configuration and maintenance; Copyright determination; Object identification (item level); Copyright information management (database); Data availability
• Collection Development
  – Digital: Expansion beyond books and journals (born digital, images and maps, audio); Selection of content (for non-Google volume ingest and pilot projects)
  – Print: Cloud Library (effect of digital on print)
• Rightsholder permissions; Disaster Recovery; Logging; e-Commerce
• Content Ingest; Content Access; Processes for ensuring content integrity; Quality Assurance; User Services; Financial contributions of partners; Transformation; PageTurner; Quality Review; Usability; Validation; Print on Demand; Collection Builder; Content Certification; User support (helpdesk); Large scale Search; Research Center; Bibliographic Catalog; APIs
• Outreach: Project website; Monthly newsletter; Papers and presentations; Communication with potential partners; Surveys, general inquiries
• Repository evaluation and audit (e.g., DRAMBORA, TRAC)
• Legal: Risk management (use of materials); Partner agreements; Advocacy

Working Groups
Current
• Quality
• Discovery Interface (with OCLC)
• Collections
• Communication
• Usability
Past
• Storage
• Research Center

Google Settlement (1)
• 2005: Authors Guild, AAP sued; Google claimed fair use
• Settlement – 2008
• Amended – Nov 2009
• Works covered – registered with U.S. Copyright Office, Canada, UK, Australia
• Works not covered – public domain, published after 5 Jan 2009

Google Settlement (2)
• Google continues scanning
• In-copyright, non-commercially available, out-of-print work
  – Sell individual access, any book retailer; 63% of revenue to rights holders, distributed by BRR
  – Display up to 20%
  – Copy & paste and printing
  – Rights holders can open access, distribute under CC, set printing limits
  – Institutional subscription (available to libraries, fee based on FTE users)
• Includes unclaimed works
  – BRR required to search for rights holders and hold revenue on their behalf
• Public access terminals
• Cash payments to Rightsholders whose works were scanned before May 5, 2009

Book Rights Registry
• Book Rights Registry
  – Represent the interests of the Rightsholders
    • Equal representation of Author and Publisher sub-classes on board; one author and publisher representative from US, UK, Canada, Australia; court-appointed representative for rights holders of unclaimed works
  – Establish and maintain a database of contact information for authors and publishers;
  – Use commercially reasonable efforts to locate Rightsholders;
  – Distribute payments received from Google for the Rightsholders’ share of revenues; and
  – Assist in the resolution of disputes between Rightsholders.
  – Funded by Google (initial 34.5 million, ongoing percentage of revenues)
http://www.googlebooksettlement.com/help/bin/answer.py?hl=en&answer=118704

Settlement for HathiTrust
• Complementary
  – Settlement provides access to covered works, HathiTrust is preservation, trust for the future
  – Research Center (75% of Google Book Search scanned from HathiTrust partner libraries)
• Specifically sanctions
  – Section 108 uses, access for users with print disabilities, computational research
• Does not allow
  – Fair use, sale of access, interlibrary loan, e-reserves, use in course management systems

Publishing
• Libraries would like to buy more eBooks
• Cost is high
• Not good models for consortia (multiple users)
• Move to on-demand purchase, leasing of volumes
• Do we need to own it?

Changing Library Landscape
• Leverage collective resources, expertise
  – Drive costs down
  – Increase discoverability, use
  – Improve strength of archiving
  – Reduce redundancy of collections (digital and print), effort
  – Address collective challenges
• Focus on local resources and services
• Redefine who we are, what we provide
  – Collections, research

Thank you!