


Anatomy of Aggregate Collections: Exploring Mass Digitization and the “Collective Collection”
Brian Lavoie, Research Scientist, OCLC Research
NELINET, September 21, 2006

Road map
§ Aggregate collections as a tool for understanding mass digitization projects
• “Anatomy of aggregate collections: the example of Google Print for Libraries” (D-Lib, September 2005)
§ Digital preservation and mass digitization
§ Conclusion

The shrinking “width of the border”
[Figure: Collection A and Collection B, separated by a border measured with distance metrics: Physical, Technical, Economic]

Aggregate collections
§ Definition: combined holdings of multiple institutions, viewed as a single collection
• 2 institutions, consortium, all libraries everywhere …
• WorldCat: aggregate collection of more than 70 million items, held by more than 25,000 institutions worldwide
§ Libraries embedded more deeply in networks of collaboration and coordination
• Decisions increasingly taken in context of inter-institutional environments, rather than local collection in isolation
• Shift in focus to resources of the “system”, rather than individual collections
§ As library networks develop and expand, opportunities arise to create value through collective action, or by aligning local collections with aspects of the system-wide environment
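The definition above can be made concrete with a small sketch. The library names and book identifiers below are invented for illustration; they are not WorldCat data. The sketch treats each institution's holdings as a set and pools them into one aggregate collection, counting how many institutions hold each book.

```python
from collections import Counter

# Hypothetical holdings: each library maps to the set of book identifiers it holds.
holdings = {
    "Library A": {"b1", "b2", "b3"},
    "Library B": {"b2", "b3", "b4"},
    "Library C": {"b3", "b5"},
}

def aggregate(holdings_by_library):
    """Return a Counter mapping each distinct book to the number of holding libraries."""
    counts = Counter()
    for books in holdings_by_library.values():
        counts.update(books)
    return counts

counts = aggregate(holdings)
total_holdings = sum(counts.values())   # every copy held anywhere in the group
unique_books = len(counts)              # distinct books in the aggregate collection
held_by_one = sorted(b for b, n in counts.items() if n == 1)  # uniquely held books

print(total_holdings, unique_books, held_by_one)
```

The gap between total holdings and unique books is what the later slides describe as overlap or redundancy.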

Anatomy of aggregate collections
§ Analysis of aggregate collections supports …
• Collaborative decision-making: direct collaboration by libraries (for example, collaborative storage strategies)
• “Decision-making in context”: local decision-making made in a larger context (for example, selecting print materials for digitization, given what has already been digitized elsewhere)
§ Better understanding of the anatomy of aggregate collections is critical for a wide range of library decision-making contexts:
• Collection management (cooperative collection development, shared off-site storage, collaborative preservation)
• Deeper resource sharing (meta-search, reducing frictions in resource sharing networks)
• Mass digitization
§ OCLC Research activities aimed at mobilizing library data (WorldCat) to understand and manage aggregate collections

Mass digitization and aggregate collections
Google Book Search (aka Google Print for Libraries)
Aggregate collection of digitized print books (combined holdings of Harvard, Michigan, Oxford, NYPL, and Stanford)
Focus on copyright issues; very little discussion of Google Book Search as aggregate collection
http://www.dlib.org/dlib/september05/lavoie/09lavoie.html

The system-wide print book collection as represented in WorldCat (January 2005)
[Figure: values shown are ~55 million, ~32 million print books, ~41 million, ~35 million]
More information: Schonfeld & Lavoie, “Books without Boundaries: A Brief Tour of the System-wide Print Book Collection”, Journal of Electronic Publishing, Vol. 9, No. 2, Summer 2006
http://www.hti.umich.edu/cgi/t/text-idx?c=jep;cc=jep;view=text;rgn=main;idno=3336451.0009.208

G5 coverage of system-wide print book collection
10.5 million unique books

Holdings overlap
Potential redundancy rate of 40 percent

Language distribution
Languages, in table order: English, German, French, Spanish, Chinese, Russian, Italian, Japanese, Hebrew, Arabic, Portuguese, Polish, Dutch, Latin, Korean, Swedish, All others
Google 5 column (values recovered): 0.49, 0.10, 0.08, 0.05, 0.04, 0.03, 0.02, 0.01, 0.07
System-wide column (values recovered): 0.52, 0.08, 0.06, 0.04, 0.03, 0.03, 0.04, 0.01, 0.01, < 0.01, 0.08
More than 430 languages in Google 5 collection

Cumulative age distribution of G5 holdings
> 80 percent of Google 5 collection still in copyright

Works
Coverage slightly higher (35%)
Holdings overlap slightly greater (56% held uniquely)

Some speculation …
§ What results would have been obtained if a different group of libraries had been selected?
§ What incremental extensions to coverage can be obtained by adding additional library collections to original Google 5?
§ Chose 5 new libraries:
• Small US liberal arts college
• Large US public university
• Large US private university
• Large US metropolitan library
• Large Canadian university

Beyond the Google 5 …
§ “New” Google 5
• Total holdings: ~8 million
• Total unique books: 5.9 million
• % of system-wide: 18 percent
• Redundant holdings: 26 percent
§ “Original” Google 5
• Total holdings: ~18 million
• Total unique books: 10.5 million
• % of system-wide: 33 percent
• Redundant holdings: 42 percent
§ Impact by library type (% of holdings unique relative to original G5 collection):
• Large US metropolitan library: 39 percent (most unlike G5)
• Large US private university: 25 percent
• Large Canadian university: 23 percent
• Large US public university: 21 percent
• Small US liberal arts college: 13 percent (most like G5)
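The percentages on this slide can be approximately reproduced from the holdings and unique-book counts. The sketch below is an illustration, not the study's method. It assumes that redundancy means the share of holdings beyond the first copy of each book, and that coverage is measured against the ~32 million print books reported for WorldCat in January 2005. The function names are hypothetical.

```python
# Figures in millions, as reported on the slides. The formulas are assumptions
# chosen because they reproduce the reported percentages.
SYSTEM_WIDE_PRINT_BOOKS = 32.0  # ~32 million print books in WorldCat (January 2005)

def redundancy_rate(total_holdings, unique_books):
    """Share of holdings that duplicate a book already counted once."""
    return 1 - unique_books / total_holdings

def coverage(unique_books, system_total=SYSTEM_WIDE_PRINT_BOOKS):
    """Share of the system-wide print book collection that the group holds."""
    return unique_books / system_total

# name -> (total holdings, unique books)
collections = {
    "New Google 5": (8.0, 5.9),
    "Original Google 5": (18.0, 10.5),
}

summary = {
    name: (round(100 * redundancy_rate(h, u)), round(100 * coverage(u)))
    for name, (h, u) in collections.items()
}

for name, (redundant_pct, coverage_pct) in summary.items():
    print(f"{name}: {redundant_pct}% redundant, {coverage_pct}% of system-wide")
```

Because the holdings totals on the slide are rounded, the computed values only agree with the reported ones to the nearest percentage point.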

“The Google 10”
Google 10 collection: 12.3 million books
• Original Google 5: 10.5 million books
• Added by new Google 5: + 1.8 million (17%)
Diminishing returns?
• Original G5: ~18 million holdings, 58% unique
• New G5: ~8 million holdings, 22% unique
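The diminishing-returns comparison can be checked with simple arithmetic on the slide's figures. This is a sketch under assumptions: "unique" for the new group is taken to mean books not already in the original Google 5 collection, and the variable names are hypothetical. Counts are expressed in units of 100,000 books so the subtraction stays exact.

```python
# Slide figures in units of 100,000 books (integers avoid float subtraction error).
ORIGINAL_G5_UNIQUE = 105    # 10.5 million books
GOOGLE_10_UNIQUE = 123      # 12.3 million books
ORIGINAL_G5_HOLDINGS = 180  # ~18 million holdings
NEW_G5_HOLDINGS = 80        # ~8 million holdings

# Books the five new libraries add beyond the original collection.
added_books = GOOGLE_10_UNIQUE - ORIGINAL_G5_UNIQUE   # 18, i.e. 1.8 million

growth = added_books / ORIGINAL_G5_UNIQUE             # growth of the collection
original_yield = ORIGINAL_G5_UNIQUE / ORIGINAL_G5_HOLDINGS  # unique books per holding
new_yield = added_books / NEW_G5_HOLDINGS             # new books per added holding

print(f"growth: {growth:.1%}")
print(f"original G5 yield: {original_yield:.1%}")
print(f"new G5 yield: {new_yield:.1%}")
```

Under these assumptions, each holding added by the new group contributes well under half as many new books as a holding in the original group did, which is the pattern the slide labels as diminishing returns.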

The challenge of digital preservation
[Figure: “The Preservation Pyramid”, adapted from Priscilla Caplan (FCLA). Labels: Authenticity/Understandability, Render, Media Management, Secure Storage, Description, Capture/Selection; ECONOMICS; RIGHTS]

But …
§ Chris Rusbridge’s “digital preservation fallacies”:
• Digital preservation is very expensive
• File formats become obsolete quickly
• Interventions must occur frequently
• Digital preservation repositories should have very long timescale aspirations
• The preserved object must be easily and instantly accessible in contemporary formats
• The preserved object must be faithful in all respects to original
Source: Rusbridge, C. “Excuse me … Some Digital Preservation Fallacies?” Ariadne, February 2006; http://www.ariadne.ac.uk/issue46/rusbridge/
§ Bottom line: significant progress has been made, but:
• Still lack well-understood, standardized practices for preserving digital materials
• No consensus on what “successful digital preservation” means

Mass digitization and digital preservation
§ Roles and responsibilities: Google? Libraries? Elsevier? JSTOR?
§ Digitized books as artifacts to be preserved, or disposable surrogates? Implications for redundancy in system?
§ What uses can digitized output be put to?
• Discovery/linking (e.g., mbooks)
• Text-mining
§ Infrastructure to support large-scale digital content management
§ Efficient, automated workflows for preservation metadata
§ “Last copy”

Summing up …
§ Distance between collections shrinking; mass digitization programs and other aggregate collections increasingly common features of library landscape
§ To mobilize aggregate collections, need to understand anatomy of aggregate collections, i.e., data and analysis to support planning and collaboration
• Characterize and promote the “collective collection”: the collective library resource
• Chart a course through mass digitization (e.g., G5 study)
§ Mass digitization raises important questions about long-term preservation (summarized by “preservation pyramid”); need strategies to secure long-term future of digitization investments