9573786e41773e03a1e259823cb71ce0.ppt
- Number of slides: 18
The British Library’s METS Experience
The Cost of METS
Carl Wilson
carl.wilson@bl.uk

Introduction
- A relatively young organisation, formed in 1971
- A large collection of items, approximately 20 million
- A rapidly growing collection of digital items, between 30 and 50 terabytes
- A large budget
BUT
- The British Library is a large organisation with many responsibilities
- Large collections mean that efficiency is essential
- There seems to be a misconception in some quarters that METS is expensive
- Our experience suggests that METS saves costs, but creating and collecting metadata to archive and preserve digital objects can be expensive regardless of the methods used

The OAIS Reference Model
- OAIS is the reference model for an Open Archival Information System
- Provides a framework and a common vocabulary for archival concepts
- Focused on long-term digital information preservation and access
- Key terms:
  - Submission Information Package (SIP)
  - Archival Information Package (AIP)
  - Dissemination Information Package (DIP)

SIPs, AIPs, and DIPs are all Information Packages
- An Information Package contains Content Information and Preservation Description Information
[Diagram labels: Content Information; Preservation Description Information; Packaging Information; Descriptive Information About Package]

OAIS Archive External Data
- High-level view of OAIS data flow
[Diagram: Producer -> Submission Information Package -> OAIS Archive (Archival Information Package) -> Dissemination Information Package -> Consumer]

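The data flow on this slide can be sketched in a few lines of Python. This is only an illustration of the OAIS vocabulary, not DOMS code: the class and function names, and the `provenance` key added at ingest, are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class PackageKind(Enum):
    SIP = "Submission Information Package"
    AIP = "Archival Information Package"
    DIP = "Dissemination Information Package"


@dataclass
class InformationPackage:
    kind: PackageKind
    content_information: bytes       # the digital object itself
    preservation_description: dict   # e.g. rights, source, provenance


def ingest(sip: InformationPackage) -> InformationPackage:
    """Producer -> archive: accept a SIP and store it as an AIP."""
    if sip.kind is not PackageKind.SIP:
        raise ValueError("only a SIP can be ingested")
    # Hypothetical: record where the AIP came from.
    pdi = dict(sip.preservation_description, provenance="ingested from SIP")
    return InformationPackage(PackageKind.AIP, sip.content_information, pdi)


def disseminate(aip: InformationPackage) -> InformationPackage:
    """Archive -> consumer: derive a DIP from a stored AIP."""
    if aip.kind is not PackageKind.AIP:
        raise ValueError("only an AIP can be disseminated")
    return InformationPackage(
        PackageKind.DIP, aip.content_information, dict(aip.preservation_description)
    )


sip = InformationPackage(PackageKind.SIP, b"example bytes", {"rights": "example"})
aip = ingest(sip)
dip = disseminate(aip)
```

The three package kinds share one structure and differ only in their role, which is why a single container format can serve all three.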
The British Library’s Digital Object Management System
- Developed in response to Legal Deposit legislation
- In principle, a copy of all digital material published in the United Kingdom must be deposited at the British Library
- The British Library can claim material from the producer
- In practice, the legislation is not yet in place; a Parliamentary Committee is still working on practical legislation

The British Library’s Digital Object Management System
- Developed in-house
- Intended to provide a single preservation-level store for the British Library’s digital content
- Standards-based
- Design modelled to fit the OAIS Reference Model
- We decided to use METS as:
  - Submission Information Package
  - Archival Information Package
  - Dissemination Information Package

Why Use Standards?
- Why should an organisation use standards?
  - Avoid duplication of effort
  - Build upon the work and best practices of other organisations
  - Data and metadata standards facilitate exchange of information between organisations using the same standards
- REDUCES COSTS

Why Use METS?
- METS uses XML for metadata representation
- XML is a W3C standard for data representation and interchange
  - Unicode
  - Machine interpretable when validated; use of a schema is important
  - Human readable, and editable using widely available tools
  - Accompanying standards for schema (DTD and XSD) and transformation (XSLT)
- METS was the emerging standard for the encapsulation of data and metadata representing digital objects
- Fits the requirements for SIPs, AIPs, and DIPs
- METS documents can be validated against a schema

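To make the points above concrete, here is a minimal sketch of a METS document built with Python's standard library. It uses the METS and XLink namespaces and a handful of core METS elements (`fileSec`, `fileGrp`, `file`, `FLocat`, `structMap`, `div`, `fptr`). The object identifier and file URL are invented, and the check shown is a simple structural test rather than XSD validation, which needs a validating parser that the standard library does not provide.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(ns: str, tag: str) -> str:
    """Return a namespace-qualified tag in ElementTree's {ns}tag form."""
    return f"{{{ns}}}{tag}"


def build_mets(object_id: str, file_url: str) -> ET.Element:
    """Build a minimal METS document describing one file held by reference."""
    mets = ET.Element(q(METS_NS, "mets"), {"OBJID": object_id})
    file_sec = ET.SubElement(mets, q(METS_NS, "fileSec"))
    file_grp = ET.SubElement(file_sec, q(METS_NS, "fileGrp"))
    file_el = ET.SubElement(file_grp, q(METS_NS, "file"), {"ID": "FILE1"})
    ET.SubElement(
        file_el,
        q(METS_NS, "FLocat"),
        {"LOCTYPE": "URL", q(XLINK_NS, "href"): file_url},
    )
    struct_map = ET.SubElement(mets, q(METS_NS, "structMap"))
    div = ET.SubElement(struct_map, q(METS_NS, "div"))
    ET.SubElement(div, q(METS_NS, "fptr"), {"FILEID": "FILE1"})
    return mets


def dangling_file_pointers(mets: ET.Element) -> list:
    """List fptr FILEID values that do not match any file ID."""
    ids = {f.get("ID") for f in mets.iter(q(METS_NS, "file"))}
    return [
        p.get("FILEID")
        for p in mets.iter(q(METS_NS, "fptr"))
        if p.get("FILEID") not in ids
    ]


# Hypothetical identifier and location, for illustration only.
doc = build_mets("example-0001", "file:///store/example-0001/item.pdf")
xml_text = ET.tostring(doc, encoding="unicode")
reparsed = ET.fromstring(xml_text)  # raises ParseError if not well-formed
```

Because the document is plain XML, the same text can be read by a person in an editor and checked by a program, which is the combination the slide is arguing for.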
Voluntary Deposit of Electronic Publications (VDEP)
- A pilot scheme started in anticipation of Legal Deposit legislation in 2001
- Content producers voluntarily submit digital material to the British Library
- Electronic content is submitted to the British Library on a physical carrier, e.g. CD or DVD, or as an email attachment
- The VDEP Team catalogues the material, which is then managed and accessed using Digitool, a Digital Asset Management system from Ex Libris
- Selected as the first source of content for DOMS

The Ingest of VDEP Material into DOMS
[Diagram: Digitool -> XML Export of Digitool Metadata -> XSLT Transformation -> DOM SIP (METS Document, content by reference) -> Metadata Ingested and Content Ingested (content fetched by reference from Digitool) -> Digital Object Management System -> DOM AIP]

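The transformation step in this pipeline can be sketched as follows. Two caveats apply. The deck describes an XSLT transformation; this sketch substitutes plain Python, because the standard library has no XSLT processor. The export format below is invented for the example, since the real Digitool export format is not shown in the deck. Only the METS element and attribute names are taken from the METS schema.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS)
ET.register_namespace("xlink", XLINK)

# Hypothetical export document; NOT the real Digitool format.
EXPORT = """<export id="vdep-0001">
  <title>Example publication</title>
  <file path="file:///store/vdep-0001/issue.pdf" mime="application/pdf"/>
  <file path="file:///store/vdep-0001/cover.jpg" mime="image/jpeg"/>
</export>"""


def export_to_sip(export_xml: str) -> ET.Element:
    """Map the hypothetical export onto a METS SIP with content by reference."""
    src = ET.fromstring(export_xml)
    mets = ET.Element(
        f"{{{METS}}}mets",
        {"OBJID": src.get("id", ""), "LABEL": src.findtext("title", default="")},
    )
    file_sec = ET.SubElement(mets, f"{{{METS}}}fileSec")
    file_grp = ET.SubElement(file_sec, f"{{{METS}}}fileGrp")
    struct_map = ET.SubElement(mets, f"{{{METS}}}structMap")
    div = ET.SubElement(struct_map, f"{{{METS}}}div")
    for n, f in enumerate(src.findall("file"), start=1):
        file_id = f"FILE{n}"
        file_el = ET.SubElement(
            file_grp,
            f"{{{METS}}}file",
            {"ID": file_id, "MIMETYPE": f.get("mime", "")},
        )
        # The content is not embedded: FLocat points at where it lives.
        ET.SubElement(
            file_el,
            f"{{{METS}}}FLocat",
            {"LOCTYPE": "URL", f"{{{XLINK}}}href": f.get("path", "")},
        )
        ET.SubElement(div, f"{{{METS}}}fptr", {"FILEID": file_id})
    return mets


sip = export_to_sip(EXPORT)
sip_xml = ET.tostring(sip, encoding="unicode")
```

Holding content by reference keeps the SIP small: the archive reads the METS document first and then fetches each file from the location it names.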
The Details
- Descriptive metadata as MARC 21 XML
  - Validated against a schema
- Technical metadata preserved in the proprietary Digitool XML format
  - This format was documented but no schema was produced
  - In retrospect this was a mistake
  - Since rectified: JHOVE has been used to automate technical metadata production since Digitool 3 was introduced
  - Material originally ingested may have to be revisited
- All other metadata provided by single text documents referenced in the METS AIP
  - Rights statement and source statement

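One way to reference external text documents from a METS file is the `mdRef` element inside the `rightsMD` and `sourceMD` sections of an `amdSec`. The sketch below shows that pattern. Whether DOMS used exactly this mechanism is an assumption, and the section IDs, the `OTHERMDTYPE` value, and the URLs are invented for the example.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS)
ET.register_namespace("xlink", XLINK)


def add_referenced_metadata(mets: ET.Element, rights_url: str, source_url: str) -> ET.Element:
    """Append an amdSec whose rights and source metadata are external text files."""
    amd = ET.SubElement(mets, f"{{{METS}}}amdSec")
    sections = (
        ("rightsMD", "RIGHTS1", rights_url),  # IDs are hypothetical
        ("sourceMD", "SOURCE1", source_url),
    )
    for tag, md_id, url in sections:
        section = ET.SubElement(amd, f"{{{METS}}}{tag}", {"ID": md_id})
        ET.SubElement(
            section,
            f"{{{METS}}}mdRef",
            {
                "LOCTYPE": "URL",
                "MDTYPE": "OTHER",
                "OTHERMDTYPE": "TEXT",  # assumed label for plain-text statements
                f"{{{XLINK}}}href": url,
            },
        )
    return amd


# Built on an empty root for brevity; in a full document the amdSec
# precedes the fileSec.
root = ET.Element(f"{{{METS}}}mets", {"OBJID": "example-0001"})
amd_sec = add_referenced_metadata(
    root,
    "file:///store/example-0001/rights.txt",
    "file:///store/example-0001/source.txt",
)
```

A reference like this keeps the METS document valid, but the text file it points to has no machine-checkable structure, so software can locate the statement without being able to interpret it.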
Lessons Learned
- All METS AIPs are validated against a schema and can be used by automated systems
- The descriptive metadata section is also valid
- All other metadata is difficult to use without bespoke development
- The system is entirely automated, barring the creation of the catalogue record
- A quarter of a million METS documents produced at little cost

Other Automated Ingest Streams
- Sound Archive ingest
  - Thousands of 2-gigabyte master WAV files
  - Descriptive metadata gathered from the Sound Archive catalogue via Z39.50 and transformed from raw MARC to MARC XML
  - Technical metadata held in the MARC file; this is a Sound Archive convention
  - Again, single text documents for rights and source metadata
  - Automated production of METS documents again reduces costs
- 19th-century book digitisation
  - The outsourced digitisation of one hundred thousand books
  - 25 million JPEG images and one hundred thousand PDFs
  - MARC XML records obtained from the OPAC
  - Technical metadata created using JHOVE

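The MARC-to-MARC-XML step can be sketched as below. The sketch assumes the raw record has already been decoded into Python tuples; it does not parse the binary MARC exchange format or talk to a Z39.50 server. The element names and namespace follow the MARC 21 XML ("slim") schema, while the record values are invented.

```python
import xml.etree.ElementTree as ET

MARC = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("marc", MARC)


def to_marcxml(leader, control_fields, data_fields):
    """Serialise an already-decoded MARC record as a MARC XML record element.

    control_fields: iterable of (tag, value)
    data_fields: iterable of (tag, ind1, ind2, [(subfield_code, value), ...])
    """
    record = ET.Element(f"{{{MARC}}}record")
    ET.SubElement(record, f"{{{MARC}}}leader").text = leader
    for tag, value in control_fields:
        cf = ET.SubElement(record, f"{{{MARC}}}controlfield", {"tag": tag})
        cf.text = value
    for tag, ind1, ind2, subfields in data_fields:
        df = ET.SubElement(
            record,
            f"{{{MARC}}}datafield",
            {"tag": tag, "ind1": ind1, "ind2": ind2},
        )
        for code, value in subfields:
            sf = ET.SubElement(df, f"{{{MARC}}}subfield", {"code": code})
            sf.text = value
    return record


# Invented example values; the leader is a 24-character placeholder.
record = to_marcxml(
    leader="00000nim a22000007a 4500",
    control_fields=[("001", "example-000001")],
    data_fields=[("245", "0", "0", [("a", "Example recording title")])],
)
record_xml = ET.tostring(record, encoding="unicode")
```

Once the record is in XML it can be wrapped in a METS descriptive metadata section and validated along with the rest of the document.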
The Cost of One-Offs
- The British Library is involved in many single-item digitisations
- Codex Sinaiticus
  - An early handwritten master copy of the Bible
- The Canterbury Tales
  - Two early manuscripts, including correlation of one edition to the other
- The Shakespeare Quartos
  - Once again, historical manuscripts with correlation between editions

Codex Sinaiticus

Conclusions
- The use of METS is not expensive
  - The use of standards cuts costs by building upon the work of others
- Automated production of METS documents is cheap
  - Use of schema-validated documents for automated creation
- There are sometimes unavoidable costs
  - Individual historical documents have costs associated with hand-crafting metadata structures
  - METS doesn’t introduce these costs; the process would always add expense

Where Next?
- The British Library is involved in many single-item digitisations
- Codex Sinaiticus
  - An early handwritten master copy of the Bible
- The Canterbury Tales
  - Two early manuscripts, including correlation of one edition to the other
- The Shakespeare Quartos
  - Once again, historical manuscripts with correlation between editions