f1790a82071b136000a3384f24c8abae.ppt
- Number of slides: 30
Harvesting and DAMS Glen Robson, DAMS Manager, National Library of Wales
What do we do when it gets here? • Normalise Metadata • Migrate? • Storage • Access
Normalise Metadata • Consistency ▫ Convert to NLW standards (METS) ▫ Consistent METS between projects • Add technical metadata ▫ Link file format to PRONOM registry ▫ Automatic technical metadata: JHOVE or NZ Metadata Extraction Tool • Add preservation metadata (PREMIS) ▫ Object's history
Harvesting • Take a copy of the metadata and thesis • Different formats ▫ PDF, Word and Text • Complex Objects ▫ E.g. 1 PDF per chapter
Migration • Input: ▫ 221 application/msword ▫ 4 application/octet-stream ▫ 114 application/pdf ▫ 3 application/vnd.ms-excel ▫ 340 text/plain
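A format breakdown like the one on this slide can be produced with a short script. The following is a minimal sketch, not the tool NLW used: it guesses MIME types from file extensions with Python's standard `mimetypes` module, and the function name `tally_mime_types` and the sample file names are hypothetical. Files with no recognisable extension fall back to `application/octet-stream`, which is the category the slides flag as unopenable.

```python
import mimetypes
from collections import Counter


def tally_mime_types(filenames):
    """Count guessed MIME types for an iterable of file names (hypothetical helper)."""
    counts = Counter()
    for name in filenames:
        # guess_type looks at the extension only; it does not inspect file content.
        mime, _encoding = mimetypes.guess_type(name)
        # Unknown types are grouped under the generic binary type.
        counts[mime or "application/octet-stream"] += 1
    return counts


if __name__ == "__main__":
    # Illustrative file names only, not data from the harvest.
    sample = ["thesis.pdf", "chapter1.pdf", "notes.txt", "unknown_file"]
    for mime, n in tally_mime_types(sample).items():
        print(n, mime)
```

Extension-based guessing is cheap but can be wrong; a registry-backed identification tool is the more reliable choice for preservation decisions.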
Now or later? • Migrate on ingest ▫ How do you choose the format? ▫ Storage Cost • Migrate on obsolescence ▫ Tools available?
Migration • Microsoft Word ▫ Can open it now ▫ Have to have a copy of Word • application/octet-stream ▫ Can’t open now
Storage • LOCKSS • University copy • NLW Copy ▫ Archive copy on tape ▫ Archive copy on Optical Disc ▫ Archive copy offsite ▫ Access copy • Ethos copy
Access • Convert to MARC ▫ Digital and Print in MARC ▫ Single Point of access for all collections • Mostly automated ▫ Best use of resources
Lessons Learnt and Problems Encountered • Started using Fedora in 2004 ▫ Ingested 3 Digitisation Projects, 2 Mass Digitisation ▫ Ingesting Video and Radio Programs • Started with Pilot • Purchased VITAL based on Fedora • Project Driven
Lesson 1: Physical carriers degrade or become obsolete
Why is this a problem for the library? • Deposit ▫ Sometimes no choice on carrier ▫ Depositors aren’t in a position to change the carrier
Lesson 1: Physical carriers degrade or become obsolete • Age • Storage conditions • Sunlight • Temperature • “Widely differing claims have been made for the life expectancy of CD-Rs, but it is generally accepted that they will last longer than the associated technology and are therefore suitable for preservation purposes. CD-Rs offer storage capacities of 650 MB to 700 MB. CD-RW is based upon a different recording process to CD-R, and is not recommended for archival storage.” • http://www.nationalarchives.gov.uk/documents/media_care.rtf
Practical Example • Deposit of CDs from Cliff McLucas and Brith Gof theatre company • 22% of the Cliff McLucas CDs and 60% of those from Brith Gof could not be copied or read. • According to the sleeves, many of the Brith Gof discs contain material relating to performances between about 1989 and 1992. • Only real solution is to copy data from the carrier as soon as possible
CDAS
Lesson 2: Digital can get BIG • Wills Project ▫ 182,404 Wills ▫ 816,325 Images ▫ 998,729 Fedora Objects • Welsh Journals ▫ 50 Titles ▫ Thousands of Pages • Offair ▫ 40,000 Records • SCIF Newspaper and Magazines ▫ 2 Million Pages • Repository: 3 Million plus Objects
Problems • Processing takes time • Management • Discovery • Cost • Cataloguing / Metadata
Lesson 2: Digital can get BIG • Sgrîn – Cardiff Media Company closing down (2006) • Collect data from Shared drive • Stats: ▫ 29.2 GB ▫ 68,446 files ▫ Microsoft Word Documents: 32,086 ▫ JPEG Images: 18,093 ▫ Rich Text Format: 2,707 ▫ Microsoft Excel Documents: 2,498 ▫ Microsoft Works Word Document: 2,127 ▫ Files with missing File extension: 2,036 • Selection? • Cataloguing?
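Statistics like those above, including the count of files with no extension, can be gathered by walking the shared drive. This is a minimal sketch under stated assumptions: it groups files by extension only (it does not identify formats by content), and the function name `tally_extensions` and the `(no extension)` label are hypothetical, not taken from the Sgrîn work.

```python
from collections import Counter
from pathlib import Path


def tally_extensions(root):
    """Count files under `root` by lower-cased extension (hypothetical helper)."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            ext = path.suffix.lower()
            # Files without an extension get their own bucket so they can be
            # reviewed separately, as on the slide.
            counts[ext if ext else "(no extension)"] += 1
    return counts


if __name__ == "__main__":
    # Walks the current directory purely as an illustration.
    for ext, n in tally_extensions(".").most_common():
        print(f"{ext}: {n:,}")
```

A tally of this kind helps frame the selection and cataloguing questions before any file is opened.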
Lesson 3: Metadata is expensive • Accessioning: ▫ Depositor adds metadata (Roda) ▫ Deposit comes with metadata (Ethos) • Digitisation ▫ Structure / Context ▫ From Catalogue ▫ Write once, use many • Automate as much as possible
Lesson 4: You can’t automate everything • Offair Recording • Original Plan: ▫ BOB System records programs (metadata from EPG) ▫ Harvest from BOB, create MARC record ▫ Ingest • Totally automated
Lesson 4: You can’t automate everything • Spanners in the works: ▫ Duplicate Recordings ▫ Failed Recordings ▫ EPG Errors • New workflow: ▫ BOB System records programs (metadata from EPG) ▫ Fix failed validation records (Human Process) ▫ Harvest from BOB, create MARC record ▫ Ingest
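The validation step in the new workflow can be pictured as a triage pass that lets clean records through and sets the rest aside for a person to fix. The sketch below is illustrative only: the `Recording` fields, the three checks, and the `triage` function are assumptions made for this example and do not describe the BOB system's actual data model or API.

```python
from dataclasses import dataclass


@dataclass
class Recording:
    programme_id: str  # hypothetical identifier for the broadcast
    title: str         # title as supplied by the EPG
    duration_s: int    # recorded duration in seconds; 0 is treated as a failure


def triage(recordings):
    """Split recordings into (ready to harvest, needing human review)."""
    ready, review = [], []
    seen = set()
    for rec in recordings:
        problems = []
        if rec.programme_id in seen:
            problems.append("duplicate recording")
        if rec.duration_s <= 0:
            problems.append("failed recording")
        if not rec.title.strip():
            problems.append("EPG error: missing title")
        seen.add(rec.programme_id)
        if problems:
            # Keep the reasons with the record so a person can act on them.
            review.append((rec, problems))
        else:
            ready.append(rec)
    return ready, review
```

Only the `ready` list would continue to the harvest and MARC creation steps; the `review` list is the human process named on the slide.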
Lesson 5: Things Change
Ingest Early • Items managed early • Missing items picked up earlier • Change / Creation at the same point ▫ 1 interface rather than 1 for creation and 1 for editing • Preserve but allow change ▫ Systems make it difficult
Lesson 6: Workflows not Projects • Develop specific Project based workflows • Have to be customised each time • Symptom of project based funding • Digitisation Workflow • Generic Services ▫ Technical Metadata ▫ Checksums
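A checksum service is one example of a generic step that every workflow can share. The following is a minimal sketch using Python's standard `hashlib`; the choice of SHA-256 and the function names `file_checksum` and `verify` are assumptions for illustration, not a description of the NLW implementation.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected, algorithm="sha256"):
    """Check a file against a previously recorded checksum (fixity check)."""
    return file_checksum(path, algorithm) == expected.lower()
```

Recording the digest at ingest and calling `verify` on a schedule gives a simple fixity check that does not depend on which project produced the file.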
Preservation Paranoia • Lessons we may learn: ▫ How much metadata is too much? ▫ How much technical metadata should we have? ▫ Migrations of MS Word to: PDF, Text, Image of each page, Open Office XML
Summary • Physical carriers degrade or become obsolete • Digital can get BIG • Metadata is expensive • You can’t automate everything • Things change • Workflows not Projects • Preservation Paranoia
Questions