a1615dd819912bd0e9b7308537f81576.ppt
- Number of slides: 9
Designing Storage Architectures for Preservation Collections
Library of Congress, September 17-18, 2007
Preservation and Access Repository Storage Architecture
Stephen Abrams, Harvard University Library
stephen_abrams@harvard.edu
Digital preservation at Harvard
• Obligation to ensure the ongoing usability of library digital assets over time
• Digital Repository Service (DRS)
  – Managed preservation and access repository
  – Seven years of production operation
  – 6.7 million assets (27 TB)
• Primary strategy: redundancy and heterogeneity
• Primary challenge: scaling
Scaling: linear or exponential?
Storage classification
• All managed assets are assigned a storage classification
  – Public use (U): high availability, fast response
  – Archival storage (A): high capacity, low cost
• Use assets are optimized for web-friendly delivery
• Archival assets are optimized for longevity
• Asset classification is known at the point of acquisition
Architectural requirements
• Each asset is stored:
  – In at least 3 physical locations
  – On at least 2 storage media
  – With at least 2 on-line copies (U) / 1 on-line copy (A)
  – With at least 1 off-line copy
• Ongoing auditing for bit-level error detection and correction
• Virtualization layer with uniform interface to all assets, regardless of physical medium
• Application interface exposed as NFS-mountable file systems
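The placement rules above can be read as a small policy check. The following is a minimal sketch, not the DRS implementation: the `Copy` record, the `policy_violations` function, and the site labels are hypothetical names introduced for illustration. Only the thresholds (3 locations, 2 media, 2 or 1 on-line copies, 1 off-line copy) come from the slide. Treating a tape held in a library as off-line is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One stored copy of an asset (hypothetical record for illustration)."""
    location: str  # physical site holding the copy
    medium: str    # storage medium, e.g. "disk" or "tape"
    online: bool   # True if the copy is reachable without manual handling


# Minimum number of on-line copies per storage classification (from the slide).
MIN_ONLINE = {"U": 2, "A": 1}


def policy_violations(classification, copies):
    """Return the list of unmet requirements; an empty list means compliant."""
    problems = []
    if len({c.location for c in copies}) < 3:
        problems.append("fewer than 3 physical locations")
    if len({c.medium for c in copies}) < 2:
        problems.append("fewer than 2 storage media")
    online = sum(1 for c in copies if c.online)
    if online < MIN_ONLINE[classification]:
        problems.append("too few on-line copies")
    if all(c.online for c in copies):
        problems.append("no off-line copy")
    return problems


# Illustrative archival (A) asset: one disk copy and two tape copies.
archival_copies = [
    Copy("off-campus data center", "disk", online=True),
    Copy("on-campus data center", "tape", online=False),   # assumption: off-line
    Copy("off-campus storage facility", "tape", online=False),
]
print(policy_violations("A", archival_copies))
```

The same three copies would fail the check for a public-use (U) asset, because only one of them is on-line.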
Storage architecture
Storage architecture
• QFS cache and primary U disk archive on EMC CX3-40 (FC / SATA, RAID-1 / RAID-5) at on-campus data center
• Redundant switched FC data paths to primary / fail-over Sun T2000 / Solaris file servers running SAM-QFS
• Primary A / secondary U disk archive on EMC CX3-80 (FC / SATA, RAID-1 / RAID-5) at off-campus data center
• Redundant FC data paths to T2000 file server running SAM-QFS
• Secondary A / tertiary U tape archive on StorageTek SL500 (LTO-3) FC-attached to primary on-campus T2000
• Tertiary A / quaternary U tape archive on LTO-3 media at off-campus managed storage facility
• Disk archives are UFS file systems containing Tar files; even with the loss of the SAM infrastructure they are susceptible to full (if time-consuming) recovery with standard Unix / Linux tools
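Because the disk archives are ordinary Tar files, recovery does not depend on SAM-QFS. The slide refers to standard Unix / Linux tools (for example `tar -tf` to list and `tar -xf` to extract). The sketch below swaps in Python's standard `tarfile` module to show the same idea and adds a SHA-256 digest per recovered file so it can be compared against a separately recorded value. The function name `recover_assets` and the choice of SHA-256 are assumptions for illustration, not details from the presentation.

```python
import hashlib
import tarfile

# Use the safer "data" extraction filter where this Python version supports it.
EXTRACT_KWARGS = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}


def recover_assets(tar_path, dest_dir):
    """Extract every regular file from a tar archive into dest_dir.

    Returns a dict mapping each member name to the SHA-256 hex digest
    of its content, for comparison with previously recorded digests.
    """
    digests = {}
    with tarfile.open(tar_path, "r") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue  # skip directories, links, and special files
            archive.extract(member, path=dest_dir, **EXTRACT_KWARGS)
            sha = hashlib.sha256()
            with archive.extractfile(member) as stream:
                # Read in 1 MiB chunks so large assets do not fill memory.
                for chunk in iter(lambda: stream.read(1 << 20), b""):
                    sha.update(chunk)
            digests[member.name] = sha.hexdigest()
    return digests
```

A call such as `recover_assets("archive.tar", "restored")` (both paths hypothetical) would rebuild the files under `restored` and return their digests.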
Storage virtualization
• SAM-QFS reader / writer on primary on-campus T2000 file server
• SAM-QFS reader on fail-over on-campus / off-campus T2000 file servers
• All U and A assets written to QFS cache on CX3-40
• Immediate creation of all UFS disk and LTO-3 tape archive copies
• Immediate release from cache with “stage never”
• SAM manages all copies of all assets; externally each asset appears as a single file in an NFS-mountable file system
• Application access requests are initiated by NFS reads and are fulfilled directly from primary disk archive copy without staging to cache
Issues
• Disk vs tape
• LTO-3 vs LTO-4
• Tape archive media pooling
• All hardware / software installed; currently engaged in configuration and preliminary unit / integration testing
• Need to establish benchmarks for system performance
• Planning for migration from existing storage solution
• Automated data classification
• Response to an anticipated escalating rate of asset acquisition
  – Google mass digitization
  – Web archiving
  – Audio / video content
  – Scientific data sets