- Number of slides: 27
International Atomic Energy Agency
Digital Preservation
  Session: Tue, 4 Nov 2008
  34th INIS Liaison Officers’ Meeting, 3-5 Nov 2008, Vienna, Austria
  S. Rieder, G. St-Pierre, Y. Reynaud-Pulido, T. Kalapurackal
  Database Production and Imaging Group, INIS Unit, INIS & NKM Section
Digital Preservation at INIS
  Mission:
  • preservation of nuclear knowledge
  • serving as a reservoir of nuclear information
  • provision of quality information services
  • promotion of a culture of “information and knowledge sharing”
Digital Preservation at INIS
  • INIS Non-Conventional Literature (NCL)
    - Production of the INIS electronic Full Text Database
  • Digital Preservation Activities
    - Digitization projects at IAEA and at Member States
Digital Preservation at INIS
  Objectives:
  • Consistent, high level of image quality
  • Interoperability and accessibility of digitized resources
  • Long-term preservation of digital resources for future generations
  Member States / IAEA
  • Develop good practices for digital preservation
    - ‘Overview of INIS Digital Preservation Practices’: INIS Information Letter No. 253 & Attachment (2008-10-03)
      http://www.iaea.org/inisnkm/marea/restrictedpdf/2008/infoletter253_attachment.pdf
INIS Digital Preservation Principles
  • INIS principles and workflow are based on Cornell University’s digital imaging tutorial:
    http://www.library.cornell.edu/preservation/tutorial/index.html
  • The tutorial is available in English, French and Spanish
INIS Workflow
  • Document Benchmarking
  • Document Preparation
  • Scanning
  • Quality Control
  • Image Enhancement
  • Metadata Creation/Validation
  • Export including Compression
  • Completeness Check
  • Back-up
  • Post-processing
  • Storage and dissemination
Benchmarking & Document Preparation
  Benchmarking:
  • Adequately capture the ‘original’ content in digital form?
  • Do physical format & condition meet digitizing requirements?
  • What is the type of material to be digitized?
  • Which resolution?
  • At which bit-depth?
  • Which compression parameters?
  • Estimated accuracy level for OCR?
  • Other considerations?
  Preparation:
  • Physically (unbind, remove staples/clips, etc.)
  • Structurally (add/remove barcodes, separate chapters, parts, etc.)
  • Characteristics of paper (e.g. size, thickness, glossy/matt, condition)
Scanning – Capture Modes & Optical Resolution
  Capture modes: depend on the physical form of the original
  • Bitonal: 1 bit/pixel – black & white (printed text)
  • Greyscale: 8 bits/pixel – 256 grey shades (black & white photographs)
  • Colour: 24 bits/pixel – about 16.7 million colours & grey shades (continuous tone & colour)
  Optical Resolution:
  • “dots per inch” (DPI) or “pixels per inch” (PPI)
  • High resolution: fine detail
  • Bit depth: amount of information captured; large file size
  • Greater bit depths: more accurate representation
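The relationship between page size, resolution, bit depth and file size can be sketched as a short calculation. This is only an illustrative estimate: the function name is hypothetical, and the figure is the raw pixel data before any compression, ignoring file headers and row padding.

```python
import math

# Bits per pixel for the three capture modes listed on the slide.
BITS_PER_PIXEL = {"bitonal": 1, "greyscale": 8, "colour": 24}


def uncompressed_size_bytes(width_in: float, height_in: float, dpi: int, mode: str) -> int:
    """Estimate the raw (uncompressed) size of one scanned page in bytes."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * BITS_PER_PIXEL[mode]
    return math.ceil(total_bits / 8)


if __name__ == "__main__":
    # A4 is 210 mm x 297 mm; one inch is 25.4 mm.
    a4_width_in, a4_height_in = 210 / 25.4, 297 / 25.4
    for mode in BITS_PER_PIXEL:
        size = uncompressed_size_bytes(a4_width_in, a4_height_in, 300, mode)
        print(f"{mode:9s} A4 at 300 dpi: {size / 1_000_000:.1f} MB")
```

For the same page and resolution, a 24-bit colour scan holds 24 times as much raw data as a bitonal scan.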
Scanning at INIS – Capture & Optical Resolution
  INIS practice:
  • Standard scanner settings (for plain b/w text):
    - bitonal (black & white)
    - 300 dpi
  • Special cases (colour, pictures):
    - greyscale and colour
    - 200–300 dpi with 8 bit depth (256 colours/tones)
  • IMPORTANT: post-processing image compression needed to reduce file size
  • NEVER use colour settings to scan B/W documents
Quality Control - QC
  Retain: value, utility, integrity of resources
  Verify: quality, accuracy, consistency
  INIS verifies:
  • accuracy & completeness (e.g. same number of pages?)
  • data integrity
  • correctness of metadata form and validity
  • correct matching of metadata and image files
  • ‘checksum’ algorithm (authenticity & integrity of digitized files)
  • number & order of bytes (e.g. after move, copy, transfer, burn)
  • visual inspection: resolution, colour, tone, appearance (attn: changeable light & monitors)
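The checksum and byte-count checks could look roughly like the sketch below. The slide does not say which checksum algorithm INIS uses; SHA-256 is an assumption here, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_fingerprint(path: Path, chunk_size: int = 1 << 20) -> tuple[int, str]:
    """Return (size in bytes, SHA-256 hex digest), reading the file in chunks."""
    digest = hashlib.sha256()  # assumed algorithm, not named in the source
    size = 0
    with path.open("rb") as handle:
        while chunk := handle.read(chunk_size):
            digest.update(chunk)
            size += len(chunk)
    return size, digest.hexdigest()


def is_unchanged(path: Path, expected_size: int, expected_sha256: str) -> bool:
    """True if the file still has the recorded byte count and checksum."""
    size, sha256 = file_fingerprint(path)
    return size == expected_size and sha256 == expected_sha256
```

Recording the fingerprint when a file is created, then calling `is_unchanged` after each move, copy, transfer or burn, detects both truncated and altered files.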
Image Enhancement
  Definition: any process applied to the raw scan to improve quality or legibility of the resource
  Image Enhancement at INIS:
  • despeckling
  • deskewing
  • noise reduction
  • black border removal
  • colour and tone adjustment, etc.
Quality Control and Image Enhancement (1)
  • Skewed?
Quality Control and Image Enhancement (2)
  • Noisy (e.g. unnecessary dots)?
Quality Control and Image Enhancement (3)
  • Black border?
Quality Control and Image Enhancement (4)
  IMPORTANT
  • Paper size must match document hard copy
  • A4 ≠ Letter size
  • Text cut = RESCAN
  • If noticed during QC of incoming PDF, INIS will request the Input Centre to resend the page
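A paper-size check of this kind can be sketched without any PDF library, given a page's width and height in PDF points (1/72 inch). The function name and the 2-point tolerance are assumptions for illustration, not part of the INIS procedure.

```python
MM_PER_INCH = 25.4
POINTS_PER_INCH = 72


def mm_to_points(mm: float) -> float:
    """Convert millimetres to PDF points (1/72 inch)."""
    return mm / MM_PER_INCH * POINTS_PER_INCH


# Nominal sizes as (short edge, long edge) in points.
PAPER_SIZES_PT = {
    "A4": (mm_to_points(210), mm_to_points(297)),                  # 210 mm x 297 mm
    "Letter": (8.5 * POINTS_PER_INCH, 11 * POINTS_PER_INCH),       # 8.5 in x 11 in
}


def classify_page(width_pt: float, height_pt: float, tolerance_pt: float = 2.0):
    """Return 'A4', 'Letter', or None; orientation is ignored."""
    short_edge, long_edge = sorted((width_pt, height_pt))
    for name, (short_ref, long_ref) in PAPER_SIZES_PT.items():
        if abs(short_edge - short_ref) <= tolerance_pt and abs(long_edge - long_ref) <= tolerance_pt:
            return name
    return None
```

A4 is narrower and taller than Letter by well over the tolerance, so a page scanned with the wrong setting is flagged rather than silently accepted.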
File Formats
  • Very important: prefer ‘non-proprietary’ formats
  • Several standard file formats exist
    - different resolution, bit-depth, colour capabilities, etc.
  INIS Digital Collection:
  1. From ‘Paper’ or ‘Microfiche’ to ‘Digital’:
    - Master images in TIFF Group IV (b/w), in JPEG (colour)
    - Majority: full-text searchable PDF
  2. Digital files received from INIS National Centres
    - PDF
    - Compression: JBIG2 (b/w), JPEG (colour)
Preservation Formats
  • PDF: open standard – official ISO 32000-1:2008
  • PDF/A: long-term archiving of electronic documents
    - Creation of PDF documents whose visual appearance will remain the same over the course of time
    - Official ISO standard: ISO 19005-1:2005
    - Further development ongoing: http://www.pdfa.org
  INIS:
  • considers adopting PDF/A
    - for efficient preservation
    - long-term archival of the Agency’s and Member States’ nuclear information resources
  • pilot project in 2009
OCR – Optical Character Recognition
  • Printed text searchable as electronic text
  • Primary objective for INIS digitization projects: creation of ‘searchable full text’
  INIS:
  • major tool for mass production: ABBYY FineReader 8
  • ~98% accuracy: printed text in Latin & Cyrillic characters
  Satisfactory testing with Script and Arabic Characters:
  • Adobe Acrobat Professional 8.0: Chinese (Simplified), Japanese, Korean
  • ABBYY FineReader Pro 9: Hebrew, Thai
  • VERUS™ Professional: Arabic
Various OCR types
  • Typewritten
  • Hand print and cursive
  • Fraktur
  • Music scores
  • MICR (Magnetic Ink Character Recognition)
OCR process 1 (no or wrong dictionary)
OCR process 2 (proper dictionary)
OCR - value added
  Image Layer:
  • Scanned (raster) image
  • Visual representation of the original document
  Hidden Text (callout on slide: errors):
  • Enables full-text search
  • Extra information for search engines
Storage of Digital Files
  • Mandatory: reliable & controlled environment
    - Storage of master files: high quality, industry standard devices, e.g. CD-R, DVD, or other contemporary reliable media
    - Backup of master files: regularly, off-site, secure location
    - RAID (Redundant Array of Independent Disks): several drives act collectively as a single storage system; consider RAID for large production environments
  INIS:
  • THECUS N5200B PRO
  • 5 x 3.5" SATA RAID 5 disks, 1 TB each
  • configured as local network data storage
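As a rough illustration of what RAID 5 means for a five-disk array like the one listed: one disk's worth of space holds parity, so usable capacity is (n - 1) times the disk size, and the contents of any single failed disk can be rebuilt by XOR-ing the survivors. The sketch below is a simplification with hypothetical function names; real RAID 5 rotates parity across the disks stripe by stripe.

```python
from functools import reduce


def raid5_usable_tb(disk_count: int, disk_size_tb: float) -> float:
    """Usable capacity of a RAID 5 array: one disk's worth is spent on parity."""
    if disk_count < 3:
        raise ValueError("RAID 5 needs at least three disks")
    return (disk_count - 1) * disk_size_tb


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks (used for parity and for rebuilds)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


if __name__ == "__main__":
    print(raid5_usable_tb(5, 1.0), "TB usable from 5 x 1 TB disks")

    data = [b"\x01\x02", b"\x04\x08", b"\x10\x20"]   # blocks on three data disks
    parity = xor_blocks(data)                         # block on the parity disk
    rebuilt = xor_blocks([data[0], data[2], parity])  # the disk holding data[1] has failed
    print(rebuilt == data[1])
```

RAID protects against a single drive failure only, which is why the slide still lists regular off-site backup as a separate requirement.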
Back-up and Off-Site Storage
  • Create: regular back-ups of master files
  • Store: remote from the original source, in a secure location
  INIS:
  • 1970 to 1997: ‘microfiche’
    - NCL full text: paper to microfiche
    - safe, long-term storage: INIS National Centres
    - full set of NCL microfiche: Austrian Central Lib. of Physics
  • From 1997: ‘digital’
    - NCL on CD: INIS Document Delivery Centres (National Centres)
    - Secure “off site” & back-up: Austrian Central Lib. of Physics
  • 2008: microfiche to PDF
    - Austrian Central Lib. of Physics
    - INIS National Centres
    - INIS Online Database
Preservation Planning
  • Contents of digital files must remain ‘meaningful’
  Different processes:
  1. Refreshing: copy files from one storage medium to another
    - verify authenticity & integrity of the files (e.g. checksum)
  2. Migration: transfer files from one HW & SW to another, or from one computer generation to the next generations
    - format-based: move files from ‘obsolete’ format to ‘new’ format
  3. Emulation: re-create technical environment
    - maintain information about HW & SW = system reengineered
  INIS: Refreshing
  • CD to DVD (until 2007)
  • from 2008: copy to Thecus storage device
  • When PDF/A implemented: ‘migration’
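A refreshing step, copying files to a new medium and verifying each copy, might be sketched as follows. The directory layout, the function names and the choice of SHA-256 are assumptions for illustration; the slides do not describe the actual INIS tooling.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh(source_dir: Path, target_dir: Path) -> list[Path]:
    """Copy every file under source_dir to target_dir, keeping the folder layout.

    Returns the source files whose copy does not match the original.
    """
    mismatches = []
    for source in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        target = target_dir / source.relative_to(source_dir)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)  # copy2 also preserves timestamps
        if sha256_of(source) != sha256_of(target):
            mismatches.append(source)
    return mismatches
```

An empty return value means every copy matched its source; any listed file should be copied again before the old medium is retired.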
Metadata
  • Key role for digital resources: describe, process, manage, track, access, preserve
  • INIS: comprehensive ‘bibliographic’ metadata
    - describe the intellectual content of full text
    - bibliographic elements to identify & retrieve resources
  • INIS Database: digital resources with bibliographic metadata
  • Technical metadata for digital resources: automatic creation with PDF files
  • Future: more sophisticated approach with implementation of PDF/A
Thank you for your attention!
  Your INIS Digital Preservation Team


