d45e0dce7370f2ce66621e2673b93652.ppt
- Number of slides: 61
Trials and Tribulations: Archiving Electronic Records Adam Jansen Digital Archivist Washington State Archives
Records and Information or, Why we do what we do If - Information is power… And - Records are storage of information Then – Records must be preserved for future generations
Shifting Media • Historically records were stored on paper, kept in filing cabinets – When the cabinet was full, records sent to file room • Now records stored electronically on computers – When the computer is ‘full’ – add more hard drives Basic skills to manage and maintain records have been lost, replaced by infinite storage
Higher Standards • As electronic records become more integrated into society, producers of those records will be held to higher standards of conduct – HIPAA – SOx – Federal and State Mandates – Case Law
WA Public Records Laws As defined in RCW 40.14 ANY records that have been made by or received by any agency of the state of Washington in connection with the transaction of public business
Records Retention Any destruction of official public records shall be pursuant to a schedule approved under RCW 40.14 Why? ... The foundation of democracy in America is government accountability to the people
So the question becomes… who takes care of the records, and do they have the knowledge?
Caretakers of Information • Historically records sent to file room, staff maintained access to records and managed lifecycle based on need and legal requirements • Now records are managed by users and IT staff, based on capacity and cost – Neither trained in the ‘science of information management’
Why a Digital Archives? • Comply with statutory & regulatory mandates. – The law requires preservation of certain public records – it doesn’t specify whether those records are paper or electronic. All records must be given the same care. • Avoid loss of legal & historical records – As technology changes, the older media (5 ¼” floppy disks, for instance) become harder to read. • Centralize Records – Centralization means uniformity in maintenance – ‘Trained professionals’ serve as caretakers • Preserve rare and ‘at-risk’ paper records • Improved access for citizens – By centralizing historical electronic records in one location, ‘one-stop shopping’ will provide the information more quickly and easily
What the Digital Archives is not • Not mass storage for active business applications & data • Not remote back-up for state & local government networks & data
The Digital Archives will: • Preserve electronic records with long-term legal, historical and/or fiscal significance • Assure platform-neutral retrieval 50, 100, or more years from now • Provide security back-up of certain permanent electronic legal records (courts, vital records, land records, etc.)
Project History • 2001 Session – Legislative approval (SSB 6155, 2001-2003 Capital Budget) • January – September 2002 – Building Programming • January 2003 – Building construction begins • September 2003 – ISB technology review • October 2004 – Grand Opening • Q4 2006 – Full implementation
Monies In and Out • Primary funding source – $1 surcharge • Expenditures – $14.5 M joint use facility – $1.5 M technology acquisition – $950,000 Software Development – Ongoing budget of $2.1 M/year
Requirements to E-Archive • Hardware • Software • Management • Authenticity
Hardware • File Room of the 21st century • Capacity and Speed double every 18 months • Many choices – Tape – Optical – Spinning Disc First Immutable Law of Digital Archiving “What hardware you use today will be obsolete within four years”
Digital Archives Hardware • Network – Cisco Backbone end to end – LAN and SAN • EMC – SAN storage – 5 TB now, 20 TB by end of year • HP – Servers and desktops • ADIC – Tape Library for offsite, disaster recovery • Microsoft – Software and Development w/EDS
Archival Software Formats • Native • ASCII • TIF • PDF/A • XML Whenever possible seek the open, documented solution! Remember WordStar and dBase II???
File Formats Digital Archives Multi-pronged approach: Stored as BLOBs in DB with metadata: • Maintain native format, wrapped • Create open file format version • Render XML formatted version, wrapped • Acquire original hardware and software
Content Management • Essential to maintain control of the information explosion • Allows hard coded rules and information exchange • BUT still requires a strong knowledge, understanding and implementation of basic records management Second Immutable Law of Digital Archiving: “Data is Data, a Record is a Record, It is the content that drives retention, not the media”
‘Content Management’ • Not true CM but rather archival storage and retrieval • DoD 5015.2-STD compliant system • Wrap original file in native format • Wrap XML copy • Apply metadata & XML for indexing, searching & retrieval • Provide chain of custody & authenticity
‘Content Management’ • Microsoft solution • Custom-coded .NET front end • SQL Server back end • BizTalk translation utility • SSH Tectia for secure transport
Authenticity • Maintain Chain of Custody • In the care of trusted 3rd party • Received from trusted, known source
Data Security • Encrypted SSH FTP transmission • Issue Digital Certificate • Verify IP and computer information • MD5 Hash on all original files • Copy of FTP on tape prior to ingestion • DB backups on tape • Record Level Security for confidential info
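The ‘MD5 Hash on all original files’ step can be illustrated with a minimal sketch using Python's standard hashlib module. The function names md5_of_bytes, md5_of_file and verify_fixity are hypothetical, chosen only for illustration; the slides do not describe how the Digital Archives actually computes or stores its hashes.

```python
import hashlib


def md5_of_bytes(data):
    """Return the hex MD5 digest of an in-memory byte string."""
    return hashlib.md5(data).hexdigest()


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks
    so that large transfers do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path, recorded_hash):
    """Compare a file's current digest with the hash recorded at transfer
    time (hypothetical workflow: the hash is stored alongside the record)."""
    return md5_of_file(path) == recorded_hash.strip().lower()
```

Recomputing the digest after transmission, and again at any later date, and comparing it with the recorded value shows whether the file has changed since it was received.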
FTP Fingerprint • FTPUpload Date="8/23/2005 9:13:05 AM" NTUserName="temp" Domain="CRISPLUS" SFTPUserName="Franklin. Co. Auditor" HostInformation WindowsVersion="Microsoft Windows NT 5.0.2195.0" • CPU ID="x86 Family 15 Model 2 Stepping 9, GenuineIntel" Level="15" • Local Area Connection: • Connection-specific DNS Suffix: annex.co.franklin.wa.us • Description: Intel(R) PRO/100 VE • Physical Address: 00-0D-60-3C-22-34 • DHCP Enabled: Yes • Autoconfiguration Enabled: Yes • IP Address: 172.30.7.39 • Subnet Mask: 255.0 • DNS Servers: 172.30.7.2, 198.239.73.3 • Primary WINS Server: 172.30.7.2 • Secondary WINS Server: 198.239.73.3
Record Level Security • Restrict records at item, field or series level • Restrict to individual, dept, office or global • Uses authenticated login to reveal fields • Anonymous users see ‘Restricted’
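A minimal sketch of the field-level case, assuming a record is a simple dictionary: restricted fields are shown to authenticated, authorized users and replaced with ‘Restricted’ for everyone else. The function visible_record, its parameters and the sample field names are hypothetical; the real system's data model is not shown in the slides.

```python
RESTRICTED = "Restricted"


def visible_record(record, restricted_fields, user=None, allowed_users=frozenset()):
    """Return a copy of `record` in which restricted fields are masked
    unless `user` is an authenticated member of `allowed_users`.

    `user=None` stands for an anonymous visitor (an assumption of this sketch).
    """
    authorized = user is not None and user in allowed_users
    return {
        field: value if authorized or field not in restricted_fields else RESTRICTED
        for field, value in record.items()
    }


# Usage: an anonymous visitor sees the mask, an authorized clerk sees the value.
sample = {"name": "Jane Doe", "ssn": "000-00-0000"}
print(visible_record(sample, {"ssn"}))
print(visible_record(sample, {"ssn"}, user="clerk", allowed_users={"clerk"}))
```

Restricting at item or series level would follow the same pattern, with the check applied to the whole record or group of records instead of to individual fields.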
Open Record
Restricted Record Confidential
MOU
Ingestion Process • MUST be flexible – No Mandate and 3300 agencies • Microsoft BizTalk 2004 • Transforms, adds metadata based on business rules • Creates ‘deep storage’ copy wrapping original file in XML, with Hash • Creates ‘web’ version of original file
BizTalk 2004
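The ‘deep storage’ idea of wrapping the original file in XML together with its hash could be sketched as follows. This is an illustration under stated assumptions, not the BizTalk implementation: the element names DeepStorageRecord, Metadata, Fixity and OriginalFile are invented here, and base64 is assumed as the way to carry binary content inside XML.

```python
import base64
import hashlib
import xml.etree.ElementTree as ET


def wrap_for_deep_storage(file_name, data, metadata):
    """Wrap the original bytes in an XML envelope with metadata and an MD5 hash.

    `metadata` keys are used as element names, so they must be valid XML names.
    """
    root = ET.Element("DeepStorageRecord")  # hypothetical element name
    meta = ET.SubElement(root, "Metadata")
    for key, value in metadata.items():
        ET.SubElement(meta, key).text = str(value)
    fixity = ET.SubElement(root, "Fixity", algorithm="MD5")
    fixity.text = hashlib.md5(data).hexdigest()
    original = ET.SubElement(root, "OriginalFile", name=file_name, encoding="base64")
    original.text = base64.b64encode(data).decode("ascii")
    return ET.tostring(root, encoding="unicode")


def unwrap(xml_text):
    """Recover the original bytes and confirm they still match the stored hash."""
    root = ET.fromstring(xml_text)
    data = base64.b64decode(root.find("OriginalFile").text or "")
    if hashlib.md5(data).hexdigest() != root.find("Fixity").text:
        raise ValueError("fixity check failed: content does not match stored hash")
    return data
```

Keeping the hash inside the same envelope as the content lets any later reader verify the native file without consulting a separate database.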
Field names: fname, firstname, Fst_name, first → First_Name • Dates: Jun-07-05, 07-Jun-05, 06/07/2005 → 06/07/2005 • Via BizTalk Predefined Pipelines
Deep Storage XML Schema • Record Common: Who, What, Type, When, Where, Original File, ‘web’ file, Security, Fixity • Vital Records – Birth: Date of, Father, Mother, Hospital
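The normalization this slide depicts, where differing agency field names and date formats are mapped onto one canonical form, can be sketched in a few lines. This is not BizTalk code: the alias table, the format list and the function names are assumptions for illustration, and 06/07/2005 is assumed to mean month/day/year.

```python
from datetime import datetime

# Hypothetical alias table: agency-specific column names -> canonical name.
FIELD_ALIASES = {
    "fname": "First_Name",
    "firstname": "First_Name",
    "fst_name": "First_Name",
    "first": "First_Name",
}

# Input formats tried in order; assumes English month abbreviations
# and month/day/year ordering for the slash-separated form.
DATE_FORMATS = ("%b-%d-%y", "%d-%b-%y", "%m/%d/%Y")


def normalize_field(name):
    """Map a known alias to its canonical field name; pass others through."""
    return FIELD_ALIASES.get(name.strip().lower(), name)


def normalize_date(text):
    """Parse a date in any of the known formats and emit MM/DD/YYYY."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).strftime("%m/%d/%Y")
        except ValueError:
            continue
    raise ValueError("unrecognized date format: %r" % text)


for raw in ("Jun-07-05", "07-Jun-05", "06/07/2005"):
    print(raw, "->", normalize_date(raw))
```

Each additional agency then needs only new entries in the two tables, which is one way to stay flexible when there is no mandated submission format.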
Deep Storage XML
Archive Database • Designed around latest industry standards • Open source, non-proprietary file storage • Applies metadata ‘tags’ to save information about record – creator, date, agency, subject, etc. • Provides chain of custody & authenticity of record • Allows search and retrieval of archival records through a web page
Web Design Wire Frame www.digitalarchives.wa.gov
Admin Pages • Requires authenticated log-in • Allows viewing of confidential information • E-Transmittal process • Viewing of open orders
Who’s Visiting??? • Avg over 300 visits per day • Avg length of stay 9 minutes • 6% .gov - 4% .edu - 1% .org • 13% came from Internet Search (Google, MSN, Yahoo) Visitors from: Canada, US Military, Romania, Germany, France, Australia, Japan, UK, Netherlands, Russia, Thailand, Portugal, Belgium, Poland, Italy, Indonesia, Singapore, Sweden, Mexico, New Zealand, Czech Republic, Hungary, Brazil, Norway, Colombia, Austria, Greece, Bulgaria, China, Yugoslavia, Philippines, Spain, South Korea, Denmark, Oman, Pakistan, South Africa, Jamaica, Switzerland
Risks • Distributed, non-standardized environment • No mandate to use Digital Archives • Limited technology expertise in some agencies • Unpredictable data growth rate • Few business models • Emerging technologies • Limited internal expertise
Management Issues • Authenticity of record • Metadata • File naming conventions • Corporate Culture • Start small with e-mail, web page • Use existing retention schedules • Educate • Shift AWAY from desktops • Management Software is a must! • Privacy of sensitive data
Third Immutable Law “Anything that you do today, will need major overhaul in two years” Technology and industry changing at unprecedented rates… But, more records are ‘lost’ every day! – Key is to be flexible and attack with forethought
Digital Archives Eastern Washington University, Cheney, Washington Adam Jansen Digital Archivist ajansen@secstate.wa.gov
Secure FTP
Custom FTP Configuration • Uses SSH Tectia client • 128 Bit Encryption • Ease of use • Minimal user interaction/Intervention • Simple notification • XML log file output • Digital Footprint
Right Click Send to
Drag and Drop
Double Click Send
Notifications • Minimal notification • Minimal user interaction • Easy-to-understand notifications • Quick notification of errors • Easy cleanup of sent files
“No Data” Error
Duplicates
Possible Errors
Completion Delete
E-Commerce
Add to Shopping Cart • Ecommerce Functionality – Add to Shopping cart
Shopping Cart
Shipping Info
Billing Information
View and Submit Order
Confirmation
Order Request