  • Number of slides: 61

Trials and Tribulations: Archiving Electronic Records
Adam Jansen
Digital Archivist
Washington State Archives

Records and Information or, Why we do what we do
If - Information is power…
And - Records are storage of information
Then - Records must be preserved for future generations

Shifting Media
• Historically records were stored on paper, kept in filing cabinets
  – When the cabinet was full, records were sent to the file room
• Now records are stored electronically on computers
  – When the computer is ‘full’, add more hard drives
Basic skills to manage and maintain records have been lost, replaced by infinite storage

Higher Standards
• As electronic records become more integrated into society, producers of those records will be held to higher standards of conduct
  – HIPAA
  – SOx
  – Federal and State Mandates
  – Case Law

WA Public Records Laws
As defined in RCW 40.14: ANY records that have been made by or received by any agency of the state of Washington in connection with the transaction of public business

Records Retention
Any destruction of official public records shall be pursuant to a schedule approved under RCW 40.14
Why? ... The foundation of democracy in America is government accountability to the people

So the question becomes… who takes care of the records, and do they have the knowledge?

Caretakers of Information
• Historically records sent to file room, staff maintained access to records and managed lifecycle based on need and legal requirements
• Now records are managed by users and IT staff, based on capacity and cost
  – Neither trained in the ‘science of information management’

Why a Digital Archives?
• Comply with statutory & regulatory mandates.
  – The Law requires preservation of certain public records – it doesn’t specify whether those records are paper or electronic. All records must be given the same care.
• Avoid loss of legal & historical records
  – As technology changes, the older media (5 ¼” floppy disks, for instance) become harder to read.
• Centralize Records
  – Centralization means uniformity in maintenance
  – ‘Trained professionals’ serve as caretakers
• Preserve rare and ‘at-risk’ paper records
• Improved access for citizens
  – By centralizing historical electronic records in one location, ‘one-stop shopping’ will provide the information quicker and easier

What the Digital Archives is not
• Not mass storage for active business applications & data
• Not remote back-up for state & local government networks & data

The Digital Archives will:
• Preserve electronic records with long-term legal, historical and/or fiscal significance
• Assure platform-neutral retrieval 50, 100, or more years from now
• Provide security back-up of certain permanent electronic legal records (courts, vital records, land records, etc.)

Project History
• 2001 Session – Legislative approval (SSB 6155, 2001-2003 Capital Budget)
• January – September 2002 – Building Programming
• January 2003 – Building construction begins
• September 2003 – ISB technology review
• October 2004 – Grand Opening
• Q4 2006 – Full implementation

Monies In and Out
• Primary funding source - $1 surcharge
• Expenditures
  – $14.5M joint use facility
  – $1.5M technology acquisition
  – $950,000 Software Development
  – Ongoing budget of $2.1M/year

Requirements to E-Archive
• Hardware
• Software
• Management
• Authenticity

Hardware
• File Room of the 21st century
• Capacity and Speed double every 18 months
• Many choices
  – Tape
  – Optical
  – Spinning Disc
First Immutable Law of Digital Archiving: “What hardware you use today will be obsolete within four years”
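As a rough worked example of how the doubling claim relates to the four-year obsolescence horizon, assume capacity grows by steady doubling every 18 months (an idealization; the symbols C_0 for today's capacity and t for elapsed months are introduced here only for illustration):

```latex
C(t) = C_0 \cdot 2^{t/18}
\qquad\Longrightarrow\qquad
\frac{C(48)}{C_0} = 2^{48/18} = 2^{8/3} \approx 6.35
```

Under that assumption, hardware bought four years (48 months) from now would offer roughly six times the capacity of what is bought today at a comparable price point.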

Digital Archives Hardware
• Network – Cisco Backbone end to end
  – LAN and SAN
• EMC – SAN storage
  – 5 TB now, 20 TB by end of year
• HP – Servers and desktops
• ADIC – Tape Library for offsite, disaster recovery
• Microsoft – Software and Development w/EDS

Archival Software Formats
• Native
• ASCII
• TIF
• PDF/A
• XML
Whenever possible seek the Open, documented solution!
Remember WordStar and DBase II???

File Formats
Digital Archives multi-pronged approach, stored as BLOBs in DB with metadata:
• Maintain native format, wrapped
• Create open file format version
• Render XML formatted version, wrapped
• Acquire original hardware and software
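A minimal sketch of the "BLOBs in DB with metadata" idea. The Digital Archives runs on SQL Server; Python's built-in sqlite3 is swapped in here purely so the example is self-contained, and the table layout and the names `open_archive`, `store_record` and `fetch_record` are hypothetical, not the archive's actual schema.

```python
import hashlib
import sqlite3


def open_archive(path=":memory:"):
    """Open (or create) a toy archive database. Hypothetical schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS records (
               id          INTEGER PRIMARY KEY,
               agency      TEXT NOT NULL,
               title       TEXT NOT NULL,
               file_format TEXT NOT NULL,
               md5         TEXT NOT NULL,
               content     BLOB NOT NULL
           )"""
    )
    return conn


def store_record(conn, agency, title, file_format, content):
    """Store the file bytes as a BLOB next to its descriptive metadata."""
    digest = hashlib.md5(content).hexdigest()
    cur = conn.execute(
        "INSERT INTO records (agency, title, file_format, md5, content) "
        "VALUES (?, ?, ?, ?, ?)",
        (agency, title, file_format, digest, content),
    )
    conn.commit()
    return cur.lastrowid


def fetch_record(conn, record_id):
    """Return the record, re-checking the stored hash before handing it out."""
    row = conn.execute(
        "SELECT agency, title, file_format, md5, content "
        "FROM records WHERE id = ?",
        (record_id,),
    ).fetchone()
    if row is None:
        raise KeyError(record_id)
    agency, title, file_format, md5, content = row
    if hashlib.md5(content).hexdigest() != md5:
        raise ValueError("stored content no longer matches its hash")
    return {"agency": agency, "title": title,
            "file_format": file_format, "content": content}
```

Keeping the native file, an open-format rendering and an XML version would simply mean several rows (or several BLOB columns) per record; the sketch stores one version to keep it short.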

Content Management
• Essential to maintain control of the information explosion
• Allows hard coded rules and information exchange
• BUT still requires a strong knowledge, understanding and implementation of basic records management
Second Immutable Law of Digital Archiving: “Data is Data, a Record is a Record, It is the content that drives retention, not the media”

‘Content Management’
• Not true CM but rather archival storage and retrieval
• DoD 5015.2-STD compliant system
• Wrap original file in native format
• Wrap XML copy
• Apply metadata & XML for indexing, searching & retrieval
• Provide chain of custody & authenticity

‘Content Management’
• Microsoft Solution
• Custom coded .NET front end
• SQL Server back end
• BizTalk translation utility
• SSH Tectia for secure transport

Authenticity
• Maintain Chain of Custody
• In the care of trusted 3rd party
• Received from trusted, known source

Data Security
• Encrypted SSH FTP transmission
• Issue Digital Certificate
• Verify IP and computer information
• MD5 Hash on all original files
• Copy of FTP on tape prior to ingestion
• DB backups on tape
• Record Level Security for confidential Info
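The hash step in the list above can be sketched as follows. This is an illustration only, assuming the hash is computed over the raw bytes of each original file; the function names `md5_of_file` and `verify_fixity` are hypothetical. MD5 is used because it is what the slide names; it is known to be weak against deliberate collisions, so a newer system would likely pick a stronger digest.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path, expected_md5):
    """True if the file on disk still matches the hash recorded at transfer time."""
    return md5_of_file(path) == expected_md5.lower()
```

Recording the digest when the file is first received and re-running `verify_fixity` after each copy (FTP landing area, tape, database) gives a simple check that the bytes have not changed along the way.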

FTP Fingerprint
• FTPUpload Date="8/23/2005 9:13:05 AM" NTUserName="temp" Domain="CRISPLUS" SFTPUserName="Franklin. Co. Auditor"
• HostInformation WindowsVersion="Microsoft Windows NT 5.0.2195.0"
• CPU ID="x86 Family 15 Model 2 Stepping 9, GenuineIntel" Level="15"
• Local Area Connection:
• Connection-specific DNS Suffix: annex.co.franklin.wa.us
• Description: Intel(R) PRO/100 VE
• Physical Address: 00-0D-60-3C-22-34
• DHCP Enabled: Yes
• Autoconfiguration Enabled: Yes
• IP Address: 172.30.7.39
• Subnet Mask: 255. 0
• DNS Servers: 172.30.7.2, 198.239.73.3
• Primary WINS Server: 172.30.7.2
• Secondary WINS Server: 198.239.73.3

Record Level Security
• Restrict records at item, field or series level
• Restrict to individual, dept, office or global
• Uses authenticated login to reveal fields
• Anonymous users see ‘Restricted’
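A minimal sketch of the field-level case described above, assuming a record is a simple mapping of field names to values. The function `visible_fields`, its parameters and the sample field names are hypothetical; item-level, series-level and department/office scoping are left out to keep the example short.

```python
RESTRICTED = "Restricted"


def visible_fields(record, restricted_fields, user=None, allowed_users=frozenset()):
    """Return a copy of the record with restricted fields masked.

    A viewer sees the real value only when logged in (user is not None)
    and listed in allowed_users; everyone else sees the placeholder text.
    """
    authorized = user is not None and user in allowed_users
    return {
        name: value if authorized or name not in restricted_fields else RESTRICTED
        for name, value in record.items()
    }
```

For example, `visible_fields({"name": "Jane Doe", "ssn": "000-00-0000"}, {"ssn"})` leaves the name readable and shows the placeholder for the second field, while passing an authorized `user` returns the record unchanged.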

Open Record

Restricted Record
Confidential

MOU

Ingestion Process
• MUST be flexible – No Mandate and 3300 agencies
• Microsoft BizTalk 2004
• Transforms, adds metadata based on business rules
• Creates ‘deep storage’ copy wrapping original file in XML, with Hash
• Creates ‘web’ version of original file
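The "deep storage" step above, wrapping the original file in XML together with a hash, might look roughly like this. It is a sketch under stated assumptions: the element names (`Record`, `Metadata`, `Field`, `OriginalFile`, `Fixity`), the base64 encoding of the payload and the function names are all hypothetical, and the real system performs this step in BizTalk rather than Python.

```python
import base64
import hashlib
import xml.etree.ElementTree as ET


def wrap_in_xml(filename, content, metadata):
    """Wrap raw file bytes and their metadata in a single XML document."""
    root = ET.Element("Record")
    meta = ET.SubElement(root, "Metadata")
    for key, value in metadata.items():
        ET.SubElement(meta, "Field", name=key).text = value
    original = ET.SubElement(root, "OriginalFile", name=filename, encoding="base64")
    original.text = base64.b64encode(content).decode("ascii")
    # The hash is taken over the original bytes, not over the base64 text.
    ET.SubElement(root, "Fixity", algorithm="MD5").text = hashlib.md5(content).hexdigest()
    return ET.tostring(root, encoding="unicode")


def unwrap_from_xml(xml_text):
    """Recover the original bytes and verify them against the stored hash."""
    root = ET.fromstring(xml_text)
    content = base64.b64decode(root.find("OriginalFile").text or "")
    if hashlib.md5(content).hexdigest() != root.find("Fixity").text:
        raise ValueError("fixity check failed")
    return content
```

Because the wrapper is plain XML, it can be read without the archive's own software, which fits the platform-neutral retrieval goal stated earlier in the deck.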

BizTalk 2004

BizTalk Predefined Pipelines
• Field names: fname, firstname, Fst_name, first → First_Name
• Dates: Jun-07-05, 07-Jun-05, 06/07/2005 → 06/07/2005
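The field-name and date variants on this slide suggest a normalization step: many agency spellings map to one canonical name, and many date layouts map to one format. A minimal Python sketch of that idea follows; the alias table, the list of accepted date formats and the function names are assumptions for illustration (the production system does this in BizTalk pipelines), and slash-separated dates are assumed to be month/day/year.

```python
from datetime import datetime

# Hypothetical alias table: agency-specific spellings -> canonical field name.
FIELD_ALIASES = {
    "fname": "First_Name",
    "firstname": "First_Name",
    "fst_name": "First_Name",
    "first": "First_Name",
}

# Date layouts tried in order; %b expects English month abbreviations.
DATE_FORMATS = ("%b-%d-%y", "%d-%b-%y", "%m/%d/%Y")


def normalize_field_name(name):
    """Map a known alias to its canonical name; leave unknown names untouched."""
    return FIELD_ALIASES.get(name.strip().lower(), name)


def normalize_date(text):
    """Parse any accepted layout and re-emit it as MM/DD/YYYY."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).strftime("%m/%d/%Y")
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")
```

Keeping the aliases and formats in data tables rather than in code means a new agency's conventions can be added without touching the parsing logic, which matters when there is no mandate forcing agencies onto one layout.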

Deep Storage XML Schema
Record
• Common: Who, What, When, Where, Original File, ‘web’ file, Security, Fixity
• Vital Records: Type
• Birth: Date of, Father, Mother, Hospital

Deep Storage XML

Archive Database
• Designed around latest industry standards
• Open source, non-proprietary file storage
• Applies metadata ‘tags’ to save information about record – creator, date, agency, subject, etc.
• Provides chain of custody & authenticity of record
• Allows search and retrieval of archival records through a web page

Web Design Wire Frame
www.digitalarchives.wa.gov

Admin Pages
• Requires authenticated log-in
• Allows viewing of confidential information
• E-Transmittal process
• Viewing of open orders

Who’s Visiting???
• Avg over 300 visits per day
• Avg length of stay 9 minutes
• 6% .gov - 4% .edu - 1% .org
• 13% came from Internet Search (Google, MSN, Yahoo)
• Visitors from: Canada, US Military, Romania, Germany, France, Australia, Japan, UK, Netherlands, Russia, Thailand, Portugal, Belgium, Poland, Italy, Indonesia, Singapore, Sweden, Mexico, New Zealand, Czech Republic, Hungary, Brazil, Norway, Colombia, Austria, Greece, Bulgaria, China, Yugoslavia, Philippines, Spain, South Korea, Denmark, Oman, Pakistan, South Africa, Jamaica, Switzerland

Risks
• Distributed, non-standardized environment
• No mandate to use Digital Archives
• Limited technology expertise in some agencies
• Unpredictable data growth rate
• Few business models
• Emerging technologies
• Limited internal expertise

Management Issues
• Authenticity of record
• Metadata
• File naming conventions
• Corporate Culture
• Start small with e-mail, web page
• Use existing retention schedules
• Educate
• Shift AWAY from desktops
• Management Software is a must!
• Privacy of sensitive data

Third Immutable Law
“Anything that you do today, will need major overhaul in two years”
Technology and industry changing at unprecedented rates…
But, more records are ‘lost’ every day!
  – Key is to be flexible and attack with forethought

Digital Archives
Eastern Washington University, Cheney, Washington
Adam Jansen
Digital Archivist
ajansen@secstate.wa.gov

Secure FTP

Custom FTP Configuration
• Uses SSH Tectia client
• 128 Bit Encryption
• Ease of use
• Minimal user interaction/intervention
• Simple notification
• XML log file output
• Digital Footprint

Right Click Send to

Drag and Drop

Double Click Send

Notifications
• Minimal notification
• Minimal user interaction
• Ease of understanding of notification
• Quick notification of errors
• Ease of cleanup of sent files

“No Data” Error

Duplicates

Possible Errors

Completion Delete

E-Commerce

Add to Shopping Cart
• Ecommerce Functionality
  – Add to Shopping cart

Shopping Cart

Shipping Info

Billing Information

View and Submit Order

Confirmation

Order Request