- Number of slides: 67
ISO “Reference Model For an Open Archival Information System (OAIS)”
Tutorial Presentation for DADs Workshop
Prepared by: Lou Reich (CSC) and Don Sawyer (NASA)
June 22, 1998
10041267 M-1
Overview of Tutorial
- Background Information
- OAIS Purpose, Scope and Applicability
- OAIS Concepts
- OAIS Responsibilities
- OAIS Detailed Reference Model
  – Archival Information Model
  – Functional Model
- Analysis of Archive Issues Using OAIS RM
  – Archive Associations
  – Migration
- Summary
Background Information
Genesis of the Effort (1)
- The international context:
  – Multiplication of data holding sites
  – Rapidly growing needs for the long-term preservation of digital data
  – Incomplete understanding of the digital preservation issues
  – Constant technological changes and lack of standards in the area of digital archiving
- Initial framework: ISO Technical Committee (TC) 20, Aircraft and Space Vehicles, and its Sub-Committee (SC) 13, Space Data and Information Transfer Systems
- Proposal:
  – Define an Archive reference model and its Services categories
  – Address data used in conjunction with space missions
  – Address intermediate and indefinite long-term storage of digital data
  – First step before developing specific standards needed to support archive services
Genesis of the Effort (2)
- Proposal made to Consultative Committee for Space Data Systems (CCSDS) and ISO TC 20/SC 13
  – Develop a ‘Reference Model’ to establish common terms and concepts
  – Ensure broad participation, including traditional archives (not restricted to space communities; all participation is welcome!)
  – Focus on data in electronic forms, but recognize that other forms exist in most archives
  – Follow up with additional archive standards efforts as appropriate
- Impact of both CCSDS and ISO procedures
  – The CCSDS reference model will become an ISO standard
Status of the Effort (Updated 1998-10-10)
- Reference Model will be submitted as a draft international standard in December 1998
  – Current version is White Book 4.0, available under the Reference Materials heading at:
  – http://ssdoo.gsfc.nasa.gov/nost/isoas/overview.html
- Widest possible exposure is desirable now
  – Participation is still welcome
- Full CCSDS and ISO standard by June 1999
Open Archival Information System (OAIS)
- Open
  – Reference Model standard(s) are developed using a public process and are freely available
- Information
  – Any type of knowledge that can be exchanged
  – Independent of the forms (i.e., physical or digital) used to represent the information
  – Data are the representation forms of information
- Archival Information System
  – Hardware, software, and people who are responsible for the acquisition, preservation and dissemination of the information
  – Additional OAIS responsibilities are identified later and are more fully defined in the Reference Model document
Document Organization
- Introduction
  – Purpose and Scope, Applicability, Rationale, Road Map for Future Work, Document Structure, and Definitions of Terms
- OAIS Concepts
  – High level view of OAIS functionality and information models
  – OAIS external environment
  – Minimum responsibilities to become an “OAIS”
- Detailed Models
  – Functional model descriptions and information model perspectives
- Migration perspectives
  – Media migration, compression, and format conversions
- Archive Cooperation
  – Criteria to distinguish types of cooperation among archives
- Annexes
  – Scenarios of existing archives, compatibility with other standards
Purpose, Scope and Applicability
Purpose
- Framework for understanding and applying concepts needed for long-term digital information preservation
  – Long-term is long enough to be concerned about changing technologies
  – Starting point for model addressing non-digital information
- Provides set of minimal responsibilities to distinguish an OAIS from other uses of ‘archive’
- Framework for comparing architectures and operations of existing and future archives
- Basis for comparing data models of digital information preserved by archives, including their changes over time
- Expands consensus on elements and processes needed for long-term digital information preservation
- Guides the identification of future OAIS standards
- Does NOT specify any implementation
Scope
- Addresses minimal responsibilities of an OAIS
- Addresses a full range of archival functions including Ingest, Archival Storage, Data Management, Access and Dissemination
  – Internal and external interfaces
  – High level services at the interfaces
- Addresses the data models used to represent information
- Addresses migration of digital information to new media and new forms
- Addresses interoperability among OAIS archives
- Provides illustrative examples for context
Applicability
- Organizations with responsibility to preserve information over the long term
- Producers of information that may need long-term preservation
- Consumers of information from long-term archives
- May be applicable to organizations maintaining temporary archives
  – Rapid technology changes may drive the same preservation issues
  – Information held may need long-term preservation
  – Functional and information modeling concepts may be useful
  – The access models are applicable to all entities
- Standards developers as basis for future standards
OAIS Concepts
Model View of an OAIS’s Environment
[Figure: the OAIS (archive) with Producer, Consumer, and Management around it.]
- Producer is the role played by those persons, or client systems, who provide the information to be preserved
- Management is the role played by those who set overall OAIS policy as one component in a broader policy domain
- Consumer is the role played by those persons, or client systems, who interact with OAIS services to find and acquire preserved information of interest
OAIS Information Definition
- Information is defined as any type of knowledge that can be exchanged, and this information is always expressed (i.e., represented) by some type of data
- In general, it can be said that “Data interpreted using its Representation Information yields Information”
- In order for this Information Object to be successfully preserved, it is critical for an archive to clearly identify and understand the Data Object and its associated Representation Information
[Figure: Data Object, interpreted using its Representation Information, yields Information Object.]
Information Package Definition
[Figure: an Information Package containing Content Information and Preservation Description Information.]
- An Information Package is a conceptual container of two types of information called Content Information and Preservation Description Information (PDI)
Information Package Variants
- Submission Information Package
  – Negotiated between Producer and OAIS
  – Sent to OAIS by a Producer
- Archival Information Package
  – Information Package used for preservation
  – Includes complete set of Preservation Description Information for the Content Information
- Dissemination Information Package
  – Includes part or all of one or more Archival Information Packages
  – Sent to a Consumer by the OAIS
External Data Flow Diagram
[Figure: the Producer sends Submission Information Packages to the OAIS, which holds Archival Information Packages; the Consumer sends queries and orders to the OAIS and receives query responses and Dissemination Information Packages. The legend distinguishes entities, Information Packages, Data Objects, and data flows.]
OAIS Responsibilities
OAIS Responsibilities
- Accepts Information Packages from information producers
- Assumes sufficient control to ensure long-term preservation
- Determines which communities need to be able to understand the preserved information
- Ensures the information to be preserved is independently understandable to the Designated Communities
- Follows documented policies and procedures which ensure the information is preserved against all reasonable contingencies
- Makes the preserved information available to the Designated Communities in forms understandable to those communities
Detailed Models: Overview
Overview of Detailed Models
- It was decided to do both a functional and an information model of the OAIS
- Both models were tasked to:
  – Use the models to better communicate OAIS Concepts
  – Use a well established, formal modeling technique
  – Stay as implementation independent as possible
  – Avoid detailed designs
Detailed Models: Information Model
General Principles
- Define classes of “information objects” that illustrate information necessary to enable long-term storage and access to Archives
- The class definition should be implementation independent
- Use a variant of Object Modeling Technique (OMT) as a notation
OMT Notation Overview
[Figure: legend of the OMT notation, showing a class box; multiplicity of associations (exactly one, many (zero or more), optional (zero or one), one or more); aggregation (an assembly class with part classes); specialization (a parent class with child classes); and a named association between two classes.]
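Information Objects
[Figure: OMT class diagram. An Information Object consists of a Data Object interpreted using one or more Representation Information objects; a Data Object is either a Physical Object or a Digital Object; a Digital Object consists of one or more Bit Sequences; Representation Information is itself interpreted using further Representation Information.]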
Representation Information
- The Representation Information accompanying a physical object like a moon rock may give additional meaning, as a result of some analysis, to the physically observable attributes of the rock
- The Representation Information accompanying a digital object, or sequence of bits, is used to provide additional meaning. It typically maps the bits into commonly recognized data types such as character, integer, and real and into groups of these data types. It associates these with higher level meanings which can have complex interrelationships that are also described
Classes of Representation Information
- Structural Information: applied to turn bit sequences into common computer data types, aggregations of these data types, and mapping rules which map from the underlying data types to the higher level structures needed to understand the Digital Object
- Semantic Information: includes meanings associated with all the elements of the structural information, operations that may be performed on each data type, and their inter-relationships
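The two classes above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 10-byte Data Object, the field names, the units, and the `interpret` helper are all invented here and do not come from the Reference Model.

```python
import struct

# Hypothetical Data Object: a 2-byte unsigned integer followed by an
# 8-byte IEEE 754 double, both big-endian (10 bytes in total).
data_object = struct.pack(">Hd", 42, 21.5)

# Structural Information: how to turn the bit sequence into data types.
structural_info = {"format": ">Hd", "fields": ["sensor_id", "temperature"]}

# Semantic Information: what each field means (invented for this sketch).
semantic_info = {
    "sensor_id": "instrument sensor number",
    "temperature": "detector temperature in kelvin",
}


def interpret(data: bytes, structure: dict, semantics: dict) -> dict:
    """Apply Representation Information to a Data Object to yield information."""
    values = struct.unpack(structure["format"], data)
    return {
        name: {"value": value, "meaning": semantics[name]}
        for name, value in zip(structure["fields"], values)
    }


information = interpret(data_object, structural_info, semantic_info)
```

Without `structural_info` the ten bytes are only a bit sequence, and without `semantic_info` the decoded numbers have no meaning, which is why an archive would need to preserve both alongside the data.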
Recursive Nature of Representation Information
- Preexisting standards that define primitive data types
- Mapping rules that map those primitive data types into the more complex data-type concepts used by the Data Object
- Other semantic information that aids in the understanding of the Data, such as a Data Dictionary
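Because Representation Information is itself information, it may need further Representation Information until the chain reaches standards that are already understood. A minimal sketch of walking such a network follows; the network contents and the `required_representation` helper are hypothetical.

```python
# Hypothetical representation network: each entry lists the Representation
# Information needed to understand it. Entries with no dependencies stand in
# for preexisting standards that are assumed to be understood already.
representation_net = {
    "mission_data_file": ["record_layout", "data_dictionary"],
    "record_layout": ["IEEE 754", "ASCII"],
    "data_dictionary": ["ASCII"],
    "IEEE 754": [],
    "ASCII": [],
}


def required_representation(name, net, seen=None):
    """Return every item needed, directly or indirectly, to interpret `name`."""
    seen = set() if seen is None else seen
    for dependency in net[name]:
        if dependency not in seen:
            seen.add(dependency)
            required_representation(dependency, net, seen)
    return seen


needed = required_representation("mission_data_file", representation_net)
```

The `seen` set keeps the recursion from revisiting shared entries such as `ASCII`, so the walk also terminates if the network happens to contain a cycle.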
Sample Representation Net
Types of Information Used in OAIS
Content Information
- The information which is the primary object of preservation
- An instance of Content Information is the information that an archive is tasked to preserve
- Deciding what is the Content Information may not be obvious and may need to be negotiated with the Producer
- The Data Object in the Content Information may be either a Digital Object or a Physical Object (e.g., a physical sample, microfilm)
Preservation Description Information
- Provenance Information
  – Describes the source of Content Information, who has had custody of it, what is its history
- Context Information
  – Describes how the Content Information relates to other information outside the Information Package
- Reference Information
  – Provides one or more identifiers, or systems of identifiers, by which the Content Information may be uniquely identified
- Fixity Information
  – Protects the Content Information from undocumented alteration
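Fixity Information is the easiest of the four to illustrate in code. The sketch below records a CRC and a digest for some Content Information and later checks them. The function names are hypothetical, and the choice of CRC-32 plus SHA-256 is an assumption made for this example, not something the tutorial prescribes.

```python
import hashlib
import zlib


def make_fixity(content: bytes) -> dict:
    """Compute Fixity Information for a piece of Content Information."""
    return {
        "crc32": zlib.crc32(content),
        "sha256": hashlib.sha256(content).hexdigest(),
    }


def verify_fixity(content: bytes, fixity: dict) -> bool:
    """Return True if the content still matches its recorded fixity values."""
    return make_fixity(content) == fixity


content = b"calibrated sensor readings"
fixity = make_fixity(content)          # stored in the PDI at ingest time
unchanged = verify_fixity(content, fixity)
altered = verify_fixity(content + b"!", fixity)
```

A check like this detects alteration but does not say who made it or why; that record belongs to Provenance Information.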
Example of Preservation Description Information
Content Information Type: Space Science Data
- Reference: Object Identifier; Journal Reference; Mission, instrument, and title attribute set
- Provenance: Instrument Description; Processing History; Sensor Description; Instrument; Instrument mode; Decommutation map; Software Interface Specifications
- Context: Calibration history; Related data sets; Mission; Funding history
- Fixity: CRC; Checksum; Reed-Solomon coding
Content Information Type: Bibliographic Information
- Reference: ISBN; Title; Author
- Provenance: Printing history; Copyright; Position in series; Manuscripts; References
- Context: Related References; Dewey Decimal System; Publishing Data; Publisher; Author
- Fixity: Digital signature; Cover
Content Information Type: Software Package
- Reference: Name; Author; Version number; Serial Number
- Provenance: Revision History; License holder; Registration; Copyright
- Context: Help file; User Guide; Related Software; Language
- Fixity: Certificate; Checksum; Encryption; CRC
Descriptive Information
- Contains the data that serves as the input to documents or applications called Access Aids
- Access Aids can be used by a consumer to locate, analyze, retrieve, or order information from the OAIS
Packaging Information
- Information which, either actually or logically, binds and relates the components of the package into an identifiable entity on specific media
- Examples of Packaging Information include tape marks, directory structures and filenames
OAIS Archival Information Package
[Figure: an Archival Information Package (AIP) contains Content Information that is further described by Preservation Description Information (PDI); the AIP is delimited by Packaging Information, and a Descriptor is derived from the AIP.]
- Descriptor, e.g., information supporting customer searches for the AIP
- Packaging Information, e.g., how to find Content Information and PDI on some medium
- Content Information, e.g.:
  – Hardcopy document
  – Document as an electronic file together with its format description
  – Scientific data set consisting of images and text in three electronic files together with format descriptions
- Preservation Description Information (PDI), e.g.:
  – How the Content Information came into being, who has held it, how it relates to other information, and how its integrity is assured
AIP Types
- Based on the difference in Content Object complexity
- Archival Information Units (AIUs) contain a single Data Object as the Content Object
- Archival Information Collections (AICs) contain multiple AIPs in their Content Objects
  – Each member of an AIC is an AIP containing Content Information and PDI
  – The AIC contains unique PDI on the collection process
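The unit/collection distinction is a simple recursive structure. The classes below are a hypothetical sketch of it: the field names and the `count_units` helper are assumptions for illustration, and the Reference Model itself does not specify any implementation.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class AIU:
    """Archival Information Unit: one Data Object as its Content Object."""
    data_object: bytes
    pdi: dict = field(default_factory=dict)


@dataclass
class AIC:
    """Archival Information Collection: its Content Object is a set of AIPs."""
    members: List[Union[AIU, "AIC"]] = field(default_factory=list)
    pdi: dict = field(default_factory=dict)  # PDI about the collection process


def count_units(aip: Union[AIU, AIC]) -> int:
    """Count the AIUs reachable from an AIP, descending into nested AICs."""
    if isinstance(aip, AIU):
        return 1
    return sum(count_units(member) for member in aip.members)


collection = AIC(
    members=[AIU(b"image"), AIC(members=[AIU(b"text-1"), AIU(b"text-2")])],
    pdi={"provenance": "assembled for a hypothetical mission catalogue"},
)
total_units = count_units(collection)
```

Each member keeps its own PDI, while the collection carries separate PDI describing how it was assembled, matching the two sub-points above.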
Package Descriptors and Access Aids
- Package Descriptors are needed by an OAIS to provide visibility and access to the OAIS holdings
- Package Descriptors contain one or more Associated Descriptions which describe the AIP Content Information from the point of view of a single Access Aid
- Some examples of Access Aids include:
  – Finding Aids: assist the consumer in locating information of interest
  – Ordering Aids: allow the consumer to discover the cost of and order AIUs of interest
  – Retrieval Aids: enable authorized users to retrieve the AIU described by the Unit Descriptor from Archival Storage
Model of All Data Objects Stored in Data Management
Information Model Summary
- Presented a model of information objects as containing data objects and representation objects
- Classified information required for long-term archiving into 4 classes: Content Information, PDI, Packaging Information and Descriptive Information
- Described how these classes would be aggregated and related in an AIP to fully describe an instance of Content Information
- Presented information needed for Access, in addition to that needed for long-term preservation
- Put the Access-oriented structures in the context of the other data needed to operate an OAIS
Detailed Models: Functional View
General Principles
- Highlight the major functional areas important to digital archiving
- Use functional decomposition to clarify the range of functionality that might be encountered
  – Don't decompose beyond two levels to avoid becoming too implementation dependent
  – Provide a useful set of terms and concepts
  – Do not imply that all archives need to implement all the sub-functions
- Identify some common services which are likely to be needed, and are assumed to be available, as underlying support
Common Services
- Modern, distributed computing applications assume a number of supporting services
- Examples of Common Services include:
  – inter-process communication
  – name services
  – temporary storage allocation
  – exception handling
  – security
  – file and directory services
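OAIS Functional Entities
[Figure: the Producer sends SIPs to Ingest; Ingest passes AIPs to Archival Storage and DI to Data Management; Access receives requests from the Consumer, draws on Data Management and Archival Storage, and returns DIPs and other information to the Consumer; Administration spans these entities and interfaces with Management.]
- SIP = Submission Information Package
- AIP = Archival Information Package
- DIP = Dissemination Information Package
- DI = Descriptive Information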
Functional Entities In An OAIS
- Ingest: This entity provides the services and functions to accept Submission Information Packages (SIPs) from Producers and prepare the contents for storage and management within the archive
- Archival Storage: This entity provides the services and functions for the storage, maintenance and retrieval of Archival Information Packages
- Data Management: This entity provides the services and functions for populating, maintaining, and accessing both descriptive information which identifies and documents archive holdings and internal archive administrative data
- Administration: This entity manages the overall operation of the archive system
- Access: This entity supports consumers in determining the existence, description, location and availability of information stored in the OAIS and allowing consumers to request and receive information products
Ingest Functions
- Schedule Submission Delivery: negotiates a data submission schedule with the producer
- Receive Submission: provides the appropriate storage capability or devices to receive a SIP from the producer. The Receive Submission function may represent a legal transfer of custody for the Content Information (CI) in the SIP, and may require that special access controls be placed on the contents
- Generate Archival Information Package: transforms one or more SIPs into one or more AIPs that conform to the internal data model of the archive
- Generate Descriptive Information: extracts Descriptive Information from the AIPs to populate the data management system
- Coordinate Updates: responsible for transferring the AIPs to Archival Storage and the Descriptive Information to Data Management
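The last three Ingest functions form a small pipeline, sketched below. Everything here is an assumption for illustration: packages are plain dictionaries, Archival Storage and Data Management are in-memory dictionaries, and the function names simply mirror the slide.

```python
def generate_aip(sip: dict) -> dict:
    """Transform a SIP into an AIP matching a (hypothetical) internal data model."""
    return {
        "content_information": sip["content"],
        "pdi": {"provenance": f"submitted by {sip['producer']}"},
    }


def generate_descriptive_information(aip: dict, aip_id: str) -> dict:
    """Extract Descriptive Information from an AIP for Data Management."""
    return {
        "aip_id": aip_id,
        "size_bytes": len(aip["content_information"]),
        "provenance": aip["pdi"]["provenance"],
    }


def coordinate_updates(aip_id, aip, descriptive_info, archival_storage, data_management):
    """Send the AIP to Archival Storage and its description to Data Management."""
    archival_storage[aip_id] = aip
    data_management[aip_id] = descriptive_info


def ingest(sip, aip_id, archival_storage, data_management):
    """Run the three steps in order for a single SIP."""
    aip = generate_aip(sip)
    descriptive_info = generate_descriptive_information(aip, aip_id)
    coordinate_updates(aip_id, aip, descriptive_info, archival_storage, data_management)
    return aip_id


archival_storage, data_management = {}, {}
sip = {"producer": "Mission X", "content": b"\x00\x01\x02"}
ingest(sip, "aip-0001", archival_storage, data_management)
```

Keeping the two stores separate reflects the split in the functional model: Archival Storage holds the packages, while Data Management holds only what is needed to find and describe them.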
Archival Storage Functions
- Receive Data: receives a transfer request and an Archival Information Package from the staging area and moves the data to permanent storage within the archive
- Manage Storage Hierarchy: positions the contents of the Archival Information Packages (AIPs) on the appropriate media based on directions from Ingest (transfer request), administrative policies or usage statistics
- Refresh Media: provides the capability to reproduce the Archive Holdings over time
- Error Checking: provides statistically acceptable assurance that no components of the Archival Information Package are corrupted during any internal archive data transfer or transformation
- Disaster Recovery: provides a mechanism for producing duplicate copies of AIPs (AIUs and AICs) in the archive collection
- Provide Data: provides copies of stored AIPs to Access
Data Management Functions
- Administer Database: responsible for maintaining the integrity of the Data Management database
- Perform Queries: receives a query request from Access and Dissemination and executes the query to generate a result set that is transmitted to the requester
- Generate Report: receives a report request and executes any queries or other processes necessary to generate the report, then supplies the report to the requester
- Receive Database Update: adds, modifies or deletes information in the Data Management persistent storage
- Activate Request: maintains a record of subscription requests and periodically compares it to the contents of the archive to determine if all needed data is available. If needed data is available, this function generates a Dissemination Request which is sent to Access. This function can also generate Dissemination Requests on a periodic basis
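The Activate Request comparison can be sketched as a set check. The subscription records, identifiers, and the `activate_requests` helper below are hypothetical; the sketch covers only the availability test, not the periodic scheduling.

```python
def activate_requests(subscriptions, holdings):
    """Build a Dissemination Request for each subscription whose data is all held."""
    requests = []
    for subscription in subscriptions:
        needed = set(subscription["needed_aips"])
        if needed <= holdings:  # every needed AIP is already in the archive
            requests.append(
                {"consumer": subscription["consumer"], "aips": sorted(needed)}
            )
    return requests


subscriptions = [
    {"consumer": "alice", "needed_aips": ["aip-1", "aip-2"]},
    {"consumer": "bob", "needed_aips": ["aip-3"]},
]
holdings = {"aip-1", "aip-2"}

ready = activate_requests(subscriptions, holdings)
```

Here only the first subscription produces a request, because `aip-3` has not been ingested yet; running the same function again after a later ingest would pick it up.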
Access Functions (1 of 2)
- Prepare Finding Aids: provides tools and products which provide an overview of information products available in the archive system
- Receive Requests: provides a single user interface to the information holdings of the archive. This interface will normally be via computer network or dial-up link to an online service, but might also be implemented in the form of a walk-in facility, printed catalog ordering service, or fax-back type service
- Coordinate Request: determines the resources needed to fulfill archive requests and forwards them to those entities for execution. Query and report requests will be fulfilled by Data Management, while dissemination requests will generally require information from Data Management and Archival Storage
Access Functions (2 of 2)
- Generate DIP: accepts a dissemination request, retrieves the data from Archival Storage and moves a copy of the data to a staging area for further processing. This function also transmits a report request to Data Management to generate descriptive information. If special processing is required, this function accesses data objects in staging storage and applies the requested processes. This function places the completed DIP in the staging area and notifies both Coordinate Request and the Deliver DIPs function that the package is ready for delivery
- Deliver DIPs: handles both on-line and off-line deliveries of DIPs to consumers
- Provide Access Controls: provides a hierarchy of security controls depending on the needs of the archive system. These include restricting access to certain information due to security classifications or copyright restrictions
Administration Functions (1 of 2)
- Negotiate Submission Agreement: solicits desirable archival information for inclusion into the OAIS and negotiates submission agreements with data producers
- Manage System Configuration: provides system engineering for the archive system to systematically control changes to the configuration. This function maintains integrity and tractability of the configuration during all phases of the system life cycle. It also audits system operations, system performance, and system usage and plans for system evolution
- Physical Access Control: provides mechanisms to restrict or allow physical access (doors, locks, guards) to elements of the archive as determined by archive policies
Administration Functions (2 of 2)
- Develop Standards and Policies: is responsible for developing and maintaining the archive system data standards. These standards include format standards, documentation standards and the procedures to be followed during the ingestion process. It will also develop policies for Archival Storage hierarchy management and migration policies
- Audit AIPs: is carried out by the archive data engineers and may also involve an outside committee (e.g., science and technical review). The audit process must verify that the quality of the data meets the requirements of the archive and the review committee
- Interact with Management: receives and carries out Management policies. These policies include such things as the OAIS charter, scope, resource utilization guidelines, and pricing policies. It also provides OAIS performance information to Management
Analysis of Archive Issues Using OAIS RM: Archive Associations
Archive Cooperation
- Users of multiple OAIS archives have reasons to wish for some uniformity or cooperation among the OAISs
- Consumers
  – Common finding aids to aid in locating information over several OAIS archives
  – Common Package Descriptor schema for access
  – Common DIP schema for dissemination, or a single global access site
- Producers
  – Common SIP schema for submission to different archives
  – A single depository for all their products
- Managers
  – Cost reduction through sharing of expensive hardware
  – Increasing the uniformity and quality of user interactions with the OAIS
Categories of Archive Interactions
- Independent: no knowledge by one OAIS of standards implemented at another
- Cooperating: potentially common submission standards and common dissemination standards, but no common access. One archive may make subscription requests for key data at the cooperating archive
- Federated: access to all federated OAISs is provided through a common set of access aids that provide visibility into all participating OAISs. Global dissemination and Ingest are options
- Shared resources: an OAIS in which Management has entered into agreements with other OAISs to share resources to reduce cost. This requires various standards internal to the archive (such as ingest-storage and access-storage interface standards), but does not alter the community’s view of the archive
Cooperating Archives
[Figure: two arrangements of cooperating OAISs, labeled Method A and Method B, each OAIS shown with its Ingest (Ing), Access (Acc), and Administration (Adm) entities, together with a Producer and a Consumer.]
- The first set of cooperating OAISs merely has an agreement to share at least one common SIP and DIP format to enable the transfer of holdings
- The second set of cooperating OAISs has standardized its DIP and SIP formats for use by producers and consumers
Federated Archives
Levels of Autonomy in Associated Archives
- No interactions and therefore no association
- Associations that maintain your autonomy. You have to do certain things to participate, but you can leave the association without notice or impact to you
- Associations that bind you by contract. To change the nature of this association you will have to re-negotiate the contract. The amount of autonomy retained depends on how difficult it is to negotiate the changes
Analysis of Archive Issues Using OAIS RM: Migration
Digital Migration
- Digital Migration is defined to be the transfer of digital information, while intending to preserve it, within the OAIS
  – Focus on the preservation of the full information content
  – Internal OAIS perspective
- Three major motivators are seen to drive Digital Migrations of Archival Information Packages within an OAIS:
  – Media Decay
  – Increased Cost Effectiveness
  – New Consumer Service Requirements
Digital Migration Approaches
- Two basic types of migration in response to motivators:
  – Repackaging
  – Transformation
- Each of these comes in two basic flavors, ordered by increasing risk of information loss:
  – Physical Repackaging (Replication): media replacement with no bit changes
  – Digital Repackaging: some bit changes in Packaging Information
  – Reversible Transformations: bit changes in Content Information are reversible by an algorithm
  – Non-reversible Transformations: bit changes in Content Information are not reversible by an algorithm
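The risk ordering and the reversible/non-reversible distinction can be made concrete with a short sketch. The enum, the `is_reversible` helper, and the use of zlib compression as the example transformation are assumptions for illustration. Checking one sample shows only that a round trip works for that sample; it does not prove that a transformation is reversible in general.

```python
import zlib
from enum import IntEnum


class Migration(IntEnum):
    """Migration types, numbered by increasing risk of information loss."""
    PHYSICAL_REPACKAGING = 1
    DIGITAL_REPACKAGING = 2
    REVERSIBLE_TRANSFORMATION = 3
    NON_REVERSIBLE_TRANSFORMATION = 4


def is_reversible(original: bytes, transform, inverse) -> bool:
    """Check on one sample that `inverse` restores the exact original bits."""
    return inverse(transform(original)) == original


content = b"AAAA BBBB AAAA BBBB " * 10

# Lossless compression changes the bits, but an algorithm restores them.
lossless_ok = is_reversible(content, zlib.compress, zlib.decompress)

# Truncation discards bits, so no inverse can bring them back.
truncation_ok = is_reversible(content, lambda b: b[:8], lambda b: b)
```

Because `Migration` is an `IntEnum`, comparisons such as `Migration.DIGITAL_REPACKAGING < Migration.REVERSIBLE_TRANSFORMATION` follow the risk ordering given on the slide.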
Some Migration Strategies
- Use media types with long-lived support
  – Enhances chances that Replication will be useful
- Minimize use of media format attributes for holding Content Information
  – Allows digital repackaging without having to also do transformations when migrating to new media types
- Maintain originals if non-reversible transformations are required
AIP Versions and Editions
- Version
  – An AIP that undergoes a transformation during migration becomes a new version
- Edition
  – An AIP that is revised to improve its information content is termed a new edition. This is not a migration
- Derivation
  – An AIP that is the result of being derived from one or more other AIPs is termed a derived AIP. This is not a migration
Summary and Request
Summary
- Reference model is to be applicable to all digital archives, and their Producers and Consumers
- Identifies a minimum set of responsibilities for an archive to claim it is an OAIS
- Establishes common terms and concepts for comparing implementations, but does not specify an implementation
- Provides detailed models of both archival functions and archival information
- Discusses OAIS information migration and interoperability among OAISs
Request for Participation (Updated 1998-10-10)
- Ultimate success of this effort depends on obtaining adequate review and comment
- Reference Model Red Book/ISO Draft International Standard (DIS) expected December 1998
  – Current version is White Book 4.0, available under the Reference Materials heading at:
  – http://ssdoo.gsfc.nasa.gov/nost/isoas/overview.html
- Comments are being actively solicited
  – Send to donald.sawyer@gsfc.nasa.gov


