- Number of slides: 54
DOI SYSTEM: DATA MODEL Workshop on the DOI System International DOI Foundation
Outline / Key concepts in this section
• DOI Data Model and interoperability
  – Application profiles
  – Kernel metadata
  – Metadata declaration
  – Role of DOI name metadata
• Origins of the DOI Data Model
  – Semantic interoperability
  – The indecs principles
  – Applications of indecs
  – The use of a data dictionary
• Example: rights management
Further reading on key concepts in this section
• DOI Handbook Chapter 4, “DOI Data Model”: http://www.doi.org/handbook_2000/metadata.html
• “DOI System and Data Dictionaries” Factsheet: http://www.doi.org/factsheets/DOIDataDictionaries.html
DOI data model
• The underlying model of how data within the DOI System relates to other data
  – Therefore vital for interoperability
• Whereas the Handle System component is needed for every DOI name, the DOI Data Model component has not yet been used to its full extent
  – Some applications have used it “behind the scenes”
• Interoperability becomes more important as an economic feature when there are multiple services or multiple uses
  – which there will be eventually
  – Don’t design only for today
Application Profile (AP) Framework
• Entities are identified by DOI names
• The properties of groups of DOI names are defined as APs
• APs have one or more services (Service Instances)
• Services have definitions (Service Definitions)
[Diagram: DOI names 965, 876 and 453 grouped in one Application Profile; DOI names 784, 369 and 908 grouped in another]
Application Profile (AP) Framework (continued)
[Diagram: as before, with a further Application Profile that groups DOI name 453 together with 784, 369 and 908, so one DOI name can belong to more than one AP]
• New APs and services may be created or made available
• One change to an AP affects all DOI names within that AP
DOI Application Profile
• A DOI Application Profile (DOI-AP) is a view of DOI names: a mechanism for “unity in diversity”. What do all these DOI names have in common?
• It is based on any interest group’s view of a type of creation (a DOI Name User Community).
• Functional granularity: create a grouping when you need it.
• DOI-APs can overlap: things can be in multiple DOI-APs.
• Each DOI-AP has a metadata kernel, a Registration Agency and a Governance/Development Group.
• Zero Set = “initial implementation” DOI names (just a single URL redirection; zero additional metadata).
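The grouping rules for Application Profiles can be sketched as a small in-memory registry. These rules are: APs may overlap, services attach at the AP level, and one change to an AP affects every member. This is a minimal Python sketch under stated assumptions. The class `APRegistry`, the profile names `AP-A`/`AP-B`, the service names and the DOI names built on the `10.1000` example prefix are all hypothetical. None of them is part of any DOI System API.

```python
from collections import defaultdict


class APRegistry:
    """Hypothetical registry grouping DOI names into Application Profiles (APs)."""

    def __init__(self):
        self.members = defaultdict(set)   # AP name -> DOI names in that AP
        self.services = defaultdict(set)  # AP name -> services offered by that AP

    def add_member(self, ap, doi):
        self.members[ap].add(doi)

    def add_service(self, ap, service):
        # A single change at the AP level applies to every DOI name in the AP.
        self.services[ap].add(service)

    def profiles_of(self, doi):
        # APs may overlap, so one DOI name can belong to several APs.
        return {ap for ap, dois in self.members.items() if doi in dois}

    def services_for(self, doi):
        # A DOI name gets the union of the services of all its APs;
        # a name in no AP (the "zero set") gets none beyond plain redirection.
        found = set()
        for ap in self.profiles_of(doi):
            found |= self.services[ap]
        return found


reg = APRegistry()
for suffix in ("965", "876", "453"):
    reg.add_member("AP-A", f"10.1000/{suffix}")
for suffix in ("453", "784", "369", "908"):
    reg.add_member("AP-B", f"10.1000/{suffix}")
reg.add_service("AP-A", "resolve-url")
reg.add_service("AP-B", "license")  # now available to all four AP-B members
```

In this sketch, services are stored on the AP rather than on each DOI name. That is what makes the "one change affects all members" property a single update.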
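[Diagram: DOI System development path. The initial implementation provides single redirection (persistent identifier); the full implementation adds multiple resolution, metadata and activity tracking. This runs alongside the work of W3C, WIPO, NISO, IETF, etc.]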
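[Diagram: the development path in terms of Application Profiles. Single redirection (persistent identifier) corresponds to the Zero Application Profile; multiple resolution, metadata and activity tracking are provided through defined Application Profiles. This runs alongside the work of W3C, WIPO, NISO, IETF, etc.]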
DOI Kernel
• Each DOI-AP starts from a basic kernel (8 elements) and may add whatever else it needs, as defined by the DOI Name User Community.
• A DOI name metadata vocabulary is being developed in tandem with ONIX, etc.
  – It can (and should) coincide with, or provide for, sector requirements.
• Different DOI-APs’ metadata will interoperate if vocabularies are developed within the indecs-based model.
DOI Kernel
• Contains the critical minimum metadata for basic recognition (but not complete disambiguation).
• Standard base vocabulary.
• A DOI-AP entity (e.g. “book”) must be analysable in terms of other attributes (e.g. media, mode, content, subject).
Example kernel record:
• DOI name: 10.1000/ISBN 0141255559
• resourceIdentifier: ISBN 0141255559
• resourceName: Two for the Dough
• principalAgent, role: Janet Evanovich, author
• structuralType: Physical fixation
• Character: Text
• Mode: Visual
• referentType: Book
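As a sketch only, the example kernel record can be held as a typed record with a basic completeness check. Field meanings follow the slide. The following are assumptions, not a published DOI schema:
• the Python field names
• the `problems` method
• the set of structural-type labels, derived from the four fundamental types named in the kernel’s functional specification (abstract, physical, digital, spatio-temporal)

```python
from dataclasses import dataclass, asdict

# Assumed labels for the four fundamental structural types (not an official list).
STRUCTURAL_TYPES = {"abstraction", "physical fixation", "digital fixation", "spatio-temporal"}


@dataclass(frozen=True)
class KernelRecord:
    """Hypothetical container for the kernel elements shown on the slide."""
    doi: str
    resource_identifier: str
    resource_name: str
    principal_agent: str
    agent_role: str
    structural_type: str
    character: str
    mode: str
    referent_type: str

    def problems(self):
        """Return a list of issues; an empty list means the record passes these checks."""
        issues = [f"missing {name}" for name, value in asdict(self).items() if not value.strip()]
        if self.structural_type.lower() not in STRUCTURAL_TYPES:
            issues.append(f"unknown structural type: {self.structural_type}")
        return issues


record = KernelRecord(
    doi="10.1000/ISBN 0141255559",
    resource_identifier="ISBN 0141255559",
    resource_name="Two for the Dough",
    principal_agent="Janet Evanovich",
    agent_role="author",
    structural_type="Physical fixation",
    character="Text",
    mode="Visual",
    referent_type="Book",
)
print(record.problems())  # []
```

This check only confirms presence and one controlled value. It does not achieve the full disambiguation that the slide says the kernel is not meant to provide.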
DOI Kernel as the basis of each application profile
• Each profile can be thought of as built from the kernel plus extensions.
[Diagram: a DOI AP = compulsory kernel + metadata for the application]
DOI Kernel as the basis of each application profile
• Each DOI-AP can be thought of as built from the kernel plus extensions…
• …but the kernel is actually what several APs have in common (compare the different views of a person): Son, Legal person, Agent, Alien, Scholar, Library user, Composer, Credit card holder, Shoe purchaser, Author, Lottery entrant, Hospital patient, Citizen, Car driver, Rights owner, Marathon runner, Software licensee, Parent, Tax payer, Club member, e-consumer, Bank account holder, Husband, Charity giver, Hotel guest, Speeding ticket recipient, Disney World visitor, Frequent flyer, Concert-goer, Passenger, Employee, Voter, Dog owner
DOI Kernel as the basis of each application profile
• This kernel cannot be logically defined from first principles.
• In the absence of existing Application Profiles to define this overlap (= the kernel), we have made a reasonable estimate from the logical analysis of…
[Diagram: DOI AP 1, DOI AP 2 and DOI AP 3, each combining the kernel for any DOI name with the metadata for that AP]
DOI-APs: all metadata in a well-formed structure.
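The kernel slides describe the kernel as what several APs have in common, with each AP adding its own extensions. A minimal sketch of that relationship computes the kernel as a set intersection and each AP’s extensions as a set difference. It assumes three hypothetical APs whose element names are invented for illustration.

```python
from functools import reduce

# Hypothetical element sets for three APs; element names are illustrative only.
AP_ELEMENTS = {
    "AP1": {"DOI", "resourceName", "principalAgent", "mode", "ISBN", "edition"},
    "AP2": {"DOI", "resourceName", "principalAgent", "mode", "ISSN", "volume"},
    "AP3": {"DOI", "resourceName", "principalAgent", "mode", "ISRC", "duration"},
}


def estimate_kernel(profiles):
    """Kernel = the elements that every given AP has in common."""
    return reduce(set.intersection, (set(elements) for elements in profiles.values()))


def extensions(profiles, kernel):
    """AP-specific metadata = whatever each AP declares beyond the kernel."""
    return {ap: set(elements) - kernel for ap, elements in profiles.items()}


kernel = estimate_kernel(AP_ELEMENTS)
print(sorted(kernel))  # ['DOI', 'mode', 'principalAgent', 'resourceName']
print(sorted(extensions(AP_ELEMENTS, kernel)["AP1"]))  # ['ISBN', 'edition']
```

When few real APs exist, there is little to intersect. That is why, as the slides note, the kernel had to be estimated by logical analysis rather than computed in this way.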
Metadata declarations
WHAT:
• Base kernel metadata must be declared.
• DOI-AP-specific metadata is a matter for the DOI Name User Community (Governance Group/Registration Agency) to decide.
HOW:
• Either a local webpage or a central repository, or both (as decided by User Community rules).
• Automated access to metadata declarations via Handle data types?
• XML schemas.
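One possible shape for an automated metadata declaration is a small XML document per DOI name. The sketch below uses Python’s standard `xml.etree.ElementTree`. The element names (`kernelMetadata`, `resourceName`, `mode`) and the `required` check are hypothetical; they do not reproduce any published DOI XML schema.

```python
import xml.etree.ElementTree as ET


def declare(doi, elements, required=()):
    """Serialize declared metadata for one DOI name as XML (hypothetical layout)."""
    missing = [name for name in required if name not in elements]
    if missing:
        raise ValueError(f"undeclared required elements: {missing}")
    root = ET.Element("kernelMetadata", attrib={"doi": doi})
    for name, value in elements.items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


def read_declaration(xml_text):
    """Parse a declaration back into (doi, {element: value})."""
    root = ET.fromstring(xml_text)
    return root.get("doi"), {child.tag: child.text for child in root}


xml_text = declare(
    "10.1000/123456",
    {"resourceName": "Two for the Dough", "mode": "Visual"},
    required=("resourceName",),
)
print(xml_text)
```

The `required` argument stands in for the rule that base kernel metadata must be declared. Which further elements are required would be for each User Community to decide.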
Roles of declared metadata = Functional specification of the DOI kernel
(a) to assign a unique DOI name to the creation [DOI]
(b) to link the DOI name to the principal local identifier of a creation (if any), to enable the integration of DOI name-related applications and metadata with others [resourceIdentifier]
(c) to enable a searcher or application to identify the creation by its most common name and the party or parties responsible for its creation or publication [resourceName, principalAgent, agentRole]
Roles of declared metadata (continued)
(d) to enable a searcher or application to distinguish the fundamental type of creation (abstract, physical, digital or spatio-temporal), and thereby also to distinguish between creations of different types with the same names and creators [structuralType]
(e) to enable a searcher or application to distinguish the mode of the creation (visual, audio, etc.) [mode, character]
(f) to enable a searcher or application to determine to which DOI name user/application set the creation belongs [DOI-AP]
Handles and metadata: a possible development
• Handle data types could create a way of processing metadata as a “distributed database” of services, e.g.:
  – metadata@10.1000/123456
  – rights@10.1000/123456
  – abstract@10.1000/123456
  – sample@10.1000/123456
  – buy@10.1000/123456
  – license@10.1000/123456
  – pdf@10.1000/123456
  – etc.
• Data types (and results) must be consistent, so the DOI name Handle data type vocabulary must be developed with great care within the indecs-based model.
• Some data types could be application-specific.
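The `type@handle` notation on this slide is a proposal, not current Handle System syntax. Assuming it, a resolver only needs to split the reference and look up the typed value in the handle’s record. The following are illustrative assumptions:
• the record contents
• the URLs under example.org
• the choice of `URL` as the default type for a bare DOI name

Real Handle records do hold typed values, but no real resolver API is used here.

```python
# Hypothetical handle record: typed values stored under one DOI name.
RECORDS = {
    "10.1000/123456": {
        "URL": "https://example.org/article/123456",
        "metadata": "https://example.org/article/123456/meta.xml",
        "rights": "https://example.org/article/123456/rights",
        "pdf": "https://example.org/article/123456.pdf",
    }
}


def parse_typed_ref(ref):
    """Split 'type@prefix/suffix' into (type, handle); a bare handle defaults to 'URL'."""
    data_type, sep, handle = ref.partition("@")
    if not sep:
        return "URL", ref
    if not data_type or "/" not in handle:
        raise ValueError(f"malformed reference: {ref!r}")
    return data_type, handle


def resolve(ref, records=RECORDS):
    """Return the value of the requested type from the handle's record."""
    data_type, handle = parse_typed_ref(ref)
    values = records.get(handle)
    if values is None:
        raise KeyError(f"unknown handle: {handle}")
    if data_type not in values:
        raise KeyError(f"no {data_type!r} value for {handle}")
    return values[data_type]


print(resolve("pdf@10.1000/123456"))
```

The same type name must mean the same thing for every DOI name. The vocabulary of type names therefore has to be controlled, which is the consistency requirement the slide stresses.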
Origins of DOI data model
• The underlying model of how data within the DOI System relates to other data
• Two components:
  – Data Dictionary + DOI Application Profile Framework
• Based on the indecs analysis
  – Provides a tool for precise description of an entity through metadata (and mapping to other schemes)
  – Met the needs of the DOI System development aim: do not re-invent the wheel
  – DOI System and indecs development were in parallel
• DOI Application Profile framework
  – Provides a means of relating entities: grouping entities and expressing relationships
  – A mechanism for grouping DOI names with similar properties
Definitions of metadata
• Popular… “Metadata is data about data.” (Everyone)
• Logical… “An item of metadata is a relationship that someone claims exists between two entities.”*
#1: All metadata is just a view
e.g. Views of a “person”: some (generic) ways in which you might be identified in metadata schemes…
Son, Legal person, Agent, Alien, Scholar, Library user, Composer, Credit card holder, Shoe purchaser, Author, Lottery entrant, Hospital patient, Citizen, Car driver, Rights owner, Marathon runner, Software licensee, Parent, Tax payer, Club member, e-consumer, Bank account holder, Husband, Charity giver, Hotel guest, Speeding ticket recipient, Disney World visitor, Frequent flyer, Concert-goer, Passenger, Employee, Voter, Dog owner
In each of these roles “you” will have different IDs and attributes.
#1: All metadata is just a view
Creations are the same. An identifier for a published article may refer to…
• A manuscript
• The abstract work
• A draft
• A (class of) physical copy in a publication
• A (class of) digital copy (not in a publication)
• A (class of) digital copy in a publication
• A (class of) digital format
• A specific digital copy
• A (class of) paper copy
• A specific paper copy
• An edition
• A reprint
• A translation
• etc., and many combinations of the above
Similar views apply to other types of creations.
#1: All metadata is just a view
• Views must not be confused in digital content and rights management: mistaken identity can be catastrophic.
• Increasingly, views need to be interoperable
  – e.g. production workflow, rights and marketing within one business; supply chain transfer; etc.
• The need for automated, interoperable views in digital commerce will be enormous.
#2: (Almost) all terms need identifiers
• Each of the values of a view must be defined and identified if other views are to recognize them (what do you mean by an abstract work? an edition? a format? a scholar? a name?).
• So views need comprehensive controlled vocabularies (note our reliance on ISO language, territory, currency and time codes).
• Automation needs unambiguous terms.
• Terms of rights must be unambiguous. Anything may be a term of an agreement.
• Emergence of the value of structured ontologies for commerce (like the indecs model).
#3: Events are the key to interoperability
• Most metadata is “thing”- or “people”-based
  – static views, e.g. “a creation”
• In the networked future, metadata interoperability will be achieved by describing “events” that relate things and people
  – dynamic views, e.g. “A created B”
• Event descriptions will also be the key to rights metadata (transactions are events).
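A minimal sketch illustrates the difference between static and dynamic views. It records events, each relating an agent and a resource at a time and place, and derives the static “thing” and “people” views from them. The `Event` fields and the `party:`/`work:` identifiers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A dynamic view: an agent does something to a resource at a time and place."""
    verb: str
    agent: str     # identifier of the party (hypothetical scheme)
    resource: str  # identifier of the creation (hypothetical scheme)
    time: str
    place: str


def static_views(events):
    """Derive static per-resource and per-agent views from the events linking them."""
    by_resource, by_agent = {}, {}
    for e in events:
        by_resource.setdefault(e.resource, []).append((e.verb, e.agent))
        by_agent.setdefault(e.agent, []).append((e.verb, e.resource))
    return by_resource, by_agent


events = [
    Event("created", "party:A", "work:B", "2001", "GB"),
    Event("licensed", "party:C", "work:B", "2003", "US"),
]
by_resource, by_agent = static_views(events)
print(by_resource["work:B"])  # [('created', 'party:A'), ('licensed', 'party:C')]
```

In this sketch the events are the stored facts, and the static views are computed from them. Two systems sharing event descriptions could each build whichever static view they need.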
Meaning
• Assigning metadata to a referent, to enable semantic interoperability
  – “say what the referent is”
  – Resolution of an identifier may give the referent, or only metadata, or a “manifestation”
• Semantic:
  – Do two identifiers from different schemes actually denote the same referent?
  – If A says “owner” and B says “owner”, are they referring to the same thing?
  – If A says “released” and B says “disseminated”, do they mean different things?
• Interoperability: the ability for identifiers to be used in services outside the direct control of the issuing assigner
  – Identifiers assigned in one context may be encountered, and may be re-used, in another place or time without consulting the assigner. You can’t assume that the assumptions you made on assignment will be known to someone else.
• Persistence = interoperability with the future
A pointer is not enough
• Precisely what is being named?
  – Suppose I have here a pdf version of Defoe’s “Robinson Crusoe” issued by Norton. I find an identifier. Is it of:
    – All works by Daniel Defoe?
    – The work “Robinson Crusoe”?
    – The Norton edition of “Robinson Crusoe”?
    – The pdf version of the Norton edition of…?
    – The pdf version of… held on this server…?
• Most digital objects of interest have compound form, simultaneously embodying several referents.
• Multiple identifiers may be necessary (compare music CDs).
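The Robinson Crusoe example can be modelled as a chain of referents, each needing its own identifier, with the pdf file on the server at the most specific end. This is a minimal sketch. The `Referent` type and all identifiers are hypothetical, and only levels listed on the slide are used.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Referent:
    """One thing an identifier might name, linked to the more abstract thing it embodies."""
    level: str
    identifier: str                      # hypothetical identifier for this level
    embodies: Optional["Referent"] = None


def lineage(referent):
    """All referents a single object simultaneously embodies, most specific first."""
    chain = []
    while referent is not None:
        chain.append(referent)
        referent = referent.embodies
    return chain


work = Referent("work", "work:robinson-crusoe")
edition = Referent("edition", "ed:norton-robinson-crusoe", work)
pdf_version = Referent("format", "fmt:norton-rc-pdf", edition)
server_copy = Referent("copy", "copy:this-server/rc.pdf", pdf_version)

print([r.level for r in lineage(server_copy)])  # ['copy', 'format', 'edition', 'work']
```

A bare pointer to the file does not say which of these four levels an identifier denotes. The explicit chain makes that choice visible.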
[Diagram: a metadata scheme (e.g. ONIX) and a metadata scheme (e.g. LOM) linked by an agreed term-by-term mapping or “crosswalk”]
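[Diagram: mapping via a central dictionary. Metadata schemes (e.g. ONIX, LOM, and a scheme “NormanRights”) each map their terms to a central dictionary instead of to each other. The ONIX term “Author” and the NormanRights term “Writer” map to the same dictionary term, so ONIX: Author = NormanRights: Writer]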
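In the central-dictionary arrangement, each scheme maps its own terms once to dictionary terms, and translation between schemes goes through the dictionary. Besides avoiding ambiguity, this scales better: n schemes need n mappings to the hub rather than up to n(n−1)/2 pairwise crosswalks. The dictionary term `dd:Author` and the mapping tables are hypothetical; only the ONIX “Author” = NormanRights “Writer” correspondence comes from the slide.

```python
# Hypothetical mappings from each scheme's terms to central dictionary terms.
TO_DICTIONARY = {
    "ONIX": {"Author": "dd:Author"},
    "NormanRights": {"Writer": "dd:Author"},
}


def translate(term, source, target, mappings=TO_DICTIONARY):
    """Translate a term between schemes via the central dictionary (hub-and-spoke)."""
    central = mappings[source].get(term)
    if central is None:
        raise KeyError(f"{source}:{term} is not mapped to the dictionary")
    matches = [t for t, c in mappings[target].items() if c == central]
    if not matches:
        raise KeyError(f"{target} has no term for {central}")
    return matches  # a list, since a target scheme may have several candidate terms


print(translate("Author", "ONIX", "NormanRights"))  # ['Writer']
```

This covers only the simplest mismatch, different names for the same thing. Differences in granularity, structure or allowed values need richer mapping rules than a one-to-one term table.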
Metadata interoperability: semantic problems
Mappings are not simple:
• Different names (and languages) for the same thing (Author vs Writer)
• Same name for different things (title, Title)
• Data elements at different levels of speciality (title vs FullTitle, AlternativeTitle)
• Different allowed values for elements (“pii” vs “not pii”)
• Data at different levels of granularity (journal_article vs SerialArticleWork/SerialArticleVersion)
• Data in different structures (article as attribute of journal or vice versa)
• Data from different sources (local codes vs ONIX codes)
• Different contextual meaning (DOI name of what…?)
• Different representation (1 title vs n titles)
• Different mandatory requirements (ISSN mandatory vs optional)
• Schemas are being updated all the time… etc.
To manage all of this requires a coherent, structured approach.
Dictionary = a common base semantic layer
[Diagram: layers of a technology platform for DRM
• Application layer: DRM systems, “Semantic Web”
• Communication layer: Rights Expression Language (XrML, XMCL, ODRL, etc.)
• Semantic layer: rights metadata (Data Dictionary)]
Semantic = “meaning”
• Does A “mean the same as” B?
  – In practice: does A need a different identifier from B?
  – e.g. versions; works and manifestations; editions
• For a machine, “A means the same as B” = “A has the same attributes as B”
  – Which attributes? The answer is entirely contextual.
  – Do A and B belong to the same class for the purposes of…?
  – The class is defined by a set of attributes (metadata) (RDF, etc.)
• We group similar things together; what is identified is usually a class
  – e.g. the class of all copies of the hardback printed second edition of this book from this publisher = the same ISBN
• Ultimately, no one thing is the same as another thing (or they wouldn’t be two things)
  – “Roughly speaking, to say of two things that they are identical is nonsense, and to say of one thing that it is identical with itself is to say nothing at all.” (Wittgenstein, Tractatus 5.5303)
  – Leibniz’s Law (no two objects have exactly the same properties)
  – A class contains similar things
• Automation = logic
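The definition “A means the same as B” = “A has the same attributes as B, for a chosen set of attributes” can be sketched directly, together with grouping things into identifiable classes. The attribute names and the attribute set standing in for an ISBN class are hypothetical simplifications.

```python
def same_for_purpose(a, b, attributes):
    """A 'means the same as' B for a purpose if they agree on that purpose's attributes."""
    return all(a.get(k) == b.get(k) for k in attributes)


def group_into_classes(items, attributes):
    """Group items into classes keyed by the values of the chosen attributes."""
    classes = {}
    for item in items:
        key = tuple(item.get(k) for k in attributes)
        classes.setdefault(key, []).append(item)
    return classes


# Hypothetical attribute set standing in for "same ISBN".
ISBN_CLASS = ("work", "edition", "format", "publisher")

copy1 = {"work": "W1", "edition": "2", "format": "hardback", "publisher": "P1", "copy_id": "c-001"}
copy2 = {**copy1, "copy_id": "c-002"}

print(same_for_purpose(copy1, copy2, ISBN_CLASS))    # True: same class of copies
print(same_for_purpose(copy1, copy2, ("copy_id",)))  # False: different specific copies
```

The same two objects are “the same” or “different” depending only on which attribute set the purpose selects. This is the contextual answer to “which attributes?”.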
2001: Ontologies and Semantic Web
“Ontologies
Of course, this is not the end of the story, because two databases may use different identifiers for what is in fact the same concept, such as zip code. A program that wants to compare or combine information across the two databases has to know that these two terms are being used to mean the same thing. Ideally, the program must have a way to discover such common meanings for whatever databases it encounters.
A solution to this problem is provided by the third basic component of the Semantic Web, collections of information called ontologies.”
(Tim Berners-Lee, James Hendler and Ora Lassila, “The Semantic Web”, Scientific American, May 2001)
Ontology approach: deeper view of metadata
• The key to defining what is identified logically
  – enabling people to use their existing metadata
  – Ontologies can deliver data dictionaries suitable for mapping
• Fundamental, generic, extensible methods can be used to construct interoperable ontologies
  – by putting metadata into context
[Diagram: entity, attribute, relationship, entity; a context relates agents and resources, with time and place]
Figure 1: Contextual ontology (COA MetaModel overview)
[Diagram elements: Agent, Resource, Context, Time, Place, Relator, Descriptor, Identifier, Name, Annotation, Category, Flag, Quantity, Verb, and Roles (Agent Role, Resource Role, Time Role, Place Role)]
• Every Relationship has a Relator.
• An Entity may have typed relationships with Entities of any kind (including those of its own kind).
• An Entity may have Attributes of any kind. (Attributes, which are a type of Resource, may have their own Attributes.)
• Contextual and non-contextual relationships are shown (illustrative: any type of Entity may relate to any other), as are Attributes (illustrative: any Entity or Attribute may have Attributes of any type).
1995–2004: Defining what is identified
• Many individual metadata schemes for specific sectors, applications, etc.; they vary from simple to complex data models
• 1995+: Dublin Core: need for standardisation on the WWW
  – 15 (+) elements for “output”, for simple resource description
  – Now ISO 15836
• Ontology-based activities:
  – 1995+: Common Information System, “CIS” (CISAC): rights, music
  – 1998: Functional Requirements for Bibliographic Records, “FRBR” (IFLA): library cataloguing
  – 1998–2000: Interoperability of Data in E-Commerce Systems, “indecs” (multiple partners): generic intellectual property
    – For “e-commerce” read “automation”
    – Influenced by CIS and FRBR
  – 2000: ABC/Harmony: generic events-aware model
• Should enable re-use of existing metadata
e.g. Terms of a Licence as a group of Events
[Diagram: a Licensing Event Permits (MAY), Prohibits (MUST NOT) or Requires (MUST) 1–n Use Events. Each Use Event can have 0–n Exceptions (Use Events) and 0–n Preconditions (e.g. Payment, Reporting Event, etc.). Event = time, place, entities.]
This structure allows for whatever level of flexibility or granularity may be required now or in the future.
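The licence structure on this slide can be sketched as sets of Use Events that are permitted, prohibited or excepted, plus preconditions that must be satisfied. This is a simplified illustration under stated assumptions:
• Use events are reduced to (verb, place) pairs.
• `*` is a hypothetical wildcard.
• Time is omitted.
• The precedence order (prohibition, then permission, then exception, then preconditions) is a design choice of the sketch, not a rule from the source.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class UseEvent:
    verb: str
    place: str  # time omitted for brevity


@dataclass
class Licence:
    """Hypothetical licence: rules are (verb, place) pairs; '*' matches anything."""
    permits: set = field(default_factory=set)     # MAY
    prohibits: set = field(default_factory=set)   # MUST NOT
    exceptions: set = field(default_factory=set)  # carve-outs from permitted uses
    requires: list = field(default_factory=list)  # MUST: preconditions such as payment


def _matches(rule, use):
    return all(r in ("*", u) for r, u in zip(rule, (use.verb, use.place)))


def evaluate(licence, use, satisfied=frozenset()):
    """Decide whether a use event is allowed under the licence."""
    if any(_matches(r, use) for r in licence.prohibits):
        return "prohibited"
    if not any(_matches(r, use) for r in licence.permits):
        return "not permitted"
    if any(_matches(r, use) for r in licence.exceptions):
        return "not permitted (exception)"
    missing = [req for req in licence.requires if req not in satisfied]
    if missing:
        return "requires: " + ", ".join(missing)
    return "permitted"


licence = Licence(
    permits={("display", "*")},
    prohibits={("copy", "*")},
    exceptions={("display", "FR")},
    requires=["payment", "reporting"],
)
print(evaluate(licence, UseEvent("display", "GB"), {"payment", "reporting"}))  # permitted
print(evaluate(licence, UseEvent("display", "GB"), {"payment"}))  # requires: reporting
```

Adding a time field to `UseEvent` and to the rule tuples would extend the same matching to the time dimension the slide mentions.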
Contextual Ontology usage examples
• ISO MPEG-21 Rights Data Dictionary (http://iso21000-6.net/)
• DDEX Digital Data EXchange: music industry (http://ddex.net/)
• ONIX: book industry (+) messaging schemas (www.editeur.org)
• ONIX for rights: ONIX for Licensing Terms, Repertoire and Distribution
• Digital Library Federation: communication of licence terms (ERMI: working with ONIX for Licensing Terms)
• DOI Data Dictionary (http://www.doi.org)
• Rightscom’s OntologyX: licensee of early output, plus their own later work (www.rightscom.com)
• RDA (Resource Description and Access): next generation of AACR/MARC cataloguing
  – RDA/ONIX common framework
• ACAP: Automated Content Access Protocol (http://www.the-acap.org/)
• Consistent with FRBR, ABC-Harmony, OWL, CIDOC CRM, etc.
1995–2004: Defining what is identified
Development of indecs, 2000–2004
[Diagram (black = what, red = who): development from indecs (2000) through CONTECS (2001+) to ISO MPEG-21 RDD, indecsDD and OntologyX (2004). Organizations shown: IDF, ONIX, indecs Framework Ltd, IFPI/RIAA, MPA, Dentsu MMG, Rightscom (Mi3p etc.)]
DOI names to express relationships
• The DOI name of one item may be related to the DOI name of another
  – through multiple resolution, metadata, Application Profiles…
• Example: a DOI name of a work could resolve to several available formats, languages, etc.
[Diagram: Article, DOI name 12345, related to its Chinese version, DOI name 56789]
Rights: an example of DOI System potential
• DRM: Technical Protection Measures which use RMI
• But: simple management WITHOUT technical protection also needs RMI
• What is being managed for any rights purpose has to be identified
• We need to accommodate existing and new identifier schemes
• A consistent approach to all kinds of inter-related entities is necessary:
[Diagram: People make Deals about the use of Stuff. “identity management” (People), “content management” (Stuff), “license management” (Deals)]
Describing rights using data
Primary rights events (claims, deals) are described using pieces of data from all these domains:
• Rights Statement (“claim”): [party] owns [right] in [creation] in [time] and [place]
• Rights Agreement (“deal”): [party] agreed with [party] in [time] and [place] that [event]
Pieces of “rights metadata” used in each rights statement are things which need to be identified.
Describing rights using data
Primary rights events (claims, deals) are described using pieces of data from all these domains:
• Rights Statement (“claim”): [party] owns [right] in [creation] in [time] and [place]
• Rights Agreement (“deal”): [party] agreed with [party] in [time] and [place] that [event]
Creations typically have standard identifiers, which may have associated structured data or may act as keys to get this data. Other pieces of data also need standard identifiers (time, party…).
Describing rights using data
Secondary rights events (licences) are also described using pieces of data:
• Permission: [party] can [verb] [amount] to [creation] at [time] in [place].
• Prohibition: [party] can’t [verb] to [creation] at [time] in [place].
• Requirement: [party] must [verb] [amount] to [creation/party] at [time] in [place].
• Rights Transfer: [party] can [grant right] to [party] in [creation] at [time] in [place].
Describing rights using data
Pieces of “rights metadata” used in each rights declaration:
• Permission: [party] can [verb] [amount] to [creation] at [time] in [place].
• Prohibition: [party] can’t [verb] to [creation] at [time] in [place].
• Requirement: [party] must [verb] [amount] to [creation/party] at [time] in [place].
• Rights Transfer: [party] can [grant right] to [party] in [creation] at [time] in [place].
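Every bracketed slot in these statements should hold an identified term, so a statement builder can refuse plain strings. This is a minimal sketch. The templates are copied from the slide, but the `fill` function and the `scheme:value` identifier convention are hypothetical. That convention covers `doi:`, `iso3166:` and `iso8601:`, plus made-up prefixes such as `party:`, `verb:` and `amount:`. Slots are filled in order because the Rights Transfer template uses `[party]` twice for two different parties.

```python
import re

# Templates as given on the slide (straight apostrophe used in "can't").
TEMPLATES = {
    "permission": "[party] can [verb] [amount] to [creation] at [time] in [place].",
    "prohibition": "[party] can't [verb] to [creation] at [time] in [place].",
    "requirement": "[party] must [verb] [amount] to [creation/party] at [time] in [place].",
    "transfer": "[party] can [grant right] to [party] in [creation] at [time] in [place].",
}
SLOT = re.compile(r"\[([^\]]+)\]")
# Hypothetical convention: every value must look like "scheme:value".
IDENTIFIER = re.compile(r"^[A-Za-z0-9.-]+:\S+$")


def fill(kind, *values):
    """Fill a template's slots in order, rejecting values that are not identifiers."""
    template = TEMPLATES[kind]
    names = SLOT.findall(template)
    if len(values) != len(names):
        raise ValueError(f"{kind} needs {len(names)} values for slots {names}")
    for name, value in zip(names, values):
        if not IDENTIFIER.match(value):
            raise ValueError(f"[{name}] is not an identifier: {value!r}")
    remaining = iter(values)
    return SLOT.sub(lambda _match: next(remaining), template)


print(fill("permission", "party:123", "verb:display", "amount:1",
           "doi:10.1000/123456", "iso8601:2008", "iso3166:GB"))
```

The regular expression only checks the shape of each value. A real system would also look each identifier up in the vocabulary it claims to come from.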
What are these pieces of “rights metadata”?
A mix of data from many sources:
1. Rights “events”: statements, agreements, transfers, permissions, prohibitions, requirements, assertions, approvals…
2. Descriptive metadata: creations, creation types, contributor roles, user roles, tools, classifications, measures…
3. Legal terms: rights, persons, companies, intellectual property, jurisdictions…
4. Financial metadata: terms, currencies, conventions…
These sets of “rights metadata” are standardized and maintained in different places.
Distributed rights management
This mix of data from many sources is used in many different places by different people in chains of rights events:
[Diagram: a chain of rights events (statement, assertion, agreement, transfer, agreement, permission, prohibition, permission, requirement, etc.), each of the form “[party] can [verb] [amount] to [creation] at [time] in [place]”; a compound entity can be expanded to reveal more data]
Distributed rights management
[Diagram: a chain of rights events: statement, assertion, agreement, transfer, agreement, permission, prohibition, permission, requirement, etc.]
Each of these is an information object:
• which needs to be identified (and may be a compound object);
• which may need to link to or use information objects in other databases;
• which should be interoperable.
Summary
• DOI Data Model and interoperability
  – Application profiles
  – Kernel metadata
  – Metadata declaration
  – Role of DOI name metadata
• Origins of the DOI Data Model
  – Semantic interoperability
  – The indecs principles
  – Applications of indecs
  – The use of a data dictionary
• Example: rights management


