Piero Attanasio Managing persistent identifiers and digitisation rights


Number of slides: 19

Piero Attanasio
Managing persistent identifiers and digitisation rights for Europe
Bologna, 27 May 2011
www.medra.org
EuroCRIS meeting

Summary
- mEDRA
- A general approach
- Our experience in rights information: Arrow
- Lessons learned for other applications

mEDRA - multilingual European DOI Registration Agency
- mEDRA is a joint venture between
  - AIE - Associazione Italiana Editori (Italian Publishers Association)
  - Cineca - technological consortium of Italian universities
  - Thus a public-private partnership
- Born in 2004 after an EU co-funded project with the same title
- mEDRA is a DOI (Digital Object Identifier) Registration Agency
  - Active mainly in Italy and Germany (partnership with MVB)
  - 82% of turnover outside Italy
- Other fields of activity
  - mEDRA provides technology services to
    - Publications Office of the European Union (DOI registration infrastructure)
    - Italian ISBN agency
- mEDRA's main asset: know-how on standards for cultural content
  - Designed by the parent companies as a centre for R&D in this field

The standards field: a labyrinth of acronyms
The current situation
- Everybody calls for Ariadne!
- But Ariadne is itself the acronym of a project in the educational field

A scheme to exit the labyrinth
- We can describe any use of any IP entity as
    People make deals about stuff
    People / use / objects
  (Norman Paskin, director of IDF, Towards a data dictionary. Identifiers and semantics at work on the net, Electronic Publishing Services, June 2002, www.doi.org/topics/020522IMI.pdf)
- We have to identify and describe people, deals, and stuff (people, uses, and objects)

A scheme to exit the labyrinth
- Each acronym belongs to one cell of the table

                 PEOPLE              DEALS            STUFF
Identification   ISNI*, IPI, VIAF                     ISBN*, ISSN*, ISMN*, DOI*, ISTC*
Description                          ONIX-LT, ACAP    ONIX, MARC, DC

* ISO standards
Legend (colour-coded on the slide): well established; under development; developed, still not used
- We still have empty cells!

Current and forthcoming trends

                 PEOPLE              DEALS            STUFF
Identification   ISNI*, IPI, VIAF                     ISBN*, ISSN*, ISMN*, DOI*, ISTC*
Description                          ONIX-LT, ACAP    ONIX, MARC, DC

Legend (colour-coded on the slide): current trends; forthcoming

Standard network resolution
- An additional layer of complexity (sorry for this!)
- From identification to resolution:
  - I.e. reaching resources about the identified entity in a network environment
- This is the core idea of the DOI
  - Standardising this aspect too (which goes beyond identification) adds value
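The step from identification to resolution can be illustrated with a minimal Python sketch. It assumes the public DOI proxy at doi.org is used as the resolver, which the slide does not state; the function name is hypothetical and 10.1000/182 serves purely as an example DOI name.

```python
from urllib.parse import quote

# Assumption: the public DOI proxy at doi.org is used as the resolver.
DOI_PROXY = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Turn a DOI name (an identifier) into an actionable URL."""
    doi = doi.strip()
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI name: {doi!r}")
    # Percent-encode characters that are unsafe in a URL path, keep "/".
    return DOI_PROXY + quote(doi, safe="/")


print(doi_to_url("10.1000/182"))  # https://doi.org/10.1000/182
```

The identifier itself carries no location; the resolver decides where the request ends up, so that target can change while the DOI name stays the same.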

Resolution vs identification
"What a PI identifies" and "what a PI resolves to" are two different concepts.
[Diagram: a single DOI with four resolutions (Resolution 1 to Resolution 4), pointing to the identified entity, info (metadata), rights info, and how to buy.]
Sample text displayed as the identified entity (cut off on the slide):
  The DOI® (Digital Object Identifier) is a standard for identifying any object of intellectual property. A DOI provides a means of persistently identifying a piece of intellectual property on a digital network and associating it with related current data.
  On digital networks, all intellectual property is simply a string of bits; a DOI can apply to any form of intellectual property in any digital environment. DOIs have been called "the bar code for intellectual property": like the physical bar code, they are enabling tools for use all through the supply chain to add value and save cost.
  A DOI differs from commonly used internet pointers to material such as the URL (Uniform Resource Locator, the usual means of referring to World Wide Web material) because it identifies an object as a first-class entity, not simply the place where the object is located. A DOI is also different from commonly used identifiers of intellectual property like standard bibliographic and related identifiers (ISBN, ISSN, ISRC, etc.) because it is associated with defined services and is immediately "actionable" on a network. However, the DOI does not compete with these standards since it allows them to be integrated as suffixes in DOI strings.
  A DOI is an implementation of the Internet concepts of Uniform Resource Name and Universal Resource Identifier. A DOI is different from abstract naming specifications such as URN in that it is a defined identification [...]

Resolution vs identification
It is also possible that the PI does not resolve to the identified entity.
[Diagram: a single DOI with three resolutions (Resolution 1 to Resolution 3), pointing to info (metadata), rights info, and how to buy. The identified entity (e.g. a book) is shown with the same sample DOI text as on the previous slide, but no resolution points to it.]
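The difference between what a PI identifies and what it resolves to can be sketched as a small data structure. This is only an illustration: the class, the service names, the DOI string and the URLs are all hypothetical and do not reflect the actual data model of mEDRA or of the DOI system.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PersistentIdentifier:
    """A persistent identifier (PI) with typed resolution targets.

    `identifies` describes the entity; `resolutions` maps a service
    name to a URL. Names and URLs are made up for this sketch.
    """
    name: str
    identifies: str
    resolutions: Dict[str, str] = field(default_factory=dict)

    def resolve(self, service: str) -> Optional[str]:
        # Return the URL registered for this service, or None.
        return self.resolutions.get(service)


# Hypothetical example: a DOI for a printed book. The PI identifies the
# book but resolves only to information about it.
book = PersistentIdentifier(
    name="10.99999/example-book",
    identifies="a printed book",
    resolutions={
        "metadata": "https://example.org/book/metadata",
        "rights": "https://example.org/book/rights",
        "buy": "https://example.org/book/buy",
    },
)

print(book.resolve("rights"))  # https://example.org/book/rights
print(book.resolve("entity"))  # None
```

A printed book has no network location, so the sketch registers no "entity" target at all, yet the identifier remains useful through its other resolutions.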

Arrow: connecting book records to rights information
- The issue:
  - High transaction costs in managing rights in digital library programmes
    - Also known as "the orphan works problem", but it is more than this
- The need to connect
  - stuff information (a bibliographic record)
  - to people information (rightholder contact)
  - and possibly to licence information
    - E.g. offered by a Collective Management Organisation (CMO)
- We started from a number of information resources created for different purposes
- We set up a network making those resources interoperable
  - In use in four countries: Germany, France, UK and Spain
  - We are working to expand the network to many other European countries ("Arrow Plus")
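The idea of connecting stuff, people and licence information held in separate resources can be sketched as a join on a shared identifier. This is a toy illustration, not Arrow's architecture: the three dictionaries, the identifier "isbn-A", and every value in them are invented, and it assumes all resources happen to share one key.

```python
from typing import Dict

# Three independent resources, each created for a different purpose.
# All identifiers, names and values are invented for illustration.
books_in_print = {
    "isbn-A": {"title": "Example Title", "status": "out of print"},
}
rightholders = {
    "isbn-A": [{"name": "Example Publisher", "contact": "rights@example.org"}],
}
cmo_licences = {
    "isbn-A": {"licence": "digitisation licence available"},
}


def rights_view(book_id: str) -> Dict[str, object]:
    """Join stuff, people and deal information for one book identifier."""
    return {
        "book": books_in_print.get(book_id),              # stuff
        "rightholders": rightholders.get(book_id, []),    # people
        "licence": cmo_licences.get(book_id),             # deals
    }


view = rights_view("isbn-A")
unknown = rights_view("isbn-B")
```

For an identifier that no resource knows, the view comes back empty rather than failing, which leaves the decision about what to do next to the caller.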

Some key characteristics of Arrow
- A distributed system made interoperable through the use of standards
  - Sometimes described as a "registry" or a "database", which it is not
- Separation between rights information management and rights clearance
  - This makes the system neutral with respect to legal frameworks and business models
- Use of different types of bibliographic resources
  - National library catalogues, VIAF, Books in Print, CMO repertoires
  - (which often use different standards!)

Lessons learned: an identity issue
- Connecting stuff, people and the information associated with both is a general problem
- This is first a problem of identity (and identification)
  - Which entity is relevant in the "stuff" domain?
    - Unambiguous identification of the book concerned
    - Unambiguous identification of the work(s) contained in that book
  - Which entity is relevant in the "people" domain?
    - Unambiguous identification of the public names
    - Unambiguous identification of people

Lessons learned: connecting entities
- We need information associated with works when all the resources are based on books (manifestations)
  - Need to connect [...] data to [...] data
- We need information about people and often have information about names
  - Personal information is delicate:
    - moving from [...] to [...]
    - often is from [...] to [...]
- Defining the relevant entities is worth a large effort
  - Stakeholders' awareness, and then agreement, is needed

Lessons learned: managing errors
- We live in a world with imperfect information
  - ISBNs (created in the 1970s) are not always there and not always used properly
  - The work identifier (ISTC) was created very recently and is at the initial phase of deployment
  - Rights information resides only in proprietary resources
- Connecting entities (matching, clustering, relation tracking) is always a probabilistic process
- Need to manage errors
  - Never promising the truth
  - Combining automatic processes and human intervention
  - Being transparent on this matter and allowing users to check the results
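The combination of automatic processes and human intervention described above can be sketched as a scored match with a review band. This is a minimal illustration, not Arrow's matching algorithm: the similarity measure (Python's difflib) is only a stand-in, and the thresholds 0.9 and 0.6 are arbitrary assumptions.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def similarity(a: str, b: str) -> float:
    """Crude string similarity in [0, 1]; a stand-in for a real matcher."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def classify(score: float, accept: float = 0.9, reject: float = 0.6) -> str:
    """Map a score to an action. The thresholds are arbitrary assumptions."""
    if score >= accept:
        return "auto-match"
    if score < reject:
        return "no-match"
    return "human-review"  # uncertain cases go to a person


def match(query_title: str, candidates: List[str]) -> List[Dict[str, object]]:
    """Score every candidate and keep score and decision visible."""
    results = []
    for title in candidates:
        score = similarity(query_title, title)
        results.append(
            {"candidate": title, "score": round(score, 2), "decision": classify(score)}
        )
    # Best candidates first, so a reviewer sees the likeliest match on top.
    return sorted(results, key=lambda r: r["score"], reverse=True)


ranked = match("The Name of the Rose", ["Completely different", "the name of the rose"])
```

Returning the score together with the decision, instead of a bare yes or no, is what lets users check the result for themselves.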

Can our experience be relevant for other applications?
- One example: bibliometric data
- Indexes rely on data sources, which by definition are imperfect
- Need to connect:
  - [...] to [...] to [...]
    - Different versions (e.g. pre-print, publisher version, etc.)
    - Insufficient data about monographs
  - [...] to [...]
    - Facilitated by the use of the DOI, but still problematic in many fields
- Again: managing errors
  - Need to know the errors in the data
  - Balancing automatic processes and human intervention

How to build on
- Once information has been discovered (and possibly assessed), it is crucial that it is re-usable
- Again: proper use of standards:
  - Please, don't call your solution "a standard"
  - A standard requires broad stakeholder consensus
- Registering data in the appropriate standard repositories
  - E.g. the Arrow routine to register the ISTC for every [...] discovered in the process
- One further step: making information available through standard resolution systems
  - E.g. your own URN resolver is not enough. Use something that is accepted by vast communities (such as Handle)

Identification -> Resolution -> Access
- Precise and persistent identification is a prerequisite for resolution
- Resolution facilitates access to precisely identified resources
- Have a look at "The Answer to the Machine is in the Machine", one of the Big Ideas for the Digital Agenda launched by the European Commission
  - The concept: creating a stable resolution system between IP entities and information about IP rights
  - IP rights information is not stable, so it cannot be embedded in the manifestation
  - Through persistent identifiers supported by a resolution mechanism, it is possible to connect the entity with IP information
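Why unstable rights information is better resolved than embedded can be shown with a toy resolver. This is a sketch under stated assumptions: the class, the identifier and the rights statements are hypothetical, and a real resolution system (such as Handle) works very differently in practice.

```python
from typing import Dict


class RightsResolver:
    """Toy resolver: persistent identifier -> current rights information."""

    def __init__(self) -> None:
        self._records: Dict[str, str] = {}

    def register(self, identifier: str, rights_info: str) -> None:
        # Registering again overwrites the previous record: the
        # identifier stays the same while the information changes.
        self._records[identifier] = rights_info

    def resolve(self, identifier: str) -> str:
        return self._records.get(identifier, "unknown")


resolver = RightsResolver()
pid = "10.99999/example-book"  # hypothetical identifier printed in the book

resolver.register(pid, "in commerce, contact the publisher")
before = resolver.resolve(pid)

# Years later the rights situation changes; copies already in
# circulation still carry the same identifier.
resolver.register(pid, "out of commerce, licence available from a CMO")
after = resolver.resolve(pid)

print(before)  # in commerce, contact the publisher
print(after)   # out of commerce, licence available from a CMO
```

Only the identifier is embedded in the manifestation, so every lookup returns the current record rather than whatever was true at publication.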

Thank you - Grazie
- Piero Attanasio, piero.attanasio@aie.it
- www.medra.org
- Further information on ARROW: www.arrow-net.eu