6bd138084ef8636724777b87eeb39e63.ppt
- Number of slides: 33
To name: persistently: ay, there’s the rub
Andy Powell, UKOLN, University of Bath
a.powell@ukoln.ac.uk
DCC Persistent Identifiers Workshop, University of Glasgow – June 2005
UKOLN is supported by: www.bath.ac.uk
www.ukoln.ac.uk
a centre of expertise in digital information management
Contents
• beginning
• middle
• end
chance for discussion
note: middle section will focus on technical functional requirements
Overall theme
the only useful form of digital identifier is the URI,
the only useful form of URI is one that conforms to a registered scheme,
… despite appearances to the contrary… the most useful registered scheme is the ‘http’ URI scheme
note use of “is” in the first line… “can be mapped to a URI” is not good enough!
Introduction
• PI meetings often try to focus on functional requirements…
• uniqueness, persistence, resolvability, usability, transportability, simplicity of assignment, applicability to digital and non-digital resources, cost, blah, blah…
• difficult, because requirements are abstract… not clear how to meet them in practice - difficult to move forward
• forget abstract functional requirements… let’s get technical
Space/time continuum
[diagram: axes “Internet space” and “time”; point labelled “my application”]
“Internet space” represents some combination of geographic / network distance and domain / administration / application distance… “time” represents time…
Space/time continuum
[diagram: axes “Internet space” and “time”; points labelled “my application” and “other application”]
applications that are closely related in terms of space or time are likely to share understanding about identifiers – often by hardwiring knowledge into code
Space/time continuum
[diagram: axes “Internet space” and “time”; points labelled “my application” and two distant “other application”s]
applications that are “distant” are less likely to share understanding about identifiers
knowledge is locked within a domain or lost over time, or, worse, both
Pushing the boundaries
• how do we push the boundaries of identifier understanding further out across the space/time continuum?
– standards, standards
– go with the crowd
– stop telling people existing stuff is broken… or at least, stop pretending that we can do better!
– use what already works and is widely deployed
– focus on existing technical standards
– stop inventing, start doing
W3C Web Architecture
• Global Identifiers - Global naming leads to global network effects. (Principle)
• Identify with URIs - To benefit from and increase the value of the World Wide Web, agents should provide URIs as identifiers for resources. (Good practice)
• URIs Identify a Single Resource - Assign distinct URIs to distinct resources. (Constraint)
• Avoiding URI aliases - A URI owner SHOULD NOT associate arbitrarily different URIs with the same resource. (Good practice)
• Consistent URI usage - An agent that receives a URI SHOULD refer to the associated resource using the same URI, character-by-character. (Good practice)
• Reuse URI schemes - A specification SHOULD reuse an existing URI scheme (rather than create a new one) when it provides the desired properties of identifiers and their relation to resources. (Good practice)
• URI opacity - Agents making use of URIs SHOULD NOT attempt to infer properties of the referenced resource. (Good practice)
http://www.w3.org/TR/webarch/
URIs and XML
• in order for identifiers to work across the space/time continuum we need
– global and unambiguous identifiers
– global and unambiguous ways of exchanging identifiers between software applications
• the Uniform Resource Identifier is the only option for the former
• XML is the “best” option for the latter
– and in particular the XML Schema anyURI datatype
“global” means “very widely deployed technology” – e.g. in my mum’s house!
Use URIs
the only useful form of digital identifier is the URI,
the only useful form of URI is one that conforms to a registered scheme,
… despite appearances to the contrary… the most useful registered scheme is the ‘http’ URI scheme
URI scheme registration
• registration of URI schemes is important
• registration helps to ensure uniqueness
• without registration the same scheme can be used in ignorance by someone, somewhere else in the space/time continuum
• registration doesn’t guarantee that every URI with a scheme will be unique – but it helps!
• without registration there are no guarantees of uniqueness or persistence
Use registered URI schemes
the only useful form of digital identifier is the URI,
the only useful form of URI is one that conforms to a registered scheme,
… despite appearances to the contrary… the most useful registered scheme is the ‘http’ URI scheme
Semantic Web
• the Semantic Web relies on URIs to identify resources
• resources == stuff (digital/physical/conceptual things)
• the Semantic Web is built on a global, shared body of metadata (RDF)
• terms in the metadata language are identified using URIs
• those URIs must be “resolvable”… in order that “reasoning” can be performed
Note: dereferencing URIs
• the Web Architecture talks about “dereferencing” URIs rather than “resolving” them
– in many cases “dereferencing” a URI results in obtaining a “representation” of the resource
– several representations may be available
• the Web Architecture says:
• Available representation - A URI owner SHOULD provide representations of the resource it identifies (Good practice)
http://www.w3.org/TR/webarch/
• only ‘http’ URIs offer a simple, widely deployed dereferencing mechanism
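Note: the dereferencing step can be sketched in Python. This is a minimal illustration under stated assumptions, not something from the original slides: the helper names `build_request` and `dereference` and the default `Accept` value are hypothetical, and only the standard-library `urllib` is used. The `Accept` header is how a client asks for one of the several representations that may be available.

```python
import re
import urllib.request

# RFC 3986 scheme syntax: a letter, then letters, digits, "+", "-" or ".", then ":".
_SCHEME = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*):")


def build_request(uri, accept="application/rdf+xml, text/html;q=0.5"):
    """Prepare (but do not send) a GET request for a representation of `uri`.

    Hypothetical helper: only 'http' URIs are accepted, mirroring the slide's
    point that this scheme has a widely deployed dereferencing mechanism.
    """
    match = _SCHEME.match(uri)
    scheme = match.group(1).lower() if match else None
    if scheme != "http":
        raise ValueError(f"no built-in way to dereference {uri!r}")
    return urllib.request.Request(uri, headers={"Accept": accept})


def dereference(uri, accept="application/rdf+xml, text/html;q=0.5"):
    """Send the request and return (media type, body bytes). Needs network access."""
    with urllib.request.urlopen(build_request(uri, accept)) as response:
        return response.headers.get_content_type(), response.read()
```

Calling `dereference("http://purl.org/dc/terms/audience")` would attempt a real HTTP request; what comes back depends on the server, so no output is shown here.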
Quick quiz…
• what kind of identifier is this?
– 1361-3200: an ISSN; it identifies UKOLN’s Ariadne magazine
Quick quiz…
• what kind of identifier is this?
– 1361-3200
– info:lccn/n78890351: an ‘info’ URI; it identifies a Library of Congress metadata record (an authority file), but I don’t know which
Quick quiz…
• what kind of identifier is this?
– 1361-3200
– info:lccn/n78890351
– 10.1000/182: a DOI; it is also a Handle; it identifies the “DOI Handbook”
Quick quiz…
• what kind of identifier is this?
– 1361-3200
– info:lccn/n78890351
– 10.1000/182
– 79 Ceti: a Flamsteed designation; it identifies a 7th magnitude star in the constellation of Cetus
Quick quiz…
• what kind of identifier is this?
– 1361-3200
– info:lccn/n78890351
– 10.1000/182
– 79 Ceti
– http://purl.org/dc/terms/audience: an ‘http’ URI, a.k.a. a URL; it is also a PURL; it identifies a DCMI metadata term, i.e. a conceptual resource
Quick quiz…
• what kind of identifier is this?
– 1361-3200
– info:lccn/n78890351
– 10.1000/182
– 79 Ceti
– http://purl.org/dc/terms/audience
• only one of these can be understood and dereferenced by every single bit of currently deployed Internet software…
Question: why would we want to use anything else?
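Note: the quiz can be replayed in code. The sketch below is an illustration only: the function names and the wording of the descriptions are assumptions, and the scheme test is just the generic URI scheme syntax from RFC 3986. It asks, for each of the five strings, what a piece of software with no domain knowledge can tell from the string alone.

```python
import re

# RFC 3986 scheme syntax: a letter, then letters, digits, "+", "-" or ".", then ":".
_SCHEME = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*):")

QUIZ = [
    "1361-3200",
    "info:lccn/n78890351",
    "10.1000/182",
    "79 Ceti",
    "http://purl.org/dc/terms/audience",
]


def uri_scheme(identifier):
    """Return the lower-cased URI scheme, or None if the string has none."""
    match = _SCHEME.match(identifier)
    return match.group(1).lower() if match else None


def describe(identifier):
    """Say what generic software can infer from the string alone."""
    scheme = uri_scheme(identifier)
    if scheme is None:
        return "bare string: needs out-of-band knowledge to interpret"
    if scheme == "http":
        return "'http' URI: can be handed to any Web client"
    return f"'{scheme}' URI: recognisable as a URI, needs scheme-specific software"


if __name__ == "__main__":
    for identifier in QUIZ:
        print(f"{identifier:36}{describe(identifier)}")
```

The ISSN, the DOI and the Flamsteed designation all come out as bare strings: nothing in them says which identifier system they belong to.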
But…
But, ‘http’ URIs are just locators aren’t they?
• ‘http’ URIs are identifiers, just like any other
But, ‘http’ URIs can only be used for Web resources, accessed over HTTP, can’t they?
• ‘http’ URIs can identify any resource – digital, physical or conceptual
But, ‘http’ URIs break every 30 days or something, don’t they?
• ‘http’ URIs don’t have to break, they just need to be assigned/managed carefully
Use ‘http’ URIs
the only useful form of identifier is the URI,
the only useful form of URI is one that conforms to a registered scheme,
… despite appearances to the contrary… the most useful registered scheme is the ‘http’ URI scheme
Case study 1 - LOM
• XML example from IEEE LOM…
• typical of many XML / identifier encodings
…
<general>
  <identifier>
    <catalog>DOI</catalog>
    <entry>10.1000/182</entry>
  </identifier>
  …
</general>
…
Case study 1 - LOM
• the “catalogue” indicates what kind of identifier is being used
…
<general>
  <identifier>
    <catalog>URI</catalog>
    <entry>http://purl.org/poi/rdn.ac.uk/12-34</entry>
  </identifier>
  …
</general>
…
the <catalog> value is just a string
the <entry> value is just a string
nothing in the XML schema indicates that this is a URI – some applications will be blind
Case study 1 - LOM
• an improved version might be…
…
<general>
  <identifier>http://purl.org/poi/rdn.ac.uk/12-34</identifier>
  …
</general>
…
where the XML schema indicates that this is of datatype anyURI
therefore all XML-aware applications will know this is a URI
Case study 1 - LOM
• an improved version might be…
…
<general>
  <identifier>doi:10.1000/182</identifier>
  …
</general>
…
the URI syntax provides the “catalogue” from the original example
all XML-aware applications will know this is a URI, some will know it is a DOI
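Note: the difference between the two encodings shows up as soon as an application reads them. The sketch below is an illustration under assumptions: the element names are un-namespaced as on the slides (a real LOM XML binding may differ), the function names are hypothetical, and the standard-library `xml.etree.ElementTree` parser does not validate against an XML schema, so the anyURI typing itself is not checked here.

```python
import xml.etree.ElementTree as ET

# The original encoding: a catalogue name plus an entry, both plain strings.
CATALOG_ENTRY_STYLE = """
<general>
  <identifier>
    <catalog>DOI</catalog>
    <entry>10.1000/182</entry>
  </identifier>
</general>
"""

# The improved encoding: the identifier element holds a URI directly.
URI_STYLE = """
<general>
  <identifier>doi:10.1000/182</identifier>
</general>
"""


def read_catalog_entry_style(xml_text):
    """Return (catalog, entry) pairs; the caller must know what each catalog means."""
    root = ET.fromstring(xml_text.strip())
    return [
        (identifier.findtext("catalog"), identifier.findtext("entry"))
        for identifier in root.findall("identifier")
    ]


def read_uri_style(xml_text):
    """Return the identifier values, each expected to be a URI."""
    root = ET.fromstring(xml_text.strip())
    return [identifier.text.strip() for identifier in root.findall("identifier")]
```

With the first form, the reader gets `("DOI", "10.1000/182")` and has to understand the catalogue vocabulary; with the second, it gets a single URI string whose scheme carries the same information.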
Case study 2 - DOI
http://dx.doi.org/10.1000/182
• the DOI “10.1000/182” can be encoded as a URI in several ways:
– http://dx.doi.org/10.1000/182
– doi:10.1000/182
– urn:doi:10.1000/182
• however…
– DOI-aware applications have to have knowledge of these encodings hard-coded into them (since the DOI itself is just a string)
– nothing in the URI specification indicates that these URIs are equivalent
– note that the 2nd and 3rd forms are not registered
Question: which of these forms is most persistent and why?
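Note: the “hard-coded knowledge” point can be made concrete. In this sketch the prefix table and the function name `doi_from_uri` are assumptions for illustration; the three URI forms are the ones listed on the slide.

```python
# Hard-coded knowledge a DOI-aware application needs: the prefixes under which
# a DOI may be wrapped as a URI. Nothing in generic URI handling supplies this.
DOI_URI_PREFIXES = ("http://dx.doi.org/", "urn:doi:", "doi:")

FORMS = [
    "http://dx.doi.org/10.1000/182",
    "doi:10.1000/182",
    "urn:doi:10.1000/182",
]


def doi_from_uri(uri):
    """Return the bare DOI if `uri` uses a known encoding, else None."""
    for prefix in DOI_URI_PREFIXES:
        if uri.startswith(prefix):
            return uri[len(prefix):]
    return None


if __name__ == "__main__":
    # As plain URIs the three forms are simply three different strings.
    print(len(set(FORMS)), "distinct URIs")
    # Only with the prefix table do they collapse to one DOI.
    print({doi_from_uri(form) for form in FORMS})
```

An application without the prefix table sees three unrelated identifiers; one with it sees a single DOI, but only for the encodings it was told about.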
Case study 3 – ‘info’ URI
http://info-uri.info/registry/
• consider the following ‘info’ URI:
– info:lccn/n78890351
• ‘info’ URIs are explicitly defined to be non-dereferenceable
• therefore, there is no documented way of finding out what this URI identifies
• there is no documented way of getting a representation of the resource it identifies
• and there is no documented way of finding out any more about it
Question: how is this useful?
But, what happens when…
…the Internet disappears?
• who cares!
• we’ll deal with it
• we’ll be with the crowd
• there’ll be a global transition
• everyone will need to deal with it
• every software component on the whole Internet will need fixing
• the people left behind will be the people who invented their own solutions
Conclusion
the only useful form of digital identifier is the URI,
the only useful form of URI is one that conforms to a registered scheme,
the most useful registered scheme is the ‘http’ URI scheme,
URIs are only really useful if encoding syntaxes explicitly indicate “this is a URI”
Questions and discussion?
Discussion
• What do we need to do to make ‘http’ URIs more persistent? (Are ‘http’ URIs the answer after all?)
• Do we have functional requirements that aren’t met by ‘http’ URIs? (When and why should we create new URI schemes?)