- Number of slides: 61
Linked Library Data: Modeling Metadata for the [Semantic] Web
Presented 2010-11-19, Digital Library Seminar Series, Columbia University
Corey A Harper
Topical Overview
• Semantic Web Intro
• Linked Open Data
– Graphs: Entity – Attribute – Value
– A Few Examples
• Library Data
Topical Overview (cont)
• Linked Library Data
– SKOS and Authority Control
– FRBR and Bibliographic Data
– National Libraries
• Resource Description and Access (RDA)
• Dublin Core Metadata Initiative
Semantic Web
• TBL’s original vision
– “Weaving the Web” – 1999
• Then: Focus on Machine Reasoning
– Scientific American Article
• Now: Focus on things & links
– Reasoning becoming lower level
Semantic Web
• Originally:
– Metadata standard built on XML
– Metadata about “Web” things
• Eventually:
– Metadata about all things
– Metadata about relationships between things
Semantic Web Terminology
• Resource: Any thing
• Class: Abstraction of a type of thing
• Individual: An instance of a class
• Property: An attribute of an individual
• Ontology: A domain specific collection of classes and properties
• Statement/Triple:
– A Resource (subject) - Nodes
– A Property (predicate) - Arcs
– A Value (object) - Nodes
Semantic Web Terminology
• Graphs: Representations of statements about resources
• Nodes: The Subjects and Objects in a Graph
• Arcs: The Predicates in a Graph
• Literals: “Objects” represented as strings (constant values) rather than things (URI References)
• Domains and Ranges: Constraints on Nodes
• For Example…
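The terminology above can be sketched in a few lines of Python. This is only an illustration of statements, nodes, arcs and literals, not a real RDF library; the class names and every `http://example.org/` identifier are assumptions made up for the sketch.

```python
from typing import NamedTuple, Union


class URIRef(str):
    """A thing (node or arc) identified by a URI reference."""


class Literal(str):
    """An object given as a constant string value rather than a thing."""


class Triple(NamedTuple):
    subject: URIRef               # a node
    predicate: URIRef             # an arc
    obj: Union[URIRef, Literal]   # a node: URI reference or literal


# Hypothetical identifiers, used only for illustration.
EX = "http://example.org/"
graph = [
    Triple(URIRef(EX + "book/1"), URIRef(EX + "terms/title"), Literal("Weaving the Web")),
    Triple(URIRef(EX + "book/1"), URIRef(EX + "terms/creator"), URIRef(EX + "person/tbl")),
    Triple(URIRef(EX + "person/tbl"), URIRef(EX + "terms/name"), Literal("Tim Berners-Lee")),
]


def nodes(triples):
    """Subjects and objects are the nodes of the graph."""
    return {t.subject for t in triples} | {t.obj for t in triples}


def arcs(triples):
    """Predicates are the arcs of the graph."""
    return {t.predicate for t in triples}


def literals(triples):
    """Objects represented as strings rather than URI references."""
    return {t.obj for t in triples if isinstance(t.obj, Literal)}
```

Note that the person node is the object of one statement and the subject of another; that shared node is what turns a list of statements into a graph.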
RDF
• Resource Description Framework
• Formally Begun in 1999
• Ideas from 1995
• Finalized in 2004
• Frighteningly complex at times…
– “Directed Labeled Graphs”
Sem. Web Value Proposition
• Formally Modeled (Meta) Data
• Formal Semantics Declaration
• Increased Granularity compared to record-based Metadata
• Improved Interoperability
“The vast bulk of data to be on the Semantic Web is already sitting in databases … all that is needed [is] to write an adapter to convert a particular format into RDF and all the content in that format is available.”
- Tim Berners-Lee in an interview with the Consortium Standards Bulletin
Linked Open Data
• Use URIs as names for things.
• Use HTTP URIs so that people can look up those names.
• When someone looks up a URI, provide useful information.
• Include links to other URIs, so that they can discover more things.
http://www.w3.org/DesignIssues/LinkedData.html
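As a rough sketch of "looking up" a name: the Python below prepares an HTTP request for a URI and asks for an RDF representation through the `Accept` header. Content negotiation is a common Linked Data practice but is an assumption here, not something the slide states; the helper name `linked_data_request` and the DBpedia URI are illustrative only, and no request is actually sent.

```python
from urllib.request import Request


def linked_data_request(uri: str, media_type: str = "application/rdf+xml") -> Request:
    """Prepare (but do not send) a lookup of an HTTP URI that names a thing."""
    if not uri.startswith(("http://", "https://")):
        # Principle 2: names should be HTTP URIs so they can be looked up.
        raise ValueError("Linked Data names should be HTTP URIs")
    # Ask the server for machine-readable data rather than an HTML page.
    return Request(uri, headers={"Accept": media_type})


# Illustrative URI; any HTTP URI that names a thing would do.
req = linked_data_request("http://dbpedia.org/resource/Columbia_University")
```

Passing `req` to `urllib.request.urlopen` would perform the lookup; the returned data would then, ideally, contain links to further URIs.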
Linked Data Cloud
• Automated generation
– Comprehensive Knowledge Archive Network (CKAN)
– Vocabulary of Interlinked Datasets (voiD)
– Basically, catalog your metadata!
• Recent criticism: data quality
Data in the Cloud
• Hubs in the May 2008 Version:
– FOAF
– DBpedia
– Geonames
– MusicBrainz
• Myriad Sources coming online:
– Thomson Reuters
– New York Times
– British Broadcasting Corporation
– Google and Facebook
– More and More Library Data
DBpedia
• Structured Wikipedia Data
• Genres, Influences, External Links
• Multi-lingual / Multi-script labels
• Rich Semantics
• Many linkages to other datasets
DBpedia
• 3.4 million “things” described
• Ontology based on “infoboxes”
– 1.5 million things classified
• Approx. 50,000 “Properties”
– Approx. 1,200 defined in ontology
• Brief Example
Domain Modeling
• Starting from application / goal / function
“To guide and evaluate our designs, we need objective criteria that are founded on the purpose of the resulting artifact, rather than based on a priori notions of naturalness or Truth.” – Gruber, 1993
• Does this apply to Libraries? FRBRer?
DBpedia Model
• Partial basis in data entry conventions
• Infoboxes and Infobox Templates
• Metadata Entry Format
• Partial source of Ontology
– Class Structure
– Vocabulary Design
DBpedia
• 3.4 million “things” described
• Ontology based on “infoboxes”
– 1.5 million things classified
– http://wiki.dbpedia.org/Ontology
• Approx. 50,000 “Properties”
– Approx. 1,200 defined in ontology
More Examples
• British Broadcasting Corporation
– Programmes, Music, Wildlife
• Google Refine
• Data.gov and data.gov.uk
• NY Times
What *things* are in our data???
…Library data is extremely complicated
Bibliographic Data
• Rich stores of MARC, MODS, &c.
• Robust Controlled Vocabularies
– Subject Heading lists
– Code lists
– Thesauri
• Emerging data model in FR*
Bibliographic Vocabs
• Bibliographic Ontology
– Zotero, Omeka, EPrints and Others
• FRBR
– unofficial
– And now Official (Thank you IFLA!)
• ISBD
Library Authority Data
“Include links to other URIs, so that they can discover more things.”
Short of providing and linking to URIs, this *is* authority data. This is what our authority files are for.
Library Controlled Vocabularies: Benefits
• Reputation - Trusted Tradition
• Mature - Time tested and carefully developed
• General & Comprehensive - Cover large knowledge spaces
SKOS
• Simple Knowledge Organization System
• Properties and Classes for describing Controlled Vocabulary
[Diagram labels: RDF Page, skos:primaryTopic, skos:person]
LCSH in Dublin Core
• Encoding Scheme for DC Subject
• No easy way to draw on equivalent terms and cross-references
• Abstract Model, RDF and SKOS could enable applications to make use of the whole vocabulary
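A minimal sketch of what "making use of the whole vocabulary" could look like once a heading is expressed in SKOS. The properties `skos:prefLabel`, `skos:altLabel` and `skos:broader` are real SKOS terms; the `http://example.org/subjects/` URIs, the labels and the helper functions are hypothetical and do not reproduce an actual LCSH record.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
PREF_LABEL = SKOS + "prefLabel"   # the authorized heading
ALT_LABEL = SKOS + "altLabel"     # equivalent terms ("used for" references)
BROADER = SKOS + "broader"        # cross-reference to a broader concept

# Hypothetical concept URIs and labels, for illustration only.
EX = "http://example.org/subjects/"
triples = [
    (EX + "www", PREF_LABEL, "World Wide Web"),
    (EX + "www", ALT_LABEL, "W3 (World Wide Web)"),
    (EX + "www", ALT_LABEL, "WWW (World Wide Web)"),
    (EX + "www", BROADER, EX + "hypertext"),
    (EX + "hypertext", PREF_LABEL, "Hypertext systems"),
]


def find_concept(label, data):
    """Return the concept whose preferred or alternate label matches."""
    wanted = label.casefold()
    for s, p, o in data:
        if p in (PREF_LABEL, ALT_LABEL) and o.casefold() == wanted:
            return s
    return None


def labels(concept, data):
    """All labels for a concept: the heading plus its equivalent terms."""
    return [o for s, p, o in data if s == concept and p in (PREF_LABEL, ALT_LABEL)]


def broader(concept, data):
    """Concepts one step up the hierarchy."""
    return [o for s, p, o in data if s == concept and p == BROADER]
```

With a plain string in a DC Subject field, a search on a variant form finds nothing; here `find_concept("www (world wide web)", triples)` resolves to the same concept as the authorized heading, and `broader` follows the cross-reference.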
LCSH as a Web Service!
• Uses principles of linked data
• lcsh.info -> id.loc.gov
• People noticed when taken down
• Links to French Subject Headings
• URIs for Literal String lookup
• http://id.loc.gov/authorities/label/World Wide Web
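The literal-string lookup pattern shown on this slide can be sketched as a small URL builder. The base URL comes from the slide; the function name is hypothetical, and percent-encoding the spaces is an assumption about what an HTTP client would need to send.

```python
from urllib.parse import quote

# Base of the label lookup pattern shown on the slide.
LABEL_BASE = "http://id.loc.gov/authorities/label/"


def label_lookup_uri(heading: str) -> str:
    """Build a lookup URI from the literal string of a heading."""
    # Percent-encode everything that is not safe in a URL path segment.
    return LABEL_BASE + quote(heading, safe="")


uri = label_lookup_uri("World Wide Web")
```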
Other Vocabularies
• Thesaurus for Economics
• French Subject Headings
• Swedish Subject Headings
• Iconclass (not on web yet)
• OCLC Terminology Services
• Dewey Decimal Classification
• Virtual International Authority File
Linked Library Data
• VIAF, LCSH, MARC Codes
• Open Library, XC, Kuali OLE
• Library of Congress, OCLC
• Hungarian, German, British, Swedish National Libraries
• Formalized Efforts: W3C, IFLA & RDA
Kungliga Biblioteket
Image courtesy of Martin Malmsten, http://blog.libris.kb.se/semweb/?p=7
National Széchényi Library
“Our RDFDC, FOAF and SKOS statements are linked together. Our name authority is matched with the DBPedia name files and URI aliases are handled as owl:sameAs statements.”
- Adam Horvath
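One way to picture "URI aliases handled as owl:sameAs statements" is to group every URI that is connected by such statements. The sketch below is an assumption about how a consumer of the data might do that, not a description of the library's own system; apart from the `owl:sameAs` and `foaf:name` property URIs, all identifiers are hypothetical.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def alias_groups(triples):
    """Group URIs that owl:sameAs statements declare to name the same thing."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == OWL_SAME_AS:
            root_s, root_o = find(s), find(o)
            if root_s != root_o:
                parent[root_o] = root_s

    groups = {}
    for uri in list(parent):
        groups.setdefault(find(uri), set()).add(uri)
    return list(groups.values())


# Hypothetical statements, for illustration only.
statements = [
    ("http://example.org/authority/123", OWL_SAME_AS,
     "http://dbpedia.org/resource/Example_Person"),
    ("http://example.org/alias/abc", OWL_SAME_AS,
     "http://example.org/authority/123"),
    ("http://example.org/authority/999",
     "http://xmlns.com/foaf/0.1/name", "Someone Else"),
]
groups = alias_groups(statements)
```

Here the local authority URI, its alias and the DBpedia URI end up in one group, so a description attached to any of them can be merged with the others.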
W3C LLD XG
• “Incubator Group”
• Membership:
– Researchers, Consultants, Librarians
– National Libraries: Germany, France, LoC, Sweden
– OCLC & IFLA
W3C LLD XG Goals
• Collecting, Curating and Clustering over 50 Use Cases
• Mining use cases for functional requirements and design patterns
• Recommendations to W3C
– Should lead to Working Groups
RDA Development
• RDA elements, roles and vocabularies have been provisionally registered
• IFLA FRBRer and ISBD elements and vocabularies have been officially registered
• Discussions about long term maintenance of both RDA and the vocabularies
• Effort to create multi-language RDA Vocabularies
RDA slides adapted from Diane Hillmann
RDA Elements Listing 334!
RDA Elements Listing Base material 334!
Detail: Base Material
Detail: Base Material URI
RDA Base Material Vocabulary
RDA WEMI Relationships
Detail: RDA WEMI Relationship
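For readers without the slide images, WEMI refers to the FRBR entities Work, Expression, Manifestation and Item. The sketch below models that chain in Python and flattens it into statements. The relationship names paraphrase the FRBR phrases "is realized through", "is embodied in" and "is exemplified by"; they are hypothetical labels, not the registered RDA or FRBRer property URIs, and the `ex:` identifiers are made up.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    identifier: str


@dataclass
class Manifestation:
    identifier: str
    items: List[Item] = field(default_factory=list)  # is exemplified by


@dataclass
class Expression:
    identifier: str
    manifestations: List[Manifestation] = field(default_factory=list)  # is embodied in


@dataclass
class Work:
    identifier: str
    expressions: List[Expression] = field(default_factory=list)  # is realized through


def wemi_statements(work):
    """Flatten the hierarchy into (subject, relationship, object) statements."""
    out = []
    for e in work.expressions:
        out.append((work.identifier, "isRealizedThrough", e.identifier))
        for m in e.manifestations:
            out.append((e.identifier, "isEmbodiedIn", m.identifier))
            for i in m.items:
                out.append((m.identifier, "isExemplifiedBy", i.identifier))
    return out


# Hypothetical identifiers, for illustration only.
work = Work("ex:work/1", [
    Expression("ex:expression/1", [
        Manifestation("ex:manifestation/1", [Item("ex:item/1")]),
    ]),
])
statements = wemi_statements(work)
```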
Metadata Registries
• Formerly NSDL Registry
– Now “Open Metadata Registry”
– Managing Vocabularies
– Providing Vocabulary Services
• DCMI Registry Community
• DCMI Architecture Forum
DCMI and the Semantic Web
• Collaboration from the start
• Libraries (esp. OCLC) were at the table
• Perception of DCMI as DCMES
– DCMI = Metadata Vocab / Framework
– DCMES = Metadata Record Format
DCMI and the Semantic Web
• Every example above had dcterms
• DCMI as Research Institute and Metadata Think Tank
– Modeling Work
– Metadata Registries
– Application Profiles
– Description Set Profiles
– Singapore Framework
Changing Role of DCMI
• Mike Bergman at DC 2010:
– Reference Metadata
– Reference Concepts
– Mapping Predicates
• “Mappings should be approximate”
– Usage Guidelines
• Complement to W3C Standards
Why Does This Matter?
Our descriptions no longer stand alone!
Connect our data with the rest of the Web
Allow others to reuse more easily
– FOAF
– DBpedia
– Geonames
– MusicBrainz
– New York Times
– Thomson Reuters
– Government Data - data.gov
– British Broadcasting Corporation
Conclusions
• Distributed bibliographic control environment
– Linking Data
– Focus on identification over description
• “In short, by treating values as nonliteral resources and assigning URIs to them we give ourselves (and others) the hooks on which to hang further descriptions.” - Andy Powell
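The "hooks" in the quotation can be shown with a small contrast between a literal value and a nonliteral (URI) value. This is a sketch under assumptions: the prefixed names `dcterms:creator`, `foaf:name` and `owl:sameAs` refer to real properties, but the `http://example.org/` URIs and the helper function are hypothetical.

```python
EX = "http://example.org/"  # hypothetical namespace

# Value as a literal: the creator is just a string.
literal_description = [
    (EX + "book/1", "dcterms:creator", "Berners-Lee, Tim"),
]

# Value as a nonliteral resource: the creator has a URI of its own,
# so further statements can be made about it, by us or by others.
resource_description = [
    (EX + "book/1", "dcterms:creator", EX + "person/tbl"),
    (EX + "person/tbl", "foaf:name", "Tim Berners-Lee"),
    (EX + "person/tbl", "owl:sameAs", "http://dbpedia.org/resource/Tim_Berners-Lee"),
]


def further_descriptions(value, triples):
    """Statements that hang off a value, i.e. that use it as their subject."""
    return [t for t in triples if t[0] == value]


dead_end = further_descriptions("Berners-Lee, Tim", literal_description)
hooks = further_descriptions(EX + "person/tbl", resource_description)
```

The string value leads nowhere, while the URI value carries a name and a link out to another dataset.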
Endless possibilities
• This barely scratches the surface
• The Giant Global Graph!!
• With more soundly modeled bibliographic and authority data…
– Mashups
– Web Services
– User Profiling
– Collaboration tools
– Terminology Services
– Context sensitive interfaces
– Customized Exhibits
Continuing Challenges
• Emerging Technology
• Design Patterns
• Complexity (httpRange-14)
• Existing Technical Infrastructure
• Bootstrapping
• Business Cases
More Information
• W3C LLD XG:
– http://www.w3.org/2005/Incubator/lld/wiki/Main_Page
• ALA LLD Interest Group:
– http://kcoyle.net/lld-ala.html
• IFLA Semantic Web SIG
– https://wiki.d-nb.de/x/vA10Ag
Thanks!
corey.harper@nyu.edu
212.998.2479
Questions?