004ae808938e341e4610c1a335188f40.ppt
- Number of slides: 35
Finding and Ranking Knowledge on the Semantic Web
Li Ding, Rong Pan, Tim Finin, Anupam Joshi, Yun Peng and Pranam Kolari
University of Maryland, Baltimore County
UMBC an Honors University in Maryland
http://creativecommons.org/licenses/by-nc-sa/2.0/
This work was partially supported by DARPA contract F30602-97-1-0215, NSF grants CCR 007080 and IIS 9875433 and grants from IBM, Fujitsu and HP.
This talk
• Motivation
• Swoogle overview
• Bots navigate the Semantic Web
• Ranking Semantic Web content
• Use cases and applications
• Conclusions
Google has made us smarter
But what about our agents?
A Google for knowledge on the Semantic Web is needed by people and software agents.
Swoogle Architecture
[Figure: Swoogle architecture. Layers: interface, data analysis, metadata creation, SWD discovery. Components: the Web, Crawler, Candidate URLs, SWD Reader, SWD Cache, SWD Metadata, SWD analyzer, IR analyzer, Web Server, Web Service, Agent Service.]
• Swoogle 2 (4/05): 340K SWDs, 48M triples, 5K SWOs, 97K classes, 55K properties, 7M individuals
• Swoogle 3 (11/05): 700K SWDs, 135M triples, 7.7K SWOs
Demo 1 Find “Time” Ontology
We can use a set of keywords to search for an ontology. For example, “time, before, after” are basic concepts of a “Time” ontology.
Demo 2(a) Digest “Time” Ontology (document view)
Demo 2(b) Digest “Time” Ontology (term view)
[Screenshot: term list including TimeZone, before, …, intAfter]
Demo 3 Find Term “Person”
Not capitalized! URIrefs are case sensitive!
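Because URIrefs are compared as exact strings, a capitalization slip yields a different term. The sketch below uses a hypothetical list of URIrefs (the lower-case FOAF variant and the example.org term are invented for illustration) to contrast exact matching with a case-insensitive grouping by local name, which could surface such variants. The helper `local_name` is also hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical URIrefs as they might appear in crawled documents.
terms = [
    "http://xmlns.com/foaf/0.1/Person",
    "http://xmlns.com/foaf/0.1/person",  # invented mis-capitalized variant
    "http://example.org/onto#Person",    # invented term in another namespace
]

def local_name(uri: str) -> str:
    """Return the part of a URIref after its last '#' or '/'."""
    cut = max(uri.rfind("#"), uri.rfind("/"))
    return uri[cut + 1:]

# Exact string comparison: three distinct terms.
distinct = set(terms)

# Grouping by lower-cased local name puts all three "person" spellings together.
variants: Dict[str, List[str]] = defaultdict(list)
for uri in terms:
    variants[local_name(uri).lower()].append(uri)

print(len(distinct), len(variants["person"]))
```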
Demo 4 Digest Term “Person”
[Screenshot annotations: 167 different properties; 562 different properties]
Demo 5(a) Swoogle Today
Demo 5(b) Swoogle Statistics
[Chart labels: FOAF, Trustix, W3C, Stanford]
Swoogle’s Triple Store lets you shop for triples and check them out into any of several reasoners
Summary
Swoogle (Mar 2004) • Swoogle 2 (Sep 2004) • Swoogle 3 (July 2005)
• Automated SWD discovery
• SWD metadata creation and search
• Ontology rank (rational surfer model)
• Swoogle watch
• Web interface
• Ontology dictionary
• Swoogle statistics
• Web service interface (WSDL)
• Bag of URIref IR search
• Triple shopping cart
• Better (re-)crawling strategies
• Better navigation models
• Index instance data
• More metadata (ontology mapping and OWL-S services)
• Better web service interfaces
• IR component for string literals
The Semantic Web Onion
[Figure: nested layers, outermost to innermost]
• The “Semantic Web” (about 10M documents): the universal RDF graph
• RDF Document: physically hosts knowledge (about 100 triples per SWD on average)
• Class-instance Molecule: triples modifying the same subject; the finest lossless set of triples
• Triple: the atomic knowledge block
• Resource, Literal
Swoogle maintains metadata about objects in different layers of the Semantic Web Onion.
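Grouping a document's triples into class-instance molecules can be sketched as a group-by on the subject. This is a simplification: it treats blank nodes like any other subject, and the function name `molecules` and the sample triples are hypothetical. Since every triple lands in exactly one group, the union of the molecules gives back the original document, which is the "lossless" property named above.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

def molecules(triples: List[Triple]) -> Dict[str, List[Triple]]:
    """Group triples by subject; each group is one class-instance molecule.
    Simplification: blank-node subjects are not treated specially."""
    groups: Dict[str, List[Triple]] = defaultdict(list)
    for t in triples:
        groups[t[0]].append(t)
    return dict(groups)

# Hypothetical document content.
doc = [
    ("ex:tim", "rdf:type", "foaf:Person"),
    ("ex:tim", "foaf:name", '"Tim Finin"'),
    ("ex:doc", "dc:title", "\"Tim's FOAF File\""),
]
mols = molecules(doc)
print({s: len(ts) for s, ts in mols.items()})
```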
Semantic Web Navigation Model
[Figure: navigation paths numbered 1–7 connecting Term Search, Document Search, the Web, SWDs, SWOs, SWTs and the RDF graph (resources, literals). Link labels include sameNamespace, sameLocalname, extends, class-property bond, uses, populates, isUsedBy, isPopulatedBy, defines, officialOnto, isDefinedBy, rdfs:subClassOf, rdfs:seeAlso, rdfs:isDefinedBy and owl:imports …]
Navigating the HTML web is simple; there’s just one kind of link. The SW has more kinds of links and hence more navigation paths.
Semantic Web Navigation Model
Relations in 1 and 3 and parts of 4 require a global view to discover.
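Why some links need a global view: a forward link such as "SWD uses SWT" can be read off a single document, but its inverse ("SWT isUsedBy SWD") only appears after aggregating over every crawled document. The sketch below inverts forward (document, term) links into a term-to-documents index. The function `invert` and the sample URLs are hypothetical, and this is not a description of Swoogle's own storage.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

def invert(links: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Turn forward (doc, term) links into term -> {docs}, e.g. isUsedBy."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc, term in links:
        index[term].add(doc)
    return dict(index)

# Hypothetical 'uses' links, each visible from inside one crawled document.
uses = [
    ("http://a.example/1.rdf", "foaf:Person"),
    ("http://b.example/2.rdf", "foaf:Person"),
    ("http://b.example/2.rdf", "dc:title"),
]
is_used_by = invert(uses)
print(sorted(is_used_by["foaf:Person"]))
```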
Rank has its privilege
• Google introduced a new approach to ranking query results using a simple “popularity” metric.
  – It was a big improvement!
• Swoogle also ranks its query results
  – When searching for an ontology, class or property, wouldn’t one want to see the most used ones first?
• Ranking SW content requires different algorithms for different kinds of SW objects
  – For SWDs, SWTs, individuals, “assertions”, molecules, etc.
Google’s PageRank
• A page’s rank is a function of how many links point to it and the rank of the pages hosting those links.
• The “random surfer” model provides the intuition:
  (1) Jump to a random page
  (2) Select and follow a random link on the page, and repeat until ‘bored’
  (3) If bored, go to (1)
• Pages are ranked by the relative frequency with which they are visited.
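The random surfer model can be computed by power iteration. Here is a minimal sketch: `d` is the probability of following a link, and `1 - d` is the chance of getting "bored" and jumping to a random page. The value 0.85, the fixed iteration count and the three-page graph are assumptions made for illustration. A page with no outgoing links is treated as a forced random jump.

```python
from typing import Dict, List

def pagerank(links: Dict[str, List[str]], d: float = 0.85,
             iters: int = 100) -> Dict[str, float]:
    """Power-iteration PageRank for the random surfer model."""
    nodes = set(links) | {v for outs in links.values() for v in outs}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Bored surfer: jump to any page with equal probability.
        new = {v: (1.0 - d) / n for v in nodes}
        for u in nodes:
            outs = links.get(u, [])
            if outs:
                # Follow one of u's links uniformly at random.
                share = d * rank[u] / len(outs)
                for v in outs:
                    new[v] += share
            else:
                # Dangling page: the surfer must jump to a random page.
                for v in nodes:
                    new[v] += d * rank[u] / n
        rank = new
    return rank

ranks = pagerank({"A": ["B"], "B": ["C"], "C": ["A", "B"]})
print(max(ranks, key=ranks.get))
```

B comes out on top because it receives all of A's rank plus half of C's.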
Ranking Semantic Web Documents
• Target: a pure SW dataset
  – Nodes: a collection of online SWDs (330K SWDs, 1.5% of which are labeled as ontologies)
  – Links: in addition to hyperlinks, term-level relations are generalized into TM, EX and IM links
• Rational surfer model (an extension of weighted PageRank)
  – Semantic content (term-level relations) is encoded into links
  – The rank of a node is iteratively spread via links
  – The weight/capacity of a link varies according to its semantics
  – Weight is propagated to imported ontologies
• Evaluation
  – Method: compare OntoRank with PageRank for promoting ontologies, using the same pure SW dataset
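The "link weight varies with link semantics" idea can be sketched as a weighted PageRank: the surfer picks an outgoing link with probability proportional to the weight of its type. The slide does not give the weights, so the values in `LINK_WEIGHT` (and the "HREF" label for plain hyperlinks) are hypothetical placeholders. This sketch does not include the propagation to imported ontologies.

```python
from typing import Dict, List, Tuple

# Hypothetical weights per link type; the slide gives no values.
LINK_WEIGHT = {"TM": 1.0, "EX": 2.0, "IM": 3.0, "HREF": 0.5}

Edge = Tuple[str, str, str]  # (source SWD, target SWD, link type)

def weighted_pagerank(edges: List[Edge], d: float = 0.85,
                      iters: int = 100) -> Dict[str, float]:
    """PageRank where a link is followed with probability proportional
    to the weight of its type, relative to the source's total out-weight."""
    nodes = {s for s, _, _ in edges} | {t for _, t, _ in edges}
    n = len(nodes)
    out_weight = {v: 0.0 for v in nodes}
    for s, _, kind in edges:
        out_weight[s] += LINK_WEIGHT[kind]
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - d) / n for v in nodes}
        # Rank held by documents without outgoing links is spread uniformly.
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0.0)
        for v in nodes:
            new[v] += d * dangling / n
        for s, t, kind in edges:
            new[t] += d * rank[s] * LINK_WEIGHT[kind] / out_weight[s]
        rank = new
    return rank

edges = [("finin-foaf", "foaf", "TM"), ("foaf", "rdfs", "TM"),
         ("foaf", "wordnet", "EX"), ("wordnet", "rdfs", "TM")]
ranks = weighted_pagerank(edges)
print(max(ranks, key=ranks.get))
```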
An Example
[Figure: four SWDs connected by TM and EX links]
• http://www.w3.org/2000/01/rdf-schema: wPR = 300, OntoRank = 403
• http://xmlns.com/wordnet/1.6/: wPR = 3, OntoRank = 103
• http://xmlns.com/foaf/1.0/: wPR = 100, OntoRank = 100
• http://www.cs.umbc.edu/~finin/foaf.rdf: wPR = 0.2, OntoRank = 0.2
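One reading that reproduces these numbers is that each document keeps its own wPR and also receives the wPR of every ontology that depends on it, directly or transitively. Non-ontology SWDs (such as the personal FOAF file) do not pass weight upward. The sketch below encodes that reading. It is an assumption inferred from the example, not a confirmed definition of OntoRank. The link directions (foaf uses rdfs and extends wordnet, wordnet uses rdfs, the FOAF file uses foaf) and the short node names are also assumptions.

```python
from typing import Dict, Set, Tuple

def onto_rank(wpr: Dict[str, float], deps: Set[Tuple[str, str]],
              ontologies: Set[str]) -> Dict[str, float]:
    """deps holds (user, used) pairs. A node's score is its own wPR plus the
    wPR of all ontologies that depend on it transitively (assumed reading)."""
    users: Dict[str, Set[str]] = {v: set() for v in wpr}
    for user, used in deps:
        users[used].add(user)

    def closure(node: str) -> Set[str]:
        # Walk "is used by" edges, only through ontologies.
        seen: Set[str] = set()
        stack = [node]
        while stack:
            for u in users[stack.pop()]:
                if u in ontologies and u not in seen:
                    seen.add(u)
                    stack.append(u)
        return seen

    return {v: wpr[v] + sum(wpr[x] for x in closure(v) if x != v) for v in wpr}

wpr = {"rdfs": 300.0, "wordnet": 3.0, "foaf": 100.0, "finin-foaf": 0.2}
deps = {("foaf", "rdfs"), ("wordnet", "rdfs"), ("foaf", "wordnet"),
        ("finin-foaf", "foaf")}
result = onto_rank(wpr, deps, ontologies={"rdfs", "wordnet", "foaf"})
print(result)
```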
Ontology Dictionary
• Motivation
  – One ontology does not always provide all the needed vocabulary
  – Many scenarios require assembling terms from multiple ontologies
• DIY ontology engineering
  1. Search for an appropriate class C
  2. Search for popular properties used to modify instances of C
  3. Go back to step 1 if more classes are needed
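The three DIY steps can be sketched as a loop over a class-to-property usage index. `BONDS` is hypothetical toy data: the counts and the `ex:` classes and properties are invented, not Swoogle figures. `find_classes` and `popular_properties` are likewise hypothetical helpers, not a real Swoogle API.

```python
from typing import Dict, List, Tuple

# Hypothetical index: class -> {property: number of SWDs using it on instances}.
BONDS: Dict[str, Dict[str, int]] = {
    "foaf:Person": {"foaf:name": 120, "foaf:mbox": 80, "dc:title": 5},
    "ex:Meeting": {"ex:startTime": 12, "ex:location": 9},
}

def find_classes(keyword: str) -> List[str]:
    """Step 1: case-insensitive keyword match on class local names."""
    return [c for c in BONDS if keyword.lower() in c.split(":")[-1].lower()]

def popular_properties(cls: str, k: int = 2) -> List[Tuple[str, int]]:
    """Step 2: the k properties most often used on instances of cls."""
    return sorted(BONDS[cls].items(), key=lambda kv: -kv[1])[:k]

vocab: Dict[str, List[str]] = {}
for kw in ["person", "meeting"]:  # step 3: repeat for each class needed
    for cls in find_classes(kw):
        vocab[cls] = [p for p, _ in popular_properties(cls)]
print(vocab)
```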
Ranking Semantic Web Terms
• Pr(Term|Doc) can be measured by the normalized product of the term’s
  – Popularity: how many SWDs use the term
  – Frequency: how many times the term is used in the SWD
• SWDs are accessed non-uniformly, weighted by OntoRank
• TermRank estimates a term’s importance as TermRank(Term) = ∑_Doc Pr(Term|Doc) × OntoRank(Doc)
• Evaluation
  – Compare TermRank with term popularity for the 10 highest-rated terms and provide an analytical evaluation
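The TermRank formula can be sketched directly. This sketch assumes that Pr(Term|Doc) is popularity × frequency normalized so that it sums to 1 within each document. The slide says "normalized" but does not specify over what. The usage counts and OntoRank values are hypothetical. With this normalization, the TermRanks add up to the total OntoRank of the documents.

```python
from collections import Counter
from typing import Dict

def term_rank(usage: Dict[str, Counter],
              onto_rank: Dict[str, float]) -> Dict[str, float]:
    """usage[doc][term] = times term is used in doc (frequency).
    Popularity = number of docs using the term. Pr(term|doc) is taken as
    popularity * frequency, normalized within each doc (an assumption)."""
    popularity = Counter(t for counts in usage.values() for t in counts)
    rank: Dict[str, float] = {}
    for doc, counts in usage.items():
        scores = {t: popularity[t] * f for t, f in counts.items()}
        total = sum(scores.values())
        for t, s in scores.items():
            rank[t] = rank.get(t, 0.0) + (s / total) * onto_rank[doc]
    return rank

# Hypothetical usage counts and OntoRank values.
usage = {"d1": Counter({"foaf:Person": 2, "foaf:name": 2}),
         "d2": Counter({"foaf:Person": 1, "ex:odd": 3})}
ranks = term_rank(usage, {"d1": 1.0, "d2": 1.0})
print(max(ranks, key=ranks.get))
```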
Class-Property Bonds
• Class-property bonds introduced by an ontology (via rdfs:domain): foaf:mbox, foaf:name
• Class-property bonds introduced by instances (via rdf:type): foaf:name, dc:title
• Class definition of foaf:Person: rdf:type owl:Class; rdfs:subClassOf foaf:Agent; rdfs:label “Person”; rdfs:comment “a human being”
[Figure: SWD 1, SWD 2 and SWD 3 linked to foaf:Person; an instance has foaf:name “Tim Finin” and dc:title “Tim’s FOAF File”]
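The two kinds of bond can be sketched over plain (subject, predicate, object) tuples. An ontology bond comes from an rdfs:domain declaration. An instance bond comes from a property being used on a resource typed with the class. The triples mirror the slide's example, but `_:tim` and both function names are hypothetical. Superclass inference (e.g. via rdfs:subClassOf) is not attempted.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

def bonds_from_ontology(triples: List[Triple]) -> Set[Tuple[str, str]]:
    """(class, property) pairs declared via rdfs:domain."""
    return {(o, s) for s, p, o in triples if p == "rdfs:domain"}

def bonds_from_instances(triples: List[Triple]) -> Set[Tuple[str, str]]:
    """(class, property) pairs observed on explicitly typed instances."""
    types: Dict[str, Set[str]] = {}
    for s, p, o in triples:
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)
    return {(c, p) for s, p, _ in triples if p != "rdf:type"
            for c in types.get(s, ())}

ontology = [("foaf:mbox", "rdfs:domain", "foaf:Person"),
            ("foaf:name", "rdfs:domain", "foaf:Person")]
instances = [("_:tim", "rdf:type", "foaf:Person"),
             ("_:tim", "foaf:name", '"Tim Finin"'),
             ("_:tim", "dc:title", "\"Tim's FOAF File\"")]
print(sorted(bonds_from_ontology(ontology)), sorted(bonds_from_instances(instances)))
```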
Applications and use cases
• Supporting Semantic Web developers, e.g.,
  – Ontology designers
  – Vocabulary discovery
  – Who’s using my ontologies or data?
  – Etc.
• Searching specialized collections, e.g.,
  – Proofs in Inference Web
  – Text Meaning Representations of news stories in SemNews
• Supporting SW tools, e.g.,
  – Discovering mappings between ontologies
Will it Scale? How?
Here’s a rough estimate of the data in RDF documents on the Semantic Web, based on Swoogle’s crawling:

System/date | Terms      | Documents  | Individuals | Triples    | Bytes
Swoogle 2   | 1.5 x 10^5 | 3.5 x 10^5 | 7 x 10^6    | 5 x 10^7   | 7 x 10^9
Swoogle 3   | 2 x 10^5   | 7 x 10^5   | 1.5 x 10^7  | 7.5 x 10^7 | 1 x 10^10
2005        | 2.5 x 10^5 | 5 x 10^6   | 5 x 10^7    | 5 x 10^8   | 5 x 10^10
2008        | 5 x 10^5   | 5 x 10^7   | 5 x 10^8    | 5 x 10^9   | 5 x 10^11

We think Swoogle’s centralized approach can be made to work for the next few years, if not longer.
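A quick sanity check on these estimates is to divide triples by documents and bytes by triples. The figures below are copied from the table. The ratios are derived arithmetic, not new measurements: roughly 100 to 140 triples per document, in line with the "about 100 triples per SWD" figure, and roughly 100 to 140 bytes per triple.

```python
# Figures copied from the scale estimates (rough estimates, not measurements).
ESTIMATES = {
    "Swoogle 2": {"documents": 3.5e5, "triples": 5e7, "bytes": 7e9},
    "Swoogle 3": {"documents": 7e5, "triples": 7.5e7, "bytes": 1e10},
    "2005": {"documents": 5e6, "triples": 5e8, "bytes": 5e10},
    "2008": {"documents": 5e7, "triples": 5e9, "bytes": 5e11},
}

# Derived ratios for each row of the table.
ratios = {
    name: {
        "triples_per_doc": row["triples"] / row["documents"],
        "bytes_per_triple": row["bytes"] / row["triples"],
    }
    for name, row in ESTIMATES.items()
}
for name, r in ratios.items():
    print(f"{name}: {r['triples_per_doc']:.0f} triples/doc, "
          f"{r['bytes_per_triple']:.0f} bytes/triple")
```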
How much reasoning?
• SwoogleN (N ≤ 3) does limited reasoning
  – It’s expensive
  – It’s not clear how much should be done
• More reasoning would benefit many use cases
  – e.g., the type hierarchy
• Recognizing specialized metadata
  – e.g., that ontology A maps some terms from B to C
Conclusion
• The web will contain the world’s knowledge in forms accessible to people and computers
  – We need better ways to discover, index, search and reason over SW knowledge
• SW search engines address different tasks than HTML search engines
  – So they require different techniques and APIs
• Swoogle-like systems can help create consensus ontologies and foster best practices
For more information
http://ebiquity.umbc.edu/
Annotated in OWL