79e578cd183976edcb02c80ba8b240ae.ppt
- Number of slides: 54
"YAGO – A Core of Semantic Knowledge" by Fabian M. Suchanek, Gjergji Kasneci, Gerhard Weikum (Max-Planck Institute for Computer Science, Saarbrücken, Germany). Gaby Nativ, SDBI 2007
Overview
- Motivation
- Other Ontologies
- System overview
- YAGO Dive In
- LEILA
- NAGA
- Conclusion
Motivation: Which NASA astronaut was born when Elvis was born?
Motivation
- Problem: Web pages are designed to be read by people, not machines.
- Solution: the Semantic Web. The meaning of information and services is defined, so both people and machines can use web content.
Ontology: a knowledge representation language
- Individuals: instances or objects
- Classes: concepts or types of objects
- Relations: ways in which classes and objects can be related to one another
- Facts: instances of a relation between individuals, classes, or relations, e.g. (Elvis Presley, isA, Singer)
Data Model
- Directed labeled multigraph G = (V, E, Lv, Le)
- V is a set of vertices
- E ⊆ V × V is a multi-set of edges
- Lv is a set of individual and class labels
- Le is a set of relation labels
- With each edge we associate a confidence value
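The data model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `KnowledgeGraph`, its methods, the sample confidence values, and the choice to keep the highest confidence for a repeated edge are not from the slides.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class KnowledgeGraph:
    """Directed, labeled multigraph; all names here are illustrative assumptions."""
    # Each edge is (source, relation_label, target); a Counter makes E a multi-set.
    edges: Counter = field(default_factory=Counter)
    # Confidence value associated with each distinct edge.
    confidence: dict = field(default_factory=dict)

    def add_edge(self, source, relation, target, conf=1.0):
        edge = (source, relation, target)
        self.edges[edge] += 1
        # Assumption: a repeated edge keeps the highest confidence seen so far.
        self.confidence[edge] = max(conf, self.confidence.get(edge, 0.0))

    def vertices(self):
        """V: every label that appears as a source or target."""
        return {v for s, _, t in self.edges for v in (s, t)}

    def relation_labels(self):
        """Le: every relation label that appears on an edge."""
        return {r for _, r, _ in self.edges}


g = KnowledgeGraph()
g.add_edge("Elvis Presley", "isA", "Singer", conf=0.93)
g.add_edge("Elvis Presley", "bornInYear", "1935", conf=0.98)
```

Storing edges as triples keeps the relation label with the edge, so two vertices can be connected by several differently labeled edges.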
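Ontology (figure): a layered graph. Classes: entity, person, astronaut, connected by subclass edges. Individuals: connected to classes by type edges and to 1935 by born edges; one individual is the unknown "?". Words: "Elvis Presley" and "The King", connected to an individual by means edges.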
Other Ontologies: assemble the ontology manually
- WordNet
- SUMO
- Gene Ontology
- etc.
- Problem: usually low coverage
WordNet
- Semantic lexicon for the English language
- Developed at Princeton University since 1985
- Groups English words into synsets
- Provides short, general definitions
- Records various semantic relations
- Contains about 150,000 words organized in over 115,000 synsets
Suggested Upper Merged Ontology (SUMO)
- Concerned with meta-level concepts
- First released in December 2000
- Maintained by Articulate Software
Domain Ontology: Gene Ontology (GO)
- Part of a larger effort, the Open Biomedical Ontologies
- Constructed in 1998 with three models: biological processes, cellular components, molecular function
- As of 2005, GO contained over 19,000 terms
Other Ontologies: automated extraction of the ontology
- KnowItAll (University of Washington)
- TextToOnto (University of Karlsruhe)
- Use pattern matching and machine learning techniques
- Problem: usually low accuracy (50%-92%)
System architecture (figure)
- Interface: Browser (User Query Input and Output, Tunable Parameters)
- Backend: YAGO KB; NAGA (Query Processing & Ranking); LEILA (Knowledge Acquisition Tools); Web
Yet Another Great Ontology (YAGO)
- Based on a decidable and simple model
- Extensible ontology
- High coverage: YAGO knows over 1.7 M entities and 14 M facts
- High quality: empirical evaluation shows 95% accuracy
YAGO Approach
- Assemble the ontology from Wikipedia
- Good coverage: 7.83 M entities in all languages
Wikipedia
- Use the category system
- Good accuracy
Learning to Extract Information by Linguistic Analysis (LEILA)
- Uses a deep linguistic analysis
- Uses machine learning techniques (SVM)
- Input: a binary target relation and a set of Web documents
- Output: all pairs of entities that are in the target relation
Results with different relations
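Problem: The Upper Model (figure): upper Wikipedia categories such as Social_group, Business, and People_by_occupation sit above the class American_singer. The individual has type American_singer and born 1935, but its superclasses are unclear (?).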
Solution: WordNet as Upper Model
- Each synset of WordNet becomes a class of YAGO.
- Extract only Wikipedia's leaf categories.
- Exclude individuals already known in WordNet, e.g. Albert Einstein will be excluded (15,000 cases).
- Where WordNet and Wikipedia conflict in meaning, prefer WordNet: "Time exposure" is a common noun in WordNet but an album title in Wikipedia.
Exploiting the Wikipedia category: relational categories
- Example: the Elvis Presley page lists the category 1935_births, which yields the fact (Elvis Presley, bornInYear, 1935).
- Exploit relational categories: bornInYear, diedInYear, establishedIn.
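The relational-category idea can be illustrated with a small pattern table. This is a hedged sketch: only the relation names and the `1935_births` example come from the slides; the regular expressions, the `_deaths` and `_establishments` category spellings, and the function name are assumptions.

```python
import re

# Hypothetical category patterns; only the relation names come from the slide.
RELATIONAL_PATTERNS = [
    (re.compile(r"^(\d{4})_births$"), "bornInYear"),
    (re.compile(r"^(\d{4})_deaths$"), "diedInYear"),
    (re.compile(r"^(\d{4})_establishments$"), "establishedIn"),
]


def facts_from_categories(page_title, categories):
    """Turn relational category names into (subject, relation, object) facts."""
    facts = []
    for category in categories:
        for pattern, relation in RELATIONAL_PATTERNS:
            match = pattern.match(category)
            if match:
                # The captured year becomes the object of the fact.
                facts.append((page_title, relation, match.group(1)))
    return facts


facts = facts_from_categories("Elvis_Presley", ["1935_births", "American_singers"])
```

Categories that match no pattern, such as `American_singers`, are simply skipped here; they are handled by the conceptual-category step.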
Exploiting the Wikipedia category: conceptual categories
- Example: the Elvis Presley page lists the category American_singers, which yields (Elvis Presley, type, American_singer).
- Exploit conceptual categories for the type and subClassOf relations.
Problem: Thematic Categories
- Example: the Elvis Presley page also lists the category Rock'n_Roll_Music, but Elvis is not of type Rock'n_Roll_Music.
- Avoid thematic categories.
Thematic vs Conceptual Categories
- Shallow linguistic noun phrase parsing: "American singers of German origin" splits into premodifier (American), head (singers), and postmodifier (of German origin).
- Heuristic: if the head is a plural word, the category is conceptual.
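A rough sketch of this heuristic follows. It is not the deck's algorithm: the preposition list, the splitting rule, and the naive plural test are assumptions standing in for a real noun phrase parser and plural detector.

```python
# Assumed, incomplete list of prepositions that start a postmodifier.
PREPOSITIONS = {"of", "in", "from", "by", "for", "with", "at", "on"}


def split_category(name):
    """Split a category name into (premodifier, head, postmodifier)."""
    words = name.replace("_", " ").split()
    # The postmodifier starts at the first preposition, if there is one.
    cut = next((i for i, w in enumerate(words) if w.lower() in PREPOSITIONS), len(words))
    pre_and_head, post = words[:cut], words[cut:]
    if not pre_and_head:
        return [], "", post
    return pre_and_head[:-1], pre_and_head[-1], post


def looks_plural(word):
    """Crude plural test; it misjudges words such as 'Physics'."""
    w = word.lower()
    return len(w) > 2 and w.endswith("s") and not w.endswith(("ss", "us", "is"))


def is_conceptual(name):
    """Heuristic from the slide: a plural head marks a conceptual category."""
    _, head, _ = split_category(name)
    return looks_plural(head)
```

With these assumptions, `is_conceptual("American_singers")` is true and `is_conceptual("Rock'n_Roll_Music")` is false, matching the two examples in the deck.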
Algorithm Pling stemmer
The YAGO ontology (figure): Singer is a subclass of Person, and American_singer is a subclass of Singer. The word "singer" means Singer. The individual has type American_singer and born 1935, and the word "Elvis Presley" means that individual.
YAGO Meta
- Storing the witness: for each individual, store the URL of the corresponding Wikipedia page.
- Storing the confidence.
The YAGO ontology (figure): the same graph with meta-information. The classes appear as Person#3 and Singer#1; facts are annotated with extractedBy LEILA and foundIn wiki/Elvis_Presly.
YAGO: Why binary is not enough?
- Fact: (Elvis, is_a, singer)
- But only from 1953 to 1977
- We know this from Wikipedia
YAGO: Why binary is not enough?
- #1 (Elvis, is_a, singer)
- #2 (#1, time, 1953-1977)
- #3 (#1, source, Wikipedia)
- Figure labels: the facts also carry a confidence (0.93) and the extractor (LEILA).
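One way to picture fact identifiers is a store in which every fact receives an id that later facts may use as their subject. The class and method names below are assumptions made for illustration; only the three example facts come from the slide.

```python
from itertools import count


class FactStore:
    """Facts carry identifiers so that other facts can refer to them."""

    def __init__(self):
        self._ids = count(1)
        self.facts = {}  # identifier -> (subject, relation, object)

    def add(self, subject, relation, obj):
        """Store a fact and return its identifier, e.g. '#1'."""
        fact_id = f"#{next(self._ids)}"
        self.facts[fact_id] = (subject, relation, obj)
        return fact_id

    def about(self, fact_id):
        """Return the relation/object pairs of facts whose subject is fact_id."""
        return {r: o for s, r, o in self.facts.values() if s == fact_id}


store = FactStore()
f1 = store.add("Elvis", "is_a", "singer")
store.add(f1, "time", "1953-1977")
store.add(f1, "source", "Wikipedia")
```

Every stored item remains a binary fact, yet `store.about(f1)` recovers the time span and the source that qualify fact #1.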
The YAGO model formally
- A YAGO ontology is defined over a set of relations R (e.g. type, subClassOf), a set of common entities C (e.g. entity, class, relation), and a set of fact identifiers I.
- It is a function y : I → (R ∪ C ∪ I) × R × (R ∪ I ∪ C).
- We can talk about facts: (#1, source, Wikipedia)
- We can talk about additional arguments: (#1, time, 1953-1977)
- We can talk about relations: (time, hasRange, time_interval)
The YAGO model: Logical aspects
- Axioms & rules: (x, is_a, y), (y, subclass, z) => (x, is_a, z), ...
- Figure labels: singer and person connected by subClassOf; subClassOf has type acyclicTransitiveRelation.
The YAGO model: Logical aspects Relations Types
The YAGO model: Logical aspects
- {(r1, subRelationOf, r2), (x, r1, y)} -> (x, r2, y)
- {(r, type, acyclicTransitiveRelation), (x, r, y), (y, r, z)} -> (x, r, z)
- {(r, domain, c), (x, r, y)} -> (x, type, c)
- {(r, range, c), (x, r, y)} -> (y, type, c)
- {(x, type, c1), (c1, subClassOf, c2)} -> (x, type, c2)
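The five rules can be applied by naive forward chaining until nothing new is derived. The sketch below is an illustration rather than the authors' implementation; the function name and the quadratic pairwise loop are assumptions chosen for brevity, and the domain rule is written with (x, r, y) as its second premise.

```python
ATR = "acyclicTransitiveRelation"


def deductive_closure(facts):
    """Apply the five rewrite rules until no new fact can be derived."""
    closure = set(facts)
    while True:
        new = set()
        for (a, r, b) in closure:
            for (c, s, d) in closure:
                # (r1, subRelationOf, r2), (x, r1, y) -> (x, r2, y)
                if r == "subRelationOf" and s == a:
                    new.add((c, b, d))
                # (r, domain, c), (x, r, y) -> (x, type, c)
                if r == "domain" and s == a:
                    new.add((c, "type", b))
                # (r, range, c), (x, r, y) -> (y, type, c)
                if r == "range" and s == a:
                    new.add((d, "type", b))
                # (x, type, c1), (c1, subClassOf, c2) -> (x, type, c2)
                if r == "type" and s == "subClassOf" and c == b:
                    new.add((a, "type", d))
                # (r, type, ATR), (x, r, y), (y, r, z) -> (x, r, z)
                if r == s and b == c and (r, "type", ATR) in closure:
                    new.add((a, r, d))
        if new <= closure:
            return closure
        closure |= new


facts = {
    ("subClassOf", "type", ATR),
    ("American_singer", "subClassOf", "singer"),
    ("singer", "subClassOf", "person"),
    ("Elvis", "type", "American_singer"),
    ("bornInYear", "range", "year"),
    ("Elvis", "bornInYear", "1935"),
}
closure = deductive_closure(facts)
```

Because the set of entities is finite, only finitely many triples exist and the loop terminates. On this sample the closure additionally contains facts such as ("Elvis", "type", "person") and ("1935", "type", "year").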
The YAGO model: Logical aspects (figure): starting from facts f1, ..., f5, the axioms derive further facts, e.g. (x, is_a, y), (y, subclass, z) => (x, is_a, z), which gives a finite, unique set f1, ..., f10. Eliminating derivable facts gives a finite, unique set f1, f2, f3.
The YAGO model: Consistency
- D(y) denotes the set of all facts derivable from the ontology y.
- A YAGO ontology y is consistent iff ∄ x, r : (r, type, acyclicTransitiveRelation) ∈ D(y) ∧ (x, r, x) ∈ D(y).
- Since D(y) is finite, the consistency of a YAGO ontology is decidable.
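The consistency condition can be checked mechanically. The sketch below is simplified on purpose and rests on an assumption: it applies only the transitivity rule to each relation declared as an acyclicTransitiveRelation, rather than computing the full set D(y). The function name is hypothetical.

```python
ATR = "acyclicTransitiveRelation"


def is_consistent(facts):
    """Return False if some acyclic transitive relation r admits (x, r, x)."""
    atr_relations = {s for (s, r, o) in facts if r == "type" and o == ATR}
    for rel in atr_relations:
        pairs = {(s, o) for (s, r, o) in facts if r == rel}
        # Transitive closure by repeated composition; the set is finite,
        # so the loop terminates.
        while True:
            derived = {(a, d) for (a, b) in pairs for (c, d) in pairs if b == c}
            if derived <= pairs:
                break
            pairs |= derived
        # A pair (x, x) means the relation contains a cycle.
        if any(a == b for (a, b) in pairs):
            return False
    return True


good = {
    ("subClassOf", "type", ATR),
    ("American_singer", "subClassOf", "singer"),
    ("singer", "subClassOf", "person"),
}
bad = good | {("person", "subClassOf", "American_singer")}
```

Here `is_consistent(good)` holds, while `bad` contains a subClassOf cycle and is rejected.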
The YAGO ontology: Accuracy
- Is Lake Victoria "locatedIn" Tanzania?
- When should an entity be an individual, and when a class? E.g. Physics as an individual of science.
The YAGO ontology: Number of Facts (chart; axis labels 14,000 / 2,000 / 30,000 / 60,000 / 200,000 / 300,000): KnowItAll, SUMO, WordNet, OpenCyc, YAGO
YAGO – Web interface
- http://www.mpi-inf.mpg.de/~suchanek/downloads/yago/
- Which astronaut was born in the same year as Elvis?
- Query: "Elvis Presley" bornInYear $year; $astro isa astronaut
- 20 results
The Answer: Roger Bruce Chaffee (born February 15, 1935) was a U.S. Navy pilot who became an American astronaut in the Apollo program. He died during training in the Apollo 1 fire.
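A query of this kind can be read as a conjunction of triple patterns in which terms starting with `$` are variables. The matcher below is a minimal sketch, not the YAGO web interface; the sample facts are a hand-made illustration, and the pattern `$astro bornInYear $year` is an assumption added so that the two variables are connected.

```python
def match(patterns, facts, binding=None):
    """Yield variable bindings under which every pattern is a known fact."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for fact in facts:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, fact):
            if term.startswith("$"):
                # A variable must agree with any value bound earlier.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            # Continue with the remaining patterns under the extended binding.
            yield from match(rest, facts, candidate)


facts = [
    ("Elvis Presley", "bornInYear", "1935"),
    ("Roger Chaffee", "bornInYear", "1935"),
    ("Roger Chaffee", "isa", "astronaut"),
    ("Neil Armstrong", "bornInYear", "1930"),
    ("Neil Armstrong", "isa", "astronaut"),
]
query = [
    ("Elvis Presley", "bornInYear", "$year"),
    ("$astro", "bornInYear", "$year"),
    ("$astro", "isa", "astronaut"),
]
answers = [b["$astro"] for b in match(query, facts)]
```

On this toy data the only binding for `$astro` is Roger Chaffee, since he is the only astronaut sharing Elvis's birth year.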
Overview
- Motivation
- Other Ontologies
- System overview
- YAGO Dive In
- LEILA overview
- NAGA overview
- Conclusion
The Query Model
- EVIDENCE QUERY: search for evidence for a certain hypothesis. Example: Max Planck isA Physicist; Max Planck bornIn Kiel.
- DISCOVERY QUERY: discover pieces of missing information. Example: Max Planck bornInYear $X; $Y bornInYear $X; $Y isA Physicist.
The Query Model
- REGULAR EXPRESSION QUERY: the user might be interested in a certain path of relations between pieces of information.
- Example: Liu (givenNameOf|familyNameOf) $X; $X isA scientist.
- Example: $X isA river; $X locatedIn* Africa.
The Query Model
- RELATEDNESS QUERY: find a broad relation between pieces of information. Example: Einstein connect Bohr.
- Both are physicists and both are scientists.
- There are Moon craters and asteroid belts named after them.
- Tom Cruise connects them by being a vegetarian.
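The `locatedIn*` part of a regular expression query asks for a path of zero or more locatedIn edges. A breadth-first search is one plausible way to evaluate it; the function names and the sample facts below are assumptions for illustration and do not describe NAGA's query processor.

```python
from collections import deque


def reachable_via(facts, start, relation):
    """Entities reachable from start by zero or more `relation` edges."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for s, r, o in facts:
            if s == node and r == relation and o not in seen:
                seen.add(o)
                queue.append(o)
    return seen


def rivers_in(facts, region):
    """Answer the query `$X isA river`, `$X locatedIn* region`."""
    rivers = {s for s, r, o in facts if r == "isA" and o == "river"}
    return {x for x in rivers if region in reachable_via(facts, x, "locatedIn")}


# Hand-made sample facts, not taken from YAGO.
facts = [
    ("Nile", "isA", "river"),
    ("Nile", "locatedIn", "Egypt"),
    ("Egypt", "locatedIn", "Africa"),
    ("Rhine", "isA", "river"),
    ("Rhine", "locatedIn", "Germany"),
    ("Germany", "locatedIn", "Europe"),
]
african_rivers = rivers_in(facts, "Africa")
```

The Nile is found even though no single fact places it in Africa, because the path Nile, Egypt, Africa satisfies `locatedIn*`.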
The Answer Model
- The answer to a query Q is a subgraph A of the knowledge graph that matches Q.
- Q: Max Planck type Physicist; Max Planck bornInYear $X; $Y type Physicist; $Y bornInYear $X.
- A (edge confidences 0.98, 0.96, 0.95, 0.97): Max Planck type Physicist; Max Planck bornInYear 1858; Mihajlo Pupin type Physicist; Mihajlo Pupin bornInYear 1858.
The Scoring Model: combines three measures
- Extraction confidence.
- Informativeness of a fact, e.g. the fact Albert_Einstein isA physicist is more informative than Albert_Einstein isA person.
- Compactness of the answer graph, e.g. for "How are Einstein and Bohr related?", "both won the Nobel Prize" is preferred over a connection through Tom Cruise.
Ranking Performance
- 55 queries from TREC 2005/2006
- 12 queries from the work on SphereSearch
- 18 regular expression queries
- The queries were posed to Google, Yahoo! Answers, and NAGA at the same time
Conclusions
- Semantic Web vision
- System overview
- YAGO is based on a logically clean model, with an accuracy of around 95%
- YAGO is 7 times larger than the largest competitor
- Investigate the relationship between OWL 1.1 and the YAGO model
Reference
- "YAGO – A Core of Semantic Knowledge"
- "NAGA: Harvesting, Searching and Ranking Knowledge"
- "LEILA: Learning to Extract Information by Linguistic Analysis"
- (Fabian M. Suchanek, Gjergji Kasneci, Gerhard Weikum …)
- Available at http://www.mpii.mpg.de/~suchanek
The End! Questions?