
  • Number of slides: 54

“YAGO – A Core of Semantic Knowledge”
Max-Planck Institute for Computer Science, Saarbrücken, Germany
by Fabian M. Suchanek, Gjergji Kasneci, Gerhard Weikum
Gaby Nativ, SDBI 2007

Overview: Motivation, Other Ontologies, System overview, YAGO Dive In, LEILA, NAGA, Conclusion

Motivation
  • Which NASA astronaut was born when Elvis was born?

Motivation
  • Problem: Web pages are designed to be read by people, not machines
  • Solution: the Semantic Web
      • The meaning of information and services is defined
      • People and machines can use web content

Ontology
  • Knowledge representation language
  • Individuals: instances or objects
  • Classes: concepts or types of objects
  • Relations: ways that classes and objects can be related to one another
  • Facts: instances of a relation between individuals, classes or relations, e.g. (Elvis Presley, Isa, Singer)

Data Model
  • Directed labeled multigraph G = (V, E, Lv, Le)
  • V is a set of vertices
  • E ⊆ V × V is a multiset of edges
  • Lv is a set of individual and class labels
  • Le is a set of relation labels
  • With each edge we associate a confidence value
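
The data model above can be sketched in a few lines of Python. This is only an illustration under assumptions: the class and method names (`KnowledgeGraph`, `add_edge`, `neighbors`) and the confidence values are hypothetical and do not come from the YAGO implementation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    source: str        # vertex label from Lv
    label: str         # relation label from Le
    target: str        # vertex label from Lv
    confidence: float  # confidence value attached to the edge


class KnowledgeGraph:
    """Directed labeled multigraph; names are illustrative only."""

    def __init__(self):
        self.vertices = set()
        self.edges = []                # a list, so parallel edges are allowed
        self.out = defaultdict(list)   # source vertex -> outgoing edges

    def add_edge(self, source, label, target, confidence=1.0):
        self.vertices.update((source, target))
        edge = Edge(source, label, target, confidence)
        self.edges.append(edge)
        self.out[source].append(edge)
        return edge

    def neighbors(self, source, label):
        """Targets reachable from `source` over one edge with `label`."""
        return [e.target for e in self.out[source] if e.label == label]


graph = KnowledgeGraph()
graph.add_edge("Elvis_Presley", "type", "singer", 0.95)      # made-up confidence
graph.add_edge("Elvis_Presley", "bornInYear", "1935", 0.98)  # made-up confidence
graph.add_edge("singer", "subClassOf", "person")
```

Storing edges in a list rather than a set is what makes this a multigraph: the same pair of vertices may be connected several times, with different labels or confidences.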

Ontology
  [Figure: example ontology graph. Legend: Classes, Relations, Individuals, Words. Nodes: entity, person, astronaut, 1935, "?". Edge labels: subclass, type, born, means. Words: "Elvis Presley", "The King".]

Overview: Motivation, Other Ontologies, System overview, YAGO Dive In, LEILA, NAGA, Conclusion

Other Ontologies
  • Assemble the ontology manually: WordNet, SUMO, GeneOntology, etc.
  • Problem: usually low coverage

WordNet
  • Semantic lexicon for the English language
  • Developed at Princeton University since 1985
  • Groups English words into synsets
  • Provides short, general definitions
  • Records various semantic relations
  • Contains about 150,000 words organized in over 115,000 synsets

WordNet

Suggested Upper Merged Ontology
  • Concerns itself with meta-level concepts
  • First released in December 2000
  • Maintained by Articulate Software

Domain Ontology: GeneOntology
  • Part of a large effort, the Open Biomedical Ontologies
  • Constructed in 1998 with 3 models: biological processes, cellular components, molecular function
  • As of 2005, GO contained over 19,000 terms

Other Ontologies
  • Automated extraction of the ontology
      • KnowItAll (University of Washington)
      • TextToOnto (University of Karlsruhe)
  • Use pattern matching and machine learning techniques
  • Problem: usually low accuracy (50%-92%)

Overview: Motivation, Other Ontologies, System overview, YAGO Dive In, LEILA, NAGA, Conclusion

System architecture
  [Figure: architecture diagram. Interface: Browser (user query input and output, tunable parameters). Backend: YAGO KB; NAGA (query processing & ranking); LEILA (knowledge acquisition tools); Web.]

Yet Another Great Ontology
  • Based on a decidable and simple model
  • Extensible ontology
  • High coverage: YAGO knows over 1.7 M entities and 14 M facts
  • High quality: empirical evaluation shows 95% accuracy

YAGO Approach
  • Assemble the ontology from Wikipedia
  • Good coverage: 7.83 M entities in all languages

Wikipedia
  • Use the category system
  • Good accuracy

Learning to Extract Information by Linguistic Analysis
  • Uses a deep linguistic analysis
  • Machine learning techniques (SVM)
  • Input: a binary target relation and a set of Web documents
  • Extracts all pairs of entities that are in the target relation

Results with different relations

Problem: The Upper Model
  [Figure: Wikipedia category graph. Classes: Social_group, Business, People_by_occupation, American_singer. Edge labels: type, born. Other nodes: "?", 1935.]

Solution: WordNet as Upper Model
  • Each synset of WordNet becomes a class of YAGO
  • Extract only Wikipedia's leaf categories
  • Exclude known individuals in WordNet, e.g. Albert Einstein will be excluded
  • 15,000 cases where WordNet and Wikipedia conflict in meaning: prefer WordNet
  • "Time exposure" is a common noun for WordNet, but an album title for Wikipedia

Exploiting the Wikipedia category
  [Figure: mock Wikipedia article about Elvis with placeholder body text]
  • Categories: 1935_births gives bornInYear 1935
  • Exploit relational categories
      • bornInYear
      • diedInYear
      • EstablishedIn
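
A minimal sketch of how relational categories could be turned into facts. The slide only gives the example 1935_births; the three regular expressions below (including the `_deaths` and `_establishments` name patterns) and the function name `relational_facts` are assumptions made for illustration, not the patterns actually used by YAGO.

```python
import re

# Hypothetical category-name patterns; only "1935_births" appears on the slide.
RELATIONAL_PATTERNS = [
    (re.compile(r"^(\d{4})_births$"), "bornInYear"),
    (re.compile(r"^(\d{4})_deaths$"), "diedInYear"),
    (re.compile(r"^(\d{4})_establishments$"), "establishedIn"),
]


def relational_facts(entity, categories):
    """Return (entity, relation, value) triples for categories that match."""
    facts = []
    for category in categories:
        for pattern, relation in RELATIONAL_PATTERNS:
            match = pattern.match(category)
            if match:
                facts.append((entity, relation, match.group(1)))
    return facts


elvis_facts = relational_facts(
    "Elvis_Presley", ["1935_births", "1977_deaths", "American_singers"]
)
```

Categories that match no pattern (here American_singers) are simply skipped by this step and left for the conceptual-category handling.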

Exploiting the Wikipedia category
  [Figure: mock Wikipedia article about Elvis with placeholder body text; graph edges: type American_singer, born 1935]
  • Categories: American_singers
  • Exploit conceptual categories
      • subClassOf
      • type

Problem: Thematic Categories
  [Figure: mock Wikipedia article about Elvis with placeholder body text; graph edges: type Rock'n_Roll_Music, type American_singer, born 1935]
  • Categories: Rock'n_Roll_Music
  • Avoid thematic categories

Thematic vs Conceptual Categories
  • Shallow linguistic noun phrase parsing: "American singers of German origin" = premodifier (American) + head (singers) + postmodifier (of German origin)
  • Heuristic: if the head is a plural word, the category is conceptual
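
The heuristic can be illustrated with a deliberately crude sketch. Everything here is an assumption: the preposition list, the suffix-based plural test and the function names are stand-ins for a real noun phrase parser and plural detector, and they will misjudge many English words.

```python
# Hypothetical preposition list used to find where the postmodifier starts.
PREPOSITIONS = {"of", "in", "from", "by", "at", "for", "with", "on"}


def split_category(name):
    """Split a category name into (premodifier, head, postmodifier) words."""
    words = name.replace("_", " ").split()
    for i, word in enumerate(words):
        if i > 0 and word.lower() in PREPOSITIONS:
            # The head is the last word before the first preposition.
            return words[:i - 1], words[i - 1], words[i:]
    return words[:-1], words[-1], []


def looks_plural(word):
    """Very rough plural test; a real system would use a proper stemmer."""
    w = word.lower()
    return len(w) > 2 and w.endswith("s") and not w.endswith(("ss", "us", "is"))


def is_conceptual(name):
    """Apply the heuristic: plural head means conceptual category."""
    _, head, _ = split_category(name)
    return looks_plural(head)


parts = split_category("American singers of German origin")
```

With these helpers, `is_conceptual("American singers of German origin")` is true because the head "singers" looks plural, while `is_conceptual("Rock'n_Roll_Music")` is false because the head "Music" does not.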

Algorithm
  • Pling stemmer

The YAGO ontology
  [Figure: class chain American_singer subclass Singer subclass Person; the word "singer" means Singer; an individual with type American_singer and born 1935; the word "Elvis Presley" means that individual.]

YAGO Meta
  • Storing witness: for each individual, store the URL of the corresponding Wikipedia page
  • Storing confidence

The YAGO ontology
  [Figure: the previous graph with meta information. Classes: Person#3, Singer#1, American_singer. Edge labels: subclass, means, type, born. Meta edges: ExtractedBy LEILA, FoundIn wiki/Elvis_Presly. Other nodes: 1935, "singer", "Elvis Presley".]

YAGO: Why binary is not enough?
  • Fact: (Elvis, is_a, singer)
  • But only from 1953 to 1977
  • We know this from Wikipedia

YAGO: Why binary is not enough?
  [Figure: the fact Elvis type singer, annotated with time 1953-1977 and source Wikipedia; further labels: 0.93, LEILA]
  • #1 (Elvis, is_a, singer)
  • #2 (#1, time, 1953-1977)
  • #3 (#1, source, Wikipedia)
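
The idea of giving every fact an identifier, so that other facts can talk about it, can be sketched as follows. The storage layout (a dict keyed by identifier) and the helper names `add_fact` and `statements_about` are assumptions made for this illustration.

```python
facts = {}  # fact identifier -> (subject, relation, object)


def add_fact(subject, relation, obj):
    """Store a triple and return its new identifier, e.g. '#1'."""
    fact_id = f"#{len(facts) + 1}"
    facts[fact_id] = (subject, relation, obj)
    return fact_id


def statements_about(fact_id):
    """Collect all facts whose subject is the given fact identifier."""
    return {rel: obj for (subj, rel, obj) in facts.values() if subj == fact_id}


f1 = add_fact("Elvis", "is_a", "singer")
f2 = add_fact(f1, "time", "1953-1977")     # a fact about fact #1
f3 = add_fact(f1, "source", "Wikipedia")   # another fact about fact #1

about_f1 = statements_about(f1)
```

Because a fact identifier can appear in the subject position, the time interval and the source are attached without leaving the binary (triple) representation.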

The YAGO model formally
  • A YAGO ontology over
      • a set of relations R (type, subClassOf)
      • a set of common entities C (entity, class, relation)
      • a set of fact identifiers I
    is a function y : I → (R ∪ C ∪ I) × R × (R ∪ I ∪ C)
  • We can talk about:
      • facts: (#1, source, Wikipedia)
      • additional arguments: (#1, time, 1953-1977)
      • relations: (time, hasRange, time_interval)

The YAGO model: Logical aspects
  • Axioms & Rules
  • Example: singer subClassOf person
  • Rule: (x, is_a, y), (y, subclass, z) => (x, is_a, z) ...
  • Axiom: subClassOf type acyclicTransitiveRelation

The YAGO model: Logical aspects
  • Relations
  • Types

The YAGO model: Logical aspects
  • {(r1, subRelationOf, r2), (x, r1, y)} -> (x, r2, y)
  • {(r, type, acyclicTransitiveRelation), (x, r, y), (y, r, z)} -> (x, r, z)
  • {(r, domain, c), (x, r, y)} -> (x, type, c)
  • {(r, range, c), (x, r, y)} -> (y, type, c)
  • {(x, type, c1), (c1, subClassOf, c2)} -> (x, type, c2)
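
A naive sketch of applying these five rules until nothing new can be derived, with the domain rule read as {(r, domain, c), (x, r, y)} -> (x, type, c). The function name `derive` and the quadratic fixpoint loop are choices made for illustration, not the procedure used by YAGO, and the example facts are a toy input.

```python
def derive(facts):
    """Return the set of facts derivable from `facts` under the five rules."""
    closure = set(facts)
    while True:
        new = set()
        for (x, r, y) in closure:
            for (a, p, b) in closure:
                # Rule 1: (r, subRelationOf, b) and (x, r, y) give (x, b, y).
                if p == "subRelationOf" and a == r:
                    new.add((x, b, y))
                # Rule 2: transitivity for acyclic transitive relations.
                if (a == y and p == r
                        and (r, "type", "acyclicTransitiveRelation") in closure):
                    new.add((x, r, b))
                # Rule 3: (r, domain, b) and (x, r, y) give (x, type, b).
                if p == "domain" and a == r:
                    new.add((x, "type", b))
                # Rule 4: (r, range, b) and (x, r, y) give (y, type, b).
                if p == "range" and a == r:
                    new.add((y, "type", b))
                # Rule 5: (x, type, y) and (y, subClassOf, b) give (x, type, b).
                if r == "type" and p == "subClassOf" and a == y:
                    new.add((x, "type", b))
        if new <= closure:   # fixpoint reached: nothing new was derived
            return closure
        closure |= new


base = {
    ("subClassOf", "type", "acyclicTransitiveRelation"),
    ("American_singer", "subClassOf", "singer"),
    ("singer", "subClassOf", "person"),
    ("Elvis", "type", "American_singer"),
}
derived = derive(base)
```

On this toy input the loop adds three facts: American_singer subClassOf person, Elvis type singer and Elvis type person. The loop terminates because every derived triple only reuses entities and relations already present in the input.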

The YAGO model: Logical aspects
  • Axioms derive facts: rules such as (x, is_a, y), (y, subclass, z) => (x, is_a, z) extend f1, ..., f5 to f1, ..., f10; the derived set is finite and unique
  • Eliminating derivable facts reduces f1, ..., f5 to f1, f2, f3; the reduced set is also finite and unique

The YAGO model: Consistency
  • A YAGO ontology y is consistent iff ∄ x, r : (r, TYPE, acyclicTransitiveRelation) ∈ D(y) ∧ (x, r, x) ∈ D(y)
  • Since D(y) is finite, the consistency of a YAGO ontology is decidable
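
Because D(y) is finite, the consistency condition can be checked by a simple scan. The sketch below assumes its argument is already the full set of derivable facts D(y), given as (subject, relation, object) triples; the function name `is_consistent` and the lowercase relation name "type" are assumptions for illustration.

```python
def is_consistent(derived_facts):
    """True iff no acyclic transitive relation relates an entity to itself."""
    acyclic = {
        r for (r, p, c) in derived_facts
        if p == "type" and c == "acyclicTransitiveRelation"
    }
    return not any(x == y and r in acyclic for (x, r, y) in derived_facts)


good = {
    ("subClassOf", "type", "acyclicTransitiveRelation"),
    ("singer", "subClassOf", "person"),
}
# A cycle singer -> person -> singer; its derived facts include self-loops.
bad = good | {
    ("person", "subClassOf", "singer"),
    ("singer", "subClassOf", "singer"),
    ("person", "subClassOf", "person"),
}
```

Note that the self-loops only become visible after the transitivity rule has been applied, which is why the check must run on the derived set rather than on the base facts alone.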

The YAGO ontology: Accuracy
  • Is Lake Victoria "locatedIn" Tanzania?
  • When should an entity be an individual and when a class? e.g. Physics is an individual of science

The YAGO ontology: Number of Facts
  [Figure: bar chart comparing the number of facts in KnowItAll, SUMO, WordNet, OpenCyc and Yago. Value labels as extracted: 14,000; 2,000; 30,000; 60,000; 200,000; 300,000.]

YAGO – Web interface
  • http://www.mpiinf.mpg.de/~suchanek/downloads/yago/
  • Which astronaut was born in the same year as Elvis?
  • Query (labels as shown in the figure): "Elvis Presley" bornInYear $year; $astro isa astronaut
  • 20 results

The Answer
  • Roger Bruce Chaffee (born February 15, 1935) was a U.S. Navy pilot who became an American astronaut in the Apollo program
  • Died during training in the Apollo 1 fire

Overview: Motivation, Other Ontologies, System overview, YAGO Dive In, LEILA overview, NAGA overview, Conclusion

System architecture
  [Figure: architecture diagram. Interface: Browser (user query input and output, tunable parameters). Backend: YAGO KB; NAGA (query processing & ranking); LEILA (knowledge acquisition tools); Web.]

The Query Model
  • EVIDENCE QUERY: search for evidence for a certain hypothesis
      • Max Planck isA Physicist; Max Planck bornIn Kiel
  • DISCOVERY QUERY: discover pieces of missing information
      • Max Planck isA Physicist; Max Planck bornInYear $X; $Y isA Physicist; $Y bornInYear $X

The Query Model
  • REGULAR EXPRESSION QUERY: the user might be interested in a certain path of relations between pieces of information
      • Liu givenNameOf|familyNameOf $X; $X isA scientist
      • $X isA river; $X locatedIn* Africa
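
The starred relation in the second example (locatedIn*) means "zero or more locatedIn edges". A minimal sketch of evaluating such a query with breadth-first search follows; the function name `reachable_via` and the toy facts are assumptions for illustration and say nothing about how NAGA evaluates regular expression queries.

```python
from collections import deque


def reachable_via(facts, start, relation):
    """All nodes reachable from `start` over zero or more `relation` edges."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for (s, r, o) in facts:
            if s == node and r == relation and o not in seen:
                seen.add(o)
                queue.append(o)
    return seen


# Toy knowledge base, made up for this example.
facts = [
    ("Nile", "isA", "river"),
    ("Nile", "locatedIn", "Egypt"),
    ("Egypt", "locatedIn", "Africa"),
    ("Rhine", "isA", "river"),
    ("Rhine", "locatedIn", "Germany"),
    ("Germany", "locatedIn", "Europe"),
]

# $X isA river; $X locatedIn* Africa
rivers_in_africa = [
    s for (s, r, o) in facts
    if r == "isA" and o == "river"
    and "Africa" in reachable_via(facts, s, "locatedIn")
]
```

The Nile qualifies even though no fact links it to Africa directly: the path Nile, Egypt, Africa consists of two locatedIn edges.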

The Query Model
  • RELATEDNESS QUERY: find a broad relation between pieces of information
      • Einstein connect Bohr
  • Possible answers:
      • Both are physicists and both are scientists
      • There are Moon craters and asteroid belts named after them
      • Tom Cruise connects them by being a vegetarian
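
One simple way to answer a "connect" query is to look for a shortest path between the two entities while ignoring edge direction. This is a sketch of that idea only: the slide does not say how NAGA computes relatedness, and the function name `connect` and the toy facts are assumptions.

```python
from collections import deque


def connect(facts, a, b):
    """Shortest undirected path from a to b as (node, relation, node) steps."""
    adjacency = {}
    for (s, r, o) in facts:
        adjacency.setdefault(s, []).append((r, o))
        adjacency.setdefault(o, []).append((r, s))  # traverse edges both ways
    parents = {a: None}
    queue = deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            path = []
            while parents[node] is not None:
                prev, rel = parents[node]
                path.append((prev, rel, node))
                node = prev
            return path[::-1]
        for rel, nxt in adjacency.get(node, []):
            if nxt not in parents:
                parents[nxt] = (node, rel)
                queue.append(nxt)
    return None  # the two entities are not connected


# Toy facts, made up for this example.
facts = [
    ("Einstein", "type", "physicist"),
    ("Bohr", "type", "physicist"),
    ("Einstein", "bornInYear", "1879"),
]
path = connect(facts, "Einstein", "Bohr")
```

Each step in the returned path records the traversal order rather than the stored edge direction, so the second step reads (physicist, type, Bohr) although the underlying fact is Bohr type physicist.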

The Answer Model
  • The answer to a query Q is a subgraph A of the knowledge graph that matches Q
  • Q: Max Planck type Physicist; Max Planck bornInYear $X; $Y type Physicist; $Y bornInYear $X
  • A: Max Planck type Physicist; Max Planck bornInYear 1858; Mihajlo Pupin type Physicist; Mihajlo Pupin bornInYear 1858
  • Edge confidences shown in the figure: 0.98, 0.96, 0.95, 0.97
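
Matching a query with variables against the knowledge graph can be sketched as a small backtracking search over triple patterns. The function name `match`, the `$` prefix convention for variables in code and the extra Einstein facts are assumptions for illustration; confidences and ranking are left out.

```python
def match(query, facts, binding=None):
    """Yield every variable binding under which all query triples are facts."""
    binding = binding or {}
    if not query:
        yield dict(binding)
        return
    (s, r, o), rest = query[0], query[1:]
    for (fs, fr, fo) in facts:
        if fr != r:
            continue
        candidate = dict(binding)
        ok = True
        for term, value in ((s, fs), (o, fo)):
            if term.startswith("$"):
                # Bind the variable, or check it against its earlier binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
            elif term != value:
                ok = False
        if ok:
            yield from match(rest, facts, candidate)


facts = [
    ("Max_Planck", "type", "physicist"),
    ("Max_Planck", "bornInYear", "1858"),
    ("Mihajlo_Pupin", "type", "physicist"),
    ("Mihajlo_Pupin", "bornInYear", "1858"),
    ("Albert_Einstein", "type", "physicist"),   # extra toy fact
    ("Albert_Einstein", "bornInYear", "1879"),  # extra toy fact
]
query = [
    ("Max_Planck", "type", "physicist"),
    ("Max_Planck", "bornInYear", "$X"),
    ("$Y", "type", "physicist"),
    ("$Y", "bornInYear", "$X"),
]
answers = list(match(query, facts))
```

This sketch also returns the trivial answer in which $Y is Max Planck himself; a real system would have to filter or rank such matches.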

The Scoring Model
  • Combines three measures:
      • Extraction confidence
      • The informativeness of a fact (e.g. the fact Albert_Einstein isA physicist is more informative than Albert_Einstein isA person)
      • Compactness of the answer graph (e.g. for "How are Einstein and Bohr related?", both won the Nobel Prize rather than being connected via Tom Cruise)

Ranking Performance
  • 55 queries from TREC 2005/2006
  • 12 queries from the work on SphereSearch
  • 18 regular expression queries
  • The queries were posed to Google, Yahoo! Answers, and NAGA at the same time

Conclusions
  • Semantic Web vision
  • System overview
  • YAGO is based on a logically clean model
  • Accuracy of around 95%
  • YAGO is 7 times larger than the largest competitor
  • Investigate the relationship between OWL 1.1 and the YAGO model

Reference
  • “YAGO – A Core of Semantic Knowledge”
  • “NAGA: Harvesting, Searching and Ranking Knowledge”
  • “LEILA: Learning to Extract Information by Linguistic Analysis”
  (Fabian M. Suchanek, Gjergji Kasneci, Gerhard Weikum …)
  Available at http://www.mpii.mpg.de/~suchanek

The End! Questions?