- Number of slides: 40
Learning the Semantic Meaning of a Concept from the Web
Yang Yu
Master’s Thesis Defense
August 03, 2006
The Problem
- Manually preparing training data for text-classification-based ontology mapping is expensive.
[Figure: LIVING_THINGS hierarchy with classes LIVING_THINGS, ANIMAL, PLANT, HUMAN, CAT, WOMAN, TREE, GRASS, ARBOR, FRUTEX]
The Thesis
- Solution
  - Automatically collect training data for the concepts defined in an ontology.
- Contribution
  - Reduce the amount of human work
  - Fully automated ontology mapping
[Figure: http://www.google.com/]
Overview
- Background
  - The Semantic Web and ontology
  - Ontology mapping
- Proposal
- System
- Experimental results
  - WEAPONS ontology
  - LIVING_THINGS ontology
- Discussions and conclusion
Semantic Web and Ontology
- What is it?
  - “an extension of the current web”
- An example
  - Find all types of jets that are made in the USA
[Figure: relations Made-in (WA) and partOf (WA, USA)]
Ontology Mapping
- Definition
  - r = f(Ci, Cj), where i = 1, …, n and j = 1, …, m
  - r ∈ {equivalent, subClassOf, superClassOf, complement, overlapped, other}
- Interoperability problem
  - Independently developed ontologies for the same or overlapping domains
Approaches to Ontology Mapping
- Manual mapping
- String matching
- Text classification
  - The semantic meaning of a concept is reflected in the training data that use the concept
  - Probabilistic feature model
Classification results depend heavily on the training data
Motivation
- Preparing exemplars manually is costly
- Billions of documents are available on the web
  - Search engines
The Proposal
- Use each concept defined in an ontology as a query and process the search results to obtain exemplars
- Verification
  - Build a prototype system
  - Check ontology mapping results
System overview – Part I
[Figure: pipeline with components Ontology A, Parser, Queries, Search Engine, Links to Web Pages, Retriever, WWW, HTML Docs, Processor, Text Files]
The parser (Query expansion)
[Figure: hierarchy with FOOD, FRUIT, APPLE, ORANGE; the query shown for APPLE is FOOD+FRUIT+APPLE]
Concepts: living+things, animal, plant, cat, human, woman, tree, grass, frutex, arbor
Queries:
- living+things+animal
- living+things+plant
- living+things+animal+cat
- living+things+animal+human+man
- living+things+animal+human+woman
- living+things+plant+tree
- living+things+plant+grass
- living+things+plant+tree+Frutex
- living+things+plant+tree+arbor
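The query expansion on this slide can be sketched in a few lines of Python. This is only an illustration, not the thesis code: the `PARENT` map is a hypothetical encoding of the LIVING_THINGS hierarchy as implied by the listed queries, and `build_query` is an assumed helper name. Each query is the path from the root to the concept, joined with `+`.

```python
# Hypothetical parent map for the LIVING_THINGS hierarchy (assumed from the
# queries listed on the slide; not taken from the actual ontology file).
PARENT = {
    "living things": None,
    "animal": "living things",
    "plant": "living things",
    "cat": "animal",
    "human": "animal",
    "man": "human",
    "woman": "human",
    "tree": "plant",
    "grass": "plant",
    "frutex": "tree",
    "arbor": "tree",
}


def build_query(concept, parent=PARENT):
    """Join the names on the root-to-concept path with '+'."""
    path = []
    node = concept
    while node is not None:
        path.append(node)
        node = parent[node]
    # Walk from the root down; split multi-word names so spaces become '+'.
    words = [word for name in reversed(path) for word in name.split()]
    return "+".join(words)


if __name__ == "__main__":
    for concept in ("animal", "woman", "arbor"):
        print(concept, "->", build_query(concept))
```

Prefixing a concept with its ancestors is what lets an ambiguous word such as "arbor" be searched in the context of plants and trees.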
The retriever
The processor
Naïve Bayes text classifier
- Bow toolkit
  - McCallum, Andrew Kachites, Bow: A toolkit for statistical language modeling, text retrieval, classification and clustering, http://www.cs.cmu.edu/~mccallum/bow, 1996.
  - rainbow -d model --index dir/*
  - rainbow -d model --query
- Bayes rule
  - Naïve Bayes text classifier
Bayes Rule
- P(A | B) = P(B | A) * P(A) / P(B)
  - P(A | B): posterior
  - P(A): prior
  - P(B): normalizing constant
- P(B | A) = P(A, B) / P(A)
- P(A | B) = P(A, B) / P(B)
[Figure: Venn diagram of A and B with overlap P(A, B)]
Mitchell, Tom, Machine Learning, McGraw Hill, 1997
Naïve Bayes classifier
- A text classification problem
  - “What’s the most probable classification of the new instance given the training data?”
    - vj: category j
    - (a1, a2, …, an): attributes of a new document
  - So Naïve
(Mitchell, Tom, Machine Learning, McGraw Hill, 1997)
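The formulas on this slide did not survive extraction. As a reminder, the standard textbook formulation (following Mitchell's notation, with V assumed here to denote the set of categories) first picks the most probable category given the attributes, then applies the "naïve" assumption that attributes are conditionally independent given the category:

```latex
v_{MAP} = \arg\max_{v_j \in V} P(v_j \mid a_1, a_2, \ldots, a_n)
        = \arg\max_{v_j \in V} P(a_1, a_2, \ldots, a_n \mid v_j)\, P(v_j)

v_{NB} = \arg\max_{v_j \in V} P(v_j) \prod_{i=1}^{n} P(a_i \mid v_j)
```

The second step follows from Bayes rule, dropping the normalizing constant P(a1, …, an) because it does not depend on vj.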
System overview – Part II
[Figure: pipeline with components Ontology A, Text Files (A), Ontology B, Text Files (B), Model Builder, Rainbow, Feature Model, Calculator, Mapping Results]
The model builder
- Mutually exclusive and exhaustive
  - Leaf classes
  - C+ and C-
[Figure: LIVING_THINGS hierarchy with classes LIVING_THINGS, ANIMAL, PLANT, HUMAN, CAT, MAN, WOMAN, TREE, GRASS, ARBOR, FRUTEX]
The calculator
- The Naïve Bayes text classifier tends to give extreme values (1/0)
- Tasks
  - Feed exemplars to the classifier one by one
  - Keep records of classification results
  - Take averages and generate a report
An Example of the Calculator
200 APC exemplars are fed to the classifier; the categories are classes in Weapons.A.n3.
Category                     Num. of exemplars   Conditional probability
TANK-VEHICLE                 170                 P(TANK-VEHICLE | APC) = 170 / 200 = 0.85
AIR-DEFENSE-GUN              20                  P(AIR-DEFENSE-GUN | APC) = 0.10
SAUDI-NAVAL-MISSILE-CRAFT    10                  P(SAUDI-NAVAL-MISSILE-CRAFT | APC) = 0.05
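The calculator's arithmetic can be sketched as below. This is a minimal illustration, not the thesis implementation: `estimate_conditionals` is a hypothetical name, and it assumes each exemplar has already been assigned a single winning category by the classifier, so that averaging the near-0/1 outputs reduces to counting.

```python
from collections import Counter


def estimate_conditionals(predicted_labels):
    """Estimate P(category | new class) as the fraction of exemplars
    of the new class that the classifier assigned to each category."""
    if not predicted_labels:
        raise ValueError("need at least one classified exemplar")
    counts = Counter(predicted_labels)
    total = len(predicted_labels)
    return {label: n / total for label, n in counts.items()}


if __name__ == "__main__":
    # The 200 APC exemplars from the slide, by winning category.
    labels = (
        ["TANK-VEHICLE"] * 170
        + ["AIR-DEFENSE-GUN"] * 20
        + ["SAUDI-NAVAL-MISSILE-CRAFT"] * 10
    )
    for label, p in estimate_conditionals(labels).items():
        print(f"P({label} | APC) = {p:.2f}")
```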
Experiments with WEAPONS ontology
- Information Interpretation and Integration Conference (http://www.atl.lmco.com/projects/ontology/i3con.html)
- Weapons.A.n3 and Weapons.B.n3
  - Both define over 80 classes
  - More than 60 classes are leaf classes
  - Similar structure
Weapons.A.n3
[Figure: part of Weapons.A.n3, showing WEAPON, CONVENTIONAL-WEAPON, ARMORED-COMBAT-VEHICLE, TANK-VEHICLE, MODERN-NAVAL-SHIP, AIRCRAFT-CARRIER, PATROL-CRAFT, WARPLANE, SUPER-ETENDARD]
Weapons.B.n3
[Figure: part of Weapons.B.n3, showing WEAPON, CONVENTIONAL-WEAPON, ARMORED-COMBAT-VEHICLE, TANK-VEHICLE, LIGHT-TANK, APC, MODERN-NAVAL-SHIP, AIRCRAFT-CARRIER, LIGHT-AIRCRAFT-CARRIER, PATROL-WATERCRAFT, PATROL-BOAT-RIVER, PATROL-BOAT, WARPLANE, FIGHTER-PLANE, FIGHTER-ATTACK-PLANE, SUPER-ETENDARD-FIGHTER]
Expected Results
Expected mapping (existing class: new classes)
- TANK-VEHICLE: APC, LIGHT-TANK
- AIRCRAFT-CARRIER: LIGHT-AIRCRAFT-CARRIER
- PATROL-CRAFT: PATROL-WATERCRAFT, PATROL-BOAT-RIVER, PATROL-BOAT
- SUPER-ETENDARD: FIGHTER-PLANE, FIGHTER-ATTACK-PLANE, SUPER-ETENDARD-FIGHTER
A Typical Report
P(APC | Ci) where i = 1 … 63
APC
Class                       Probability
SELF-PROPELLED-ARTILLERY    0.357180681
TANK-VEHICLE                0.277139274
ICBM                        0.10423636
MRBM                        0.080615147
TOWED-ARTILLERY             0.054724102
SUPPORT-VESSEL              0.023265054
PATROL-CRAFT                0.019570325
MOLOTOV-COCKTAIL            0.015032411
TORPEDO-CRAFT               0.013677696
SUPER-ETENDARD              0.009856519
MORTAR                      0.00772997
AIR-DEFENSE-GUN             0.002997109
...                         ...
MACHINE-GUN                 0.000211772
MOLOTOV-COCKTAIL            0.000187578
TRUCK-BOMB                  0.000171675
AS-9-KYLE-ALCM              0.000156403
ARABIL-100-MISSILE          0.000111953
AL-HIJARAH-MISSILE          7.65E-05
OGHAB-MISSILE               7.12E-05
BADAR-2000                  4.28E-05
Classes with highest conditional probability
New class                 Whole file              Prob   Sentences with keywords     Prob
LIGHT-AIRCRAFT-CARRIER    AIRCRAFT-CARRIER        0.65   AIRCRAFT-CARRIER            0.57
APC                       SILKWORM-MISSILE-MOD    0.46   SELF-PROPELLED-ARTILLERY    0.36
SUPER-ETENDARD-FIGHTER    SILKWORM-MISSILE-MOD    0.66   MRBM                        0.51
FIGHTER-ATTACK-PLANE      SILKWORM-MISSILE-MOD    0.83   MRBM                        0.38
PATROL-WATERCRAFT         SILKWORM-MISSILE-MOD    0.28   PATROL-CRAFT                0.52
PATROL-BOAT-RIVER         SILKWORM-MISSILE-MOD    0.65   PATROL-CRAFT                0.54
PATROL-BOAT               SILKWORM-MISSILE-MOD    0.51   PATROL-CRAFT                0.66
LIGHT-TANK                SILKWORM-MISSILE-MOD    0.56   TANK-VEHICLE                0.3
FIGHTER-PLANE             AIRCRAFT-CARRIER        0.49   MRBM                        0.38
Callouts: P(TANK-VEHICLE | APC) = 0.28; P(SUPER-ETENDARD | SUPER-ETENDARD-FIGHTER) = 0.21
Different numbers of exemplars (whole)
New class                 Group-whole-50          Prob              Group-whole-100         Prob
LIGHT-AIRCRAFT-CARRIER    SILKWORM-MISSILE-MOD    0.60              AIRCRAFT-CARRIER        0.65
APC                       SILKWORM-MISSILE-MOD    (not recovered)   SILKWORM-MISSILE-MOD    0.46
SUPER-ETENDARD-FIGHTER    SILKWORM-MISSILE-MOD    0.74              SILKWORM-MISSILE-MOD    0.66
FIGHTER-ATTACK-PLANE      SILKWORM-MISSILE-MOD    0.83              SILKWORM-MISSILE-MOD    0.83
PATROL-WATERCRAFT         SILKWORM-MISSILE-MOD    0.64              SILKWORM-MISSILE-MOD    0.28
PATROL-BOAT-RIVER         SILKWORM-MISSILE-MOD    0.89              SILKWORM-MISSILE-MOD    0.65
PATROL-BOAT               SILKWORM-MISSILE-MOD    0.64              SILKWORM-MISSILE-MOD    0.51
LIGHT-TANK                SILKWORM-MISSILE-MOD    0.62              SILKWORM-MISSILE-MOD    0.56
FIGHTER-PLANE             SILKWORM-MISSILE-MOD    0.80              AIRCRAFT-CARRIER        0.49
Different numbers of exemplars (sentence)
New class                 Group-sentence-50     Prob              Group-sentence-100          Prob
LIGHT-AIRCRAFT-CARRIER    (not recovered)       0.44              AIRCRAFT-CARRIER            0.57
APC                       TANK-VEHICLE          0.54              SELF-PROPELLED-ARTILLERY    0.36
SUPER-ETENDARD-FIGHTER    HY-4-C-201-MISSILE    0.4               MRBM                        0.51
FIGHTER-ATTACK-PLANE      ICBM                  0.19              MRBM                        0.38
PATROL-WATERCRAFT         (not recovered)       0.49              PATROL-CRAFT                0.52
PATROL-BOAT-RIVER         PATROL-CRAFT          0.36              PATROL-CRAFT                0.54
PATROL-BOAT               PATROL-CRAFT          0.37              PATROL-CRAFT                0.66
LIGHT-TANK                TANK-VEHICLE          0.59              TANK-VEHICLE                0.3
FIGHTER-PLANE             (not recovered)       (not recovered)   MRBM                        0.38
Comparison of mapping accuracy of different groups of experiments
Group of experiments    Mapping accuracy (judged by desired class mapped)
Group-whole-50          0%
Group-whole-100         11%
Group-sentence-50       67%
Group-sentence-100      56%
Callout: Higher Conditional Probability
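The accuracy figures can be checked with a short script. This is a sketch under stated assumptions: the `EXPECTED` map is a reading of the "Expected Results" slide, the two result dictionaries are transcribed from the "classes with highest conditional probability" slide, and `mapping_accuracy` is a hypothetical helper that counts a mapping as correct when the top-ranked class equals the desired class.

```python
# Desired existing class for each new class (assumed reading of the
# "Expected Results" slide).
EXPECTED = {
    "LIGHT-AIRCRAFT-CARRIER": "AIRCRAFT-CARRIER",
    "APC": "TANK-VEHICLE",
    "SUPER-ETENDARD-FIGHTER": "SUPER-ETENDARD",
    "FIGHTER-ATTACK-PLANE": "SUPER-ETENDARD",
    "PATROL-WATERCRAFT": "PATROL-CRAFT",
    "PATROL-BOAT-RIVER": "PATROL-CRAFT",
    "PATROL-BOAT": "PATROL-CRAFT",
    "LIGHT-TANK": "TANK-VEHICLE",
    "FIGHTER-PLANE": "SUPER-ETENDARD",
}

# Top-ranked class per new class, transcribed from the results slide.
SENTENCE_100 = {
    "LIGHT-AIRCRAFT-CARRIER": "AIRCRAFT-CARRIER",
    "APC": "SELF-PROPELLED-ARTILLERY",
    "SUPER-ETENDARD-FIGHTER": "MRBM",
    "FIGHTER-ATTACK-PLANE": "MRBM",
    "PATROL-WATERCRAFT": "PATROL-CRAFT",
    "PATROL-BOAT-RIVER": "PATROL-CRAFT",
    "PATROL-BOAT": "PATROL-CRAFT",
    "LIGHT-TANK": "TANK-VEHICLE",
    "FIGHTER-PLANE": "MRBM",
}
WHOLE_100 = {name: "SILKWORM-MISSILE-MOD" for name in EXPECTED}
WHOLE_100["LIGHT-AIRCRAFT-CARRIER"] = "AIRCRAFT-CARRIER"
WHOLE_100["FIGHTER-PLANE"] = "AIRCRAFT-CARRIER"


def mapping_accuracy(results, expected=EXPECTED):
    """Fraction of new classes whose top-ranked class is the desired one."""
    hits = sum(1 for new_class, top in results.items()
               if expected.get(new_class) == top)
    return hits / len(results)


if __name__ == "__main__":
    print(f"Group-whole-100:    {mapping_accuracy(WHOLE_100):.0%}")
    print(f"Group-sentence-100: {mapping_accuracy(SENTENCE_100):.0%}")
```

Under these assumptions the script gives 1 of 9 and 5 of 9 correct mappings, which round to the 11% and 56% reported on the slide.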
Experiment with LIVING_THINGS ontology
- P(MAN | HUMAN)
- P(WOMAN | HUMAN)
- Find a mapping for GIRL
[Figure: WOMAN, HUMAN]
Actual Experiment Results: L-1
Results of experiment (1)
Conditional probability   First 50 exemplars   First 100 exemplars   First 200 exemplars
P(MAN | HUMAN)            0.75                 0.58                  0.62
P(WOMAN | HUMAN)          0.24                 0.41                  0.38
[Figure: WOMAN, HUMAN]
Actual Experiment Results: L-2
Without clustering on exemplars
P(ANIMAL | GIRL)       0.76
P(PLANT | GIRL)        0.23
P(HUMAN | GIRL)        0.70
P(CAT | GIRL)          0.30
P(MAN | GIRL)          0
P(WOMAN | GIRL)        1
With clustering on exemplars
P(ANIMAL | GIRL)       0.83
P(PLANT | GIRL)        0.17
P(HUMAN | GIRL)        0.92
P(CAT | GIRL)          0.08
P(WOMAN | GIRL)        0.63
P(MAN | GIRL)          0.37
With additional classes
P(DOG | GIRL)          0.56
P(CAT | GIRL)          0.01
P(HUMAN | GIRL)        0.43
P(PYCNOGONID | GIRL)   0
Actual Experiment Results: Different Queries
Queries augmented with class properties
Concept          Query
living+things    Living+things
animal           Living+things+animal+Animalia
plant            Living+things+plant+Plantae
cat              Living+things+animal+Animalia+cat+Felidae
human            Living+things+animal+Animalia+human+intelligent+man+male
woman            Living+things+animal+Animalia+human+intelligent+woman+female
tree             Living+things+plant+Plantae+tree
grass            Living+things+plant+Plantae+grass
frutex           Living+things+plant+Plantae+tree+Frutex
arbor            Living+things+plant+Plantae+tree+arbor
Actual Experiment Results: L-4
Results of experiment (1) with new queries
Conditional probability   Whole             Keyword sentences
P(MAN | HUMAN)            0.91              0.93
P(WOMAN | HUMAN)          0.09              0.07
[Figure: WOMAN, HUMAN]
Results of experiment (2) with new queries
Conditional probability   Whole             Keyword sentences
P(ANIMAL | GIRL)          0.9               0.83
P(PLANT | GIRL)           (not recovered)   0.17
P(HUMAN | GIRL)           0.78              0.83
P(CAT | GIRL)             0.22              0.17
P(MAN | GIRL)             0.14              0.16
P(WOMAN | GIRL)           0.86              0.84
Limitation 1: An exemplar is not a sample of a concept
- An exemplar is a combination of strings that represents some usage of a concept.
- An exemplar is not an instance of a concept.
- The way we calculate conditional probability is an estimation.
[Figure: WOMAN, HUMAN]
Limitation 2: Popularity does not equal relevancy
- Limited by a search engine’s algorithm
  - PageRank™
    - Popularity does not equal relevancy
  - Weight cannot be specified for words in a search query
Limitation 3: Relevancy does not equal similarity
[Figure: search results for concept A, made up of: text related to concept A; text for concept A, i.e. desired exemplars; text against concept A; text for related concept B]
Related Research
- UMBC OntoMapper
  - Sushama Prasad, Peng Yun and Finin Tim, A Tool for Mapping between Two Ontologies Using Explicit Information, AAMAS 2002 Workshop on Ontologies and Agent Systems, 2002.
- CAIMAN
  - Lacher S. Martin and Groh Georg, Facilitating the Exchange of Explicit Knowledge through Ontology Mappings, Proc. of the Fourteenth International FLAIRS Conference, 2001.
- GLUE
  - Doan Anhai, Madhavan Jayant, Dhamankar Robin, Domingos Pedro, and Halevy Alon, Learning to Match Ontologies on the Semantic Web, WWW 2002, May 2002.
- Google Conditional Probability
  - P(HUMAN | MAN) = 1.77 billion / 2.29 billion = 0.77
  - P(HUMAN | WOMAN) = 0.6 billion / 2.29 billion = 0.26
  - Wyatt D., Philipose M., and Choudhury T., Unsupervised Activity Recognition Using Automatically Mined Common Sense, Proceedings of AAAI-05, pp. 21-27.
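The "Google Conditional Probability" figures are plain ratios of search hit counts. A minimal sketch of that arithmetic follows, using only the counts quoted on the slide. The function name `hit_ratio` is hypothetical, and reading the numerator as joint hits and the denominator as hits for the conditioning term is an assumption, since the slide does not say how the counts were obtained.

```python
def hit_ratio(joint_hits, condition_hits):
    """Estimate P(X | Y) as hits(X and Y) / hits(Y).

    The interpretation of the two counts is an assumption; the slide
    only gives the numbers.
    """
    if condition_hits <= 0:
        raise ValueError("condition_hits must be positive")
    return joint_hits / condition_hits


if __name__ == "__main__":
    # Counts as quoted on the slide.
    print(f"P(HUMAN | MAN)   = {hit_ratio(1.77e9, 2.29e9):.2f}")
    print(f"P(HUMAN | WOMAN) = {hit_ratio(0.6e9, 2.29e9):.2f}")
```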
Conclusion and Future Work
- Text retrieved from the web can be used as exemplars for text-classification-based ontology mapping
  - Many parameters affect the quality of the exemplars
- There is noise in the processed documents
- Future work
  - Clustering
Questions