  • Number of slides: 52

Ontology Generation: Surveys
Yihong Ding
CS 652, Spring 2004

Three Papers
  • Mariano Fernández-López. Overview of Methodologies for Building Ontologies. In IJCAI-99 Workshop on Ontologies and Problem-Solving Methods, 1999.
  • Borys Omelayenko. Learning of Ontologies for the Web: the Analysis of Existent Approaches. In International Workshop on Web Dynamics, held in conjunction with the 8th International Conference on Database Theory (ICDT'01), 2001.
  • Ying Ding and Schubert Foo. Ontology research and development. Part 1: A review of ontology generation. Journal of Information Science, 2002.

Mariano Fernández-López, 1999
  • Proposes guidelines, based on IEEE Standard 1074-1995, for manual ontology development
  • Examines the methodologies of five projects
      • Uschold and King, 1995
      • Grüninger and Fox, 1995
      • Bernaras et al., 1996
      • METHONTOLOGY, 1996
      • SENSUS, 1997

IEEE Standard 1074-1995
  • The standard for developing software life cycle processes
  • Software life cycle model process (identify and select a life cycle)
  • Project management processes (create the framework of the project)
  • Software development-oriented processes
      • Pre-development processes (study the environment)
      • Development processes
          • Requirements process (develop the software requirements specification)
          • Design process (develop a software representation that meets the requirements)
          • Implementation process (transform the representation into a programming language)
      • Post-development processes (installation, operation, support, and maintenance)
  • Integral processes (ensure completion and quality)

Criteria for Analyzing Methodologies
  • C1. Inheritance from knowledge engineering
  • C2. Detail of the methodology
  • C3. Recommendation for knowledge formalization
  • C4. Strategy for building ontologies: application-dependent, semi-dependent, or independent
  • C5. Strategy for identifying concepts: bottom-up, top-down, or middle-out
  • C6. Recommended life cycle
  • C7. Differences between the methodology and IEEE 1074-1995
  • C8. Recommended techniques
  • C9. Ontologies and systems built

Uschold and King
  • Description: developing the Enterprise Ontology for enterprise modeling processes
  • Building process (middle-out)
      • Ontology capture
          • Identify key concepts and relationships
          • Produce precise, unambiguous text definitions
          • Identify other terms that refer to the identified concepts and relationships
      • Coding
      • Integrating existing ontologies

Uschold and King: Analysis of Methodology
  • C1. partial: identifies acquisition, coding, and evaluation stages, but has no feasibility study or prototyping
  • C2. very little
  • C4. application-independent
  • C5. middle-out: the most important concepts are identified first; the others are obtained by generalization and specialization
  • C7. Processes missing: management, pre-development, post-development, and design
        Activities missing: environment study, feasibility study, training, and configuration management
  • C8. technical details are unclear

Grüninger and Fox
  • Description: developing the TOVE (TOronto Virtual Enterprise) project ontology within the domain of business process and activity modeling
  • Building process (middle-out)
      • Capture of motivating scenarios
          • Motivating scenarios: problems or examples that are not adequately addressed by existing ontologies
          • A motivating scenario provides possible solutions
          • The solutions provide an informal intended semantics for the objects and relations
      • Formulation of informal competency questions
          • Based on the motivating scenarios
          • Serve as constraints rather than determining a particular design
          • Evaluate the ontological commitment
      • Specification of the terminology of the ontology within a formal language
          • Getting the informal terminology: terms are extracted from the questions
          • Specification of the formal terminology: formalizing the terms
      • Formulation of formal competency questions using the terminology of the ontology
      • Specification of axioms and definitions for the terms in the ontology within the formal language
      • Establishing conditions for characterizing the completeness of the ontology

Grüninger and Fox: Analysis of Methodology
  • C1. small: a question-answer-pair driven approach, with little connection to knowledge-based system development
  • C2. little
  • C3. logic
  • C4. application-semidependent (scenarios)
  • C5. middle-out
  • C7. Processes missing: management, pre-development, post-development, and design
        Activities missing: training and configuration management
  • C8. technical details are unclear

Bernaras et al.
  • Description: developed within the Esprit KACTUS project, which investigates the feasibility of knowledge reuse in complex technical systems and the role of ontologies in supporting it
  • Building process (top-down)
      • Specification of the application
      • Preliminary design based on relevant top-level ontological categories
          • Involves searching ontologies developed for other applications, which are refined and extended for use in the new application
      • Ontology refinement and structuring

Bernaras et al.: Analysis of Methodology
  • C1. big: follows the tradition of knowledge engineering
  • C2. very little
  • C4. application-dependent
  • C5. top-down
  • C7. Processes missing: management, pre-development, and post-development
        Activities missing: training, documentation, configuration management, verification, and validation
  • C8. technical details are unclear

METHONTOLOGY
  • Description
      • Enables the construction of ontologies at the knowledge level
      • Supported by the Ontology Design Environment (ODE)
      • Includes
          • Identification of the ontology development process
          • A life cycle based on evolving prototypes
          • Particular techniques for carrying out each activity
      • Ontologies developed
          • CHEMICALS
          • Environment pollutants ontologies
          • The Reference-Ontology
          • The restructured version of the (KA)² ontology
  • Building process (middle-out): refers to which activities are carried out
      • Project management activities
          • Planning: identify tasks
          • Control: guarantee that planned tasks are completed when intended
          • Quality assurance: assure the quality of outputs
      • Development-oriented activities
          • Specification, conceptualization, formalization, and implementation
      • Support activities
          • Knowledge acquisition, evaluation, integration, documentation, and configuration management

METHONTOLOGY: Analysis of Methodology
  • C1. big: it has its roots in a methodology for developing knowledge-based systems
  • C2. a lot
  • C3. flexible
  • C4. application-independent
  • C5. middle-out: the most relevant concepts are identified first
  • C6. evolving prototypes
  • C7. Processes missing: software life cycle model and pre-development
        Activities missing: project initiation, installation, support, retirement, and training
  • C8. technical details are unclear

SENSUS
  • Description
      • Developed for natural language processing
      • Content obtained by extracting and merging information from various electronic sources of knowledge
          • PENMAN Upper Model, ONTOS, manually built semantic categories, WordNet, Spanish and Japanese lexical entries
      • Includes
          • More than 50,000 concepts organized in a hierarchy
          • Both high and medium levels of abstraction
          • Generally does not cover terms from specific domains
  • Building process (bottom-up)
      • Take a series of seed terms, linked to SENSUS by hand
      • Specify paths from the seed terms to the root
      • Add more relevant terms
      • Prune any irrelevant terms
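The path-and-prune part of the SENSUS building process can be illustrated with a small sketch. It assumes the hierarchy is available as a child-to-parent mapping; the function name `prune_to_seeds` and the toy hierarchy are hypothetical and are not taken from SENSUS. The manual steps (linking seeds by hand, adding further relevant terms) are not modeled.

```python
from typing import Dict, Iterable, Optional, Set


def prune_to_seeds(parent: Dict[str, Optional[str]], seeds: Iterable[str]) -> Set[str]:
    """Keep only the concepts that lie on a path from a seed term to the root."""
    kept: Set[str] = set()
    for term in seeds:
        node: Optional[str] = term
        # Walk upward until the root (parent is None) or a node already kept.
        while node is not None and node not in kept:
            kept.add(node)
            node = parent[node]
    return kept


# Toy child -> parent hierarchy (hypothetical, for illustration only).
parent = {
    "entity": None,
    "object": "entity",
    "vehicle": "object",
    "aircraft": "vehicle",
    "car": "vehicle",
    "animal": "object",
    "dog": "animal",
}

domain = prune_to_seeds(parent, ["aircraft", "car"])
print(sorted(domain))
```

Everything outside the seed-to-root paths (here the `animal` branch) is treated as irrelevant and dropped.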

SENSUS: Analysis of Methodology
  • C1. none: based on adding terms to an existing ontology
  • C2. medium: not very detailed
  • C3. semantic networks
  • C4. application-semidependent
  • C5. bottom-up
  • C7. Processes missing: management, pre-development, post-development, and design
        Activities missing: training, documentation, configuration management, verification, and validation
  • C8. technical details are unclear

Summary
  • None of the methodologies is fully mature compared with the IEEE standard
  • The proposals are not unified
  • SENSUS is completely different from the others
  • This suggests adopting several widely accepted methodologies rather than a single standardized one
  • Interpretability between systems is allowed

Borys Omelayenko, 2001
  • Learning-based ontology development
  • Examines eleven different approaches
      • Bisson et al., 2000
      • Faure and Poibeau, 2000
      • Agirre et al., 2000
      • Junker et al., 1999
      • Craven et al., 2000
      • Bowers et al., 2000
      • Taylor et al., 1997
      • Webb, Wells, and Zheng, 1999
      • Soderland et al., 1995
      • Maedche and Staab, 2000
      • Suryanto and Compton, 2000

Semantic Querying over the Web [figure]

Ontological Components
  • Natural language ontologies (horizontal)
      • Contain lexical relations between language concepts
      • Large in size and do not require frequent updates
      • Used to expand user queries
      • Capture concepts but do not provide detailed descriptions
  • Domain ontologies (vertical)
      • Capture knowledge of a particular domain
      • Provide detailed descriptions of the domain
  • Ontology instances (dot)
      • The main piece of knowledge presented in the future Semantic Web
      • Serve as Web pages
      • Contain links to other instances

Ontology Learning Tasks
  • Ontology acquisition
      • Ontology creation
      • Ontology schema extraction
      • Extraction of ontology instances
  • Ontology maintenance
      • Ontology integration and navigation
      • Ontology update
      • Ontology enrichment

Machine Learning Techniques
  • Ontology representation requires symbolic learning methods
      • Skips neural networks, genetic algorithms, and the family of 'lazy learners'
  • Methods studied in this paper
      • Propositional rule learning (zero-order logic)
      • First-order logic rule learning
      • Bayesian learning
      • Clustering algorithms

ML vs. Manual Construction
  • Modeling primitives
      • ML: simple and limited (usually simple rules)
      • Manual: rich (frames, subclasses, rules with a rich set of operations, functions, etc.)
  • Knowledge base structure
      • ML: flat and homogeneous
      • Manual: hierarchical, consisting of various components with subclass-of, part-of, and other relations
  • Tasks
      • ML: categorize objects into a limited and unstructured set of classes
      • Manual: classify objects into a tree of structured classes
  • Problem-solving methods
      • ML: very primitive, based on simple search strategies
      • Manual: complicated, inference over a knowledge base with rich structure
  • Solution space
      • ML: non-extensible, fixed set of class labels
      • Manual: extensible set of primitive and compound solutions
  • Readability of the knowledge bases to a human
      • Not required

Requirements for OL
  • Aim: automatically construct ontologies with the properties of manually constructed ontologies
  • Requirements
      • Ability to interact with a human
      • Readability of internal and external results of the learner
      • Ability to use complex modeling primitives
      • Ability to deal with complex solution spaces

Requirements for Ontological Components
  • NLO (natural language ontologies)
      • Hierarchical clustering of language concepts
      • Limited set of relations
      • Ability to link to specific domain ontologies
      • ML focus: enrichment based on domain texts is popular
      • Do not require frequent or automatic updates
  • DO (domain ontologies)
      • Use the whole set of modeling primitives
      • Complex in structure
      • ML focus: discovering statistically valid patterns for creation
      • Require more updates
  • OI (ontology instances)
      • Mark-up of concepts of the underlying domain ontology in Web pages
      • ML focus: IE and annotation
      • Require frequent updates

Learning of NLO: Bisson et al., 2000 (Mo'K tool)
  • Human-assisted bottom-up clustering of conceptual hierarchies from corpora
  • The human selects the input examples and attributes, the level of pruning, and the distance evaluation functions
  • Group 'similar' objects to create the classes
  • Group 'similar' classes to form the hierarchy
  • No human interaction during the clustering process
  • Further study: integrating NLO enrichment with Web search for relevant texts
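The bottom-up grouping described above can be sketched as a plain agglomerative clustering loop. This is a minimal illustration and not the Mo'K implementation: the Jaccard distance, the choice to describe a merged class by the union of its members' attributes, and the toy data are all assumptions made for the example.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """1 minus the share of attributes the two sets have in common."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def cluster_bottom_up(
    attrs: Dict[str, Set[str]],
) -> List[Tuple[FrozenSet[str], FrozenSet[str]]]:
    """Repeatedly merge the two closest clusters; return the merges in order."""
    clusters: Dict[FrozenSet[str], Set[str]] = {
        frozenset([word]): set(a) for word, a in attrs.items()
    }
    merges: List[Tuple[FrozenSet[str], FrozenSet[str]]] = []
    while len(clusters) > 1:
        # Sorting the cluster keys makes tie-breaking deterministic.
        left, right = min(
            combinations(sorted(clusters, key=sorted), 2),
            key=lambda pair: jaccard_distance(clusters[pair[0]], clusters[pair[1]]),
        )
        # Assumption: a merged class is described by the union of attributes.
        merged_attrs = clusters.pop(left) | clusters.pop(right)
        clusters[left | right] = merged_attrs
        merges.append((left, right))
    return merges


# Toy input: each word with the verbs it co-occurs with (hypothetical data).
attrs = {
    "apple": {"eat", "peel", "grow"},
    "pear": {"eat", "peel", "pick"},
    "car": {"drive", "park"},
    "truck": {"drive", "park", "load"},
}
merges = cluster_bottom_up(attrs)
for left, right in merges:
    print(sorted(left), "+", sorted(right))
```

The order of the merges gives the hierarchy: early merges form the low-level classes, and later merges group those classes together.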

Learning of NLO: Agirre et al., 2000
  • Enrich WordNet by exploiting texts from the Web
  • Construct lists (topic signatures) of topically related words, with weight/strength, for each concept in WordNet
  • Each word sense has one associated list of related words
  • Related Web pages are retrieved from the AltaVista search engine with sense-specific queries
  • A query refers to one particular sense and excludes the others
  • Example: waiter AND (restaurant OR menu) AND NOT (station OR airport)
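The shape of such a sense-specific query can be shown with a short sketch. The helper `sense_query` and its parameter names are hypothetical; the cue words are the ones from the slide's example, and how the original work actually chooses cue words is not modeled here.

```python
from typing import Iterable


def sense_query(word: str, cue_words: Iterable[str], other_sense_cues: Iterable[str]) -> str:
    """Build a boolean query that targets one sense of `word` and excludes the others."""
    include = " OR ".join(cue_words)
    exclude = " OR ".join(other_sense_cues)
    return f"{word} AND ({include}) AND NOT ({exclude})"


query = sense_query("waiter", ["restaurant", "menu"], ["station", "airport"])
print(query)  # waiter AND (restaurant OR menu) AND NOT (station OR airport)
```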

Learning of NLO: Faure and Poibeau, 2000 (Asium)
  • Creates a domain-specific NLO by unsupervised, domain-specific clustering of texts from corpora
  • Generates the syntactic structure of texts with Sylex
  • Cooperative learning of semantic knowledge from parsed texts
  • Bottom-up, breadth-first clustering forms the hierarchy
  • An expert validates and labels the concepts

Learning of DO: Maedche and Staab, 2000
  • Semi-automatic ontology learning from texts
  • Input: a set of transactions
  • Transaction: a set of items that appear together
  • Association rule: a set of items that appear together sufficiently often
  • ML: discover generalized association rules
  • Final: present the rules to the knowledge engineer
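As a rough illustration of rule discovery over transactions, the sketch below computes support and confidence for pairwise rules. It is a plain association-rule miner, not the generalized (taxonomy-aware) variant named on the slide; the function name `pair_rules`, the thresholds, and the toy transactions are assumptions made for the example.

```python
from itertools import permutations
from typing import List, Set, Tuple

Rule = Tuple[str, str, float, float]  # (antecedent, consequent, support, confidence)


def pair_rules(
    transactions: List[Set[str]], min_support: float, min_confidence: float
) -> List[Rule]:
    """Return rules a -> b whose support and confidence reach the thresholds."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    rules: List[Rule] = []
    for a, b in permutations(items, 2):
        both = sum(1 for t in transactions if a in t and b in t)
        has_a = sum(1 for t in transactions if a in t)
        support = both / n                      # share of transactions with a and b
        confidence = both / has_a if has_a else 0.0  # share of a-transactions with b
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


# Toy transactions: concepts that co-occur in the same sentence (hypothetical).
transactions = [
    {"hotel", "room"},
    {"hotel", "room", "price"},
    {"hotel", "price"},
    {"room", "bed"},
]
rules = pair_rules(transactions, min_support=0.5, min_confidence=0.9)
for a, b, support, confidence in rules:
    print(f"{a} -> {b}  support={support:.2f} confidence={confidence:.2f}")
```

Rules that pass both thresholds are the candidates that would be shown to the knowledge engineer for review.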

Learning of DO: Suryanto and Compton, 2000
  • First attempt at using ML to discover hierarchical relations between textually described classes
  • Discovers class relations between classification rules
  • Three basic relations: intersection, mutual exclusion, similarity
  • A measure of degree is defined for each of the three basic relations

Learning of DO: Taylor et al., 1997
  • Ontology-based induction of high-level classification rules
  • Ontologies are used not only to explain rules but also to guide the learning algorithm
  • The algorithm generates queries for an external learner, ParkaDB
  • The DO and the input data are used to check the consistency of the queries
  • Consistent queries become classification rules
  • Query generation continues until the set of rules covers the whole data set

Learning of DO: Webb, Wells, and Zheng, 1999
  • ML plus knowledge acquisition from experts improves the accuracy of the developed domain ontology and reduces development time
  • Three types of knowledge acquisition systems
      • Manual, based on experts
      • ML systems
      • Integrated systems
  • ML method: C4.5 decision tree

Learning of OI: Bowers et al., 2000
  • Replaces the attribute-value dictionary with a more expressive one that consists of simple data types, tuples, sets, and graphs
  • Uses a modified C4.5 learner

Learning of OI: Soderland et al., 1995 (CRYSTAL)
  • Formalizes ontology instances from text and generates a concept hierarchy from the instances
  • Takes a domain model as input
  • Uses a richer set of modeling primitives
  • Generalizes the semantic mark-up of manually marked-up training corpora
  • Formalizes the instance level of the hierarchy
  • Search-based generalization of concept nodes

Learning of OI: Craven et al., 2000 (Web-KB)
  • Systematic study of the extraction of OI from Web documents
  • Takes an ontology of an academic web site and populates it with actual instances and relations from CS departments' web sites
  • Three learning tasks
      • Recognize class instances from hypertext documents, guided by the ontology
      • Recognize relation instances from chains of hyperlinks
      • Recognize class and relation instances from pieces of hypertext
  • Two supervised learning methods
      • Naïve Bayes learner
      • Modified FOIL (first-order rule learner)
  • Automatically creates a mapping between the manually constructed domain ontology and the Web pages by generalizing from the training instances
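To make the first learning task concrete, here is a minimal sketch of a Naïve Bayes text classifier that assigns a page to an ontology class. It is a generic multinomial Naïve Bayes with add-one smoothing, not the Web-KB code; the class labels, the training snippets, and the function names `train` and `classify` are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import DefaultDict, List, Set, Tuple

Model = Tuple[Counter, DefaultDict[str, Counter], Set[str]]


def train(examples: List[Tuple[str, str]]) -> Model:
    """Count class frequencies and per-class word frequencies."""
    class_counts: Counter = Counter()
    word_counts: DefaultDict[str, Counter] = defaultdict(Counter)
    vocab: Set[str] = set()
    for text, label in examples:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def classify(text: str, model: Model) -> str:
    """Return the class with the highest log-posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = "", -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training pages labeled with ontology classes (hypothetical data).
model = train([
    ("course syllabus homework lecture", "course"),
    ("lecture notes exam homework", "course"),
    ("professor research publications", "faculty"),
    ("professor office research students", "faculty"),
])
print(classify("homework and lecture schedule", model))
print(classify("professor research group", model))
```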

Summary
  • Main problem of OL: the learned structure is flat and homogeneous
  • Learning of NLO
      • General-purpose NLOs exist
      • Mainly enrichment
      • Most popular ML algorithm: clustering
  • Learning of DO
      • Human-guided learning
      • Learning plays only a minor role in knowledge acquisition
      • Most popular ML algorithm: propositional learning
  • Learning of OI
      • The structure of OI is too rich to be adequately captured by propositional rules
      • Several different ML algorithms are applied

Ying Ding and Schubert Foo, 2002
  • Methods used and problems encountered in many recent ontology generation approaches
  • Examines seven main collections of approaches
      • InfoSleuth (MCC)
      • SKC (Stanford)
      • Ontology Learning (AIFB)
      • ECAI 2000
      • Inductive logic programming (UT)
      • Library Science and Ontology
      • Others

InfoSleuth
  • A research project at MCC (Microelectronics and Computer Technology Corporation)
  • Develops and deploys new technologies for finding information available in both corporate and external networks
  • Description
      • Locating, evaluating, retrieving, and merging information in a frequently updated environment
      • Builds an ontology-based agent architecture
      • Has been successfully applied in
          • Knowledge management
          • Business intelligence
          • Logistics
          • Crisis management
          • Genome mapping
          • Environment data exchange network

InfoSleuth: method
  • Input resources
      • A human expert feeds the system a small set of seedwords (high-level concepts)
      • An IR engine feeds relevant documents (with or without POS tags) automatically
  • System process
      • Parse documents
      • Extract phrases containing seedwords
      • Generate concept terms
      • Place them into the ontology
      • Collect candidate seedwords for the next round of processing
  • Relationship retrieval
      • is-a, part-of, manufactured-by, owned-by, etc.
      • assoc-with is used to define relations other than is-a
      • Use linguistic properties to identify relations
  • Human experts evaluate and adjust the results
  • Special features
      • Expands the ontology with new concepts and alerts the human expert to update it
      • Discovers attributes associated with certain concepts
      • Indexes documents for future retrieval
      • Allows users to decide between precision and completeness by browsing
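One round of the system process above can be approximated with a very small sketch. It assumes that a concept phrase is simply a non-stopword followed by a seedword, which is far cruder than the linguistic parsing the real system relies on; `expand_once`, `STOPWORDS`, and the sample documents are hypothetical.

```python
import re
from typing import Dict, Iterable, List, Set, Tuple

# Tiny stopword list for the example only (assumption).
STOPWORDS = {"a", "an", "the", "and", "of", "is", "are", "uses"}


def expand_once(
    documents: List[str], seedwords: Iterable[str]
) -> Tuple[Dict[str, Set[str]], Set[str]]:
    """Find 'modifier seedword' phrases, file them under the seedword,
    and collect the modifiers as candidate seedwords for the next round."""
    seeds = {s.lower() for s in seedwords}
    ontology: Dict[str, Set[str]] = {s: set() for s in seeds}
    candidates: Set[str] = set()
    for doc in documents:
        tokens = re.findall(r"[a-z]+", doc.lower())
        for prev, word in zip(tokens, tokens[1:]):
            if word in seeds and prev not in STOPWORDS:
                ontology[word].add(f"{prev} {word}")
                if prev not in seeds:
                    candidates.add(prev)
    return ontology, candidates


docs = [
    "The display uses a flat panel display.",
    "A video display and a touch display are shown.",
]
ontology, candidates = expand_once(docs, ["display"])
print(sorted(ontology["display"]))
print(sorted(candidates))
```

In the process described on the slide, a human expert would review both the new concept terms and the candidate seedwords before the next round.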

InfoSleuth: problems
  • Syntactic structure ambiguity in concept token identification (e.g., "image process software")
  • Different phrases refer to the same concept
  • Word sense disambiguation
  • Proper attachment of adjective modifiers may help avoid non-concepts
  • Heterogeneous resources (inconsistent terminologies)
  • An automatically constructed ontology can be too prolific and deficient at the same time (because of the seedwords)

SKC (Scalable Knowledge Composition)
  • A research project at Stanford
  • Resolves semantic heterogeneity in information systems
  • Description
      • Derive general methods for ontology integration
      • Application-independent
      • Develop an ontology algebra
      • Convert Webster's dictionary to a graph structure
      • Funded by AFOSR, DARPA, HPKB

SKC: method
  • Concept graph technique (details are unknown)
  • Uses a novel algebraic extraction technique to generate the graph structure and create thesaurus entries for all words, including some stopwords
  • ArcRank algorithm to extract relations (idea taken from the PageRank algorithm)
  • Basic hypothesis: structural relationships between terms are relevant to their meaning
  • Pattern/relation extraction algorithm
      • Compute a set of nodes that contain arcs comparable to the seed arc set
      • Threshold them according to their ArcRank value
      • Extend the seed arc set when nodes contain further commonality
      • If the node set has increased in size, repeat from the first step
  • The algorithm is self-limiting via the threshold and distinguishes senses

SKC: problems
  • Syllable and accent markers in head words
  • Misspelled head words
  • Mis-tagged fields
  • Stemming and irregular verbs
  • Common abbreviations in definitions
  • Undefined words with common prefixes
  • Multi-word head words
  • Undefined hyphenated and compound words

Ontology Learning
  • A project at AIFB (Institute of Applied Informatics and Formal Description Methods, University of Karlsruhe, Germany)
  • Extracts ontologies from domain data
  • Description: learns both taxonomic and non-taxonomic relations for ontologies

OL: method
  • Shallow text processing
      • Implemented on top of SMES (a text processor for German)
      • Uses weighted finite-state transducers to process phrasal and sentential patterns
      • Output: dependency relations
  • Learning algorithm
      • Input: dependency relations
      • Select the set of documents
      • Define association rules
      • Determine the confidence of the rules
      • Output: association rules exceeding the user-defined confidence

OL: problems
  • The lightweight ontology contains too much noisy data
  • The word sense problem generates a lot of ambiguity
  • Refinement of the lightweight ontologies is a tricky issue (needs future work)
  • Relationship learning is not trivial

ECAI 2000
  • Ontology Learning Workshop of ECAI 2000 (European Conference on Artificial Intelligence)
  • Description
      • Use NLP techniques
      • Extract important (high-frequency) words or phrases to define concepts
      • Use a general top-level ontology (WordNet, SENSUS) to assist disambiguation
  • Problem: relation extraction

Inductive Logic Programming
  • WOLFIE (WOrd Learning From Interpreted Examples) at the Machine Learning Group, University of Texas at Austin
  • Description
      • Learns a semantic lexicon from a corpus of sentences
      • Learned lexicon
          • Consists of words with their meanings
          • Allows synonymy and polysemy
      • Ultimate goal: learn to parse novel sentences into their meaning representations
      • Has the potential to be a workbench for ontological concept extraction and relation detection
  • Problem: how to deploy these methods for ontology concept and rule learning to make the workbench work

Library Science and Ontology
  • Digital Library + Semantic Web
  • Digital libraries use various forms of vocabularies instead of formal ontologies
  • Kwasnik (1999) converts a controlled vocabulary scheme into an ontology
      • Higher levels of conception of descriptive vocabulary
      • Deeper semantics for class/subclass and cross-class relationships
      • Ability to express concepts and relationships in a description language
      • Reusability and shareability of the ontological constructs
      • Strong inference and reasoning functions
  • Problems
      • Different ways of modeling knowledge (shallow or deeper semantics)
      • Different ways of representing knowledge (lexically flavored or mathematically and logically flavored)
      • Merging the two fields or creating a common standard for them is still a long way off

Others
  • Borgo, 1997
      • Uses lexical semantic graphs to create an ontology
      • Based on WordNet
  • Yamaguchi, 1999
      • Constructs domain ontologies
      • Based on a machine-readable dictionary
  • Kashyap, 1999
      • Constructs an ontology for IR
      • Based on a database schema

Ontology Learning (Research Location Index) [34]
  • Europe
      • France (7)
      • Germany (5)
      • Spain (3)
      • Others: Italy (2), Austria, Greece, Netherlands, Portugal, Switzerland, UK
      • *European Union (2)
          • OntoWeb: University of Karlsruhe
          • On-To-Knowledge: many countries
  • USA
      • Stanford (2)
      • Austin (2): UT, MCC
      • Dallas (2): UT, Southern Methodist University
      • Other: UC Berkeley, Mississippi State University, BYU, UW
  • Others
      • Australia, Canada, Israel, Japan, Taiwan (China)

Conclusion
  • Top-level NLO: manual construction required, needs human experts
  • Domain-level NLO: learnable, fed by
      • Top-level NLOs
      • Domain descriptions
  • Domain ontology: learnable, fed by
      • Domain descriptions
      • Training documents
  • Instance ontology: learnable, fed by
      • Domain ontology
      • Specified instance Web pages

Conclusion
  • Source data
      • Semi-structured documents (more or less)
      • Seedwords
      • Existing generic ontologies (WordNet)
  • Concept extraction
      • IE, NLP, ML (mostly clustering and inductive learning), assistance from existing digital resources
      • High precision, reasonable completeness
  • Relationship extraction
      • Complex and not well solved
  • Ontology reuse is another important issue
  • Mapping ontologies to different representations may be valuable (e.g., conceptual graphs, conceptual hierarchies, description logic, ontology languages)