- Number of slides: 25
Modular Ontology architecture for using human-defined sets of concepts. Presentation by OntologyStream Inc, Paul Stephen Prueitt, Ph.D. Ontology Tutorial 5, copyright Paul S Prueitt 2005
The best example of an ontology is the set of positive integers Set of positive integers Accounting Geographical positions Quantitative measurement Mathematical models of natural systems Arrow of time Instances in the world where the concepts of a counting number are essential
The concept of an integer is used without any specific use of the concept affecting its definition, the definition of “two-ness” for example. The existence of this set of concepts allows a great diversity of human activities. The “ontology standard” is enforced by the correctness of the concepts and by the ease with which new applications can be found. The standard is ultra-stable and resilient because the concepts are correct. The standard is not owned by anyone. Set of positive integers. Instances in the world where the concepts of a counting number are essential.
Modular ontology is used to measure the properties of events with sets of concepts. Semantic extraction processes: { w(i) }, { s(i) }, { e(i) }. Notation: e(i) = w(i)/s(i). The measurement of an event has a weakly structured part and a structured part. Discrete analysis. Events occur in a real world as part of complex processes. Largely because events are seen as having patterns and structure, software engineers can build relational databases, or XML repositories, to help us understand and interact with information that is situation specific. With ontology, human communities will be able to reveal a set of concepts, and define regular relationships between concepts. We call this “Ontology mediation of information flow”. The formal representations of the concepts are used to organize data and to move data from one place to another. This has to be demonstrated. We will illustrate Ontology mediation of information flow, as an example, in the development and use of Harmonized Tariff Schedule (HTS) Administrative Rulings. An HTS Administrative Ruling is a short public document that ties together a code used to determine duties on imported or exported commodities. A second example is suggested, in which Selectivity and Targeting reports are seen as measurements of selectivity and targeting events by Customs and Border Protection.
processes Semantic extraction Ontology Framework. A framework holds a higher level abstraction representing an analysis of how things follow each other. Example: event-Structure Ontology Framework (e-SOF) has 18 cells developed from the cross product of the three dimensions: {past, present, future}; {people, places, things}; {how, why}. Example: risk/gains Ontology Framework (rg-OF) has 40 cells developed from the cross product of the three dimensions: {Risk, Gain}; {Anomaly, Trend}; {measurement, assessment, name, group, event, context, rule, policy, component, function/behavior}. The third dimension has ten concepts, which can be read as five pairs, so 2 x 2 x 10 = 40.
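The cell counts quoted for the two frameworks can be checked by enumerating the cross products directly. The following is a minimal sketch, not part of the original presentation: the variable and function names are illustrative assumptions, and the third rg-OF dimension is taken to be the ten concepts listed on the later gain/risk framework slide.

```python
from itertools import product

# Dimensions as named on the slides; identifiers are illustrative assumptions.
E_SOF_DIMENSIONS = (
    ("past", "present", "future"),
    ("people", "places", "things"),
    ("how", "why"),
)
RG_OF_DIMENSIONS = (
    ("Risk", "Gain"),
    ("Anomaly", "Trend"),
    ("measurement", "assessment", "name", "group", "event",
     "context", "rule", "policy", "component", "function/behavior"),
)


def framework_cells(dimensions):
    """Return every cell of a framework as a tuple, one value per dimension."""
    return list(product(*dimensions))


e_sof_cells = framework_cells(E_SOF_DIMENSIONS)
rg_of_cells = framework_cells(RG_OF_DIMENSIONS)

print(len(e_sof_cells))  # 3 * 3 * 2 cells
print(len(rg_of_cells))  # 2 * 2 * 10 cells
```

Each cell is simply a tuple such as ("future", "things", "why"), which is enough to use as a key when text fragments are later assigned to cells.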
processes Explicit ontology such as OWL DL Ontology Framework. By aligning the internal (implicit) set of concepts in a semantic extraction computation with the explicit form of concept representation provided by the OWL DL standard, one is able to organize information expressed as concepts in free form text. One is able to use look up tables, lists, controlled vocabularies and taxonomies to expand the statement of these conceptual expressions so that the expression is as clear, complete and consistent as possible. One is able to move the information from a single event into a computational space where specific structure is available to bring relevant information to the report development process. One is able, after the fact, to create a better report about an event, such as an administrative ruling or a selectivity and targeting action. One is able to develop long term trending and analogy detection using specific information about how things are related to each other in the real world.
A modular ontology management infrastructure provides various services in the context of field reporting over transactions Later application areas “other” upper level ontology Law governing US Customs Advanced Trade Data Economic Supply Chain Data upper level ontologies Location ontology Gain/Risk ontology HTS Ontology sources of data Findings ontology Entities ontology
In our work, human knowledge is captured separately in two computable forms: implicit (semantic extraction ontology) and explicit (OWL DL ontology). Written reports: Structural Event Ontology Framework. Written reports: Gain / Risk Ontology Framework.
event Structure Ontology Framework (e-SOF) ** The six classical interrogatives, in use since Greek times, are partitioned into three parts: { people, places, things } + { event structure with causality } + time. { who, where, what, how, why } x { past, present, future }. 18 questions from frames: (past, who, how), (past, who, why), (present, who, how), (present, who, why), (future, who, how), (future, who, why), etc. Structural Event Ontology Framework. ** e-SOF was “discovered” by Dr. Paul S. Prueitt while thinking about a US Customs ontology prototype in March 2005
By internally adjusting the rules within any one of the commercially available semantic extraction (implicit) ontologies, we measure text, or structured data in a single record, using a three element frame ( y, x, z ) where x is from the set { people, places, things }, where y is from the set { past, present, future } and where z is from the set { how, why }. There are 3*3*2 = 18 of these three element frames, each of which can be seen to ask a question. The measurement uses linguistic and structural knowledge to answer those questions that can be answered. Those that are not answered are left blank. Other semantic extraction tools can be similarly manipulated to produce an alignment between internal ontology (not often OWL) and external OWL DL ontology (which is our standard). Scoped Ontology Individuals. Ontology Framework. Knowledge Engineer visualization. Ontology Reasoner. Knowledge Management visualization.
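The idea of 18 frames, some answered and the rest left blank, can be sketched as a small data structure. This is only an illustration under stated assumptions: the function names are hypothetical, the answer text is filled in by hand, and no real semantic extraction tool is called.

```python
from itertools import product

TIMES = ("past", "present", "future")      # y
SUBJECTS = ("people", "places", "things")  # x
MODES = ("how", "why")                     # z


def blank_measurement():
    """Create the 18 (y, x, z) frames, each initially unanswered (None)."""
    return {frame: None for frame in product(TIMES, SUBJECTS, MODES)}


def record_answer(measurement, y, x, z, text):
    """Fill one frame; reject any triple that is not one of the 18 frames."""
    frame = (y, x, z)
    if frame not in measurement:
        raise KeyError(f"not an e-SOF frame: {frame}")
    measurement[frame] = text


def answered(measurement):
    """Return only the frames that received an answer."""
    return {f: a for f, a in measurement.items() if a is not None}


# Hand-filled example answer, standing in for the output of an extraction tool.
measurement = blank_measurement()
record_answer(measurement, "future", "people", "how",
              "passengers embark and disembark in San Diego")

print(len(measurement), "frames,", len(answered(measurement)), "answered")
```

Keeping unanswered frames as explicit blanks, rather than omitting them, makes it easy to see which of the 18 questions a given record could not answer.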
Ontology Framework with Differential Ontology Expressions US Customs cultural viewpoints expressed as sets of concepts Entity histories Shipping manifests { concepts } Entry Reports and Findings Commodity history analysis Ontology expression about the risks measured from historical analysis of commodities High Risk Ontology Expression informs Bio-systems Weapon-systems aligns
Rapid knowledge acquisition and reporting about a transaction US Customs cultural viewpoints expressed as sets of concepts A transaction: Nautilus Explorer (“Nautilus”) owns and operates the M/V NAUTILUS EXPLORER, a 116-foot Canadian-flagged long-range dive boat. Nautilus would like to embark passengers in San Diego, California, on two separate occasions, for three days of diving in Mexican waters before returning to San Diego. The passengers would be embarked and disembarked at the same location in San Diego. { concepts } Commodity history analysis Ontology expression about the risks measured from historical analysis of commodities Entry Reports and Findings High Risk Ontology Expression Bio-systems Weapon-systems Semi-automated generation of Reports
gain/risk Ontology Framework (gf-OF) ** We take the first two dimensions of a framework to be { Anomaly, Trend } and { Gain, Risk }, and the other dimension to be: { measurement, assessment, name, group, event, context, rule, policy, component, function/behavior }. Then, in the cross product, we have four sets of ten concepts (40 cells). In fact the ten concepts are five sets of two concepts, each with an interesting “oppositional scale type” relationship. ** This Gain/Risk Ontology Framework was “discovered” by Dr Prueitt in March 2005 while thinking about possible US Customs Selectivity and Targeting enhancements. Dr Peter Stephenson and Dr Prueitt are extending this in the context of Cyber Security ontology mediation data analysis.
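The "four sets of ten concepts" and the "five sets of two concepts" can be made concrete in a few lines. This is a sketch under an assumption the slide does not state explicitly: the pairs are formed from consecutive items in the list (measurement/assessment, name/group, and so on). All identifiers are illustrative.

```python
from itertools import product

# Assumed pairing: consecutive items of the ten-concept list on the slide.
PAIRS = (
    ("measurement", "assessment"),
    ("name", "group"),
    ("event", "context"),
    ("rule", "policy"),
    ("component", "function/behavior"),
)
# The four combinations of the first two dimensions.
QUADRANTS = list(product(("Anomaly", "Trend"), ("Gain", "Risk")))


def quadrant_concepts():
    """Map each (Anomaly|Trend, Gain|Risk) quadrant to its ten concepts."""
    concepts = [concept for pair in PAIRS for concept in pair]
    return {quadrant: list(concepts) for quadrant in QUADRANTS}


def opposite(concept):
    """Return the other member of the oppositional pair containing concept."""
    for first, second in PAIRS:
        if concept == first:
            return second
        if concept == second:
            return first
    raise KeyError(concept)


grid = quadrant_concepts()
print(len(grid), sum(len(v) for v in grid.values()))
print(opposite("rule"))
```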
Possible deployment as a U.S. Customs Total Information Awareness (TIA) capability. Integrated collection of reified ontologies with some specific inferences and some information organization and retrieval. Ontology Tools: Semantic Extraction, Link Analysis, Pattern recognition, Statistics. Detailed work with tools over available data: Harmonized Tariff Schedule, Advanced Trade Data. Practical problem: Provide the three Cs (clarity, consistency, and completeness) in EACH judicial review of a commodity in passage across national borders.
Ontology Individuals have a subsumption relationship to upper abstract ontologies Transactions Findings An event Entry Summary Data Transfer Object Ontology Framework SOI pushes information Script pulls information (SOI) Scoped Ontology Individual Portal pulls information Human machine interface databases client visualization Knowledge Engineer visualization Ontology Reasoner Knowledge Management visualization Scoped Ontology Individuals
Human machine interface. SOI design by-passes the critical “visualization” choke point. Ontology Framework; SOI; Ontology reasoning; Scoped Ontology Individuals; stack of SOIs supporting analysis of analysis. The mental event is the model for the Scoped Ontology Individual (SOI). The SOI is a minimal formal ontology (defined in OWL DL) that binds together the concepts and data about a single event. The Framework’s small number of concepts organizes everything that is known about the data elements that occur in a Harmonized Tariff Schedule administrative ruling. Once the data elements have been used as the initial conditions for SOI formation, additional SQL queries may be made, or additional ontology subsetting may be done, so as to bring new information, or information that was not initially known, “into the visualized frame”.
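The formation of an SOI from the data elements of a single event can be sketched as a subsetting operation over a larger ontology. The sketch below uses plain Python dictionaries rather than OWL DL, and every concept name in the toy ontology is a hypothetical placeholder; it only illustrates the idea of keeping the minimal set of concepts needed to frame the data.

```python
# Toy upper/domain ontology: each concept maps to its direct parent concepts.
# All concept names are hypothetical placeholders, not from a real ontology.
ONTOLOGY = {
    "Thing": set(),
    "Commodity": {"Thing"},
    "Weapon": {"Commodity"},
    "Vessel": {"Thing"},
    "Place": {"Thing"},
    "Port": {"Place"},
    "Ruling": {"Thing"},
}


def ancestors(concept, ontology):
    """Collect every concept that subsumes the given concept."""
    seen = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        for parent in ontology[current]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def build_soi(data_elements, ontology):
    """Bind data elements to the smallest concept subset that frames them.

    data_elements maps a data value to the concept it instantiates.
    """
    needed = set()
    for concept in data_elements.values():
        needed.add(concept)
        needed |= ancestors(concept, ontology)
    return {
        "concepts": {c: set(ontology[c]) for c in needed},
        "data": dict(data_elements),
    }


# Data elements taken from a single event report (values are illustrative).
report = {"M/V NAUTILUS EXPLORER": "Vessel", "San Diego": "Port"}
soi = build_soi(report, ONTOLOGY)
print(sorted(soi["concepts"]))
```

Bringing new information "into the visualized frame" would then amount to calling the same function again with an extended set of data elements, which grows the subset only as far as the new data requires.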
Human machine interface. Ontology Framework; SOI; Ontology reasoning; Scoped Ontology Individuals; stack of SOIs supporting analysis of analysis. Visualization of ontology: The concept of a Scoped Ontology Individual (SOI) opens up a visualization paradigm that has never been exposed before (it is an original concept that is based on decades of work in cognitive neuroscience). SOI design by-passes the critical “visualization” choke point that occurs when Ontology Systems are built on the relational database model (as is done in our ontology augmentation of rule engines). This by-pass is created when the data elements in a report are used to subset upper ontologies and domain ontologies to produce the minimal set of “concepts” needed to frame the data. If Framework Ontology is being used, then this subsetting process has an expansion / contraction cycle that produces very small SOI objects. (see previous slide)
MITi Inc and InOrb Technologies have teamed to develop a demonstration capability based on the use of the Readware internal ontology API to create text elements that populate the 18 cells of the e-SOF. We use the triple ( y, x, z ) where x is from the set { people, places, things }, where y is from the set { past, present, future } and where z is from the set { how, why }. This involves three steps: 1) Coding eight probes that use the internal Readware stem-based text understanding computations to find information and classify this information as answers to people, places, things, past, present, future, how or why questions. 2) There are some options, but the one we are investigating first is to use the People, Places and Things probes first. This is a well-known “Named entity extraction” approach. 3) Then, when one of these three probes “finds” something, the local neighborhood (in the Readware stem structure) is examined to see if one or more of the 18 questions can be answered. Diagram labels: Ontology Framework; Readware; Ontology Reasoner; Knowledge Management visualization; Scoped Ontology Individuals; Customs analyst.
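The three steps above can be sketched with toy keyword probes. This is not the Readware API and makes no claim about how Readware works internally: the cue lists, the token window used as a "local neighborhood", and all function names are assumptions chosen only to show the control flow (entity probes first, then a neighborhood check for time and how/why cues).

```python
import re

# Step 1: eight hypothetical probes, each reduced to a small cue-word set.
ENTITY_PROBES = {
    "people": {"passengers", "owner", "importer"},
    "places": {"diego", "mexico", "port"},
    "things": {"boat", "vessel", "cargo"},
}
TIME_PROBES = {
    "past": {"was", "had", "returned"},
    "present": {"is", "owns", "operates"},
    "future": {"will", "would", "plans"},
}
MODE_PROBES = {
    "how": {"by", "via", "using"},
    "why": {"because", "for", "to"},
}


def classify(tokens, probes):
    """Return the first probe label whose cue set intersects tokens, else None."""
    token_set = set(tokens)
    for label, cues in probes.items():
        if cues & token_set:
            return label
    return None


def fill_frames(text, window=4):
    """Run entity probes first, then inspect each hit's token neighborhood."""
    tokens = re.findall(r"[a-z]+", text.lower())
    frames = {}
    for i, token in enumerate(tokens):
        x = classify([token], ENTITY_PROBES)          # step 2: entity probes
        if x is None:
            continue
        neighborhood = tokens[max(0, i - window): i + window + 1]
        y = classify(neighborhood, TIME_PROBES)       # step 3: time cue nearby?
        z = classify(neighborhood, MODE_PROBES)       # step 3: how/why cue nearby?
        if y and z:
            frames.setdefault((y, x, z), " ".join(neighborhood))
    return frames


frames = fill_frames(
    "Nautilus would like to embark passengers in San Diego for three days of diving"
)
for frame, evidence in frames.items():
    print(frame, "->", evidence)
```

Frames for which no time or how/why cue is found near the entity are simply not produced, which corresponds to leaving those questions blank.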
The other choke point is dependency on a relational database Transactions Findings An event Entry Summary Data Transfer Object Script pulls information databases (SOI) Scoped Ontology Individual ILOG Rulebase Reasoner Ontology Augmentation of a rule based engine
On the relational database dependency. For complex reasons, demonstrations of how to use ontology have often used a fixed data set with doctored data to pretend that scalability issues have been solved or are not relevant. These demonstrations fall far short of correctness and hide specific known weaknesses of classical IT architecture. The scalability issue comes from the need to extend ontology or XML: to add, delete or modify concepts. These extension requirements come from many different origins and different communities of practice, and arise as circumstances change. Extensibility is the key contribution that XML has brought. For example, without a common data encoding paradigm, the scalability issue creates a second choke point. The relational database must have a fixed data schema. The work on such a solution is under the XML MetaData Repository standards process: http://hpcrd.lbl.gov/SDM/XMDR/arch/ XMDR, RDF or OWL DL may, or may not, solve this problem. Modular ontology helps, but the principles developed in differential ontology, formative ontology and Framework Ontology seem essential to solving the whole problem as completely as possible. With these approaches, we find by-passes to technology problems that are seen now by the XMDR standards committee as being unsolvable. The definition of an event-specific Scoped Ontology Individual is one of those by-passes.
There are some existing software products (Convera, AeroText, MITi, Semagix, Autonomy, and others) where a common data encoding solution exists. • A data encoding solution is generally protected by patents, and is used to provide computational efficiency; one of the best examples is PriMentia's Hilbert engine, where a key-less hash table type data encoding allows contextual search in the most natural fashion. Autonomy also has the technology that Michael Lynch developed in the Autonomy spin-off N-Corp. Semagix, Applied Technical Systems, and 15 or 20 others have excellent data encoding solutions. • If a government agency selected the two or three best technologies, communication between the internal representations would be required. This may or may not be easy, depending on the specific technologies. In Summary: These software products create an integration of classically understood methods using a common data encoding. Each COTS product uses a different internal data representation, and so the use of more than one COTS product will create binding issues. A modular ontology management architecture can be used to integrate technologies like • semantic extraction and related knowledge discovery in data technology (implicit ontology) • ontology development and editing (explicit ontology) • advanced algorithms related to risk definition and decision support • visualization technology
So government agencies really have two solution paths: 1) Choose one or two vendors after actually understanding what each vendor provides, and create a complete solution with that tool set. This requires an integration architecture. 2) Learn from a Trade Study process what the methods are that make COTS semantic extraction work, move around the patents and other IP, and develop a unique application that is specific to that government agency. In either case, the greater challenge is the technology transition challenge. If the technology is not a LOT better than the current beta sites and doctored demonstrations, then the transition effort will fail. But, leaving transition issues aside, let us look closer at these two options. High level view of integration architecture
So we have two solution paths: 1) Choose one vendor after actually understanding what each vendor provides, and create a complete solution with that tool set. But how to select? The current generation may not solve all problems in an optimal fashion. Next generation tools are not yet ready to produce systems. Current generation best of breed technology: the list of possible qualified candidates for offering a complete solution might be less than 20 companies. In many cases, these companies are highly capitalized and would provide stability for some period of time. However, the underlying XML and ontology standards are not stable. One would expect that better global solutions will exist within five years. So one needs to know that the sets of concepts can be exported and transformed as the market matures. CoreSystem CoreOntology first takes on the underlying stability issue by moving forward a design time Iconic language that may revolutionize how society uses computers.
So we have two solution paths: 2) Learn from a Trade Study process what the methods are that make COTS semantic extraction work, move around the patents and other IP, and develop a unique application that is specific to Customs. These two diagrams are from OntologyStream Inc. There is no suggestion that this non-capitalized small company has the management skills required to build out an application specifically designed from the principles discussed by Prueitt and his colleagues. So we have sought support and guidance from SAIC or IBM to bring a small team together to develop a government owned system based on these principles and at the smallest possible cost.
Summary • Current contractors almost always treat ontology and XML technology as if they were the same as relational database technology. • Current contractors are gaming the contracts so that maximum Time and Materials resources can be expended. • Ontology and XML standards committees struggle with the issues of private intellectual property and hidden agendas. • Ontology visualization by users is required to find optimal solutions consistent with cultural expectations. • Ontology and XML standards have not been able to address ontology visualization or process models that place Ontology and XML into complex work flow. • A single payer entity is needed to bind together the best technology and to resolve IP and philosophical differences.