A Common Concept Description of Natural Language Texts


  • Number of slides: 60

A Common Concept Description of Natural Language Texts as a Foundation of Semantic Computing on the Web
Mitsuru Ishizuka
Dept. of Creative Informatics & Dept. of Info. and Communication Eng., School of Information Science and Technology

Semantic Computing Initiative
The initiative lays a foundation that allows computers to understand the semantic meaning of Web contents so that they can perform semantic computing on the Web. The aims of CDL are 1) to realize machine understandability of Web text contents, and 2) to overcome the language barrier on the Web.

Major Differences from Semantic Web
Semantic Web
  • Target of representation: meta-data extracted from Web contents.
  • Domain-dependent ontologies (which make wide cross-boundary use difficult).
  • RDF / OWL (description logic is hard for ordinary people to understand).
  • Tim Berners-Lee says that "the Data Web" is a more adequate name than "the Semantic Web" (2007).
Semantic Computing Initiative
  • Target of representation: semantic concepts expressed in texts.
  • Universal vocabulary (plus additional domain-specific vocabulary if necessary) and a pre-defined relation set.
  • CDL.nl (richer than RDF).
  • Main body: Institute of Semantic Computing (ISeC) in Japan.
  • Int'l standardization activity: W3C Common Web Language (CWL)-XG.

Incubator Group Activity at W3C from Oct. 2006 to March 2008

2nd Incubator Group at W3C from May 2008

CDLs and Semantic Web
Tim Berners-Lee (2007): "The Data Web" is a more adequate name than "The Semantic Web".

Another Broader View of CDL Development
  • In the 1960s – 1970s: the foundation of the common representation and manipulation (retrieval) of databases.
  • In the 2000s – 2010s: the foundation of the common representation and manipulation of semantic information, i.e. a Common Concept Base.
It is preferable that this is language independent; in other words, a Computer Esperanto Language that is understandable by computers.

Functions and Supporting Standards

From Machine Translation
[Diagram labels: English, Japanese, Chinese; Transfer method; Pivot Language]
  • Pivot language: UNL (Universal Networking Language), CDL (Concept Description Language).
  • Minimal sufficient relations have been chosen to represent the surface-level concept meaning of texts.
  • The pivot language could be a Computer Esperanto Language.

UNL (Universal Networking Language)
  • The development started in 1997 at the United Nations Univ. (Tokyo). The chief scientist has been Dr. Hiroshi Uchida. It is now continuously developed under the UNDL Foundation.
  • The purpose is to let people around the world exchange and share textual information on the Web beyond the language barrier.
  • The design is based on the results of machine translation (especially the pivot method) and electronic dictionaries.
  • There have been activities wrt English, Japanese, Chinese, Spanish, French, Arabic, etc.

The defining method of one unique sense of a word in UW (Patent of UN Univ.)
  • Defining category
      swallow(icl>bird)      the bird                  "One swallow does not make a summer"
      swallow(icl>action)    the action of swallowing  "at one swallow"
      swallow(icl>quantity)  the quantity              "take a swallow of water"
  • Defining possible case relations
      spring(agt>thing, obj>wood)                bending or dividing something
      spring(agt>thing, obj>mine)                blasting something
      spring(agt>thing, obj>person, src>prison)  escaping (from) prison
      spring(agt>thing, gol>place)               jumping up   "to spring up"
      spring(agt>thing, gol>thing)               jumping on   "to spring on"
      spring(obj>liquid)                         gushing out  "to spring out"
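The UW notation above is regular enough to be split mechanically. The following is a minimal sketch, not part of UNL tooling: the function name `parse_uw` is hypothetical, and it assumes flat constraint lists (nested restrictions that themselves contain commas are not handled).

```python
import re

def parse_uw(uw: str):
    """Split a Universal Word such as 'spring(agt>thing, obj>wood)'
    into its headword and a list of (relation, restriction) pairs."""
    match = re.fullmatch(r"\s*([^()]+?)\s*(?:\((.*)\))?\s*", uw)
    if match is None:
        raise ValueError(f"not a UW: {uw!r}")
    headword, body = match.group(1), match.group(2)
    constraints = []
    if body:
        # Each constraint has the form relation>restriction.
        for part in body.split(","):
            relation, _, restriction = part.strip().partition(">")
            constraints.append((relation, restriction))
    return headword, constraints

print(parse_uw("swallow(icl>bird)"))
print(parse_uw("spring(agt>thing, obj>person, src>prison)"))
```

Two senses of the same headword then differ only in their constraint lists, which is how the slide distinguishes the bird from the action.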

UW (Universal Words) in UNL
Universal Word
uw{(equ>Universal Word)}
adjective concept{(icl>uw)}
uw(aoj>thing{, and>uw, ben>thing, cao>thing, cnt>uw, cob>thing, con>uw, coo>uw, dur>period, man>how, obj>thing, or>uw(aoj>thing), plc>thing, plf>thing, plt>thing, rsn>uw(aoj>thing), rsn>do, icl>adjective concept})
Achaean({icl>uw(}aoj>thing{)})
Afghan({icl>uw(}aoj>thing{)})
African-American({icl>uw(}aoj>thing{)})
Ainu({icl>uw(}aoj>thing{)})
Alaskan({icl>uw(}aoj>thing{)})
Albanian({icl>uw(}aoj>thing{)})
Aleutian({icl>uw(}aoj>thing{)})
Alexandrian({icl>uw(}aoj>thing{)})
Algerian({icl>uw(}aoj>thing{)})
Altaic({icl>uw(}aoj>thing{)})
American({icl>uw(}aoj>thing{)})
Anglian({icl>uw(}aoj>thing{)})
Anglo-American({icl>uw(}aoj>thing{)})
Anglo-Catholic({icl>uw(}aoj>thing{)})
Anglo-French({icl>uw(}aoj>thing{)})
Anglo-Indian({icl>uw(}aoj>thing{)})
Anglo-Irish({icl>uw(}aoj>thing{)})
Anglo-Norman({icl>uw(}aoj>thing{)})
Arab({icl>uw(}aoj>thing{)})
Arab-Israeli({icl>uw(}aoj>thing{)})
Arabian({icl>uw(}aoj>thing{)})
Arabic({icl>uw(}aoj>thing{)})
40,000 lexicons are open to the public. The full vocabulary includes 200,000 lexicons as of 2007.

CDLs
  • CDL.core
      defines the basic format.
  • CDL.nl (or Common Web Language)
      describes every possible concept expressed in natural language text in any language. It provides a concept description scheme wrt words, phrases, sentences and documents in any language, based on the CDL concept model. A basic vocabulary set including the relation vocabulary is given.
  • CDL.jpn, CDL.eng, CDL.chi, CDL.spa, etc.
  • Articulate Japanese (明晰日本語)
      defines a style of Japanese sentence expression that suppresses ambiguity by referring to CDL.nl.

CDL.core
  • Basic description elements:
      Entity (Node): Elementary Entity (Node), Composite Entity (Hyper-Node)
      Relation (Link)
  • Attribute-value pairs can be added to the entity (and the relation).
  • These are quasi-relations; if the value takes a complex entity, then we can treat it as a relation.

Representation with CDL.nl
  • <John reported to Alice that he bought a computer …
{#A01 Event tmp='past';
  {#B01 Event tmp='past';
    {#b01 buy; }
    {#b02 computer ral='def'; }
    {#b03 yesterday; }
    [#b01 agt #John]
    [#b01 obj #b02]
    [#b01 tim #b03]
  }
  {#John; }
  {#Alice; }
  {#a01 report; }
  [#a01 agt #John]
  [#a01 gol #Alice]
  [#a01 obj #B01]
}
Graph view: Event#A01 (tmp='past') contains report#a01 with agt John#, gol Alice# and obj Event#B01; Event#B01 (tmp='past') contains buy#b01 with agt John#, obj computer#b02 (ral='def') and tim yesterday#b03.
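The nesting of entities and relations in this notation can be modelled with a small recursive data structure. The sketch below is an illustration only: the class name `Entity`, its fields and the `render` method are assumptions made for this example and are not defined by CDL.core.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    ident: str                                      # e.g. "b01" or "John"
    label: str = ""                                 # headword or "Event"
    attrs: dict = field(default_factory=dict)       # attribute-value pairs
    children: list = field(default_factory=list)    # inner entities (composite only)
    relations: list = field(default_factory=list)   # (head_id, relation, tail_id)

    def render(self, indent=0):
        """Serialize in the brace/bracket notation used on the slide."""
        pad = "  " * indent
        attrs = "".join(f" {k}='{v}'" for k, v in self.attrs.items())
        label = f" {self.label}" if self.label else ""
        head = f"{pad}{{#{self.ident}{label}{attrs};"
        if not self.children and not self.relations:
            return head + " }"                      # elementary entity on one line
        lines = [head]
        lines += [child.render(indent + 1) for child in self.children]
        lines += [f"{pad}  [#{h} {r} #{t}]" for h, r, t in self.relations]
        lines.append(pad + "}")
        return "\n".join(lines)

inner = Entity("B01", "Event", {"tmp": "past"},
               children=[Entity("b01", "buy"),
                         Entity("b02", "computer", {"ral": "def"}),
                         Entity("b03", "yesterday")],
               relations=[("b01", "agt", "John"), ("b01", "obj", "b02"),
                          ("b01", "tim", "b03")])
outer = Entity("A01", "Event", {"tmp": "past"},
               children=[inner, Entity("John"), Entity("Alice"),
                         Entity("a01", "report")],
               relations=[("a01", "agt", "John"), ("a01", "gol", "Alice"),
                          ("a01", "obj", "B01")])
print(outer.render())
```

Because a composite entity is itself an entity, the inner event `B01` can serve as the `obj` of `report`, which is the hyper-node idea from CDL.core.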

Semantic Role Labels in PropBank
The focus is on predicate-argument structure.
  • Arg0 (prototypical agent)
  • Arg1 (prototypical patient)
  • Arg2 (indirect object/benefactive/instrument/attribute/end state)
  • Arg3 (start point/benefactive/instrument/attribute)
  • Arg4 (end point)
  • Arg5 ( )
  • TMP (time)
  • LOC (location)
  • DIR (direction)
  • MNR (manner)
  • PRP (purpose)
  • CAU (cause)
  • MOD (modal verb)
  • NEG (negative marker)
  • ADV (general-purpose modifier)
  • DIS (discourse particle and clause)
  • PRD (secondary predication)
The numbered arguments are defined wrt each word sense. Ex) buy:
      Arg0: buyer
      Arg1: thing bought
      Arg2: seller (bought-from)
      Arg3: price paid
      Arg4: benefactive (bought-for)
This set is not sufficient for representing every concept expressed in natural language texts. It cannot be used for every language due to its language (English) dependency.

CDL.nl Relations (1)
RELATION
  • ElementalRelation
    FunctionalRelation
      [AgentRelation] agt (agent), cag (co-agent), aoj (thing with attribute), cao (co-thing with attribute), ptn (partner)
      [PatientRelation] obj (affected thing), cob (affected co-thing), opl (affected place), ben (beneficiary)
      [InstrumentRelation] ins (instrument), mat (material), met (method or means)
      [PlaceRelation] plc (place), plf (initial place), plt (final place), vip (intermediate place, via place)
      [StateRelation] sta (state), src (source, initial state), gol (goal, final state), vis (intermediate place or state)
      [TimeRelation] tim (time), tmf (initial time), tmt (final time), dur (duration)

CDL.nl Relations (2)
      [SceneRelation] scn (scene), vic (via scene)
      [CausalRelation] con (condition), pur (purpose or objective), rsn (reason)
      [OrderRelation] coo (co-occurrence), seq (sequence)
      [MannerRelation] mal (qualitative manner), mat (quantitative manner), bas (basis for expressing a standard)
      [ModificationRelation] mod (modification), qua (quantity), pos (possessor), cnt (content, namely), nam (name), per (proportion, rate or distribution), fmt (range/from to), frm (origin), to (destination)
      [LogicalRelation] and (conjunction), or (disjunction, alternative), not (complement)
      [ConceptRelation] equ (equivalent), icl (included / a kind of), tof (type-of), pof (part-of)

CDL.nl Relations (3)
  • InterThingOrInterEventRelation
      [ConnectingRelation] cau (causal), adv (adversative), adt (additive), cot (contrastive), par (parallel), att (attached)
      [ReferringRelation] rfi (referred by identically), rfp (referred by partially), rfw (referred by wholly)
  • AttentionRelation
      ent (entry), foc (focus), qfo (question focus), tpc (topic), com (comment)

Rough Correspondence between Semantic Relations of PropBank and CDL.nl (1)
  • Arg0 (prototypical agent): agt (agent), cag (co-agent), aoj (thing with attribute), cao (co-thing with attribute)
  • Arg1 (prototypical patient): obj (affected thing), cob (affected co-thing)
  • Arg2 (indirect object/benefactive/instrument/attribute/end state): ---, ben (beneficiary), ins (instrument), mat (material), met (method or means), sta (state), gol (goal, final state)
  • Arg3 (start point/benefactive/instrument/attribute): plf (initial place), ben (beneficiary), ins (instrument), mat (material), met (method or means), sta (state)
  • Arg4 (end point): plt (final place), to (destination)
  • TMP (time): tim (time), tmf (initial time), tmt (final time), dur (duration)
  • LOC (location): plc (place)
  • DIR (direction): to (destination)
  • MNR (manner): mal (qualitative manner), mat (quantitative manner)
  • PRP (purpose): pur (purpose or objective)
  • CAU (cause): rsn (reason)
  • MOD (modal verb): an attribute in CDL.nl
  • NEG (negative marker): an attribute in CDL.nl
  • ADV (general-purpose modifier): mod (modification), qua (quantity), pos (possessor), cnt (content), nam (name), per (proportion, rate or distribution), fmt (range/from to), frm (origin)
  • DIS (discourse particle and clause): [inter-sentence relation]
  • PRD (secondary predication): [unique in English]

Rough Correspondence between Semantic Relations of PropBank and CDL.nl (2)
Other CDL.nl relations
  • [AgentRelation] ptn (partner): an indispensable non-focused initiator of an action. Ex) He competes with John. Mary collaborates with him.
  • [PatientRelation] opl (affected place): a place in focus affected by an event. Ex) He cut the paper in the middle in the room.
  • [PlaceRelation] vip (intermediate place, via place)
  • [StateRelation] src (source, initial state), vis (intermediate place or state)
  • [SceneRelation] scn (scene): a scene where an event occurs, or a state is true, or a thing exists. A scene is different from plc in that plc is the real place where something happens, whereas scn is an abstract or metaphorical world. Ex) He won a prize in a contest. He played in the movie. vic (via scene)
CDL.nl's relations other than predicate-argument relations
  • [OrderRelation] coo (co-occurrence), seq (sequence)
  • [LogicalRelation] and (conjunction), or (disjunction, alternative), not (complement)
  • [ConceptRelation] equ (equivalent), icl (included/a kind of), tof (type of), pof (part of)
  • [ConnectingRelation] cau (causal), adv (adversative), adt (additive), cot (contrastive), par (parallel), att (attached)
  • [ReferringRelation] rfi (referred by identically), rfp (referred by partially), rfw (referred by wholly)
  • [AttentionRelation] ent (entry): main (main element) in Connexor; foc (focus), qfo (question focus), tpc (topic), com (comment)

Rich Attributes in UNL and CDL.nl
Attributes express the subjective evaluation of the writer/speaker for the sentence. Ex.) tense, aspect, mood, etc.
  • Time with respect to speaker: @past @present @future
  • Writer's view on aspect of event: @begin @complete @continue @custom @end @experience @progress @repeat @state
  • Writer's view of reference: @generic @def @indef @not @ordinal
  • Writer's view of emphasis, focus and topic: @emphasis @entry @qfocus @theme @title @topic
  • Writer's attitudes: @affirmative @confirmation @exclamation @imperative @interrogative @invitation @politeness @respect @vocative
  • Writer's feeling and judgements: @ability @get-benefit @give-benefit @conclusion @consequence @sufficient @grant-not @although @discontented @expectation @wish @insistence @intention @want @will @need @obligation-not @should @unavoidable @certain @inevitable @may @possible @probable @rare @regret @unreal @admire @blame @contempt @regret @surprised @troublesome
  • Describing logical characters and properties of concepts: @transitive @symmetric @identifiable @disjoint
  • Modifying attribute on aspect: @just @soon @yet @not
  • Attribute for convention: @passive @pl @angle_bracket @brace @double_parenthesis @double_quote @parenthesis @single_quote @square_bracket

Discourse (Inter-sentence) Relations are missing in current CDL.nl
Discourse Relations at ISO/TC 37/SC 4/TDG 3 (34 types)
derivation, causes, conditional, inference, purpose, trigger, compromise, conflict, contrast, unconditional, comparison, disjunction, dissimilar, manner, otherwise, proportion, similar, strongComparison, detail, element, example, extraction, general-specific, minimum, part, process-step, Restatement, constraint, supplement, background, content, evaluation

Concept Description Levels
[Diagram labels: Surface Level, Concept Description, Deep Semantic Level]
  • There are several choices for the deep semantic-level description depending on applications.
  • On the other hand, a certain consensus has been reached wrt "Concept Description", which is slightly below the surface level, through decades-long research on NLP, machine translation and electronic dictionaries.
  • Whereas a complete consensus has not been achieved yet regarding the Concept Description level and its description scheme, it is meaningful to set up a common concept description format as an international standard today.

Hierarchical Construction of Concept Representation in CDL.nl
  • situation (discourse)
      temporal and causal relations, etc., and coreference
  • composite concept/event (complex sentence)
      agent-patient relation, phrasal relation, etc.
  • single event (single sentence) consisting of proposition and modality components
  • composite entity
  • elementary thing/entity corresponding to disambiguated word sense
      predicate, case components, predicate-modification components, etc.

Current Major Issues in CDL.nl
  • Semi-automatic conversion from text. (Text generation from CDL.nl is not so difficult.)
  • Semantic retrieval of CDL data (the design of a CDL Query Language (CDQL) and its processing mechanism).
  • Killer application(s)
      -- information exchange and sharing beyond the language barrier.
      -- semantic patent document retrieval.

Approaches for Generating CDL Data
  • Manual coding & editing
      Even in this case, a graphical input editor is necessary.
  • Graphical input & editing (Hasida's Semantic Authoring)
  • Some manual tagging of text, then conversion into CDL.
  • Semi-automatic conversion from text (1)
      Graphical interface for selecting the right one among possible candidates. (Our current approach)
  • Semi-automatic conversion from text (2)
      Manual disambiguation of the word sense (a pull-down menu selection), then automatic conversion into CDL. (An approach taken at the UNDL Foundation)
  • Full automatic conversion (ultimate goal)

Semantic Authoring (by K. Hasida): A Graphical Input Approach
[Figure panels: Coarse Grain, Fine Grain]

Semantic Parsing
  • Language processing is going through:
      Syntactic parsing
      Dependency parsing
      Shallow semantic parsing
  • Semantic Role Labeling
      Given a sentence: The brave soldiers fought with their enemies for their country in the War
      Assign predicate-argument roles to sentence elements (Who did What to Whom, When, Where, Why, How, etc.):
      [ARG0 The brave soldiers] [rel fought] with [ARG1-with their enemies] for [ARG2-for their country] in [ARGM-loc the War] (in PropBank)
  • Corpora: PropBank, FrameNet, …
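The bracketed role annotation shown for the example sentence can be read back into (label, text) pairs with one regular expression. This is a small sketch for illustration; the helper name `parse_srl` is hypothetical and the bracket format is taken from the slide rather than from any PropBank file format.

```python
import re

def parse_srl(annotated: str):
    """Return (label, text) pairs from '[LABEL some words]' spans."""
    return re.findall(r"\[(\S+)\s+([^\]]+)\]", annotated)

annotated = ("[ARG0 The brave soldiers] [rel fought] with "
             "[ARG1-with their enemies] for [ARG2-for their country] "
             "in [ARGM-loc the War]")

for label, text in parse_srl(annotated):
    print(f"{label:10s} {text}")
```

Words outside brackets (the prepositions here) are simply skipped, which mirrors the fact that only the predicate and its arguments are labelled.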

Dependency Parser as a lower basis of our Semantic Analysis
Connexor Machinese Text Analyser
  • Dependency functions close to the semantic role
      main (main element)
      agt (agent): the agent by-phrase in passive sentences. Ex) The dog was chased by the boy.
      ins (instrument)
      tmp (time)
      dur (duration) Ex) ... experience in the past 10 years.
      man (manner)
      loc (location)
      sou (source) Ex) ... move away from the street.
      goa (goal) Ex) ... shift to a full power.
      pth (path) Ex) ... travel from Tokyo to Beijing.
      cnt (contingency (purpose or reason)) Ex) ... unable to say why he was too ..
      cod (condition)
      qn (quantifier)
  • Syntactic functions
      pcomp (prepositional complement) Ex) They are in that red car.
      phr (verb particle) Ex) She looked up the word in the dictionary.
      subj (subject)
      obj (object)
      comp (subject complement) Ex) John remains a boy.
      dat (indirect object) Ex) John gave her an apple.
      oc (object complement) Ex) John called him a fool.
      copred (copredicative) Ex) John regards him as foolish.
      com (comitative) Ex) Drinking with you is nice.
      voc (vocative) Ex) John, come here!
      frq (frequency)
      qua (quantity)
      meta (clause adverbial) Ex) So far, he has been ….
      cla (clause initial adverbial) Ex) Under his guidance, they can ..
      ha (heuristic prepositional phrase attachment) Ex) escape through ..., fight for ...
      det (determiner)
      neg (negator) Ex) not
      attr (attributive nominal) Ex) industrial editor
      mod (other postmodifier) Ex) … of …
      ad (attributive adverbial) Ex) So much for modern technology,
      cc (coordination) Ex) and

Named Entity Recognition
  • In Connexor Machinese Text Analyser
      +org (organization, company)
      +loc (location)
      +ind (individual)
      +name (name)
      +role (occupation, title)
  • This information is useful as a lexical feature for the semantic parsing.

Conversion of text into CDL.nl through Shallow Semantic Parsing
  • Original text
      The records retrieved in answer to queries become information that can be used to make decisions.
  • Separate each word with an ID
      The w37 records w38 retrieved w39 in w40 answer w41 to w42 queries w43 become w44 information w45 that w46 can w47 be w48 used w49 to w50 make w51 decisions w52 . w53
  • Connexor Analyser's output
      det:(w38 w37) subj:(w44 w38) mod:(w38 w39) loc:(w39 w40) pcomp:(w40 w41) mod:(w41 w42) pcomp:(w42 w43) main:(w36 w44) comp:(w44 w45) subj:(w47 w46) v-ch:(w48 w47) v-ch:(w49 w48) mod:(w45 w49) pm:(w51 w50) cnt:(w49 w51) obj:(w51 w52) obj:(w51 w53)
  • Hand-coded partial CDL.nl relations
      obj(w39 w38) gol(w39 w41) pur(w39 w43) aoj(w44 w38) obj(w44 w45) obj(w49 w45) pur(w49 w51) obj(w51 w52)
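To see how the parser output relates to the hand-coded relations, the two edge lists can be joined on their word-ID pairs. The sketch below only illustrates that comparison; the helper `parse_edges` and the join on unordered pairs are assumptions made for this example, not the conversion method used in the work.

```python
import re

CONNEXOR = """det:(w38 w37) subj:(w44 w38) mod:(w38 w39) loc:(w39 w40)
pcomp:(w40 w41) mod:(w41 w42) pcomp:(w42 w43) main:(w36 w44) comp:(w44 w45)
subj:(w47 w46) v-ch:(w48 w47) v-ch:(w49 w48) mod:(w45 w49) pm:(w51 w50)
cnt:(w49 w51) obj:(w51 w52) obj:(w51 w53)"""

HAND_CODED = """obj(w39 w38) gol(w39 w41) pur(w39 w43) aoj(w44 w38)
obj(w44 w45) obj(w49 w45) pur(w49 w51) obj(w51 w52)"""

def parse_edges(text):
    """Parse 'label:(head dep)' or 'label(head dep)' items into tuples."""
    return re.findall(r"([\w-]+):?\s*\(\s*(w\d+)\s+(w\d+)\s*\)", text)

deps = parse_edges(CONNEXOR)
cdl = parse_edges(HAND_CODED)

# Index dependency labels by unordered word pair, since the direction of a
# CDL relation may be the reverse of the dependency edge.
dep_by_pair = {frozenset((head, dep)): label for label, head, dep in deps}

for relation, head, tail in cdl:
    dep_label = dep_by_pair.get(frozenset((head, tail)), "(no direct edge)")
    print(f"{relation}({head} {tail})  <-  {dep_label}")
```

In this sentence, relations such as gol(w39 w41) have no direct dependency edge because a preposition sits between the two words, which is one reason a dependency path rather than a single edge is useful as a feature.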

Relation/Role Set Comparison
  • PropBank describes how a verb relates to its arguments.
  • FrameNet describes how words relate to their arguments in a related common scenario.
  • Common disadvantage of the FrameNet and PropBank role sets:
      The sets cover only predicate-argument roles; they don't consider any other types of relationships between entities.
  • CDL.nl: the CDL relation set describes how words are correlated and what the meanings of their relationships are.
  • Advantages of the CDL.nl relation set
      Each relation in the set is pre-defined along with information that distinguishes it from other similar relations.
      It describes not only predicate-argument relations, but also those between each pair of entities between which a meaningful relationship exists. Thus it has better coverage.
      The set has been chosen so that every concept expressed in texts can be sufficiently encoded.
      It is universal, i.e., language independent, and can be applied to any language.

CDL Relation Set
  • Used to describe the shallow semantic structure of text.
  • Relations have been chosen to be able to sufficiently represent the semantic concepts of texts, and are predefined.
  • The set of relations contains all relation types, which are organized roughly into three groups:
      intra-event relations (22): agt (agent), aoj (thing with attribute), cag (co-agent), cao (co-thing with attribute), ptn (partner), …
      inter-entity relations (13): and (conjunction), con (condition), seq (sequence), …
      qualification relations (9): mod (modification), pos (possessor), qua (quantity), …

Frequencies of CDL Relations
  • Data sparseness:
      Total number of relations: 13487
      Relation types: 44
      Average number per relation type: 306.5
Relation  Mod   Obj   Aoj   And   Agt   Man  Plc  Gol  Tim  Pur  Qua
Count     3128  2697  2069  1122  1046  788  446  395  321  289  269
Relation  Pos   Scn   Rsn   Src   Cnt   Dur  Bas  Met  Equ  Nam  Con
Count     86    71    65    63    61    58   49   47   46   41   41
Relation  Ben   Tmt   Pof   Frm   Or    Fmt  Tmf  Seq  To   Iof  Cag
Count     27    25    24    23    21    20   19   17   12   11   10
Relation  Icl   Via   Coo   Per   Ins   Plt  Ptn  Plf  Cao  Opl  Cob
Count     10    9     8     8     8     7    6    4    2    1    0
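The sparseness point can be checked with a little arithmetic on the figures from the slide. This is only a worked example; the variable names are illustrative, and the share computed for the five most frequent types uses the stated total of 13487.

```python
TOTAL_RELATIONS = 13487   # stated total number of annotated relations
RELATION_TYPES = 44       # stated number of relation types

# Counts of the five most frequent relation types, copied from the table.
TOP_FIVE = {"mod": 3128, "obj": 2697, "aoj": 2069, "and": 1122, "agt": 1046}

average = TOTAL_RELATIONS / RELATION_TYPES
top_five_share = sum(TOP_FIVE.values()) / TOTAL_RELATIONS

print(f"average per relation type: {average:.1f}")
print(f"share of the five most frequent types: {top_five_share:.1%}")
```

The average hides a very skewed distribution: a handful of types account for most instances, while many types have only a few dozen examples or fewer.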

Feature spaces
Combine information from different language processes:
  • syntactic analysis: tells the details of the word forms used in the text and the syntactic structures among words.
  • dependency analysis: a dependency relation specifies an asymmetric relationship between words, where one word is a dependent of the other word, which is called its governor.
  • lexical construction: lexical meaning contains two parts of information: word sense and semantic behavior, which is all the semantic relationships the word may contain.

Syntax and Dependency Features
  • Connexor Machinese Text Analyser
      based on a functional dependency grammar
  • Syntax features
      Morphology features
      Syntactic features
  • Dependency features
      Relation type
      Dependency path

Identification of Entity Pair with a Semantic Relation
Testing all possible pairs is not efficient.
  • Step 1: For each input sentence, generate a dependency tree that specifies the syntactic head in the sentence.
  • Step 2: Find a headNode set from the dependency tree. Each headNode can be the headword of a head entity that governs a relation. We select nodes that have a subtree, and omit those that cannot be headNodes by creating a head stoplist.
  • Step 3: For each headNode, check its subtrees to find those that can be tail entities of the headNode. We create a tail stoplist containing those nodes that cannot be root nodes of subtrees of tail entities. Repeat this process.
  • Step 4: A simple post-processing step corrects the entity boundaries where the dependency tree does not show the correct relationship.
[Figure: a dependency tree generated by Connexor Machinese Analyser]
Entity pairs: [fought, (the brave soldiers)] [fought, (their enemies)] [fought, (their country)] [fought, (the War)] [soldiers, brave] [enemies, their] [country, their] …
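Steps 2 and 3 can be sketched on the example sentence as follows. This is a simplified illustration under stated assumptions: the head indices are hand-written rather than produced by a parser, the stoplists `HEAD_STOP` and `TAIL_STOP` are invented for this one sentence, prepositions are looked through to reach their complements, and Step 4 (boundary post-processing) is not covered.

```python
WORDS = ("The brave soldiers fought with their enemies "
         "for their country in the War").split()
# HEADS[i] is the index of the governor of WORDS[i]; -1 marks the root.
HEADS = [2, 2, 3, -1, 3, 6, 4, 3, 9, 7, 3, 12, 10]

HEAD_STOP = {"with", "for", "in"}   # assumed: prepositions never head an entity pair
TAIL_STOP = {"the"}                 # assumed: determiners never root a tail entity

def children(i):
    return [j for j, head in enumerate(HEADS) if head == i]

def subtree(i):
    """All token indices dominated by node i, in sentence order."""
    nodes = [i]
    for child in children(i):
        nodes += subtree(child)
    return sorted(nodes)

def tail_roots(i):
    """Roots of candidate tail entities under head i (Step 3)."""
    roots = []
    for child in children(i):
        word = WORDS[child].lower()
        if word in HEAD_STOP:
            roots += tail_roots(child)   # look through the preposition
        elif word not in TAIL_STOP:
            roots.append(child)
    return roots

def entity_pairs():
    pairs = []
    for i, word in enumerate(WORDS):
        # Step 2: a headNode must have a subtree and not be on the head stoplist.
        if word.lower() in HEAD_STOP or not children(i):
            continue
        for root in tail_roots(i):
            pairs.append((word, " ".join(WORDS[k] for k in subtree(root))))
    return pairs

for head, tail in entity_pairs():
    print(f"[{head}, ({tail})]")
```

Only nodes that have a subtree are tried as heads, so the number of candidate pairs grows with the size of the tree rather than with the square of the sentence length.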

Features for CDL Relation Recognition
  • Syntactic and dependency-path features
      [Figure: dependency tree of the example sentence. root; main: fought; subj: soldiers (attr: brave, det: The); loc: in (pcomp: War, det: the); ha: and phr: attachments of "with" (pcomp: enemies, attr: their) and "for" (pcomp: country, attr: their)]
  • Lexical features from WordNet, VerbNet and UNLKB.
Some labels of Connexor Machinese Analyser: ha (prepositional phrase attachment), phr (verb particle), pcomp (prepositional complement)

Experimental Setting
  • Datasets:
      Manually annotated dataset (1700 sentences)
      Contains 13487 CDL.nl relations (44 types of relations)
      We choose to train on the top 36 relation types, which have a large number of training instances.
  • Tools
      SVM-light software
      Connexor Machinese Text Analyser
  • Scheme
      10-fold cross validation
      One-vs-all

Results
The table shows the performance of the SVM using individual kernels and their incremental combinations.
Kernel    Precision  Recall  F-value
KS        80.10      86.35   83.11
KD        85.43      83.57   84.49
KL        73.98      82.75   78.12
KS+D      87.19      86.27   86.73
KS+D+L    87.35      88.07   87.71
S: syntactic features, D: dependency features, L: lexical features
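The F-value column is consistent with the usual harmonic mean of precision and recall, F = 2PR / (P + R). The short check below recomputes it from the reported precision and recall; the assumption that the slide uses this balanced F1 definition is ours, and the names in the code are illustrative.

```python
def f_value(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F-value) per kernel, copied from the table.
RESULTS = {
    "KS":     (80.10, 86.35, 83.11),
    "KD":     (85.43, 83.57, 84.49),
    "KL":     (73.98, 82.75, 78.12),
    "KS+D":   (87.19, 86.27, 86.73),
    "KS+D+L": (87.35, 88.07, 87.71),
}

for kernel, (p, r, reported) in RESULTS.items():
    print(f"{kernel:7s} computed {f_value(p, r):.2f}  reported {reported:.2f}")
```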

Supporting Input Editor
  • Selection among possible candidates proposed by a computer analysis. (Like a Japanese input front-end processor.)
  • Graphical verification and editing.

Semantic Search with CDL.nl beyond Keyword-based Search
  • Baseline: combination of keywords. (The use of bi-grams, tri-grams, … does not lead to an improvement in performance.)
  • A pair of dependent words leads to a slight improvement.
  • Search using natural language queries is preferable in a sense. However, when the search result is unsatisfactory, it is not obvious to a user how to modify the query sentence.
  • The CDL.nl-based search allows a more specific search based on a set of words with named dependency relations, rather than with simple (non-named) dependencies.
  • It also allows a search using more specific word concepts, such as one modified by attributes, and/or concept units larger than a single word.
  • It allows a search that takes account of semantic relevancy, such as the similarity between two words, a relation between words in a sentence, etc.

Interface for CDL.nl Data Retrieval (Query)
  • Query by natural language
  • SQL-like query language: CDQL
  • Graphical query interface for CDL.nl data

Approach to Implementing CDQL
  • 1st step
      It is not easy to implement it from scratch.
      Thus we utilize SPARQL, which is the query language for RDF data.
      SPARQL is backed by Jena RDB.
  • Next step
      Maybe an original implementation for the CDQL processing.

CDL.nl Data Retrieval System

CDL to RDF

CDL.nl Data converted into RDF Graphical Form

CDL Data Retrieval via SPARQL: a simple case
Query: (the RealizationLabel of) a person to whom John reported.
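The idea behind this simple query can be illustrated without an RDF store by flattening the CDL.nl example into subject-predicate-object triples and matching triple patterns against them. Everything in this sketch is an assumption made for illustration: the triple layout, the `label` predicate and the `match` helper are not the RDF vocabulary or the SPARQL/CDQL syntax used in the actual system.

```python
# The John/Alice example flattened into (subject, predicate, object) triples.
TRIPLES = [
    ("a01", "label", "report"),
    ("a01", "agt", "John"),
    ("a01", "gol", "Alice"),
    ("a01", "obj", "B01"),
    ("b01", "label", "buy"),
    ("b01", "agt", "John"),
    ("b01", "obj", "b02"),
    ("b01", "tim", "b03"),
    ("b02", "label", "computer"),
    ("b03", "label", "yesterday"),
]

def match(patterns, triples, binding=None):
    """Yield variable bindings that satisfy every triple pattern.
    Terms starting with '?' are variables, as in SPARQL."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                if candidate.setdefault(term, value) != value:
                    break          # variable already bound to something else
            elif term != value:
                break              # constant does not match
        else:
            yield from match(rest, triples, candidate)

# "A person to whom John reported": the goal of a report event whose agent is John.
query = [("?e", "label", "report"), ("?e", "agt", "John"), ("?e", "gol", "?who")]
answers = [b["?who"] for b in match(query, TRIPLES)]
print(answers)
```

Replacing the last pattern with `("?e", "obj", "?what")` would return the identifier of the reported event instead, which corresponds to asking what John reported.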

CDL.nl Data Retrieval via CDQL (an Extended SPARQL)
Query: What did John report?

Semantically Flexible Matching
Query: What did John take?

Toward the Foundation of Next-generation Web

Immediate Applications of Relation Extraction from Texts
Jie Yang, Dat Nguyen and Mitsuru Ishizuka
Dept. of Creative Informatics & Dept. of Info. and Communication Eng., School of Information Science and Technology

Relation Extraction from Wikipedia
  • Text: William Henry Gates III (born October 28, 1955) is the co-founder, chairman, former chief software architect, and former CEO of Microsoft Corporation. He is also the founder of Corbis, a digital image archiving company…
      (Microsoft, founder, Bill Gates)
      (Microsoft, chairman, Bill Gates)
      (Microsoft, CEO, Bill Gates)
      (Corbis, founder, Bill Gates)
      …
  • Text: Microsoft Corporation, … Headquartered in Redmond, Washington, USA, its best selling products are the Microsoft Windows operating system and the Microsoft Office suite of productivity software.
      (Microsoft, location, Redmond)
      (Microsoft, product, MS Windows)
      (Microsoft, product, MS Office)
      …

System Framework
[Diagram components: Wikipedia; Pre-processing (tag & link extractor, sentence splitter, tokenizer, phrase chunker); Principal Entity Detector; Secondary Entity Detector; Sentence Detector; Keyword Extractor; Entity Classifier; Dependency trees; Core Trees; SVM classifiers; Relation Extractor; Structured knowledge]
  • Keyword examples: FOUNDER: found, establish…; LOCATION: headquartered, situated…
  • Entity classes: Microsoft: ORG, Bill Gates: PER, Albuquerque: LOC
  • Features: entity type feature, sub tree feature
  • Example sentence fragments: "Microsoft was…", "The company was…", "MS co-founded…", "…by Bill Gates", "…in Albuquerque…"

Triple Tagging allowing rich info. in social tagging (folksonomy)

Triple Tag Extraction

TripleTag Editor

Semantic Retrieval of Patent Documents
  • Represent patent texts in CDL.nl (started in 2008).
  • Also contributes to the translation of patents.

I have introduced our research on Semantic Computing centered around CDL (Concept Description Language).
Thank You
Mitsuru Ishizuka, Univ. of Tokyo