36adbda038c48105dce70e38aa6383bc.ppt
- Number of slides: 49
Ontology and HowNet
Dong Zhendong (董振东), Dong Qiang (董强)
dongqiang@keenage.com, www.keenage.com, dzd@keenage.com
Research Centre of Computer & Language Engineering, Chinese Academy of Sciences
Harbin, 2003.08
Outline
- Ontology
- HowNet vs. SUMO/WordNet/VerbNet
Ontology
- What is an ontology?
- Ontology and IT/NLP
What is an ontology?
- Ontology as a field of study
- Ontology as a resource
Ontology as a field of study
- Ontology in philosophy
- Ontology in AI/KR
- Ontology in mathematics
- Ontology in software engineering
- Ontology in linguistics
- Ontology in IT
Issues involved in defining ontology
- Its intrinsic meaning
- Its external representation
- Its Chinese translation as a term
Ontology and IT/NLP
- similar to a dictionary or glossary, but with greater detail and structure that enables computers to process its content. An ontology consists of a set of concepts, axioms, and relationships that describe a domain of interest. An upper ontology is limited to concepts that are meta, generic, abstract and philosophical … -- Standard Upper Ontology (SUO) Working Group
- A common-sense knowledge base that takes the concepts represented by Chinese and English words as its objects of description, and whose basic content is to reveal the relations between concepts and between the attributes that concepts have. -- HowNet
Typical ontologies
- Cyc: http://www.cyc.com
- IFF: The IFF Foundation Ontology
- WordNet: http://www.cogsci.princeton.edu
- EuroWordNet: http://www.hum.uva.nl/ewn/
- HowNet: http://www.keenage.com
- SUMO: http://ontology.teknowledge.com
- EDR: http://www.iijnet.or.jp
- VerbNet: http://www.cis.upenn.edu/verbnet/
- Prototype (Sinica): http://ckip.iis.sinica.edu.tw/CKIP/ontology/
HowNet vs. SUMO/WordNet/VerbNet
- SUMO – Suggested Upper Merged Ontology
- Mapping WordNet to SUMO
SUMO – Suggested Upper Merged Ontology
- SUMO Sources
- SUMO Subclass Hierarchy Tree
SUMO Subclass Hierarchy Tree
[Tree diagram; node labels in slide order: making, constructing, manufacture, publication, cooking, searching, pursuing, investigating, diagnostic process, social interaction, change of possession, giving, unilateral giving, lending, getting, unilateral getting, borrowing]
Motivation for Mapping
- How can a formal ontology be used effectively by those who lack extensive training in logic and mathematics?
- How can an ontology be used automatically by applications?
- How can we know when an ontology is complete?
Architecture of HowNet
- D-relation Trigger (Application Tools)
- S-relation Trigger (Browser)
- Basic Data (Concept Definitions / Taxonomies)
Basic Data – Sememes
- Total: 2219
- Entity: 154
  - thing (physical, mental, fact)
  - component (part, fitting)
  - time
  - space (direction, location)
- Event (relation, state, action): 818
- Attribute: 248
- Value: 892
- Secondary feature: 107
Basic Data – Concept Definition
NO.=020957
W_C=大学生
G_C=N
E_C=
W_E=college student
G_E=N
E_E=
DEF={human|人: {study|学习: agent={~}, location={InstitutePlace|场所: domain={education|教育}, modifier={HighRank|高等}, {study|学习: location={~}}, {teach|教: location={~}}}}}
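
A record like the one above can be read mechanically. The following is a minimal Python sketch, assuming each field sits on its own `KEY=value` line as laid out on the slide. The helper names `parse_record` and `sememes` are hypothetical and are not part of any HowNet API.

```python
import re

# The 大学生 (college student) record from the slide, one field per line.
RECORD = """\
NO.=020957
W_C=大学生
G_C=N
E_C=
W_E=college student
G_E=N
E_E=
DEF={human|人: {study|学习: agent={~}, location={InstitutePlace|场所: domain={education|教育}, modifier={HighRank|高等}, {study|学习: location={~}}, {teach|教: location={~}}}}}
"""

def parse_record(text):
    """Split a record into a dict, one KEY=value field per line."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first '=' only: the DEF value contains more of them.
        key, _, value = line.partition("=")
        fields[key.strip()] = value.strip()
    return fields

def sememes(definition):
    """Return the English half of every {english|中文} sememe, in order."""
    return re.findall(r"\{([A-Za-z]+)\|", definition)

record = parse_record(RECORD)
print(record["W_C"], "/", record["W_E"])
print(sememes(record["DEF"]))
```

The first sememe in the list is the genus of the concept (here `human`); the rest come from the nested role descriptions.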
Basic Data – Taxonomies
- {thing|万物} {entity|实体: {ExistAppear|存现: existent={~}}}
  - {physical|物质} {thing|万物: HostOf={Appearance|外观}, {perception|感知: content={~}}}
    - {animate|生物} {physical|物质: HostOf={Age|年龄}, {alive|活着: experiencer={~}}, {die|死: experiencer={~}}, {metabolize|代谢: experiencer={~}}, {reproduce|生殖: agent={~}, PatientProduct={~}}}
      - {AnimalHuman|动物} {animate|生物: HostOf={Sex|性别}, {AlterLocation|变空间位置: agent={~}}, {StateMental|精神状态: experiencer={~}}}
        - {human|人} {AnimalHuman|动物: HostOf={Name|姓名}{Wisdom|智慧}{Ability|能力}, {think|思考: agent={~}}, {speak|说: agent={~}}}
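
Each taxonomy entry names its parent class and adds `HostOf` attributes of its own. The Python sketch below walks that chain for the five entries on this slide. It assumes that attributes are inherited from parent to child, which is one reading of the taxonomy and not something the slide states. The table and function names are hypothetical.

```python
# child -> parent, copied from the taxonomy entries on the slide
PARENT = {
    "thing": "entity",
    "physical": "thing",
    "animate": "physical",
    "AnimalHuman": "animate",
    "human": "AnimalHuman",
}

# HostOf attributes stated directly on each class
HOST_OF = {
    "physical": ["Appearance"],
    "animate": ["Age"],
    "AnimalHuman": ["Sex"],
    "human": ["Name", "Wisdom", "Ability"],
}

def ancestors(sememe):
    """Walk from a sememe up to the root, nearest parent first."""
    chain = []
    while sememe in PARENT:
        sememe = PARENT[sememe]
        chain.append(sememe)
    return chain

def inherited_attributes(sememe):
    """Collect HostOf attributes from the root down to the sememe itself.

    Assumption: a class hosts every attribute its ancestors host.
    """
    lineage = [sememe] + ancestors(sememe)
    attrs = []
    for cls in reversed(lineage):
        attrs.extend(HOST_OF.get(cls, []))
    return attrs

print(ancestors("human"))
print(inherited_attributes("human"))
```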
S-relation Trigger -- Browser
D-relation Trigger -- Application Tools
- Relevant Concept Field Builder (相关概念场构造器)
  - Cf. "seed list", Bonnie Dorr & Tiejun Zhao: "化学" (chemistry) / "射击" (shooting)
- Sense Similarity Calculator (语义相似度计算器)
  - "毛衣" (sweater) vs. "手套" (gloves) / "醋" (vinegar)
- Chinese Chunk Extractor (中文语块抽取器)
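
To give a feel for what a sense similarity calculator might compute, here is a rough Python sketch that turns path distance in the sememe taxonomy into a score with the mapping α/(d+α). Both the formula and the value α=1.6 are assumptions made for illustration, not a description of HowNet's own tool, and the taxonomy fragment is limited to the five classes shown on the Taxonomies slide.

```python
# child -> parent, the taxonomy fragment shown on the Taxonomies slide
PARENT = {
    "thing": "entity",
    "physical": "thing",
    "animate": "physical",
    "AnimalHuman": "animate",
    "human": "AnimalHuman",
}

def path_to_root(sememe):
    """Return [sememe, parent, grandparent, ..., root]."""
    path = [sememe]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def distance(a, b):
    """Number of edges between a and b via their nearest common ancestor."""
    path_a, path_b = path_to_root(a), path_to_root(b)
    depth_in_b = {node: i for i, node in enumerate(path_b)}
    for i, node in enumerate(path_a):
        if node in depth_in_b:
            return i + depth_in_b[node]
    raise ValueError(f"no common ancestor for {a!r} and {b!r}")

def similarity(a, b, alpha=1.6):
    """Map distance d to a score in (0, 1]; alpha is an assumed tuning constant."""
    return alpha / (distance(a, b) + alpha)

print(similarity("human", "human"))
print(similarity("human", "animate"))
print(similarity("human", "thing"))
```

Identical sememes score 1.0, and the score falls as the two sememes sit further apart in the tree. Word-level similarity would additionally have to compare the role descriptions inside each DEF, which this sketch leaves out.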
Applications of HowNet in China and abroad (1)
- Semantic Web
  - Ontology annotation
  - Thesaurus
  - 陈文鋕: Semantic Processing && Semantic Web Service (Institute for Information Industry, Taiwan)
- Named Entity Recognition
  - Tianfang Yao, Wei Ding, Gregor Erbach: CHINERS: A Chinese Named Entity Recognition System for the Sports Domain
Applications of HowNet in China and abroad (2)
- Word Sense Disambiguation
  - Chi-Yung Wang: Knowledge-based Sense Pruning using the HowNet: an Alternative to Word Sense Disambiguation
  - Wong Ping Wai: A Maximum Entropy Approach to HowNet-Based Chinese Word Sense Disambiguation
- Word Similarity Computing
  - Liu Qun, Li Sujian: Word Similarity Computing Based on HowNet
Applications of HowNet in China and abroad (3)
- Sense Annotation
- Dependency Relation Annotation
  - Li Mingqin, Li Juanzi: Building A Large Chinese Corpus Annotated with Semantic Dependency
- Cross-language Developing
  - Licensed to the Institute of Information Science, Academia Sinica (Taiwan), for joint development of a Big5+ edition of HowNet
  - National Digital Archives Program (NDAP): http://ndap.org.tw/NewsLetter/content.html?subuid=559&uid=26
Thank you
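Current research trends
- Theoretical or philosophical exploration
- Mapping, linking, and merging
- Research carried out through applications
- Building common-sense or domain-specific knowledge systems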
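Some views on building knowledge systems
- Theory vs. engineering: put engineering first
- Research vs. application: keep the application in view
- Tell apart what is genuine alignment with international practice (接轨, "connecting the rails") and what is 接鬼 (a homophone, "connecting with ghosts")
  - Five years ago someone suggested that we turn HowNet into WordNet
  - Recently someone suggested that we revise HowNet's sememes to follow SUMO
  - Recutting HowNet, a qipao, into a two-piece Western skirt suit is exactly 接鬼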
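HowNet?
For Chinese there is also a resource similar to a wordnet, called HowNet (《知网》, http://www.keenage.com). It was started in 1995 by Mr. Dong Zhendong of mainland China, working on his own. It is a bilingual Chinese-English/English-Chinese lexical network. The early versions were open and free of charge. Since 2002, when the new version came under the management of the Institute of Software, Chinese Academy of Sciences, a fee has been required to use it.
HowNet's approach is distinctive: it does not adopt the architecture of the English WordNet but uses an architecture of its own. It first defines an ontology of world knowledge and then draws distinctions within that definition. This top-down method differs from the bottom-up method of the English and European wordnets, and it certainly has its merits. Unfortunately, because resources and information were limited at the time, Professor Dong Zhendong and his son Dong Qiang completed HowNet essentially on conviction and enthusiasm, with very little outside support and without connecting to related research elsewhere in the world. The two of them spent about seven or eight years on this work. However, there is basically no architectural basis for linking it to the wordnets of other languages, and its upper-level knowledge classification rests on the two authors' own discretionary judgment. It cannot be called wrong, but it lacks a theoretical foundation and faces interoperability problems with other systems.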
Records in WordNet / HowNet
Record in WordNet
03592879 06 n 02 watch 0 ticker 1 012 @ 03506835 n 0000 ~ 02187181 n 0000 %p 02529205 n 0000 ~ 02570752 n 0000 %p 02659936 n 0000 ~ 02841320 n 0000 %p 03021820 n 0000 ~ 03104263 n 0000 ~ 03150171 n 0000 ~ 03410656 n 0000 %p 03593482 n 0000 ~ 03636122 n 0000 | a small portable timepiece
Record in HowNet
NO.=007738
W_C=表
G_C=N
E_C=手~,怀~,钟~,电子~,机械~,带钻石的~,这块~不防水
W_E=watch
G_E=N
E_E=
DEF={tool|用具: {tell|告诉: content={time|时间}, instrument={~}}}
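
The contrast between the two records is easier to see once the WordNet line is decoded. The Python sketch below follows the field order of WordNet's data files (offset, lexicographer file number, part of speech, hexadecimal word count, word/lex_id pairs, decimal pointer count, four-token pointers, then the gloss after `|`). The function name `parse_wordnet_line` is hypothetical, and the sketch ignores verb frames, which this noun record does not have.

```python
LINE = ("03592879 06 n 02 watch 0 ticker 1 012 "
        "@ 03506835 n 0000 ~ 02187181 n 0000 %p 02529205 n 0000 "
        "~ 02570752 n 0000 %p 02659936 n 0000 ~ 02841320 n 0000 "
        "%p 03021820 n 0000 ~ 03104263 n 0000 ~ 03150171 n 0000 "
        "~ 03410656 n 0000 %p 03593482 n 0000 ~ 03636122 n 0000 "
        "| a small portable timepiece")

def parse_wordnet_line(line):
    """Decode one noun line of a WordNet data file into a dict."""
    data, _, gloss = line.partition("|")
    tok = data.split()
    offset, lex_filenum, pos = tok[0], tok[1], tok[2]
    w_cnt = int(tok[3], 16)            # word count is written in hexadecimal
    i = 4
    words = []
    for _ in range(w_cnt):
        words.append(tok[i])           # tok[i + 1] is the lex_id, not kept
        i += 2
    p_cnt = int(tok[i])                # pointer count is written in decimal
    i += 1
    pointers = []
    for _ in range(p_cnt):
        symbol, target, target_pos, _source_target = tok[i:i + 4]
        pointers.append((symbol, target, target_pos))
        i += 4
    return {
        "offset": offset,
        "lex_filenum": lex_filenum,
        "pos": pos,
        "words": words,
        "pointers": pointers,
        "gloss": gloss.strip(),
    }

synset = parse_wordnet_line(LINE)
print(synset["words"], synset["gloss"])
# '@' marks a hypernym, '~' a hyponym, '%p' a part meronym
print([target for symbol, target, _ in synset["pointers"] if symbol == "@"])
```

Apart from the gloss, everything the WordNet record says about "watch" is a pointer to another synset, whereas the HowNet record spells the meaning out inside DEF as a tool whose function is telling the time.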
Axiom in SUMO / HowNet (1)
See SUMO_buy.doc
Cf. HowNet Event Relation & Role Shifting
{buy|买} <----> {obtain|得到} [consequence];
  agent OF {buy|买}=possessor OF {obtain|得到};
  possession OF {buy|买}=possession OF {obtain|得到}.
{buy|买} (X) <----> {sell|卖} (Y) [mutual implication];
  agent OF {buy|买}=target OF {sell|卖};
  source OF {buy|买}=agent OF {sell|卖};
  possession OF {buy|买}=possession OF {sell|卖};
  cost OF {buy|买}=cost OF {sell|卖}.
Axiom in SUMO / HowNet (2)
{buy|买} [entailment] <----> {choose|选择};
  agent OF {buy|买}=agent OF {choose|选择};
  possession OF {buy|买}=content OF {choose|选择};
  source OF {buy|买}=location OF {choose|选择}.
{buy|买} [entailment] <----> {pay|付};
  agent OF {buy|买}=agent OF {pay|付};
  cost OF {buy|买}=possession OF {pay|付};
  source OF {buy|买}=target OF {pay|付}.
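
The role-shifting rules for {buy|买} can be treated as lookup tables. The Python sketch below encodes the four rules from these two slides and applies them to one buying event. The table name, the function name, and the role fillers ("customer", "bookshop", and so on) are hypothetical illustrations, not HowNet data.

```python
# (source event, target event) -> (relation, {role in source: role in target})
ROLE_SHIFT = {
    ("buy", "obtain"): ("consequence", {
        "agent": "possessor",
        "possession": "possession",
    }),
    ("buy", "sell"): ("mutual implication", {
        "agent": "target",
        "source": "agent",
        "possession": "possession",
        "cost": "cost",
    }),
    ("buy", "choose"): ("entailment", {
        "agent": "agent",
        "possession": "content",
        "source": "location",
    }),
    ("buy", "pay"): ("entailment", {
        "agent": "agent",
        "cost": "possession",
        "source": "target",
    }),
}

def shift_roles(event, fillers, target):
    """Re-express the fillers of `event` as roles of the related `target` event."""
    relation, mapping = ROLE_SHIFT[(event, target)]
    shifted = {mapping[role]: filler
               for role, filler in fillers.items() if role in mapping}
    return relation, shifted

# Hypothetical fillers for one buying event
purchase = {"agent": "customer", "source": "bookshop",
            "possession": "dictionary", "cost": "20 yuan"}

for target in ("obtain", "sell", "choose", "pay"):
    print(target, shift_roles("buy", purchase, target))
```

Roles with no counterpart in the target event (for example the cost when shifting to {obtain|得到}) are simply dropped, which mirrors the rules as written on the slides.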
Thematic Roles in VerbNet / HowNet
See VerbNet_buy.doc
Thematic Roles
- Agent[+animate OR +organization]
- Asset[+currency]
- Beneficiary[+animate OR +organization]
- Source[+concrete]
- Theme[]
Cf. HowNet Event Role with Typical Actors
{buy|买} {take|取: agent={human|人}{group|群体->}, possession={artifact|人工物->}, source={human|人}{InstitutePlace|场所}, cost={money|货币}, beneficiary={human|人}{group|群体->}, domain={economy|经济}}
Components of HowNet
- Taxonomy (义原层级规范)
- Roles and Features (角色与特征规范)
- Specifications of KDML (知识描述语言规范)
- Knowledge Database (知识库)
- Event Relations & Role Shifting (事件关系与角色转换)
- Maintenance Tools (维护管理工具)
- APIs (应用接口)
Nature of HowNet
An online knowledge base which reveals the relationship among concepts, and the relationship among attributes of concepts
-- Dong Zhendong, "Knowledge Description: What, How and who?", Proceedings of International Symposium on Electronic Dictionary, Tokyo, 1988, p. 18
Theory of HowNet
- Knowledge is a system of relationships among concepts and among attributes of concepts
- Everything is constantly changing in a specific time and space, and converts from one state to another. The conversion embodies the change of its attributes
Guidelines of Design
- Computer-oriented
- Relationship is the key; revealing relationships is the main objective of HowNet
- Based on sememes
- Use of KDML
- Concepts are defined in a static and isolated way
- Relationships are activated in a dynamic way
Concept Definitions in HowNet (1)
医生 (doctor): DEF={human|人: domain={medical|医}, HostOf={Occupation|职位}, {doctor|医治: agent={~}}}
患者 (patient): DEF={human|人: domain={medical|医}, {SufferFrom|罹患: experiencer={~}}, {doctor|医治: patient={~}}}
医院 (hospital): DEF={InstitutePlace|场所: {doctor|医治: location={~}, content={disease|疾病}}, domain={medical|医}}
Concept Definitions in HowNet (2)
病历 (medical record): DEF={document|文书: {record|记录: content={disease|疾病}, LocationFin={~}}, domain={medical|医}}
健康 (health): DEF={Health|健康: host={AnimalHuman|动物}}
多病 (sickly): DEF={unhealthy|不健}
├ {HealthValue|健康值}
│ ├ {healthy|康健}
│ └ {unhealthy|不健}
Concept Definitions in HowNet (3)
病 (disease): {disease|疾病} {phenomena|现象: {doctor|医治: content={~}}, {SufferFrom|罹患: content={~}}, RelateTo={medicine|药物}{Health|健康}{HealthValue|健康值}, domain={medical|医}}
药 (medicine): {medicine|药物} {artifact|人工物: {doctor|医治: instrument={~}}, RelateTo={disease|疾病}, domain={medical|医}{chemistry|化学}}
Identity of description in different language structures (1)
W_C=劫
G_C=V
E_C=
W_E=rob
G_E=V
E_E=
DEF={rob|抢}

W_C=飞机
G_C=N
E_C=
W_E=plane
G_E=N
E_E=
DEF={aircraft|飞行器}
Identity of description in different language structures (2)
W_C=劫机
G_C=V
E_C=
W_E=hijack a plane
G_E=V
E_E=
DEF={rob|抢: possession={aircraft|飞行器}}
Identity of description in different language structures (3)
W_C=劫机犯
G_C=N
E_C=
W_E=hijacker
G_E=N
E_E=
DEF={human|人: {rob|抢: agent={~}, possession={aircraft|飞行器}}}
Identity of description in different language structures (4)
W_C=抓获劫机犯
G_C=V
E_C=
W_E=catch a hijacker
G_E=V
E_E=
DEF={catch|捉住: patient={human|人: {rob|抢: agent={~}, possession={wealth|钱财}}}}
Identity of description in different language structures (5)
W_C=机敏地抓获女劫机犯
G_C=V
E_C=
W_E=catch a woman hijacker cleverly
G_E=V
E_E=
DEF={catch|捉住: manner={clever|灵}, patient={human|人: {rob|抢: agent={~}, possession={wealth|钱财}}, modifier={female|女}}}
Applications of HowNet
1. Semantic tagging
2. WSD, Sense Pruning
3. Sensitive information detection
4. Information filtering
5. Similarity of words
6. Semantic Web
7. Match of WordNet
Future work
- Construction of resources
  - English HowNet
  - Chinese message structure bank
  - Increase of languages
- Developing more APIs and tools
- Administration
  - Membership
Appendix: definitions of ontology (1)
- a specification of a conceptualization
- theory of objects and their ties
- similar to a dictionary or glossary, but with greater detail and structure that enables computers to process its content. An ontology consists of a set of concepts, axioms, and relationships that describe a domain of interest. An upper ontology is limited to concepts that are meta, generic, abstract and philosophical …
Appendix: definitions of ontology (2)
- the study of what there is, an inventory of what exists … What we may call ontology is the attempt to say what entities exist. Metaphysics, by contrast, is the attempt to say, of those entities, what they are.
- the study of the categories of things that exist or may exist in some domain
- The word ontology comes from the Greek ontos for being and logos for word.
EuroWordNet
For the development of the French language, there were 2 partners: Avignon (AVI) and Memodata (MEM). The following was requested:
                            AVI      MEM
Personnel                 72000    85000
Equipment                  3000        0
Travel & assistance        5000     1500
Consumables & computing    3000      300
Overheads                 16600    17100
Total                     99600   104400
Since Memodata was a private company, only 50% of its request could be funded by the EC. So the total of the request was:
- AVI: 99600
- MEM: 52200
Notes:
1) Validation is not included in this table. This has been done by Xerox and Bertin globally for several languages.
2) These amounts constituted a provisional budget corresponding to some 20 000 synsets.
Demo of Tools
(1) Relevant Concept Field
(2) Similarity of Words
(3) Chinese Chunk Extractor
(4) Smart Word Finder
Overview of HowNet
- Components of HowNet
- Nature of HowNet
- Theory of HowNet
- Guidelines of Design
- Sememes and Relations
Backup files needed
- HowNet Browser (desktop)
- Relevant concept field (desktop) – "行"
- Similarity computing (desktop)
- Digital Archives Program (folder "ontology")
- Prof. Huang's comment on HowNet (desktop)
- Under U 32: Taxonomy; Event Relation & Role Shifting; Taxonomy Typical Actors
- Papers (Applications about HowNet)