SCONLI 3, JNU, New Delhi
Automatic Extraction and Incorporation of Purpose Data into PurposeNet
P. Kiran Mayee, Rajeev Sangal, Soma Paul
INTRODUCTION
Purpose: the need for a knowledge base of objects and actions in which knowledge is organized around purpose.
PurposeNet is an intelligent knowledge-based system dealing with specialized attributes of artifacts: their purpose, the purpose of their types, their components and accessories, as well as data about their birth, processes, side-effects, maintenance and result on destruction.
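The attribute list above can be pictured as one frame per artifact. The sketch below is a minimal illustration under our own assumptions: the class name `ArtifactFrame` and its field names are hypothetical, and plain strings stand in for whatever richer representation PurposeNet actually uses.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ArtifactFrame:
    """Hypothetical frame holding the PurposeNet-style attributes of one artifact."""
    name: str
    purpose: List[str] = field(default_factory=list)       # e.g. "store vehicle"
    types: List[str] = field(default_factory=list)         # subtype names; their purposes would live in their own frames
    components: List[str] = field(default_factory=list)
    accessories: List[str] = field(default_factory=list)
    birth: Optional[str] = None                            # how the artifact comes into being
    processes: List[str] = field(default_factory=list)
    side_effects: List[str] = field(default_factory=list)
    maintenance: List[str] = field(default_factory=list)
    result_on_destruction: Optional[str] = None


garage = ArtifactFrame(name="garage", purpose=["store vehicle"])
print(garage.name, garage.purpose)  # garage ['store vehicle']
```

Unfilled attributes default to empty lists or `None`, so a frame can be populated one attribute at a time, which suits automating the purpose attribute first.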
PurposeNet
Building PurposeNet
- Template design
- Revision and refinement of the template
- Selection of domain
- Information retrieval from the Web
- Ontology population
- Testing
Need for Automation
- Acquisition bottleneck
- Massive availability of text
- Availability of purpose cues
Purpose data required
- Artifact: garage
- Purpose action: store
- Upon: vehicle
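Each extracted purpose fact reduces to these three fields. A minimal sketch, assuming a plain named tuple is enough to hold one fact (the name `PurposeTriple` is hypothetical):

```python
from typing import NamedTuple


class PurposeTriple(NamedTuple):
    """One purpose fact: the artifact, the action it is used for, and what that action acts upon."""
    artifact: str
    action: str
    upon: str


# The garage example from the slide.
garage = PurposeTriple(artifact="garage", action="store", upon="vehicle")
print(f"{garage.artifact}: {garage.action} {garage.upon}")  # garage: store vehicle
```

Named tuples are immutable and hashable, so identical triples extracted from different sentences collapse when stored in a set.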
Purpose Cues
- Word(s): lexical entities in a particular order
Classification
- Sentences beginning with the artifact name
- Sentences ending with the artifact name
- Sentences containing the artifact name
- Hidden cues
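The first three cue classes depend only on where the artifact name sits in the sentence, so a sentence can be sorted into a class before any cue is matched. The function below is a hypothetical illustration of that step, not the authors' implementation: it assumes whole-word, case-insensitive matching, skips one leading article when testing for "beginning", and labels sentences that lack the artifact name as "hidden". The garage sentence is an invented example.

```python
import re

ARTICLES = {"a", "an", "the"}


def cue_class(sentence: str, artifact: str) -> str:
    """Classify a sentence by the position of the artifact name (hypothetical helper)."""
    # Lower-case word tokens; "air+pump" and "air pump" both become ["air", "pump"].
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    target = re.findall(r"[a-z0-9]+", artifact.lower())
    n = len(target)
    if n == 0 or len(words) < n:
        return "hidden"
    # Skip one leading article so "A garage is ..." still counts as beginning.
    start = 1 if words[0] in ARTICLES else 0
    if words[start:start + n] == target:
        return "beginning"
    if words[-n:] == target:
        return "ending"
    if any(words[i:i + n] == target for i in range(len(words) - n + 1)):
        return "containing"
    # Artifact name absent: any purpose cue would have to be hidden.
    return "hidden"


for s, a in [("A garage is used to store vehicles.", "garage"),
             ("We cut trees with an axe.", "axe"),
             ("Use the air pump to fill the tyre.", "air pump")]:
    print(cue_class(s, a))
# beginning
# ending
# containing
```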
Sentences commencing with artifact name
Sentences ending with artifact name
Example: "We cut trees with an axe." (action: cut; upon: trees; artifact: axe)
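The axe example suggests a cue of the form "<subject> <action> <upon> with a/an <artifact>.". A minimal regular-expression sketch of that single cue, assuming a one-word subject and a one-word action (the pattern name `END_CUE` and the function `extract_ending` are hypothetical):

```python
import re

# Hypothetical cue for sentences ending with the artifact name:
#   "<subject> <action> <upon> with a/an/the <artifact>."
END_CUE = re.compile(
    r"^\w+\s+(?P<action>\w+)\s+(?P<upon>.+?)\s+with\s+(?:an?|the)\s+(?P<artifact>[\w ]+?)\.$",
    re.IGNORECASE,
)


def extract_ending(sentence):
    """Return the action, upon and artifact slots, or None if the cue does not match."""
    m = END_CUE.match(sentence.strip())
    return None if m is None else m.groupdict()


print(extract_ending("We cut trees with an axe."))
# {'action': 'cut', 'upon': 'trees', 'artifact': 'axe'}
```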
Sentences containing artifact name
Example: "Use the air pump to fill the tyre."
Cue: "Use the <artifact> to <action> the <upon>" (artifact: air pump; action: fill; upon: tyre)
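The cue "Use the ... to ... the" can likewise be written as a regular expression whose three gaps are the artifact, the action and the object acted upon. A hypothetical sketch, assuming the sentence follows the cue exactly (`CONTAIN_CUE` and `extract_containing` are illustrative names):

```python
import re

# Cue from the slide: "Use the <artifact> to <action> the <upon>."
CONTAIN_CUE = re.compile(
    r"^use\s+the\s+(?P<artifact>.+?)\s+to\s+(?P<action>\w+)\s+the\s+(?P<upon>.+?)\.$",
    re.IGNORECASE,
)


def extract_containing(sentence):
    """Return the artifact, action and upon slots, or None if the cue does not match."""
    m = CONTAIN_CUE.match(sentence.strip())
    return None if m is None else m.groupdict()


print(extract_containing("Use the air pump to fill the tyre."))
# {'artifact': 'air pump', 'action': 'fill', 'upon': 'tyre'}
```

The lazy `.+?` for the artifact lets multi-word names such as "air pump" through, stopping at the first " to ".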
Methodology for purpose data extraction
Algorithm for Purpose Data Extraction
Algorithm PurpDataExtract(corpus)
Step 1: Read the first sentence in the corpus.
Step 2: Loop until end-of-corpus:
  2a. if contains(sentence, artifact) and match(sentence, cuetab) then
        extract(sentence, artifact)
        extract(sentence, to_action)
        extract(sentence, to_upon)
        add_to_ontology(artifact, to_action, to_upon)
  2b. else go to Step 3.
Step 3: Read the next sentence and return to Step 2.
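The algorithm above can be sketched in Python as follows. This is a minimal, hypothetical version: the cue table `CUETAB` holds three regular expressions made up for illustration (the slides do not list the actual cues), `contains` is approximated by a substring test, and the ontology is a dictionary from artifact to a set of (action, upon) pairs. Stopping at the first matching cue mirrors the one-to-one sentence-to-fact mapping listed under Issues.

```python
import re
from typing import Dict, Iterable, Set, Tuple

# Hypothetical cue table: one regex per cue class, each with named groups
# artifact / action / upon. The real PurposeNet cues are not given in the slides.
CUETAB = [
    re.compile(r"^(?:an?|the)\s+(?P<artifact>.+?)\s+is\s+used\s+to\s+(?P<action>\w+)\s+(?P<upon>.+?)\.$", re.I),
    re.compile(r"^\w+\s+(?P<action>\w+)\s+(?P<upon>.+?)\s+with\s+(?:an?|the)\s+(?P<artifact>[\w ]+?)\.$", re.I),
    re.compile(r"^use\s+the\s+(?P<artifact>.+?)\s+to\s+(?P<action>\w+)\s+the\s+(?P<upon>.+?)\.$", re.I),
]

Ontology = Dict[str, Set[Tuple[str, str]]]


def purp_data_extract(corpus: Iterable[str], artifacts: Set[str]) -> Ontology:
    """Scan sentences in order; on a cue match naming a known artifact, store (action, upon)."""
    ontology: Ontology = {}
    for sentence in corpus:                      # Steps 1 and 3: read sentences in order
        sentence = sentence.strip()
        lowered = sentence.lower()
        # Step 2a, first test: contains(sentence, artifact), approximated by a substring test
        if not any(a in lowered for a in artifacts):
            continue                             # Step 2b: move on to the next sentence
        for cue in CUETAB:                       # Step 2a, second test: match(sentence, cuetab)
            m = cue.match(sentence)
            if m is None:
                continue
            artifact = m.group("artifact").lower()
            if artifact not in artifacts:
                continue
            # extract(...) for artifact, action and upon, then add_to_ontology(...)
            pair = (m.group("action").lower(), m.group("upon").lower())
            ontology.setdefault(artifact, set()).add(pair)
            break                                # one fact per sentence
    return ontology


corpus = ["We cut trees with an axe.", "Use the air pump to fill the tyre.", "The sky is blue."]
print(purp_data_extract(corpus, {"axe", "air pump"}))
```

Keeping a set per artifact removes exact duplicates, a partial answer to the redundancy issue; near-duplicates such as "vehicle" and "vehicles" would still need lemmatization.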
Data
- Wikipedia: 249 files
- WordNet: 81,837 descriptions
- Princeton noun-artifact sentence corpus: 82,115
Observations – summary results
Purpose Data Extraction Misses
IE Metrics for Extraction
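The IE metrics are presumably the standard precision, recall and F-measure; the slides' actual figures are not reproduced here. Assuming those standard definitions, a hypothetical helper `ie_metrics` with made-up counts:

```python
def ie_metrics(correct: int, extracted: int, relevant: int):
    """Standard IE scores (assumed definitions).

    correct:   extracted purpose triples judged correct
    extracted: all triples the system extracted
    relevant:  triples that should have been extracted (gold standard)
    """
    precision = correct / extracted if extracted else 0.0
    recall = correct / relevant if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up counts for illustration only, not results from this work.
print(ie_metrics(correct=30, extracted=40, relevant=60))  # (0.75, 0.5, 0.6)
```

Misses (relevant triples never extracted) lower recall, while wrong extractions lower precision.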
Result Break-up per Cue Class
Comparison with Manually Built Ontology
- Exponential increase in speed
- High error rate
Issues
- Redundancy
- Primary purpose not always obtained
- Pronouns and brand names
- Correctness and consistency not guaranteed
- One-to-one mapping assumed
- Other sentence manifestations
Further Enhancements
- Parsed input for the hidden-cue case
- Better artifact lookup list
- Multipage lookup for consistency
- Cloud computing
- Automating the other attributes of PurposeNet
Conclusions
- A methodology was proposed for automated ontology population of PurposeNet.
- The methodology was implemented on three corpora.
- The time taken to populate the PurposeNet 'purpose' ontology was a fraction of that taken by manual methods.
- The error rate was found to be high.
Thank You