f46c149f1c5d5d56859ff2b2a1dcdd03.ppt
- Number of slides: 18
Communicative evolution: from strings to words to expressions to concepts to intentions
Piek Vossen
© Irion Technologies
ICT Kenniscongres, April 11th, 2006
Irion Technologies
- The company
  - Founded in 2000 as a spin-off from TNO Multimedia Technology
  - 5 investors: Parcom Ventures, FLV, Twinning, TNO, Van Dale
- The people
  - About 10 language technology and computer science specialists
  - Collaboration with teams at Van Dale and TNO
- The mission
  - Equal access to the knowledge and information on the Internet for all people, regardless of language and background
  - Develop systems that understand language
Product: TwentyOne
[Diagram; recoverable labels only]
- Sources: Paper documents, Web documents, Word Processor Documents, Databases, AV Documents
- Capture: Crawl, Copy, Convert, Split
- Conceptual Indexing (NLP): Concept extraction, Translation, Indexing, Classification, Summarization
- Modules: Search, Match / Mining, Dialogue, XML Publishing Platform, Cockpit
Concept and expression in language
[Diagram; recoverable labels only]
- Information Seeker: Concept -> Expression in language (Words) -> Strings -> Query
- Information Provider: Concept -> Expression in language (Words) -> Strings -> Information
- Index of Strings: ape … energy … mass … zebra
Conceptual match, linguistic mismatch
- Information Seeker: concept expressed in language as “my cell phone”
- Information Provider: concept expressed in language as “mobile”
- Index of Strings: ape … mobile … zebra
Conceptual mismatch, linguistic match
- Information Seeker: concept expressed in language as “my cell phone”
- Information Provider: concept expressed in language as “nerve cells”
- Index of Strings: ape … cell … zebra
Conceptual mismatch, linguistic match
- Information Seeker: concept expressed in language as “police cell”
- Information Provider: concept expressed in language as “nerve cells”
- Index of Strings: ape … cell … zebra
Conceptual match, linguistic mismatch
- Information Seeker: concept expressed in language as “neuron”
- Information Provider: concept expressed in language as “nerve cells”
- Index of Strings: ape … cell … zebra
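The four match and mismatch cases above can be read as two independent tests: do the strings overlap, and do the concepts coincide? The sketch below is only an illustration under stated assumptions: the phrase-to-concept table, the concept names and both functions are hypothetical and are not taken from the presented system.

```python
# Hypothetical phrase-to-concept table (an assumption for illustration);
# a real system would derive this from a semantic network.
CONCEPT_OF = {
    "my cell phone": "mobile_phone",
    "mobile": "mobile_phone",
    "nerve cells": "neuron",
    "neuron": "neuron",
    "police cell": "jail",
}


def linguistic_match(query: str, document: str) -> bool:
    """True when the two phrases share at least one word string.

    Trailing 's' characters are stripped naively so that 'cells'
    matches 'cell'; this is far too crude for real text.
    """
    def words(phrase: str) -> set:
        return {w.rstrip("s") for w in phrase.lower().split()}

    return bool(words(query) & words(document))


def conceptual_match(query: str, document: str) -> bool:
    """True when both phrases are known and map to the same concept."""
    concept = CONCEPT_OF.get(query)
    return concept is not None and concept == CONCEPT_OF.get(document)


# The four slide cases as (seeker phrase, provider phrase) pairs.
CASES = [
    ("my cell phone", "mobile"),
    ("my cell phone", "nerve cells"),
    ("police cell", "nerve cells"),
    ("neuron", "nerve cells"),
]

for query, document in CASES:
    print(query, "|", document,
          "| linguistic:", linguistic_match(query, document),
          "| conceptual:", conceptual_match(query, document))
```

In this toy setup a string index behaves like `linguistic_match`, while an index of concepts would behave like `conceptual_match`.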
Recall & Precision
- Search engine for a database with all documents
- Query: “cell”
  - Found: “nerve cell”, “police cell”, “cell phone”
  - Relevant: “cell phone”, “mobile phones”
  - Intersection: “cell phone”
- recall = intersection / relevant
- precision = intersection / found
- Recall < 20% for basic search engines! (Blair & Maron 1985)
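The two ratios on this slide can be worked through on the slide's own example. This is a minimal sketch: the function name is hypothetical, and the assignment of phrases to the found and relevant sets is a reading of the diagram, not something the slide states explicitly.

```python
def recall_precision(found: set, relevant: set) -> tuple:
    """Return (recall, precision) for a set of found and relevant items.

    recall    = |found ∩ relevant| / |relevant|
    precision = |found ∩ relevant| / |found|
    An empty denominator yields 0.0 instead of raising an error.
    """
    intersection = found & relevant
    recall = len(intersection) / len(relevant) if relevant else 0.0
    precision = len(intersection) / len(found) if found else 0.0
    return recall, precision


# Assumed reading of the slide: a string search for "cell" finds three
# documents, while the user actually wanted documents about phones.
found = {"nerve cell", "police cell", "cell phone"}
relevant = {"cell phone", "mobile phones"}

recall, precision = recall_precision(found, relevant)
print(f"recall = {recall:.2f}, precision = {precision:.2f}")
```

Under this reading, one of the two relevant documents is found and one of the three found documents is relevant.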
Language technology: a hole in one!
[Diagram; recoverable labels only]
- Techniques: Linguistic analysis; Synonyms, Semantic network; thesaurus
- Phrases: “golf club(s)”, “clubs for golf”, “golf sticks”, “golf clubs”, “at the club”
- Related term: Tiger Woods
Information systems lack a communicative model
- Language is an instrument for communication:
  - Not fully descriptive
  - Minimal & sufficient information for a communicative effect
- Speakers/writers make assumptions about the addressee:
  - Knowledge of the world
  - Knowledge of language
  - Knowledge about the communicative settings
Communicative models in a robust and scalable system
- Index of concepts instead of strings
  - Meaning of a word in context:
    - Domain of the document: Juventus => football
    - Topic of the paragraph: transfer scandal => business, crime
    - Phrase: linguistically-motivated combination of words: [wing player]_{football player} in [police cell]_{jail}
    - Topic of the query: Can I order chicken wings? => food
    - Phrase: [chicken wings]_{dish}
  - Multilingual semantic networks in many languages to map words to concepts
  - Concept matching calculus for comparing query phrases with phrases in documents
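The idea of picking a word's meaning from the domain or topic of its context can be sketched as follows. Everything here is an assumption for illustration: the sense inventory, the cue words, the concept and domain names, and the scoring by domain overlap are hypothetical and do not describe the concept matching calculus of the presented system.

```python
# Hypothetical sense inventory: each word has candidate concepts, and each
# concept lists the domains in which it typically occurs.
SENSES = {
    "wing": {"football_player": {"football"}, "chicken_dish": {"food"}},
    "cell": {"jail": {"crime"}, "neuron": {"biology"},
             "mobile_phone": {"telecom"}},
}

# Hypothetical cue words that signal a domain.
DOMAIN_CUES = {
    "juventus": "football",
    "scandal": "crime",
    "police": "crime",
    "chicken": "food",
    "order": "food",
}


def context_domains(text: str) -> set:
    """Collect the domains signalled by cue words in the context."""
    tokens = text.lower().replace("?", " ").split()
    return {DOMAIN_CUES[t] for t in tokens if t in DOMAIN_CUES}


def disambiguate(word: str, context: str):
    """Return the concept whose domains overlap most with the context.

    Returns None when no candidate shares a domain with the context,
    i.e. when the word stays ambiguous. Ties between candidates with the
    same positive score are broken arbitrarily (by concept name).
    """
    domains = context_domains(context)
    candidates = SENSES.get(word, {})
    scored = [(len(doms & domains), concept)
              for concept, doms in candidates.items()]
    score, concept = max(scored, default=(0, None))
    return concept if score > 0 else None


print(disambiguate("wing", "Juventus wing player"))
print(disambiguate("wing", "Can I order chicken wings?"))
print(disambiguate("cell", "police cell"))
print(disambiguate("cell", "cell"))
```

An index built on the returned concepts rather than on the surface strings is what the slide calls an index of concepts.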
Communicative models in a robust and scalable system
- Dialogue system that cooperates with user:
  - Detect intention: complaint, buy, support, information
  - Measure satisfaction: happy, emotions
  - Avoid deadlocks:
    - Detect vagueness or ambiguity (what meaning of cell?)
    - Detect topic shifts
    - Handle negative information: “No phones, I want jails!”
  - Allow the user to change perspective
  - Ask user for help, directions, confirmation and explication
  - Create more context than simple key words and deliver more precision: answers instead of hits.
Dialogue system
[Diagram; recoverable labels only]
- Components: Utterance Typer, Dialogue Manager, Retrieval Engine, Classifier Engine
- Example dialogue:
  - System: Can I help you?
  - User: My headphone does not work.
  - System: Are you looking for support or products?
  - User: I want to buy a new one.
  - System: Can you tell me more about the product?
  - User: It is for my cell phone.
  - System: Can you give me more details?
  - User: It is a Nokia 338.
  - System: I found the following accessories for you. Please have a look.
  - User: That's not what I want.
- Phrases / Concepts: support, cell phone, accessories, repair
- User Model: Intention, Satisfaction, Emotion
- Information State: Positives, Negatives, Relations
- Facts Model: Price, In stock
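The Information State in this diagram tracks Positives and Negatives, and the preceding slide asks the system to handle negative information such as “No phones, I want jails!”. The sketch below shows one way such a state could steer retrieval. It is a hypothetical illustration: the class, its methods, the concept names and the sample documents are all assumptions, and the Relations part of the state is left out.

```python
from dataclasses import dataclass, field


@dataclass
class InformationState:
    """Hypothetical dialogue state: concepts the user wants or rejects."""
    positives: set = field(default_factory=set)
    negatives: set = field(default_factory=set)

    def update(self, wanted=(), unwanted=()):
        """Record newly wanted and newly rejected concepts."""
        self.positives |= set(wanted)
        self.negatives |= set(unwanted)
        # A concept the user now rejects is no longer wanted, and vice versa.
        self.positives -= set(unwanted)
        self.negatives -= set(wanted)

    def select(self, documents: dict) -> list:
        """Keep documents with a wanted concept and no rejected concept."""
        return [
            name for name, concepts in documents.items()
            if concepts & self.positives and not concepts & self.negatives
        ]


# Hypothetical documents, each indexed by a set of concepts.
documents = {
    "nokia_headset": {"mobile_phone", "accessories"},
    "prison_guide": {"jail"},
    "neuron_paper": {"neuron"},
}

state = InformationState()
state.update(wanted={"mobile_phone"})
first_answer = state.select(documents)

# "No phones, I want jails!"
state.update(wanted={"jail"}, unwanted={"mobile_phone"})
second_answer = state.select(documents)

print(first_answer, second_answer)
```

Keeping rejected concepts in the state, rather than only replacing the query, lets later turns continue to exclude what the user has already ruled out.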
Research & Development
- Starting point in 2000:
  - Retrieval technology from TNO (10 years research)
  - Language resources from Van Dale (decades of work)
- Research projects:
  - MEANING (IST-2001-34460), 2002-2005
  - PIDGIN (CIC-programme), 2002-2004
  - Global Wordnet Association, 2000-ongoing
  - Aarhus (Provincie Gelderland ICT), 2005-2006
  - Kenniswijk (Senter Novem), 2005-2006
  - Gemeente Connect (STEVIN), 2005-2006
  - Cornetto (STEVIN), 2006-2008
MEANING (IST-2001-34460)
- Funded by the EU, 2002-2005
- Conceptual index and conceptual matching
- Extended search engine (EN, NL, DE, FR, IT, ES) to also cover Basque and Catalan
- Doubled (!) production in end-user scenario of Spanish publisher EFE
- Extended in Aarhus project
PIDGIN
- Funded in the CIC-programme, 2002-2004
- Cross-lingual chat application English-Dutch
- Sign-post dialogue system that searches information:
  - In a collaborative task between user and machine
  - Without the need to build a model of the world
  - Can be applied to unlimited amounts of unstructured data
- Extension: Kenniswijk, Gemeente Connect
Global Wordnet Association: http://www.globalwordnet.org
- Stimulates the development and interlinking of semantic networks for all languages in the world
- World-wide Semantic Grid: mapping of all languages to a single set of concepts
- Currently 39 languages and extending
- Biennial conference: India (2002), Czech Republic (2004), Korea (2006)
- Extended for Dutch in Cornetto


