Working with Discourse Representation Theory
Patrick Blackburn & Johan Bos
Lecture 5: Applying DRT

Today
- Given what we know about DRT, both from a theoretical and practical perspective, can we use it for practical applications?

Outline
- Spoken dialogue system with DRT
  - Using DRT and inference to control a mobile robot
- Wide-coverage parsing with DRT
  - Recognising Textual Entailment

Human-Computer Dialogue
- Focus on small domains
  - Grammatical coverage ensured
  - Background knowledge encoding
- Spoken dialogue system
  - Godot the robot
  - Speech recognition and synthesis
  - People could give Godot directions, ask it questions, tell it new information
  - Godot was a REAL robot

Godot the Robot
[Photo: Godot with Tetsushi Oka]

Typical conversation with Godot
- Simple dialogues in the beginning
  Human: Robot?
  Robot: Yes?
  Human: Where are you?
  Robot: I am in the hallway.
  Human: OK. Go to the rest room!

Adding DRT to the robot
- The language model that the robot used for speech recognition returned DRSs
- We used the model builder MACE and the theorem prover SPASS for inference
- The model produced by MACE was used to find out what the robot should do
- This was possible because MACE produces minimal models
- Of course we also checked for consistency and informativity

Advanced conversation with Godot
- Dealing with inconsistency and informativity
  Human: Robot, where are you?
  Robot: I am in the hallway.
  Human: You are in my office.
  Robot: No, that is not true.
  Human: You are in the hallway.
  Robot: Yes I know!

Videos of Godot
- Video 1: Godot in the basement of Buccleuch Place
- Video 2: Screenshot of dialogue manager with DRSs and camera view of Godot

Minimal Models
- Model builders normally generate models by iteration over the domain size
- As a side-effect, the output is a model with a minimal domain size
- From a linguistic point of view, this is interesting, as there is no redundant information
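The iteration over domain sizes can be illustrated with a toy brute-force search. This is only a sketch of the general idea, not a description of how MACE works internally; the predicate names and the example theory are assumptions chosen for illustration.

```python
from itertools import product

def find_minimal_model(predicates, theory, max_size=4):
    """Try domain sizes 1, 2, ... and return the first model found, or None."""
    for size in range(1, max_size + 1):            # iterate over the domain size
        domain = list(range(size))
        # A unary predicate may denote any subset of the domain.
        subsets = [frozenset(d for d, bit in zip(domain, bits) if bit)
                   for bits in product([0, 1], repeat=size)]
        for choice in product(subsets, repeat=len(predicates)):
            interp = dict(zip(predicates, choice))
            if theory(domain, interp):
                return domain, interp
    return None

# Hypothetical theory: there is a light, there is a radio, and nothing is both.
def theory(domain, interp):
    return (any(d in interp["light"] for d in domain)
            and any(d in interp["radio"] for d in domain)
            and not (interp["light"] & interp["radio"]))

domain, interp = find_minimal_model(["light", "radio"], theory)
print(len(domain))  # 2: one entity cannot be both a light and a radio
```

Because smaller domains are tried first, the first model found has no entities beyond those the theory forces to exist.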

Using models
- Examples:
  - Turn on a light.
  - Turn on everything except the radio.
  - Turn off the red light or the blue light.
  - Turn on another light.

Adding presupposition
- Godot was connected to an automated home environment
- One day, I asked Godot to switch on all the lights
- However, Godot refused, responding that it was unable to do so
- Why was that?
  - At first I thought that the theorem prover had made a mistake
  - But it turned out that one of the lights was already on

Intermediate Accommodation
- Because I had coded "switch on X" with the precondition that X is not on, the theorem prover found a proof of inconsistency
- Coding this as a presupposition instead would not give an inconsistency, but a beautiful case of intermediate accommodation
- In other words:
  - Switch on all the lights!
  - [All lights are off; switch them on.]
  - [= Switch on all the lights that are currently off]

Sketch of resolution (DRSs written linearly as [referents | conditions])
[x | Robot(x), [y | Light(y)] => [e | switch(e), Agent(e,x), Theme(e,y)]]
Presupposition to be resolved: Off(y), triggered in the consequent DRS

Global Accommodation
[x | Robot(x), Off(y), [y | Light(y)] => [e | switch(e), Agent(e,x), Theme(e,y)]]

Intermediate Accommodation
[x | Robot(x), [y | Light(y), Off(y)] => [e | switch(e), Agent(e,x), Theme(e,y)]]

Local Accommodation
[x | Robot(x), [y | Light(y)] => [e | switch(e), Agent(e,x), Theme(e,y), Off(y)]]
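The three accommodation sites can be made concrete with a small sketch. The dictionary encoding of DRSs and the function names (`build_drs`, `accommodate`, `show`) are assumptions made for illustration; they are not taken from the Godot system.

```python
import copy

def build_drs():
    """DRS for 'Switch on all the lights' before the presupposition is resolved."""
    antecedent = {"refs": ["y"], "conds": ["Light(y)"]}
    consequent = {"refs": ["e"], "conds": ["switch(e)", "Agent(e,x)", "Theme(e,y)"]}
    return {"refs": ["x"], "conds": ["Robot(x)", ("imp", antecedent, consequent)]}

def accommodate(drs, presupposition, site):
    """Return a copy of drs with the presupposition added at the chosen site."""
    result = copy.deepcopy(drs)
    _, antecedent, consequent = result["conds"][-1]   # the implication condition
    target = {"global": result, "intermediate": antecedent, "local": consequent}[site]
    target["conds"].append(presupposition)
    return result

def show(drs):
    """Linear notation: [referents | conditions]."""
    parts = [f"{show(c[1])} => {show(c[2])}" if isinstance(c, tuple) else c
             for c in drs["conds"]]
    return "[" + " ".join(drs["refs"]) + " | " + ", ".join(parts) + "]"

print(show(accommodate(build_drs(), "Off(y)", "intermediate")))
# [x | Robot(x), [y | Light(y), Off(y)] => [e | switch(e), Agent(e,x), Theme(e,y)]]
```

With the site set to "intermediate", the condition restricts the quantification to lights that are off, which is the reading "switch on all the lights that are currently off".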

Outline
- Spoken dialogue system with DRT
  - Using DRT and inference to control a mobile robot
- Wide-coverage parsing with DRT
  - Recognising Textual Entailment

Wide-coverage DRT
- Nowadays we have robust wide-coverage parsers that use stochastic methods for producing a parse tree
- Trained on the Penn Treebank
- Examples are parsers like those from Collins and Charniak

Wide-coverage parsers
- Say we wished to produce DRSs on the output of these parsers
- We would need quite detailed syntax derivations
- Closer inspection reveals that many of the parsers use a large number (several thousand) of phrase structure rules
- Often, long-distance dependencies are not recovered

Combinatory Categorial Grammar
- CCG is a lexicalised theory of grammar (Steedman 2001)
- Deals with complex cases of coordination and long-distance dependencies
- Lexicalised, hence easy to implement
  - English wide-coverage grammar
  - Fast robust parser available

Categorial Grammar
- Lexicalised theory of syntax
  - Many different lexical categories
  - Few grammar rules
- Finite set of categories defined over a base of core categories
  - Core categories: s, np, n, pp
  - Combined categories: np/n, s\np, (s\np)/np

CCG: type-driven lexicalised grammar
Category       Name               Examples
N              noun               Ralph, car
NP             noun phrase        Everyone
NP/N           determiner         a, the, every
S\NP           intransitive verb  walks, smiles
(S\NP)/NP      transitive verb    loves, hates
(S\NP)\(S\NP)  adverb             quickly

CCG: combinatorial rules
- Forward Application (FA)
- Backward Application (BA)
- Generalised Forward Composition (FC)
- Backward Crossed Composition (BC)
- Type Raising (TR)
- Coordination

CCG derivation
  NP/N: a     N: spokesman     S\NP: lied
  ------------------------ (FA)
  NP: a spokesman
  --------------------------------------- (BA)
  S: a spokesman lied
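The two application rules used in this derivation can be sketched over a simple category encoding. The tuple representation (result, slash, argument) and the function names are assumptions for illustration, not the data structures of any particular CCG parser.

```python
def fa(left, right):
    """Forward application: X/Y followed by Y gives X."""
    if isinstance(left, tuple) and left[1] == "/" and left[2] == right:
        return left[0]
    return None

def ba(left, right):
    """Backward application: Y followed by X\\Y gives X."""
    if isinstance(right, tuple) and right[1] == "\\" and right[2] == left:
        return right[0]
    return None

DET = ("NP", "/", "N")      # NP/N, e.g. "a"
IV = ("S", "\\", "NP")      # S\NP, e.g. "lied"

noun_phrase = fa(DET, "N")          # a + spokesman
sentence = ba(noun_phrase, IV)      # a spokesman + lied
print(noun_phrase, sentence)        # NP S
```

A rule returns None when the categories do not fit, so a parser built on these functions would simply discard that combination.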

Coordination in CCG
Lexical categories: Artie: np, likes: (s\np)/np, and: (x\x)/x, Tony: np, hates: (s\np)/np, beans: np
- (TR) np: Artie => s/(s\np): Artie
- (TR) np: Tony => s/(s\np): Tony
- (FC) s/(s\np): Artie + (s\np)/np: likes => s/np: Artie likes
- (FC) s/(s\np): Tony + (s\np)/np: hates => s/np: Tony hates
- (FA) (x\x)/x: and + s/np: Tony hates => (s/np)\(s/np): and Tony hates
- (BA) s/np: Artie likes + (s/np)\(s/np): and Tony hates => s/np: Artie likes and Tony hates
- (FA) s/np: Artie likes and Tony hates + np: beans => s: Artie likes and Tony hates beans

The Glue
- Use the lambda calculus to combine CCG with DRT
- Each lexical entry gets a DRS with lambda-bound variables, representing the "missing" information
- Each combinatorial rule in CCG gets a semantic interpretation, again using the tools of the lambda calculus

Interpreting Combinatorial Rules
- Each combinatorial rule in CCG is expressed in terms of the lambda calculus:
  - Forward Application: FA(α, β) = α@β
  - Backward Application: BA(α, β) = β@α
  - Type Raising: TR(α) = λx. x@α
  - Function Composition: FC(α, β) = λx. α@(β@x)
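As a minimal sketch, the four rules can be written as Python functions, reading "@" as function application and taking function composition in its standard form λx. α@(β@x). Strings stand in for DRSs here, and the toy lexical entries `likes` and `sleeps` are hypothetical.

```python
def fa(alpha, beta):
    """Forward application: alpha @ beta."""
    return alpha(beta)

def ba(alpha, beta):
    """Backward application: beta @ alpha."""
    return beta(alpha)

def tr(alpha):
    """Type raising: lambda x. x @ alpha."""
    return lambda x: x(alpha)

def fc(alpha, beta):
    """Function composition: lambda x. alpha @ (beta @ x)."""
    return lambda x: alpha(beta(x))

# Toy lexical entries (assumed for illustration only).
likes = lambda obj: lambda subj: f"like({subj},{obj})"
sleeps = lambda subj: f"sleep({subj})"

artie_likes = fc(tr("Artie"), likes)     # s/np: "Artie likes"
print(fa(artie_likes, "beans"))          # like(Artie,beans)
print(ba("Artie", sleeps))               # sleep(Artie)
```

Type raising followed by composition is what lets "Artie likes" form a constituent before its object arrives, as in the coordination example.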

CCG: lexical semantics (DRSs written linearly as [referents | conditions])
Category  Semantics                          Example
N         λx.[ | spokesman(x)]               spokesman
NP/N      λp.λq.(([x | ] ; p@x) ; q@x)       a
S\NP      λx.x@λy.[e | lie(e), agent(e,y)]   lied

CCG derivation (DRSs written linearly as [referents | conditions])
Lexical semantics:
  NP/N: a        λp.λq.(([x | ] ; p@x) ; q@x)
  N: spokesman   λz.[ | spokesman(z)]
  S\NP: lied     λx.x@λy.[e | lie(e), agent(e,y)]
(FA) NP: a spokesman
  λp.λq.(([x | ] ; p@x) ; q@x) @ λz.[ | spokesman(z)]
  = λq.(([x | ] ; [ | spokesman(x)]) ; q@x)
  = λq.([x | spokesman(x)] ; q@x)
(BA) S: a spokesman lied
  λx.x@λy.[e | lie(e), agent(e,y)] @ λq.([x | spokesman(x)] ; q@x)
  = λq.([x | spokesman(x)] ; q@x) @ λy.[e | lie(e), agent(e,y)]
  = [x | spokesman(x)] ; [e | lie(e), agent(e,x)]
  = [x e | spokesman(x), lie(e), agent(e,x)]
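The derivation can be replayed with Python closures standing in for lambda terms. This is a sketch under simplifying assumptions: a DRS is a pair (referents, conditions), merge simply concatenates, and variable names are fixed strings, so there is no renaming of bound variables as a real implementation would need.

```python
def drs(refs, conds):
    """A DRS as a pair of tuples: (referents, conditions)."""
    return (tuple(refs), tuple(conds))

def merge(d1, d2):
    """The ';' operator: pool the referents and the conditions."""
    return (d1[0] + d2[0], d1[1] + d2[1])

# Lexical entries mirroring the derivation.
a = lambda p: lambda q: merge(merge(drs(["x"], []), p("x")), q("x"))
spokesman = lambda z: drs([], [f"spokesman({z})"])
lied = lambda x: x(lambda y: drs(["e"], ["lie(e)", f"agent(e,{y})"]))

np_sem = a(spokesman)       # FA: a @ spokesman
s_sem = lied(np_sem)        # BA: lied @ (a spokesman)
print(s_sem)
# (('x', 'e'), ('spokesman(x)', 'lie(e)', 'agent(e,x)'))
```

Python performs the beta-reductions when the functions are called, so only the final merged DRS is visible.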

The Clark & Curran Parser
- Uses standard statistical techniques
  - Robust wide-coverage parser
  - Clark & Curran (ACL 2004)
- Grammar derived from CCGbank
  - 409 different categories
  - Hockenmaier & Steedman (ACL 2002)
- Results: 96% coverage on WSJ
  - Bos et al. (COLING 2004)
  - Example output: [figure not included]

Applications
- Has been used for different kinds of applications
  - Question Answering
  - Recognising Textual Entailment

Recognising Textual Entailment
- A task for NLP systems to recognise entailment between two (short) texts
- Introduced in 2004/2005 as part of the PASCAL Network of Excellence
- Proved to be a difficult but popular task
- PASCAL provided a development and test set of several hundred examples

RTE Example (entailment)
RTE 1977 (TRUE)
His family has steadfastly denied the charges.
--------------------------
The charges were denied by his family.

RTE Example (no entailment)
RTE 2030 (FALSE)
Lyon is actually the gastronomical capital of France.
--------------------------
Lyon is the capital of France.

Aristotle's Syllogisms
ARISTOTLE 1 (TRUE)
All men are mortal. Socrates is a man.
--------------------------
Socrates is mortal.

How to deal with RTE
- There are several methods
- We will look at five of them to see how difficult RTE actually is

Recognising Textual Entailment
Method 1: Flipping a coin

Flipping a coin
- Advantages
  - Easy to implement
- Disadvantages
  - Just 50% accuracy

Recognising Textual Entailment
Method 2: Calling a friend

Calling a friend
- Advantages
  - High accuracy (95%)
- Disadvantages
  - Lose friends
  - High phone bill

Recognising Textual Entailment
Method 3: Ask the audience

Ask the audience
RTE 893 (??)
The first settlements on the site of Jakarta were established at the mouth of the Ciliwung, perhaps as early as the 5th century AD.
--------------------------
The first settlements on the site of Jakarta were established as early as the 5th century AD.

Human Upper Bound
RTE 893 (TRUE)
The first settlements on the site of Jakarta were established at the mouth of the Ciliwung, perhaps as early as the 5th century AD.
--------------------------
The first settlements on the site of Jakarta were established as early as the 5th century AD.

Recognising Textual Entailment
Method 4: Word Overlap

Word Overlap Approaches
- Popular approach
- Ranging in sophistication from simple bag-of-words to the use of WordNet
- Accuracy rates ca. 55%

Word Overlap
- Advantages
  - Relatively straightforward algorithm
- Disadvantages
  - Hardly better than flipping a coin
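A bag-of-words baseline of this kind can be sketched in a few lines. The tokenisation and the 0.8 threshold are assumptions chosen for illustration, not the settings of any system evaluated in the challenge.

```python
import re

def word_overlap(text, hypothesis):
    """Fraction of hypothesis tokens that also occur in the text."""
    text_words = set(re.findall(r"[a-z0-9]+", text.lower()))
    hyp_words = re.findall(r"[a-z0-9]+", hypothesis.lower())
    return sum(w in text_words for w in hyp_words) / len(hyp_words)

def predict_entailment(text, hypothesis, threshold=0.8):
    """Predict entailment when enough of the hypothesis is covered (assumed threshold)."""
    return word_overlap(text, hypothesis) >= threshold

t = "Lyon is actually the gastronomical capital of France."
h = "Lyon is the capital of France."
print(word_overlap(t, h))          # 1.0
print(predict_entailment(t, h))    # True, although the gold label is FALSE
```

The Lyon pair shows why the approach struggles: every word of the hypothesis occurs in the text, yet the text does not entail it.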

RTE State-of-the-Art
- PASCAL RTE challenge
- Hard problem
- Requires semantics

Recognising Textual Entailment
Method 5: Using DRT

Inference
- How do we perform inference with DRSs?
  - Translate DRS into first-order logic, use off-the-shelf inference engines
- What kind of inference engines?
  - Theorem provers
  - Model builders

Using Theorem Proving
- Given a textual entailment pair T/H with text T and hypothesis H:
  - Produce DRSs for T and H
  - Translate these DRSs into FOL
  - Give this to the theorem prover: T' → H'
- If the theorem prover finds a proof, then T entails H

Vampire (Riazanov & Voronkov 2002)
- Let's try this. We will use the theorem prover Vampire (currently the best known theorem prover for FOL)
- This gives us good results for:
  - apposition
  - relative clauses
  - coordination
  - intersective adjectives/complements
  - passive/active alternations

Example (Vampire: proof)
RTE-2 112 (TRUE)
On Friday evening, a car bomb exploded outside a Shiite mosque in Iskandariyah, 30 miles south of the capital.
--------------------------
A bomb exploded outside a mosque.

Example (Vampire: proof)
RTE-2 489 (TRUE)
Initially, the Bundesbank opposed the introduction of the euro but was compelled to accept it in light of the political pressure of the capitalist politicians who supported its introduction.
--------------------------
The introduction of the euro has been opposed.

Background Knowledge
- However, it doesn't give us good results for cases requiring additional knowledge
  - Lexical knowledge
  - World knowledge
- We will use WordNet as a start to get additional knowledge
- All of WordNet is too much, so we create MiniWordNets

MiniWordNets
- Use hyponym relations from WordNet to build an ontology
- Do this only for the relevant symbols
- Convert the ontology into first-order axioms

MiniWordNet: an example
- Example text: There is no asbestos in our products now. Neither Lorillard nor the researchers who studied the workers were aware of any research on smokers of the Kent cigarettes.

∀x(user(x) → person(x))
∀x(worker(x) → person(x))
∀x(researcher(x) → person(x))

∀x(person(x) → ¬risk(x))
∀x(person(x) → ¬cigarette(x))
…
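The conversion from hyponym relations to first-order axioms of the form ∀x(user(x) → person(x)) can be sketched as follows. The small `HYPERNYM` table is a hypothetical hand-written stand-in for WordNet lookups, and only the is-a axioms are generated.

```python
# Hypothetical stand-in for WordNet: noun -> direct hypernym.
HYPERNYM = {
    "user": "person",
    "worker": "person",
    "researcher": "person",
    "sonata": "music",      # irrelevant to the example text, so never used
}

def mini_wordnet_axioms(symbols, hypernym=HYPERNYM):
    """Build is-a axioms only for the symbols that occur in the text."""
    axioms = []
    for word in sorted(symbols):
        parent = hypernym.get(word)
        if parent is not None:
            axioms.append(f"∀x({word}(x) → {parent}(x))")
    return axioms

axioms = mini_wordnet_axioms({"user", "worker", "researcher"})
for axiom in axioms:
    print(axiom)
```

Restricting the lookup to symbols from the text keeps the axiom set small enough for a theorem prover to handle.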

Using Background Knowledge
- Given a textual entailment pair T/H with text T and hypothesis H:
  - Produce DRSs for T and H
  - Translate drs(T) and drs(H) into FOL
  - Create Background Knowledge for T & H
  - Give this to the theorem prover: (BK & T') → H'

MiniWordNets at work
RTE 1952 (TRUE)
Crude oil prices soared to record levels.
--------------------------
Crude oil prices rise.
- Background Knowledge: ∀x(soar(x) → rise(x))

Troubles with theorem proving
- Theorem provers are extremely precise
- They won't tell you when there is "almost" a proof
- Even if only a little background knowledge is missing, Vampire will say: NO

Vampire: no proof
RTE 1049 (TRUE)
Four Venezuelan firefighters who were traveling to a training course in Texas were killed when their sport utility vehicle drifted onto the shoulder of a highway and struck a parked truck.
--------------------------
Four firefighters were killed in a car accident.

Using Model Building
- Need a robust way of inference
- Use the model builder Paradox
  - Claessen & Sorensson (2003)
- Use the size of the (minimal) model
  - Compare the size of the model of T and of T & H
  - If the difference is small, then it is likely that T entails H

Using Model Building
- Given a textual entailment pair T/H with text T and hypothesis H:
  - Produce DRSs for T and H
  - Translate these DRSs into FOL
  - Generate Background Knowledge
  - Give this to the model builder: i) BK & T'  ii) BK & T' & H'
- If the models for i) and ii) are similar in size, then we predict that T entails H

Example 1
- T: John met Mary in Rome
  H: John met Mary
- Model T: 3 entities
  Model T+H: 3 entities
- Model size difference: 0
- Prediction: entailment

Example 2
- T: John met Mary
  H: John met Mary in Rome
- Model T: 2 entities
  Model T+H: 3 entities
- Model size difference: 1
- Prediction: no entailment
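The decision in these two examples can be written as a one-line comparison. The function name and the `max_difference` cut-off are assumptions for illustration, and the model sizes are entered by hand rather than produced by a model builder.

```python
def predict_from_model_sizes(size_t, size_t_and_h, max_difference=0):
    """Predict entailment when adding H to T adds (almost) no new entities."""
    return (size_t_and_h - size_t) <= max_difference

# Example 1: T = "John met Mary in Rome", H = "John met Mary"
print(predict_from_model_sizes(3, 3))   # True  (difference 0)

# Example 2: T = "John met Mary", H = "John met Mary in Rome"
print(predict_from_model_sizes(2, 3))   # False (difference 1)
```

A fixed cut-off is the simplest choice; the size difference could equally be passed on as a numeric feature to a learned classifier.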

Model size differences
- Of course this is a very rough approximation
- But it turns out to be a useful one
- Gives us a notion of robustness
- Of course we need to deal with negation as well
  - Give not T and not [T & H] to the model builder
  - There is not necessarily one unique minimal model

Lack of Background Knowledge
RTE-2 235 (TRUE)
Indonesia says the oil blocks are within its borders, as does Malaysia, which has also sent warships to the area, claiming that its waters and airspace have been violated.
--------------------------
There is a territorial waters dispute.

How well does this work?
- We tried this at the RTE 2004/05
- Combined this with a shallow approach (word overlap)
- Used standard machine learning methods to build a decision tree
- Features used:
  - Proof (yes/no)
  - Model size difference
  - Word overlap
  - Task (source of RTE pair)

RTE Results 2004/5 (Bos & Markert 2005)
               Accuracy   CWS
Shallow        0.569      0.624
Deep           0.562      0.608
Hybrid (S+D)   0.577      0.632
Hybrid+Task    0.612      0.646

Conclusions
- We have got the tools for doing computational semantics in a principled way using DRT
- For many applications, success depends on the ability to systematically generate background knowledge
  - Small restricted domains (dialogue)
  - Open domain

What we did in this course
- We introduced DRT, a notational variant of first-order logic.
- Semantically, we can handle in DRT anything we can in FOL, including events.
- Moreover, because it is so close to FOL, we can use first-order methods to implement inference for DRT.
- The DRT box syntax is essentially about nesting contexts, which allows a uniform treatment of anaphoric phenomena.
- Moreover, this works not only on a theoretical level, but is also implementable, and even applicable.

What we hope you got out of it
- First, we hope we made you aware that nowadays computational semantics is able to handle some difficult problems.
- Second, we hope we made you aware that DRT is not just a theory. It is a complete architecture allowing us to experiment with computational semantics.
- Third, we hope you are aware that state-of-the-art inference engines can help to study or apply semantics.

Where you can find more
- For more on DRT, read the standard textbook devoted to DRT by Kamp and Reyle. This book discusses not only the basic theory, but also plurals, tense, and aspect.
- For more on the basic architecture underlying this work on computational semantics, and in particular on implementations of the lambda calculus and the parallel use of theorem provers and model builders, see: www.blackburnbos.org
- All of the theory we discussed in this course is implemented in Prolog. This software can be downloaded from www.blackburnbos.org. For an introduction to Prolog written very much with this software in mind, try Learn Prolog Now! www.learnprolognow.org