- Number of slides: 36
Information Status
Varieties of Information Status – Contrast: John wanted a poodle but Becky preferred a corgi. – Topic/comment: The corgi they bought turned out to have fleas. – Theme/rheme: The corgi they bought turned out to have fleas. – Focus/presupposition: It was Becky who took him to the vet. – Given/new: Some wildcats bite, but this wildcat turned out to be a sweetheart.
Today: Given/New • Why do we care about Given/New? • Defining Given/New: why is this hard? – Hearer-based and Discourse-based models • Uses of Given/New information in NLP • Identifying Given/New information automatically – Rule-based – Corpus-based – The Boston Directions Corpus – Laboratory studies suggest new directions
Why do we care about the given/new distinction? • Building a model of the discourse – What do S and H believe to be true? – What is in their consciousness now? – What is ‘grounded’? • Speech technologies – TTS: Given information is often deaccented while new information is usually accented – ASR?
Defining Given/New • Halliday ’67: – Given: Recoverable from some form of context – New: Not recoverable • Chafe ’74, ’76: – Given: what S believes is in H’s consciousness – New: what S believes is not… – “Chafe-givenness”: Yesterday I had my class disrupted by a bulldog/dog. I’m beginning to dislike dogs/bulldogs. • But not vice versa…
Prince ’81: A Given/New Taxonomy • Text as set of instructions from S to H on how to construct a discourse model – Model includes discourse entities, attributes, and links between entities – Discourse entities: individuals, classes, exemplars, substances, concepts (NPs) – Entities as ‘hooks’ on which to hang attributes (Webber ’78) • Entities when first introduced are new
– Brand-new (H must create a new entity): I saw a dinosaur today. – Unused (H already knows of this entity): I saw your mother today. • Evoked entities are old -- already in the discourse – Textually evoked: The dinosaur was scaly and gray. – Situationally evoked: The light was red when you went through it. • Inferrables – Containing
I bought [a carton of eggs]. One of them was broken. [The door of the Bastille] was painted purple. – Non-containing A bus pulled up beside me. The driver was a monkey.
Given/New and Definiteness/Indefiniteness – Definiteness: subject NPs tend to be syntactically definite and old – Indefiniteness: object NPs tend to be indefinite and new I saw a black cat yesterday. The cat looked hungry. • Definite articles, demonstratives, possessives, personal pronouns, proper nouns, quantifiers like all, every signal definiteness…but… There were the usual suspects at the bar. • Indefinite articles, quantifiers like some, any, one signal indefiniteness…but…. This guy came into the room
What’s wrong with a simple Hearer-centric model of given/new? • Hearer-centric information status: – Given: what S believes H has in his/her consciousness – New: what S believes H does not have in his/her consciousness • But discourse entities may also be given and new wrt the current discourse – Discourse-old: already evoked in the discourse – Discourse-new: not evoked
(1) A: I’ve decided to make an appointment with Lee Bollinger. (2) B: Why do you want to see Bollinger? • Hearer status of discourse entities in 1? 2? – If B is your roommate? your mother? a guy on the subway? • Discourse status of discourse entities in 1? 2? • What would be the hearer/discourse status of discourse entities in this version? (1) A: I’ve decided to make an appointment with Lee Bollinger. (2 a) B: Why do you want to see the president? (2 b) B: Have you talked to his secretary?
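The two dimensions in the Bollinger exchange can be kept apart in a few lines of code. The sketch below is only illustrative: the function name, the string labels, and the idea of representing S's beliefs about H as a plain set are assumptions, and coreference (e.g. that "the president" picks out the same entity) is assumed to be resolved beforehand. It also assumes that anything already evoked in the discourse counts as given for the hearer.

```python
def information_status(entity, hearer_knows, already_evoked):
    """Return (hearer status, discourse status) for one discourse entity.

    hearer_knows:   entities S believes H already knows (an assumed input)
    already_evoked: entities mentioned earlier in this discourse
    """
    # Assumption: an entity evoked earlier in the discourse is also hearer-given.
    if entity in hearer_knows or entity in already_evoked:
        hearer = "hearer-given"
    else:
        hearer = "hearer-new"
    discourse = "discourse-old" if entity in already_evoked else "discourse-new"
    return hearer, discourse


# Utterance (1), said to a roommate who is assumed to know who Bollinger is.
roommate = information_status("Lee Bollinger", {"Lee Bollinger"}, set())
# Utterance (1), said to a stranger who is assumed not to know him.
stranger = information_status("Lee Bollinger", set(), set())
# Utterance (2): Bollinger has now been evoked by (1), whoever the hearer is.
second_mention = information_status("Lee Bollinger", set(), {"Lee Bollinger"})
```

Under these assumptions the first mention is discourse-new for every hearer, while its hearer status depends on who B is; by (2) the entity is discourse-old regardless.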
What does this new Hearer/Discourse given/new distinction provide? • A way to separate what is explicit in the discourse model from what is believed to be in speaker/hearer cognitive model • A way to explain given/new in more complex terms – To identify coreference relations – To explain deaccenting in ASR and TTS
Gross Oversimplification: Given Items Tend to be Deaccented • Accenting and deaccenting: making items intonationally prominent or not • Critical to get this distinction ‘right’ in TTS – Accenting everything makes it hard for people to understand anything, e.g. I like my cat and my cat adores me. One potato, two potato, three potato, … If a discourse entity is given for one speaker then it may or may not be given for another speaker.
How can we determine automatically whether a discourse entity is given or new? • A rule-based approach: – Stem the content words in the discourse – Select a window within which incoming items with the same stem as a previous entity and within this window will be labeled ‘given’ • Other items are ‘new’ • Is this hearer-based? Discourse-based? • How well does it work? – 65-75% accurate (precision) depending on genre, domain
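The rule-based approach can be sketched in a few lines. This is a minimal illustration rather than the system evaluated on the slide: the stop-word list is a tiny invented one, `crude_stem` is an assumed stand-in for a real stemmer, and the window is counted in content words, which is one possible reading of "window".

```python
import re

# Hypothetical, deliberately small stop list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "and", "but", "or", "to", "of", "in", "on",
              "is", "was", "be", "it", "i", "this", "that", "some", "out"}


def crude_stem(word):
    """Strip a few common English suffixes (assumed stand-in for a real stemmer)."""
    for suffix in ("ing", "es", "s", "ed"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def label_given_new(text, window=20):
    """Label each content word 'given' if its stem occurred among the
    previous `window` content words, and 'new' otherwise."""
    tokens = re.findall(r"[a-z']+", text.lower())
    content = [t for t in tokens if t not in STOP_WORDS]
    labels = []
    recent = []  # stems of the content words seen so far, oldest first
    for word in content:
        stem = crude_stem(word)
        start = max(0, len(recent) - window)
        labels.append((word, "given" if stem in recent[start:] else "new"))
        recent.append(stem)
    return labels


labels = label_given_new(
    "Some wildcats bite, but this wildcat turned out to be a sweetheart."
)
```

Because the labeler consults only the preceding text, it models what has been evoked in the discourse and knows nothing about what the hearer already believes.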
Boston Directions Corpus (Hirschberg & Nakatani ’96) • Experimental Design • 12 speakers: 4 used • Spontaneous and read versions of 9 direction-giving tasks • Corpus: 50 m read; 67 m spon • Labeling – Prosodic: ToBI intonational labeling – Discourse: Grosz & Sidner – Given/new (Prince ’92), grammatical function, p.o.s., …
Boston Directions Corpus: Describe how to get to MIT from Harvard
d1: dsp1: step 1: enter and get token
    first enter the Harvard Square T stop and buy a token
d2: dsp2: inbound on red line
    then proceed to get on the inbound um Red Line uh subway
dp3: dsp3: take subway from hs, to cs to ks
    and take the subway from Harvard Square to Central Square and then to Kendall Square
dp4: dsp4: get off T.
    then get off the T
Hearer and Discourse Given/New Labeling [figure: labeled transcript; surviving text fragments: “first enter”, “then proceed to get on”, “and take”]
What could we do with this labeled data? • Can we predict given/new? • Can we predict what will be accented and what will be deaccented?
Does Given/New Status Predict Deaccenting?
NPa          HG      HI      HN      DG      DN
Deaccented   37.1%   53.9%   26.2%   43.3%   38.8%
Total        1009    406     130     596     950
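The percentages are easier to compare once converted to approximate counts. The sketch below assumes that "Total" is the number of NPs in each category and that each percentage is the share of those NPs that were deaccented; this reading of the flattened slide table is an assumption, as is the expansion of the labels to hearer-given, hearer-inferrable, hearer-new, discourse-given, and discourse-new.

```python
# Figures copied from the slide; the row/column reading is an assumption.
deaccented_pct = {"HG": 37.1, "HI": 53.9, "HN": 26.2, "DG": 43.3, "DN": 38.8}
total_nps = {"HG": 1009, "HI": 406, "HN": 130, "DG": 596, "DN": 950}


def approx_deaccented_counts(pct, totals):
    """Estimate how many NPs per category were deaccented (rounded)."""
    return {label: round(pct[label] / 100 * totals[label]) for label in pct}


counts = approx_deaccented_counts(deaccented_pct, total_nps)
```

On this reading, well under half of the discourse-given NPs are deaccented and more than a third of the discourse-new NPs are, which is why the slide treats "given items are deaccented" as an oversimplification.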
What else might be at work? • Given/new and grammatical function • Hypothesis: how discourse entities are evoked in a discourse influences how ‘given’ they are • E. g. , How might grammatical function and surface position interact with the accentuation of ‘given’ items? • Cases: – X has not been mentioned in the prior context – X has been mentioned, with the same grammatical function/surface position – X has been mentioned but with a different grammatical function/surface position
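The three cases can be stated as a small classifier. This is a sketch under stated assumptions: grammatical function and surface position are collapsed into a single role label such as "subj" or "d-obj", the function name and return strings are invented for illustration, and an entity that was previously mentioned in several roles is treated as "same role" if any prior role matches.

```python
def mention_case(entity, role, prior_mentions):
    """Classify a target mention against the prior context.

    prior_mentions: list of (entity, role) pairs taken from the context
    """
    prior_roles = {r for e, r in prior_mentions if e == entity}
    if not prior_roles:
        return "not mentioned"
    if role in prior_roles:
        return "mentioned, same role"
    return "mentioned, different role"


# Context: "The triangle touches the cylinder."
context = [("triangle", "subj"), ("cylinder", "d-obj")]

# Target: "The rectangle touches the triangle."
rectangle_case = mention_case("rectangle", "subj", context)
triangle_case = mention_case("triangle", "d-obj", context)
```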
Experimental Design • Major problem: – How to elicit ‘spontaneous’ productions while varying desired phenomena systematically? – Key: simple variations and actions can capitalize upon natural tendency to associate grammatical functions with particular thematic roles for a given set of verbs
[Figure slides: scenes built from five shapes; only the shape labels survive, listed in extraction order]
(untitled): Rectangle, Triangle, Cylinder, Octagon, Diamond
Context 1: Rectangle, Triangle, Cylinder, Octagon, Diamond
Context 2: Rectangle, Triangle, Cylinder, Diamond, Octagon
Context 3: Rectangle, Triangle, Cylinder, Octagon, Diamond
Target (A): Triangle, Rectangle, Cylinder, Octagon, Diamond
Target (B): Rectangle, Triangle, Cylinder, Octagon, Diamond
Experimental Conditions • 10 native speakers of standard American English • Subject and experimenter in soundproof booth • Subject told to describe scenes to a confederate outside the booth, who is visible but provides no feedback • 10 practice scenarios • ~20 minutes per subject
Prosodic Analysis • Target turns excised and analyzed by two judges independently for location of pitch accents • Each referring expression rated: accented (2), unsure (1), deaccented (0) • The two ratings are summed into an accentedness score from 0-4 (81% agreement for 0 and 2 scores)
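The scoring scheme amounts to adding the two judges' ratings and averaging within a condition. The sketch below assumes the 0-4 score is that sum, which is the natural reading of the slide but is not spelled out on it; the function names and the ratings in the usage lines are invented for illustration.

```python
ACCENTED, UNSURE, DEACCENTED = 2, 1, 0


def accentedness(judge_a, judge_b):
    """Combine two judges' ratings (each 0, 1, or 2) into a 0-4 score."""
    for rating in (judge_a, judge_b):
        if rating not in (DEACCENTED, UNSURE, ACCENTED):
            raise ValueError(f"rating must be 0, 1, or 2, got {rating!r}")
    return judge_a + judge_b


def mean_accentedness(rating_pairs):
    """Average the combined score over all rated expressions in one condition."""
    scores = [accentedness(a, b) for a, b in rating_pairs]
    return sum(scores) / len(scores)


# Invented ratings for three referring expressions in one condition.
example_pairs = [(ACCENTED, ACCENTED), (DEACCENTED, DEACCENTED), (ACCENTED, UNSURE)]
example_mean = mean_accentedness(example_pairs)
```

A mean near 4 indicates that both judges usually heard an accent, and a mean near 0 that both usually heard none.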
Grammatical Role/Surface Position Accenting [table flattened in extraction; cell order as extracted] CONTEXT GIVEN TARGET D-obj Pp-obj Subj 2.1 3.6 3.2 D-obj 3.3 0.6 1.6 Pp-obj NEW Subj 3.0 1.4 0.7 3.8 --
Findings • In general – Items that differ from context to target in grammatical function or surface position tend to be accented – Items that share grammatical function and surface position tend to be deaccented • But – Subjects tend to be accented more often than objects, even if previously mentioned in the same role – Direct objects and pp-objects tend to be more distinguished from subjects than from one another
How can we explain these observations? • Consider our examples, e.g. (subj, D.O.): The TRIANGLE touches the CYLINDER. The triangle touches the DIAMOND. The triangle touches the OCTAGON. The RECTANGLE touches the TRIANGLE. • An entity may be ‘given’ or ‘new’ wrt the role it plays in the discourse
Given/New Sensitive to the Role the Discourse Entity Plays • E. g. , a discourse entity may retain a given or take on a new thematic role – By the time the target is uttered, ‘triangle’ is established both as a ‘given’ discourse entity and as the discourse topic (or BLC in centering theory) – But this status has been established for ‘triangle’ as agent – What is new, and, perhaps, focused in the target is ‘triangle’s’ new thematic role as patient – the players are the same but the roles are different
Consequences for NLP – Identification of given/new status must be sensitive to more complex model of context (grammatical function/thematic role) – Will this help us predict deaccenting more accurately? – Stay tuned…
Next Class


