
Anaphor Resolution in Norwegian
Gordana Ilic Holen
Institut for lingvistiske fag, Det historisk-filosofiske fakultet, Universitetet i Oslo
g.i.holen@hfstud.uio.no
January 2003, Fefor
Some technical data
- Hovedfagsoppgave (master's thesis), including obligatory courses; a four-semester project
- Aim: to build a system for resolving pronominal anaphors in Norwegian
- Mentor: Janne Bondi Johannessen
- Implementation in Lisp (CLOS)
- To be finished by Christmas 2003
Where did it start?
- Martin Hassel, 2000
  - Made an AR system for the Swedish pronouns han/honom/hans and hon/henne/hennes
- Differences
  - Planning to cover more pronouns
  - A different theoretical background
The Top List
- Han/ham/hans and hun/hennes
  - Among the most used; not ambiguous
- Seg and selv
  - Syntactic solutions
- Den
  - Ambiguous with the determinative den (gule bilen)
The Top Wish List
- De
  - Ambiguous with the determinative de (gule bilene)
  - Problems delimiting the antecedent
- Det
  - Problems in deciding whether det is pronominal
    - det (gule huset)
    - det (regner)
Approach
To be based on
- Mitkov's Anaphora Resolution System, MARS (Mitkov 1996, 1998)
and partially on
- Resolution of Anaphora Procedure, RAP (Lappin & Leass 1994)
Why MARS and RAP
- Both made for English
- MARS: intuitive, fully automated
- RAP: high precision
- Flexible
MARS
- No parsing
- The AR module uses a list of preferences called antecedent indicators
  - Boosting
  - Impeding
- Fully automatic, but not very high precision (60-61%)
MARS: The algorithm
- The text is POS-tagged.
- NPs are extracted by an NP extractor.
- NPs that precede the anaphor (within a two-sentence scope) are located.
- Gender and number constraints are applied.
- Antecedent indicators are applied to the antecedent candidates that agree in gender and number. Each indicator assigns a score of 2, 1, 0 or -1.
- The NP with the highest score is proposed as the antecedent.
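The candidate-selection steps above can be sketched in Python roughly as follows. This is a minimal illustration, not Mitkov's implementation: the `NP` record, its fields, and the function names are hypothetical, and POS tagging, NP extraction, and indicator scoring are assumed to have happened already. The sketch reads "two-sentence scope" as the anaphor's own sentence plus the two before it, and breaks ties in favour of the most recent NP; both choices are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class NP:
    """Hypothetical record for an extracted noun phrase (or the anaphor itself)."""
    text: str
    sentence: int   # index of the sentence containing the NP
    position: int   # token offset in the whole text
    gender: str     # e.g. "masc", "fem", "neut" (labels are illustrative)
    number: str     # "sg" or "pl"
    score: int = 0  # aggregate antecedent-indicator score, computed elsewhere


def candidates(anaphor: NP, nps: list[NP], scope: int = 2) -> list[NP]:
    """Keep NPs that precede the anaphor, lie within `scope` sentences
    before it (or in the same sentence), and agree in gender and number."""
    return [
        cand for cand in nps
        if cand.position < anaphor.position
        and 0 <= anaphor.sentence - cand.sentence <= scope
        and cand.gender == anaphor.gender
        and cand.number == anaphor.number
    ]


def resolve(anaphor: NP, nps: list[NP]) -> NP | None:
    """Propose the highest-scoring candidate; ties go to the later NP."""
    pool = candidates(anaphor, nps)
    if not pool:
        return None
    return max(pool, key=lambda cand: (cand.score, cand.position))
```

For example, given the NPs "Per" (score 2) and "Ola" (score 1) in earlier sentences, `resolve` would propose "Per" for a following "han", while "Kari" would be filtered out by the gender constraint before scores are compared.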
MARS: Antecedent indicators (boosting)
- First noun phrases: +1
- Indicating verbs: +1
- Lexical reiteration: +2 / +1
- Section heading preference: +1
- Collocation match: +2
- Immediate reference: +2
- Sequential instructions: +2
- Term preference: +2
MARS: Antecedent indicators (impeding)
- Indefiniteness: -1
- Prepositional NPs: -1
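As a rough illustration of how the boosting and impeding indicators could combine, the Python sketch below sums the listed scores for each candidate and picks the highest total. The indicator keys, function names, and the candidate format are hypothetical; deciding whether an indicator actually fires for a given NP is not modelled, and since the slides give lexical reiteration as "+2 / +1" without saying when each value applies, the sketch simply exposes both as separate keys.

```python
# Scores copied from the indicator lists; key names are illustrative only.
BOOSTING = {
    "first_np": 1,
    "indicating_verb": 1,
    "lexical_reiteration_strong": 2,  # the "+2" case of lexical reiteration
    "lexical_reiteration_weak": 1,    # the "+1" case of lexical reiteration
    "section_heading": 1,
    "collocation_match": 2,
    "immediate_reference": 2,
    "sequential_instructions": 2,
    "term_preference": 2,
}
IMPEDING = {
    "indefinite": -1,
    "prepositional_np": -1,
}
INDICATORS = {**BOOSTING, **IMPEDING}


def indicator_score(active: set[str]) -> int:
    """Sum the scores of the indicators that fire for one candidate.
    Indicators that do not fire contribute 0."""
    unknown = active - set(INDICATORS)
    if unknown:
        raise ValueError(f"unknown indicators: {sorted(unknown)}")
    return sum(INDICATORS[name] for name in active)


def best_candidate(fired: dict[str, set[str]]) -> str:
    """Return the candidate (keyed by its text) with the highest total score."""
    return max(fired, key=lambda cand: indicator_score(fired[cand]))
```

A definite sentence-initial NP that is also a collocation match would score 1 + 2 = 3, whereas an indefinite NP inside a prepositional phrase would score -2.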
RAP
- A high-precision system (86% of anaphors correctly resolved)
- Originally based on parsed text, but a version without parsing exists (Kennedy and Boguraev, 1996)
- The AR module: salience weighting
RAP: Salience weighting
- Salience factors:
  - Sentence recency: 100
  - Subject emphasis: 80
  - Head noun emphasis: 80
  - Existential emphasis: 70
  - Accusative emphasis: 50
  - Non-adverbial emphasis: 50
  - IO and oblique complement emphasis: 40
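To make the weighting concrete, here is a minimal Python sketch that adds up the salience factors applying to a candidate and ranks candidates by the total. Only the weights come from the list above; the key names and functions are hypothetical, detecting which factors apply (which depends on syntactic analysis) is left to the caller, and any further mechanics of the full RAP procedure are not modelled.

```python
# Weights as listed for RAP's salience factors; key names are illustrative.
SALIENCE = {
    "sentence_recency": 100,
    "subject": 80,
    "head_noun": 80,
    "existential": 70,
    "accusative": 50,
    "non_adverbial": 50,
    "io_oblique": 40,
}


def salience(factors: set[str]) -> int:
    """Total salience of one candidate: the sum of the weights that apply."""
    return sum(SALIENCE[name] for name in factors)


def rank(cands: dict[str, set[str]]) -> list[tuple[str, int]]:
    """Candidates with their salience totals, most salient first."""
    scored = [(text, salience(factors)) for text, factors in cands.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Under these weights, a subject NP in the current sentence that is a head noun outside any adverbial would total 100 + 80 + 80 + 50 = 310, while a direct object with otherwise the same properties would total 100 + 50 + 80 + 50 = 280.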
Modifications
- Both systems exist in versions with and without parsing, so the question of whether to use parsing is left open.
- Start by using the Oslo Corpus for training and adjustment.
- Experiment with the antecedent indicators and adjust them for Norwegian.
- Try to combine them with RAP's salience factors.
Open for suggestions
g.i.holen@hfstud.uio.no