Multi-word Expressions and CG (Constraint Grammar): How should MWEs be described?

Questions discussed in a workshop on MWEs – ACL 2007

• Is it sufficient to use purely statistical methods for the extraction of MWEs from corpora, or is it necessary to harness human knowledge and linguistic insights?
Comment: Underlying the question is a fundamental misunderstanding of what languages are about. And what is wrong with knowledge and linguistic insight?

• Is fully automatic MWE extraction feasible, or will manual validation always be required?
Comment: Hopefully yes for both.

• What is the nature of MWEs, and how can they be defined formally?
Comment:
- At least they are not the same as collocations.
- There is no one-to-one mapping of members in translation.
- They point to a single semantic concept.

• To what extent can definitions and extraction procedures be generalised to other languages, other text types and other types of MWEs?
Comment: I think they are generalizable.

• Can and should we distinguish subtypes of MWEs for NLP applications?
Comment: Definitely yes. They often comprise separate POS categories.

How and where to describe MWEs?

Two categories of MWEs:
- frozen clusters of words
- clusters of words, the members of which may inflect

Frozen clusters of words:
- may be described in the tokenizer and analyzed as a single unit

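Joining a frozen cluster in the tokenizer can be pictured as a longest-match lookup. The following is a minimal Python sketch, not SALAMA's tokenizer: the function name `merge_frozen` and the contents of `FROZEN_MWES` are assumptions made for illustration (the joined form `baada_ya` does occur as a single lemma in a rule later in the slides).

```python
# Hypothetical lexicon of frozen clusters, stored as lower-case tuples.
FROZEN_MWES = [("baada", "ya")]


def merge_frozen(tokens, frozen=FROZEN_MWES):
    """Join frozen clusters into single underscore-linked tokens."""
    # Try longer clusters first so that the longest match wins.
    by_length = sorted(frozen, key=len, reverse=True)
    out, i = [], 0
    while i < len(tokens):
        for cluster in by_length:
            window = tuple(t.lower() for t in tokens[i:i + len(cluster)])
            if window == cluster:
                out.append("_".join(tokens[i:i + len(cluster)]))
                i += len(cluster)
                break
        else:
            # No frozen cluster starts here: keep the token unchanged.
            out.append(tokens[i])
            i += 1
    return out


print(merge_frozen(["Baada", "ya", "dhiki", "faraji"]))
# ['Baada_ya', 'dhiki', 'faraji']
```

Because the lookup compares surface forms only, it cannot cope with members that inflect, which is why such clusters need a later processing stage.
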
How and where to describe MWEs?

Inflecting clusters of words:
- cannot be described in the tokenizer
- must be described after analysis, when all necessary linguistic information is available

One possible solution:
- describe frozen MWEs in the tokenizer
- describe inflecting MWEs after morphological analysis
This was the earlier solution in Swahili Language Manager (SALAMA).

Another solution:
- describe all MWEs after morphological analysis
- exceptions are a few fully lexicalized structures that are written as separate words
This solution is applied in the current SALAMA.

In describing inflecting MWEs, the following requirements apply:
- each member must be described
- the relative location of each member must be described
- other words and punctuation marks in between members must be allowed
- manipulation of the linguistic information (i.e. tags) must be possible, because the whole cluster will be described anew
- it must be possible to isolate the newly described cluster and treat it as a single lexical unit

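The first three requirements (members, their relative order, and tolerated intervening material) can be illustrated with a small matcher over lemmas. This is a hedged Python sketch, not the CG-2 mechanism itself; `find_members`, the `max_gap` parameter and the comma in the sample input are all invented for the example.

```python
def find_members(lemmas, members, max_gap=2):
    """Return the positions of the MWE members in order, or None.

    Up to max_gap other tokens (words or punctuation) may stand between
    two consecutive members. The search is greedy: it takes the first
    candidate for each member and does not backtrack.
    """
    for start, lemma in enumerate(lemmas):
        if lemma != members[0]:
            continue
        positions = [start]
        for member in members[1:]:
            prev = positions[-1]
            window = range(prev + 1, min(prev + 2 + max_gap, len(lemmas)))
            hit = next((j for j in window if lemmas[j] == member), None)
            if hit is None:
                break
            positions.append(hit)
        else:
            # Every member was found within the allowed distance.
            return positions
    return None


# Lemmas of an invented token sequence with punctuation between members.
lemmas = ["kubali", ",", "shingo", "upande"]
print(find_members(lemmas, ["kubali", "shingo", "upande"]))             # [0, 2, 3]
print(find_members(lemmas, ["kubali", "shingo", "upande"], max_gap=0))  # None
```

Matching on lemmas rather than surface forms is what lets a single description cover every inflected form of a member.
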
CG in describing MWEs

In SALAMA, CG-2 was used for describing MWEs.

Phase 1. Analyze text:
ameikubali
    "kubali" V 1/2-SG3-SP VFIN { he/she } PERF:me 9/10-SG-OBJ { it } [kubali] { accept } SVO AR
shingo
    "shingo" N 9/10-0-SG { a/the } { neck }
upande
    "upande" ADV { aside }

Phase 2. Identify the MWE and describe its structure:
ameikubali
    "kubali" V 1/2-SG3-SP VFIN { he/she } PERF:me 9/10-SG-OBJ { it } [kubali] { accept } SVO AR
shingo
    "shingo" N 9/10-0-SG { a/the } { neck }
upande
    "upande" <<IDIOM { accept unwillingly }
Note: Only the last member is affected, and the new lexical gloss is attached to it.

Phase 3. Modify the other members of the MWE:
ameikubali
    "kubali" V CAP 1/2-SG3-SP VFIN { he/she } PERF:me 9/10-SG-OBJ { it } [kubali] IDIOM-V>> SVO AR
shingo
    "shingo" IDIOM<>
upande
    "upande" <<IDIOM { accept unwillingly }
Note: The English gloss is rewritten, but the necessary linguistic information in the verb is retained.

Phase 4. Isolate the MWE as a single lexical unit:
("kubali_shingo_upande" V CAP 1/2-SG3-SP VFIN { he/she } PERF:me 9/10-SG-OBJ { it } SVO AR IDIOM-V>> { accept unwillingly } )

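Phase 4 amounts to collapsing the marked members into one reading. The Python sketch below shows that step under simplifying assumptions: readings are plain dictionaries, the tag lists are abbreviated, and `isolate` is a hypothetical helper rather than part of SALAMA or CG-2.

```python
def isolate(readings, positions, gloss):
    """Merge the marked members into a single lexical unit.

    The joined lemma is built from all members, the tags are copied from
    the first member (the verb in the slide example), and the individual
    glosses are replaced by the gloss of the whole expression.
    """
    head = readings[positions[0]]
    return {
        "lemma": "_".join(readings[p]["lemma"] for p in positions),
        "tags": list(head["tags"]),
        "gloss": gloss,
    }


# Simplified readings for "ameikubali shingo upande".
readings = [
    {"lemma": "kubali", "tags": ["V", "VFIN", "PERF:me", "SVO", "AR"]},
    {"lemma": "shingo", "tags": ["N"]},
    {"lemma": "upande", "tags": ["ADV"]},
]
unit = isolate(readings, [0, 1, 2], "accept unwillingly")
print(unit["lemma"])  # kubali_shingo_upande
print(unit["gloss"])  # accept unwillingly
```

Keeping the verb's tags intact is what later allows the English surface form to carry the right subject, tense and object.
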
CG in describing MWEs

Phase 5. Surface form in English:
(V CAP 1/2-SG3-SP VFIN { he/she } PERF:me 9/10-SG-OBJ SVO AR IDIOM-V>> { has accepted { it } unwillingly } )

Phase 6.
he/she has accepted it unwillingly

Note 1: The surface form is written using lexical and linguistic information.
Note 2: The order of words, and their inclusion/exclusion, is controlled by re-ordering rules.

Problematic cases

Original analysis:
amechukua
    "chukua" V 1/2-SG3-SP VFIN { he/she } PERF:me [chukua] { take } SVO
hatua
    "hatua" N 9/10-0-PL { step } AR
tatu
    "tatu" NUM 9/10-PL CARD { three }

Marking the idiom (wrong):
amechukua
    "chukua" V 1/2-SG3-SP VFIN { he/she } PERF:me SVO IDIOM-V>
hatua
    "hatua" <IDIOM { take action }
tatu
    "tatu" NUM 9/10-PL CARD { three }

Safe cases

Safe case:
amepiga
    "piga" V 1/2-SG3-SP VFIN { he/she } PERF:me [piga] { hit } SVO
hatua
    "hatua" N 9/10-0-SG { a/the } { step } AR

Marking the idiom:
amepiga
    "piga" V 1/2-SG3-SP VFIN { he/she } PERF:me SVO IDIOM-V>
hatua
    "hatua" <IDIOM { advance }

he/she has advanced

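The slides do not state why the first marking is wrong. One plausible reading is that the numeral tatu modifies hatua, so the noun cannot be absorbed into the idiom. Under that assumption, a marking step would need a guard on the following word. The Python sketch below illustrates such a guard; `idiom_start` and the simplified readings are hypothetical, and a real grammar would likely need further conditions.

```python
def idiom_start(readings, verb, noun):
    """Return the index of the verb if verb + noun may be marked as an idiom."""
    for i in range(len(readings) - 1):
        if readings[i]["lemma"] != verb or readings[i + 1]["lemma"] != noun:
            continue
        following = readings[i + 2] if i + 2 < len(readings) else None
        # Assumed guard: a numeral right after the noun blocks the idiom.
        if following is not None and "NUM" in following["tags"]:
            continue
        return i
    return None


problematic = [
    {"lemma": "chukua", "tags": ["V"]},
    {"lemma": "hatua", "tags": ["N"]},
    {"lemma": "tatu", "tags": ["NUM", "CARD"]},
]
safe = [
    {"lemma": "piga", "tags": ["V"]},
    {"lemma": "hatua", "tags": ["N"]},
]
print(idiom_start(problematic, "chukua", "hatua"))  # None
print(idiom_start(safe, "piga", "hatua"))           # 0
```
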
Types of MWEs

There are several types of MWEs, and each needs to be treated in a specific way.

Idiomatic expressions:
- they often include a verb as a member
- a large number of surface forms
Alipiga kinanda.
REPLACE (<IDIOM { play piano }) TARGET ("kinanda") (-1 ([piga])) ;
"<*alipiga>"
    "piga_kinanda" V 1/2-SG3-SP VFIN { he/she } PAST SVO ACT IDIOM-V
"<kinanda>"
    { play piano }

Nouns with genitive structure:
- number of forms limited, often sg and pl
suala la jinsia
masuala ya jinsia
REPLACE (<<MW { : gender issue }) TARGET ("jinsia") (-2 ("suala")) (-1 GEN-CON) ;
"<suala>"
    "suala_la_jinsia" N 5/6-SG { the } AR MW-N
"<la>"
"<jinsia>"
    { : gender issue }
"<masuala>"
    "suala_la_jinsia" N 5/6-PL { the } AR MW-N
"<ya>"
"<jinsia>"
    { : gender issue }

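The genitive rule covers both the singular and the plural because its context tests refer to a lemma ("suala") and a tag (GEN-CON), not to surface forms. The Python sketch below imitates that behaviour for illustration only. It is not a CG-2 implementation; `context_holds` and `apply_replace` are hypothetical helpers, and the lemma "a" given to la/ya and the tag lists are assumptions.

```python
def context_holds(cohorts, i, context):
    """Check every (offset, key, value) test relative to position i."""
    for offset, key, value in context:
        j = i + offset
        if not 0 <= j < len(cohorts):
            return False
        other = cohorts[j]
        if key == "lemma" and other["lemma"] != value:
            return False
        if key == "tag" and value not in other["tags"]:
            return False
    return True


def apply_replace(cohorts, target, context, new_tags):
    """Return a copy in which the target's tags are replaced where the context holds."""
    out = [dict(c) for c in cohorts]
    for i, cohort in enumerate(cohorts):
        if cohort["lemma"] == target and context_holds(cohorts, i, context):
            out[i]["tags"] = list(new_tags)
    return out


RULE = {
    "target": "jinsia",
    "context": [(-2, "lemma", "suala"), (-1, "tag", "GEN-CON")],
    "new_tags": ["<<MW", "{ : gender issue }"],
}

singular = [
    {"form": "suala", "lemma": "suala", "tags": ["N", "5/6-SG"]},
    {"form": "la", "lemma": "a", "tags": ["GEN-CON"]},
    {"form": "jinsia", "lemma": "jinsia", "tags": ["N"]},
]
plural = [
    {"form": "masuala", "lemma": "suala", "tags": ["N", "5/6-PL"]},
    {"form": "ya", "lemma": "a", "tags": ["GEN-CON"]},
    {"form": "jinsia", "lemma": "jinsia", "tags": ["N"]},
]
for phrase in (singular, plural):
    print(apply_replace(phrase, **RULE)[2]["tags"])
```

Both calls print the same replaced tag list, so one rule is enough for the two surface forms.
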
Types of MWEs

Adjectival expressions with relative structure:
- number of forms limited by the number of noun classes
mtu mwenye akili
REPLACE (ADJ <MW { clever , cute }) TARGET ("akili") (-1 ("enye")) (NOT 0 MW) ;
"<mtu>"
    "mtu" N 1/2-SG { the } { man }
"<mwenye>"
    "enye_akili" MW>
"<akili>"
    ADJ { clever , cute }

Adjectival expressions with relative structure:
- number of forms limited by the number of noun classes
- often embedded in the verb structure
tendo lililohitimishwa vibaya
REPLACE (ADJ <MW { illegitimate }) TARGET ("vibaya") (-1 ("hitimishwa") + REL) (NOT 0 MW) ;
"<tendo>"
    "tendo" N 5/6-SG { the } { act }
"<lililohitimishwa>"
    "hitimishwa_vibaya" MW>
"<vibaya>"
    ADJ { illegitimate }

Adverbial expressions with genitive structure:
- number of forms limited
kwa bahati mbaya
REPLACE (ADV <<MW { unfortunately }) TARGET ("baya") (-2 ("kwa")) (-1 ("bahati")) ;
"<kwa>"
    "kwa_bahati_baya" MW>>
"<bahati>"
"<mbaya>"
    ADV { unfortunately }

Proper names with several members:
- fixed form
Wizara ya Mawasiliano na Uchukuzi
REPLACE (<<<<MW { *ministry of *communication et *transport }) TARGET ("uchukuzi") (-4 ("wizara")) (-3 ("ya")) (-2 ("mawasiliano")) (-1 ("na")) ;
"<*wizara>"
    "wizara_ya_mawasiliano_na_uchukuzi" N 9/10 SG { the } AR MW-N
"<ya>"
"<*mawasiliano>"
"<na>"
"<*uchukuzi>"
    { *ministry of *communication et *transport }

Proverbs:
- ‘fixed’ form
- one rule for different variants
Baada ya dhiki faragha.
Baada ya dhiki faraji.
REPLACE (<<PROVERB { *after trouble there is relief }) TARGET ("faragha") OR ("faraji") (-2 ("baada_ya")) (-1 ("dhiki")) ;
"*baada_ya_dhiki_faragha" PROVERB>> { *after trouble there is relief }
"*baada_ya_dhiki_faraji" PROVERB>> { *after trouble there is relief }

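The OR in the proverb rule's TARGET lets one rule accept either final word while the context stays the same. A rough Python analogue is a set-membership test on the last lemma, as sketched below. `proverb_lemma` is a hypothetical helper, and the `*` capital-letter marker of the slide output is left out for simplicity.

```python
def proverb_lemma(lemmas, variants, context):
    """Return the joined proverb lemma, or None if nothing matches.

    The final word must be one of the variants, and the lemmas directly
    before it must equal the context list.
    """
    n = len(context)
    for i in range(n, len(lemmas)):
        if lemmas[i] in variants and lemmas[i - n:i] == context:
            return "_".join(lemmas[i - n:i + 1])
    return None


VARIANTS = {"faragha", "faraji"}
CONTEXT = ["baada_ya", "dhiki"]

print(proverb_lemma(["baada_ya", "dhiki", "faragha"], VARIANTS, CONTEXT))
# baada_ya_dhiki_faragha
print(proverb_lemma(["baada_ya", "dhiki", "faraji"], VARIANTS, CONTEXT))
# baada_ya_dhiki_faraji
```
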
MWEs in dictionary compilation

MWEs as separate dictionary entries:
{tia} V [tia] { put into, pour into, bring about, cause } 296
{tia_akili} V IDIOM-V { take note of } 1
[akili] taz. [tia_akili] V IDIOM-V { take note of } 1

{afya} N 9/10 { health, sound condition } AR 1226
[afya]a taz. [bwana_afya] MW> N 9/6 { health officer } 10
[afya]a taz. [enye_afya] MW> ADJ { bonny } 17
[afya]a taz. [enye_nguvu_na_afya] MW>>> ADJ { hale } 1

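In the entries above, an MWE is listed under its own joined lemma, and a later member (akili, afya) carries a "taz." cross-reference to it ("taz." presumably abbreviates Swahili tazama, "see"). The Python sketch below shows one way such cross-references could be derived from the joined lemmas. It is an assumption-laden illustration: `cross_references` and the `FUNCTION_WORDS` stoplist are invented, and the slides do not show whether members such as nguvu or na also receive a cross-reference.

```python
from collections import defaultdict

# Hypothetical stoplist: connectors that should not get their own cross-reference.
FUNCTION_WORDS = {"na", "ya", "la", "wa"}


def cross_references(mwe_lemmas, skip=FUNCTION_WORDS):
    """Map each non-initial member of an MWE to the MWE lemmas that contain it."""
    refs = defaultdict(list)
    for mwe in mwe_lemmas:
        members = mwe.split("_")
        # The first member is skipped: the MWE sorts next to it anyway.
        for member in members[1:]:
            if member not in skip:
                refs[member].append(mwe)
    return dict(refs)


refs = cross_references(
    ["tia_akili", "bwana_afya", "enye_afya", "enye_nguvu_na_afya"]
)
for headword in sorted(refs):
    for mwe in refs[headword]:
        print(f"[{headword}] taz. [{mwe}]")
```
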
MWEs in dictionary compilation

MWEs with use examples:
{piga} V (piga) { hit, beat } 647
{piga picha} V IDIOM-V { photograph } 40
[piga picha] <ALA> Ikulu kunywa chai na kupiga [piga picha] picha na Rais Mkapa (the State House to drink tea and to photograph and President Mkapa)
[piga picha] <ALA> wapige [piga picha] picha, alionekana kugoma (they should photograph, he/she was seen to boycott)
[piga picha] <DWE> Au kumpiga [piga picha] picha au hata kupeana naye (Or to photograph or even to give each other with him/her)
[piga picha] <DWE> kutoka Ujerumani, walijitahidi kupiga [piga picha] picha za ukumbusho na kiongozi wao (from Germany, they made an effort to photograph the commemoration and their leader)

{piga ramli} V IDIOM-V { divine } 4
[piga ramli] <KIO> anakwenda kwa mganga ili kupiga [piga ramli] ramli na kuongeza imani za ushirikina (he/she goes to the medical person in order to divine and to increase the faith in superstition)
[piga ramli] <KIO> ikambidi amtume mtaalam wa kupiga [piga ramli] ramli kuhusu nyota hiyo (he/she was obliged to send to him/her the expert of divining concerning this star)
[piga ramli] <KIO> kwenda kwa mganga wa kupiga [piga ramli] ramli, hujui kuwa imani ya (going to the medical person of divining, you do not know that the faith of)
[piga ramli] <RAI> kuachana na mtindo wa kupiga [piga ramli] ramli (to leave with the style of divining)

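The use examples place the MWE lemma in brackets between the verb form and the noun, with a few words of context on either side. A minimal Python sketch of how such lines might be produced from lemmatized text follows. `use_examples`, the `width` parameter and the hand-assigned lemmas in the sample are assumptions, not the SALAMA procedure.

```python
def use_examples(tokens, verb, noun, width=3):
    """Return one context line per adjacent verb + noun hit.

    tokens is a list of (form, lemma) pairs. The marker "[verb noun]" is
    inserted between the two members, as in the dictionary examples.
    """
    marker = f"[{verb} {noun}]"
    examples = []
    for i in range(len(tokens) - 1):
        if tokens[i][1] == verb and tokens[i + 1][1] == noun:
            left = [form for form, _ in tokens[max(0, i - width):i + 1]]
            right = [form for form, _ in tokens[i + 1:i + 2 + width]]
            examples.append(" ".join(left + [marker] + right))
    return examples


# Word forms from one of the examples above; the lemmas are assigned by hand.
tokens = [
    ("wapige", "piga"),
    ("picha", "picha"),
    ("alionekana", "onekana"),
    ("kugoma", "goma"),
]
hits = use_examples(tokens, "piga", "picha")
print(len(hits))  # 1
print(hits[0])    # wapige [piga picha] picha alionekana kugoma
```

The number of hits gives a frequency count of the kind printed after each dictionary entry.
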
Conclusion

A detailed description of MWEs is necessary in at least two applications:
- machine translation
- automatic dictionary compilation

Improvements needed in the CG parser:
- the possibility of ordering REPLACE rules
- more possibilities for controlling the deletion and/or replacement of morphemes