a51d569b033f5f0dfda331312ab0c713.ppt
- Number of slides: 63
A Hybrid Machine Translation System from Turkish to English
Ferhan Türe
MSc Thesis, Sabancı University
Advisor: Kemal Oflazer
Introduction
• Goal: Create a machine translation system that translates Turkish text into English text
  • Turkish has an agglutinative morphology
    • ev+im+de+ki+ne: to the one at my home
  • Turkish has free word order
    • Ben eve gittim, Eve gittim ben, Gittim ben eve, ...: I went to the house
• Idea: Write rules to translate an analyzed Turkish sentence into English
Outline
• Machine Translation (MT)
  • Motivation
  • Challenges in MT
  • History of MT
  • Classical Approaches to MT
• The Hybrid Approach
  • Challenges
  • Translation Steps
    • Analysis and Preprocessing
    • Transfer and Generation
    • Decoding
  • Evaluation Methods
  • Experimental Results
  • Examples
• Conclusions
Machine Translation
• Given: Input text s in source language S
• Find: A well-formed text in target language T that is equivalent to s
Machine Translation (MT)
• Any system using an electronic computer to perform translation
Motivation
• Satisfy increasing demand for translation
  • 100 languages with 5 million or more native speakers
• Reduce the cost and effort of human translation
  • 13% of EU budget
  • weeks vs. minutes
• Make information available to more people in less time
  • translation of web sites automatically
• Explore the limits of computers’ ability and linguistic challenges
Challenges in MT
• Morphological issues
  • Each language has a different morphology
• Syntactical issues
  • Word order in sentences and noun phrases
  • Language-specific features (narrative past tense in Turkish, distinguishing feminine and masculine nouns)
• Semantical issues
  • Word sense ambiguities
    • bank: geographical term OR financial institution?
  • Idiomatic phrases
    • kafa çekmek: pull head OR drink alcohol?
History of MT
• Idea by Warren Weaver in 1945
• 1950s: Russian-English MT research during the Cold War between the US and the USSR
• 1960s: Funding for research stopped due to failure
• Mid-1970s
  • MÉTÉO: English-French MT in Canada
  • Systran and Eurotra: Multi-lingual MT in Europe
  • TITRAN and MU Project at Kyoto University, Japan
• After the 90s
  • Statistical MT: Use statistics and large amounts of data
MT between English and Turkish
• Morphological analyzer
  • Oflazer, 1993.
• Morphological disambiguator
  • Oflazer & Kuruöz, 1994.
  • Hakkani-Tür et al., 2000.
  • Yuret & Türe, 2006.
• English-to-Turkish MT
  • Sagay, 1981.
  • Hakkani et al., 1998.
  • Keyder Turhan, 1997.
• No Turkish-to-English system
Classical Approaches to MT
Vauquois Triangle
[Figure: Vauquois triangle with Analysis and Generation along its two sides, Transfer across, and Interlingua at the apex; levels labeled Lexical, Syntactic, and Semantic]
Word-by-word Translation
Source sentence → Bilingual Dictionary → Target sentence
Source sentence: Ali evdeki kediyi çok sevmez
Translation: Ali home cat very like
Reference: Ali does not like the cat at home very much
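The word-by-word scheme can be sketched as a plain dictionary lookup. This is a minimal illustration, not the thesis code: the dictionary holds only the five entries from the slide's example, and the function name `translate_word_by_word` and the pass-through of unknown words are assumptions made here.

```python
# Toy bilingual dictionary; entries come from the slide's example only.
BILINGUAL_DICT = {
    "Ali": "Ali",
    "evdeki": "home",
    "kediyi": "cat",
    "çok": "very",
    "sevmez": "like",
}


def translate_word_by_word(sentence: str) -> str:
    """Replace each source word with its dictionary entry, keeping source order."""
    # Assumption of this sketch: words missing from the dictionary pass through unchanged.
    return " ".join(BILINGUAL_DICT.get(word, word) for word in sentence.split())


print(translate_word_by_word("Ali evdeki kediyi çok sevmez"))
# prints: Ali home cat very like
```

The output keeps Turkish word order and loses the negation and case information, which is why it falls so far short of the reference.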
Direct Translation
Source sentence → Morphological Analyzer → Lexical Transfer → Local Reordering → Target sentence

Source:    Ali | evde     | -ki      | kediyi   | çok              | sevmez
Analysis:  Ali | ev+Loc   | Rel+Adj  | kedi+Acc | çok+Adv          | sev+Neg+Present
Lexical:   Ali | home+Loc | at+Adj   | cat+Acc  | very much+Adv    | like+Neg+Present
Reorder:   Ali | at+Adj   | home+Loc | cat+Acc  | like+Neg+Present | very much+Adv
Generate:  Ali | at       | home     | cat      | not like         | very much
Transfer-based Translation
Source sentence → (SL Grammar) → SL Representation → (Transfer rules / Dictionary) → TL Representation → (TL Grammar) → Target sentence
Transfer-based Translation
Source sentence → (SL Grammar) → SL Representation → (Transfer rules / Dictionary) → TL Representation → (TL Grammar) → Target sentence
Example: mavi evin duvarı (A mavi, N ev+in, N duvar+ı) → the wall of the blue house (Det the, N wall, Prep of, Det the, A blue, N house)
[Figure: source and target NP parse trees for the example]
Interlingual Translation
Source sentence → Analysis → Interlingua → Generation → Target sentence
Source: Ali evdeki kediyi çok sevmez
Interlingua: ¬holds(in_general, like(subj: Ali, obj: cat(at: home), degree: very much))
Translation: Ali does not like the cat at home very much
Statistical MT
Given a Turkish sentence t, find the English sentence e that is the “most likely” translation of t
Statistical MT
• Translation Model P(t|e), from Turkish-English aligned text: whether an English text e is a good translation of a Turkish text t
• Language Model P(e), from English text: whether an English text e is well-formed English or not
• Decoding: argmax_e P(e) × P(t|e)
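The decoding objective follows from Bayes' rule. A short derivation in the slide's symbols t and e (the name \hat{e} for the chosen translation is introduced here):

```latex
\hat{e} = \arg\max_{e} P(e \mid t)
        = \arg\max_{e} \frac{P(t \mid e)\, P(e)}{P(t)}
        = \arg\max_{e} P(e)\, P(t \mid e)
```

The denominator P(t) does not depend on e, so it can be dropped without changing the maximizer.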
Statistical MT
Source: Ali çok açtı

Translation e        LM Score P(e)    TM Score P(t|e)    P(t|e)×P(e)
I have a book        0.9              0.2                0.18
Hungry Ali be so     0.1              0.8                0.08
Ali was so hungry    0.8              0.8                0.64
...

Best translation: Ali was so hungry
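The table can be replayed in a few lines. This is a sketch over the slide's three candidates only. The function name `decode` is an assumption, and the third row's second factor (0.8) is inferred from its product 0.64, since only one 0.8 survives in the extracted slide.

```python
# (candidate e, P(e), P(t|e)) for the source sentence "Ali çok açtı".
CANDIDATES = [
    ("I have a book", 0.9, 0.2),
    ("Hungry Ali be so", 0.1, 0.8),
    ("Ali was so hungry", 0.8, 0.8),  # second 0.8 inferred from the product 0.64
]


def decode(candidates):
    """Return the candidate that maximizes P(e) * P(t|e)."""
    best = max(candidates, key=lambda row: row[1] * row[2])
    return best[0]


print(decode(CANDIDATES))
# prints: Ali was so hungry
```

A real decoder searches a very large hypothesis space rather than a fixed list, and it usually works with log probabilities to avoid underflow.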
The Hybrid Approach
Why Hybrid?
Classical transfer-based approaches are good at
• representing the structural differences between the source and target languages,
and statistical methods are good at
• extracting knowledge from large amounts of data about how well-formed a sentence is or how “meaningful” a translation is.
Challenges: Morphological differences
Avrupalılaştıramadıklarımızdanmışsınız
You were among the ones who we were not able to cause to become European
• Extreme case of a word in an agglutinative language
• Each Turkish morpheme corresponds to one or more words in English
Challenges: Morphological differences
arkadaşımdakiler
the ones at my friend
Challenges: Structural differences
dinle+miş+sin             (someone told me that) you listened
dinle+di+n                you listened
dinle+t+ti+n              you made (someone) listen
dinle+t+tir+di+n          you had (someone) make (someone) listen
dinle+r+im                I listen
dinle+r+di+m              I used to listen
dinle+t+ebil+ir+miş+im    ???
Challenges: Structural differences
Adam evde kitap okuyordu (SUBJ ADJCT OBJ V) → The man was reading a book at home (SUBJ V OBJ ADJCT)
mavi kitap (AP NP) → blue book (AP NP)
evdeki kitap (AP NP) → the book at home (NP AP)
kitabımın kapağı (NP1 NP2) → my book’s cover (NP1 NP2)
arkadaşımın yüzünden (NP1 NP2) → because of my friend (NP2 NP1)
Challenges: Ambiguities
koyun
1. sheep (or bosom)
2. your bay
3. your dark (one)
4. of the bay
5. put!
Challenges: Ambiguities
silahını evine koy
1. put your gun to your home
2. put your gun to his home
3. put his gun to your home
4. put his gun to his home
5. put your gun to her home
6. put her gun to your home
7. put her gun to her home
...
Challenges: Ambiguities
kitabın kapağı
1. the book’s cover
2. book’s cover
3. the cover of the book
Challenges: Ambiguities
ev+Dative (gitti)      (went) to the house
masa+Dative (çıktı)    (jumped) on the table
adam+Dative (baktı)    (looked) at the man
Challenges
• Morphological differences: Use morphological analysis on the Turkish side and generation on the English side
• Structural differences: Transfer rules can represent such transformations
• Ambiguities: An English language model can determine the most probable translation statistically
The Avenue Transfer System
• Avenue Project initiated by CMU LTI Group
• Grammar formalism, which allows one to manually create a parallel grammar between two languages, and
• Transfer engine, which transfers the source sentence into possible target sentence(s) using this parallel grammar
Overview of Our Approach
Turkish sentence → Morphological Analyzer → Analysis → Preprocessor → Lattice → Avenue Transfer Engine (using Transfer rules) → English translations → English Language Model → Most probable English translation
I. Analysis and Preprocessing
Morphological analyses of each word: a set of features describing the structural properties of the word
Example sentence: adam evde oğlunu yendi
I. Analysis and Preprocessing
Lattice representation of the sentence
[Figure: lattice over states 0 to 6 whose arcs carry the alternative analyses: ada+N+P1Sg / adam+N+PNon; ev+N+Loc; oğul+N+P2Sg / oğul+N+P3Sg; ye+V +Pass+V+Past / yen+N Zero+V+Past / yen+V+Past]
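One way to picture the lattice is as a sequence of slots, one per word, each holding that word's alternative analyses. This is a simplified sketch rather than the preprocessor's actual data structure: the slide's lattice also splits some analyses into several arcs, which is flattened here, and the names `LATTICE` and `enumerate_paths` are assumptions.

```python
from itertools import product

# One slot per word of "adam evde oğlunu yendi"; labels follow the slide.
LATTICE = [
    ["ada+N+P1Sg", "adam+N+PNon"],                                 # adam
    ["ev+N+Loc"],                                                  # evde
    ["oğul+N+P2Sg", "oğul+N+P3Sg"],                                # oğlunu
    ["ye+V +Pass+V+Past", "yen+N Zero+V+Past", "yen+V+Past"],      # yendi
]


def enumerate_paths(lattice):
    """Return every combination of one analysis per word."""
    return [list(path) for path in product(*lattice)]


paths = enumerate_paths(LATTICE)
print(len(paths))  # 2 * 1 * 2 * 3 = 12
```

Even this four-word sentence yields twelve readings, which is why the later steps need both rule constraints and a language model to prune them.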
I. Analysis and Preprocessing
Representation of IGs (inflectional groups)
II. Transfer and Generation
[Figure sequence: parallel parse trees are built step by step for "adam evde oğlunu yendi". The words are tagged N N N V and transferred lexically (man, house, son, won); NP rules add "the" and "his"; the locative NP becomes the adjunct "at the house"; the constituents are labeled SUBJ, Adjct, OBJ, and Vfin (over Vc); the S rule then reorders them.]
Turkish side: S → SUBJ Adjct OBJ Vfin
English side: S → SUBJ Vfin OBJ Adjct
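The sentence-level rule amounts to a fixed permutation of constituents. This is a minimal sketch, assuming the four constituents have already been translated; the function name `reorder` and the phrase strings are illustrative only.

```python
# Constituent order on each side of the S rule, as given on the slide.
SOURCE_ORDER = ["SUBJ", "Adjct", "OBJ", "Vfin"]
TARGET_ORDER = ["SUBJ", "Vfin", "OBJ", "Adjct"]


def reorder(constituents):
    """Map translated phrases from Turkish constituent order to English order."""
    if len(constituents) != len(SOURCE_ORDER):
        raise ValueError("expected one phrase per source constituent")
    by_role = dict(zip(SOURCE_ORDER, constituents))
    return [by_role[role] for role in TARGET_ORDER]


english = reorder(["the man", "at the house", "his son", "won"])
print(" ".join(english))
# prints: the man won his son at the house
```

In the actual system the reordering is expressed by the rule's right-hand sides and alignments, not by a lookup table like this one.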
II. Transfer and Generation
[Figure: Adjunct over NP on the Turkish side; Adjunct over "at" NP on the English side]

{Adjunct,3}
Adjunct::Adjunct : [NP] -> ["at" NP]
(
  (x1::y2)
  (x0 = x1)
  ((x1 CASE) =c Loc)
  ((x1 poss) =c yes)
  (y0 = x0)
)
II. Transfer and Generation
[Figure: tree fragment with Vfin over Vc]

;; yendi -> won
{Vc,2}
Vc::Vc : [V] -> [V]
(
  (x1::y1)
  ; Analysis
  (x0 = x1)
  ; Constraints
  ((x1 lex) =c (*or* "yen" ...))
  ((x0 casev) <= Acc)
  ((x0 trans) <= yes)
  ; Transfer
  ((y1 TENSE) = (x1 TENSE))
  ((y1 AGR-PERSON) = (x1 AGR-PERSON))
  ((y1 AGR-NUMBER) = (x1 AGR-NUMBER))
  ((y1 POLARITY) = (x1 POLARITY))
  ; Generation
  (y0 = y1)
)
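The two rules above can be read as "check constraints, then build the target side". The sketch below mimics that reading with plain dictionaries. It is not the Avenue engine and does not implement unification. It treats `=c` as "the feature must already have this value", which is an assumption about the formalism, and it ignores the `<=` lines. All function names, the `lexicon` mapping, and the feature values in the example are hypothetical.

```python
def satisfies(features, constraints):
    """True if every constrained feature is present with the required value."""
    return all(features.get(name) == value for name, value in constraints.items())


def apply_adjunct_rule(np_features, np_translation):
    """Sketch of Adjunct::Adjunct : [NP] -> ["at" NP]."""
    if not satisfies(np_features, {"CASE": "Loc", "poss": "yes"}):
        return None  # constraints fail, so the rule does not apply
    return ["at", np_translation]


# Features copied from source verb to target verb in the Vc rule.
COPIED_FEATURES = ("TENSE", "AGR-PERSON", "AGR-NUMBER", "POLARITY")


def apply_vc_rule(verb_features, lexicon):
    """Sketch of Vc::Vc : [V] -> [V]: lexical check, then feature transfer."""
    lemma = verb_features.get("lex")
    if lemma not in lexicon:
        return None  # lexical constraint fails
    target = {"lex": lexicon[lemma]}
    for name in COPIED_FEATURES:
        if name in verb_features:
            target[name] = verb_features[name]
    return target


print(apply_adjunct_rule({"CASE": "Loc", "poss": "yes"}, "my house"))
print(apply_vc_rule({"lex": "yen", "TENSE": "past", "POLARITY": "positive"}, {"yen": "win"}))
```

Keeping constraints separate from transfer, as the rule comments do, lets a rule be rejected before any target structure is built.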
III. Decoding
• Transfer engine outputs n translations T1, ..., Tn
• We use an English language model to calculate the probability of each translation, and pick the one with the highest language model score
III. Decoding
III. Decoding
Translation                        Log Probability
My island beat your son at home    -29.5973
My island beat his son at home     -27.1953
The man beat your son at home      -23.7629
The man beat his son at home       -26.1649
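Picking the final output is then a one-line maximization over the language model scores. The sketch below copies the four log probabilities from the slide. It does not compute them, since the language model itself is not shown, and the function name `pick_best` is an assumption.

```python
# Candidate translations and their log probabilities, copied from the slide.
SCORED = {
    "My island beat your son at home": -29.5973,
    "My island beat his son at home": -27.1953,
    "The man beat your son at home": -23.7629,
    "The man beat his son at home": -26.1649,
}


def pick_best(scored):
    """Return the translation with the highest (least negative) log probability."""
    return max(scored, key=scored.get)


print(pick_best(SCORED))
# prints: The man beat your son at home
```

Log probabilities are negative, so the best candidate is the one closest to zero.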
Evaluation
MT Evaluation
• Manual evaluation:
  • SSER (subjective sentence error rate)
  • Correct/Incorrect
  • Manual evaluations require human effort and time
• Automatic evaluation:
  • WER (word error rate)
  • BLEU (Bilingual Evaluation Understudy)
  • METEOR
Automatic Evaluation
• Word Error Rate (WER): Number of insertions, deletions, and substitutions required to transform the reference translation into the system translation
• BLEU: Number of common word n-grams between the system translation S and a set of reference translations
• METEOR: Similar to BLEU, but also considers roots and synonyms
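Both metrics can be illustrated on a single sentence pair. The sketch below computes WER as word-level edit distance divided by reference length (the normalization is the usual convention and is assumed here), and the clipped n-gram precision that BLEU is built on. Full BLEU also combines several n-gram orders and applies a brevity penalty, which this sketch leaves out. Function names are illustrative.

```python
from collections import Counter


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Edit distance divided by reference length (reference must be non-empty)."""
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / len(ref)


def ngram_precision(reference, hypothesis, n):
    """Share of hypothesis n-grams also found in the reference, with clipped counts."""
    ref, hyp = reference.split(), hypothesis.split()
    ref_counts = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    hyp_counts = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    return matched / total


reference = "at the shopping world"
hypothesis = "in the shopping world"
print(wer(reference, hypothesis))                 # 0.25
print(ngram_precision(reference, hypothesis, 1))  # 0.75
```

The pair is one of the noun-phrase examples from this deck: one substituted word out of four gives a WER of 0.25, and three of the four hypothesis words appear in the reference.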
Experimental Results
• System contains over 200 transfer rules and 20,000 lexical rules
• It can parse and translate challenging sentences
• Translations are sound, but not complete
• We tested the system on 192 noun phrases and 70 sentences

BLEU score for noun phrases: 60.38
BLEU score for sentences: 33.17
Examples
Noun phrase: siyahlarla birlikte bir protesto yürüyüşünde
Translation / Reference: in a protest walk with the blacks

Noun phrase: Elif'in arkasındaki kapıda
Translation: at the door at the back of Elif
Reference: on the door behind Elif

Noun phrase: alışveriş dünyasında
Translation: in the shopping world
Reference: at the shopping world
Examples
Sentence: Bu tutku zamanla bana acı vermeye başladı
Translation: This passion began to give pain to me with time
Reference: In time this passion began to give me pain

Sentence: Perşembe uzun yürüyüşler ve ziyaretler yapıyorum
Translation: I am doing long walks and visits on Thursday
Reference: On Thursdays I take long walks and make visits

Sentence: Kaçtıkça daha büyüdü, bir tutku oldu
Translation: It grew more as escaping, it became a passion
Reference: He grew as he ran away, became an obsession
Conclusions & Future Work
• A hybrid machine translation system from Turkish to English
  • wide linguistic coverage by manually-crafted transfer rules in Avenue
  • ambiguities handled by English language model
  • computationally inefficient translation
  • time-consuming development
• Future work
  • further improvement of transfer rules
  • learning rules automatically from parallel corpus
Thank you!