Скачать презентацию Search Applications Machine Translation Next time Constraint Satisfaction Скачать презентацию Search Applications Machine Translation Next time Constraint Satisfaction

c31854d1830bd85220559830b39cccef.ppt

  • Количество слайдов: 33

Search Applications: Machine Translation Next time: Constraint Satisfaction Reading for today: See “Machine Translation Search Applications: Machine Translation Next time: Constraint Satisfaction Reading for today: See “Machine Translation Paper” under links Reading for next time: Chapter 5

Homework Questions? ¡ 2 Homework Questions? ¡ 2

Agenda ¡ Introduction to machine translation Statistical approaches ¡ Use of parallel data ¡ Agenda ¡ Introduction to machine translation Statistical approaches ¡ Use of parallel data ¡ Alignment ¡ ¡ What functions must be optimized? ¡ Comparison of A* and greedy local search (hill climbing) algorithms for translation How they work ¡ Their performance ¡ 3

Approach to Statistical MT ¡ Translate from past experience ¡ Observe how words, and Approach to Statistical MT ¡ Translate from past experience ¡ Observe how words, and phrases, and sentences are translated ¡ Given new sentences in the source language, choose the most probable translation in the target language ¡ Data: large corpus of parallel text ¡ E. g. , Canadian Parliamentary proceedings 4

Data ¡ Example ¡ ¡ ¡ Quantity ¡ ¡ Ce n’est pas clair. It Data ¡ Example ¡ ¡ ¡ Quantity ¡ ¡ Ce n’est pas clair. It is not clear. 200 billion words (2004 MT evaluation) Sources ¡ ¡ Hansards: Canadian parliamentary proceedings Hong Kong: official documents published in multiple languages Newspapers published in multiple languages Religious and literary works 5

Alignment – the first step ¡ ¡ Which sentences or paragraphs in one language Alignment – the first step ¡ ¡ Which sentences or paragraphs in one language correspond to which paragraphs or sentences in another language? (Or what words? ) Problems Translators don’t use word for word translations ¡ Crossing alignments ¡ ¡ Types of alignment 1: 1 (90% of the cases) ¡ 1: 2, 2: 1 ¡ 3: 1, 1: 3 ¡ 6

With regard to Quant aux According to [the] mineral waters and [(les) eaux minerales With regard to Quant aux According to [the] mineral waters and [(les) eaux minerales et [our survey, ] 1988 the lemonades-soft drinks aux limonades], they encounter [elles rencontrent [sales] of still more toujours plus [mineral water users. Indeed d’adeptes. ] En effet and soft drinks] were our survey [notre sondage] [much higher] makes standout fait ressortir [than in 1987, ] the sales [des ventes] reflecting clearly [nettement [The growing popularity] superior Superieures] Of these products. to those in 1987 [a celles de 1987] [Cola drink] manufacturers for cola-based drinks Pour [les boissons a base de cola] [in particular] especially notamment Achieved above Average growth rates An example of 2: 2 alignment 7

¡ Fertility: a word may be translated by more than 1 word Notamment -> ¡ Fertility: a word may be translated by more than 1 word Notamment -> in particular (fertility 2) ¡ Limonades -> soft drinks ¡ ¡ Fertility 0: A word translated by 0 words Des ventes -> sales ¡ Les boissons a base de cola -> cola drinks ¡ ¡ Many to many: ¡ Elles rencontrent toujours plus d’adeptes -> The growing popularity 8

Bead for sentence alignment ¡ A group of sentences in one language that corresponds Bead for sentence alignment ¡ A group of sentences in one language that corresponds in content to some group of sentences in the other language ¡ Either group can be empty ¡ How much content has to overlap between sentences to count it as alignment? ¡ An overlapping clause can be sufficient 9

Methods for alignment ¡ Length based ¡ Offset alignment ¡ Word based ¡ Anchors Methods for alignment ¡ Length based ¡ Offset alignment ¡ Word based ¡ Anchors (e. g. , cognates) 10

Word Based Alignment ¡ ¡ Assume first and last sentences of the texts align Word Based Alignment ¡ ¡ Assume first and last sentences of the texts align (anchors). Then until most sentences aligned: ¡ Form an envelope of alignments from the cartesian product of the list of sentences l Exclude alignments if they cross anchors or too distance Choose pairs of words that tend to occur in alignments ¡ Find pairs of source and target sentences which contain many possible lexical correspondences. ¡ ¡ The most reliable augment the set of anchors 11

The Noisy Channel Model for MT Language Model P(e) Translation Model Decoder P(f|e) e’=argmaxe. The Noisy Channel Model for MT Language Model P(e) Translation Model Decoder P(f|e) e’=argmaxe. P(e|f) Noisy Channel 12

The problem ¡ Language model constructed from a large corpus of English Bigram model: The problem ¡ Language model constructed from a large corpus of English Bigram model: probability of word pairs ¡ Trigram model: probability of 3 words in a row ¡ From these, compute sentence probability ¡ ¡ Translation model can be derived from alignment ¡ ¡ For any pair of English/French words, what is the probability that pair is a translation? Decoding is the problem: Given an unseen French sentence, how do we determine the translation? 13

Language Model ¡ Predict the next word given the previous words ¡ ¡ Markov Language Model ¡ Predict the next word given the previous words ¡ ¡ Markov assumption ¡ ¡ Only the last few words affects the next word Usual cases: bigram, trigram, 4 gram ¡ ¡ P(Wn| W 1……Wn-1) Sue swallowed the large green …. Parameter estimation Bigram: 20, 000 X 19, 000 = 400 million ¡ Trigram: 20, 0002 X 19, 000 = 8 trillion ¡ 4 gram: 20, 0003 X 19, 000=1. 6 X 1017 ¡ 14

Translation Model ¡ For a particular word alignment, multiply the m translation probabilities: P(Jean Translation Model ¡ For a particular word alignment, multiply the m translation probabilities: P(Jean aime Marie | John loves Mary) ¡ P(Jean|John)XP(aime|loves)XP(Marie|Mar y) ¡ ¡ Then sum the probabilities of all alignments 15

Decoding is NP complete ¡ When considering any word reordering Swapped words ¡ Words Decoding is NP complete ¡ When considering any word reordering Swapped words ¡ Words with fertility > n (insertions) ¡ Words with fertility 0 (deletions) ¡ Usual strategy: examine a subset of likely possibilities and choose from that ¡ Search error: decoder returns e’ but there exists some e s. t. P(e|f) > P (e’|f) ¡ 16

Example Decoding Errors ¡ Search Error Permettez que je donne un example a la Example Decoding Errors ¡ Search Error Permettez que je donne un example a la chambre. Let me give the House one example. Let me give an example in the House ¡ Model Error Vous avez besoin de toute l’aide disponible. You need all the help you can get. You need of the whole benefits available. 17

Search ¡ Traditional decoding method: stack decoder A* algorithm ¡ Deeply explore each hypothesis Search ¡ Traditional decoding method: stack decoder A* algorithm ¡ Deeply explore each hypothesis ¡ ¡ Fast greedy algorithm Much faster than A* ¡ How often does it fail? ¡ ¡ Integer Programming Method Transform to Traveling Salesman (see paper) ¡ Very slow ¡ Guaranteed to find the best choice ¡ 18

Large branching factors ¡ Machine Translation l l l Input: sequence of n words, Large branching factors ¡ Machine Translation l l l Input: sequence of n words, each with up to 200 possible target word translations. Output: sequence of m words in the target language that has high score under some goodness criterion. Search space: ¡ 6 words French sentence has 10300 distinct translation scores under the IBM M 4 translation model. [Soricut, Knight, Marcu, AMTA’ 2002] … … 19

Stack decoder: A* Initialize the stack with an empty hypothesis ¡ Loop ¡ l Stack decoder: A* Initialize the stack with an empty hypothesis ¡ Loop ¡ l l l Pop h, the best hypothesis off the stack If h is a complete sentence, output h and terminate For each possible next word w, extend h by adding w and push the resulting hypothesis onto the stack. 20

Complications ¡ It’s not a simple left-to-right translation ¡ Because we multiply probabilities as Complications ¡ It’s not a simple left-to-right translation ¡ Because we multiply probabilities as we add words, shorter hypotheses will always win ¡ ¡ Use multiple stacks, one for each length Given fertility possibilities, when we add a new target word for an input source word, how many do we add? 21

Example 22 Example 22

Hill climbing function Hill. Climbing(problem, initial-state, queuing-fn) node ← Make. Node(initial-state(problem)); while T do Hill climbing function Hill. Climbing(problem, initial-state, queuing-fn) node ← Make. Node(initial-state(problem)); while T do next ← Best(Search. Operator-fn(node, cost-fn)); if(Is. Better-fn(next, node)) then continue; else if(Goal. Test(node)) then return node; else exit; end while MT (Germann et al. , ACL-2001) return Failure; node ← target. Gloss(source. Sentence); while T do next ← Best( Locally. Modified. Translation. Of(node)); if(Is. Better(next, node)) then continue; else print node; exit; end while 23

Types of changes ¡ Translate one or two words (j 1 e 1 j Types of changes ¡ Translate one or two words (j 1 e 1 j 2 e 2) ¡ Translate and insert (j e 1 e 2) ¡ Remove word of fertility 0 (i) ¡ Swap segments (i 1 i 2 j 1 j 2) ¡ Join words (i 1 i 2) 24

Example ¡ Total of 77, 421 possible translations attempted 25 Example ¡ Total of 77, 421 possible translations attempted 25

26 26

27 27

How to search better? ¡ Make. Node(initial-state(problem)) ¡ Remove. Front(Q) ¡ Search. Operator-fn(node, cost-fn); How to search better? ¡ Make. Node(initial-state(problem)) ¡ Remove. Front(Q) ¡ Search. Operator-fn(node, cost-fn); ¡ queuing-fn(problem, Q, (Next, Cost)); 28

Example 1: Greedy Search Make. Node(initial-state(problem)) Machine Translation (Marcu and Wong, EMNLP-2002) node ← Example 1: Greedy Search Make. Node(initial-state(problem)) Machine Translation (Marcu and Wong, EMNLP-2002) node ← target. Gloss(source. Sentence); while T do next ← Best( Locally. Modified. Translation. Of(node)); if(Is. Better(next, node)) then continue; else print node; exit; end while 29

Climbing the wrong peak What sentence is more grammatical? 1. better bart than madonna Climbing the wrong peak What sentence is more grammatical? 1. better bart than madonna , i say 2. i say better than bart madonna , Model validation Can you make a sentence with these words? a and apparently as be could dissimilar firing identical neural really so things thought two Model stress-testing 30

Language-model stress-testing ¡ ¡ Input: bag of words Output: best sequence according to a Language-model stress-testing ¡ ¡ Input: bag of words Output: best sequence according to a linear combination of an l ngram LM l syntax-based LM (Collins, 1997) 31

Size: 3 -7 words long Best searched • 32. 3: i say better than Size: 3 -7 words long Best searched • 32. 3: i say better than bart madonna , Original word order • 41. 6: better bart than madonna, i say SBLM*: trained on an additional 160 k WSJ sentences. Size: 10 -25 words long Best searched • 51. 6: and so could really be a neural apparently thought things as dissimilar firing two identical Original word order • 64. 3: could two things so apparently dissimilar as a thought and neural firing really be identical 32

End of Class Questions 33 End of Class Questions 33