5c0b5fbddd94eeadd093889f06b50efd.ppt
- Number of slides: 1
Structural Phrase Alignment Based on Consistency Criteria
Toshiaki Nakazawa, Kun Yu, Sadao Kurohashi (Graduate School of Informatics, Kyoto University)
{nakazawa, kunyu}@nlp.kuee.kyoto-u.ac.jp  kuro@i.kyoto-u.ac.jp

Flow of Our EBMT System
Input: 交差点に入る時 私の信号は青でした。
[Figure: flow from Input through Translation Examples and Language Models to Output. The input dependency tree is 交差 (cross) / 点に (point) / 入る (enter) / 時 (when) / 私の (my) / 信号は (signal) / 青 (blue) / でした。(was). Translation example fragments shown include 家に (house) 入る (enter) 時 (when) ⇔ "when entering a house", 脱ぐ (put off) ⇔ "to remove", 交差点で、(point) 突然 (suddenly) 飛び出して来たのです。(rush out) ⇔ "came at me from the side at the intersection", 私の (my) サイン (signal) ⇔ "my signature", and 信号は (signal) 青 (blue) でした。(was) ⇔ "The light was green".]
Output: My traffic light was green when entering the intersection.

Core Steps of Alignment
• Searching Correspondence Candidates
  – Fine alignment is effective in translation
  – Search for as many candidates as possible using a variety of linguistic information
    • Bilingual dictionaries
    • Transliteration (Katakana words, NEs)
      ローズワイン → rosuwain ⇔ rose wine (similarity: 0.78)
      新宿 → shinjuku ⇔ shinjuku (similarity: 1.0)
    • Numeral normalization
      二百十六万 → 2,160,000 ← 2.16 million
    • Japanese flexible matching (Odani et al. 2007)
    • Substring co-occurrence measure (Cromieres 2006)
• Selecting Correspondence Candidates
  – More candidates introduce more ambiguities and improper alignments
  – A robust alignment method is needed that can align parallel sentences consistently by selecting an adequate set of candidates

Selecting Correspondence Candidates Using Consistency Score and Dependency Type
[Figure: candidate correspondences between the Japanese tree 日本で (in Japan) / 保険 (insurance) / 会社に対して (to company) / 保険 (insurance) / 請求の (claim) / 申し立てが (instance) / 可能ですよ (you can) and the English tree you / will have to file / insurance / an claim / insurance / with the office / in Japan. Labels in the figure: "Ambiguities!", "Improper alignments!", "Near!", "Far!", "baseline", "1/1+1/2=1.5".]
How to reflect the inconsistency?
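The numeral normalization example listed under Searching Correspondence Candidates (二百十六万 and "2.16 million" both mapping to 2,160,000) can be illustrated with a minimal Python sketch. The poster does not describe how the system implements this step, so the function names `kanji_to_int`, `english_to_int`, and `normalize_numeral`, and the small set of supported units, are assumptions made for illustration only.

```python
from decimal import Decimal

# Hypothetical lookup tables; only the units needed for simple examples are covered.
KANJI_DIGITS = {"〇": 0, "一": 1, "二": 2, "三": 3, "四": 4,
                "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}
LARGE_UNITS = {"万": 10**4, "億": 10**8, "兆": 10**12}
ENGLISH_SCALES = {"thousand": 10**3, "million": 10**6, "billion": 10**9}


def kanji_to_int(text: str) -> int:
    """Convert a unit-based kanji numeral such as 二百十六万 to an integer.

    Positional digit strings (e.g. 二一六) are not handled in this sketch.
    """
    total = 0    # sum of completed 万/億/兆 groups
    section = 0  # value accumulated below the next large unit
    digit = 0    # digit waiting for a unit
    for ch in text:
        if ch in KANJI_DIGITS:
            digit = KANJI_DIGITS[ch]
        elif ch in SMALL_UNITS:
            # A bare unit such as 十 means 1 x 10.
            section += (digit or 1) * SMALL_UNITS[ch]
            digit = 0
        elif ch in LARGE_UNITS:
            total += (section + digit) * LARGE_UNITS[ch]
            section = 0
            digit = 0
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    return total + section + digit


def english_to_int(text: str) -> int:
    """Convert strings such as '2.16 million' or '2,160,000' to an integer."""
    parts = text.lower().replace(",", "").split()
    if not 1 <= len(parts) <= 2:
        raise ValueError(f"unsupported numeral: {text!r}")
    value = Decimal(parts[0])
    if len(parts) == 2:
        value *= ENGLISH_SCALES[parts[1]]
    return int(value)


def normalize_numeral(text: str) -> str:
    """Map either notation to a common comma-grouped form."""
    is_kanji = any(ch in KANJI_DIGITS or ch in SMALL_UNITS or ch in LARGE_UNITS
                   for ch in text)
    number = kanji_to_int(text) if is_kanji else english_to_int(text)
    return f"{number:,}"


print(normalize_numeral("二百十六万"))    # 2,160,000
print(normalize_numeral("2.16 million"))  # 2,160,000
```

Once both sides normalize to the same string, the two expressions can be proposed as a correspondence candidate even though no dictionary entry links them.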
Dependency Type Distance

Japanese                  Distance     English                          Distance
predicate: level C        6            S / SBAR / SQ …                  5
predicate: level B+/B     5            VP / WHADVP / WHADJP             4
predicate: level B-/A     4            ADVP / ADJP / QP / PRT / PRN     ?
predicate: level A-       ?            NP / PP / INTJ                   3
case no / rentai          2            Others                           1
Inside clause             1
Others                    3

[Figure: distribution of the distance of alignment pairs in hand-annotated data (Mainichi newspaper, 40K sentence pairs) [Uchimoto 04]. Axes: Dist of J-Side, Dist of E-Side, Frequency (log).]

Consistency Score Function
[Figure: Consistency Score plotted against J-Side Distance and E-Side Distance.]
• “Near-Near” pair → Positive Score
• “Far-Far” pair → 0
• “Near-Far” pair → Negative Score

[Figure: the example trees annotated with dependency type distances. J-side: 日本で (in Japan) 3 デ格 [case “de”]; 保険 (insurance) 1 文節内 [inside clause]; 会社に対して (to company) 3 連用 [renyou]; 保険 (insurance) 1 文節内 [inside clause]; 請求の (claim) 2 ノ格 [case “no”]; 申し立てが (instance) 3 ガ格 [case “ga”]; 可能ですよ (you can). E-side: you 3 NP; will have to file; insurance 1 NN; an claim 3 NP; insurance 1 NN; with the office 3 PP; in Japan 3 PP.]
Pair 1: (Ds, Dt) = (1, 1) → Positive Score
Pair 2: (Ds, Dt) = (1, 7) → Negative Score

Experimental Result
• 500 test sentences from the Mainichi newspaper parallel corpus
• Bilingual dictionary: KENKYUSYA J-E/J-E, 500K entries
• Evaluation criteria: Precision / Recall / F-measure
• Character-based for Japanese, word-based for English

                               Pre      Rec      F
Baseline                       77.47    64.32    70.29
+Consistency Score             80.30    66.90    72.99
Proposed (+CS, +Dpnd. Type)    80.77    69.14    74.51
Filtering (80%)                82.48    71.31    76.49
Moses (SMT Toolkit)*           60.19    33.15    42.75
Manual (upper bound)           95.58    89.80    92.60
* Using 300K newspaper domain bi-sentences for training

Quality of Other Language Pairs (AER)
Sources compared: HLT-NAACL 2003, ACL 2005, (Gildea, 2003), GIZA++
English-French: 5.71, 15.89
English-Romanian: 28.86, 26.55, 27.19
English-Korean: 32, 35

Conclusion
• Proposed a new phrase alignment method using consistency criteria.
• Achieved sufficient alignment accuracy compared with other language pairs.
• We need to acquire the parameters automatically by machine learning.
• We are planning to extend the framework so that it revises the parse result.
(There is a translation demo by NICT in the exhibition corner that uses our system!)
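The qualitative behaviour of the consistency score stated on the poster (“Near-Near” pair → positive, “Far-Far” pair → 0, “Near-Far” pair → negative) can be sketched in Python as follows. This is not the authors' score function, which the poster shows only as a plot; the threshold `NEAR_THRESHOLD`, the function name `consistency_score`, and the unit scores +1 / 0 / -1 are all hypothetical choices that merely reproduce the three stated cases.

```python
# Hypothetical cut-off between "near" and "far"; the poster does not give one.
NEAR_THRESHOLD = 3


def consistency_score(ds: int, dt: int, near_threshold: int = NEAR_THRESHOLD) -> int:
    """Score a pair of correspondences from their distances on each side.

    ds: dependency distance between the two phrases on the source (J) side
    dt: dependency distance between their counterparts on the target (E) side
    Returns +1, 0, or -1 (illustrative values only).
    """
    near_s = ds <= near_threshold
    near_t = dt <= near_threshold
    if near_s and near_t:
        return 1    # "Near-Near": the two alignments support each other
    if not near_s and not near_t:
        return 0    # "Far-Far": no evidence either way
    return -1       # "Near-Far": the two alignments are inconsistent


# The two pairs annotated on the poster.
print(consistency_score(1, 1))  # 1  (Pair 1: positive score)
print(consistency_score(1, 7))  # -1 (Pair 2: negative score)
```

A step function keeps the three cases easy to read; a smooth function fitted to the distance distribution of hand-annotated data would be closer in spirit to what the poster's plot suggests.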