An example of a good translation
Sv. En inbyggd oljepump levererar olja under tryck både till hydraulsystemet och växellådans oljesystem.
En. An integrated oil pump delivers pressurised fluid both to the hydraulic system and to the lubrication system of the gearbox.
@ Anna Sågvall Hein 2005

An example of a bad translation
Stackars Kalle var rädd. -> Wretched cold each cautious.

Fundamental problems in MT
• lexical ambiguity in the source language (SL)
• translation ambiguity
• grammatical differences between SL and the target language (TL)

Lexical ambiguity in SL
• form
  – hus (sg/pl, basic/genitive case)
• part-of-speech
  – var (verb, pronoun, adverb, noun)
• polysemy
  – fil (soured milk, file (tool), traffic lane)

Handling form and part-of-speech ambiguity in SL
Syntactic analysis of the input sentence
• Han köpte ett nytt hus (sg, basic case).
• Stackars Kalle var (verb) rädd.

Handling polysemy
• domain
• context
  – rules based on grammatical analysis
    • anta en elev (obj) -> admit a student
    • anta, att (compl) -> suppose that
• examples

Handling translation ambiguity
• rules based on grammatical analysis
  – bilen på gatan -> the car on the street (locative attribute)
  – taket på huset -> the roof of the house (part-whole attribute)
• examples

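The context rules for polysemy and translation ambiguity can be pictured as a lookup keyed on the word and the grammatical relation it takes part in. The following is only a minimal sketch: the table names, the relation labels and the function `choose_translation` are assumptions made for illustration, and a real system would obtain the relation from the grammatical analysis rather than receive it as an argument.

```python
# Hypothetical rule tables; the example phrases come from the slides,
# the relation labels are illustrative assumptions.
VERB_RULES = {
    ("anta", "obj"): "admit",       # anta en elev -> admit a student
    ("anta", "compl"): "suppose",   # anta, att ... -> suppose that ...
}
PREPOSITION_RULES = {
    ("på", "location"): "on",       # bilen på gatan -> the car on the street
    ("på", "part-whole"): "of",     # taket på huset -> the roof of the house
}


def choose_translation(word, relation, rules):
    """Return the TL word licensed by the relation, or None if no rule applies."""
    return rules.get((word, relation))


print(choose_translation("anta", "obj", VERB_RULES))
print(choose_translation("på", "part-whole", PREPOSITION_RULES))
```

The same SL word maps to different TL words only because the key includes the relation; without the analysis step there is nothing to select on.
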
Grammatical differences
• morphology
  – Hon köpte en liten hund. -> Sie hat einen kleinen Hund gekauft.
• syntax
  – Genom att svänga till vänster hittar du huset. -> Turning left you will find the house.
• word order
  – Sedan gick han hem. -> Then he went home.

Handling grammatical differences
• syntactic-semantic analysis + transfer rules
• deep analysis (interlingua) + generation according to TL grammar
• examples

Re-use techniques
• sentence alignment
  – linking source and target sentences pairwise
  – success rate close to 100 %
  – translation memories
  – basis for word alignment

Sentence alignment
Sv. I oljefilterhållaren sitter en överströmningsventil.
En. The oil filter retainer has an overflow valve.
(sventscan 3888 1-1)
Sv. Undvik hudkontakt med kylvätska. Hudkontakt kan medföra irritation.
En. Avoid contact with the skin as this may cause irritation.
(sventscan 3200 2-1)

Sentence alignment, cont.
Sv. Skruvarna sträcks vid varje åtdragning, därför får skruvarna i en del förband återanvändas endast ett visst antal gånger.
En. Bolts are stretched each time they are tightened. For this reason, the bolts in some joints should only be reused a certain number of times.
(sventscan 783 1-2)

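The slides do not say which alignment method was used. One common family of methods relies on sentence length, and the sketch below illustrates that idea with a small dynamic programme that allows the 1-1, 2-1 and 1-2 pairings seen in the examples. The cost function (difference in character length plus a fixed penalty for merged sentences) and the name `align` are assumptions chosen for brevity, not a published algorithm.

```python
def align(src, tgt, merge_penalty=3.0):
    """Align two sentence lists using 1-1, 2-1 and 1-2 links (toy cost function)."""
    inf = float("inf")
    n, m = len(src), len(tgt)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    links = [(1, 1, 0.0), (2, 1, merge_penalty), (1, 2, merge_penalty)]
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj, penalty in links:
                if i + di > n or j + dj > m:
                    continue
                # Compare the total character length of the two candidate segments.
                s_len = sum(len(s) for s in src[i:i + di])
                t_len = sum(len(t) for t in tgt[j:j + dj])
                c = cost[i][j] + abs(s_len - t_len) + penalty
                if c < cost[i + di][j + dj]:
                    cost[i + di][j + dj] = c
                    back[i + di][j + dj] = (di, dj)
    if back[n][m] is None and (n, m) != (0, 0):
        raise ValueError("no alignment possible with 1-1, 2-1 and 1-2 links")
    # Follow the back-pointers from the end to recover the chosen links.
    result = []
    i, j = n, m
    while (i, j) != (0, 0):
        di, dj = back[i][j]
        result.append((src[i - di:i], tgt[j - dj:j]))
        i, j = i - di, j - dj
    return result[::-1]
```

Given the three Swedish and two English sentences from the first alignment slide, this sketch should link the first pair 1-1 and the remaining sentences 2-1, because merging the two short Swedish sentences gives the closest length match.
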
Re-use techniques, cont.
• word alignment
  – linking sub-sentence segments, typically source and target words and phrases, pairwise
  – co-occurrence, word similarity, dictionary
  – large-scale processing
  – success rate close to 80 %
  – translation dictionaries
  – bi- or multilingual term databases
  – data-driven machine translation

A word alignment example
Sv. Jag tar mittplatsen, som jag inte tycker om.
En. I take the middle seat, which I dislike.
jag – I
tar – take
mittplatsen – the middle seat
som – which
jag – I
inte tycker om – dislike
(from Tiedemann 2003)

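Of the three clues listed for word alignment (co-occurrence, word similarity, dictionary), co-occurrence is the easiest to illustrate. The sketch below scores word pairs with the Dice coefficient over sentence-level co-occurrence counts. The tiny bitext and the function names are hypothetical, and the sketch handles single words only, so a multi-word link such as "inte tycker om – dislike" is beyond it.

```python
from collections import Counter
from itertools import product


def cooccurrence_counts(bitext):
    """Count in how many sentence pairs each word and each word pair occurs."""
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in bitext:
        s_words, t_words = set(src.split()), set(tgt.split())
        src_count.update(s_words)
        tgt_count.update(t_words)
        pair_count.update(product(s_words, t_words))
    return src_count, tgt_count, pair_count


def dice(s, t, counts):
    """Dice coefficient: 2 * c(s, t) / (c(s) + c(t))."""
    src_count, tgt_count, pair_count = counts
    return 2 * pair_count[(s, t)] / (src_count[s] + tgt_count[t])


def best_link(s, counts):
    """Return the target word with the highest Dice score for source word s."""
    _, tgt_count, _ = counts
    return max(tgt_count, key=lambda t: dice(s, t, counts))


# Hypothetical toy bitext, lower-cased and already sentence-aligned.
BITEXT = [
    ("jag tar bilen", "i take the car"),
    ("jag ser huset", "i see the house"),
    ("han tar huset", "he takes the house"),
]
COUNTS = cooccurrence_counts(BITEXT)
print(best_link("jag", COUNTS), best_link("huset", COUNTS))
```

Frequent function words such as "the" co-occur with almost everything, which is why the score normalises by the individual frequencies instead of using the raw pair count.
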
Evaluation of MT
• human
  – adequacy
  – acceptance
• automatic
  – comparison with a gold standard
    • n-gram technique, e.g. BLEU, NEVA
    • edit distance
See further http://stp.ling.uu.se/~evafo/gslt_eval.pdf (OH presentation by Eva Forsbom)

Automatic evaluation, ex. 1
SL: Framställningsmetod och särskild beredningsmetod: En hög kvalitet på råvaran komjölk är viktig för tillverkningen.
MT: Manufacturing method and special manufacturing method: A high quality of the raw material cow's milk is important to the production.
Ref: Specific production or manufacturing method: High-quality cow's milk is important to production.
NEVA: 0.27

Automatic evaluation, ex. 2
SL: Mjölkråvaran som används för ystning pastöriseras till 72 ºC i 15 sekunder.
MT: The milk that is used for coagulation is pasteurised to 72 ºC for 15 seconds.
Ref: The milk used for coagulation is pasteurised at 72 ºC for 15 seconds.
NEVA: 0.59

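Both automatic techniques named on the evaluation slide compare the MT output with a reference translation. The sketch below shows the two building blocks on the sentence pair from example 2: clipped n-gram precision and word-level edit distance. It does not compute NEVA or BLEU, which combine several n-gram orders in ways not reproduced here, so its numbers are not comparable with the NEVA scores on the slides. Whitespace tokenisation and the function names are assumptions.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, reference, n):
    """Share of candidate n-grams found in the reference, counting each
    n-gram at most as often as it occurs in the reference."""
    cand = Counter(ngrams(candidate, n))
    ref = Counter(ngrams(reference, n))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


def edit_distance(a, b):
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete x
                            curr[j - 1] + 1,           # insert y
                            prev[j - 1] + (x != y)))   # substitute or match
        prev = curr
    return prev[-1]


mt = "The milk that is used for coagulation is pasteurised to 72 ºC for 15 seconds.".split()
ref = "The milk used for coagulation is pasteurised at 72 ºC for 15 seconds.".split()
print(clipped_precision(mt, ref, 1), clipped_precision(mt, ref, 2), edit_distance(mt, ref))
```

The clipping matters for this pair: "is" occurs twice in the MT output but only once in the reference, so only one of the two occurrences counts as a match.
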
Basic translation strategies
• rule-based translation
  – direct translation
  – transfer-based translation
  – interlingua translation
• data-driven translation
  – statistical translation
  – example-based translation
• hybrids

Direct translation
• translation proceeds word by word, or phrase by phrase
• no intermediary sentence structure
• the most important language component is a translation dictionary
• translation problems are handled more or less ad hoc by means of specific rules

Simplistic direct approach
• sentence splitting
• tokenisation
• handling capital letters
• dictionary look-up and lexical substitution, incl. heuristics for handling ambiguities
• copying unknown words, digits, signs of punctuation etc.
• formal editing

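The steps of the simplistic direct approach can be strung together in a few lines. This is a minimal sketch under several assumptions: the dictionary `TOY_DICT` is hypothetical and holds one translation per word, so the ambiguity heuristics are left out, and "formal editing" is reduced to reattaching punctuation.

```python
import re

# Hypothetical one-to-one dictionary; a real one would list several readings per word.
TOY_DICT = {"stackars": "poor", "var": "was", "rädd": "afraid"}


def split_sentences(text):
    """Sentence splitting on whitespace that follows ., ! or ?."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenise(sentence):
    """Separate words from punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def translate_direct(text, dictionary):
    """Word-by-word substitution with no intermediary sentence structure."""
    output = []
    for sentence in split_sentences(text):
        words = []
        for token in tokenise(sentence):
            target = dictionary.get(token.lower())   # look-up ignores case
            if target is None:
                target = token                       # copy unknown words, digits, punctuation
            elif token[0].isupper():
                target = target.capitalize()         # restore the capital letter
            words.append(target)
        # Formal editing: remove the space before punctuation marks.
        output.append(re.sub(r"\s+([.,!?;:])", r"\1", " ".join(words)))
    return " ".join(output)


print(translate_direct("Stackars Kalle var rädd.", TOY_DICT))
```

With a single reading per word the sentence comes out well. Replace the entry for var with another of its readings and the output degrades in the way shown on the bad-translation slide, since nothing in this pipeline looks at the sentence structure.
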
Advanced direct approach (Tucker 1987)
• source text dictionary look-up and morphological analysis
• identification of homographs
• identification of compound nouns
• identification of noun and verb phrases
• processing of idioms

Advanced approach, cont.
• processing of prepositions
• subject-predicate identification
• syntactic ambiguity identification
• synthesis and morphological processing of TL
• rearrangement of words and phrases in TL

Feasibility of direct translation
• quality
  – typically browsing quality
  – depends on
    • the quality of the translation dictionary
    • the coverage of the translation rules
  – editing quality may be achieved
• problems with
  – ambiguity
  – inflection
  – word order
  – other structural differences

SYSTRAN
• SYStem TRANslation
• advanced direct translation (moving towards transfer-based translation)
• http://babelfish.altavista.com/
• http://www.systranet.com/systran/net

EC Systran
• 1,600,000 dictionary units
  – 20 domain dictionaries
• daily use by EC translators, administrators of the European institutions

Ex. 1: fairly good translation
"Enskilda företagare som inte bildat bolag klassificeras hit."
"Individual entrepreneurs that have not formed companies are classified here."
The system recognises bildat as a perfect form and correctly translates it as have formed, even though the auxiliary verb is omitted. The negation not is placed in the right position.

Ex. 2: word order problem (Systran sv-en)
"När byarna kontaktades hade de inte ens utsatts för influensa."
"When the villages were contacted had they not even been exposed to flu."
The system fails to identify the subject and predicate and therefore produces the wrong word order.

Ex. 3: ambiguity problem
"Vad kan vi lära av Arrawetestammen?"
"What can we faith of the Arawete?"
The system does not find the connection between kan and lära and therefore does not see that lära is a verb.

Ex. 4: ambiguity problem
"Extrapoleringen går till så här."
"The extrapolation goes to so here."
The system does not know the particle verb gå till and therefore translates incorrectly, word for word.

Transfer-based translation
• intermediary sentence structure
• provides a basis for the systematic handling of grammatical problems and some types of lexical choices
• basic processes
  – analysis
  – transfer
  – generation (synthesis)

Transfer-based translation, cont.
• knowledge-intensive
• language modules
  – dictionary and grammar of SL
  – transfer dictionary and transfer rules
  – dictionary and grammar of TL

Multra
• transfer-based translation engine
• transfer via grammatical relations
  – TL word order not inherited from SL
• modular
• unification-based
• focus on restricted domains
• developed at Uppsala University

An example
Sv. I oljefilterhållaren sitter en överströmningsventil.
En. The oil filter retainer has an overflow valve.
(from the Scania corpus)
transfer rule: sitter -> has, adv -> subj, subj -> obj

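The transfer rule in this example can be read as a mapping over grammatical relations: the predicate is replaced and the SL adverbial and subject become the TL subject and object. The sketch below walks through transfer and generation for this one sentence. The analysis result is written by hand, and the data layout, the transfer lexicon and the function names are assumptions; they do not reflect Multra's actual unification-based formalism.

```python
# Hand-written stand-in for the output of SL analysis.
SL_STRUCTURE = {
    "pred": "sitter",
    "adv": "i oljefilterhållaren",
    "subj": "en överströmningsventil",
}

# The rule from the slide: sitter -> has, adv -> subj, subj -> obj.
TRANSFER_RULE = {
    "sl_pred": "sitter",
    "tl_pred": "has",
    "roles": {"adv": "subj", "subj": "obj"},
}

# Hypothetical transfer lexicon for the two constituents.
TRANSFER_LEXICON = {
    "i oljefilterhållaren": "the oil filter retainer",
    "en överströmningsventil": "an overflow valve",
}


def transfer(sl_structure, rule, lexicon):
    """Map an SL relation structure to a TL relation structure."""
    if sl_structure["pred"] != rule["sl_pred"]:
        raise ValueError("rule does not apply to this predicate")
    tl_structure = {"pred": rule["tl_pred"]}
    for sl_role, tl_role in rule["roles"].items():
        tl_structure[tl_role] = lexicon[sl_structure[sl_role]]
    return tl_structure


def generate(tl_structure):
    """Linearise by TL grammar (subject, verb, object), ignoring SL word order."""
    words = f'{tl_structure["subj"]} {tl_structure["pred"]} {tl_structure["obj"]}'
    return words[0].upper() + words[1:] + "."


print(generate(transfer(SL_STRUCTURE, TRANSFER_RULE, TRANSFER_LEXICON)))
```

Because generation works from the relations alone, the fronted adverbial of the Swedish sentence leaves no trace in the English word order.
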
Interlingua translation
• analysis of SL sentence into a language-independent meaning representation, an interlingua
  – ideally, no trace of the SL structure in the interlingua
• generation of TL sentence from the interlingua

Statistical machine translation
• translation model based on word alignment
• language model based on n-grams
• decoding algorithm
  – selecting the most probable combination of alternatives in the translation model and the language model

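How the two models interact during decoding can be shown with a toy example. In the sketch below the translation model proposes alternatives for each source word, a bigram language model scores each resulting target sequence, and the decoder keeps the combination with the highest joint probability. All probabilities are invented for illustration, the search is exhaustive, and translation is strictly word by word with no reordering, which real decoders do not assume.

```python
import math
from itertools import product

# Hypothetical translation model: P(target word | source word).
TM = {
    "kalle": {"Kalle": 1.0},
    "var": {"was": 0.4, "each": 0.3, "where": 0.3},
    "rädd": {"afraid": 0.7, "cautious": 0.3},
}

# Hypothetical bigram language model: P(word | previous word).
BIGRAM_LM = {
    ("<s>", "Kalle"): 0.1,
    ("Kalle", "was"): 0.2,
    ("was", "afraid"): 0.3,
}
UNSEEN = 1e-4   # crude stand-in for smoothing of unseen bigrams


def score(source, target):
    """Log-probability of a candidate under the translation and language models."""
    logp, prev = 0.0, "<s>"
    for s, t in zip(source, target):
        logp += math.log(TM[s][t]) + math.log(BIGRAM_LM.get((prev, t), UNSEEN))
        prev = t
    return logp


def decode(source):
    """Exhaustive search over all combinations of word alternatives."""
    options = [list(TM[s]) for s in source]
    return max(product(*options), key=lambda target: score(source, target))


print(" ".join(decode(["kalle", "var", "rädd"])))
```

In this toy setting the language model is what separates the readings of var: every alternative has a similar translation probability, but only one of them forms a likely English word sequence.
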
Statistical MT on the market
Language Weaver
http://www.languageweaver.com/

Example-based machine translation
• non-trivial use of translation examples in the translation process
• preliminary definition
  – alignment of texts
  – matching of input sentences against phrases (examples)
  – selection and extraction of equivalent TL phrases
  – adaptation and combination of TL phrases as acceptable output sentences
(from Hutchins, J., Towards a definition of example-based machine translation. Proc. of Workshop: Example-Based Machine Translation. MT Summit X, Phuket, Thailand, 2005)

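The matching and selection steps of this definition can be sketched with a similarity search over stored examples. The example base below reuses two sentence pairs from earlier slides. The choice of `difflib.SequenceMatcher` over word tokens as the similarity measure is an assumption, and the adaptation and combination step, which is what separates example-based translation from a plain translation memory look-up, is not attempted.

```python
from difflib import SequenceMatcher

# Example base: aligned sentence pairs taken from earlier slides.
EXAMPLES = [
    ("I oljefilterhållaren sitter en överströmningsventil.",
     "The oil filter retainer has an overflow valve."),
    ("Enskilda företagare som inte bildat bolag klassificeras hit.",
     "Individual entrepreneurs that have not formed companies are classified here."),
]


def similarity(a, b):
    """Word-level similarity between two sentences, between 0 and 1."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


def best_example(sentence, examples):
    """Return the stored (SL, TL) pair whose SL side is closest to the input."""
    return max(examples, key=lambda pair: similarity(sentence, pair[0]))


sl, tl = best_example("I oljefilterhållaren sitter en ventil.", EXAMPLES)
print(tl)
```

The retrieved TL sentence still mentions an overflow valve, although the input only says ventil; repairing that mismatch is the job of the adaptation step.
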