Machine Translation (MT)
Introduction
Machine translation is a sub-field of computational linguistics that investigates the use of software to translate text or speech from one natural language to another (http://en.wikipedia.org/).
Use: translation of large amounts of data in the shortest possible time
- Standard documents
- Instructions and manuals
- Web sites, multilingual search
- Reference information (addresses, recipes, etc.)
Aim: to understand the main contents of a document in a foreign language unknown to the user
NOT to be used instead of human translation!
Approaches to machine translation
- Rule-based approach
- Statistical approach
- Example-based approach
- Hybrid machine translation
Rule-based translation
Stages:
- Morphological analysis of the source language
- Parsing the source language (syntactic groups)
- Getting syntactic information about each word
- Dictionary-based translation
Example: A girl eats an apple. (Eng.-Ger.)
Stages of translation:
1st: getting basic part-of-speech information for each source word: a = ind.art.; girl = n.; eats = v.; an = ind.art.; apple = n.
2nd: getting syntactic information about the verb “to eat”: here: eats = Pr. Simple, 3rd Pers. Sing., Act. V.
3rd: parsing the source sentence: (an apple) = the object of eat
4th: translating English words into German: a (category = indef. article) => ein (category = indef. article); girl (category = noun) => Mädchen …
5th: finding appropriate inflected forms: A girl eats an apple. => Ein Mädchen isst einen Apfel.
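The five stages above can be sketched in code for this one example sentence. This is a minimal illustration, not a real MT system: every table (POS, VERB_ANALYSIS, NOUN_DICT, VERB_DICT, INDEF_ARTICLE, VERB_FORMS) is a hypothetical hand-written fragment, and only the pattern "article noun verb article noun" is covered.

```python
# Toy rule-based English -> German translator for a single sentence pattern:
# indefinite article + noun, verb, indefinite article + noun.
# Every table below is a hypothetical, hand-written fragment.

POS = {"a": "ind.art.", "an": "ind.art.", "girl": "n.", "apple": "n.", "eats": "v."}
VERB_ANALYSIS = {"eats": ("eat", "present", 3, "sg")}           # form -> features
NOUN_DICT = {"girl": ("Mädchen", "neut"), "apple": ("Apfel", "masc")}
VERB_DICT = {"eat": "essen"}
INDEF_ARTICLE = {
    ("masc", "nom"): "ein", ("neut", "nom"): "ein", ("fem", "nom"): "eine",
    ("masc", "acc"): "einen", ("neut", "acc"): "ein", ("fem", "acc"): "eine",
}
VERB_FORMS = {("essen", "present", 3, "sg"): "isst"}


def noun_phrase(english_noun, case):
    """4th + 5th stage for a noun: dictionary lookup, then article inflection."""
    noun, gender = NOUN_DICT[english_noun]
    return f"{INDEF_ARTICLE[(gender, case)]} {noun}"


def translate(sentence):
    words = sentence.rstrip(".").lower().split()
    # 1st: part-of-speech information for each source word
    tags = [POS[w] for w in words]
    if tags != ["ind.art.", "n.", "v.", "ind.art.", "n."]:
        raise ValueError("sentence pattern not covered by this sketch")
    _, subject, verb, _, obj = words
    # 2nd: morphological/syntactic information about the verb
    lemma, tense, person, number = VERB_ANALYSIS[verb]
    # 3rd: parsing: first noun phrase = subject (nominative),
    #      second noun phrase = object of the verb (accusative)
    # 4th + 5th: dictionary translation and inflected forms
    german_verb = VERB_FORMS[(VERB_DICT[lemma], tense, person, number)]
    text = f"{noun_phrase(subject, 'nom')} {german_verb} {noun_phrase(obj, 'acc')}."
    return text[0].upper() + text[1:]


print(translate("A girl eats an apple."))  # Ein Mädchen isst einen Apfel.
```

The accusative form "einen" comes out only because the parse marks "apple" as the object; a plain word-for-word lookup would give "ein Apfel".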
Statistical translation
Translations are generated according to a probability distribution, on the basis of statistical models whose parameters are derived from the analysis of bilingual text corpora.
Benefits:
- Better use of resources
- More natural translations
- No programmers or linguists* involved
Shortcomings:
- Corpus creation can be costly for users with limited resources.
- The results can be unpredictable.
- Superficial fluency can be deceiving.
- Statistical machine translation does not work well between languages that have significantly different word orders.
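The probability distribution mentioned above is commonly written in the noisy-channel form. The notation is an assumption taken from the standard SMT literature, not from these slides: f is the source sentence, e a candidate target sentence, P(f | e) the translation model estimated from the bilingual corpus, and P(e) a language model of the target language.

```latex
\hat{e} \;=\; \arg\max_{e} P(e \mid f)
        \;=\; \arg\max_{e} \frac{P(f \mid e)\, P(e)}{P(f)}
        \;=\; \arg\max_{e} P(f \mid e)\, P(e)
```

The denominator P(f) is dropped because it does not depend on e. The language model term P(e) is what produces the fluency listed under both benefits and shortcomings.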
Statistical translation
- Basis: a parallel corpus
- Probabilities are assigned by computing the most likely translation variant
- Probability estimates depend on the size and quality of the training corpus
- Linguistic information used: sentence splitting, graphematic analysis, morphology
- Given a corpus, the simplest translation system can be built in 2 weeks
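As an illustration of how translation probabilities can be estimated from a parallel corpus, here is a minimal sketch of a simplified IBM Model 1 (expectation-maximisation over word pairs, without the NULL word). The slides do not name a specific model, so this choice is an assumption, as are the three-sentence toy corpus and the function names train_ibm1 and best_translation.

```python
from collections import defaultdict


def train_ibm1(corpus, iterations=10):
    """Estimate t[(tgt_word, src_word)] = P(tgt_word | src_word) with EM.

    corpus: list of (source_tokens, target_tokens) sentence pairs.
    """
    tgt_vocab = {tw for _, tgt in corpus for tw in tgt}
    # Uniform start over all word pairs that co-occur in some sentence pair.
    t = {(tw, sw): 1.0 / len(tgt_vocab)
         for src, tgt in corpus for tw in tgt for sw in src}
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (tgt_word, src_word)
        total = defaultdict(float)  # expected count of src_word
        # E-step: distribute each target word over the source words.
        for src, tgt in corpus:
            for tw in tgt:
                norm = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    frac = t[(tw, sw)] / norm
                    count[(tw, sw)] += frac
                    total[sw] += frac
        # M-step: re-estimate probabilities from the expected counts.
        for (tw, sw), c in count.items():
            t[(tw, sw)] = c / total[sw]
    return t


def best_translation(t, src_word):
    """Most probable target word for src_word under the trained table."""
    candidates = {tw: p for (tw, sw), p in t.items() if sw == src_word}
    return max(candidates, key=candidates.get)


# Hypothetical toy parallel corpus (English -> German).
corpus = [
    ("the house".split(), "das Haus".split()),
    ("the book".split(), "das Buch".split()),
    ("a book".split(), "ein Buch".split()),
]
t = train_ibm1(corpus)
print(best_translation(t, "house"))  # Haus
```

No dictionary is supplied: the pairing of "house" with "Haus" emerges only from co-occurrence counts, which is why the estimates depend so strongly on the size and quality of the corpus.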
Rule-based vs. statistical (examples: news, document)
Rule-based translation
Types:
- Dictionary-based (direct)
- Transfer-based
- Interlingual
Dictionary-based (direct)
Word-by-word translation, with or without morphological analysis or lemmatisation.
Application: translation of long lists of phrases at the subsentential (i.e., not full-sentence) level, e.g. lists, inventories, or simple catalogs of products and services.
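A minimal sketch of direct translation with optional lemmatisation, applied to a catalog-style phrase. The dictionary and lemma table are hypothetical and tiny; a real system would use a full bilingual dictionary and a morphological analyser.

```python
# Hypothetical English -> German mini-dictionary and lemma table.
DICTIONARY = {"red": "rot", "green": "grün", "apple": "Apfel", "pear": "Birne"}
LEMMAS = {"apples": "apple", "pears": "pear"}


def translate_direct(phrase, lemmatise=True):
    """Translate word by word; unknown words are left untranslated."""
    out = []
    for word in phrase.lower().split():
        key = LEMMAS.get(word, word) if lemmatise else word
        out.append(DICTIONARY.get(key, word))
    return " ".join(out)


print(translate_direct("red apples"))                   # rot Apfel
print(translate_direct("red apples", lemmatise=False))  # rot apples
```

Lemmatisation lets "apples" match the dictionary entry, but the output "rot Apfel" has neither plural nor adjective agreement (correct German would be "rote Äpfel"). That limitation is why the direct approach suits lists and catalogs rather than full sentences.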
Direct translation example
Transfer-based machine translation
1. Analyzing the input text for morphology and syntax (and sometimes semantics)
2. Creating an internal representation
3. Generating the translation using both bilingual dictionaries and grammatical rules
Sentence in a source language -> (analysis) -> Source language structure -> (transfer) -> Target language structure -> (synthesis) -> Sentence in a target language
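The analysis, transfer, and synthesis steps can be sketched as three functions. This is an illustrative toy for English -> French noun phrases only. The lexicons, the single reordering rule (adjective + noun becomes noun + adjective), and all names are assumptions made for the example, and elision (l') is ignored.

```python
from dataclasses import dataclass


@dataclass
class Node:
    lemma: str
    pos: str


# Hypothetical lexicons for a tiny English -> French fragment.
SOURCE_LEXICON = {"the": "DET", "red": "ADJ", "car": "NOUN"}
BILINGUAL = {"red": "rouge", "car": "voiture"}
GENDER = {"voiture": "fem"}
DEF_ARTICLE = {"fem": "la", "masc": "le"}


def analyse(sentence):
    """Analysis: source sentence -> source language structure."""
    return [Node(w, SOURCE_LEXICON[w]) for w in sentence.lower().split()]


def transfer(source):
    """Transfer: source structure -> target structure."""
    # Lexical transfer via the bilingual dictionary.
    target = [Node(BILINGUAL.get(n.lemma, n.lemma), n.pos) for n in source]
    # Structural transfer rule: ADJ NOUN -> NOUN ADJ.
    i = 0
    while i < len(target) - 1:
        if target[i].pos == "ADJ" and target[i + 1].pos == "NOUN":
            target[i], target[i + 1] = target[i + 1], target[i]
            i += 2
        else:
            i += 1
    return target


def synthesise(target):
    """Synthesis: target structure -> target sentence (article agreement)."""
    words = []
    for i, node in enumerate(target):
        if node.pos == "DET":
            noun = next(n for n in target[i + 1:] if n.pos == "NOUN")
            words.append(DEF_ARTICLE[GENDER[noun.lemma]])
        else:
            words.append(node.lemma)
    return " ".join(words)


def translate(sentence):
    return synthesise(transfer(analyse(sentence)))


print(translate("the red car"))  # la voiture rouge
```

The transfer rules are specific to one language pair, so each new pair needs its own rule set.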
Interlingua machine translation
- The source language is transformed into an interlingua, i.e., an abstract language-independent representation.
- The target language is generated from the interlingua.
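A minimal sketch of the interlingua idea: an English analyser produces a language-independent frame, and separate generators produce German and Spanish from that same frame. The frame layout (predicate, agent, patient), the concept names, and the pre-inflected target lexicons are all hypothetical simplifications.

```python
# Hypothetical mapping from English words to language-independent concepts.
EN_TO_CONCEPT = {"girl": "GIRL", "apple": "APPLE", "eats": "EAT"}

# Hypothetical target lexicons with ready-made forms per grammatical role.
TARGET = {
    "de": {"subject": {"GIRL": "ein Mädchen"},
           "verb": {"EAT": "isst"},
           "object": {"APPLE": "einen Apfel"}},
    "es": {"subject": {"GIRL": "una niña"},
           "verb": {"EAT": "come"},
           "object": {"APPLE": "una manzana"}},
}


def analyse_english(sentence):
    """Source -> interlingua. Handles only 'a/an N V a/an N'."""
    words = [w for w in sentence.rstrip(".").lower().split()
             if w not in ("a", "an")]
    agent, verb, patient = words
    return {"predicate": EN_TO_CONCEPT[verb],
            "agent": EN_TO_CONCEPT[agent],
            "patient": EN_TO_CONCEPT[patient]}


def generate(frame, lang):
    """Interlingua -> target sentence in the requested language."""
    lex = TARGET[lang]
    text = (f"{lex['subject'][frame['agent']]} "
            f"{lex['verb'][frame['predicate']]} "
            f"{lex['object'][frame['patient']]}.")
    return text[0].upper() + text[1:]


frame = analyse_english("A girl eats an apple.")
print(frame)                  # the frame contains no English words
print(generate(frame, "de"))  # Ein Mädchen isst einen Apfel.
print(generate(frame, "es"))  # Una niña come una manzana.
```

Adding a further target language needs only a new generator and no change to the analyser, which is the main structural difference from the transfer approach.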
Transfer vs. interlingua
Hybrid machine translation
A method of machine translation characterized by the use of multiple approaches within a single machine translation system.
Types:
- RBMT guided by statistics
- Statistical method guided by RBMT
MT software