Machine Translation mine.ppt
- Number of slides: 15
Machine Translation (MT)
Introduction
- Machine translation is a sub-field of computational linguistics that investigates the use of software to translate text or speech from one natural language to another (http://en.wikipedia.org/).
- Use: translation of large amounts of data in the shortest possible time
  - Standard documents
  - Instructions and manuals
  - Web sites, multilingual search
  - Reference information (addresses, recipes, etc.)
- Aim: to understand the main content of a document in a foreign language unknown to the user
- NOT to be used instead of human translation!
Approaches to machine translation
- Rule-based approach
- Statistical approach
- Example-based approach
- Hybrid machine translation
Rule-based translation
Stages:
- Morphological analysis of the source language
- Parsing the source language (syntactic groups)
- Getting syntactic information about each word
- Dictionary-based translation
Example: A girl eats an apple. (Eng.-Ger.)
Stages of translation:
- 1st: getting basic part-of-speech information for each source word: a = indef. art.; girl = n.; eats = v.; an = indef. art.; apple = n.
- 2nd: getting syntactic information about the verb "to eat": here, eats = Present Simple, 3rd person singular, active voice
- 3rd: parsing the source sentence: (an apple) = the object of "eat"
- 4th: translating the English words into German: a (category = indef. article) => ein (category = indef. article); girl (category = noun) => Mädchen; …
- 5th: finding the appropriate inflected forms: A girl eats an apple. => Ein Mädchen isst einen Apfel.
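The five stages above can be sketched in a few lines of Python. This is a minimal illustration, not a real MT system: the lexicon entries, the single supported sentence pattern (article, noun, verb, article, noun), and the function name `translate_rule_based` are all assumptions made for this example.

```python
# Toy rule-based English-to-German pipeline for one sentence pattern.
# All lexicon entries and rules below are illustrative assumptions.

# Stage 1: part-of-speech lexicon for the source words.
POS = {"a": "art", "an": "art", "girl": "n", "eats": "v", "apple": "n"}

# Stage 2: morphological information about the verb form.
VERB_FORMS = {"eats": {"lemma": "eat", "tense": "present", "person": 3, "number": "sg"}}

# Stage 4: bilingual dictionary; nouns carry their German gender.
NOUNS = {"girl": ("Mädchen", "neut"), "apple": ("Apfel", "masc")}
VERBS = {"eat": {("present", 3, "sg"): "isst"}}

# Stage 5: German indefinite article inflected by (gender, case).
INDEF_ARTICLE = {
    ("neut", "nom"): "ein", ("masc", "nom"): "ein", ("fem", "nom"): "eine",
    ("neut", "acc"): "ein", ("masc", "acc"): "einen", ("fem", "acc"): "eine",
}

def parse(tokens):
    """Stage 3: split 'art n v art n' into subject, verb, and object."""
    tags = [POS[t] for t in tokens]
    if tags != ["art", "n", "v", "art", "n"]:
        raise ValueError(f"unsupported sentence pattern: {tags}")
    return {"subject": tokens[1], "verb": tokens[2], "object": tokens[4]}

def noun_phrase(noun, case):
    """Translate a noun and attach the correctly inflected article."""
    german, gender = NOUNS[noun]
    return f"{INDEF_ARTICLE[(gender, case)]} {german}"

def translate_rule_based(sentence):
    tokens = sentence.rstrip(".").lower().split()
    tree = parse(tokens)
    form = VERB_FORMS[tree["verb"]]
    verb = VERBS[form["lemma"]][(form["tense"], form["person"], form["number"])]
    subject = noun_phrase(tree["subject"], "nom")   # subject takes the nominative
    obj = noun_phrase(tree["object"], "acc")        # direct object takes the accusative
    out = f"{subject} {verb} {obj}."
    return out[0].upper() + out[1:]

print(translate_rule_based("A girl eats an apple."))  # Ein Mädchen isst einen Apfel.
```

The accusative lookup is what turns "ein" into "einen" before "Apfel", which corresponds to the 5th stage on the slide.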
Statistical translation
- Translations are generated according to a probability distribution, on the basis of statistical models whose parameters are derived from the analysis of bilingual text corpora.
Benefits
- Better use of resources
- More natural translations
- No programmers or linguists* involved
Shortcomings
- Corpus creation can be costly for users with limited resources.
- The results can be unexpected; superficial fluency can be deceiving.
- Statistical machine translation does not work well between languages that have significantly different word orders.
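The probability distribution mentioned on this slide is commonly written in the noisy-channel form. The notation is an assumption for illustration and does not come from the slides: f is the source sentence and e is a candidate translation.

```latex
\hat{e} = \arg\max_{e} P(e \mid f) = \arg\max_{e} P(f \mid e)\, P(e)
```

Here P(f | e) is the translation model, whose parameters are estimated from the bilingual corpus, and P(e) is a language model of the target language. The second equality follows from Bayes' rule, because P(f) is constant for a given input.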
Statistical translation
- Basis: a parallel corpus
- Probabilities are assigned by computing the most likely translation variant
- The probability estimates depend on the size and quality of the training corpus
- Linguistic information: sentence splitting, graphematic analysis, morphology
- Given a corpus, a basic translation system can be built in 2 weeks
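One classic way to obtain word translation probabilities from a parallel corpus is IBM Model 1, trained with expectation-maximization. The sketch below is a minimal version without the NULL word; the slides do not name a specific model, so treating IBM Model 1 as the estimator is an assumption, and the three-sentence corpus and the function names are made up for illustration.

```python
def train_ibm1(pairs, iterations=10):
    """Estimate t(target_word | source_word) with IBM Model 1 EM (no NULL word)."""
    corpus = [(src.split(), tgt.split()) for src, tgt in pairs]
    tgt_vocab = {w for _, tgt in corpus for w in tgt}

    # Initialise every co-occurring word pair with a uniform probability.
    t = {}
    for src, tgt in corpus:
        for tw in tgt:
            for sw in src:
                t[(tw, sw)] = 1.0 / len(tgt_vocab)

    for _ in range(iterations):
        count = {}   # expected number of times sw is translated as tw
        total = {}   # expected number of times sw is translated at all
        # E-step: distribute each target word over the source words of its sentence.
        for src, tgt in corpus:
            for tw in tgt:
                norm = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    frac = t[(tw, sw)] / norm
                    count[(tw, sw)] = count.get((tw, sw), 0.0) + frac
                    total[sw] = total.get(sw, 0.0) + frac
        # M-step: renormalise the expected counts into probabilities.
        for (tw, sw), c in count.items():
            t[(tw, sw)] = c / total[sw]
    return t

def best_translation(t, source_word):
    """Return the target word with the highest probability for source_word."""
    candidates = {tw: p for (tw, sw), p in t.items() if sw == source_word}
    return max(candidates, key=candidates.get)

# Toy English-German parallel corpus (illustrative only).
PAIRS = [("the house", "das haus"), ("the book", "das buch"), ("a book", "ein buch")]
table = train_ibm1(PAIRS)
for word in ["the", "house", "book", "a"]:
    print(word, "->", best_translation(table, word))
```

Even on three sentence pairs the model separates "house" from "the", because "das" also appears with "book". With a larger and cleaner corpus the estimates sharpen, which matches the point above about corpus size and quality.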
Rule-based vs. statistical [figure; panels: news, document]
Rule-based translation
Types
- Dictionary-based (direct)
- Transfer-based
- Interlingual
Dictionary-based (direct)
- Word-by-word translation
- With or without morphological analysis or lemmatisation
Application
- Translation of long lists of phrases on the subsentential (i.e., not a full sentence) level, e.g. lists, inventories, or simple catalogs of products and services
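A minimal sketch of direct translation, with and without lemmatisation. The word list, the plural-stripping lemmatiser, and the function names are hypothetical and exist only to show the mechanism.

```python
# Toy English-to-German word list; entries are illustrative assumptions.
DICTIONARY = {"red": "rot", "green": "grün", "apple": "Apfel", "book": "Buch"}

def lemmatise(word):
    """Hypothetical minimal lemmatiser: strip a plural '-s' if the base form is known."""
    if word not in DICTIONARY and word.endswith("s") and word[:-1] in DICTIONARY:
        return word[:-1]
    return word

def translate_direct(phrase, use_lemmas=True):
    """Translate word by word; unknown words are passed through unchanged."""
    out = []
    for word in phrase.lower().split():
        key = lemmatise(word) if use_lemmas else word
        out.append(DICTIONARY.get(key, word))
    return " ".join(out)

print(translate_direct("red apples"))                    # rot Apfel
print(translate_direct("red apples", use_lemmas=False))  # rot apples
```

Lemmatisation lets the lookup find "apple" for "apples", but the output still ignores number and agreement (correct German would be "rote Äpfel"). That limitation is why the approach suits catalog-style lists rather than full sentences.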
Direct translation example
Transfer-based machine translation
1. Analyzing the input text for morphology and syntax (and sometimes semantics)
2. Creating an internal representation
3. Generating the translation using both bilingual dictionaries and grammatical rules
Sentence in a source language => (analysis) => Source language structure => (transfer) => Target language structure => (synthesis) => Sentence in a target language
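The analysis, transfer, and synthesis steps can be sketched as three small functions. This is a toy under stated assumptions: the lexicon, the single supported pattern (determiner, noun, "has", participle, determiner, noun), and the function names are invented for the example. The structural rule it applies, moving the German participle to the end of a main clause, is the kind of reordering that word-by-word translation cannot express.

```python
# Toy lexicon; every entry is an illustrative assumption.
LEXICON = {
    "the girl": "das Mädchen",
    "an apple": "einen Apfel",   # stored directly in the accusative form
    "has": "hat",
    "eaten": "gegessen",
}

def analyse(sentence):
    """Analysis: map 'DET N has PARTICIPLE DET N' onto a source structure."""
    words = sentence.rstrip(".").lower().split()
    if len(words) != 6 or words[2] != "has":
        raise ValueError("unsupported sentence pattern")
    return {
        "order": ["subject", "aux", "participle", "object"],
        "subject": " ".join(words[0:2]),
        "aux": words[2],
        "participle": words[3],
        "object": " ".join(words[4:6]),
    }

def transfer(source):
    """Transfer: translate each constituent and apply a structural rule."""
    target = {role: LEXICON[source[role]] for role in source["order"]}
    # Structural rule: in a German main clause the participle goes last.
    target["order"] = ["subject", "aux", "object", "participle"]
    return target

def synthesise(target):
    """Synthesis: linearise the target structure into a sentence."""
    text = " ".join(target[role] for role in target["order"]) + "."
    return text[0].upper() + text[1:]

def translate_transfer(sentence):
    return synthesise(transfer(analyse(sentence)))

print(translate_transfer("The girl has eaten an apple."))
# Das Mädchen hat einen Apfel gegessen.
```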
Interlingua machine translation
- The source language is transformed into an interlingua, i.e., an abstract language-independent representation.
- The target language is generated from the interlingua.
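A minimal sketch of the interlingua idea: one analysis step produces a language-independent frame, and separate generators produce each target language from it. The frame layout, the concept names (GIRL, APPLE, EAT), the French generator, and all function names are assumptions made for illustration.

```python
# Source words mapped to language-independent concepts (made-up inventory).
CONCEPTS = {"girl": "GIRL", "apple": "APPLE", "eats": "EAT"}

def analyse_english(sentence):
    """Turn 'article noun verb article noun' into an interlingua frame."""
    words = sentence.rstrip(".").lower().split()
    return {
        "predicate": CONCEPTS[words[2]],
        "agent": CONCEPTS[words[1]],
        "patient": CONCEPTS[words[4]],
    }

# One generation lexicon per target language, keyed by role, then by concept.
GERMAN = {
    "agent": {"GIRL": "ein Mädchen", "APPLE": "ein Apfel"},
    "patient": {"GIRL": "ein Mädchen", "APPLE": "einen Apfel"},
    "predicate": {"EAT": "isst"},
}
FRENCH = {
    "agent": {"GIRL": "une fille", "APPLE": "une pomme"},
    "patient": {"GIRL": "une fille", "APPLE": "une pomme"},
    "predicate": {"EAT": "mange"},
}

def generate(frame, lexicon):
    """Generate a target sentence from the frame alone, without the source text."""
    parts = [
        lexicon["agent"][frame["agent"]],
        lexicon["predicate"][frame["predicate"]],
        lexicon["patient"][frame["patient"]],
    ]
    text = " ".join(parts) + "."
    return text[0].upper() + text[1:]

frame = analyse_english("A girl eats an apple.")
print(frame)                    # {'predicate': 'EAT', 'agent': 'GIRL', 'patient': 'APPLE'}
print(generate(frame, GERMAN))  # Ein Mädchen isst einen Apfel.
print(generate(frame, FRENCH))  # Une fille mange une pomme.
```

Adding a new target language only requires a new generation lexicon. The analysis step is untouched, which is the main design argument for an interlingua over pairwise transfer rules.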
Transfer vs. interlingua
Hybrid machine translation
- A method of machine translation characterized by the use of multiple approaches within a single machine translation system.
Types:
- RBMT guided by statistics
- Statistical method guided by RBMT
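One possible reading of "RBMT guided by statistics" is that a rule-based component proposes several candidate translations and a statistical component chooses among them. The sketch below assumes that reading. The candidates, the three-sentence German corpus, and the crude bigram score are all illustrative and do not describe any particular product.

```python
from collections import Counter

# Toy monolingual German corpus used for the statistics (illustrative assumption).
CORPUS = [
    "ein mädchen isst einen apfel",
    "ein junge isst einen apfel",
    "das mädchen isst einen kuchen",
]

def bigram_counts(sentences):
    """Count adjacent word pairs, with sentence boundary markers."""
    counts = Counter()
    for s in sentences:
        words = ["<s>"] + s.split() + ["</s>"]
        counts.update(zip(words, words[1:]))
    return counts

def score(candidate, counts):
    """Number of candidate bigrams attested in the corpus (a crude fluency score)."""
    words = ["<s>"] + candidate.lower().rstrip(".").split() + ["</s>"]
    return sum(1 for bigram in zip(words, words[1:]) if counts[bigram] > 0)

def pick_best(candidates, counts):
    return max(candidates, key=lambda c: score(c, counts))

# Hypothetical outputs of a rule-based system with an unresolved article choice.
candidates = ["Ein Mädchen isst ein Apfel.", "Ein Mädchen isst einen Apfel."]
counts = bigram_counts(CORPUS)
print(pick_best(candidates, counts))  # Ein Mädchen isst einen Apfel.
```

A production system would use smoothed probabilities rather than a count of attested bigrams; the count is used here only to keep the example short.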
MT software
Name | Platform | Freeware/commercial | Type
Google Translate | Cross-platform (Web application) | Freeware | Statistical
SYSTRAN | Cross-platform (Web application) | Commercial | Hybrid rules-based and SMT
Promt | Cross-platform | |