

  • Number of slides: 18

Machine Translation: The Translator’s Choice
Heidi Düchting, Sylke Krämer, Johann Roturier

Outline
• Background
• Challenges
• Solutions
• Benefits
• Next steps
• Conclusions

Commercial Imperatives
• Effective
  – Time-critical documents in volume
• Efficient
  – Translation process automation
  – Combining translation technologies
    · workflow
    · TM, MT, and PE tools
• Control
  – Loose writing guidelines vs. Controlled Language (CL) rules
    · Improved machine translatability

Commercial Systems
• Combine technologies
  – TM with previously machine-translated and post-edited segments for look-up
• TM systems with MT component
  – Rule-based and example-based
  – Pre-translate phase
  – Towards improved post-editing efficiency?
  – Not available in all systems
• MT systems with TM component
  – 100% match look-up

Challenges
• Setting a threshold for TM matches
  – 100% matches only
    · suitable when the objective is to provide MT output for gisting (no post-editing)
    · suitable when the MT system is fully customized and a CL environment is in place (no post-editing?)
• Quick PE
  – New sentences in which only one character changes are sent to the MT engine
    · W32.Beagle.AB is a mass-mailing worm that neither propagates via network shares nor deletes files
    · W32.Beagle.AC is a mass-mailing worm that neither propagates via network shares nor deletes files
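The sketch below shows why a one-character change defeats a 100% threshold. It scores the two Beagle segments with Python's `difflib.SequenceMatcher.ratio()`. This is an assumption made for illustration only: Trados and other TM tools compute fuzzy-match scores with their own algorithms, so the exact percentage will differ. The `route` helper and its `threshold` parameter are hypothetical.

```python
from difflib import SequenceMatcher


def match_score(a: str, b: str) -> float:
    """Similarity in [0, 1]; a stand-in for a TM tool's fuzzy-match score."""
    return SequenceMatcher(None, a, b).ratio()


def route(new_segment: str, tm_source: str, threshold: float = 1.0) -> str:
    """Hypothetical router: reuse the TM entry at or above the threshold, else send to MT."""
    return "TM" if match_score(new_segment, tm_source) >= threshold else "MT"


tm_source = ("W32.Beagle.AB is a mass-mailing worm that neither propagates "
             "via network shares nor deletes files")
new_segment = ("W32.Beagle.AC is a mass-mailing worm that neither propagates "
               "via network shares nor deletes files")

score = match_score(new_segment, tm_source)
print(f"score={score:.3f}, routed to {route(new_segment, tm_source)}")
```

Only the variant letter differs, so the score falls just short of 1.0. With a 100% threshold the segment therefore goes to MT, even though a post-editor would only need to reuse the existing translation and change one character.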

Solutions (1)
• Two-tier process
  – Leverage Trados TM repository
  – Use MT system to translate unknown segments (Systran Premium 5.0)
  – Use MT output as TM input
• Determine the export threshold
  – Existing TM segments vs. new controlled segments
    · Uncontrolled: Symantec announced a patch was available
    · CL: Symantec announced that a patch was available
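The two-tier process works as follows. Each new segment is looked up in the TM, and the stored translation is reused when the best match reaches the export threshold. Otherwise the segment goes to MT, and the MT output is stored back in the TM for post-editing. The sketch below is a simplified model under stated assumptions: the TM is a plain Python `dict` rather than a Trados repository, `difflib` stands in for the TM's fuzzy matcher, and `fake_mt` is a hypothetical placeholder for the call to Systran.

```python
from difflib import SequenceMatcher
from typing import Callable, Optional


def best_match(segment: str, tm: dict[str, str]) -> tuple[Optional[str], float]:
    """Return the TM source most similar to `segment` together with its score."""
    best, best_score = None, 0.0
    for source in tm:
        score = SequenceMatcher(None, segment, source).ratio()
        if score > best_score:
            best, best_score = source, score
    return best, best_score


def pretranslate(segments: list[str], tm: dict[str, str],
                 mt: Callable[[str], str], threshold: float = 1.0):
    """Tier 1: TM leverage; tier 2: MT for everything below the threshold."""
    results = []
    for seg in segments:
        source, score = best_match(seg, tm)
        if source is not None and score >= threshold:
            results.append((seg, tm[source], "TM"))
        else:
            target = mt(seg)
            tm[seg] = target  # MT output becomes TM input, to be post-edited later
            results.append((seg, target, "MT"))
    return results


def fake_mt(text: str) -> str:
    """Hypothetical stand-in for the MT engine."""
    return f"[MT] {text}"


tm = {"Symantec announced a patch was available": "<existing translation>"}
segments = ["Symantec announced that a patch was available",  # CL rewrite
            "Symantec announced a patch was available"]       # legacy segment
for seg, target, origin in pretranslate(segments, tm, fake_mt):
    print(origin, "|", target)
```

With the threshold at 1.0, the controlled rewrite ("announced that") misses the legacy entry and is machine-translated. This is the trade-off the export threshold has to balance: a lower threshold would reuse the uncontrolled translation instead.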

Solutions (2)
• TMX format
  – obvious choice as the exchange format
  – XLIFF not supported by all MT systems
  – source and target segments, e.g.:
    · Then the worm searches all local and network drives for .gif, .bmp, and .wav files.
    · Then the worm searches all local and network drives for .gif, .bmp, and .wav files.
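A TMX translation unit for the example segment might be built as follows, with the source copied into the target until MT fills it in. This sketch uses Python's `xml.etree.ElementTree`. The header attribute values (`creationtool="example"`, the `de-DE` target language, and so on) are placeholders, not the values the authors' tools actually wrote.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this namespace with the reserved "xml" prefix (xml:lang).
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def make_tmx(pairs, srclang="en-US", tgtlang="de-DE"):
    """Build a minimal TMX 1.4 document from (source, target) pairs."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "example", "creationtoolversion": "0.1",
        "segtype": "sentence", "o-tmf": "none", "adminlang": "en-US",
        "srclang": srclang, "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")
        for lang, text in ((srclang, source), (tgtlang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


segment = ("Then the worm searches all local and network drives "
           "for .gif, .bmp, and .wav files.")
print(make_tmx([(segment, segment)]))
```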

Processing TMX
• Technical issues
  – TMX's various implementations can create discrepancies during the exchange process
  – Identical source and target segment
  – XML parser and TMX header
• Pre- and post-processing with a single macro
  – Modules to remove and restitute sections
  – Environment: VBA
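The authors implemented pre- and post-processing as a single VBA macro. The Python sketch below only illustrates the idea of removing a section before exchange and restoring it afterwards. It rests on two assumptions of ours, not the authors' design. First, the `<header>` is the section set aside, so that a strict parser or MT import does not trip over it. Second, a unit whose source and target segments are identical is one still awaiting translation. `preprocess`, `postprocess`, and the sample TMX are hypothetical.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="0.1" segtype="sentence"
          o-tmf="none" adminlang="en-US" srclang="en-US" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en-US"><seg>Then the worm searches all local and network drives.</seg></tuv>
      <tuv xml:lang="de-DE"><seg>Then the worm searches all local and network drives.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def preprocess(tmx_text):
    """Detach the <header> and collect target <seg>s identical to their source."""
    root = ET.fromstring(tmx_text)
    header = root.find("header")
    if header is not None:
        root.remove(header)  # set aside; postprocess() restores it
    pending = []
    for tu in root.iter("tu"):
        segs = [tuv.find("seg") for tuv in tu.findall("tuv")]
        if (len(segs) == 2 and all(s is not None for s in segs)
                and segs[0].text == segs[1].text):
            pending.append(segs[1])  # target element to overwrite with MT output
    return root, header, pending


def postprocess(root, header):
    """Restore the header as the first child and serialize the document."""
    if header is not None:
        root.insert(0, header)
    return ET.tostring(root, encoding="unicode")


root, header, pending = preprocess(SAMPLE)
for seg in pending:
    seg.text = "[MT] " + seg.text  # hypothetical stand-in for the MT call
print(postprocess(root, header))
```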

Pre-translation Workflow
Step 1: Analyze new document
Step 2: Export unmatched segments
Step 3: Preprocessing module
Step 4: Call to MT system
Step 5: Postprocessing module
Step 6: Import segments into TM
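The six steps can be wired together as below. Every stage is a hypothetical placeholder: exact-match analysis against a `dict` TM, a whitespace-normalizing preprocessor, a tagging stand-in for the MT engine, and a stripping postprocessor. In the authors' setup these roles were played by the Trados TM, the VBA macro, and Systran.

```python
from typing import Callable

Stage = Callable[[str], str]


def pretranslation_workflow(document: list[str], tm: dict[str, str],
                            preprocess: Stage, mt: Stage,
                            postprocess: Stage) -> dict[str, str]:
    # Step 1: analyze the new document against the TM (exact matches only here)
    unmatched = [seg for seg in document if seg not in tm]
    # Step 2: export unmatched segments (de-duplicated, order preserved)
    exported = list(dict.fromkeys(unmatched))
    # Step 3: preprocessing module
    prepared = [preprocess(seg) for seg in exported]
    # Step 4: call to MT system
    raw = [mt(seg) for seg in prepared]
    # Step 5: postprocessing module
    final = [postprocess(seg) for seg in raw]
    # Step 6: import segments into TM, keyed by the original source segment
    tm.update(zip(exported, final))
    return tm


tm = {"Restart the computer.": "<existing translation>"}
document = ["Restart the computer.", "Run  the scan.", "Run  the scan."]
pretranslation_workflow(
    document, tm,
    preprocess=lambda s: " ".join(s.split()),  # collapse stray whitespace
    mt=lambda s: f"[MT] {s} ",                  # hypothetical MT stand-in
    postprocess=str.strip,                      # trim trailing spaces
)
print(tm)
```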

Effective pre-translation
• Efficiency and robustness
  – Refinable
• Opportunity for modifications
  – Target segments
  – CL environment predictability
  – Frequent errors
• Ideal scenario
  – Address problems that could not be fixed with CL rules

Towards Automated Post-Editing
• Surface post-editing
  – No linguistic analysis: no second MT
  – Text processing
  – Frequent errors due to default MT settings
  – Remove drudgery from post-editing
• Lexical
  – Capitalization (folgende vs. Folgende)
  – Incorrect spelling (neuzustarten vs. neu zu starten)
  – Missing contractions (à le vs. au)
  – Extra words (fichier de .bmp vs. fichier .bmp)
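Surface post-editing of this kind reduces to an ordered list of search-and-replace rules applied to the MT output. The rules below simply encode the four example pairs from the slide as Python regular expressions; `LEXICAL_RULES` and `surface_post_edit` are hypothetical names. Because no linguistic analysis is involved, such rules can misfire. For example, the naive "à le" → "au" rule would also rewrite a correct pronoun construction such as "à le faire".

```python
import re

# Hypothetical ordered rules: (compiled pattern, replacement).
LEXICAL_RULES = [
    (re.compile(r"^folgende\b"), "Folgende"),             # capitalization at segment start
    (re.compile(r"\bneuzustarten\b"), "neu zu starten"),   # incorrect spelling
    (re.compile(r"\bà le\b"), "au"),                       # missing contraction (naive)
    (re.compile(r"\bfichier de (\.\w+)"), r"fichier \1"),  # extra word before an extension
]


def surface_post_edit(text: str, rules=LEXICAL_RULES) -> str:
    """Apply each rule, in order, to one MT output segment."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text


print(surface_post_edit("folgende Schritte"))
print(surface_post_edit("Accédez à le fichier de .bmp"))
```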

Towards Automated Post-Editing
• Syntactic
  – Word order: “Klicken auf Sie” vs. “Klicken Sie auf”
  – Wrong structures (transfer or generation issue): neither…nor (ni ne… ni ne)
• Textual
  – Formatting: trailing spaces after symbols (backslashes)
  – Punctuation inconsistent with style guide: inverted commas for German
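The textual fixes are also pattern-based. The sketch below removes spaces that follow a backslash, as in MT output for Windows paths, and converts straight double quotes to German „…“ quotation marks. It assumes the style guide requires „…“ and that quotes come in balanced pairs within one segment; the function names are hypothetical.

```python
import re


def fix_backslash_spacing(text: str) -> str:
    """Remove spaces that the MT engine inserted after backslashes in paths."""
    return re.sub(r"\\ +", lambda m: "\\", text)


def german_quotes(text: str) -> str:
    """Turn each pair of straight double quotes into German low-high quotation marks."""
    return re.sub(r'"([^"]*)"', lambda m: "\u201e" + m.group(1) + "\u201c", text)


print(fix_backslash_spacing(r"C:\ Programme\ Symantec"))
print(german_quotes('Klicken Sie auf "Weiter".'))
```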

Towards Automated Post-Editing
• Suitability of the environment
  – Regular expressions support
  – RE are a ‘way to describe text through pattern matching’ (Stubblebine 2003: 1)
  – Grouping and capturing:
    1. Match: ([Kk]licken) (auf) (Sie)
    2. Replace: 1 3 2 (back-references to captured groups 1, 3, and 2)
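The grouping-and-capturing rule looks like this in Python's `re` module, used here instead of the VBA environment from the slides. In Python's replacement syntax the back-references are written `\1`, `\2`, `\3`; the notation varies between regex engines. `fix_word_order` is a hypothetical helper name.

```python
import re

# Three capture groups: the verb, the particle, and the pronoun.
WORD_ORDER = re.compile(r"([Kk]licken) (auf) (Sie)")


def fix_word_order(text: str) -> str:
    """Re-emit the groups as verb, pronoun, particle ("Klicken Sie auf")."""
    return WORD_ORDER.sub(r"\1 \3 \2", text)


print(fix_word_order("Klicken auf Sie OK."))
```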

Content workflow

Next steps
• New environment
  – GMS integration
    · Centralized interface with content
    · Transport layer
    · MT as plug-in
  – XLIFF format
    · To machine translate unmatched segments
  – PE replacements
    · Fine-tune contextual replacements
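With XLIFF as the exchange format, finding the segments to machine-translate could amount to selecting `<trans-unit>` elements that have no target text. The sketch assumes XLIFF 1.2 (namespace `urn:oasis:names:tc:xliff:document:1.2`) and plain-text segments without inline markup. The sample file and `unmatched_units` are hypothetical, and the GMS integration itself is not modeled.

```python
import xml.etree.ElementTree as ET

NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file source-language="en-US" target-language="fr-FR"
        datatype="plaintext" original="example.txt">
    <body>
      <trans-unit id="1">
        <source>Restart the computer.</source>
        <target>Redémarrez l'ordinateur.</target>
      </trans-unit>
      <trans-unit id="2">
        <source>Run the scan again.</source>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def unmatched_units(xliff_text: str) -> list[tuple[str, str]]:
    """Return (id, source text) for trans-units with a missing or empty target."""
    root = ET.fromstring(xliff_text)
    pending = []
    for unit in root.iterfind(".//x:trans-unit", NS):
        target = unit.find("x:target", NS)
        if target is None or not (target.text or "").strip():
            pending.append((unit.get("id"), unit.find("x:source", NS).text))
    return pending


print(unmatched_units(SAMPLE))
```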

Conclusions
• Combining MT & TM is efficient
  – leverage
  – post-editing is not repeated
  – increased throughput
• Environment for avoiding errors
  – facilitated when CL rules are introduced
  – scope of errors is reduced
• New opportunities for translators
  – Fine-tuning MT user dictionaries
  – Refine automated PE tasks

Thank You
johann_roturier@symantec.com