  • Number of slides: 14

Ministère de l’Education Nationale, de l’Enseignement Supérieur et de la Recherche
Language Technologies for a Multilingual Europe
Joseph Mariani
Director, « Information & Communication Technologies » Department
French Ministry of Research
May 28, 2006, Cocosda / WRITE Workshop

Support for LT: Techno-langue
• Report to the Prime Minister (November 2000)
• Techno-langue action
  – Language technology survey and evaluation
• Coordination with related existing programs
  – ICT Research & Innovation Technological Networks (RRIT)
    • Telecommunications, Software Engineering, Audiovisual & Multimedia
  – Ministry of Research action on Business Intelligence Tools (VSE)

Techno-langue structure
[Diagram: TELECOM, SOFT, AMM, VSE]
• Infrastructure program supporting core LT progress, while innovative application projects remain within RRIT (110 M€ / year)

Techno-langue Call
  – Language resources (data, tools)
  – Evaluation (technology, application)
  – Standards
  – Technological survey

Techno-langue Call
• Launched in 2002, 3-year duration
• Funding by 3 ministries (Research, Industry, Culture)
• Same approach for vision technology (Techno-vision) in 2005 (MoD)
• International cooperation
  – Foreign entities may participate in the projects, with their own funding
• All funded projects completed in 2006
  – Joint Techno-X workshop (ASTI conference, October 2005)
  – Paper at LREC 2006 (S. Chaudiron, J. Mariani) + 16 papers
  – Book under preparation
  – Public presentation of results (Fall 2006)
• Feedback to research and industry (RRIT, VSE/Business Intelligence)
• Presentation to administration agencies (DoD, MAE…)
• LT in 2006: « Data Masses and Ambient Intelligence » Cf. P
  – Managed by ANR
  – 3 M€ funding for LT

Results of the Call
• 52 proposals submitted
• 21 projects funded
• 94 participants
  – 33 industry
  – 39 public research
  – 11 other categories (associations, CEA, French DoD…)
  – 11 foreign (Bell Labs (USA), NII (Japan), EPFL, LATL…)
• Budget: 20 M€ total effort, 7.5 M€ public funding (3 years)
• Special attention to the distribution of Language Resources and Evaluation packages

21 funded projects
• 10 on Language Resources (data and tools)
• 2 on Standards (spoken / written): support to ISO TC 37 / SC 4
• 1 on Technological survey (portal): http://www.technolangue.net
• 8 on Technology Evaluation
  – Written language processing (5)
    • EASY: Syntactic parsing
    • ARCADE 2: Text alignment
    • CESART: Terminology extraction
    • EQUER: Information query
    • CESTA: Machine translation
  – Spoken language processing (3)
    • EVASY: Speech synthesis
    • MEDIA: Spoken dialog
    • ESTER: Speech transcription / automatic indexing

ESTER
• Task: « Rich » speech transcription and indexing evaluation
  – Broadcast news data in French (radio/TV)
    • 100 h manually transcribed (1 M words, 350 speakers) + 1600 h untranscribed
    • Second largest worldwide
  – 13 participants (3 companies)
    • Written transcription (RT / non-RT)
    • Segmentation (sound, speaker recognition / diarization)
    • Named Entity recognition (from speech / transcribed text)
    • Topic detection and tracking for indexing: postponed
• Final internal workshop in March 2005
• Distribution of the Evaluation Package
  – Development and test data, scoring, results. Data used in EASY.
• Workshop for linguists in May 2005
  – Data and tools available, results
  – Open issues requiring basic scientific research

LT for a Multilingual Europe
• Language as a specific issue for Europe
  – Economic, cultural and political challenge with 2 dimensions:
  – A) Preserve the cultures of the EU Member States
    • Preference for native language (Web sites in German (75%)…)
    • 50% of European citizens speak only one language
    • (3% of Japanese people speak a foreign language)
  – B) Allow communication across Member States
    • 1170 translators at the EC; 1.3 M pages translated in 2001
    • 30% of the European Parliament budget (300 M€); 500 translators
    • EU: 25 countries, 20 languages / 380 language pairs
  – Enormous cost for the EU, yet mandatory
  – Need for the assistance of Language Technologies
    • Huge effort (# LT × # languages), too large for the EC alone
    • Should be shared with EU Member States (subsidiarity)
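The figure of 380 language pairs is consistent with counting ordered (source → target) translation directions among the 20 official languages; this reading is an assumption, since counting unordered pairs would give half as many. A worked count:

```latex
N_{\text{directions}} = n(n-1) = 20 \times 19 = 380,
\qquad
N_{\text{unordered}} = \frac{n(n-1)}{2} = 190
```

Because each new language adds 2n directions (n outgoing and n incoming), the cost grows roughly quadratically with the number of languages, which is one way to read the "too large for the EC alone" point above.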

Language Technologies EU Program
• European Research Area (ERA)
  – Coordinate EC (< 15%) and Member State (MS, > 85%) research efforts
  – ERA-Net initiative in FP6 to coordinate MS national programs
• LT well suited to ERA
  – EC prime responsibility:
    • coordination: management, standards, technology evaluation, communication…
    • development cost of generic Language Technologies:
      – Speech recognition, synthesis, understanding, spoken dialog, language tagging, parsing, analysis, generation, text retrieval, document understanding, machine translation…
  – Each Member State would primarily be responsible for ensuring proper coverage of its language(s):
    • Language Resources (essential): (annotated) corpora (spoken / written), lexicons (including pronunciations), dictionaries…
    • Language-specific technology development/adaptation

Lang-Net proposal
• Build up an ERA-Net proposal of an infrastructural nature
  – Language Resources, LT evaluation, Standards, Survey
    • Sharing of information
    • Strategic activities and best practice
    • Implementation of joint activities
    • Transnational research activities
  – Identify EU countries or regions having similar programs
• 11 countries / regions in partnership: Germany, France, Italy, Trento region, Czech Republic, Denmark, Norway, The Netherlands / Belgium-Flanders (Dutch Language Union), Spain, Basque region, Sweden
• Austria, Catalonia, Finland, Greece, Iceland, Portugal, Switzerland, UK (contacts)
  – Extendable to other partners
    • NMS (Slovenia, Cyprus, Poland, Hungary, Malta, Baltic countries…)
    • AS (Romania, Bulgaria…)
    • USA, Japan, South Africa, Israel, Canada… (contacts)

Joint LT program proposal
• DG Research (ERA-Net program)
  – Lang-Net proposal submitted in March 2005, not selected
  – Looking forward to a thematic ERA-Net+ in FP7
• DG INFSO + Media
  – « Science & Technology Forum on Multilingualism »
    • June 2005 and February 2006 in Luxembourg
• DG Education, Culture and Multilingualism
  – « A new framework strategy for multilingualism » (Nov. 2005)
    • http://europa.eu.int/languages/
    • Web site in the 20 EU languages
    • The EC will set up a High Level Group on Multilingualism
    • An EU ministerial conference will be held
    • Further communications will be presented by the EC to Parliament and Council
  – Committee of the Regions (use of regional Spanish languages)
• TC-Star report: introduction signed by V. Reding & J. Figel

French support to LT in FP7
• Visit of a French delegation to EC E Directorate
  – H. Forster & B. Smith (September 2005)
• French Memorandum for a Digital Europe (i2010)
• European Digital Library
• EU ICT Directors meeting (Vienna, March 2006)
• FP7 ICT program (2007-2013)
  – Technology pillar: Simulation, Visualization, Interaction, Mixed realities
    • « Multilingual and automatic machine translation systems »
  – Replace / add LT
    • « Language technology, including multilingual and automatic MT systems »
  – FP7 budget reduction (12 B€ to 9 B€ for ICT)
    • « language-enabled … interaction & communication »

LT in FP7
• Article 169: a large (several hundred M€) EC + MS + industry program on LT in FP7?
• Present topics: SMEs, Metrology, Research in the Baltic Sea…
• Joint support to LT in FP7 from MS