Language Service Management with the Language Grid
NICT Language Grid Project
Yohei Murakami
E-mail: [email protected] go. jp
Web: http://langrid.nict.go.jp/

Background
• Existing frameworks to combine language resources (data and tools) are constructed for NLP professionals.
• End users have difficulties when trying to combine the existing language resources and use them in the field.
  – Less knowledge of language resources
  – Complex contracts and intellectual property rights
• The Language Grid is a trial of service-oriented collective intelligence to share language resources worldwide.
  – Users can combine existing language resources (machine translations, morphological analyzers, dictionaries, etc.) to create customized language services.
  – Users can create their own language resources and utilize them to further customize the language services.
• Public Language Grid
  – 120 groups from 18 countries share more than 60 language services.

The Language Grid
• Providing language support for multicultural societies
• Application areas: Disaster Management, Education, Medical Care, and more
• Example uses: Sharing Multilingual Information, Universal Playground, Translation Services at Hospital Receptions
• Sharing language resources such as dictionaries and machine translators around the world
• Organizations shown in the figure: German Research Center for Artificial Intelligence, Kookmin University, Stuttgart University, National Institute of Informatics, Princeton University, National Research Council (Italy), Google Inc., Chinese Academy of Sciences, NICT, NTT Research Labs, Asian Disaster Reduction Center, NECTEC, Univ. of Indonesia

Service-Oriented Approach
• Enactment of a wide variety of policies and licenses
  – Policies and licenses depend on providers
  – Different from content-based CI (collective intelligence) frameworks, which rely on a common license (e.g. Wikipedia)
• Protecting intellectual property rights of the resources
  – Access control based on policies
• Combining services freely
  – Any combination of services is available
• Figure labels: Users; Request; Access controller + Coordination engine; Service Interface + Policy; Resources + License A, Resources + License B, Resources + License C; Resource Providers

Service Layers of the Language Grid
Layers, from top to bottom:
• Intercultural Collaboration Tools (Application System, Customized Multilingual Environment)
  – Multilingual communication is supported using various language services.
• Composite Language Services (back translations, domain-specific translations, ...)
  – Multiple atomic language services are composed using workflows.
• Atomic Language Services (machine translations, morphological analyzers, dictionaries, parallel texts, ...)
  – Language resources are made usable as Web services with standardized interfaces.
• P2P Service Grid (P2P Grid Infrastructure, Cloud Services)
  – Allows users to connect to Language Grid servers on the Internet.

P2P Service Grid
• Language Grid Core Node: language service management, search & composition, and access control
• Language Grid Service Node: provides language resources as Web services
• Figure: nodes share information and invoke services (steps ① to ⑥). Services shown: Japanese Morphological Analyzer, Korean Morphological Analyzer, Life Science Dictionary (ja, en), Ja to Ko Translator, Multi-language Glossary on Natural Disasters (Ja, En, Ko, Zh, Es, Fr), En to Fr Translator, Ja to En Translator

Atomic Service
• Wrap language resources as Web services equipped with a standard interface
• A language service ontology is required for wrapping language resources, in order to standardize the interfaces of machine translations or dictionaries.
• Figure: each Language Resource (Morphological Analyzer, Dictionary, Parallel Texts, Machine Translation) passes through a Wrapper and becomes a Web Service of the same kind

Standard Interfaces
• Translation Service
  – Input: translate(sourceLang, targetLang, source)
  – Output: String
• Morphological Analysis Service
  – Input: analyze(language, text)
  – Output: Morpheme[], Morpheme={word, lemma, partOfSpeech}
• Bilingual Dictionary
  – Input: search(headLang, targetLang, headWord, matchingMethod)
  – Output: Translation[], Translation={headWord, targetWords[]}
• Parallel Text
  – Input: search(sourceLang, targetLang, source, matchingMethod)
  – Output: ParallelText[], ParallelText={source, target}
• Pictogram, Paraphrase, ...
• Wrapper libraries to ease implementation of wrappers will be provided as open source. (http://langrid.nict.go.jp/langrid-developers-wiki/)
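
The interface signatures above can be sketched in Python as follows. This is only an illustration of the shapes listed on the slide, not the Language Grid wrapper library: the snake_case names, the `InMemoryDictionary` class, and the matching-method values `"COMPLETE"` and `"PREFIX"` are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol, Tuple


@dataclass
class Morpheme:
    word: str
    lemma: str
    part_of_speech: str


@dataclass
class Translation:
    head_word: str
    target_words: List[str]


@dataclass
class ParallelText:
    source: str
    target: str


class TranslationService(Protocol):
    def translate(self, source_lang: str, target_lang: str, source: str) -> str: ...


class MorphologicalAnalysisService(Protocol):
    def analyze(self, language: str, text: str) -> List[Morpheme]: ...


class BilingualDictionary(Protocol):
    def search(self, head_lang: str, target_lang: str,
               head_word: str, matching_method: str) -> List[Translation]: ...


class InMemoryDictionary:
    """Hypothetical wrapper: exposes a plain dict through the dictionary interface."""

    def __init__(self, entries: Dict[Tuple[str, str], Dict[str, List[str]]]):
        # entries maps (head_lang, target_lang) -> {head_word: [target words]}
        self._entries = entries

    def search(self, head_lang: str, target_lang: str,
               head_word: str, matching_method: str = "COMPLETE") -> List[Translation]:
        table = self._entries.get((head_lang, target_lang), {})
        if matching_method == "COMPLETE":    # assumed name for exact match
            hits = [w for w in table if w == head_word]
        elif matching_method == "PREFIX":    # assumed name for prefix match
            hits = [w for w in table if w.startswith(head_word)]
        else:
            raise ValueError(f"unknown matching method: {matching_method}")
        return [Translation(w, list(table[w])) for w in sorted(hits)]
```

Because every dictionary resource answers the same `search` call, a caller can swap one wrapped dictionary for another without changing its own code, which is the point of standardizing the interfaces.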

Composite Service
• To create a new language service, describe an abstract workflow
• Register the abstract workflow into the Language Grid Core Node
• Assign a concrete atomic service to each task in the abstract workflow when invoking the service
  – Put the binding information into the SOAP header
• Figure: a workflow of two tasks (Translation ja->en, Translation en->de); the service bound to each task can be changed among the Translation Services (JServer, Web Transer, ...)
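
The separation between an abstract workflow and the services bound to it at invocation time can be sketched as below. This is a minimal in-process illustration under stated assumptions: the task names, the `REGISTRY` dict, and the toy translators are hypothetical, and the real system carries the binding in the SOAP header rather than in a Python dict.

```python
from typing import Callable, Dict, List, Tuple

# A task in the abstract workflow: (task name, source language, target language).
Task = Tuple[str, str, str]
# A concrete atomic service: translate(source_lang, target_lang, text) -> text.
Service = Callable[[str, str, str], str]

# The abstract workflow names tasks only; it does not say which service runs them.
ABSTRACT_WORKFLOW: List[Task] = [
    ("FirstTranslation", "ja", "en"),
    ("SecondTranslation", "en", "de"),
]


def make_toy_translator(name: str) -> Service:
    """Stand-in for a real translation service; it only tags the text it receives."""
    def translate(source_lang: str, target_lang: str, text: str) -> str:
        return f"{name}[{source_lang}->{target_lang}]({text})"
    return translate


# Hypothetical registry of concrete atomic services, keyed by service name.
REGISTRY: Dict[str, Service] = {
    "JServer": make_toy_translator("JServer"),
    "WebTranser": make_toy_translator("WebTranser"),
}


def invoke(workflow: List[Task], binding: Dict[str, str], text: str) -> str:
    """Run the workflow, resolving each task to the service named in the binding."""
    for task_name, source_lang, target_lang in workflow:
        service = REGISTRY[binding[task_name]]
        text = service(source_lang, target_lang, text)
    return text
```

Calling `invoke` with a different `binding` reuses the same registered workflow with other services, which mirrors the "Change Service" step in the figure.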

Workflow Can Be Complex!
• Multilingual Backtranslation (ja->zh->ja, ja->de->ja, ja->en->ja)
  – Translation ja->en, ja->zh, ja->de, followed by Translation en->ja, zh->ja, de->ja
  – Output: 3 translation results + 3 back translation results
• Japanese-German Domain Specific Translation
  – Steps shown in the figure: Japanese Morphological Analysis, Technical Term Extraction, Technical Term Multilingual Dictionary, Intermediate Code Insertion (with an Intermediate Code Table), Translation ja->en, Translation en->de, Term Replacement
  – The figure includes a "remaining terms?" (Yes/No) decision
• Atomic Services: MeCab (by NTT CS), Pangaea's Community Dictionary (by NPO Pangaea), JServer (by Kodensha), Web Transer (by Cross Language)
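
The idea behind intermediate code insertion and term replacement can be sketched as follows: known technical terms are swapped for placeholder codes before machine translation and swapped for their dictionary translations afterwards, so the translator cannot mistranslate them. This is a simplified sketch, not the workflow from the slide: it uses plain substring matching instead of morphological analysis, and the `XTERM<n>X` code format and `toy_translate` function are assumptions for the example.

```python
from typing import Callable, Dict


def translate_with_terms(text: str,
                         terms: Dict[str, str],
                         translate: Callable[[str], str]) -> str:
    """Translate text while protecting dictionary terms with intermediate codes.

    terms maps a source-language term to its target-language translation.
    """
    code_table: Dict[str, str] = {}
    # Replace longer terms first so a short term never splits a longer one.
    for i, term in enumerate(sorted(terms, key=len, reverse=True)):
        if term in text:
            code = f"XTERM{i}X"          # assumed code format
            text = text.replace(term, code)
            code_table[code] = terms[term]
    translated = translate(text)
    # Term replacement: swap each intermediate code for the dictionary translation.
    for code, target_term in code_table.items():
        translated = translated.replace(code, target_term)
    return translated


def toy_translate(text: str) -> str:
    """Stand-in for a machine translator: word-by-word lookup, unknown words kept."""
    table = {"kyou": "today", "ni": "to", "iku": "go"}
    return " ".join(table.get(word, word) for word in text.split())
```

The sketch relies on the translator passing the codes through unchanged; a real translator may alter them, which is one reason the codes have to be chosen with care.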

Language Service Management Architecture
Figure (labels recovered from the slide):
• Actors: Language Resource Providers, Language Service Users
• Service Manager: Service Registration (WSDL), Policy (create Access Constraint), Monitor (get Access Log)
• Language Grid Core Node: Application System, Virtual Endpoint (Endpoint URL), Access Controller, Load Balancer, Access Logging, Service Invoker, Composite Service Engine (ActiveBPEL, JavaScript, etc.)
• Language Grid Service Node: Atomic Service Engine, Language Service Wrapper (WSDL), Language Resources
• Messages: 1. SOAP, 2. SOAP, 3. SOAP, 4. SOAP

Service Manager
• Web-based tool to manage Language Grid users, language resources, and language services on the Language Grid. (http://langrid.org/operation/service_manager/)

Monitoring & Control of Language Services
To monitor and control the language services:
• Monitor access date, IP address, and data transfer size of each request
• Set access rights for each user
• Control the number of accesses per day/month/year, and the data transfer size

Case Studies (1)
• NICT
  – Hard to provide free EDR (Concept/Bilingual Dictionary) services because NICT sells them.
  ⇒ Set 1000 accesses/month and 15 KB/access for the bilingual dictionary
  ⇒ Set 2000 accesses/month and 35 KB/access for the concept dictionary
  (These policies are configured so that downloading the whole data takes almost one year!)
  ⇒ Allow only members to access EDR services without any restrictions
• Kodensha Co., Ltd.
  – Hard to provide the free J-Server service to users who are Kodensha's business targets
  ⇒ Prohibit them from accessing the free J-Server service
  ⇒ Allow only members to access the latest, high-quality J-Server service operated by Kodensha
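
A policy of the kind NICT sets (a monthly access count plus a per-access size limit) can be sketched as below. The class and function names are hypothetical and this is not the Language Grid access controller; it only illustrates how the two limits combine and what yearly ceiling they imply.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class AccessPolicy:
    max_accesses_per_month: int
    max_kb_per_access: int


@dataclass
class AccessController:
    policy: AccessPolicy
    # (user, "YYYY-MM") -> number of accepted requests in that month
    counts: Dict[Tuple[str, str], int] = field(default_factory=dict)

    def allow(self, user: str, month: str, response_kb: int) -> bool:
        """Accept a request only if both the size and the monthly count fit the policy."""
        if response_kb > self.policy.max_kb_per_access:
            return False
        key = (user, month)
        if self.counts.get(key, 0) >= self.policy.max_accesses_per_month:
            return False
        self.counts[key] = self.counts.get(key, 0) + 1
        return True


def yearly_ceiling_kb(policy: AccessPolicy) -> int:
    """Upper bound on the data one user can retrieve in 12 months."""
    return policy.max_accesses_per_month * 12 * policy.max_kb_per_access
```

With the bilingual dictionary settings (1000 accesses/month, 15 KB/access) the ceiling is 1000 × 12 × 15 = 180,000 KB per year, and with the concept dictionary settings (2000, 35) it is 840,000 KB. The source does not give the size of the EDR data, so these figures only show the order of magnitude that the "almost one year" remark implies.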

Case Studies (2)
• Kyoto University
  – Has a responsibility to prevent illegal usage because Kyoto U. provides services based on resources it purchased from companies
  ⇒ Monitor whether the services are abused or not
  ⇒ Detect excessive access from a specific IP address
• GSK
  – Promotes language resource distribution on behalf of language resource providers
  ⇒ Deploy the language resources on GSK's server and allow users who purchase the resources to access them
  (This hosting model can reduce the language resource providers' burden of selling and operating them)
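
Detecting excessive access from a specific IP address can be sketched over the monitored fields (access date, IP address, data transfer size). The log layout as a tuple, the function name, and the per-day threshold are assumptions for this example; the source does not say how Kyoto University defines "excessive".

```python
from collections import Counter
from typing import Iterable, List, Tuple

# One access log entry: (access date "YYYY-MM-DD", IP address, transfer size in bytes).
LogEntry = Tuple[str, str, int]


def find_excessive_ips(log: Iterable[LogEntry], max_requests_per_day: int) -> List[str]:
    """Return the IP addresses that exceeded the daily request threshold on any day."""
    per_day = Counter((date, ip) for date, ip, _size in log)
    flagged = {ip for (_date, ip), n in per_day.items() if n > max_requests_per_day}
    return sorted(flagged)
```

The same counting could be done on the summed transfer size instead of the request count if volume rather than frequency is the concern.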

Service-Oriented Approach: Pros and Cons
Pros
• From Having to Using: The service-oriented approach can relax complex issues of intellectual property rights of language resources.
• Cloud Services: The service-oriented approach allows resource providers to scale up the usage of language resources.
• Service Federation: The service-oriented approach allows language services to be easily combined with other services, e.g. e-learning services and ambient intelligence services.
Cons
• Maintenance Cost: Language services should be maintained and provided continuously by secure providers.
• Market Pull: Language services should be designed based on market demand, which is hard for academic communities to control.

Summary
• Propose a service-oriented collective intelligence platform to manage language services
  – Enable language resource providers to provide their services while retaining ownership of their resources
• Develop a language service management architecture
  – Monitoring of language services
  – Access control of language services
  – Service Manager is a Web-based GUI
• Collect experience from operating the first service-oriented platform
  – Several language service policies
  – Pros and cons of the service-oriented approach

Role of the Language Grid
• Difficulties often arise when trying to share and combine the existing language resources and use them in the field
  – Complex contracts and intellectual property rights
  – Non-standardized application interfaces
• Improve the accessibility and usability of those language resources, and encourage users to create new language services that suit their needs by combining several language resources
  – Standardize interfaces of language resources by wrappers
  – Publish language resources not as source programs but as Web services
  – Combine language resources by Web service workflows
  – Manage those service profiles
  – Control access to those resources

Language Grid Core Node and Service Node
Figure (labels recovered from the slide):
• Actors: Language Resource Providers, Language Service Users
• Language Service Management: Service Registration (WSDL), create Access Constraint, Monitor (get Access Log)
• Language Grid Core Node: IC Tools, Virtual Endpoint URL, Service Invoker, Composite Service Engine (ActiveBPEL, UIMA, HoG, etc.)
• Language Grid Service Node: Atomic Service Engine (WSDL), Language Resources
• Messages: 1. SOAP, 2. SOAP, 3. SOAP, 4. SOAP, 5. HTTP, Function Call, etc.

Participants / Language Services
• Participants (17 countries, 118 groups)
  – University / Research Institute
    • Kyoto Univ. (Japan), Shanghai Jiaotong Univ. (China), Univ. of Stuttgart (Germany), IT Univ. of Copenhagen (Denmark), Princeton Univ. (U.S.), DFKI (Germany), CNR (Italy), Chinese Academy of Sciences (China), NECTEC (Thailand), and more.
  – NPO/NGO/Public Sector
    • NGOs for disaster reduction, public junior high schools, city boards of education, and more.
  – Corporate (CSR activities / language resource providers)
    • NTT, Toshiba, Oki, Google, Kodensha, Translution, and more.
• Language Services (more than 60)
  – Machine Translator
    • J-Server, Web-Transer, Toshiba, Parsit, Google Translate, and more.
  – Dictionary, Parallel Text
    • EDR, Wordnet, Life Science Dictionary, Multi-language Glossary on Natural Disasters, and more.
  – Morphological Analyzer
  – Dependency Parser
  – Composite Services

Atomic Service
Examples of wrapped resources and the results they return:
• Dictionary Service (wrapping a dictionary): search translated word
  – Hinanbasho (Disaster shelter) -> disaster shelter
• Parallel Text Service (wrapping parallel text): search similar translated text
  – Hinanbasho ha ie kara tooi desu (The disaster shelter is far from my house) -> The disaster shelter is [].
• Machine Translation Service (wrapping MT): translate by machine
  – Hinanbasho ha ie kara tooi desu (The disaster shelter is far from my house) -> Disaster shelter is school close from a house.
• Human Translation Service (wrapping a human translator): translate with high quality
  – Hinanbasho ha ie kara tooi desu (The disaster shelter is far from my house) -> Your disaster shelter is the school closest to your house.

Fourth Layer: Intercultural Collaboration Tools
Language Grid Toolbox (developed by NICT)
• Tools: Multilingual BBS, Text Translation (Input, Translation Result, Back Translation Result), Language Resource Creation (Multilingual Dictionary: Dictionary Data; Multilingual Corpus: Parallel Texts)
• Functions
  – Submit messages in users' mother languages
  – Estimate the translation accuracy using backtranslation
  – Improve the translation result by manual post-editing
  – Create multilingual dictionaries specific to users' communities
• Built on XOOPS (Open Source Software) and Language Services on the Language Grid
• Toolbox was released as OSS. http://langrid-tools.nict.go.jp/toolbox/

Operation of the Language Grid
• Diverse stakeholders
  – Language Service User
  – Language Resource Provider
  – Computation Resource Provider
  – Language Grid Operator
• The Language Grid for non-profit use has been operated by Kyoto Univ. since December 2007.
• The Letter of Agreement on the Language Grid is available. (http://langrid.org/operation/)
  – 118 organizations (from 17 countries) signed the agreement
• Figure: the four stakeholders above, with the annotation "control their resources"

Language Grid Association (http://langrid.org/associaiton/)

Intercultural Collaboration Tools
Language Grid Playground (developed by Kyoto Univ.)
http://langrid.org/playground

M3 (developed by Wakayama Univ.)
• Screens shown: for medical staff, for foreign patients
http://www.langrid.org/association/m3support/indexe.html

Pangaea Community Site (NPO Pangaea)
• Pangaea is an NPO that aims to support communication between children in various countries
• The Pangaea Community Site allows the participants and the staff to
  – Communicate in their own language using the translation service (Japanese, Korean, English, German)
  – Revise the result of machine translation for other people

Pangaea as a Language Resource Provider
• Pangaea is also a provider of language resources
  – Pictograms designed for communication between children in different countries
  – Pangaea Community dictionary, which contains 500 terms for Pangaea's activities, e.g. Pangaean (participants of activities), Koetsuna (ice breaking activity for children)
• Pictograms and community dictionary: both resources are provided as Web services and combined with other services on the Language Grid
• Pangaea Activities (Community Site): combining Korean, Japanese, English, and German Morphological Analyzers, the community dictionary, and 2 Machine Translators

Multilingual Communication System (Kyoto University, Ritsumeikan University)
• Fujimi Junior High School: all students 584; Filipino 4, Chinese 6, Korean 2, Peruvian 1
• Figure labels: Japanese user, Chinese user; Autocomplete, Translation, Backtranslation