- Number of slides: 47
LRC-XI: 11th Annual Internationalisation and Localisation Conference. A Paper on Automating the HTML Localisation Process: An Implementation Using a Java Internationalisation Approach. Presented by: Prof. Manikrao L. Dhore and Mr. Abhishek K. Dhote, Department of Computer Engineering, Vishwakarma Institute of Technology, Pune, India. Organised by: Localisation Research Centre (LRC), Department of Computer Science and Information Systems (CSIS), University of Limerick, Ireland. 1
Agenda. Introduction: Why Web Page Localisation?; Borderless Integration; Why Multilingual Web Sites?; What is Locale and Multi-Locale Operation?; Internationalisation and Key Challenges; i18n Standard: Important Issues and Business Context; Variance: Regional and Cultural Issues. System Design: Web Localisation and Rural India; Localisation Approaches; Architecture of Servers. System Implementation and Test Results: Configuration of Server; Localisation Test Results; Alternative Approach. Conclusion. References. 2
Why Web Page Localisation? International Market and Customers; Service Sector; Online Business; Web Localisation; Internet. ✓ Increased Sales Leads ✓ Advantage of Global Growth. Banking Sector; Open Linguistic Barriers; Information Repository; Objective Information; Convenience. ✓ Reduced Marketing Costs. Closed Linguistic Barriers. 3
Borderless Integration Local Model Business Process Business Entities Customer Integration Logic Resource Mapping Global Integration Deployment Business Logic Market Research Analyse Optimize Process Internet Framework 4
Why Multilingual Websites? Over 100 million people access the Internet in a language other than English. Over 50% of web users speak a native language other than English. According to Forrester Research, 50% of all online sales are expected to occur outside the USA. Web users are four times more likely to purchase from a site that communicates in the customer's native language. "Your website is your window to the world…" 5
Basic Terminology. Locale: the set of features that can vary depending on the language and culture of the user or the data. Internationalisation: the process of designing software so that it can be easily adapted to different locales. Localisation: the process of adapting software to a locale. 6
What is Locale? A locale is an abstraction: a data processing structure that identifies a collection of culturally and linguistically affected preferences. Java locales are associated with upwards of 300 pieces of data, such as time zone names, collation sequences, the infinity symbol, number formats and days of the week. Locales generally do not contain this data themselves. They represent a way of obtaining "localized behavior" in the system. Locales are generally part of the programming context or environment. 7
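To illustrate the point that a locale is a key for obtaining data rather than a container of data, here is a minimal sketch using standard `java.text` classes. The class name `LocaleDataSketch` and its helper methods are hypothetical names chosen for this example; the exact strings returned depend on the locale data shipped with the Java runtime.

```java
import java.text.DateFormatSymbols;
import java.text.DecimalFormatSymbols;
import java.util.Calendar;
import java.util.Locale;

// Sketch: the Locale object is only passed as a key; the data itself
// comes from the symbol classes of java.text.
final class LocaleDataSketch {

    // Decimal separator used when formatting numbers in the given locale.
    static char decimalSeparator(Locale locale) {
        return DecimalFormatSymbols.getInstance(locale).getDecimalSeparator();
    }

    // Name of Monday in the given locale.
    static String monday(Locale locale) {
        return DateFormatSymbols.getInstance(locale).getWeekdays()[Calendar.MONDAY];
    }

    // Symbol used to print infinity in the given locale.
    static String infinity(Locale locale) {
        return DecimalFormatSymbols.getInstance(locale).getInfinity();
    }

    public static void main(String[] args) {
        Locale[] locales = {Locale.US, Locale.GERMANY, Locale.FRANCE};
        for (Locale locale : locales) {
            System.out.println(locale.toLanguageTag()
                    + " decimal=" + decimalSeparator(locale)
                    + " monday=" + monday(locale)
                    + " infinity=" + infinity(locale));
        }
    }
}
```

The same three calls return different data for each locale, which is the "localized behavior" described on the slide.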
Multi-Locale Operation Server Processes Message Passing Client Locale Logic Execution System Context Separation Message Passing Client Locale Logic Execution Design Policy APIs provide late binding localisation 8
Internationalisation. "i18n" is an abbreviation of the word "internationalisation". The term is derived from its spelling: the letter "i", plus 18 letters, plus the letter "n": i + n(1) t(2) e(3) r(4) n(5) a(6) t(7) i(8) o(9) n(10) a(11) l(12) i(13) s(14) a(15) t(16) i(17) o(18) + n. The extension of this naming convention to the terms localisation (l10n), Europeanisation (e13n), Japanisation (j10n) and globalisation (g11n) seems to have come somewhat after the invention of "i18n". An internationalised system can potentially handle the multiple languages and customs of the world: displaying and inputting characters for the users' native languages; handling popular encodings for the users' native languages; native characters for file names and other items; character classification and sorting; typesetting and hyphenation rules. 9
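The naming convention can be checked mechanically. The following is a small sketch; the class `Numeronym` and the method `abbreviate` are hypothetical names introduced only for this illustration.

```java
import java.util.Locale;

// Sketch: build a numeronym such as "i18n" from a word.
final class Numeronym {

    // First letter + number of letters in between + last letter.
    static String abbreviate(String word) {
        // Locale.ROOT avoids locale-specific case mappings (e.g. the Turkish dotless i).
        String lower = word.toLowerCase(Locale.ROOT);
        if (lower.length() < 3) {
            return lower; // nothing in between to count
        }
        int inner = lower.length() - 2;
        return lower.charAt(0) + String.valueOf(inner) + lower.charAt(lower.length() - 1);
    }

    public static void main(String[] args) {
        String[] words = {"Internationalisation", "Localisation", "Europeanisation",
                          "Japanisation", "Globalisation"};
        for (String word : words) {
            System.out.println(word + " -> " + abbreviate(word));
        }
    }
}
```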
Key Challenges. Locale and Parameterisation: availability, performance; continuity of i18n features; translation standards. Encoding and Character Set: Unicode support and implementation; use of language-specific encoding; configuring encoding. Data Correspondence; Reference Information. Presentation, Processing: UI design; handling collation; migration of existing data. 10
Important Issues in i18n 11
Business Context of i18n. To improve the effectiveness of globally distributed business users by providing language- and culture-specific application, product and service interfaces. To reach out to a global customer base by providing language- and culture-specific interfaces and allowing for international preferences. New application; new service; new product; mergers and acquisitions. Internationalisation. To support region-specific functionality (due to legal aspects, financial practice, etc.). To provide region-specific value-added services (such as UI, look and feel, sorting and searching). Old product; old application; existing service. To consolidate applications or services with the same functionality that were developed and maintained separately for separate languages or regions. 12
Regional and Cultural Differences. Software solutions should be designed to fit into the cultural context of the user. Examples: naming of the product; differences in the meaning of jargon; confusing graphical symbols; national rules and conventions; religious beliefs and assumptions; basic cultural values and customs; no appropriate translations available for phrases and slogans; favourite sports and slang; cultural anachronisms; reading direction (left-to-right, top-to-bottom, etc.). 13
Language and Character Encoding. Language peculiarities: hyphenation, collation, spelling, transliteration. Alphabet order by language:
English:          ABC...RSTUVWXYZ
German:           AÄB...NOÖ...SßTUÜV...YZ
Swedish/Finnish:  AB...STUVWXYZÅÄÖ
Norwegian:        AB...VWXYÜZÆØÅ
There are various "standards", and they differ between languages. ISO and vendor encodings: ISO-8859-1, -2, -3, -4, -5, -6, -7, Windows-1252. Chinese encodings: Big5, Big5-HKSCS, GB18030, GB2312. Japanese and Korean: EUC-JP, EUC-KR, ISO-2022-JP, ISO-2022-KR. 14
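The collation differences above are why plain code-point comparison is not enough for sorting user-visible text. The sketch below uses `java.text.Collator`; the class name `CollationSketch` and the sample words are assumptions made for the example, and the Swedish ordering depends on the collation rules available in the Java runtime in use.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

// Sketch: locale-aware sorting with java.text.Collator.
final class CollationSketch {

    // Returns a sorted copy of the words using the collation rules of the locale.
    static String[] sort(Locale locale, String... words) {
        String[] copy = words.clone();
        Arrays.sort(copy, Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        // \u00C4 is A with diaeresis; escapes keep the source file ASCII-only.
        String[] words = {"Zebra", "\u00C4pfel", "Apfel"};

        String[] byCodePoint = words.clone();
        Arrays.sort(byCodePoint); // raw char comparison, not linguistic order
        System.out.println("code points: " + Arrays.toString(byCodePoint));

        System.out.println("German:      " + Arrays.toString(sort(Locale.GERMAN, words)));
        System.out.println("Swedish:     "
                + Arrays.toString(sort(Locale.forLanguageTag("sv"), words)));
    }
}
```

With German rules the umlauted word sorts next to its base letter, whereas a raw `String` comparison places it after every ASCII word.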
Unicode Character Standard. Developed by the Unicode Consortium. Covers all major living scripts. Version 4.0 has 96,000+ characters. Capacity for 1 million+ characters. Unicode character set = ISO 10646. Unicode adds character properties and algorithms. ISO and Unicode work together to stay synchronized. ISO support enhances international acceptance. 15
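The "1 million+" capacity corresponds to the Unicode code space U+0000 to U+10FFFF, which is 1,114,112 code points. In Java, a `char` is a 16-bit unit, so characters above U+FFFF occupy two `char` values. The sketch below shows the difference; the class `CodePointSketch` and its methods are hypothetical names for this illustration.

```java
// Sketch: code points versus 16-bit char units in Java strings.
final class CodePointSketch {

    // Number of Unicode code points in the text (not the number of chars).
    static int codePoints(String text) {
        return text.codePointCount(0, text.length());
    }

    // Builds a one-character string from a code point, using a surrogate pair if needed.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        System.out.println("code space size: " + (Character.MAX_CODE_POINT + 1));

        String clef = fromCodePoint(0x1D11E); // MUSICAL SYMBOL G CLEF, outside the 16-bit range
        System.out.println("chars=" + clef.length() + " codePoints=" + codePoints(clef));

        String marathi = "\u092E\u0930\u093E\u0920\u0940"; // the word "Marathi" in Devanagari
        System.out.println("chars=" + marathi.length() + " codePoints=" + codePoints(marathi));
    }
}
```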
Date / Time Formats Variance
Locale    Example     Format
U.S.A.    2/16/05     mdy, /
France    16.2.05     dmy, .
France    16-2-05     dmy, -
CJKT      2005/2/16   ymd, /
Japan     17/2/16     ymd, / (year 17 of the Heisei era)
Hour and minute separators, AM/PM and time zone also vary. Examples for India, U.S.A., France and Japan: 4:00 P.M., 4:00 p.m., 16.00, 1600, 4:00. 16
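As a rough sketch of how such variance is handled in Java, the same instant can be formatted through `java.text.DateFormat` with different locales. The class `DateFormatSketch` is a hypothetical name, and the exact output strings depend on the locale data of the Java runtime, so no particular output is claimed here.

```java
import java.text.DateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

// Sketch: one date, several locale-specific renderings.
final class DateFormatSketch {

    // Short date form (numeric day, month, year) for the locale.
    static String shortDate(Date date, Locale locale) {
        return DateFormat.getDateInstance(DateFormat.SHORT, locale).format(date);
    }

    // Short time form (hours and minutes) for the locale.
    static String shortTime(Date date, Locale locale) {
        return DateFormat.getTimeInstance(DateFormat.SHORT, locale).format(date);
    }

    public static void main(String[] args) {
        // 16 February 2005, 16:00 in the default time zone.
        Date when = new GregorianCalendar(2005, Calendar.FEBRUARY, 16, 16, 0).getTime();
        Locale[] locales = {Locale.US, Locale.FRANCE, Locale.GERMANY, Locale.JAPAN};
        for (Locale locale : locales) {
            System.out.println(locale.toLanguageTag() + ": "
                    + shortDate(when, locale) + " " + shortTime(when, locale));
        }
    }
}
```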
Numbers / Currency Variance
Varieties in group and fractional separators:
India:                      12,34,567.89
England:                    12,345.67
Germany:                    12.345,67
Switzerland / Swiss money:  12'345.67
France:                     12 345,67
Varieties in symbol placement, symbol length, precision, number width and rounding rules:
India:       Rs. 12,34,567.89; Re. 1
U.S.A.:      US $1,234,567.89
France:      12.345,67 €
Portuguese:  12$34 ESC
Portuguese:  12$34€
17
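A minimal sketch of locale-sensitive number and currency formatting with `java.text.NumberFormat` follows. The class `NumberFormatSketch` is a hypothetical name; only locales whose output is stable across Java runtimes are shown.

```java
import java.text.NumberFormat;
import java.util.Locale;

// Sketch: the same value rendered with locale-specific separators and symbols.
final class NumberFormatSketch {

    // Plain number with the grouping and decimal separators of the locale.
    static String number(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    // Currency amount with the symbol and placement of the locale.
    static String currency(double value, Locale locale) {
        return NumberFormat.getCurrencyInstance(locale).format(value);
    }

    public static void main(String[] args) {
        System.out.println("UK:      " + number(12345.67, Locale.UK));
        System.out.println("Germany: " + number(12345.67, Locale.GERMANY));
        System.out.println("US:      " + currency(1234567.89, Locale.US));
    }
}
```

The application code passes only the value and the locale; separators, symbol and symbol placement come from the runtime's locale data.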
System Design 18
Indian Languages Profile 19
Percentage Languages Usage Index. Data source: 2001 Census of India
Language          Number        Percentage
Hindi             337,272,114   40.22%
Bengali            69,595,738    8.30%
Telugu             66,017,615    7.87%
Marathi            62,481,681    7.45%
Tamil              53,006,368    6.32%
Urdu               43,406,932    5.18%
Gujarati           40,673,814    4.85%
Kannada            32,753,676    3.91%
Malayalam          30,377,176    3.62%
Oriya              28,061,313    3.35%
Punjabi            23,378,744    2.79%
Assamese           13,079,696    1.56%
Sindhi              2,122,848    0.25%
Nepali              2,076,645    0.25%
Konkani             1,760,607    0.21%
Manipuri            1,270,216    0.15%
Kashmiri               56,693    0.01%
Sanskrit               49,736    0.01%
Other Languages    31,142,376    3.71%
Total             838,583,988  100.00%
20
Indian Currency Example. Indian currency (value Rs. 10): language panel with 15 major Indian languages. Population residing in villages of India: 70%. Total number of languages in India: 40. Official languages: 22. Overall literacy rate: 64.20%. English language literacy: 17.75%. 21
Information Channelisation. Internationalisation: prepare material for localisation (account for text expansion, avoid embedded text, etc.). Text Extraction: extract text from source files (graphics, PDFs, etc.). Translation: translate content from the extracted materials. Localisation: replace graphics, change colors and redesign the layout to accommodate the target culture. 22
Localisation Process. Translation Errors; Text Placement in Separate File; Site Acceptance Factors (Color, Image, Representation). Web page is "dynamically" converted into the target language: Late Binding Localisation. Language selection: static web page is selected and displayed. Mapping Techniques; Translation. 23
Server Architecture. Client Browser_1, Client Browser_2, Client Browser_3, ..., Client Browser_n; Default / Alternative Language Response; Socket API; Localised Content; Parse Request Module; HTML Server; Property File. 24
Implementation: Parse Request Module. Definition: to parse the request header. Responsibilities: to parse the request header; to analyse and forward the request; to provide a log to the administrator. Composition: main server loop; threads. Interfaces/Ports: Socket APIs. 25
Parse Request Module Architecture Thread 1 Thread 2 Main Server Loop Thread 3 Thread 4 Thread 5 Thread n 26
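The main-loop-plus-threads layout can be sketched as follows. This is not the authors' implementation: the class `ParseRequestSketch`, the key `":request-line"`, the placeholder response and port 8080 are all assumptions made for illustration. One thread is started per accepted connection, and the header is parsed into a map so that later stages can read fields such as `Accept-Language`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class ParseRequestSketch {

    // Parses a raw HTTP header block. Field names are lower-cased; the request
    // line is stored under the hypothetical key ":request-line".
    static Map<String, String> parseHeader(String rawHeader) {
        Map<String, String> header = new LinkedHashMap<>();
        String[] lines = rawHeader.split("\r?\n");
        header.put(":request-line", lines[0].trim());
        for (int i = 1; i < lines.length && !lines[i].isEmpty(); i++) {
            int colon = lines[i].indexOf(':');
            if (colon > 0) {
                header.put(lines[i].substring(0, colon).trim().toLowerCase(Locale.ROOT),
                           lines[i].substring(colon + 1).trim());
            }
        }
        return header;
    }

    // Worker: read the header, log it, send a placeholder response.
    static void handle(Socket client) {
        try (Socket socket = client;
             BufferedReader in = new BufferedReader(new InputStreamReader(
                     socket.getInputStream(), StandardCharsets.ISO_8859_1));
             OutputStream out = socket.getOutputStream()) {
            StringBuilder raw = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null && !line.isEmpty()) {
                raw.append(line).append('\n');
            }
            Map<String, String> header = parseHeader(raw.toString());
            // Log for the administrator.
            System.out.println(header.get(":request-line")
                    + " accept-language=" + header.get("accept-language"));
            byte[] body = "ok".getBytes(StandardCharsets.UTF_8);
            String head = "HTTP/1.0 200 OK\r\nContent-Type: text/plain; charset=UTF-8\r\n"
                    + "Content-Length: " + body.length + "\r\n\r\n";
            out.write(head.getBytes(StandardCharsets.ISO_8859_1));
            out.write(body);
            out.flush();
        } catch (IOException e) {
            System.err.println("client error: " + e.getMessage());
        }
    }

    // Main server loop: accept a connection and hand it to a new thread.
    static void serve(int port) throws IOException {
        try (ServerSocket server = new ServerSocket(port)) {
            while (true) {
                Socket client = server.accept();
                new Thread(() -> handle(client)).start();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        serve(8080); // port chosen only for illustration
    }
}
```

A thread per connection mirrors the diagram directly; a production server would more likely use a bounded thread pool.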
HTML Server. Definition: default implementation of the HTTP protocol; processes static HTML requests. Responsibilities: process static HTML requests; process dynamic internationalisation requests. Composition: server processes. Interfaces/Ports: Socket APIs. 27
HTML Server Architecture. Parse Protocol (GET/POST). GET Request Processor: static response in default language or alternative language, using .properties files. POST Request Processor: static response in default language or alternative language. 28
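One way to picture the default versus alternative language decision is shown below. It is a sketch under stated assumptions, not the authors' server: the class `LocalisedResponseSketch`, the `greeting` key and the in-memory property text are hypothetical, and a real deployment would load `.properties` files from disk. Language matching uses the standard `Locale.LanguageRange.parse` and `Locale.lookup` methods.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

final class LocalisedResponseSketch {

    static final Locale DEFAULT = Locale.ENGLISH;
    static final Map<Locale, ResourceBundle> BUNDLES = new HashMap<>();

    static {
        // In-memory stand-ins for per-language .properties files.
        BUNDLES.put(Locale.ENGLISH, load("greeting=Welcome\n"));
        BUNDLES.put(Locale.forLanguageTag("es"), load("greeting=Bienvenido\n"));
        BUNDLES.put(Locale.FRENCH, load("greeting=Bienvenue\n"));
    }

    // Parses property text into a resource bundle.
    static ResourceBundle load(String properties) {
        try {
            return new PropertyResourceBundle(new StringReader(properties));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Picks the best supported locale for an Accept-Language value,
    // falling back to the default language.
    static Locale choose(String acceptLanguage) {
        if (acceptLanguage == null || acceptLanguage.trim().isEmpty()) {
            return DEFAULT;
        }
        try {
            List<Locale.LanguageRange> ranges = Locale.LanguageRange.parse(acceptLanguage);
            Locale match = Locale.lookup(ranges, BUNDLES.keySet());
            return match != null ? match : DEFAULT;
        } catch (IllegalArgumentException e) {
            return DEFAULT; // malformed header value
        }
    }

    // Renders a tiny page in the chosen language.
    static String page(String acceptLanguage) {
        Locale locale = choose(acceptLanguage);
        String greeting = BUNDLES.get(locale).getString("greeting");
        return "<html lang=\"" + locale.toLanguageTag() + "\"><body><h1>"
                + greeting + "</h1></body></html>";
    }

    public static void main(String[] args) {
        System.out.println(page("es-MX,es;q=0.9,en;q=0.5"));
        System.out.println(page("de"));
    }
}
```

Because the translated strings live outside the page logic, adding a language means adding one more bundle rather than another copy of the page.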
System Implementation and Test Results 29
Java Support for Internationalisation. The Locale class lets applications identify locales, allowing for truly multilingual applications. The ResourceBundle class provides the foundation for localisation, including localisation for multiple locales in a single application container. The Date, Calendar and TimeZone classes provide the basis for time handling around the globe. The String and Character classes, as well as the java.text package, contain rich functionality for text processing, formatting and parsing. Text stream input and output classes support converting text between Unicode and other character encodings. 30
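The last point, converting text between Unicode and other encodings through stream classes, can be sketched with `InputStreamReader` and `OutputStreamWriter`. The class `TranscodeSketch` and the method `transcode` are hypothetical names for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch: bytes in one encoding -> Unicode chars -> bytes in another encoding.
final class TranscodeSketch {

    static byte[] transcode(byte[] input, Charset from, Charset to) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(input), from);
             Writer writer = new OutputStreamWriter(out, to)) {
            char[] buffer = new char[1024];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, n); // chars here are Unicode (UTF-16) units
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray(); // the writer is closed and flushed at this point
    }

    public static void main(String[] args) {
        byte[] latin1 = {(byte) 0xE9}; // "e with acute" in ISO-8859-1
        byte[] utf8 = transcode(latin1, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println("ISO-8859-1 bytes: " + latin1.length + ", UTF-8 bytes: " + utf8.length);
    }
}
```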
Conversion Process. Character conversion is a straightforward process as long as there is a one-to-one mapping between sequences of Unicode characters on one side and sequences of bytes in another encoding on the other side, and the input consists only of characters or bytes that have mappings. The reality is: (1) a single character in a non-Unicode encoding may have multiple equivalent representations in Unicode (say, a precomposed character, or a base character followed by a combining mark); (2) a character in one encoding may not have an equivalent in the other encoding; (3) an invalid sequence of bytes or characters may show up in the input. 31
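The three complications can each be detected with standard classes. The sketch below is one possible approach, assuming strict error reporting is wanted; the class `ConversionSketch` and its methods are hypothetical names, and returning `null` for a failed encode is a simplification made for brevity.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

final class ConversionSketch {

    // Strict encode: returns null when a character has no mapping in the charset.
    static byte[] encodeStrict(String text, Charset charset) {
        try {
            ByteBuffer bytes = charset.newEncoder()
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .encode(CharBuffer.wrap(text));
            byte[] result = new byte[bytes.remaining()];
            bytes.get(result);
            return result;
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    // True when the bytes form a valid sequence in the charset.
    static boolean isValid(byte[] input, Charset charset) {
        try {
            charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(input));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Compares two strings after normalising both to composed form (NFC).
    static boolean sameText(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFC)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFC));
    }

    public static void main(String[] args) {
        // (1) precomposed e-acute versus "e" plus combining acute accent
        System.out.println(sameText("\u00E9", "e\u0301"));
        // (2) Devanagari letter NA has no mapping in ISO-8859-1
        System.out.println(encodeStrict("\u0928", StandardCharsets.ISO_8859_1) == null);
        // (3) a lone UTF-8 lead byte is an invalid sequence
        System.out.println(isValid(new byte[] {(byte) 0xC3}, StandardCharsets.UTF_8));
    }
}
```

Reporting errors instead of silently substituting a replacement character makes data loss visible at the point of conversion.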
Process: Configure Server 32
Process: Register 33
Process: Log 34
Process: Localise Servlet 35
Web Page in English with IE 36
Web Page in Spanish with IE 37
Web Page in Dutch with IE 38
Web Page in French with IE 39
Web Page in Italian with IE 40
Web Page in Portuguese with IE 41
Web Page in German with IE 42
Web Page in English with IE 43
Web Page in Marathi with IE 44
Conclusion. The Java localisation APIs come in handy for dynamically localising a web page into alternative languages. The rich set of Java class libraries, such as java.util.ResourceBundle and java.util.Locale, provides an efficient approach to working with locale-specific information. More manageable workspace for users in their native language. Regional settings, colour and image representation are not disturbed. Improves the effectiveness of globally distributed business users by providing language- and culture-specific application, product and service interfaces. Supports region-specific functionality (due to legal aspects, financial practice, etc.). Provides region-specific value-added services (such as UI, look and feel, sorting and searching). Consolidates applications or services with the same functionality that were developed and maintained separately for separate languages or regions. 45
References
[1] Fernandez, N. C. (2000), Web Site Localisation and Internationalisation: A Case Study, published, City University
[2] Khachane, J. (2005), Web Page Localisation, published, Pune University
[3] DePalma, D. A. (1999), Strategies for Global Sites, Forrester Research Inc., May 1998 and The eBusiness Report. In: eMarketer
[4] Roche, M. (2000), Managing Multilingual Web Applications, 16th International Unicode Conference, Amsterdam
[5] Nielsen, J. (1999), Designing Web Usability, Indianapolis: New Riders Publishing
[6] Deitsch, Loukides, M., Java Internationalisation
46


