OPS-25: Unicode and the DataServer
David Moloney, Software Architect

Agenda: Unicode deployment with OpenEdge® DataServers
Unicode:
• How did we get here?
• What are its broader OpenEdge implications?
• What are its DataServer implications?
• Specific implementation in the DataServers for:
  – Oracle®
  – MS SQL Server

Code Pages
ASCII: 7-bit, 128-character set (codes 0–127)
• Special characters: 9 \t (Tab), 10 \n (NL), 13 \r (CR), 32 Space, 33 !, 34 ", 35 #, …
• Upper case: 65 A, 66 B, 67 C, 68 D, 69 E, 70 F, 71 G, …
• Lower case: 97 a, 98 b, 99 c, 100 d, …
Extended ASCII: codes 128–255, for example 128 €, 130 ‚, 131 ƒ, 132 „, 133 …, 252 ü, 253 ý
Extended character sets:
• ISO8859-1
• 1250
• IBM437/850

8-bit Code Pages
Examples of character encoding (hex byte values; n/a = not in that code page):
Char  ISO8859-1  1252  1250  IBM437  IBM850  IBM852
a     61         61    61    61      61      61
á     E1         E1    E1    A0      A0      A0
È     C8         C8    n/a   n/a     D4      n/a
Č     n/a        n/a   C8    n/a     n/a     AC
“     n/a        93    93    n/a     n/a     n/a

Data Corruption: avoid this
The same byte, E8, is a different character in each code page:
• ISO8859-1: E8 = "è" (France)
• 1250: E8 = "č" (Czech Republic)
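
The point of this slide can be checked with a short Python sketch. Python's codec names iso8859-1 and cp1250 stand in for the ISO8859-1 and 1250 code pages; that mapping is an assumption of the example, not something the slide states.

```python
# One byte, two code pages: what 0xE8 means depends on who reads it.
raw = bytes([0xE8])

as_western = raw.decode("iso8859-1")  # Western European reading
as_central = raw.decode("cp1250")     # Central European reading

print(as_western)  # è
print(as_central)  # č
```

Nothing in the byte itself records which reading was intended, which is why the code page has to travel with the data.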

What is Unicode? ("Unique Code")
A character encoding standard that:
• Replaces all legacy SBCS & MBCS systems
• Can assign more than a million numbers
  – Highest code point: U+10FFFF; total code points = 2^20 + 2^16 = 1,114,112
• Gives one unique number to each character or text symbol
• Provides one internationalization process
• Is not platform, program, country or language specific
• Is essential to the Web (HTML, XML, etc.)
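
The "more than a million" figure is easy to verify. A minimal sketch in Python (the variable names are illustrative only): the code space runs from U+0000 through U+10FFFF inclusive, so the count is one more than the highest value.

```python
# Unicode code points run from U+0000 through U+10FFFF inclusive.
HIGHEST_CODE_POINT = 0x10FFFF
code_point_count = HIGHEST_CODE_POINT + 1

print(HIGHEST_CODE_POINT)                   # 1114111
print(code_point_count)                     # 1114112
print(code_point_count == 2**20 + 2**16)    # True
```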

How is Unicode encoded? "UTF-x"
• UTF = Unicode Transformation Format
• x = size of the code unit in bits (the minimum length of an encoded character)
• Code point ranges: U+0000–U+00FF matches extended ASCII (ISO8859-1); U+0000–U+FFFF is the Basic Multilingual Plane (BMP); U+10000–U+10FFFF is the supplementary range
• The encoding tradeoff: ease of use versus storage space across UTF-8, UTF-16 and UTF-32
• Example: ÿ is ANSI number 255 (hex 0xFF) and Unicode number U+00FF (Latin-1 Supplement block)
• Highest code point: U+10FFFF (1,114,112 code points in total)

UTF Encoding Examples (hex bytes, big-endian)
Code point  UTF-8        UTF-16       UTF-32
U+004D      4D           00 4D        00 00 00 4D
U+00A1      C2 A1        00 A1        00 00 00 A1
U+00E1      C3 A1        00 E1        00 00 00 E1
U+0470      D1 B0        04 70        00 00 04 70
U+4E9C      E4 BA 9C     4E 9C        00 00 4E 9C
U+10302     F0 90 8C 82  D8 00 DF 02  00 01 03 02
U+004D through U+4E9C are in the BMP; U+10302 is a supplementary character.
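
The byte values for these code points can be reproduced with Python's built-in codecs. This is only a sketch for checking the table; the helper names hex_bytes and utf_forms are made up for the example, and big-endian byte order without a BOM is assumed.

```python
def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated upper-case hex pairs."""
    return " ".join(f"{b:02X}" for b in data)

def utf_forms(code_point: int) -> dict:
    """Encode one code point in the three Unicode encoding forms."""
    ch = chr(code_point)
    return {
        "UTF-8": hex_bytes(ch.encode("utf-8")),
        "UTF-16": hex_bytes(ch.encode("utf-16-be")),
        "UTF-32": hex_bytes(ch.encode("utf-32-be")),
    }

for cp in (0x004D, 0x00A1, 0x00E1, 0x0470, 0x4E9C, 0x10302):
    forms = utf_forms(cp)
    print(f"U+{cp:04X}", forms["UTF-8"], "|", forms["UTF-16"], "|", forms["UTF-32"])
```

Python's codecs give D1 B0 as the UTF-8 form of U+0470 and F0 90 8C 82 for U+10302.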

UTF Encoding Examples (Oracle)
How the supplementary character U+10302 is encoded under the Oracle NLS_LANG character sets:
• UTF8, 3-byte "modified" form: each UTF-16 surrogate (D8 00 and DF 02) is encoded separately as 3 bytes, giving ED A0 80 ED BC 82
• AL32UTF8, 4-byte "standard" form: F0 90 8C 82
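
A rough way to see the difference between the two Oracle forms is to encode the UTF-16 surrogates one at a time. The sketch below assumes that Oracle's UTF8 character set behaves like CESU-8 for supplementary characters; the function name cesu8_bytes is hypothetical, and Python's "surrogatepass" error handler is used only to emulate that behaviour.

```python
def cesu8_bytes(ch: str) -> bytes:
    """Encode each UTF-16 code unit of ch as its own UTF-8 sequence."""
    utf16 = ch.encode("utf-16-be")
    out = bytearray()
    for i in range(0, len(utf16), 2):
        unit = int.from_bytes(utf16[i:i + 2], "big")
        # surrogatepass lets a lone surrogate through the UTF-8 encoder
        out += chr(unit).encode("utf-8", "surrogatepass")
    return bytes(out)

supplementary = chr(0x10302)
modified = cesu8_bytes(supplementary)       # 3 bytes per surrogate
standard = supplementary.encode("utf-8")    # one 4-byte sequence

print(" ".join(f"{b:02X}" for b in modified))  # ED A0 80 ED BC 82
print(" ".join(f"{b:02X}" for b in standard))  # F0 90 8C 82
```

For characters inside the BMP the two forms produce identical bytes, so the difference only shows up with supplementary characters.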

Unicode Conversion
• All code pages convert to Unicode
• Unicode may not convert to other code pages
IBM437, IBM852, IBM850, 1252, ISO8859-1 → Unicode: always possible
Unicode → IBM437, IBM852, IBM850, 1252, ISO8859-1: not always possible
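
The one-way nature of the conversion can be sketched in Python. The codec names in CODE_PAGES are Python's closest equivalents to the code pages on the slide (an assumption of the example), and encodable_in is a made-up helper.

```python
CODE_PAGES = ["cp437", "cp852", "cp850", "cp1252", "iso8859-1"]

def encodable_in(text: str) -> list:
    """Return the code pages from CODE_PAGES that can represent text."""
    ok = []
    for name in CODE_PAGES:
        try:
            text.encode(name)
            ok.append(name)
        except UnicodeEncodeError:
            pass  # this code page has no slot for at least one character
    return ok

print(encodable_in("ü"))   # all five code pages
print(encodable_in("č"))   # ['cp852']
print(encodable_in("亜"))  # []
```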

Agenda: The path to successful development & deployment
Unicode:
• How did we get here?
• What are its broader OpenEdge implications?
• What are its DataServer implications?
• Specific implementation in the DataServers for:
  – Oracle
  – MS SQL Server

The Unicode "Solution"?
YES!
• One-stop shopping for internationalization!
NO, there are considerations to be addressed:
• Operating system
• Web server (XML schemas and HTML)
• Print drivers
• Data from/to other systems
• OCXs
• Terminal emulators

OpenEdge Globalization Settings
For more info: see the "Internationalizing Applications" guide
• Primary parameters: -cpinternal, -cpstream, -cpcoll, -d, -E
• Secondary parameters: -cplog, -cpterm, -cpprint, -numsep, -numdec
• Also listed: -cprcodein, -cprcodeout, -lng
• Database settings: _db-xl-name, _db-coll-name
Existing OpenEdge constructs:
• convmap.cp – character processing tables
• progress.ini fonts
New OpenEdge construct:
• ICU library – for linguistic sorting

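Common Mistakes
Loading or importing data with the wrong code page
• "Čzech" stored as UTF-8 is the byte sequence C4 8C 7A 65 63 68
• Read as UTF-8: Čzech
• Read as ISO8859-1: Ä zech (C4 becomes Ä and 8C becomes a non-printing control character)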
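
The mistake on this slide can be reproduced in a few lines of Python. This is a sketch only; iso8859-1 is Python's name for the code page, and the variable names are illustrative.

```python
original = "Čzech"
utf8_bytes = original.encode("utf-8")       # C4 8C 7A 65 63 68

# Loading the UTF-8 bytes while claiming they are ISO8859-1:
misread = utf8_bytes.decode("iso8859-1")    # 'Ä' + control char + 'zech'

# The damage is reversible only while the wrong bytes are still intact:
repaired = misread.encode("iso8859-1").decode("utf-8")

print(" ".join(f"{b:02X}" for b in utf8_bytes))
print(repr(misread))
print(repaired)
```

Once the misread text has been converted again or edited, the original bytes are gone and the round trip no longer works.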

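Byte Order Mark (BOM)
Caution!
• The UTF-8 BOM is the byte sequence EF BB BF, written before the text (here C4 8C 7A 65 63 68, "Čzech")
• With the BOM the file is recognized as UTF-8 (Čzech) instead of being read as ISO8859-1
Writing the BOM:
    OUTPUT TO text.txt CONVERT TARGET "UTF-8".
    PUT CONTROL "~357~273~277". /* BOM */
    PUT UNFORMATTED "UTF-8 text".
    OUTPUT CLOSE.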
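
For comparison, here is how the same file could be produced outside OpenEdge. This Python sketch relies on the standard "utf-8-sig" codec, which writes the three BOM bytes before the text; the temporary directory and the helper name write_with_bom are assumptions of the example.

```python
import codecs
import os
import tempfile

def write_with_bom(path: str, text: str) -> None:
    """Write text as UTF-8, preceded by the UTF-8 byte order mark."""
    with open(path, "w", encoding="utf-8-sig", newline="") as handle:
        handle.write(text)

path = os.path.join(tempfile.mkdtemp(), "text.txt")
write_with_bom(path, "UTF-8 text")

with open(path, "rb") as handle:
    data = handle.read()

print(data[:3] == codecs.BOM_UTF8)   # True: the file starts with EF BB BF
print(data[3:])                      # b'UTF-8 text'
```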

Common Mistakes
Loading or importing data with the wrong code page
(…)
"imuller" "Ian Muller" "Y" "C" 1657 283200
"jdoe" "Jane Doe" "N" "U" 3275 450010
"jsmith" "John Smith" "Y" "C" 1450 323700
"jsanchez" "Juan Sánchez" "Y" "C" 4250 323900
.
PSC
filename=users
records=000001133
ldbname=mydatabase
timestamp=2007/03/28-20:55:03
numformat=44,46
dateformat=mdy-1950
map=NO-MAP
cpstream=ISO8859-1
.
0000143373
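
Before loading a .d file it can help to read the code page out of its trailer. The sketch below assumes the trailer layout of one key=value pair per line after a "PSC" marker; the function name trailer_settings is hypothetical.

```python
SAMPLE_TRAILER = """.
PSC
filename=users
records=000001133
ldbname=mydatabase
timestamp=2007/03/28-20:55:03
numformat=44,46
dateformat=mdy-1950
map=NO-MAP
cpstream=ISO8859-1
.
0000143373
"""

def trailer_settings(text: str) -> dict:
    """Collect the key=value pairs that follow the PSC marker line."""
    settings = {}
    in_trailer = False
    for line in text.splitlines():
        line = line.strip()
        if line == "PSC":
            in_trailer = True
        elif in_trailer and "=" in line:
            key, value = line.split("=", 1)
            settings[key] = value
    return settings

print(trailer_settings(SAMPLE_TRAILER)["cpstream"])  # ISO8859-1
```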

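Common Mistakes
Updating data with the wrong code page
• OS code page = 1252: the user types à, which arrives as byte E0
• Client: _progres -cpinternal IBM850 -cpstream IBM850; E0 is taken as IBM850, where it means Ó
• Server: _mprosrv -cpinternal ISO8859-1; Ó is converted to D3
• Database: _db-xl-name ISO8859-1; D3 (Ó) is stored instead of E0 (à)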

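Common Mistakes
Updating data with the CORRECT code page
• OS code page = 1252: the user types à, which arrives as byte E0
• Client: _progres -cpinternal IBM850 -cpstream 1252; E0 is read as à and held internally as 85
• Server: _mprosrv -cpinternal ISO8859-1; à is converted to E0
• Database: _db-xl-name ISO8859-1; E0 (à) is stored correctly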
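
The two scenarios can be traced byte by byte. This Python sketch is a simplified model, not OpenEdge itself: trace_keystroke is a made-up helper, and the codecs cp850, cp1252 and iso8859-1 stand in for IBM850, 1252 and ISO8859-1.

```python
def trace_keystroke(key_byte: int, cpstream: str) -> dict:
    """Follow one incoming byte through client, session and database."""
    ch = bytes([key_byte]).decode(cpstream)  # how the client reads the byte
    return {
        "character": ch,
        "internal": ch.encode("cp850"),      # -cpinternal IBM850
        "stored": ch.encode("iso8859-1"),    # _db-xl-name ISO8859-1
    }

wrong = trace_keystroke(0xE0, "cp850")    # -cpstream IBM850 on a 1252 system
right = trace_keystroke(0xE0, "cp1252")   # -cpstream 1252

print(wrong["character"], wrong["stored"].hex())   # Ó d3
print(right["character"], right["stored"].hex())   # à e0
```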

Real Life Story
ASCII Linefeed (0x0A) to EBCDIC Newline (0x25)
• Text: Hi Bob, CRLF How are you? CRLF Bye
• OpenEdge client: -cpstream iso8859-1 -cpinternal iso8859-1; CRLF is 0D 0A (ISO8859-1, ASCII)
• DataServer for ODBC, _db-xl-name IBM037 (EBCDIC): the linefeed stays 0x0A
• Result: Hi Bob, ▐How are you? ▐Bye

Real Life Story
ASCII Linefeed (0x0A) to EBCDIC Newline (0x25)
• Text: Hi Bob, CRLF How are you? CRLF Bye
• OpenEdge client: -cpstream IBM850 -cpinternal IBM850; CRLF is 0D 0A (IBM850, ASCII)
• DataServer for ODBC, _db-xl-name IBM037 (EBCDIC): the linefeed 0x0A is converted to 0x25
• Result: Hi Bob, / How are you? / Bye, each on its own line
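
The 0x0A to 0x25 mapping can be observed with Python's cp037 codec, used here as a stand-in for IBM037 (an assumption; the actual OpenEdge conversion tables may differ in other positions).

```python
message = "Hi Bob,\r\nHow are you?\r\nBye"

ebcdic = message.encode("cp037")          # full conversion to EBCDIC
lf_in_ebcdic = "\n".encode("cp037")       # where the linefeed ends up

# A byte 0x0A that reaches the EBCDIC side without conversion is not a linefeed:
unconverted = b"\x0a".decode("cp037")

print(lf_in_ebcdic.hex())                 # 25
print(unconverted == "\n")                # False
print(ebcdic.decode("cp037") == message)  # True
```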

Tips & Hints
Un-corrupting data
• ISO8859-1 database with data encoded in IBM850
• Run in a session with -cpinternal iso8859-1
    FOR EACH myTable EXCLUSIVE-LOCK.
      RUN FixChar(INPUT-OUTPUT myTable.myField).
    END.
    PROCEDURE FixChar:
      DEF INPUT-OUTPUT PARAM c AS CHAR NO-UNDO.
      c = CODEPAGE-CONVERT(c, "IBM850", "ISO8859-1").
    END PROCEDURE.
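
The idea behind the repair, shown here in Python rather than ABL: take the characters back to the raw bytes using the code page the database claims, then read those bytes with the code page they were really written in. The function name fix_char and the codec names are assumptions of this sketch.

```python
def fix_char(mislabeled: str) -> str:
    """Reinterpret text whose bytes are IBM850 but were read as ISO8859-1."""
    raw = mislabeled.encode("iso8859-1")   # recover the stored bytes
    return raw.decode("cp850")             # read them as what they really are

# Simulate the corruption: IBM850 bytes stored in an ISO8859-1 database.
stored = "Juan Sánchez".encode("cp850").decode("iso8859-1")

print(repr(stored))       # the á shows up as a different character
print(fix_char(stored))   # Juan Sánchez
```

Plain ASCII text passes through unchanged, so running the fix over clean ASCII-only fields does no harm; fields that already hold correct accented characters would be damaged, so the repair should only touch rows known to be affected.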

Database Sorting Rules
Are not all the same
    FOR EACH table WHERE name <= CHR(126).
    FOR EACH table WHERE name >= CHR(126).
CHR(126) is "~", and where it sorts depends on the collation:
• -cpinternal / MSS 1252: # $ ~ Alphanumerics
• _Db-collate ISO8859-1 Basic: # $ Alphanumerics ~


Unicode Deliverables
[Timeline figure covering releases 10.0A through 10.1C and Future. Deliverables shown: Unicode; ICU Collation; Unicode for the Oracle and MSS DataServers (marked "limited" and "+ CLOBs"); Oracle NCLOB Support; MSS CLOB Support + CLOB Params to Stored Procedures]

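OpenEdge Settings
_db-xl-name, -cpinternal and -cpstream
[Diagram: the OpenEdge process works in -cpinternal. OpenEdge code page conversions take place between the process and the database (_db-xl-name) and between the process and the -cpstream targets: screen, printer and OS files. GUI, CHUI and keyboard are shown on the process side.]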

OpenEdge Settings
_db-xl-name, -cpinternal and -cpstream
[Diagram: the OpenEdge process (-cpinternal; -cpstream for screen, printer and OS files; GUI, CHUI and keyboard) talks to the schema holder (_db-xl-name) and, through the DataServer and the DB driver, to the foreign data source (database CP). Two links are marked "Needs to match", and the driver layer is marked "Driver conversions?".]

OpenEdge Settings
[Diagram: where each setting applies in a deployment]
• WEBSPEED™ (_progres -web): -cpinternal, -cpstream; web browser, OS files
• GUI CLIENT (prowin32): -cpinternal, -cpstream; keyboard, screen, printer, OS files
• CHUI CLIENT (_progres): -cpinternal, -cpstream; keyboard, screen, printer, OS files
• APPSERVER™ (_proapsv): -cpinternal, -cpstream; OS files
• DATASERVER (_orasrv): schema holder _db-xl-name, which must match the driver; OS files
• ORACLE database

Dictionary Utilities changed for Unicode
For both Oracle and MS SQL Server:
• Schema Migration *
  – Including Unicode batch mode parameters
• Update/Add Table Definitions +
• Verify Table Definitions +
• Adjust Schema +
• Generate delta.sql *
• Dump as Create Table Statement *
* "Use Unicode Types" GUI selection provided
+ Modified to handle Unicode types internally

Comparing 10.1C Unicode: Oracle vs. MSS
Attribute: OpenEdge | ORACLE | MSS
• Unicode definitions: DB code page (_db._db-xl-name) | database code pages or data types | data types
• Data types: CHAR, LONGCHAR, CLOB | CHAR, VARCHAR2, LONG, CLOB, NCHAR, NVARCHAR2, NCLOB (in 10.1C01) | NCHAR, NVARCHAR(max) and NTEXT mapped to OpenEdge CHAR
• Max. char size: CHAR 30,000 bytes, LONGCHAR/CLOB 1G | CHAR types 4000 bytes, CLOB types 4G | CHAR types 8000 bytes, CLOB types 2G
• Max. char size for Unicode: same as above, but CHAR is 15,000 bytes using the MSS DataServer | 4000 bytes | 4000 chars
• Semantics: – | Character or Byte | (double-byte) Character
• Driver settings: N/A | NLS_LANG=.AL32UTF8 | ACP = Active Code Page
• Unicode encoding: UTF-8 | NLS_CHARACTERSET: AL32UTF8 & UTF8; NLS_NCHAR_CHARACTERSET: AL16UTF16 or UTF8 | UCS-2 (partial UTF-16)

Common Unicode Requirements
DataServer Migration
[Diagram: the OpenEdge process (-cpinternal UTF-8, -cpstream) and the schema holder (_db-xl-name UTF-8) need to match, and the schema holder and the foreign data source database code page (UTF-8) need to match. The DataServer reaches the foreign data source through the DB driver, which may perform driver conversions. Source data comes from an OpenEdge database (_db-xl-name ANSI or UTF-8) and from .d files (cpstream=ISO8859-1, cpstream=ISO8859-5). PRODB is also labeled.]
Recommended: set the $DLCDB environment variable

OPS-25: Unicode and the DataServer © 2008 Progress Software Corporation

Oracle DataServer Migration
_db-xl-name, -cpinternal and -cpstream
[Diagram: the OpenEdge process (-cpinternal UTF-8, -cpstream) connects through the 10.1C ORACLE DataServer and the OCI client library (NLS_LANG=.AL32UTF8) to an ORACLE 9i+ database (database charset and national charset). The schema holder _db-xl-name is UTF-8 and must match. Source data comes from an OpenEdge database (_db-xl-name ANSI or UTF-8) and from .d files (cpstream=ISO8859-1, cpstream=ISO8859-5). Oracle types shown: VARCHAR, NVARCHAR, CLOB, CFILE, NCLOB.]

Oracle Unicode Migration
• What version of ORACLE
  – Unicode instance and Unicode drivers must be 9i or above
• Codepage for Schema Image
  – Declares Unicode
• Collation Name
  – Sets ICU collation

Oracle Unicode Migration
Two ways to configure an ORACLE database to store Unicode:
• Use Unicode Types unchecked: uses the database charset
  – NLS_CHARACTERSET: AL32UTF8 or UTF8
• Use Unicode Types checked: uses the national language charset
  – NLS_NCHAR_CHARACTERSET: AL16UTF16 or UTF8

Oracle Unicode Migration
• For field widths use
  – Width (recommended)
  – Use the SQL Width tool
• Char semantics
  – Checked: CHAR(10) = 10 chars (with UTF8 = 10–30 bytes; with AL32UTF8 = 10–40 bytes)
  – Unchecked: CHAR(10) = 10 bytes
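
The 10 to 40 byte range quoted for AL32UTF8 follows from the variable length of standard UTF-8. A small Python sketch (sample characters and names chosen for illustration) measures ten characters of each width:

```python
SAMPLES = {
    "ASCII letter": "a",                 # 1 byte per character
    "Czech letter": "č",                 # 2 bytes per character
    "CJK ideograph": "亜",               # 3 bytes per character
    "supplementary": "\U00010302",       # 4 bytes per character
}

def utf8_bytes_for(ch: str, n_chars: int = 10) -> int:
    """Bytes needed to hold n_chars copies of ch in standard UTF-8."""
    return len((ch * n_chars).encode("utf-8"))

widths = {label: utf8_bytes_for(ch) for label, ch in SAMPLES.items()}
for label, size in widths.items():
    print(f"CHAR(10) of {label}: {size} bytes")
```

With byte semantics a CHAR(10) column would hold all ten characters only in the first case.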

Oracle Unicode Migration
• Maximum char length
  – Use Unicode Types checked: 2000 (assumes NCS = AL16UTF16)
  – Use Unicode Types unchecked: 1000 (assumes DB CP = AL32UTF8)
• Expand to CLOB
  – Checked: greater than Maximum char length produces CLOB
  – Unchecked: greater than Maximum char length produces LONG (backward compatible)
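
The two defaults appear to come from dividing a byte limit by the widest character in each character set. The sketch below assumes the 4000-byte Oracle CHAR limit from the comparison slide, 2 bytes per character for AL16UTF16 and up to 4 bytes for AL32UTF8; the names are illustrative.

```python
ORACLE_CHAR_LIMIT_BYTES = 4000   # assumed from the Oracle vs. MSS comparison

def max_chars(limit_bytes: int, bytes_per_char: int) -> int:
    """Largest character count guaranteed to fit in limit_bytes."""
    return limit_bytes // bytes_per_char

national = max_chars(ORACLE_CHAR_LIMIT_BYTES, 2)   # AL16UTF16 code units
database = max_chars(ORACLE_CHAR_LIMIT_BYTES, 4)   # worst case in AL32UTF8

print(national)   # 2000
print(database)   # 1000
```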


MS SQL Server DataServer Migration
_db-xl-name, -cpinternal and -cpstream
[Diagram: the OpenEdge process (-cpinternal UTF-8, -cpstream) connects through the 10.1C MSS DataServer and the ODBC driver (ACP = OS CP; UTF-8 to UCS-2 driver conversions) to an MSS 2005 database (UCS-2 / UTF-16). The schema holder _db-xl-name is UTF-8; the match with the database is marked "Implied Match". Source data comes from an OpenEdge database (_db-xl-name ANSI or UTF-8) and from .d files (cpstream=ISO8859-1, cpstream=ISO8859-5). MSS types shown: NCHAR, NVARCHAR, NTEXT, NVARCHAR(max).]

MS SQL Server Unicode Migration
• ODBC Data Source Name
  – Must be a Unicode driver
• Codepage for Schema Image
  – Declares Unicode
• Collation Name
  – Sets ICU collation
• Use Unicode Types
  – Checked: selects Unicode (changes Codepage to UTF-8) and uses NVARCHAR types
  – Unchecked: uses non-Unicode character types (VARCHAR types)

MS SQL Server Unicode Migration
• Maximum char length
  – Use Unicode Types checked: 4000 (assumes MSS 2005 = UCS-2)
• For field widths use
  – Width (recommended)
  – Use the SQL Width tool
• Expand width (utf-8)
  – Checked: doubles the width defined for NVARCHAR types
  – NVARCHAR(1000) becomes NVARCHAR(2000)

Linguistic Sorting and Collation
Sorting with Finnish collation
    FOR EACH mytable BY COLLATE(myfield, "CASE-INSENSITIVE", "ICU-fi"):
      DISPLAY myfield WITH FONT 8.
    END.
Sort order by collation:
• Basic: Aaa Ááá Äää Ççç Ĉĉĉ Bbb Ccc Zzz
• ICU-UCA: Aaa Ááá Äää Bbb Ccc Ĉĉĉ Ççç Zzz
• ICU-fi: Aaa Ááá Bbb Ccc Ĉĉĉ Ççç Zzz Äää

Linguistic Sorting and Collation
Comparing with Finnish collation
    FOR EACH mytable WHERE COMPARE(myfield, ">=", "CASE-INSENSITIVE", "ICU-fi")
        BY COLLATE(myfield, "CASE-INSENSITIVE", "ICU-fi"):
      DISPLAY myfield WITH FONT 8.
    END.
Rows returned by collation:
• Basic: Ccc Zzz
• ICU-fi: Ccc Ĉĉĉ Ççç Zzz Äää

Linguistic Sorting and Collation
Global setup. Caution with performance!
• Database: -cpcoll ICU-uca
• AppServer: -cpcoll ICU-uca, temp-tables; uses the client collation in COMPARE and COLLATE
• Clients, each with its own temp-tables:
  – English user: -cpcoll ICU-en
  – French user: -cpcoll ICU-fr
  – Czech user: -cpcoll ICU-cs
  – Finnish user: -cpcoll ICU-fi
    RUN ASprg.p ON hAppServer (INPUT SESSION:CPCOLL, INPUT USERID, INPUT , OUTPUT TABLE ttMytable).

8-bit Code Pages
Where to find code page tables:
• 10.1B Internationalizing Applications manual (IBM850 and ISO8859-1)
• http://www.microsoft.com/globaldev/reference/cphome.mspx
• http://www03.ibm.com/servers/eserver/iseries/software/globalization/codepages.html
• http://en.wikipedia.org
• http://www.fileformat.info/charset/index.htm
Where to find Unicode fonts:
• http://en.wikipedia.org/wiki/Code2000
Information about Windows fonts:
• http://www.microsoft.com/typography/fonts/default.aspx
• http://www.microsoft.com/globaldev/getwr/steps/wrg_font.mspx

For More Information, go to…
Progress eLearning Community:
• Understanding Internationalization – Salvador Vinals
PSDN:
• B2420-LV: From 26 to 96,000 Characters in 60 Minutes
• DEV-10: Supporting Multiple Languages in Your Application
• DEV-23: Global Applications and Code Pages
Documentation:
• OpenEdge Data Management: DataServer for Oracle
• OpenEdge Data Management: DataServer for Microsoft SQL Server
• OpenEdge Development: Internationalizing Applications

