Fun With CCSIDs: Working With Unicode and Other Types of Data in RPG
' apostrophe?
} curly brace?
-7 minus 7?
125 one hundred twenty five?
Welcome to the Waitless World
We have to interpret it
By itself, x'7d' doesn't mean anything.
' EBCDIC
} ASCII
-7 1-byte packed decimal
125 1-byte integer
Another interpretation
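The four readings of x'7d' listed above can be reproduced with a short sketch. It is written in Python rather than RPG purely so it can be run anywhere; `cp037` is Python's codec for EBCDIC CCSID 37, and the helper `unpack_one_byte_packed` is a hypothetical name introduced only for this illustration.

```python
# One byte, four readings.
raw = bytes([0x7D])

as_ebcdic = raw.decode("cp037")   # EBCDIC (CCSID 37): apostrophe
as_ascii = raw.decode("ascii")    # ASCII: right curly brace
as_integer = raw[0]               # 1-byte integer: 125


def unpack_one_byte_packed(b: int) -> int:
    """Read one byte as packed decimal: high nibble = digit, low nibble = sign."""
    digit, sign = b >> 4, b & 0x0F
    # Sign nibbles B and D mean negative; A, C, E and F mean positive.
    return -digit if sign in (0x0B, 0x0D) else digit


as_packed = unpack_one_byte_packed(raw[0])   # digit 7, sign D: minus 7

print(repr(as_ebcdic), repr(as_ascii), as_packed, as_integer)
```

The byte never changes; only the rule used to read it does.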
Character data
Latin:
Cyrillic:
Japanese:
• 26 have A-Z
• Another 26 have a-z
• 10 have 0-9
• And there are the accented letters like Á, é, ñ, Ç
• And many characters like these: !@#$%^&*,©§¶¼½¾
Let's consider an imaginary character set with just 32 characters:
! @ # $ % ^ & * , © § ¶ ¼ ½ ¾ A B C D a b c d Á é ñ Ç 0 1 2 3 4
! @ # $ % ^ & * , © § ¶ ¼ ½ ¾ A B C D a b c d Á é ñ Ç 0 1 2 3 4
A B C D a b c d 4 Á é ñ Ç 0 1 2 3 ! @ # $ % ^ & * , © § ¶ ¼ ½ ¾
A @ B # C $ D a b © § ¶ ¼ 4 c d Á é ñ 1 2 3 ! % ^ & * , ½ ¾ Ç 0
But they are always the same characters from that set.
We can give an ID to the orderings
1. ! @ # $ % ^ & * , © § ¶ ¼ ½ ¾ A B C D a b c d Á é ñ Ç 0 1 2 3 4
2. A B C D a b c d 4 Á é ñ Ç 0 1 2 3 ! @ # $ % ^ & * , © § ¶ ¼ ½ ¾
3. A @ B # C $ D a b © § ¶ ¼ 4 c d Á é ñ 1 2 3 ! % ^ & * , ½ ¾ Ç 0
4. é ê ë è í î ï ì ß A B C D a b c d Á é ñ Ç 0 1 2 3 4 ¿ Ð Ý Þ ® +
5. Á é ñ Ç 0 1 2 3 4 ¿ Ð Ý Þ ® + é ê ë è í î ï ì ß A B C D a b c d
Concept: CCSID
1. ! @ # $ % ^ & * , © § ¶ ¼ ½ ¾ A B C D a b c d Á é ñ Ç 0 1 2 3 4
2. A B C D a b c d 4 Á é ñ Ç 0 1 2 3 ! @ # $ % ^ & * , © § ¶ ¼ ½ ¾
3. A @ B # C $ D a b © § ¶ ¼ 4 c d Á é ñ 1 2 3 ! % ^ & * , ½ ¾ Ç 0
4. é ê ë è í î ï ì ß A B C D a b c d Á é ñ Ç 0 1 2 3 4 ¿ Ð Ý Þ ® +
5. Á é ñ Ç 0 1 2 3 4 ¿ Ð Ý Þ ® + é ê ë è í î ï ì ß A B C D a b c d
From just the ID, we can deduce both the character set and the order.
The ID “4” indicates:
• It is from the second character set
• It is the first ordering from that character set
1. ! @ # $ % ^ & * , © § ¶ ¼ ½ ¾ A B C D a b c d Á é ñ Ç 0 1 2 3 4
2. A B C D a b c d 4 Á é ñ Ç 0 1 2 3 ! @ # $ % ^ & * , © § ¶ ¼ ½ ¾
3. A @ B # C $ D a b © § ¶ ¼ 4 c d Á é ñ 1 2 3 ! % ^ & * , ½ ¾ Ç 0
4. é ê ë è í î ï ì ß A B C D a b c d Á é ñ Ç 0 1 2 3 4 ¿ Ð Ý Þ ® +
5. Á é ñ Ç 0 1 2 3 4 ¿ Ð Ý Þ ® + é ê ë è í î ï ì ß A B C D a b c d
Hex (high digit above low digit; the columns run from x'00' to x'1F')
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
0 1 2 3 4 5 6 7 8 9 A B C D E F 0 1 2 3 4 5 6 7 8 9 A B C D E F
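As a rough sketch of how an ID ties a hex value to a character, the five toy orderings can be held as lists whose index is the hex value. This is Python rather than RPG, chosen only so it can be run anywhere; the names `CCSIDS` and `decode` are hypothetical and exist only for this illustration.

```python
# The five toy orderings from the slides; position in the list = hex value.
CCSIDS = {
    1: "! @ # $ % ^ & * , © § ¶ ¼ ½ ¾ A B C D a b c d Á é ñ Ç 0 1 2 3 4".split(),
    2: "A B C D a b c d 4 Á é ñ Ç 0 1 2 3 ! @ # $ % ^ & * , © § ¶ ¼ ½ ¾".split(),
    3: "A @ B # C $ D a b © § ¶ ¼ 4 c d Á é ñ 1 2 3 ! % ^ & * , ½ ¾ Ç 0".split(),
    4: "é ê ë è í î ï ì ß A B C D a b c d Á é ñ Ç 0 1 2 3 4 ¿ Ð Ý Þ ® +".split(),
    5: "Á é ñ Ç 0 1 2 3 4 ¿ Ð Ý Þ ® + é ê ë è í î ï ì ß A B C D a b c d".split(),
}


def decode(data: bytes, ccsid: int) -> str:
    """Look up each byte in the ordering that the given ID stands for."""
    table = CCSIDS[ccsid]
    return "".join(table[b] for b in data)


# The same two bytes read under each of the five IDs.
for ccsid in CCSIDS:
    print(ccsid, decode(bytes.fromhex("011E"), ccsid))
```

IDs 1 to 3 draw on the first character set and IDs 4 and 5 on the second, so the same bytes can come out as entirely different characters depending on the ID.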
CCSID conversion
Hex
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
0 1 2 3 4 5 6 7 8 9 A B C D E F 0 1 2 3 4 5 6 7 8 9 A B C D E F
Step 1. Find out which character matches each hex value in CCSID 2, for
our data x'011E'
CCSID conversion
Converting to CCSID 4
4. é ê ë è í î ï ì ß A B C D a b c d Á é ñ Ç 0 1 2 3 4 ¿ Ð Ý Þ ® +
Hex
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
0 1 2 3 4 5 6 7 8 9 A B C D E F 0 1 2 3 4 5 6 7 8 9 A B C D E F
CCSID conversion
Hex
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
0 1 2 3 4 5 6 7 8 9 A B C D E F 0 1 2 3 4 5 6 7 8 9 A B C D E F
Result: x’Bn'
Each character set has a “replacement character” that represents any character that doesn’t exist in the character set.
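The conversion walked through above can be sketched as a two-table lookup. This is Python rather than RPG, used only so it can be run anywhere. The slides do not say which character is the replacement character for the second character set, so the value `REPLACEMENT = 0x1F` is an arbitrary assumption made only for this sketch; `convert` is likewise a hypothetical helper name.

```python
# Toy orderings 2 and 4 from the slides; position in the list = hex value.
CCSID_2 = "A B C D a b c d 4 Á é ñ Ç 0 1 2 3 ! @ # $ % ^ & * , © § ¶ ¼ ½ ¾".split()
CCSID_4 = "é ê ë è í î ï ì ß A B C D a b c d Á é ñ Ç 0 1 2 3 4 ¿ Ð Ý Þ ® +".split()

# ASSUMPTION: x'1F' is picked arbitrarily as the replacement character.
REPLACEMENT = 0x1F


def convert(data: bytes, source: list, target: list, replacement: int) -> bytes:
    out = bytearray()
    for b in data:
        char = source[b]  # which character does this hex value stand for?
        # Find that character in the target ordering, or fall back to the
        # replacement character when the target set does not contain it.
        out.append(target.index(char) if char in target else replacement)
    return bytes(out)


result = convert(bytes.fromhex("011E"), CCSID_2, CCSID_4, REPLACEMENT)
print(result.hex().upper())
```

Here x'01' is B in ordering 2 and B sits at x'0A' in ordering 4, while x'1E' is ½, which the second character set does not contain, so it becomes the assumed replacement value. Once that happens the original ½ cannot be recovered by converting back.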
1. Single-byte character set (SBCS)
One byte per character
Contains the characters for one type of language
Examples are Latin (European languages), Cyrillic (Russian)
Always includes the standard characters A-Z, a-z, 0-9, + = - etc
Single-byte CCSIDs
Characters from one SBCS character set
Double-byte CCSIDs
Characters from one DBCS character set
Mixed-byte CCSIDs
Characters from one SBCS set and one DBCS set
Unicode CCSIDs
1208 is the CCSID for UTF-8: 8 bits (1 byte) is the smallest size
1200 is the CCSID for UTF-16: 16 bits (2 bytes) is the smallest size
13488 is the CCSID for UCS-2: similar to UTF-16
Hex CCSIDs
Hex data cannot be converted to another CCSID
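To make "smallest size" concrete, the sketch below encodes a few sample characters in UTF-8 and UTF-16 and counts the bytes. It is Python rather than RPG, used only because it can be run anywhere; the sample characters are arbitrary choices for illustration.

```python
# Byte counts per character: (UTF-8 length, UTF-16 length).
samples = ["A", "é", "Я", "日", "😀"]

sizes = {}
for ch in samples:
    utf8 = ch.encode("utf-8")        # 1 to 4 bytes per character
    utf16 = ch.encode("utf-16-be")   # 2 or 4 bytes per character
    sizes[ch] = (len(utf8), len(utf16))
    print(ch, len(utf8), len(utf16))
```

The last sample needs four bytes in UTF-16 (a surrogate pair), which is the main place where UTF-16 and UCS-2 differ: UCS-2 is limited to characters that fit in a single 2-byte unit.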
© 2014 IBM Corporation
All string data has a CCSID. You can use DSPFFD to find out
the CCSID of the data in your files.
You can use the cross reference in your RPG listings to find
out the CCSID of your string variables. If alphanumeric data
doesn’t show a CCSID, then it is the job CCSID.
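ctl-opt CCSID(*CHAR:37);                   // default CCSID for alphanumeric definitions is 37
dcl-s company char(20);                    // CCSID 37, from the ctl-opt default
/set ccsid(*char : *JOBRUN)
dcl-s city char(20);                       // job CCSID, from the /SET directive
dcl-s description char(100) ccsid(*utf8);  // explicit CCSID overrides the current default
/restore ccsid(*char)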
temp_name = name;
alphaText = unicodeText;
if %status() = 50;
   // ... there was loss of data
endif;
Two problems:
• It’s awkward to check for status code 50 after every statement with a
CCSID conversion
• It’s not always easy to tell which statements have CCSID conversions
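The difference between a conversion that silently substitutes and one that fails loudly can be sketched outside RPG. The Python below is only an analogy, not the RPG mechanism: `cp037` stands in for an EBCDIC job CCSID, and Python's own substitute character ('?') is not necessarily what an RPG program would produce.

```python
unicode_text = "Ωmega"   # Ω does not exist in CCSID 37

# Substituting conversion: succeeds quietly, and the Ω is gone.
lossy = unicode_text.encode("cp037", errors="replace")

# Strict conversion: any character that cannot be represented raises,
# so the loss cannot go unnoticed and needs no per-statement check.
try:
    unicode_text.encode("cp037", errors="strict")
    lost_data = False
except UnicodeEncodeError:
    lost_data = True

print(lossy.decode("cp037"), lost_data)
```

With the substituting form, every conversion would need its own check afterwards; with the strict form, one handler covers them all.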
If you are on 6.1 or 7.1 and get this function through a PTF, you will need
to add messages RNX0452 and RNQ0452 to your message file. The
cover letter of the PTF for the RPG runtime has CLP code for adding the
messages. See the RPG Cafe for PTF details.
RNF7357 Conversion from UCS-2 to Alpha might not convert all data.
RNF7358 Conversion from UCS-2 to DBCS might not convert all data.
RNF7359 Conversion from DBCS to DBCS might not convert all data.
RNF7360 Conversion from Alpha to DBCS might not convert all data.
RNF7361 Conversion from DBCS to Alpha might not convert all data.
Prior to 7.2, you could call an API to convert the data to UCS-2:
dcl-s stringA char(10000);
dcl-s stringC varucs2(10000);
dcl-pr getData varchar(10000); ...
But remember not to use the data for ordinary RPG statements
because RPG thinks the data is in the job CCSID.
if stringA <> *blanks; // BUG!
stringC = convert...
...
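Why the comparison marked BUG goes wrong can be shown with plain bytes. The sketch below is Python, not RPG, and rests on two assumptions made only for illustration: the job CCSID is 37 (EBCDIC) and the returned data is UTF-8.

```python
# ASSUMPTION: the returned data is UTF-8, where a blank is x'20'.
raw = (" " * 10).encode("utf-8")

# ASSUMPTION: the job CCSID is 37, where a blank is x'40'.
job_blanks = (" " * 10).encode("cp037")

# Comparing the raw bytes against job-CCSID blanks gives the wrong answer.
naive_is_blank = raw == job_blanks

# Converting first, then comparing, gives the right one.
correct_is_blank = raw.decode("utf-8") == " " * 10

print(naive_is_blank, correct_is_blank)
```

The data really is all blanks, but a byte-for-byte test against the wrong CCSID's blank says otherwise.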
By default, alpha subfields are defined with the job CCSID for RPG.
If there is invalid data in the record of the file, this allows you to delay the discovery of the problem.
The subfields are copied directly between the buffer and the
data structure unless CCSID conversion is required
• Literals are handled as though they have the job CCSID, but
they are actually saved in the source file CCSID
dcl-pi *n;
parm char(1) const;
end-pi;
if parm = '!';
dsply ('parm is exclamation mark!');
else;
dsply ('parm is not exclamation mark!');
endif;
return;
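A concrete case of a literal meaning different things in different CCSIDs: the exclamation mark has one hex value in CCSID 37 and another in CCSID 500. The sketch below is Python, not RPG, and assumes purely for illustration that the source file is CCSID 37 while the job runs in CCSID 500; `cp037` and `cp500` are Python's codecs for those two CCSIDs.

```python
# ASSUMPTION: the literal '!' was saved in a CCSID 37 source file.
literal = "!".encode("cp037")

# ASSUMPTION: the caller runs in a CCSID 500 job and passes '!'.
parm = "!".encode("cp500")

# The two exclamation marks do not have the same hex value.
matches = parm == literal

# What the literal's byte looks like when read as CCSID 500.
literal_seen_in_500 = literal.decode("cp500")

print(literal.hex(), parm.hex(), matches, literal_seen_in_500)
```

Under these assumptions the program above would report "parm is not exclamation mark!" even though the caller passed an exclamation mark.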
ctl-opt ccsid(*exact);
dcl-pi *n;
parm char(1) const;
end-pi;
...
But by default, RPG thinks that the data is in the default CCSID
of the job. The default CCSID depends on the Language ID
and the Country ID.
Language identifier . . . . . . . . . . . . . . . : ENU
Country or region identifier . . . . . . . . . . : US
Coded character set identifier . . . . . . . . . : 65535
Default coded character set identifier . . . . . : 37
If all the fields in the file have an EBCDIC CCSID, and all the
characters in the data have the same hex values as the
characters in the job CCSID, everything is fine.
If some fields in the file are UTF-8, RPG will not handle the
data correctly (by default).
• UTF-8 data is unusable with RPG if the job CCSID
might be 65535
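What "not handled correctly" can look like is easy to see with a few bytes. The sketch below is Python, not RPG, and only illustrates the general effect of reading UTF-8 bytes as if they were in a default job CCSID of 37; the sample word is an arbitrary choice.

```python
# What a UTF-8 field actually holds.
stored = "café".encode("utf-8")

# The same bytes read as if they were CCSID 37 (EBCDIC).
misread = stored.decode("cp037")

# The same bytes read with the CCSID they were written in.
correct = stored.decode("utf-8")

print(repr(misread), repr(correct))
```

Every byte is a valid CCSID 37 character, so nothing fails; the text is simply wrong. The two bytes of é, for example, come out as the two letters "Cz".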
If DATA(*NOCVT) is coded for the file, the data in the I/O buffer
will always have the same CCSID as the fields in the file.
Doing any one of these will cause RPG to handle the I/O
buffers correctly.
UTF-8 data
• %len
• substring
• scan
• truncation on assignment
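The operations listed above are the ones where counting bytes and counting characters give different answers for UTF-8. The sketch below shows the general effect in Python; it does not show how RPG itself behaves, and the sample strings are arbitrary.

```python
text = "Zoë"
data = text.encode("utf-8")   # ë takes two bytes in UTF-8

# Length: characters versus bytes.
char_len = len(text)
byte_len = len(data)

# Substring or truncation by bytes: keeping 3 bytes cuts ë in half.
cut = data[:3]
try:
    cut.decode("utf-8")
    split_character = False
except UnicodeDecodeError:
    split_character = True

# Scan: the position of 'y' differs by one between the two views.
char_pos = "Zoëy".index("y")
byte_pos = "Zoëy".encode("utf-8").index(b"y")

print(char_len, byte_len, split_character, char_pos, byte_pos)
```

Any code that sizes, slices or searches UTF-8 data needs to be clear about which of the two units it is counting in.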
Recommendations