
Welcome to the Waitless World

IBM Software Group

Fun with CCSIDs
Working with Unicode and Other Types of Data in RPG

Session ID: 610060
Agenda Key: 31AM
Speaker: Barbara Morris

© 2014 IBM Corporation
© 2016 IBM Corporation

First, let’s just consider “Data”

What does x'7d' mean?

'    apostrophe?
}    curly brace?
-7   minus 7?
125  one hundred twenty five?

Yes, all of those. And many more things.

We have to interpret it

By itself, x'7d' doesn't mean anything.

We have to know how to interpret it:

'    EBCDIC
}    ASCII
-7   1-byte packed decimal
125  1-byte integer
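
The four interpretations can be checked with a small sketch. It is written in Python rather than RPG purely for illustration; the function names are hypothetical, and `cp037` is Python's codec name for the EBCDIC CCSID 37.

```python
# Interpret the single byte x'7D' in four different ways.
RAW = bytes([0x7D])


def as_ebcdic(data: bytes) -> str:
    # cp037 is Python's codec for EBCDIC CCSID 37.
    return data.decode("cp037")


def as_ascii(data: bytes) -> str:
    return data.decode("ascii")


def as_packed_decimal(data: bytes) -> int:
    # One-byte packed decimal: high nibble is the digit, low nibble is the sign.
    digit = data[0] >> 4
    sign = data[0] & 0x0F
    # Sign nibbles x'B' and x'D' mean negative; the others mean positive.
    return -digit if sign in (0x0B, 0x0D) else digit


def as_integer(data: bytes) -> int:
    return int.from_bytes(data, byteorder="big", signed=True)


if __name__ == "__main__":
    print(repr(as_ebcdic(RAW)), repr(as_ascii(RAW)),
          as_packed_decimal(RAW), as_integer(RAW))
```

The byte never changes; only the rule used to read it does.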

Another interpretation

It could even be specific to a particular program, where each bit has a different meaning. x'7d' = b'0111 1101'

» Double occupancy         False
» Prepaid                  True
» Vegetarian               True
» Fitness class            True
» Buenos Aires side trip   True
» Swimming pool access     True
» Returning customer       False
» VIP                      True

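
As a rough sketch of the bit-flag interpretation (in Python, for illustration only): x'7d' is binary 0111 1101, and each bit, read from the most significant end, is paired with one booking option. The function name `decode_flags` is hypothetical.

```python
# Booking options, listed from the most significant bit to the least.
FLAG_NAMES = [
    "Double occupancy", "Prepaid", "Vegetarian", "Fitness class",
    "Buenos Aires side trip", "Swimming pool access",
    "Returning customer", "VIP",
]


def decode_flags(value: int) -> dict:
    # Bit 7 (most significant) maps to the first name, bit 0 to the last.
    return {
        name: bool((value >> (7 - position)) & 1)
        for position, name in enumerate(FLAG_NAMES)
    }


if __name__ == "__main__":
    print(format(0x7D, "08b"))
    for name, is_set in decode_flags(0x7D).items():
        print(f"{name}: {is_set}")
```
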
Character data

Let's just focus on character data.

'    EBCDIC
}    ASCII
-7   packed decimal
125  integer
bit data about a booking

But we still have an interpretation problem.

And in general, it's not just a matter of ASCII vs EBCDIC.

Concept: Character set

There are many different character sets, for example:

• Latin
• Cyrillic
• Japanese

Concept: Character set

Think of a “character set” as the tiles for a game. You get 256 tiles.

• 26 have A-Z
• Another 26 have a-z
• 10 have 0-9
• And there are the accented letters like Á, é, ñ, Ç
• And many characters like these: !@#$%^&*,©§¶¼½¾

The characters can be ordered in many ways

Let's consider an imaginary character set with just 32 characters:

! @ # $ % ^ & * , © § ¶ ¼ ½ ¾ A B C D a b c d Á é ñ Ç 0 1 2 3 4

There are many, many ways to order these characters.

! @ # $ % ^ & * , © § ¶ ¼ ½ ¾ A B C D a b c d Á é ñ Ç 0 1 2 3 4
A B C D a b c d 4 Á é ñ Ç 0 1 2 3 ! @ # $ % ^ & * , © § ¶ ¼ ½ ¾
A @ B # C $ D a b © § ¶ ¼ 4 c d Á é ñ 1 2 3 ! % ^ & * , ½ ¾ Ç 0

But they are always the same characters from that set.

The characters can be ordered in many ways

We can give an ID to the orderings

1. ! @ # $ % ^ & * , © § ¶ ¼ ½ ¾ A B C D a b c d Á é ñ Ç 0 1 2 3 4
2. A B C D a b c d 4 Á é ñ Ç 0 1 2 3 ! @ # $ % ^ & * , © § ¶ ¼ ½ ¾
3. A @ B # C $ D a b © § ¶ ¼ 4 c d Á é ñ 1 2 3 ! % ^ & * , ½ ¾ Ç 0

We have IDs for 3 different orderings for our character set.

Imagine another character set

Here's another imaginary character set with a couple of orderings.
Some of the characters are the same as for the previous set.

4. é ê ë è í î ï ì ß A B C D a b c d Á é ñ Ç 0 1 2 3 4 ¿ Ð Ý Þ ® +
5. Á é ñ Ç 0 1 2 3 4 ¿ Ð Ý Þ ® + é ê ë è í î ï ì ß A B C D a b c d

We have 2 IDs for our second character set.

Concept: CCSID

1. ! @ # $ % ^ & * , © § ¶ ¼ ½ ¾ A B C D a b c d Á é ñ Ç 0 1 2 3 4
2. A B C D a b c d 4 Á é ñ Ç 0 1 2 3 ! @ # $ % ^ & * , © § ¶ ¼ ½ ¾
3. A @ B # C $ D a b © § ¶ ¼ 4 c d Á é ñ 1 2 3 ! % ^ & * , ½ ¾ Ç 0

4. é ê ë è í î ï ì ß A B C D a b c d Á é ñ Ç 0 1 2 3 4 ¿ Ð Ý Þ ® +
5. Á é ñ Ç 0 1 2 3 4 ¿ Ð Ý Þ ® + é ê ë è í î ï ì ß A B C D a b c d

From just the ID, we can deduce both the character set and the order.
The ID “4” indicates
• It is from the second character set
• It is the first ordering from that character set

“4” is the “Coded Character Set ID” or CCSID

Knowing the CCSID lets us interpret the data

1.   ! @ # $ % ^ & * , © § ¶ ¼ ½ ¾ A B C D a b c d Á é ñ Ç 0 1 2 3 4
2.   A B C D a b c d 4 Á é ñ Ç 0 1 2 3 ! @ # $ % ^ & * , © § ¶ ¼ ½ ¾
3.   A @ B # C $ D a b © § ¶ ¼ 4 c d Á é ñ 1 2 3 ! % ^ & * , ½ ¾ Ç 0

4.   é ê ë è í î ï ì ß A B C D a b c d Á é ñ Ç 0 1 2 3 4 ¿ Ð Ý Þ ® +
5.   Á é ñ Ç 0 1 2 3 4 ¿ Ð Ý Þ ® + é ê ë è í î ï ì ß A B C D a b c d

Hex  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
     0 1 2 3 4 5 6 7 8 9 A B C D E F 0 1 2 3 4 5 6 7 8 9 A B C D E F

Let's say the hex value of a byte is x'0B'.

If we know the CCSID, we know the character
CCSID 2: ñ
CCSID 4: C

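
A minimal sketch of this lookup, in Python for illustration only: each imaginary CCSID is modelled as a list whose index is the hex value. The names `CCSIDS` and `lookup` are hypothetical.

```python
# The five imaginary CCSIDs from the slides; list position = hex code point.
CCSIDS = {
    1: "! @ # $ % ^ & * , © § ¶ ¼ ½ ¾ A B C D a b c d Á é ñ Ç 0 1 2 3 4".split(),
    2: "A B C D a b c d 4 Á é ñ Ç 0 1 2 3 ! @ # $ % ^ & * , © § ¶ ¼ ½ ¾".split(),
    3: "A @ B # C $ D a b © § ¶ ¼ 4 c d Á é ñ 1 2 3 ! % ^ & * , ½ ¾ Ç 0".split(),
    4: "é ê ë è í î ï ì ß A B C D a b c d Á é ñ Ç 0 1 2 3 4 ¿ Ð Ý Þ ® +".split(),
    5: "Á é ñ Ç 0 1 2 3 4 ¿ Ð Ý Þ ® + é ê ë è í î ï ì ß A B C D a b c d".split(),
}


def lookup(ccsid: int, byte: int) -> str:
    # The same byte yields a different character under each CCSID.
    return CCSIDS[ccsid][byte]


if __name__ == "__main__":
    for ccsid in sorted(CCSIDS):
        print(f"CCSID {ccsid}: x'0B' is {lookup(ccsid, 0x0B)}")
```
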
CCSID conversion

Let’s convert some data, x'011E', from CCSID 2 to CCSID 4.

Finding the CCSID 2 characters for our hex data

2.   A B C D a b c d 4 Á é ñ Ç 0 1 2 3 ! @ # $ % ^ & * , © § ¶ ¼ ½ ¾

Hex  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
     0 1 2 3 4 5 6 7 8 9 A B C D E F 0 1 2 3 4 5 6 7 8 9 A B C D E F

Step 1. Find out which character matches each hex value in CCSID 2, for our data x'011E'

Hex value in CCSID 2   Character
x'01'                  B
x'1E'                  ½

CCSID conversion

Converting to CCSID 4

4.   é ê ë è í î ï ì ß A B C D a b c d Á é ñ Ç 0 1 2 3 4 ¿ Ð Ý Þ ® +

Hex  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
     0 1 2 3 4 5 6 7 8 9 A B C D E F 0 1 2 3 4 5 6 7 8 9 A B C D E F

Step 2. Find the characters in the second character set

Character   Hex value in CCSID 4
B           x'0A'
½           Does not exist! Where is it? There's nowhere for it to go. There's no ½ character.

CCSID conversion

We cannot completely convert x'011E' from CCSID 2 to CCSID 4

2.   A B C D a b c d 4 Á é ñ Ç 0 1 2 3 ! @ # $ % ^ & * , © § ¶ ¼ ½ ¾
4.   é ê ë è í î ï ì ß A B C D a b c d Á é ñ Ç 0 1 2 3 4 ¿ Ð Ý Þ ® +

Hex  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
     0 1 2 3 4 5 6 7 8 9 A B C D E F 0 1 2 3 4 5 6 7 8 9 A B C D E F

Hex value in CCSID 2   Character   Hex value in CCSID 4
x'01'                  B           x'0A'
x'1E'                  ½           ????

Result: x'0A??'

Data has been lost!

Loss of data during CCSID conversion

Each character set has a “replacement character” that represents any character that doesn’t exist in the character set.

When there is no matching character in the target character set, the replacement character is used instead.

If we assume that x'FE' is the replacement character for our CCSID 4, then the result of the previous conversion is x'0AFE'

We converted the value 'B½' to 'Bn' where n represents the replacement character x'FE'.

Loss of data during CCSID conversion

Let's try to convert the CCSID 4 data back to CCSID 2

Assume the replacement character is also x'FE' in CCSID 2.

Original CCSID 2 data:  x'011E' = 'B½'
Converted CCSID 4 data: x'0AFE' = 'Bn'
Converted CCSID 2 data: x'01FE' = 'Bn'

We cannot get back to the original CCSID 2 data.

Solution: Don't even try to convert to a character set that might not have some of the characters
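
The two-step conversion and the failed round trip can be sketched as follows. This is Python for illustration only, using the imaginary CCSIDs 2 and 4 and the slides' assumed replacement character x'FE'; the function name `convert` is hypothetical.

```python
# Two of the imaginary CCSIDs from the slides; list position = hex code point.
CCSID_2 = "A B C D a b c d 4 Á é ñ Ç 0 1 2 3 ! @ # $ % ^ & * , © § ¶ ¼ ½ ¾".split()
CCSID_4 = "é ê ë è í î ï ì ß A B C D a b c d Á é ñ Ç 0 1 2 3 4 ¿ Ð Ý Þ ® +".split()

# Assumption taken from the slides: x'FE' is the replacement character in both.
REPLACEMENT = 0xFE


def convert(data: bytes, source: list, target: list) -> bytes:
    """Map each byte to its character in source, then to its position in target."""
    result = bytearray()
    for byte in data:
        # Step 1: find the character for this hex value in the source CCSID.
        char = source[byte] if byte < len(source) else None
        # Step 2: find that character's hex value in the target CCSID.
        if char is not None and char in target:
            result.append(target.index(char))
        else:
            # No matching character in the target set: data is lost here.
            result.append(REPLACEMENT)
    return bytes(result)


if __name__ == "__main__":
    there = convert(bytes([0x01, 0x1E]), CCSID_2, CCSID_4)
    back = convert(there, CCSID_4, CCSID_2)
    print(there.hex(), back.hex())
```

Once the replacement character has been written, nothing in the converted bytes records which character it stood for, so the reverse conversion has nothing to recover.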

Three kinds of character sets

1. Single-byte character set (SBCS)
• One byte per character
• Contains the characters for one type of language
• Examples are Latin (European languages), Cyrillic (Russian)
• Always includes the standard characters A-Z, a-z, 0-9, + = - etc.

2. Double-byte character set (DBCS)
• Also called "Graphic"
• Two bytes per character
• Contains the characters for one graphic language
• Examples are Chinese and Japanese

Three kinds of character sets

3. Unicode
• 1-4 bytes per character for UTF-8
• 2 bytes per character for UCS-2; 2 or 4 bytes per character for UTF-16
• Contains the characters from all SBCS and DBCS character sets

The Unicode character set will solve our problem. If we have data that might have characters from different languages, using Unicode is the only way to avoid losing data.

In RPG, there are three types of Unicode data:
• UCS-2: data type C with CCSID(13488)
• UTF-16: data type C with CCSID(1200)
• UTF-8: data type A with CCSID(*UTF8) (7.2+)
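
To make the byte counts concrete, here is a small Python sketch (for illustration only; the sample characters and the name `encoded_lengths` are arbitrary choices) that measures how many bytes one character needs in UTF-8 and in UTF-16.

```python
# Sample characters: A, Ø, the euro sign, and an emoji outside the 16-bit range.
SAMPLES = ["A", "\u00d8", "\u20ac", "\U0001F600"]


def encoded_lengths(char: str) -> tuple:
    # Returns (bytes in UTF-8, bytes in UTF-16) for a single character.
    return len(char.encode("utf-8")), len(char.encode("utf-16-be"))


if __name__ == "__main__":
    for char in SAMPLES:
        utf8_len, utf16_len = encoded_lengths(char)
        print(f"U+{ord(char):04X}: UTF-8 {utf8_len} bytes, UTF-16 {utf16_len} bytes")
```

The last sample needs a 4-byte surrogate pair in UTF-16, so it cannot be represented in UCS-2, which has only 2-byte characters.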

Several kinds of CCSIDs

Single-byte CCSIDs
• Characters from one SBCS character set

Double-byte CCSIDs
• Characters from one DBCS character set

Mixed-byte CCSIDs
• Characters from one SBCS set and one DBCS set

Unicode CCSIDs
• 1208 is the CCSID for UTF-8: the smallest character size is 8 bits (1 byte)
• 1200 is the CCSID for UTF-16: the smallest character size is 16 bits (2 bytes)
• 13488 is the CCSID for UCS-2: similar to UTF-16, but every character is exactly 2 bytes

Hex CCSIDs
• Hex data cannot be converted to another CCSID

Working with character data in RPG

From now on, I'm going to use the term "string data" to refer to alphanumeric, graphic and UCS-2 data. The term "character" often just means RPG's CHAR (A) data type.

All string data has a CCSID. You can use DSPFFD to find out the CCSID of the data in your files.

You can use the cross reference in your RPG listings to find out the CCSID of your string variables. If alphanumeric data doesn’t show a CCSID, then it is the job CCSID.

Working with character data in RPG

RPG defaults:
• Alphanumeric: job CCSID
• Graphic: CCSID is ignored by default
• UCS-2: 13488

You can use the CCSID H spec keyword to set different defaults. For example:

    CCSID(*CHAR:37) CCSID(*UCS2:1200)

Working with character data in RPG

You can use the /SET and /RESTORE directives to temporarily set different defaults for your definitions

    ctl-opt CCSID(*CHAR:37);
    dcl-s company char(20);
    /set ccsid(*char : *JOBRUN)
    dcl-s city char(20);
    dcl-s description char(100) ccsid(*utf8);
    /restore ccsid(*char)

• "company" has CCSID 37, from the H spec default
• "city" has CCSID *JOBRUN from the /SET default
• "description" has CCSID *UTF8 (1208) from its CCSID keyword

Implicit CCSID conversion

Normally, RPG automatically does whatever CCSID conversion is needed.

    dcl-s name varchar(20) ccsid(*jobrun);
    dcl-s temp_name varucs2(20);

    temp_name = name;

• "temp_name" has a different CCSID from "name"
• RPG automatically converts the data in "name" from the job CCSID to UCS-2 when assigning to "temp_name"

Explicit CCSID conversion

There are a few scenarios, such as some built-in functions, where RPG does not yet do automatic CCSID conversion.

You can explicitly request CCSID conversion using the %CHAR, %UCS2 or %GRAPH built-in functions.

Assume that "description" is defined as UTF-8:

    p = %scan('?' : description);                  // not supported

    p = %scan(%char('?' : *utf8) : description);   // ok

Warnings or exceptions for CCSID conversions

We saw that a CCSID conversion may sometimes result in a “substitution” character being placed in the result.

Unicode source data:
The Thai word for “house” is “บ้าน”.

The target is an alphanumeric variable with CCSID 37:
The Thai word for “house” is “”.

CCSID 37 uses the “Latin” character set, and there are no matching characters for the Thai characters that are in the Unicode variable. Substitution characters are placed in the alphanumeric result.

The original Thai characters are all converted to the same substitution character, so their value is lost.

Warnings or exceptions for CCSID conversions

By default, non-error RPG status code 50 is set when the conversion had to use substitution characters.

You have to add code to check whether %status = 50

    alphaText = unicodeText;
    if %status() = 50;
       // ... there was loss of data
    endif;

Two problems:
• It’s awkward to check for status code 50 after every statement with a CCSID conversion
• It’s not always easy to tell which statements have CCSID conversions

Get an exception when substitution occurs

CCSIDCVT(*EXCP)

Code H spec keyword CCSIDCVT(*EXCP) to get an exception when a CCSID conversion results in a substitution character.

• Status code 00452

If you are on 6.1 or 7.1 and get this function through a PTF, you will need to add messages RNX0452 and RNQ0452 to your message file. The cover letter of the PTF for the RPG runtime has CLP code for adding the messages. See the RPG Cafe for PTF details.

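
The difference between silent substitution and an exception can be imitated outside RPG. The sketch below uses Python's `cp037` codec (its name for CCSID 37) purely as an analogy: `errors="replace"` behaves like the default substitution, and `errors="strict"` behaves like requesting an exception. The function names are hypothetical, and the substitution character Python chooses ('?') need not match what the system itself would use.

```python
# The Thai word for "house", written with escapes so the code points are explicit.
THAI_HOUSE = "\u0e1a\u0e49\u0e32\u0e19"
TEXT = f"The Thai word for house is {THAI_HOUSE}."


def convert_lenient(text: str) -> bytes:
    # Unconvertible characters silently become a substitution character.
    return text.encode("cp037", errors="replace")


def convert_strict(text: str) -> bytes:
    # Unconvertible characters raise UnicodeEncodeError instead.
    return text.encode("cp037", errors="strict")


if __name__ == "__main__":
    print(convert_lenient(TEXT).decode("cp037"))
    try:
        convert_strict(TEXT)
    except UnicodeEncodeError as error:
        print("conversion failed:", error.reason)
```

With the lenient form, every Thai character ends up as the same substitution character, so the caller has to remember to check for loss; the strict form makes the loss impossible to overlook.
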
Get a list of CCSID conversions

CCSIDCVT(*LIST)

Code H spec keyword CCSIDCVT(*LIST) to get a list of all the CCSID conversions in the module.

For each conversion, it shows
• The source statements using that conversion
• Whether the conversion might result in substitution characters

If you want both options, code CCSIDCVT(*EXCP:*LIST) or CCSIDCVT(*LIST:*EXCP)

Sample CCSIDCVT summary

              C C S I D   C o n v e r s i o n s
          From CCSID   To CCSID   References
RNF7361   834          *JOBRUN    15   25
RNF7357   1200         *JOBRUN    27   921  1073
          *JOBRUN      1200       28   12   321  426
                                  552  631
RNF7359   835          834        41   302  302
RNF7360   *JOBRUN      834        242  304  305
* * * *  E N D   O F   C C S I D   C O N V E R S I O N S  * * * *

RNF7357 Conversion from UCS-2 to Alpha might not convert all data.
RNF7358 Conversion from UCS-2 to DBCS might not convert all data.
RNF7359 Conversion from DBCS to DBCS might not convert all data.
RNF7360 Conversion from Alpha to DBCS might not convert all data.
RNF7361 Conversion from DBCS to Alpha might not convert all data.

How to use the CCSIDCVT summary

You can use this information for two purposes:

• Improve performance: Reduce the number of conversions by changing the data types of some of your variables.
• Improve reliability: Eliminate the conversions that have the potential to result in substitution characters.

For example, if you have a conversion from UCS-2 to an alphanumeric variable, and that alphanumeric data is later converted back to UCS-2, you may be able to change the type of the alphanumeric variable to UCS-2, to avoid the potential data loss.

Why bother about the CCSID of alpha data?

Starting in 7.2, you can code the CCSID keyword for character fields:

• All EBCDIC CCSIDs
• ASCII CCSIDs
• The Unicode CCSID UTF-8 (1208, or *UTF8)

But why bother? Isn't the job CCSID good enough?

The job CCSID isn't always the right CCSID

Assume that the getData procedure returns UTF-8 data

Prior to 7.2, you could call an API to convert the data to UCS-2:

    dcl-s stringA char(10000);
    dcl-s stringC varucs2(10000);
    dcl-pr getData varchar(10000); ...

    stringA = getData();
    stringC = convert(stringA: %len(stringA): 1208: 13488);

But remember not to use the data for ordinary RPG statements because RPG thinks the data is in the job CCSID.

    if stringA <> *blanks; // BUG!
       stringC = convert...
       ...

Defining alpha data with a CCSID

In 7.2, you can say that the data is UTF-8.

    dcl-s stringA char(10000) ccsid(*utf8);
    dcl-pr getData varchar(10000) ccsid(*utf8); ...

    stringA = getData();

Now you can use the returned value in ordinary RPG statements because RPG knows that it is UTF-8 data.

    if stringA <> *blanks; // OK!
       stringC = convert...
       ...

CCSID of externally-described subfields

UCS-2 and graphic subfields always get the CCSID of the field in the file.

By default, alpha subfields are defined with the job CCSID for RPG.

If you want the alpha subfields to be defined with the same CCSID as the matching fields in the file, code CCSID(*EXACT) for the data structure.

    dcl-ds ds1 likerec(rec) ccsid(*exact);

    dcl-ds ds2 extname('MYFILE') ccsid(*exact);

Using CCSID(*EXACT) files for I/O

When you specify a data structure in the result field of your I/O operation, the data is copied directly between the I/O buffer and your data structure.

If there is invalid data in the record of the file, this allows you to delay the discovery of the problem.

    read rec ds;

    monitor;
       salary = ds.salary;
    on-error;
       // ... problem with the "salary" field
    endmon;

Using CCSID(*EXACT) files for I/O

But what happens if the data structure was defined with CCSID(*EXACT)? The subfields might have a different CCSID.

The subfields are copied directly between the buffer and the data structure unless CCSID conversion is required.

Field       Buffer CCSID   Handled …    DS CCSID
ID          -              Move bytes   -
NAME        Job            Converted    37
SALARY      -              Move bytes   -
STARTDATE   -                           -
ADDRESS     Job            Converted    37
DESC        Job            Converted    1208
BONUS       -              Move bytes   -

Avoiding CCSID conversions for database files

By default, for a database file
• When you read a record, database converts the alphanumeric data from the field CCSID to the job CCSID
• When you write or update a record, database converts the alphanumeric data from the job CCSID to the field CCSID

Use DATA(*NOCVT) for a file to open the file so these conversions do not happen at the database level. Any CCSID conversions will be performed in the RPG program if necessary.

Use H spec OPENOPT(*NOCVTDATA) to default this behaviour for all database files.

Using CCSID(*EXACT) with DATA(*NOCVT)

If the data structure is defined with CCSID(*EXACT) and the file is defined with DATA(*NOCVT), the data in the buffer and the data structure will have the same CCSID.

Field       Buffer CCSID   Handled …    DS CCSID
ID          -              Move bytes   -
NAME        37                          37
SALARY      -                           -
STARTDATE   -                           -
ADDRESS     37                          37
DESC        1208                        1208
BONUS       -                           -

Using DATA(*NOCVT) without data structures

If you don't specify a data structure for your I/O operation, the data is always moved between the fields and the buffer individually.

CCSID conversion is automatically done as part of this.

RPG and literals

In the past, RPG did not always handle CCSIDs correctly (and still does not, by default)

• Literals are handled as though they have the job CCSID, but they are actually saved in the source file CCSID

    dcl-pi *n;
       parm char(1) const;
    end-pi;

    if parm = '!';
       dsply ('parm is exclamation mark!');
    else;
       dsply ('parm is not exclamation mark!');
    endif;

    return;

RPG and literals

Here is the hex value of the "if" statement in a source file with CCSID(37):

    if parm = '!';          ! is x'5A'
    88498994747575
    96071940E0DADE

Here is where the fun starts! In a source file with CCSID(500):

    if parm = '|';          ! is x'4F'
    88498994747475
    96071940E0DFDE

(My emulator is set up with CCSID 37, so DSPPFM is showing the data as if it were CCSID 37 data …)

RPG and literals

I compile the two programs. The ! character is saved as x'5A' in TEST37, and as x'4F' in TEST500.

I call the programs:

    > call bmorris/test37 '!'
      DSPLY parm is exclamation mark!
    > call bmorris/test500 '!'
      DSPLY parm is not exclamation mark|

Two problems with test500:
1. I did pass an exclamation mark!
2. The DSPLY shows | instead of !
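
The mismatch can be reproduced outside RPG. The following Python sketch (for illustration only) uses the `cp037` and `cp500` codecs, Python's names for CCSIDs 37 and 500; the function names are hypothetical.

```python
def literal_bytes(literal: str, source_codec: str) -> bytes:
    # The bytes stored for a literal depend on the source file's CCSID.
    return literal.encode(source_codec)


def interpret(data: bytes, job_codec: str) -> str:
    # The character seen at run time depends on the CCSID used to read the bytes.
    return data.decode(job_codec)


if __name__ == "__main__":
    parm = literal_bytes("!", "cp037")          # passed from a CCSID 37 job
    in_test37 = literal_bytes("!", "cp037")     # literal compiled from CCSID 37 source
    in_test500 = literal_bytes("!", "cp500")    # literal compiled from CCSID 500 source
    print(parm.hex(), in_test37.hex(), in_test500.hex())
    print(parm == in_test37, parm == in_test500)
    print(interpret(in_test500, "cp037"))
```

Comparing raw bytes, as the unconverted literal comparison effectively does, only works when the source file CCSID and the job CCSID happen to agree on the characters involved.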

To have RPG handle literals correctly

Add CCSID(*EXACT) to the H spec.

    ctl-opt ccsid(*exact);

    dcl-pi *n;
       parm char(1) const;
    end-pi;
    ...

I call the new programs. They both work correctly. The TEST500 program understands that the literal must be converted to the job CCSID.

    > call bmorris/test37 '!'
      DSPLY parm is exclamation mark!
    > call bmorris/test500 '!'
      DSPLY parm is exclamation mark!

RPG and I/O with CCSID 65535

• The job CCSID is 65535
• A database file is opened to have the alpha data converted to the job CCSID
• No CCSID conversion is done by database
• The data in the buffer has the same CCSID as the data in the file.

But by default, RPG thinks that the data is in the default CCSID of the job. The default CCSID depends on the Language ID and the Country ID.

    Language identifier . . . . . . . . . . . . . . . :   ENU
    Country or region identifier  . . . . . . . . . . :   US
    Coded character set identifier  . . . . . . . . . :   65535
    Default coded character set identifier  . . . . . :   37

RPG and I/O with CCSID 65535

If the default job CCSID is the same as the CCSID of all the alpha fields in the file, everything is fine.

If all the fields in the file have an EBCDIC CCSID, and all the characters in the data have the same hex values as the characters in the job CCSID, everything is fine.

If some fields in the file are UTF-8, RPG will not handle the data correctly (by default)
• UTF-8 data is unusable with RPG if the job CCSID might be 65535

DATA(*CVT) with CCSID 65535

If DATA(*NOCVT) is coded for the file, the data in the I/O buffer will always have the same CCSID as the fields in the file.

If DATA(*CVT) is coded, then the data in the I/O buffer might have the job CCSID or the CCSID of the fields in the file, depending on the job CCSID at runtime.

The RPG compiler will generate code to handle both situations, so your program will work with a job CCSID of either 65535 or another CCSID.

RPG and I/O with CCSID 65535

To make RPG handle database files correctly when your job CCSID is 65535, do any of the following

• Code CCSID(*EXACT) in the H spec
• Code the DATA keyword for the file, either DATA(*CVT) or DATA(*NOCVT)
• Define the CCSID for all the alpha fields in the RPG program (CCSID keyword on externally-described DS, CCSID keyword on fields)

Doing any one of these will cause RPG to handle the I/O buffers correctly.

UTF-8 data

UTF-8 data can have characters which are 1 to 4 bytes long.

For example, the Ø character is two bytes in UTF-8.

    dcl-s a37 varchar(4) ccsid(37) inz('Ø');
    dcl-s a1208 varchar(4) ccsid(*utf8) inz('Ø');
    return;

In debug (the first two bytes are the length)

    > EVAL a37:x
      00000   000180          1 byte, x'80'
    > EVAL a1208:x
      00000   0002C398        2 bytes, x'c398'
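
The two debugger values can be reproduced with a short Python sketch (for illustration only). The helper `varchar_image` is hypothetical; it builds a 2-byte big-endian length prefix followed by the encoded data, mirroring the layout shown in the debug output, and uses `cp037` as Python's codec name for CCSID 37.

```python
def varchar_image(text: str, codec: str) -> bytes:
    # Encode the text, then prepend its byte length as a 2-byte big-endian prefix.
    data = text.encode(codec)
    return len(data).to_bytes(2, byteorder="big") + data


if __name__ == "__main__":
    print(varchar_image("\u00d8", "cp037").hex().upper())
    print(varchar_image("\u00d8", "utf-8").hex().upper())
```

Note that the length prefix counts bytes, not characters: the same single character has length 1 in CCSID 37 and length 2 in UTF-8.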

UTF-8 data

Since some characters may be up to 4 bytes long, UTF-8 data can be longer than the equivalent EBCDIC alphanumeric data.

If you define temporary UTF-8 fields, be sure to define them long enough.

UTF-8 data

RPG does not consider the length of each character in string operations

• %len
• substring
• scan
• truncation on assignment

These are all handled by RPG on a byte basis, not a character basis

(This issue has always existed for mixed SBCS/DBCS data)
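
The consequence of byte-based handling can be seen in a small Python sketch (for illustration only; both function names are hypothetical). The first function truncates by bytes, the way the operations above work; the second shows one possible way to back up to a character boundary, and is not something the slides claim RPG does for you.

```python
def truncate_bytes(data: bytes, max_bytes: int) -> bytes:
    # Byte-based truncation: may cut a multi-byte character in half.
    return data[:max_bytes]


def truncate_utf8_safely(data: bytes, max_bytes: int) -> bytes:
    # Back up while the first dropped byte is a continuation byte (10xxxxxx),
    # so that the cut never lands inside a character.
    end = min(max_bytes, len(data))
    while 0 < end < len(data) and (data[end] & 0xC0) == 0x80:
        end -= 1
    return data[:end]


if __name__ == "__main__":
    data = "a\u00d8b".encode("utf-8")   # 3 characters, 4 bytes
    print(len(data), truncate_bytes(data, 2).hex(),
          truncate_utf8_safely(data, 2).hex())
```
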

Recommendations

• Code CCSID(*EXACT) so that RPG will handle CCSIDs correctly
• Code CCSIDCVT(*EXCP) so that conversion failures will cause an exception
• Code CCSIDCVT(*LIST), possibly temporarily, and study the CCSID conversions that are happening in your module
• Be sure to define UTF-8 fields long enough

© Copyright IBM Corporation 2016. All rights reserved.


The information contained in these materials is provided for informational purposes only, and is provided AS IS without warranty of any kind, express or implied. IBM shall not be responsible
for any damages arising out of the use of, or otherwise related to, these materials. Nothing contained in these materials is intended to, nor shall have the effect of, creating any warranties or
representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software. References in these materials
to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates. Product release dates and/or capabilities referenced in these materials may
change at any time at IBM’s sole discretion based on market opportunities or other factors, and are not intended to be a commitment to future product or feature availability in any way.
IBM, the IBM logo, the on-demand business logo, Rational, the Rational logo, and other IBM products and services are trademarks of the International Business Machines Corporation,
in the United States, other countries or both. Other company, product, or service names may be trademarks or service marks of others.

Special notices
This document was developed for IBM offerings in the United States as of the date of publication. IBM may not make these offerings
available in other countries, and the information is subject to change without notice. Consult your local IBM business contact for information
on the IBM offerings available in your area.
Information in this document concerning non-IBM products was obtained from the suppliers of these products or other public sources.
Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.
IBM may have patents or pending patent applications covering subject matter in this document. The furnishing of this document does not
give you any license to these patents. Send license inquiries, in writing, to IBM Director of Licensing, IBM Corporation, New Castle Drive,
Armonk, NY 10504-1785 USA.
All statements regarding IBM future direction and intent are subject to change or withdrawal without notice, and represent goals and
objectives only.
The information contained in this document has not been submitted to any formal IBM test and is provided "AS IS" with no warranties or
guarantees either expressed or implied.
All examples cited or described in this document are presented as illustrations of the manner in which some IBM products can be used
and the results that may be achieved. Actual environmental costs and performance characteristics will vary depending on individual client
configurations and conditions.
IBM Global Financing offerings are provided through IBM Credit Corporation in the United States and other IBM subsidiaries and divisions
worldwide to qualified commercial and government clients. Rates are based on a client's credit rating, financing terms, offering type,
equipment type and options, and may vary by country. Other restrictions may apply. Rates and offerings are subject to change, extension
or withdrawal without notice.
IBM is not responsible for printing errors in this document that result in pricing or information inaccuracies.
All prices shown are IBM's United States suggested list prices and are subject to change without notice; reseller prices may vary.
IBM hardware products are manufactured from new parts, or new and serviceable used parts. Regardless, our warranty terms apply.
Any performance data contained in this document was determined in a controlled environment. Actual results may vary significantly and
are dependent on many factors including system hardware configuration and software design and configuration. Some measurements
quoted in this document may have been made on development-level systems. There is no guarantee these measurements will be the
same on generally-available systems. Some measurements quoted in this document may have been estimated through extrapolation.
Users of this document should verify the applicable data for their specific environment.


Special notices
IBM, the IBM logo, ibm.com, AIX, AIX (logo), AIX 6 (logo), AS/400, BladeCenter, Blue Gene, ClusterProven, DB2, ESCON, i5/OS, i5/OS (logo), IBM
Business Partner (logo), IntelliStation, LoadLeveler, Lotus, Lotus Notes, Notes, Operating System/400, OS/400, PartnerLink, PartnerWorld, PowerPC,
pSeries, Rational, RISC System/6000, RS/6000, THINK, Tivoli, Tivoli (logo), Tivoli Management Environment, WebSphere, xSeries, z/OS, zSeries, AIX
5L, Chiphopper, Chipkill, Cloudscape, DB2 Universal Database, DS4000, DS6000, DS8000, EnergyScale, Enterprise Workload Manager, General
Purpose File System, GPFS, HACMP, HACMP/6000, HASM, IBM Systems Director Active Energy Manager, iSeries, Micro-Partitioning, POWER,
PowerExecutive, PowerVM, PowerVM (logo), PowerHA, Power Architecture, Power Everywhere, Power Family, POWER Hypervisor, Power Systems,
Power Systems (logo), Power Systems Software, Power Systems Software (logo), POWER2, POWER3, POWER4, POWER4+, POWER5, POWER5+,
POWER6, POWER6+, System i, System p, System p5, System Storage, System z, Tivoli Enterprise, TME 10, Workload Partitions Manager and X-
Architecture are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If
these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols
indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered
or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at
www.ibm.com/legal/copytrade.shtml

The Power Architecture and Power.org wordmarks and the Power and Power.org logos and related marks are trademarks and service marks licensed by
Power.org.
UNIX is a registered trademark of The Open Group in the United States, other countries or both.
Linux is a registered trademark of Linus Torvalds in the United States, other countries or both.
Microsoft, Windows and the Windows logo are registered trademarks of Microsoft Corporation in the United States, other countries or both.
Intel, Itanium, Pentium are registered trademarks and Xeon is a trademark of Intel Corporation or its subsidiaries in the United States, other countries or
both.
AMD Opteron is a trademark of Advanced Micro Devices, Inc.
Java and all Java-based trademarks and logos are trademarks of Sun Microsystems, Inc. in the United States, other countries or both.
TPC-C and TPC-H are trademarks of the Transaction Processing Performance Council (TPC).
SPECint, SPECfp, SPECjbb, SPECweb, SPECjAppServer, SPEC OMP, SPECviewperf, SPECapc, SPEChpc, SPECjvm, SPECmail, SPECimap and
SPECsfs are trademarks of the Standard Performance Evaluation Corp (SPEC).
NetBench is a registered trademark of Ziff Davis Media in the United States, other countries or both.
AltiVec is a trademark of Freescale Semiconductor, Inc.
Cell Broadband Engine is a trademark of Sony Computer Entertainment Inc.
InfiniBand, InfiniBand Trade Association and the InfiniBand design marks are trademarks and/or service marks of the InfiniBand Trade Association.
Other company, product and service names may be trademarks or service marks of others.
