The Micro Focus Extensions for Double-Byte Character Support is the additional facility we provide as the programming solution for environments using 16-bit coding schemes (DBCS). This facility incorporates every implementation of Japanese language support from earlier Micro Focus products.
(MIA) If you wish your program to comply with the Multivendor Integration Architecture Standard, or to be compatible with IBM VS COBOL II, COBOL/370 or IBM SAA, you should use the DBCS support defined earlier in the topic Double-Byte Character Set Support.
8-bit codes used by your COBOL system are referred to as the Single-Byte Character Set (SBCS). 16-bit codes, in which each character occupies a pair of adjacent bytes, are referred to as the Double-Byte Character Set (DBCS).
Micro Focus Extensions for Double-Byte Character Support is enabled by the NCHAR or JAPANESE Compiler directives.
When the Micro Focus Extensions for Double-Byte Character Support is enabled, the support defined in the topic Double-Byte Character Set Support is modified. In particular, in this chapter MOVE operations from SBCS to DBCS data items perform SBCS to DBCS conversion.
The classes NCHAR and JAPANESE, and NCHAR-EDITED and JAPANESE-EDITED are synonyms and interchangeable. In this chapter, reference to the class or category NCHAR or the category NCHAR-EDITED is equivalent to the class or category JAPANESE or the category JAPANESE-EDITED respectively.
The NCHAR or JAPANESE directive makes your COBOL compiler recognize the NCHAR data category in which data is stored in DBCS. It does not prevent the use of other SBCS data categories; thus you can still use those data categories in which data is stored in SBCS.
Provided you have the necessary hardware support, NCHAR data items used in input and output are recognized, and their data is displayed and accepted correctly on devices such as screens, keyboards and printers.
DBCS characters can be used in literals, in comments and comment-entries, and in user-defined words. Otherwise the NCHAR or JAPANESE directives do not change the range of characters that can be used in source programs - the program is still written using the COBOL character set (see the chapter Concepts of the COBOL Language).
There are extensions to the PICTURE and USAGE clauses to define items that are to contain NCHAR data.
There are additional rules for various options, clauses and statements to define the behavior of NCHAR data.
Except where otherwise stated, all the rules and features of COBOL remain applicable when the Micro Focus Extensions for Double-Byte Character Support are in use. The following sections give only the additional rules and formats pertaining to this support.
SBCS and DBCS characters can be mixed freely in comments and comment-entries.
Either SBCS or DBCS characters can be used and mixed freely in user-defined words for:
alphabet-name, cd-name, class-name, condition-name, constant-name, data-name/identifier, file-name, index-name, level-number, library-name, mnemonic-name, object-computer-name, paragraph-name, program-name, record-name, report-name, screen-name, section-name, segment-number, source-computer-name, symbolic-character, text-name.
This entry should be considered as an additional syntax rule for each user-defined word specified above. Where a character exists in both the DBCS and SBCS character sets, its DBCS and SBCS representations are not regarded as equivalent.
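The non-equivalence rule can be seen at the byte level. The short Python sketch below is an illustration outside COBOL, and it assumes Shift-JIS as the DBCS encoding; the variable names are hypothetical. It encodes the single-byte letter A and its full-width (double-byte) counterpart and shows that the two codes differ, which is why a user-defined word spelled with one is a different word from the same spelling with the other.

```python
# Assumption: Shift-JIS is the DBCS encoding in use.
sbcs_letter = "A".encode("shift_jis")        # single-byte letter A
dbcs_letter = "\uff21".encode("shift_jis")   # full-width letter A (DBCS)

print(sbcs_letter.hex())   # one byte
print(dbcs_letter.hex())   # two bytes

# The two representations are different byte strings, so they never match.
names_match = sbcs_letter == dbcs_letter
print(names_match)
```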
On some operating systems, only ASCII characters might be permitted for:
external-file-reference, library-name, program-name.
Spaces in data of class NCHAR are represented by the DBCS code for space. A space character represented by a 2-byte code is referred to as a DBCS space.
The value assigned to a DBCS space depends on the NCHAR, JAPANESE and DBSPACE Compiler directives.
In common with all data items that do not have a VALUE clause, data items of class NCHAR initially contain SBCS spaces.
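The difference between the two kinds of space is easiest to see in the stored bytes. The following Python sketch is only a model, not COBOL: it treats a hypothetical four-character NCHAR item as eight bytes and uses the Shift-JIS example value x"8140" for the DBCS space, which is an assumption because the actual value depends on the directives named above.

```python
ITEM_CHARS = 4                     # hypothetical item with four "N" positions
SBCS_SPACE = b"\x20"
DBCS_SPACE = bytes.fromhex("8140")  # Shift-JIS example value; directive-dependent

# No VALUE clause: every byte of the item holds an SBCS space.
initial_content = SBCS_SPACE * (ITEM_CHARS * 2)
# The same item after being filled with DBCS spaces.
dbcs_filled = DBCS_SPACE * ITEM_CHARS

print(initial_content.hex())
print(dbcs_filled.hex())
same_bytes = initial_content == dbcs_filled
```

Both byte strings have the same length, but their contents differ, so an item that has never been assigned a value does not hold DBCS spaces.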
There is a class of data additional to the classes described in the section Class and Category of Data: NCHAR. It includes two data categories: NCHAR and NCHAR-EDITED.
A data item of class NCHAR can be described by using the USAGE NCHAR or USAGE JAPANESE clause. An item with this clause can have only the characters "N", "B", "/" or "0" in its PICTURE character-string.
An item whose PICTURE character-string is all "N"s is of category NCHAR; an item whose PICTURE character-string contains both "N" and "B", "/" or "0" is of category NCHAR-EDITED.
Note that each "N", "B", "/" or "0" represents one 2-byte character position. Except where otherwise stated, the length of the data item for all purposes is the number of "N"s, "B"s, "/"s and "0"s in its PICTURE character-string.
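The category and length rules above can be sketched as a small Python helper. This is an illustrative model rather than compiler behavior: the function names are hypothetical, and it assumes that the usual parenthesized repetition factor, such as N(4), is allowed in these PICTURE character-strings.

```python
import re

NCHAR_SYMBOLS = set("NB/0")

def expand_picture(picture: str) -> str:
    """Expand repetition factors, for example 'N(3)B' becomes 'NNNB'."""
    return re.sub(r"(.)\((\d+)\)",
                  lambda m: m.group(1) * int(m.group(2)),
                  picture.upper())

def describe_nchar_picture(picture: str):
    """Return (category, length in characters, size in bytes)."""
    expanded = expand_picture(picture)
    if "N" not in expanded or not set(expanded) <= NCHAR_SYMBOLS:
        raise ValueError(f"not an NCHAR picture: {picture!r}")
    category = "NCHAR" if set(expanded) == {"N"} else "NCHAR-EDITED"
    # Each symbol is one 2-byte character position.
    return category, len(expanded), 2 * len(expanded)

print(describe_nchar_picture("N(4)"))
print(describe_nchar_picture("NN/NN/N(2)"))
```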
For reference modification, the leftmost-character-position and length specify the number of DBCS characters, not bytes.
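A minimal Python model of this rule follows. It is a sketch under stated assumptions, not COBOL: the helper `ref_mod` is hypothetical, the item is held as Shift-JIS bytes, and positions are 1-based character positions that the helper converts to byte offsets.

```python
BYTES_PER_CHAR = 2

def ref_mod(item: bytes, leftmost: int, length: int) -> bytes:
    """Model item(leftmost:length) where both values count DBCS characters."""
    n_chars = len(item) // BYTES_PER_CHAR
    if leftmost < 1 or length < 1 or leftmost + length - 1 > n_chars:
        raise ValueError("reference modification out of range")
    start = (leftmost - 1) * BYTES_PER_CHAR
    return item[start:start + length * BYTES_PER_CHAR]

item = "日本語処理".encode("shift_jis")   # five DBCS characters, ten bytes
part = ref_mod(item, 2, 2)                # characters 2 and 3
print(len(item), len(part))
```

Selecting two characters starting at character 2 yields four bytes taken from byte offset 2, so a DBCS code is never cut in half.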
Data items of class NCHAR can be used wherever data items of class alphanumeric can be used, subject to rules and exceptions given in the appropriate places in this chapter.
DBCS characters can be included in data stored in data items of category alphanumeric. In such data, SBCS characters are represented by SBCS codes and DBCS characters by DBCS codes. Each space character is represented by the SBCS code for space.
In operations within the program the data are treated as ordinary alphanumeric data. It is the programmer's responsibility to ensure that the two halves of a DBCS code do not get separated.
The length of the data item for all purposes is its length in bytes when stored in machine memory.
There is a fourth type of literal in addition to the nonnumeric, numeric and national literals described in the section Literals: the NCHAR literal.
An NCHAR literal is a character-string delimited at both ends by quotation marks or apostrophes. The character-string can consist of any allowable character in the computer's DBCS character set.
All DBCS literals can be used wherever nonnumeric literals can be used, subject to rules and exceptions given in the appropriate places in this chapter.
DBCS characters can be included in nonnumeric literals. A nonnumeric literal that includes SBCS and DBCS characters is called a mixed literal. In such a literal, SBCS characters are represented by SBCS codes and DBCS characters by DBCS codes. Each space character is represented by the SBCS code for space.
On output both the SBCS and the DBCS codes are recognized. In operations within the program the literal is treated as an ordinary nonnumeric literal. It is the programmer's responsibility to ensure that the two halves of a DBCS code do not get separated.
A mixed literal is of category alphanumeric, not NCHAR.
Whether quotation marks or apostrophes are used as character-string delimiters, the presence of that delimiter in a mixed literal can be represented by two contiguous occurrences. The presence of the character that is not serving as the delimiter is represented by a single occurrence. The value of a mixed literal in the object program is the string of characters itself, except each embedded pair of contiguous delimiter characters represents a single character.
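The delimiter rule can be sketched in Python as follows. The function `literal_value` is hypothetical and models only the rule stated above: it strips the outer delimiters and collapses each embedded pair of delimiter characters into one. It does not check for a stray unpaired delimiter inside the literal.

```python
def literal_value(source_text: str) -> str:
    """Return the value of a quoted literal as it is written in source."""
    if (len(source_text) < 2
            or source_text[0] not in "\"'"
            or source_text[-1] != source_text[0]):
        raise ValueError("literal must start and end with the same delimiter")
    delimiter = source_text[0]
    body = source_text[1:-1]
    # Two contiguous delimiter characters stand for one character.
    return body.replace(delimiter * 2, delimiter)

quoted = literal_value('"say ""日本"" now"')       # quotation mark as delimiter
apostrophes = literal_value("'it''s \"ok\"'")      # apostrophe as delimiter
print(len(quoted), len(apostrophes))
```

In the second call the quotation marks are not the delimiter, so each one is written once and kept as is.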
If a figurative constant is used where only an NCHAR literal is allowed (according to the rules concerning classes and categories given in the appropriate places in this chapter), it is an NCHAR literal.
Constant | Representation | Example NCHAR Japanese value (Shift-JIS) | Example NCHAR Japanese value (EUC)
---|---|---|---
ZERO, ZEROS, ZEROES | Represents one or more occurrences of the double-byte character "0", depending on the context. | x"824F" | x"A3B0"
SPACE, SPACES | Represents one or more occurrences of the double-byte space character from the computer's character set. | x"8140" | x"A1A1"
HIGH-VALUE, HIGH-VALUES | Represents one or more occurrences of the character that has the highest ordinal position in the program collating sequence. | x"FFFF" | x"FFFF"
LOW-VALUE, LOW-VALUES | Represents one or more occurrences of the character that has the lowest ordinal position in the program collating sequence. | x"0000" | x"0000"
QUOTE, QUOTES | Represents one or more occurrences of the double-byte quotation mark character. | x"818D" | x"A1ED"
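As a cross-check of the example values, the Python sketch below copies the Shift-JIS column of the table into a dictionary and repeats a constant to fill a hypothetical NCHAR item. The names `FIGURATIVE_SHIFT_JIS` and `fill` are assumptions made for illustration; the comparison against Python's `shift_jis` and `euc_jp` codecs covers only ZERO and SPACE.

```python
# Example values copied from the Shift-JIS column of the table above.
FIGURATIVE_SHIFT_JIS = {
    "ZERO": bytes.fromhex("824F"),
    "SPACE": bytes.fromhex("8140"),
    "HIGH-VALUE": bytes.fromhex("FFFF"),
    "LOW-VALUE": bytes.fromhex("0000"),
    "QUOTE": bytes.fromhex("818D"),
}

def fill(constant: str, n_chars: int) -> bytes:
    """Bytes of an NCHAR item of n_chars filled with a figurative constant."""
    return FIGURATIVE_SHIFT_JIS[constant] * n_chars

zeros = fill("ZERO", 3)
print(zeros.hex())

# Full-width zero (U+FF10) and ideographic space (U+3000) in both encodings.
zero_codes = ("\uff10".encode("shift_jis").hex(), "\uff10".encode("euc_jp").hex())
space_codes = ("\u3000".encode("shift_jis").hex(), "\u3000".encode("euc_jp").hex())
print(zero_codes, space_codes)
```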