Many of the world's languages use sets of characters that run into the thousands. Most computers use 8-bit bytes and assign each character a different 8-bit code; such a scheme can represent no more than 256 different characters.
Ideally, a COBOL programmer should not need to be aware of the internal code used to represent characters. In practice, however, some features of the internal code do affect the source programmer, and the limit of 256 different characters is one of the most restrictive of these.
For this reason the Double-Byte Character Set (DBCS) is provided. In this scheme each character is represented by a 16-bit code, each character occupying a pair of adjacent bytes. This scheme can represent thousands of different characters.
The assignment of DBCS character codes to characters varies from country to country.
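As a rough illustration of the difference between the two schemes, the Python sketch below counts characters and bytes. Because code assignments vary by country, it assumes Shift-JIS (a double-byte encoding widely used for Japanese) as a stand-in for the DBCS; the byte values shown depend on that assumption, and `byte_length` and `dbcs_code` are hypothetical helpers, not part of any COBOL system.

```python
# Assumption: Shift-JIS stands in for the DBCS; real assignments vary by country.
ENCODING = "shift_jis"

SBCS_TEXT = "ABC"                   # Roman letters: one byte each
DBCS_TEXT = "\u65e5\u672c\u8a9e"    # three Japanese characters: two bytes each


def byte_length(text: str, encoding: str = ENCODING) -> int:
    """Return the number of bytes the text occupies in the given encoding."""
    return len(text.encode(encoding))


def dbcs_code(char: str, encoding: str = ENCODING) -> int:
    """Return the 16-bit code of a single DBCS character (hypothetical helper)."""
    encoded = char.encode(encoding)
    if len(encoded) != 2:
        raise ValueError(f"{char!r} is not a double-byte character")
    return int.from_bytes(encoded, "big")


for text in (SBCS_TEXT, DBCS_TEXT):
    print(f"{len(text)} characters, {byte_length(text)} bytes")
print(hex(dbcs_code(DBCS_TEXT[0])))  # one 16-bit code held in a pair of bytes
```

Both strings hold three characters, but the DBCS one needs twice as many bytes, because each of its characters is a single 16-bit code.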
The 8-bit code used by your COBOL system is the American Standard Code for Information Interchange (ASCII). In this chapter this will be referred to as the Single-Byte Character Set (SBCS).
Double-Byte Character Set support is sensitive to the DBCS Compiler directive.
See also the chapter Micro Focus Extensions for Double-Byte Character Support, which describes extensions intended primarily for Japanese language support.
The DBCS Compiler directive makes your COBOL compiler recognize two data categories in which data is stored in DBCS. It does not prevent the use of other data categories; thus you can still use those data categories in which data is stored in SBCS.
Provided you have the necessary hardware support, DBCS data items used in input and output are recognized, and their data is displayed and accepted correctly on devices such as screens, keyboards and printers.
The character set that can be represented by SBCS is based on the Roman alphabet plus some other characters. In some countries the DBCS character codes also include codes for many of these characters.
On some hardware the displayed character looks different depending on whether it is stored in SBCS or DBCS; for example, on some screens a letter stored as its DBCS code is displayed larger than the same letter stored as its SBCS code.
Programs written to the NTT Multivendor Integration Architecture (MIA) specification are accepted by the COBOL compiler when the DBCS and CURRENCY-SIGN"92" directives are used.
DBCS characters can be used in literals (since literals are data), in comments and comment-entries, and in user-defined words. Otherwise, the DBCS directive does not change the range of characters that can be used in source programs; the program is still written using the COBOL character set (see Concepts of the COBOL Language).
There are extensions to the PICTURE and USAGE clauses to define items that are to contain DBCS data. A new format of literal is required for DBCS data.
There are additional rules for various options, clauses and statements to define the behavior of DBCS data.
Except where otherwise stated, all the rules and features of COBOL remain applicable when DBCS is in use. The following sections give only the additional rules and formats pertaining to DBCS.
SBCS and DBCS characters can be mixed freely in comments and comment-entries.
Either SBCS or DBCS characters can be used in user-defined words for: Alphabet-name, Class-name, Condition-name, Data-name/Identifier, Record-name, File-name, Index-name, Mnemonic-name, Paragraph-name, Section-name, and Symbolic-character.
SBCS and DBCS characters can be freely mixed in user-defined words. Where a character exists in both the DBCS and SBCS character sets, its DBCS and SBCS representations are not regarded as equivalent. See the section Roman Script in DBCS.
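The non-equivalence rule can be pictured as a byte-for-byte comparison of the stored codes. The Python sketch below assumes Shift-JIS as the DBCS and uses a hypothetical helper `same_word`; it shows that a data-name spelled with SBCS letters and the same name spelled with DBCS (fullwidth) letters are two different words.

```python
import unicodedata

ENCODING = "shift_jis"  # assumed DBCS encoding for this sketch

SBCS_NAME = "TOTAL"                           # data-name written with SBCS letters
DBCS_NAME = "\uff34\uff2f\uff34\uff21\uff2c"  # same letters as fullwidth DBCS characters


def same_word(a: str, b: str) -> bool:
    """Compare two user-defined words by their stored codes, with no SBCS/DBCS folding."""
    return a.encode(ENCODING) == b.encode(ENCODING)


print(same_word(SBCS_NAME, DBCS_NAME))  # False: two different names
print(len(SBCS_NAME.encode(ENCODING)), len(DBCS_NAME.encode(ENCODING)))  # 5 10
# Unicode compatibility normalization would fold them together; the rule above does not.
print(unicodedata.normalize("NFKC", DBCS_NAME) == SBCS_NAME)  # True
```

The last line is included only as a contrast: a general-purpose text library may treat the two spellings as equivalent, but the compiler rule described above keeps them distinct.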
Spaces in data of class DBCS are represented by the DBCS code for space. A space character represented by a 2-byte code is referred to as a DBCS space. The value assigned to a DBCS space depends on the DBCS and DBSPACE Compiler directives.
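Because a DBCS space is a 2-byte code, it cannot be found by looking for the SBCS space byte. The Python sketch below assumes Shift-JIS, where the DBCS space is the pair 0x81 0x40; the value your system actually uses depends on the directives just mentioned, so treat `DBCS_SPACE` as a placeholder. The helpers `char_positions` and `count_dbcs_spaces` are hypothetical.

```python
ENCODING = "shift_jis"                  # assumed DBCS encoding
DBCS_SPACE = "\u3000".encode(ENCODING)  # b"\x81\x40" under this assumption


def char_positions(item: bytes) -> list:
    """Split the contents of a DBCS item into its 2-byte character positions."""
    if len(item) % 2:
        raise ValueError("a DBCS item always has an even number of bytes")
    return [item[i:i + 2] for i in range(0, len(item), 2)]


def count_dbcs_spaces(item: bytes) -> int:
    """Count the character positions that hold a DBCS space."""
    return sum(1 for position in char_positions(item) if position == DBCS_SPACE)


item = "\u65e5\u3000\u672c".encode(ENCODING)  # DBCS char, DBCS space, DBCS char
print(count_dbcs_spaces(item))  # 1
print(item.count(b" "))         # 0: searching for the SBCS space finds nothing
```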
In addition to the classes described in the section Class and Category of Data, there is a class of data called DBCS. It includes two data categories: DBCS and DBCS edited.
A data item of class DBCS is described by using the USAGE DISPLAY-1 clause. An item with this clause can have only the characters "G" and "B" in its PICTURE character-string. A "G" represents a DBCS character position; a "B" is an editing character and indicates a position into which a DBCS space is always inserted during editing. An item whose PICTURE character-string consists entirely of "G"s is of category DBCS; an item whose PICTURE character-string contains both "G"s and "B"s is of category DBCS edited.
Note that each "G" or "B" represents one 2-byte character position. Except where otherwise stated, the length of the data item for all purposes is the number of "G"s and "B"s in its PICTURE character-string, that is, its length in characters; its size in bytes is twice that number.
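The PICTURE rules above can be modeled in a few lines of Python. This is a sketch under stated assumptions: it accepts the usual COBOL repetition factor such as G(4), uses the Shift-JIS ideographic space as the DBCS space, and, in `edit`, assumes that source characters fill the "G" positions left to right with DBCS-space padding, by analogy with an alphanumeric MOVE. All function names are hypothetical.

```python
import re

DBCS_SPACE_CHAR = "\u3000"  # assumed DBCS space (Shift-JIS 0x8140)


def expand_picture(pic: str) -> str:
    """Expand repetition factors such as G(4) and check that only G and B are used."""
    symbols = re.sub(r"([GB])\((\d+)\)",
                     lambda m: m.group(1) * int(m.group(2)),
                     pic.upper())
    if "G" not in symbols or set(symbols) - {"G", "B"}:
        raise ValueError(f"not a valid USAGE DISPLAY-1 picture: {pic!r}")
    return symbols


def category(pic: str) -> str:
    return "DBCS edited" if "B" in expand_picture(pic) else "DBCS"


def length_in_chars(pic: str) -> int:
    return len(expand_picture(pic))  # each G or B is one character position


def length_in_bytes(pic: str) -> int:
    return 2 * length_in_chars(pic)  # each character position is two bytes


def edit(pic: str, chars: str) -> str:
    """Fill a DBCS edited item: B positions get a DBCS space; G positions take
    source characters left to right, padded with DBCS spaces (assumed MOVE rule)."""
    source = iter(chars)
    return "".join(DBCS_SPACE_CHAR if symbol == "B" else next(source, DBCS_SPACE_CHAR)
                   for symbol in expand_picture(pic))


print(category("G(4)"), length_in_chars("G(4)"), length_in_bytes("G(4)"))     # DBCS 4 8
print(category("GGBGG"), length_in_chars("GGBGG"), length_in_bytes("GGBGG"))  # DBCS edited 5 10
```

Note that the "B" positions count toward the length exactly as the "G" positions do.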
For reference modification, the leftmost-character-position and length are expressed in DBCS characters, not bytes.
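To see what counting in characters means at the byte level, the Python sketch below converts a reference modification on a DBCS item into a byte slice. It assumes Shift-JIS data and that an omitted length runs to the end of the item, as in ordinary COBOL reference modification; `ref_mod` is a hypothetical name.

```python
ENCODING = "shift_jis"  # assumed DBCS encoding


def ref_mod(item: bytes, leftmost: int, length=None) -> bytes:
    """Model item(leftmost:length) on a DBCS item; both operands count 2-byte characters."""
    n_chars = len(item) // 2
    if length is None:  # omitted length: to the end of the item
        length = n_chars - leftmost + 1
    if leftmost < 1 or length < 1 or leftmost + length - 1 > n_chars:
        raise ValueError("reference modification out of range")
    start = 2 * (leftmost - 1)  # character position -> byte offset
    return item[start:start + 2 * length]


item = "\u65e5\u672c\u8a9e\u6587".encode(ENCODING)  # 4 characters, 8 bytes
print(ref_mod(item, 2, 2) == "\u672c\u8a9e".encode(ENCODING))  # True: characters 2 and 3
```

Selecting characters 2 and 3 therefore touches bytes 3 through 6 of the item.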
Data items of class DBCS can be used wherever data items of class alphanumeric can be used, subject to rules and exceptions given in the appropriate places in this chapter.
DBCS characters can be included in data stored in data items of category alphanumeric. In such data, SBCS characters are represented by SBCS codes and DBCS characters by DBCS codes. Each space character is represented by the SBCS code for space.
On input and output both the SBCS and the DBCS codes will be recognized. The first byte of a DBCS code is never a valid SBCS code; hence the two can be used together without confusion. But in operations within the program the data will be treated as ordinary alphanumeric data. It is the programmer's responsibility to ensure that the two halves of a DBCS code do not get separated.
The length of the data item for all purposes is its length in bytes.
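Keeping the two halves of a DBCS code together matters whenever mixed data is cut to a byte length. The Python sketch below assumes Shift-JIS, whose lead bytes (0x81-0x9F and 0xE0-0xFC) are never SBCS codes, while its trailing bytes can coincide with SBCS codes (the second byte of the fourth character here is 0x7B, the code for "{"). The scan must therefore start at the beginning of the data. `safe_truncate` is a hypothetical helper, not a COBOL facility.

```python
ENCODING = "shift_jis"  # assumed DBCS encoding


def is_lead_byte(byte: int) -> bool:
    """Shift-JIS lead bytes; none of these values is a valid SBCS code."""
    return 0x81 <= byte <= 0x9F or 0xE0 <= byte <= 0xFC


def safe_truncate(data: bytes, max_bytes: int) -> bytes:
    """Cut mixed SBCS/DBCS data to at most max_bytes without splitting a DBCS code."""
    i = 0
    while i < len(data):
        width = 2 if is_lead_byte(data[i]) else 1
        if i + width > max_bytes:
            break
        i += width
    return data[:i]


data = "AB\u65e5\u672c".encode(ENCODING)  # 2 SBCS + 2 DBCS characters = 6 bytes
print(data[:3])                           # b'AB\x93': a naive cut leaves half a DBCS code
print(safe_truncate(data, 3) == b"AB")    # True
```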
In addition to the nonnumeric, numeric and national literals described in the section Literals, there is a fourth type of literal: the DBCS literal.
A DBCS literal is a character-string delimited at both ends by quotation marks or apostrophes, with the beginning delimiter preceded by a "G". It can consist of any characters in the computer's DBCS character set. It can be up to 28 DBCS characters in length. It cannot be continued across lines.
Whichever delimiter is used, quotation mark or apostrophe, an occurrence of that character within a DBCS literal is represented by two contiguous occurrences of it. An occurrence of the character that is not serving as the delimiter is represented by a single occurrence. The value of a DBCS literal in the object program is the string of characters itself, except that each pair of contiguous delimiter characters represents a single occurrence of that character.
All DBCS literals can be used wherever nonnumeric literals can be used, subject to rules and exceptions given in the appropriate places in this chapter.
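The delimiter rules for DBCS literals can be checked with a small parser. The Python sketch below works on source text as a Python string and does not verify that the characters belong to the DBCS character set or handle line boundaries; it enforces only the "G" prefix, the delimiter doubling, and the 28-character limit stated above. `parse_dbcs_literal` is a hypothetical name.

```python
def parse_dbcs_literal(source: str, max_chars: int = 28) -> str:
    """Return the value of a DBCS literal written G"..." or G'...'."""
    if len(source) < 3 or source[0] != "G" or source[1] not in "\"'":
        raise ValueError("not a DBCS literal")
    delimiter, body = source[1], source[2:]
    value, i = [], 0
    while i < len(body):
        char = body[i]
        if char == delimiter:
            if body[i + 1:i + 2] == delimiter:  # doubled delimiter: one occurrence
                value.append(delimiter)
                i += 2
                continue
            if i != len(body) - 1:
                raise ValueError("text follows the closing delimiter")
            break  # closing delimiter reached
        value.append(char)  # includes the non-delimiting quote character
        i += 1
    else:
        raise ValueError("missing closing delimiter")
    if len(value) > max_chars:
        raise ValueError(f"longer than {max_chars} DBCS characters")
    return "".join(value)


print(parse_dbcs_literal('G"\u65e5""\u672c"') == '\u65e5"\u672c')  # True
print(parse_dbcs_literal("G'\u65e5\"\u672c'") == '\u65e5"\u672c')  # True
```

Both calls yield the same three-character value: the first doubles the quotation mark because it is the delimiter, the second writes it once because apostrophes delimit the literal.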
DBCS characters can be included in nonnumeric literals. A nonnumeric literal that includes DBCS characters is called a mixed literal. In such a literal, SBCS characters are represented by SBCS codes and DBCS characters by DBCS codes. Each space character is represented by the SBCS code for space.
On output both the SBCS and the DBCS codes will be recognized. The first byte of a DBCS code is never a valid SBCS code; hence the two can be used together without confusion. But in operations within the program the literal will be treated as an ordinary nonnumeric literal. It is the programmer's responsibility to ensure that the two halves of a DBCS code do not get separated.
A nonnumeric literal is of category alphanumeric, not DBCS, regardless of whether it includes DBCS characters.
The restriction that a mixed literal cannot be continued across lines has been removed.
If a figurative constant is used where only a DBCS literal is allowed (according to the rules concerning classes and categories given in the appropriate places in this chapter), it is a DBCS literal. Each space in this literal is a DBCS space.
Only the figurative constant SPACE(S) can be a DBCS literal.
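The effect of SPACE(S) acting as a DBCS literal can be seen by comparing what ends up stored in a DBCS item and in an alphanumeric item of the same size in bytes. The Python sketch below assumes Shift-JIS, with 0x81 0x40 as the DBCS space; `spaces_for` is a hypothetical helper.

```python
ENCODING = "shift_jis"                  # assumed DBCS encoding
SBCS_SPACE = b" "
DBCS_SPACE = "\u3000".encode(ENCODING)  # assumed DBCS space code


def spaces_for(category: str, length: int) -> bytes:
    """Bytes stored when SPACE(S) fills an item of the given category (hypothetical helper).
    length is the item's length in characters for DBCS, in bytes for alphanumeric."""
    if category == "DBCS":
        return DBCS_SPACE * length
    if category == "alphanumeric":
        return SBCS_SPACE * length
    raise ValueError(f"category not covered by this sketch: {category}")


print(spaces_for("DBCS", 3))          # b'\x81@\x81@\x81@'
print(spaces_for("alphanumeric", 6))  # b'      '
```

Both results occupy six bytes, but only the first consists of DBCS spaces.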
Another format of literal, equivalent to the DBCS literal, is used in COBOL/370 and the MIA COBOL specification. In this format the literal is prefixed with N instead of G; for example:
N"ABC""DEF"