Records and keys in files created using the File Handler can be compressed so they take up less physical disk space. You can enable data compression by using the File Handler to call compression routines from your program.
Data compression enables you to compress the data in a sequential or indexed file. This COBOL system provides two compression mechanisms: run-length encoding (type 1) and extended run-length encoding (type 3).
When a file is defined with run-length encoding, any string of repeating characters is stored as a single character with a repetition count.
You enable data compression using the DATACOMPRESS Compiler directive.
Specifying data compression for a fixed structure sequential file changes it into a variable structure sequential file. For more information on file structures, see the topics under
The compression used by a file is determined by the last processed DATACOMPRESS directive when the SELECT statement for the file is processed. Consequently, the compression type can be set for an individual file by using a line of the form:
$SET DATACOMPRESS
immediately before its SELECT statement. Remember to turn it off with $SET NODATACOMPRESS before any other files are processed.
For details on the DATACOMPRESS Compiler directive, see the topic DATACOMPRESS.
Note: We recommend that you do not use the REWRITE statement on compressed sequential files. This is because a REWRITE operation will only succeed if the length of the compressed new record is the same as the length of the compressed old record.
Key compression is a technique that can be applied to the keys of an indexed file. Four types of compression are available: duplicate key, leading character, trailing space, and trailing null compression.
You can specify key compression using the KEYCOMPRESS option in the File Handler configuration file. Use the following integers to indicate which type of compression you want:
1 | Duplicate key compression. |
2 | Leading character compression. |
4 | Trailing space compression. |
8 | Trailing null compression. |
You can add these numbers together to specify a combination of compression types (with the exception of trailing nulls and trailing spaces which are mutually exclusive).
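The rule for combining values can be pictured as a bit-flag check. The following is a minimal sketch only: the enum names and the helper keycompress_is_valid are hypothetical and are not part of the File Handler; only the numeric values and the mutual-exclusion rule come from the text above.

```c
#include <stdbool.h>

/* Flag values as listed in the table above; the names are illustrative. */
enum {
    KC_DUPLICATES      = 1,
    KC_LEADING_CHARS   = 2,
    KC_TRAILING_SPACES = 4,
    KC_TRAILING_NULLS  = 8
};

/* Hypothetical helper: true if value is a sum of the flags above that does
   not combine trailing-space and trailing-null compression. */
bool keycompress_is_valid(int value)
{
    if (value < 0 || value > 15)
        return false;
    /* Trailing spaces and trailing nulls are mutually exclusive. */
    if ((value & KC_TRAILING_SPACES) && (value & KC_TRAILING_NULLS))
        return false;
    return true;
}
```

Under this sketch, 6 (leading characters plus trailing spaces) and 7 (all of those plus duplicates) are accepted, while 12 (trailing spaces plus trailing nulls) is rejected.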
Alternatively, you can use the KEYCOMPRESS Compiler directive when compiling the program.
The key compression used by a file is determined by the last processed KEYCOMPRESS directive when the SELECT statement for the file is processed, so you can set key compression for an individual file by using a line of the form:
$SET KEYCOMPRESS"8"
immediately before its SELECT statement in your program. You can turn it off again by specifying $SET NOKEYCOMPRESS before any other files are processed.
For details on the KEYCOMPRESS Compiler directive, see the topic KEYCOMPRESS.
When a key is defined with compression of trailing nulls, trailing nulls in a key value are not stored in the file.
For example, assume you have a prime or alternate key that is 30 characters long, and that you write a record in which only the first 10 characters of the key are used, the rest being nulls. Without compression, all 30 characters of the key are stored, requiring 30 bytes. With compression of trailing nulls, only 11 bytes are required (10 bytes for the 10 characters of the key and 1 byte which is used to maintain a count of the trailing nulls).
When a key is defined with compression of trailing spaces, trailing spaces in a key value are not stored in the file. However, information is stored so that the key can be correctly located.
For example, assume you have a prime or alternate key that is 30 characters long, and that you write a record in which only the first 10 characters of the key are used, the rest being spaces. Without compression, all 30 characters of the key are stored. With compression of trailing spaces, the key only occupies 11 bytes in the index file (10 bytes for the characters of the key and 1 byte as a count of the trailing spaces).
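The arithmetic in the two trailing-character examples can be sketched as follows. This illustrates the size calculation only, not the File Handler's internal format: the function name compressed_key_size is hypothetical, and the sketch assumes a single count byte is always stored, as in the examples.

```c
#include <stddef.h>

/* Hypothetical helper: number of bytes a key would occupy once trailing
   pad characters (space or null) are removed, assuming one extra byte
   holds the count of removed characters. */
size_t compressed_key_size(const unsigned char *key, size_t key_len,
                           unsigned char pad)
{
    size_t used = key_len;

    /* Walk back over the trailing pad characters. */
    while (used > 0 && key[used - 1] == pad)
        used--;

    return used + 1;   /* significant characters plus the count byte */
}
```

For a 30-byte key whose last 20 bytes are padding, the sketch yields 11, matching the figures quoted above.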
When a key is defined with compression of leading characters, all leading characters that match leading characters in the preceding key are not stored in the index file. However, information is stored to enable the key to be correctly reconstructed.
For example, assume that records are written with the following key values in a key defined with compression of leading characters:
AXYZBBB BBCDEFG BBCXYZA BBCXYEF BEFGHIJ CABCDEF
The keys actually stored in the index file are:
AXYZBBB BBCDEFG XYZA EF EFGHIJ CABCDEF
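The stored values in this example can be reproduced by dropping, from each key, the leading characters it shares with the key before it. The sketch below assumes NUL-terminated strings and uses hypothetical function names; it does not show the extra information the File Handler stores to reconstruct the key.

```c
#include <stddef.h>

/* Number of leading characters that key shares with prev. */
size_t common_prefix_len(const char *prev, const char *key)
{
    size_t n = 0;

    while (prev[n] != '\0' && key[n] != '\0' && prev[n] == key[n])
        n++;
    return n;
}

/* Part of key that remains after the shared leading characters are
   dropped. Pass NULL as prev for the first key, which is kept whole. */
const char *stored_suffix(const char *prev, const char *key)
{
    return prev ? key + common_prefix_len(prev, key) : key;
}
```

For instance, with the preceding key BBCDEFG, the key BBCXYZA reduces to XYZA, and with the preceding key BBCXYZA, the key BBCXYEF reduces to EF.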
When an alternate key is defined with compression of duplicates, only the first duplicate key is contained in the file. The rest are not stored, but information is stored to enable correct recreation of the keys.
For example, suppose you write a record with an alternate key value "ABC". If you have enabled compression of duplicate keys, and you write another record with the same key value, the File Handler does not physically store the duplicate key value in the index file. However, the record is still available along the alternate key path.
In the following program, data compression is specified for transfile but not for masterfile. For key compression, suppression of trailing spaces and of leading characters that are the same as in the previous key is specified for keys t-rec-key and m-rec-key. Suppression of repetition of duplicate keys is also turned on for m-alt-key-1 and m-alt-key-2.
      $set callfh"extfh"
      $set datacompress"1"
      $set keycompress"6"
           select transfile
               assign to ...
               key is t-rec-key.
      $set nokeycompress
      $set nodatacompress
           select masterfile
               assign to ...
               organization is indexed
      $set keycompress"6"
               record key is m-rec-key
      $set keycompress"7"
               alternate key is m-alt-key-1 with duplicates
               alternate key is m-alt-key-2.
      $set nokeycompress
The routines that the File Handler uses to compress data are stand-alone modules. This means that you can use them in your own applications, or alternatively make the File Handler use your own data compression routine.
There can be up to 127 Micro Focus compression routines, and up to 127 user-supplied compression routines.
Micro Focus routines are stored in modules called CBLDCnnn, where nnn is within the range 001 to 127. To use Micro Focus compression routines, set fcd-data-compress in the FCD to a value between 001 and 127.
The compression routine CBLDC001 uses a form of run-length encoding. This is a method of compression that detects strings (runs) of the same character and reduces them to an identifier, a count and one occurrence of the character.
Note: This routine is not effective for use with files that contain significant occurrences of double-byte characters, including double-byte spaces, as these are not compressed.
CBLDC001 puts special emphasis on runs of spaces, binary zeros and character zeros (which can be reduced to a single character) and printable characters (which are reduced to two characters consisting of a count followed by the repeated character).
In the compressed file, bytes have the following meanings (hexadecimal values shown):
20-7F | (most printable characters) normal ASCII meaning. |
80-9F | 1-32 spaces respectively. |
A0-BF | 1-32 binary zeros respectively. |
C0-DF | 1-32 character zeros respectively. |
E0-FF | 1-32 occurrences of the character following. |
00-1F | 1-32 occurrences of the character following, which is to be interpreted literally, not as a compression code. This is used when characters in the range 00-1F, 80-9F, A0-BF, C0-DF or E0-FF occur in the original data. (Thus, one such character is expanded to two bytes; otherwise, no penalty is incurred by the compression.) |
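To show how the byte codes in this table could be read back, here is a minimal decoder sketch. It is written from the table alone and is not the CBLDC001 implementation: the function name rle_decode and its signature are hypothetical, and it assumes that the lowest value in each range stands for a count of 1 and the highest for a count of 32.

```c
#include <stddef.h>

/* Sketch of a decoder for the byte codes listed above.
   Returns 0 on success, or 1 if the output buffer is too small or the
   input ends in the middle of a two-byte code. */
int rle_decode(const unsigned char *in, size_t in_len,
               unsigned char *out, size_t out_cap, size_t *out_len)
{
    size_t i = 0, o = 0;

    while (i < in_len) {
        unsigned char code = in[i++];
        unsigned char ch;
        size_t count;

        if (code >= 0x20 && code <= 0x7F) {         /* ordinary character */
            ch = code;
            count = 1;
        } else if (code >= 0x80 && code <= 0x9F) {  /* run of spaces */
            ch = ' ';
            count = (size_t)(code - 0x80) + 1;
        } else if (code >= 0xA0 && code <= 0xBF) {  /* run of binary zeros */
            ch = 0x00;
            count = (size_t)(code - 0xA0) + 1;
        } else if (code >= 0xC0 && code <= 0xDF) {  /* run of character zeros */
            ch = '0';
            count = (size_t)(code - 0xC0) + 1;
        } else {                                    /* 00-1F or E0-FF */
            if (i >= in_len)
                return 1;                           /* missing following byte */
            ch = in[i++];                           /* taken literally */
            count = (size_t)(code & 0x1F) + 1;
        }

        if (count > out_cap - o)
            return 1;                               /* output buffer too small */
        while (count-- > 0)
            out[o++] = ch;
    }

    *out_len = o;
    return 0;
}
```

Under these assumptions, the input bytes 41 82 C1 E2 2A expand to the letter A, three spaces, two character zeros and three asterisks.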
Like CBLDC001, CBLDC003 uses run-length encoding, but detects strings (runs) of single- or double-byte characters. This routine is therefore suitable for DBCS characters, but can also be used in place of CBLDC001.
The format of the compression is two header bytes followed by one or more characters. The bits in the header bytes indicate:
bit 15 | Unset - single character. |
bit 14 | Set - compressed sequence. Unset - uncompressed sequence. |
bit 0-13 | Compressed character(s) or count of uncompressed characters. |
The length of the character string depends on the header bits:
bit 14 and 15 set | Two repeating characters. |
Only bit 14 is set | One repeating character. |
Otherwise | Between 1 and 63 uncompressed characters. |
For data file compression the File Handler calls the compression routine that you specify in the DATACOMPRESS Compiler directive.
To call a Micro Focus data compression routine use the syntax:
In COBOL:
call "CBLDCnnn" using input-buffer, input-buffer-size,
                      output-buffer, output-buffer-size,
                      compression-type
In C:
cbldcnnn(input_buffer, &input_buffer_size,
         output_buffer, &output_buffer_size,
         &compression_type);
where the parameters are:
nnn | A data compression routine in the range 001 to 127. |
input_buffer | A PIC X(size) data item. On entry to the routine it must contain the data to compress or decompress; maximum size is 65535. |
input_buffer_size | A two-byte (int in C, PIC XX COMP-5 in COBOL) data item. On entry it must contain the length of data in the input buffer. |
output_buffer | A PIC X(size) data item. On exit it contains the resulting data. |
output_buffer_size | A two-byte (int in C, PIC XX COMP-5 in COBOL) data item. On entry it must contain the size of the output buffer available; on exit it contains the length of the data in the buffer. |
compression_type | A one-byte (char in C, PIC X COMP-X in COBOL) data item. On entry this must specify if the input data is to be compressed or decompressed: 0 - compress; 1 - decompress. |
The RETURN-CODE special register indicates whether the operation succeeded or not. Compression or decompression fails only if the output buffer is too small to accept the results. 0 indicates success and 1 indicates failure.
User-supplied compression routines must be stored in modules called USRDCnnn, where nnn is within the range 128 to 255.
To call a user-supplied routine, use the same syntax as for calling a Micro Focus routine, but use the filename USRDCnnn instead of CBLDCnnn, where nnn must be a value in the range 128 through 255.
To make your compression routine available to your system, you must create a callable shared object that can be called when needed.
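As an illustration of the calling convention described above, the following sketch shows the shape a user-supplied routine might take in C. Several details are assumptions rather than documented facts: the routine number 128, the lowercase symbol name usrdc128, the use of unsigned short for the two-byte size items, and the idea that the function's return value supplies the 0 or 1 result. The routine performs no real compression; it copies the data unchanged in both directions.

```c
#include <string.h>

/* Hypothetical user-supplied routine. The name, parameter types and
   return-value convention are assumptions made for this sketch. */
int usrdc128(const unsigned char *input_buffer,
             const unsigned short *input_buffer_size,
             unsigned char *output_buffer,
             unsigned short *output_buffer_size,
             const unsigned char *compression_type)
{
    /* 0 = compress, 1 = decompress; both are a plain copy here. */
    (void)compression_type;

    /* On entry *output_buffer_size holds the space available. */
    if (*input_buffer_size > *output_buffer_size)
        return 1;                       /* output buffer too small */

    memcpy(output_buffer, input_buffer, *input_buffer_size);

    /* On exit *output_buffer_size holds the length of the result. */
    *output_buffer_size = *input_buffer_size;
    return 0;                           /* success */
}
```

A real routine would replace the copy with its own compress and decompress logic, selected by compression_type, while keeping the same handling of buffer sizes and the same failure condition.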
You can map calls to data compression routines in programs from previous UNIX COBOL systems to the new calls using the cob option:
-m CBL_DATA_COMPRESS_nnn=CBLDCnnn
Notes: