Copybook parser reference

Dataset names

You can use the following dataset definition (DD) files in your BQSH JCL procedure. Ensure that all MVS datasets referenced by a DD file use the fixed block (FB) record format.

  • COPYBOOK: An MVS dataset containing a COBOL copybook for the dataset referenced by an INFILE DD. You can use the COPYBOOK DD with a few restrictions. For more information, see COPYBOOK DD usage restrictions.
  • INFILE: An MVS dataset containing a COBOL dataset to be uploaded to Cloud Storage.
  • KEYFILE: An MVS dataset containing a Google Cloud IAM service account JSON keyfile.
  • QUERY: An MVS dataset containing a BigQuery standard SQL query. The QUERY DD is an FB file with a logical record length (LRECL) of 80, which means that every record in the file is 80 bytes long.
  • STDIN: Stream input used to provide shell commands.
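Because the QUERY DD holds fixed-length 80-byte records, each line of a query has to fit in a single record. The following Python sketch shows one way to lay out a query as space-padded 80-character records before it is written to the dataset. It assumes single-byte characters, leaves out EBCDIC conversion and the transfer to the mainframe, and the function name `to_fixed_records` is hypothetical.

```python
LRECL = 80  # logical record length of the QUERY DD


def to_fixed_records(sql, lrecl=LRECL):
    """Split a SQL query into fixed-length records padded with spaces.

    Raises ValueError if one line is longer than a record, because an
    FB dataset cannot hold a record that exceeds its LRECL.
    """
    records = []
    for line in sql.splitlines():
        if len(line) > lrecl:
            raise ValueError(f"line exceeds {lrecl} bytes: {line[:20]}...")
        records.append(line.ljust(lrecl))  # pad on the right to exactly lrecl
    return records
```

The sketch rejects overlong lines instead of wrapping them, so where the query is broken across records stays under your control.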

Native COPYBOOK parser restrictions

Mainframe Connector supports two versions of the COPYBOOK parser: the Native COPYBOOK parser and the Legacy COPYBOOK parser. The Native COPYBOOK parser is available by default.

This section lists the fields supported by the Native COPYBOOK parser and outlines the restrictions for its use. For information on the Legacy COPYBOOK parser, see Legacy COPYBOOK parser restrictions.

  • Levels 66 (ALIAS) and 77 (STANDALONE) are not supported.
  • Use only PICTURE fields. The following PICTURE fields are supported:
    • Pic N (national), Pic U (UTF8), and zoned decimal (max precision 38, max scale 38)
  • REDEFINES are not supported.
  • Use only the following COMP fields. ALIGN and OCCURS are not supported.
    • COMP
    • COMP4
    • BINARY
    • COMP3
    • PACKED-DECIMAL
  • DATE and TIMESTAMP are supported.
  • Double-byte character set (DBCS) field Pic N is supported. To use the Pic N field as DBCS without specifying USAGE DISPLAY-1, you must set the NSYMBOL environment variable to DBCS. By default, NSYMBOL is set to NATIONAL, which applies USAGE NATIONAL to Pic N fields that don't have a USAGE clause. NSYMBOL can only be set to NATIONAL or DBCS.
  • Variable-length character strings are supported.
  • The SIGN clause is supported.
  • You must justify all fields and use a single indentation level. Comments are supported.
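The NSYMBOL rule for Pic N fields can be summarized as a small decision function. This Python sketch models the behavior described above; it is not Mainframe Connector's implementation, and `resolve_pic_n_usage` is a hypothetical name.

```python
import os

VALID_NSYMBOL = ("NATIONAL", "DBCS")


def resolve_pic_n_usage(usage_clause=None, nsymbol=None):
    """Return the effective USAGE of a Pic N field.

    usage_clause: the field's explicit USAGE ("DISPLAY-1" or "NATIONAL"), or None.
    nsymbol: the NSYMBOL setting; if None, it is read from the environment
             and defaults to NATIONAL.
    """
    if usage_clause is not None:
        # NSYMBOL only affects Pic N fields that have no USAGE clause.
        return usage_clause
    if nsymbol is None:
        nsymbol = os.environ.get("NSYMBOL", "NATIONAL")
    if nsymbol not in VALID_NSYMBOL:
        raise ValueError("NSYMBOL can only be NATIONAL or DBCS")
    # DBCS makes a bare Pic N field behave as if it had USAGE DISPLAY-1.
    return "DISPLAY-1" if nsymbol == "DBCS" else "NATIONAL"
```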

Support for date and timestamp fields

Mainframe Connector supports moving date and timestamp data in and out of Cloud Storage. To do so, you must define environment variables whose names begin with the prefix SUFFIX_, in the following format:

SUFFIX_SUFFIX_STRING="--bqtype TYPE --format FORMAT --timezone TIMEZONE"

The following list describes the format in more detail:

  • SUFFIX_SUFFIX_STRING: The environment variable that defines how date and timestamp data is interpreted. Copybook fields whose names end in -SUFFIX_STRING or _SUFFIX_STRING are interpreted as either a date or a timestamp. Ensure that SUFFIX_STRING doesn't contain a hyphen or underscore.
  • --bqtype: Defines the TYPE of the BigQuery field. The supported BigQuery types are DATE and TIMESTAMP.
  • --format: A parameter that defines the format of the date or timestamp. You can specify multiple formats separated by commas. If multiple formats can match a given input, the first format that matches is used. For more information on valid date and timestamp formats, see Supported date and timestamp formats.
  • --timezone: An optional parameter for the type TIMESTAMP. By default, the timezone is UTC. For more information about supported timezone formats, see Supported timezone formats.
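The following Python sketch shows how SUFFIX_ variables in this format could be read into per-suffix rules. It is an illustration, not Mainframe Connector's own parser: the function name `parse_suffix_env` and the dictionary layout are hypothetical, and it assumes every option is followed by exactly one value, quoted when it contains spaces.

```python
import shlex


def parse_suffix_env(env):
    """Map each SUFFIX_<STRING> variable to its BigQuery type, formats, and timezone."""
    rules = {}
    for name, value in env.items():
        if not name.startswith("SUFFIX_"):
            continue  # not a date/timestamp definition
        suffix = name[len("SUFFIX_"):]
        if "-" in suffix or "_" in suffix:
            raise ValueError(f"{name}: suffix must not contain '-' or '_'")
        tokens = shlex.split(value)  # honors quotes around values with spaces
        opts = dict(zip(tokens[0::2], tokens[1::2]))  # pair each flag with its value
        bqtype = opts["--bqtype"]
        if bqtype not in ("DATE", "TIMESTAMP"):
            raise ValueError(f"{name}: unsupported --bqtype {bqtype}")
        rules[suffix] = {
            "bqtype": bqtype,
            # comma-separated formats are tried in the order listed
            "formats": opts["--format"].split(","),
            # --timezone applies to TIMESTAMP only and defaults to UTC
            "timezone": opts.get("--timezone", "UTC") if bqtype == "TIMESTAMP" else None,
        }
    return rules
```

Passing `os.environ` as `env` would read the variables from the current process.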

Examples:

  • If you define an environment variable as SUFFIX_DT8="--bqtype DATE --format yyyyMMdd", a field with suffix -DT8 or _DT8 will be a DATE type field in BigQuery, and its pattern will be yyyyMMdd.
  • If you define an environment variable as SUFFIX_DT10="--bqtype DATE --format MM-dd-yyyy", a field with suffix -DT10 or _DT10 will be a DATE type field in BigQuery, and its pattern will be MM-dd-yyyy.
  • If you define an environment variable as SUFFIX_DT="--bqtype DATE --format 'MM-dd-yyyy,MM/dd/yyyy'", a field with suffix -DT or _DT will be a DATE type field in BigQuery, and its pattern will be either MM-dd-yyyy or MM/dd/yyyy.
  • If you define an environment variable as SUFFIX_TIMESTAMP="--bqtype TIMESTAMP --format 'yyyy-MM-dd HH:mm:ss.SSSSSS' --timezone America/Los_Angeles", a field with suffix -TIMESTAMP or _TIMESTAMP will be a TIMESTAMP type field in BigQuery, and its pattern will be yyyy-MM-dd HH:mm:ss.SSSSSS with timezone America/Los_Angeles.
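To illustrate how a field's suffix selects its formats and how the first matching format wins, here is a Python sketch for DATE fields. It translates only the pattern letters used in the examples above (yyyy, MM, dd, HH, mm, ss, SSSSSS) into `strptime` directives and ignores TIMESTAMP time zones. The helper names are hypothetical, and the `suffix_formats` argument is an assumed mapping from a suffix such as `DT` to its list of patterns.

```python
import re
from datetime import datetime

# Java-style pattern letters from the examples, mapped to strptime directives.
_TOKENS = {"yyyy": "%Y", "MM": "%m", "dd": "%d", "HH": "%H",
           "mm": "%M", "ss": "%S", "SSSSSS": "%f"}
# Longest tokens first so that SSSSSS is matched before shorter tokens.
_TOKEN_RE = re.compile("|".join(sorted(_TOKENS, key=len, reverse=True)))


def to_strptime(pattern):
    return _TOKEN_RE.sub(lambda m: _TOKENS[m.group(0)], pattern)


def field_suffix(field_name):
    """Return the text after the last '-' or '_', or None if there is none."""
    parts = re.split(r"[-_]", field_name)
    return parts[-1] if len(parts) > 1 else None


def parse_date_field(field_name, raw, suffix_formats):
    """Parse raw with the first format registered for the field's suffix.

    Returns None when the field name has no registered date suffix.
    """
    formats = suffix_formats.get(field_suffix(field_name))
    if formats is None:
        return None
    for fmt in formats:
        try:
            return datetime.strptime(raw, to_strptime(fmt)).date()
        except ValueError:
            continue  # try the next format in the order listed
    raise ValueError(f"{raw!r} matches none of {formats}")
```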

Support for DBCS fields

Ensure the following when using DBCS fields:

  • When you use PIC N DBCS fields, you must provide one of the following valid multi-byte character set (MBCS) encodings in the encoding option or in the ENCODING environment variable when using the gsutil cp or bq export commands:
    • x-IBM930
    • x-IBM933
    • x-IBM935
    • x-IBM937
    • x-IBM939
    • x-IBM942
    • x-IBM942C
    • x-IBM943
    • x-IBM943C
    • x-IBM949
    • x-IBM949C
    • x-IBM950
    • x-IBM964
    • x-IBM970
    • x-IBM1364
  • When a COPYBOOK field contains only DBCS bytes, but these bytes are not surrounded by shift-out (0x0E) and shift-in (0x0F) characters, you must add the suffix -DBCS to the field name so that these bytes are decoded as DBCS bytes.

For example, if your data corresponding to the copybook field 03 FLD01 PIC N USAGE DISPLAY-1 contains bytes 0x43 and 0xC5 in encoding x-IBM930 that are not surrounded by 0x0E and 0x0F, you must rename the copybook field to 03 FLD01-DBCS PIC N USAGE DISPLAY-1 to decode the DBCS data correctly.
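The DBCS checks above can be expressed in a few lines of Python. In this sketch, `needs_dbcs_suffix` tests whether a field's raw bytes lack the shift-out/shift-in wrapper, `add_dbcs_suffix` renames the field on a single copybook line, and `check_encoding` validates an encoding name against the list above. All three names are hypothetical, and the sketch does not decode any DBCS data.

```python
SHIFT_OUT, SHIFT_IN = 0x0E, 0x0F

# MBCS encodings accepted for PIC N DBCS fields, as listed above.
SUPPORTED_MBCS = {
    "x-IBM930", "x-IBM933", "x-IBM935", "x-IBM937", "x-IBM939",
    "x-IBM942", "x-IBM942C", "x-IBM943", "x-IBM943C", "x-IBM949",
    "x-IBM949C", "x-IBM950", "x-IBM964", "x-IBM970", "x-IBM1364",
}


def check_encoding(encoding):
    if encoding not in SUPPORTED_MBCS:
        raise ValueError(f"{encoding} is not a supported MBCS encoding")


def needs_dbcs_suffix(data):
    """True if the field's bytes are not wrapped in shift-out ... shift-in."""
    return bool(data) and not (data[0] == SHIFT_OUT and data[-1] == SHIFT_IN)


def add_dbcs_suffix(copybook_line):
    """Append -DBCS to the field name; leading indentation is not preserved."""
    level, name, rest = copybook_line.split(maxsplit=2)
    if name.endswith("-DBCS"):
        return copybook_line  # already renamed
    return f"{level} {name}-DBCS {rest}"
```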

Support for variable-length character strings

The Native copybook parser supports variable-length character strings defined as a struct (group) field, for example:

    10 var
       15 var-LEN PIC 9(4) USAGE COMP
       15 var-TEXT PIC X(n)

The first subfield of the struct holds the length of the second subfield, which contains the string. Depending on the record length, you might have to add padding to the end of the record, as shown in Figure 1.

Figure 1. Padding added to variable-length character strings.

Mainframe Connector removes the suffix from the variable name before saving the data in BigQuery. In this example, the variable name will be var.
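A struct like the one above can be decoded with Python's `struct` module. This sketch reads the 2-byte big-endian binary length from the -LEN field (the storage size of PIC 9(4) USAGE COMP), then decodes that many bytes of the -TEXT field. It assumes the cp037 (US EBCDIC) code page, does not model where any following field or the padding from Figure 1 begins, and the function names are hypothetical.

```python
import struct


def decode_var_string(record, offset, max_len, encoding="cp037"):
    """Decode a var-LEN / var-TEXT pair that starts at `offset` in `record`.

    var-LEN is PIC 9(4) USAGE COMP: a 2-byte, big-endian, unsigned binary length.
    var-TEXT is PIC X(max_len); only its first var-LEN bytes are string data.
    """
    (length,) = struct.unpack_from(">H", record, offset)
    if length > max_len:
        raise ValueError(f"length {length} exceeds PIC X({max_len})")
    start = offset + 2  # the text begins right after the 2-byte length
    return record[start:start + length].decode(encoding)


def base_name(field_name, suffixes=("-LEN", "-TEXT")):
    """Strip the struct suffix, so that var-TEXT maps back to var."""
    for suffix in suffixes:
        if field_name.endswith(suffix):
            return field_name[: -len(suffix)]
    return field_name


# Example: a 5-character string in a PIC X(10) field, padded with EBCDIC spaces.
record = struct.pack(">H", 5) + "HELLO".encode("cp037") + b"\x40" * 5
print(base_name("var-TEXT"), decode_var_string(record, 0, 10))  # prints: var HELLO
```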

To use struct fields, set the environment variable BQSH_FEATURE_VARIABLE_LENGTH_ENABLED to either yes or true.

When using struct fields, ensure the following:

  • The suffix of the first field in the struct is -LEN. If you want to use a different suffix, you must set the environment variable BQSH_FEATURE_VARIABLE_LENGTH_LEN_SUFFIX to the suffix that you want to use.
  • The suffix of the second field in the struct is -TEXT. If you want to use a different suffix, you must set the environment variable BQSH_FEATURE_VARIABLE_LENGTH_TEXT_SUFFIX to the suffix that you want to use.
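Before relying on struct fields, you might check the feature flag and the subfield names against the conditions above. This hypothetical Python sketch accepts the flag values yes or true and verifies the two subfield suffixes; the defaults are -LEN and -TEXT, and you pass different suffixes if you have overridden them.

```python
def variable_length_enabled(env):
    """Return True if BQSH_FEATURE_VARIABLE_LENGTH_ENABLED is yes or true."""
    return env.get("BQSH_FEATURE_VARIABLE_LENGTH_ENABLED") in ("yes", "true")


def check_var_struct(subfields, len_suffix="-LEN", text_suffix="-TEXT"):
    """Raise ValueError unless subfields is a [length, text] pair with the expected suffixes."""
    if len(subfields) != 2:
        raise ValueError("a variable-length struct needs exactly two subfields")
    length_field, text_field = subfields
    if not length_field.endswith(len_suffix):
        raise ValueError(f"{length_field!r} should end with {len_suffix!r}")
    if not text_field.endswith(text_suffix):
        raise ValueError(f"{text_field!r} should end with {text_suffix!r}")
```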

Legacy COPYBOOK parser restrictions

When you use the Legacy COPYBOOK parser, the COPYBOOK DD is subject to the following restrictions:

  • Levels 66 (ALIAS) and 77 (STANDALONE) are not supported.
  • Use only PICTURE fields. REDEFINES are not supported.
  • Use only the following COMP fields. ALIGN and OCCURS are not supported.
    • COMP
    • COMP4
    • BINARY
    • COMP3
    • PACKED-DECIMAL
  • You must justify all fields and use a single level. Comments are not supported.
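As a rough pre-flight check for the Legacy COPYBOOK parser, this Python sketch scans copybook lines for comments, level 66 or 77 items, and REDEFINES or OCCURS clauses. It is a simplified token scan, not a COBOL parser: it assumes that comment lines start with an asterisk after any leading spaces, ignores column-based fixed-format rules, and the function name `legacy_copybook_issues` is hypothetical.

```python
def legacy_copybook_issues(lines):
    """Return (line_number, message) pairs for constructs the Legacy parser rejects."""
    issues = []
    for lineno, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped:
            continue  # blank lines are ignored
        if stripped.startswith("*"):
            issues.append((lineno, "comments are not supported"))
            continue
        tokens = stripped.rstrip(".").upper().split()
        if tokens[0] in ("66", "77"):
            issues.append((lineno, f"level {tokens[0]} is not supported"))
        for keyword in ("REDEFINES", "OCCURS"):
            if keyword in tokens:
                issues.append((lineno, f"{keyword} is not supported"))
    return issues
```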