Mainframe Connector supports two versions of the copybook parser:
- Native copybook parser: The Native copybook parser implements an ANTLR4-based parser, supports COBOL copybooks, and is the recommended version of the parser.
- Legacy copybook parser: The Legacy copybook parser is an older version of the parser that supports only a very limited set of copybook formats.
You can define which parser you want to use based on your copybook. For more information on defining the parser that you want to use, see Define the copybook parser.
Native copybook parser
The Native copybook parser is the latest version of the parser and is used by default. It implements an ANTLR4-based parser and supports COBOL copybooks.
This section describes the preprocessing tasks that the Native copybook parser performs, the datatypes it supports, and the restrictions for its use.
Preprocessing
Before parsing a copybook, the Native copybook parser preprocesses it by performing the following tasks:
- Removes comment lines.
- Resolves line continuation.
- Blanks out the line number area and the area starting at column 73.
- Preserves preprocessor-specific statements like EJECT, SPACE, and TITLE. These statements are parsed, but ignored. The Native copybook parser doesn't support copybooks that contain preprocessor parameters that can be used by COPY REPLACING. In these copybooks, identifiers are surrounded by colons (:).
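The Python sketch below applies the preprocessing steps above to lines in standard COBOL fixed format. In that format, columns 1-6 hold line numbers, column 7 is the indicator area, and columns 73-80 hold identification text. It illustrates the steps under those assumptions and is not Mainframe Connector's implementation. The function name `preprocess` is hypothetical. The sketch handles word continuation only, not continued literals, and it leaves EJECT, SPACE, and TITLE statements untouched.

```python
# Hypothetical sketch of copybook preprocessing, assuming COBOL fixed format:
#   columns 1-6  : line number area
#   column  7    : indicator area ('*' or '/' = comment, '-' = continuation)
#   columns 8-72 : program text
#   columns 73+  : identification area
from __future__ import annotations


def preprocess(lines: list[str]) -> list[str]:
    out: list[str] = []
    for raw in lines:
        line = raw.rstrip("\n").ljust(80)
        indicator = line[6]
        # Drop comment lines entirely.
        if indicator in "*/":
            continue
        # Blank out the line number area and everything from column 73 on.
        text = " " * 6 + line[6:72]
        if indicator == "-" and out:
            # Word continuation: the first non-blank character of the
            # continuation line directly follows the previous line's text.
            out[-1] = out[-1].rstrip() + text[7:].strip()
            continue
        out.append(text.rstrip())
    return out
```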
Supported datatypes and restrictions
The following are the datatypes supported by the Native copybook parser and the restrictions for its use:
- Level 66 (ALIAS) and level 77 (STANDALONE) items are not supported.
- Use only PICTURE fields. The following PICTURE fields are supported:
  - Pic A, Pic B, Pic G (DBCS), Pic N (national or DBCS), Pic U (UTF8), Pic X, and zoned decimal (max precision 38, max scale 38)
- IBM Hexadecimal floating point (HFP) is supported.
- REDEFINES are not supported.
- Use only the following COMP fields. ALIGN and OCCURS are not supported.
  - COMP
  - COMP4
  - BINARY
  - COMP3
  - PACKED-DECIMAL
- DATE and TIMESTAMP are supported.
- The double-byte character set (DBCS) fields Pic G and Pic N are supported. Use them instead of Pic T, which is now deprecated. To use a Pic N field as DBCS without specifying USAGE DISPLAY-1, you must set the NSYMBOL environment variable to DBCS. By default, NSYMBOL is set to NATIONAL, which applies USAGE NATIONAL to Pic N fields that don't have a USAGE clause. NSYMBOL can only be set to NATIONAL or DBCS.
- Variable-length character strings are supported.
- The SIGN clause is supported.
- You must justify all fields and use a single indentation level.
- Comments are supported.
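The COMP fields listed above store numbers in standard IBM mainframe formats. The Python sketch below decodes a packed decimal value (COMP3 / PACKED-DECIMAL) and a big-endian binary value (COMP, COMP4, BINARY), for illustration only. The function names and the `scale` parameter are assumptions for this example and are not part of Mainframe Connector. `scale` is the number of implied decimal places, which comes from a V in the PICTURE clause.

```python
# Hypothetical helpers illustrating standard IBM encodings for COMP fields.
from decimal import Decimal


def decode_packed(data: bytes, scale: int = 0) -> Decimal:
    """Decode COMP-3 / PACKED-DECIMAL: two BCD digits per byte,
    with the sign in the low nibble of the last byte."""
    if not data:
        raise ValueError("empty packed decimal field")
    digits = []
    for byte in data[:-1]:
        digits.extend((byte >> 4, byte & 0x0F))
    digits.append(data[-1] >> 4)
    sign = data[-1] & 0x0F
    if any(d > 9 for d in digits):
        raise ValueError("invalid BCD digit")
    value = int("".join(str(d) for d in digits))
    if sign in (0x0B, 0x0D):  # negative sign nibbles
        value = -value
    elif sign not in (0x0A, 0x0C, 0x0E, 0x0F):  # positive or unsigned
        raise ValueError(f"invalid sign nibble {sign:#x}")
    return Decimal(value).scaleb(-scale)


def decode_binary(data: bytes, signed: bool = True, scale: int = 0) -> Decimal:
    """Decode COMP / COMP-4 / BINARY: big-endian integers on z/OS."""
    return Decimal(int.from_bytes(data, "big", signed=signed)).scaleb(-scale)
```

For example, a field declared as PIC S9(3)V99 COMP-3 occupies 3 bytes. The bytes 0x12 0x34 0x5D decode to -123.45 with `scale=2`.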
Support for date and timestamp fields
Mainframe Connector supports moving date and timestamp data in and out
of BigQuery. To do so, you must define environment variables that begin
with the word SUFFIX
in the following format:
SUFFIX_SUFFIX_STRING="--bqtype TYPE --format FORMAT --timezone TIMEZONE"
The following list describes the format in more detail:
- SUFFIX_SUFFIX_STRING: The environment variable that you use to define date and timestamp data. SUFFIX_STRING corresponds to the suffix -SUFFIX_STRING or _SUFFIX_STRING. When either suffix ends a field name in a copybook, that field is interpreted as a date or timestamp. Ensure that SUFFIX_STRING doesn't contain a hyphen or underscore.
- --bqtype: Defines the TYPE of the BigQuery field. The supported BigQuery types are DATE and TIMESTAMP.
- --format: Defines the FORMAT of the date or timestamp. You can specify at most five formats, separated by commas. When loading to BigQuery, if multiple formats can match a given input, the first matching format is used. When exporting, only the first format is used. For more information on valid formats, see Supported date and timestamp formats.
- --timezone: An optional parameter for the TIMESTAMP type. By default, the timezone is UTC. For more information about supported timezone formats, see Supported timezone formats.
- --omitsuffix (Optional): If this parameter is specified, -SUFFIX_STRING or _SUFFIX_STRING is removed from the field name that appears in BigQuery.
To add an alias for a SUFFIX_SUFFIX_STRING variable, you can set an environment variable SUFFIX_SUFFIX_ALIAS=$SUFFIX_SUFFIX_STRING.
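The Python sketch below shows one way to read the SUFFIX_ variables described above and match them against copybook field names. It makes several assumptions:
- Option values are shell-quoted, as in the examples that follow.
- --omitsuffix takes no value.
- A field matches when its name ends with a hyphen or underscore followed by SUFFIX_STRING.

The names `SuffixRule`, `load_rules`, and `match_field` are hypothetical. An alias such as SUFFIX_TS=$SUFFIX_TIMESTAMP needs no special handling here, because the shell expands it to the same value before the process starts.

```python
# Hypothetical sketch: parse SUFFIX_* environment variables and match field names.
from __future__ import annotations

import shlex
from dataclasses import dataclass
from typing import Optional


@dataclass
class SuffixRule:
    suffix: str
    bqtype: str
    formats: list[str]
    timezone: str = "UTC"  # documented default timezone
    omit_suffix: bool = False


def load_rules(env: dict[str, str]) -> dict[str, SuffixRule]:
    rules: dict[str, SuffixRule] = {}
    for name, value in env.items():
        if not name.startswith("SUFFIX_"):
            continue
        suffix = name[len("SUFFIX_"):]
        tokens = shlex.split(value)  # honors quotes around --format values
        opts: dict[str, str] = {}
        omit = False
        i = 0
        while i < len(tokens):
            if tokens[i] == "--omitsuffix":  # assumed to be a flag with no value
                omit = True
                i += 1
            else:
                opts[tokens[i]] = tokens[i + 1]
                i += 2
        rules[suffix] = SuffixRule(
            suffix=suffix,
            bqtype=opts["--bqtype"],
            formats=opts["--format"].split(","),
            timezone=opts.get("--timezone", "UTC"),
            omit_suffix=omit,
        )
    return rules


def match_field(field_name: str, rules: dict[str, SuffixRule]) -> Optional[tuple[str, SuffixRule]]:
    """Return (BigQuery column name, rule) for a matching field, else None."""
    for suffix, rule in rules.items():
        for sep in ("-", "_"):
            ending = sep + suffix
            if field_name.upper().endswith(ending.upper()):
                column = field_name[: -len(ending)] if rule.omit_suffix else field_name
                return column, rule
    return None
```

In a real process, you would call `load_rules(dict(os.environ))`.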
Examples:
- If you define an environment variable as SUFFIX_DT8="--bqtype DATE --format yyyyMMdd", a field with suffix -DT8 or _DT8 will be a DATE type field in BigQuery, and its pattern will be yyyyMMdd.
- If you define an environment variable as SUFFIX_DT10="--bqtype DATE --format MM-dd-yyyy", a field with suffix -DT10 or _DT10 will be a DATE type field in BigQuery, and its pattern will be MM-dd-yyyy.
- If you define an environment variable as SUFFIX_DT="--bqtype DATE --format 'MM-dd-yyyy,MM/dd/yyyy'", a field with suffix -DT or _DT will be a DATE type field in BigQuery, and its pattern will be either MM-dd-yyyy or MM/dd/yyyy.
- If you define two environment variables as SUFFIX_TIMESTAMP="--bqtype TIMESTAMP --format 'yyyy-MM-dd HH:mm:ss.SSSSSS' --timezone America/Los_Angeles" and SUFFIX_TS=$SUFFIX_TIMESTAMP, a field with any of the suffixes -TIMESTAMP, _TIMESTAMP, -TS, or _TS will be a TIMESTAMP type field in BigQuery. Its pattern will be yyyy-MM-dd HH:mm:ss.SSSSSS with timezone America/Los_Angeles.
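The --format patterns in these examples look like Java-style date pattern letters (yyyy, MM, dd, HH, mm, ss, SSSSSS). The Python sketch below illustrates the documented behavior for multiple formats:
- When loading, the first pattern that matches the input wins.
- When exporting, only the first pattern is used.

The sketch translates only the pattern letters that appear in the examples above into Python strptime directives, and it ignores --timezone. The function names are hypothetical.

```python
# Hypothetical sketch of "first matching format wins" date parsing.
from __future__ import annotations

import re
from datetime import datetime

# Only the pattern letters used in the examples above are translated.
_TOKENS = {
    "yyyy": "%Y", "MM": "%m", "dd": "%d",
    "HH": "%H", "mm": "%M", "ss": "%S", "SSSSSS": "%f",
}
# Longest tokens first so that, for example, SSSSSS is matched as one token.
_TOKEN_RE = re.compile("|".join(sorted(map(re.escape, _TOKENS), key=len, reverse=True)))


def to_strptime(pattern: str) -> str:
    return _TOKEN_RE.sub(lambda m: _TOKENS[m.group(0)], pattern)


def parse_first_match(text: str, patterns: list[str]) -> datetime:
    """Loading: try each pattern in order and return the first successful parse."""
    for pattern in patterns:
        try:
            return datetime.strptime(text, to_strptime(pattern))
        except ValueError:
            continue
    raise ValueError(f"{text!r} matches none of {patterns}")


def format_for_export(value: datetime, patterns: list[str]) -> str:
    """Exporting: only the first pattern is used."""
    return value.strftime(to_strptime(patterns[0]))
```

With the SUFFIX_DT patterns, the input 01/31/2024 fails MM-dd-yyyy and then matches MM/dd/yyyy. The same date is exported as 01-31-2024.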
Support for DBCS fields
Ensure the following when using DBCS fields:
- When you use Pic G or Pic N DBCS fields, you must provide one of the following valid multi-byte character set (MBCS) encodings, either in the encoding option or in the ENCODING environment variable, when using the gsutil cp or bq export commands:
  - x-IBM930
  - x-IBM933
  - x-IBM935
  - x-IBM937
  - x-IBM939
  - x-IBM942
  - x-IBM942C
  - x-IBM943
  - x-IBM943C
  - x-IBM949
  - x-IBM949C
  - x-IBM950
  - x-IBM964
  - x-IBM970
  - x-IBM1364
- When a copybook field contains only DBCS bytes, and these bytes are not surrounded by shift-out (0x0E) and shift-in (0x0F), you must add the suffix _DBCS to the field name so that these bytes are decoded as DBCS bytes. For example, suppose your data for the copybook field 03 FLD01 PIC N USAGE DISPLAY-1 contains bytes 0x43 and 0xC5 in encoding x-IBM930, and these bytes are not surrounded by 0x0E and 0x0F. To decode the DBCS data correctly, you must rename the copybook field to 03 FLD01-DBCS PIC N USAGE DISPLAY-1.
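The Python sketch below gives a rough illustration of the rule above. It checks whether a field's bytes are bracketed by shift-out and shift-in, and it shows the SO/SI bracketing that a stateful EBCDIC decoder relies on to switch into double-byte mode. Treating the suffix as equivalent to adding SO/SI is an assumption about how decoding works, not a description of Mainframe Connector internals. The function names are hypothetical. The sketch accepts both -DBCS, as in the example above, and _DBCS. It doesn't decode with x-IBM930, because Python's standard library doesn't include that codec.

```python
# Hypothetical sketch: shift-out/shift-in handling for pure-DBCS fields.
SHIFT_OUT = 0x0E
SHIFT_IN = 0x0F


def has_shift_codes(raw: bytes) -> bool:
    """True if the field bytes are already bracketed by SO (0x0E) and SI (0x0F)."""
    return len(raw) >= 2 and raw[0] == SHIFT_OUT and raw[-1] == SHIFT_IN


def needs_dbcs_suffix(field_name: str, raw: bytes) -> bool:
    """Pure DBCS data without SO/SI needs a DBCS suffix on the field name."""
    named_dbcs = field_name.upper().endswith(("-DBCS", "_DBCS"))
    return not has_shift_codes(raw) and not named_dbcs


def add_shift_codes(raw: bytes) -> bytes:
    """Bracket raw DBCS bytes with SO/SI so a stateful decoder reads them as DBCS."""
    if has_shift_codes(raw):
        return raw
    if len(raw) % 2:
        raise ValueError("DBCS data must contain an even number of bytes")
    return bytes([SHIFT_OUT]) + raw + bytes([SHIFT_IN])
```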
Support for variable-length character strings
The Native copybook parser supports struct fields of the following form:
    10 var
    15 var-LEN PIC 9(4) USAGE COMP
    15 var-TEXT PIC X(n)
The first field in the struct holds the length of the second field, which is the string field. Depending on the record length, you might have to add some padding to the end of the record, as shown in the following figure.
Mainframe Connector removes the suffix from the variable name before saving the data in BigQuery. In this example, the variable name will be var.
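The Python sketch below reads one variable-length struct like the one above from a record. The var-LEN field (PIC 9(4) USAGE COMP, a 2-byte big-endian binary value) gives the number of bytes of var-TEXT to keep. The sketch rests on a few assumptions:
- The function name is hypothetical.
- The text is decoded with the EBCDIC code page cp037, purely for illustration.
- Bytes beyond the declared length are treated as padding.

```python
# Hypothetical sketch: decode a var-LEN / var-TEXT pair from a record.
def decode_varchar(record: bytes, offset: int, max_len: int, encoding: str = "cp037") -> str:
    # PIC 9(4) USAGE COMP is stored as a 2-byte big-endian binary halfword.
    length = int.from_bytes(record[offset:offset + 2], "big")
    if length > max_len:
        raise ValueError(f"length {length} exceeds PIC X({max_len})")
    start = offset + 2
    if start + length > len(record):
        raise ValueError("record is shorter than the declared text length")
    # Only the first `length` bytes of the text area are data; the rest is padding.
    return record[start:start + length].decode(encoding)
```

For a record that starts with the bytes 0x00 0x05, followed by HELLO in EBCDIC and five bytes of padding, `decode_varchar(record, 0, 10)` returns "HELLO".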
To use struct fields, set the environment variable BQSH_FEATURE_VARIABLE_LENGTH_ENABLED to either yes or true.
When using struct fields, ensure the following:
- The suffix of the first field in the struct is -LEN. If you want to use a different suffix, you must set the environment variable BQSH_FEATURE_VARIABLE_LENGTH_LEN_SUFFIX to the suffix that you want to use.
- The suffix of the second field in the struct is -TEXT. If you want to use a different suffix, you must set the environment variable BQSH_FEATURE_VARIABLE_LENGTH_LEN_SUFFIX to the suffix that you want to use.
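The Python sketch below restates these requirements as code. It checks the feature flag, then derives the BigQuery column name by removing the length suffix, as in the var example above. The function names are hypothetical. The suffixes are passed in as parameters rather than read from environment variables, and the sketch assumes that both child fields share the same base name.

```python
# Hypothetical sketch of the struct-field rules described above.
from __future__ import annotations

from typing import Mapping, Optional


def variable_length_enabled(env: Mapping[str, str]) -> bool:
    # The documented values are "yes" or "true".
    return env.get("BQSH_FEATURE_VARIABLE_LENGTH_ENABLED") in ("yes", "true")


def struct_column_name(children: list[str], len_suffix: str = "-LEN",
                       text_suffix: str = "-TEXT") -> Optional[str]:
    """Return the column name for a [length, text] pair, or None if it doesn't match."""
    if len(children) != 2:
        return None
    len_field, text_field = children
    if not (len_field.endswith(len_suffix) and text_field.endswith(text_suffix)):
        return None
    base = len_field[: -len(len_suffix)]
    if base != text_field[: -len(text_suffix)]:
        return None
    return base
```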
Unsupported fields and constructs
The following sections describe fields and constructs that the Native copybook parser doesn't support.
COBOL constructs
The following COBOL constructs are not supported. If you use these constructs in your copybook, Mainframe Connector shows an error.
- dataAlignedClause
- dataBlankWhenZeroClause
- dataCommonOwnLocalClause
- dataIntegerStringClause
- dataJustifiedClause
- dataOccursClause
- dataReceivedByClause
- dataRecordAreaClause
- dataRenamesClause
- dataSignClause
- dataSynchronizedClause
- dataThreadLocalClause
- dataTypeClause
- dataTypeDefClause
- dataUsingClause
Data types
COBOL data types like COMP-1 and COMP-2 are supported.
Legacy copybook parser
The Legacy copybook parser is an older version of the parser that supports non-COBOL features. If you are using a DSL-based copybook, the Legacy parser might be more suitable, because the Native copybook parser might cause errors.
You can use copybook DD with the following restrictions:
- Level 66 (ALIAS) and level 77 (STANDALONE) items are not supported.
- REDEFINES are not supported.
- Comment lines are not supported.
- Fields of length 10 whose names end with DATE or DT are treated as dates and are decoded differently from other fields.
- Use only the following COMP fields. ALIGN and OCCURS are not supported.
  - COMP
  - COMP4
  - BINARY
  - COMP3
  - PACKED-DECIMAL
- Use only PICTURE fields. Define PICTURE fields on the same line, directly after the field name.
- You must justify all fields and use a single level. Comments are not supported.
- Ensure that columns 1 to 6 always contain blanks.
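Two of the Legacy parser rules above are mechanical enough to check ahead of time. The Python sketch below flags date fields (length 10, name ending with DATE or DT). It also flags lines that break the column 1-6 rule or the comment-line rule. The function names are hypothetical. Treating an asterisk or slash in column 7 as a comment line is an assumption based on standard COBOL fixed format.

```python
# Hypothetical pre-checks for the Legacy copybook parser rules above.
from __future__ import annotations


def is_legacy_date_field(name: str, length: int) -> bool:
    """Fields of length 10 whose name ends with DATE or DT are decoded as dates."""
    return length == 10 and name.upper().endswith(("DATE", "DT"))


def legacy_line_problems(line: str) -> list[str]:
    problems: list[str] = []
    if line[:6].strip():
        problems.append("columns 1-6 must be blank")
    # Assumption: '*' or '/' in column 7 marks a comment line (fixed format).
    if len(line) > 6 and line[6] in "*/":
        problems.append("comment lines are not supported")
    return problems
```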