The following table lists the BigQuery, Cloud Storage, and other Google Cloud commands that you can use with Mainframe Connector.
Product | Command | Description | Supports remote transcoding |
---|---|---|---|
BigQuery commands | bq export | Use this command to create a binary file. The command accepts a COPYBOOK DD as input. The bq export command supports some performance tuning capabilities; for more information, see Performance tuning configuration for the bq export command. You can use customized character sets with the bq export command; for more information, see Use customized character sets. Note: The bq export command fails requests to export large BigQuery tables. To avoid errors, add the -allowLargeResults flag to the bq export command when you want to export large tables. | Yes |
BigQuery commands | bq load | Use this command to load data into a table. For more information, see bq load. | No |
BigQuery commands | bq mk | Use this command to create BigQuery resources, such as built-in tables or external tables, that need partitioning and clustering to be set up. For more information, see bq mk. You can also use the bq mk command to generate a BigQuery table directly by parsing COBOL copybooks; for more information, see Create a BigQuery table from a copybook. | No |
BigQuery commands | bq query | Use this command to create a query job that runs the specified SQL query. The command reads the SQL query either from the --sql flag or from QUERY DD; if both are provided, the query in the --sql flag takes precedence. Use the --follow=true flag to generate a report that displays the results of a SELECT query. To write this report to a file on the mainframe, define a DD statement AUDITL that points to the file that should contain the audit logs report. Don't use the --follow flag if you want normal logging behavior. Some query results can return a large number of rows, sometimes in the millions. To keep the output human readable, the number of lines displayed is capped; to control the number of rows displayed, use the --report_row_limit flag. For example, use --report_row_limit 10 to limit the results to 10 lines. By default, the number of lines displayed is limited to 30. To use bq query parameterization, see bq query parameterization. For more information, see bq query. | Yes |
BigQuery commands | bq rm | Use this command to permanently delete a BigQuery resource. Because this command permanently deletes a resource, we recommend that you use it with caution. For more information, see bq rm. | No |
Cloud Storage commands | scp | Use this command to copy text or binary data to Cloud Storage. You can use the simple binary copy mode to copy a dataset from IBM z/OS to Cloud Storage unmodified as part of a data pipeline. Optionally, you can convert the character encoding from extended binary coded decimal interchange code (EBCDIC) to ASCII UTF-8, and add line breaks. You can also use this command to copy application source code defined in job control language (JCL). | No |
gsutil utility | gsutil cp | Use this command to transcode a dataset and write it to Cloud Storage in the Optimized Row Columnar (ORC) file format. The command reads the data from the INFILE DD, and the record layout from the COPYBOOK file. If you want the command to read the data from a Data Source Name (DSN) file, use the --inDsn and --cobDsn flags. The command then opens a configurable number of parallel connections to the Cloud Storage API and transcodes the COBOL dataset to the columnar, GZIP-compressed ORC file format; you can expect a compression ratio of about 35%. Optionally, you can use this command to interact with the Mainframe Connector gRPC service running on a VM on the mainframe. To do so, set the SRVHOST and SRVPORT environment variables, or provide the hostname and port number using command-line options. When the gRPC service is used, the input dataset is first copied to Cloud Storage by Mainframe Connector, and then a remote procedure call (RPC) is made to instruct the gRPC service to transcode the file. You can also use the gsutil cp command to perform a dry run, copy a file from Cloud Storage to your mainframe, and tune performance; these tasks are described in the sections that follow. | Yes |
gsutil utility | gsutil rm | Use this command to delete buckets or objects within a bucket. For more information, see rm - Remove objects. | No |
gszutil utility | gszutil | The gszutil utility runs using the IBM JZOS Java SDK and provides a shell emulator that accepts gsutil and BigQuery command-line invocations using JCL. It extends the functionality of the gsutil utility by accepting a schema in the form of a COPYBOOK DD and using it to transcode COBOL datasets directly to ORC before uploading to Cloud Storage. The gszutil utility also lets you run BigQuery query and load jobs using JCL. It works with the gRPC server, which helps you reduce million instructions per second (MIPS) consumption. We recommend using the gszutil utility in your production environment to convert binary files in Cloud Storage to the ORC format. | No |
Other commands | gcloud pubsub topics publish | Use this command to send a message to a Pub/Sub topic. You can provide the message using the command line, or using a dataset. | No |
Other commands | gcloud dataflow flex-template run | Use this command to trigger the execution of a Dataflow flex template. The command runs a job from the specified flex template path. For more information, see gcloud dataflow flex-template run. | No |
Other commands | curl | Use this command to make an HTTP request to a web service or REST APIs. | No |
Other commands | systemreport | Use this command to print the necessary system data to standard output (stdout). This lets the Mainframe Connector support team gather the information required to diagnose an issue without extensive customer interaction. The system data that the systemreport command prints depends on the flag you use. | No |
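You typically run these commands as statements read from the STDIN DD of a batch job step. As a minimal sketch, the following JCL transcodes a dataset to ORC with gsutil cp; it assumes a BQSH procedure is installed at your site, and the dataset names and bucket are hypothetical:
//STEP01 EXEC BQSH
//*Hypothetical input dataset and copybook; replace with your own.
//INFILE DD DSN=HLQ.DATA.FILENAME,DISP=SHR
//COPYBOOK DD DSN=HLQ.COPYBOOK.FILENAME,DISP=SHR
//STDIN DD *
gsutil cp --replace gs://example-bucket/tape.orc
/*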
Use customized character sets
Mainframe Connector supports different character sets that decode bytes into BigQuery strings, and the other way around. Mainframe Connector also lets you configure your own customized character set by building a Unicode Character Mapping (UCM) file. Mainframe Connector supports the following subset of the UCM format:
<code_set_name> "<name>"
<uconv_class> "SBCS"
<subchar> \x1A #Example
CHARMAP
#_______ _________
<U0000> \x00 |0 #For the third column, only 0 is supported.
<U0001> \x01 |0
#etc
END CHARMAP
If you want to use a customized character set, define a configuration file in the UCM format. You can use this customized character set with the gsutil cp or bq export commands by setting the --encoding=charset flag.
When you create a customized character set, ensure the following:
- While defining a UCM file, keep the following in mind:
  - Mainframe Connector only supports customized character sets using a single byte character set (SBCS).
  - Mainframe Connector only supports the UCM precision indicator |0.
  - Ensure that the UCM files are located in z/OS UNIX System Services (USS) and not in a multiple virtual storage partitioned dataset (MVS PDS).
  - Ensure that the UCM files are saved in American Standard Code for Information Interchange (ASCII) format and not in Extended Binary Coded Decimal Interchange Code (EBCDIC) format.
- Provide an explicit mapping for every possible single byte value to a Unicode character. If you're unsure which Unicode character you want to map a byte to, we recommend that you map it to U+FFFD. You can map different byte sequences to the same Unicode character. However, in these cases the mapping is not bidirectional; that is, when you load data into BigQuery and later export it back to a binary file, the output might differ from the original input.
- Ensure that the byte sequences in the second column are unique. If multiple byte sequences map to the same Unicode character, this Unicode character is decoded to the byte sequence of the last defined mapping in the UCM file.
- Ensure that Mainframe Connector can find the UCM file by setting the BQSH_FEATURE_CUSTOM_CHARSET environment variable to the UCM file's path. If you want to use multiple character sets, provide the paths to the character sets separated by a semicolon delimiter, for example BQSH_FEATURE_CUSTOM_CHARSET=path1;path2. The path can point either to a local file or to a file stored on Cloud Storage. If you execute the gsutil cp or bq export commands with the --remote flag to perform remote transcoding, Mainframe Connector uses the local value set for the BQSH_FEATURE_CUSTOM_CHARSET environment variable. The same applies when you run Mainframe Connector in standalone mode. If the --encoding flag refers to a customized character set that doesn't correspond to the value you set for BQSH_FEATURE_CUSTOM_CHARSET (or if you haven't set BQSH_FEATURE_CUSTOM_CHARSET at all), the command exits with an error message. For a usage example, see the sketch after this list.
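A minimal sketch of using a customized character set (the USS path and the charset name X-EXAMPLE are hypothetical; the name passed to --encoding must match the <code_set_name> declared in the UCM file):
# Point Mainframe Connector at the UCM file (hypothetical USS path).
export BQSH_FEATURE_CUSTOM_CHARSET=/u/user/charsets/x-example.ucm
# Transcode using the customized character set.
gsutil cp --replace --encoding=X-EXAMPLE gs://example-bucket/sample.orc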
Performance tuning configuration for the bq export command
Mainframe Connector supports the following performance tuning configuration for the bq export command:
- exporter_thread_count: (Optional) Set the number of worker threads. The default value is 4.
- max_read_streams: (Optional) Set the maximum number of read streams. The default value is the same as the value set for exporter_thread_count.
- order_response: (Optional) If you set this flag to true, the exporter retains the query result order. This flag affects the export performance. The default value is false.
- max_read_queue: (Optional) Set the maximum number of read record queues. The default value is twice the number of threads.
- transcoding_buffer: (Optional) Set the size of the transcoding buffer per thread in MB. The default value is 20 MB.
You can also try increasing the transport window size by setting the OVERRIDE_GRPC_WINDOW_MB environment variable to improve performance. The default window size is 4 MB.
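For illustration, a minimal sketch of a tuned export. It assumes these tuning keys are passed as command-line flags to bq export and that the SQL query is supplied through the QUERY DD; the project ID is hypothetical.
# Assumption: tuning keys are supplied as flags; adjust to your site's conventions.
bq export \
  --project_id=example-project \
  --exporter_thread_count=8 \
  --max_read_queue=16 \
  --transcoding_buffer=40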
Create a BigQuery table from a copybook
You can use the bq mk command to generate a BigQuery table directly from parsing COBOL copybooks. The native copybook parser extracts default values from the VALUE clause within a copybook, and assigns them to the corresponding columns in a newly created BigQuery table.
To help you test this feature, the bq mk command also provides a dry run mode. This mode lets you preview the generated CREATE TABLE SQL command without actually creating the table in BigQuery.
The bq mk command provides the following configuration options to support this feature (see the example after this list):
- --schema_from_copybook: Specifies the copybook to use to create the table.
- --dry_run: (Optional) When enabled, the command only prints the generated CREATE TABLE SQL command without executing it. This flag is set to false by default.
- --tablespec "[PROJECT_ID]:[DATASET].[TABLE]": Specifies the BigQuery project ID, dataset, and table name for the target table.
- --encoding: Specifies the encoding used to read the copybook file. The default value is CP037.
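A minimal sketch of creating a table from a copybook in dry run mode (the project, dataset, and table names are hypothetical, and the copybook is assumed to be supplied through a COPYBOOK DD):
bq mk \
  --project_id=example-project \
  --tablespec="example-project:example_dataset.example_table" \
  --schema_from_copybook="DD:COPYBOOK" \
  --dry_run=true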
The following VALUE clauses are supported:
VAR1 PIC 9(5) VALUE 55.                    *-- Set VAR1 to 55
VAR1 PIC X(5) VALUE aaaa.                  *-- Set VAR1 to aaaa
VAR1 PIC 9(3) COMP VALUE 3.                *-- Set VAR1 to 3 (binary)
VAR1 PIC [9(5), X(5)] VALUE <literal>.     *-- Set VAR1 to <literal>
VAR1 PIC [9(5), X(5)] VALUE ZERO.          *-- Set VAR1 to 0 or "0"
VAR1 PIC [9(5), X(5)] VALUE ZEROS.         *-- Set VAR1 to 0 or "00000"
VAR1 PIC [9(5), X(5)] VALUE ZEROES.        *-- Set VAR1 to 0 or "00000"
VAR1 PIC X(5) VALUE SPACE.                 *-- Set VAR1 to "     "
VAR1 PIC X(5) VALUE SPACES.                *-- Set VAR1 to "     "
The HIGH-VALUE and LOW-VALUE clauses are supported for alphanumeric variables only.
VAR1 PIC X(5) VALUE HIGH-VALUE.            *-- Set VAR1 to X"FFFFFFFFFF"
VAR1 PIC X(5) VALUE HIGH-VALUES.           *-- Set VAR1 to X"FFFFFFFFFF"
VAR1 PIC X(5) VALUE LOW-VALUE.             *-- Set VAR1 to X"0000000000" (NULL)
VAR1 PIC X(5) VALUE LOW-VALUES.            *-- Set VAR1 to X"0000000000" (NULL)
VAR1 PIC X(5) VALUE QUOTE.                 *-- Set VAR1 to `"`
VAR1 PIC X(5) VALUE QUOTES.                *-- Set VAR1 to `""""`
VAR1 PIC [9(5), X(5)] VALUE NULL.          *-- Not defined and won't be supported
VAR1 PIC [9(5), X(5)] VALUE ALL <literal>. *-- Set all fields with the value ALL to <literal>
bq query parameterization
Mainframe Connector lets you use parameterized queries with bq query.
The following is an example of a parameterized bq query:
Query file
SELECT * FROM `bigquery-public-data.samples.wikipedia` WHERE title = @xtitle
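You can run this query by passing the parameter with the --parameters flag, as in the following sketch (the project ID and parameter value are illustrative, mirroring the multi-parameter execution example later in this section):
Execution example
bq query \
--project_id=mainframe-connector-dev \
--location="US" \
--parameters=xtitle::Hippocrates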
The following is an example with multiple parameters. Parameters are passed in the form name:type:value; an empty type, as in mytitle::Hippocrates, denotes the default STRING type.
Query file
SELECT * FROM `bigquery-public-data.samples.wikipedia` WHERE title = @mytitle AND num_characters > @min_chars;
Execution example
bq query \
--project_id=mainframe-connector-dev \
--location="US" \
--parameters=mytitle::Hippocrates,min_chars:INT64:42600
Perform a dry run of the gsutil cp command
The gsutil cp command decodes a Queued Sequential Access Method (QSAM) file using a COBOL copybook, and generates an ORC file on Cloud Storage. You can perform a dry run of the gsutil cp command using the dry_run flag to test the following steps:
- Parse a COBOL copybook or data file and check whether it is compatible with Mainframe Connector.
- Decode a QSAM file without writing it to Cloud Storage.
Use the following command to perform a dry run:
gsutil cp \
--dry_run \
gs://result-dir
If all steps are executed successfully, the command exits with return code 0. If any issues are encountered, an error message is displayed.
When you use the dry_run flag, all statistics, such as total bytes read, number of written records, and total errors, are logged.
If you use the dry_run flag and the data source doesn't exist, the command doesn't return an error. Instead, it only checks the copybook parser and then completes execution.
Copy a file from Cloud Storage to your Mainframe
You can use the gsutil cp command to copy a file from Cloud Storage to a mainframe dataset. Note that you can't copy partitioned data sets (PDS).
To copy a file from Cloud Storage to a Mainframe dataset, specify the DSN and space requirements of the file you want to download to the Mainframe in JCL, as shown in the following example:
//OUTFILE DD DSN=MAINFRAME.DSN.FILE,DISP=(,CATLG),
// RECFM=FB,DSORG=PS,
// SPACE=(10,(2,1),RLSE),
// AVGREC=M,
// UNIT=SYSDA
//SYSPRINT DD SYSOUT=*
//SYSDUMP DD SYSOUT=*
//STDIN DD *
Specify the gsutil cp command in the following format. If the file already exists on your mainframe, ensure that you add the --replace flag to the command.
gsutil cp GCS_URI DSN --recfm=RECFM --lrecl=LRECL --blksize=BLKSIZE --noseek
Replace the following:
- GCS_URI: The Cloud Storage uniform resource identifier (URI) of the Cloud Storage file. For example, gs://bucket/sample.mainframe.dsn.
- DSN: The DSN destination location on the mainframe.
- RECFM: The record format (RECFM) of the mainframe file. The valid values are F, FB, and U. These values are case-insensitive.
- LRECL: (Optional) The record length (LRECL) of the file. The value must be an integer >= 0. If LRECL is not specified, the file is assumed to be in the undefined-length record format (U).
- BLKSIZE: (Optional) The block size of the file. If set to 0, the system determines the optimal block size. The value must be an integer >= 0. If you don't specify a value, the file is treated as an unblocked file.
- noseek: (Optional) Include this parameter if you want to improve download performance. This flag is set to false by default, that is, seek operations are enabled.
Execution example
gsutil cp gs://sample-bucket/MAINFRAME.DSN.FILE MAINFRAME.DSN.FILE \
--lrecl=16 --blksize=0 --recfm=fb
Performance tuning configuration for the gsutil cp command
Mainframe Connector supports the following performance tuning configuration for the gsutil cp command:
- Use the --parallelism flag to set the number of threads. The default value is 1 (single threaded).
- Use the --maxChunkSize argument to set the maximum size of each chunk. Each chunk has its own ORC file. Increase this value to reduce the number of chunks created, at the cost of larger memory requirements during the transcoding process. For details, see Parse the maxChunkSize argument. The default value is 128 MiB.
- Use the --preload_chunk_count argument to set the amount of data to preload into memory while all workers are busy. This argument can improve performance at the cost of memory. The default value is 2.
Execution example
gsutil cp \
--replace \
--parser_type=copybook \
--parallelism=8 \
--maxChunkSize=256MiB \
gs://$BUCKET/test.orc
This example assumes a large input file, so it uses 8 threads, at which line rate is reached. If you have enough memory, we recommend that you increase the chunk size to 256 MiB or even 512 MiB, because doing so reduces the overhead of creating and finalizing Cloud Storage objects. For small files, using fewer threads and smaller chunks might produce better results.
Parse the maxChunkSize argument
The maxChunkSize flag accepts values in the form of an amount and a unit of measurement, for example 5 MiB. You can use whitespace between the amount and the unit.
You can provide the value in the following formats:
- Java format: b/k/m/g/t, for byte, kibibyte, mebibyte, gibibyte, and tebibyte respectively
- International format: KiB/MiB/GiB/TiB, for kibibyte, mebibyte, gibibyte, and tebibyte respectively
- Metric format: b/kb/mb/gb/tb, for byte, kilobyte, megabyte, gigabyte, and terabyte respectively
Data size parsing is case insensitive. Note that you can't specify partial amounts. For example, use 716 KiB instead of 0.7 MiB.
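For instance, the following flag values are all accepted. Note that the Java and international forms denote binary units, so 128m equals 128 MiB, while the metric form 128mb means 128 megabytes; the quotes are only needed when the value contains whitespace.
--maxChunkSize=128m          # Java format (mebibytes)
--maxChunkSize=128MiB        # International format
--maxChunkSize="128 mb"      # Metric format, whitespace between amount and unit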