This page discusses how you can run Mainframe Connector on Cloud Run, transcode data, save it to BigQuery, and export it from BigQuery.
Mainframe Connector version 5.13.0 and later supports running Mainframe Connector as a standalone job on Google Cloud. This feature lets you run Mainframe Connector as a containerized batch job, for example as a Cloud Run job, a Google Kubernetes Engine job, or within a Docker container. This option helps you avoid installing Mainframe Connector locally on your mainframe, and makes it easier to integrate your mainframe queued sequential access method (QSAM) file parsing into existing extract, transform, and load (ETL) workflows.
When you use the standalone version of Mainframe Connector, you must set up the ETL workflow that loads the QSAM file to Google Cloud yourself.
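For example, if you have already transferred the dataset off the mainframe in binary mode, one way to stage it in Cloud Storage is with the standard gsutil CLI. This is only a sketch; the file name and bucket path are placeholders that match the sample later on this page:

# Upload the binary QSAM dataset to the input bucket (placeholder names).
gsutil cp input.dat gs://inputbucket/inputfolder/input.dat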
Before you begin
- Deploy Mainframe Connector on Cloud Run.
- Create a service account or identify an existing service account to use with Mainframe Connector. This service account must have permissions to access Cloud Storage buckets, BigQuery datasets, and any other Google Cloud resources that you want to use. For example commands to create the account and grant roles, see the sketch after this list.
- Ensure that the service account you created is assigned the Cloud Run Invoker role.
- Ensure that the mainframe data is already available on Google Cloud as a QSAM file.
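The following is a minimal sketch of creating a service account and granting it roles with gcloud. The account name, the PROJECT_ID placeholder, and the exact set of roles are assumptions; grant only the roles that your workflow needs.

# Create a dedicated service account (name and PROJECT_ID are placeholders).
gcloud iam service-accounts create mainframe-connector-sa --project=PROJECT_ID

# Allow the service account to invoke Cloud Run jobs.
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:mainframe-connector-sa@PROJECT_ID.iam.gserviceaccount.com" \
    --role="roles/run.invoker"

# Grant access to Cloud Storage objects and BigQuery data (adjust to your workflow).
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:mainframe-connector-sa@PROJECT_ID.iam.gserviceaccount.com" \
    --role="roles/storage.objectAdmin"

gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:mainframe-connector-sa@PROJECT_ID.iam.gserviceaccount.com" \
    --role="roles/bigquery.dataEditor"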
Transcode data using Mainframe Connector in standalone mode on Cloud Run
To transcode your data using Mainframe Connector in standalone mode, use the following steps:
Create a YAML file with commands to read your dataset, transcode it to the ORC format, and upload it to Cloud Storage. The input dataset must be a QSAM file with fixed or variable record length. The following sample YAML file reads the data from the INFILE dataset and the record layout from the COPYBOOK DD.
environmentVariables:
- name: "INFILE"
  value: "INFILE"
- name: "INFILE_DSN"
  value: "INFILE_DSN"
- name: "GCSDSNURI"
  value: "INFILE_DSN_FILEPATH"
- name: "COPYBOOK"
  value: "COPYBOOK_FILEPATH"
- name: "LOG_PROJECT"
  value: "LOG_PROJECT"
- name: "IBM_JAVA_OPTIONS"
  value: "-XX:+UseContainerSupport"
command:
  gsutil cp gs://outputbucket/output --parallelism 8 --maxChunkSize "512Mib" --parser_type=copybook
Replace the following:
- INFILE: The name of the input file.
- INFILE_DSN: The name of the input Data Source Name (DSN) file.
- INFILE_DSN_FILEPATH: The path to the input DSN file.
- COPYBOOK_FILEPATH: The path to the copybook DD.
- LOG_PROJECT: The name of the log project.
The following is an example YAML file:
environmentVariables:
- name: "INFILE"
  value: "input.dat"
- name: "INFILE_DSN"
  value: "input.dat"
- name: "GCSDSNURI"
  value: "gs://inputbucket/inputfolder"
- name: "COPYBOOK"
  value: "gs://inputbucket/copybook.cpy"
- name: "LOG_PROJECT"
  value: "the log project"
- name: "IBM_JAVA_OPTIONS"
  value: "-XX:+UseContainerSupport"
command:
  gsutil cp gs://outputbucket/output --parallelism 8 --maxChunkSize "512Mib" --parser_type=copybook
For the complete list of environment variables supported by Mainframe Connector, see Environment variables.
If you want to log the commands executed during this process, you can enable load statistics.
Create a job.yaml file with the following content:

kind: Job
metadata:
  name: JOB
spec:
  template:
    spec:
      template:
        spec:
          containers:
          - image: IMAGE
            command:
            - bash
            - /opt/mainframe-connector/standalone.sh
            - --argsFrom
            - LOCATION_OF_THE_COMMAND_YAML_FILE
Replace the following:
- JOB with the name of your Cloud Run job. Job names must be 49 characters or less and must be unique per region and project.
- IMAGE with the URL of the job container image, for example, us-docker.pkg.dev/cloudrun/container/job:latest.
- LOCATION_OF_THE_COMMAND_YAML_FILE with the location of the YAML file that you created in the previous step.
Deploy the new job using the following command:
gcloud run jobs replace job.yaml
Execute the job using the following command:
gcloud run jobs execute JOB_NAME
Replace JOB_NAME with the name of the job.
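Cloud Run jobs are regional, so if your gcloud configuration does not already set a default region, pass it explicitly. The following is a sketch with an assumed region of us-central1:

# Deploy and execute the job in an explicit region (us-central1 is only an example).
gcloud run jobs replace job.yaml --region=us-central1
gcloud run jobs execute JOB_NAME --region=us-central1

# Optionally, list the job's executions to check their status.
gcloud run jobs executions list --job=JOB_NAME --region=us-central1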
For more information on creating and executing a Cloud Run job, see Create a new job and Execute a job.
Export a BigQuery table into a mainframe dataset
You can export a BigQuery table into a mainframe dataset by creating a YAML file that executes a SQL read from the QUERY DD file and exports the resulting dataset to Cloud Storage as a binary file, as follows.
The steps to create and execute the Cloud Run job are the same as described in the section Transcode data using Mainframe Connector in standalone mode on Cloud Run. The only difference is the instructions in the YAML file.
environmentVariables:
- name: "COPYBOOK"
  value: "COPYBOOK_FILEPATH"
- name: "LOG_PROJECT"
  value: "LOG_PROJECT"
- name: "IBM_JAVA_OPTIONS"
  value: "-XX:+UseContainerSupport"
command:
  bq export --project_id="PROJECT_NAME" --location=LOCATION --sql="select * from project.dataset.table" --bucket="BUCKET"
Replace the following:
- COPYBOOK_FILEPATH: The path to the copybook DD.
- LOG_PROJECT: The name of the log project.
- PROJECT_NAME: The name of the project in which you want to execute the query.
- LOCATION: The location where the query will be executed. We recommend that you execute the query in a location close to the data.
- BUCKET: The Cloud Storage bucket that will contain the output binary file.
The following is an example YAML file:
environmentVariables:
- name: "COPYBOOK"
  value: "gs://inputbucket/copybook.cpy"
- name: "LOG_PROJECT"
  value: "the log project"
- name: "IBM_JAVA_OPTIONS"
  value: "-XX:+UseContainerSupport"
command:
  bq export --project_id="PROJECT_NAME" --location=US --sql="select * from project.dataset.table" --bucket="BUCKET"
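After the export job completes, you can check that the binary file was written to the bucket, for example (the bucket name is a placeholder):

# List the objects that the export wrote to the output bucket.
gsutil ls gs://BUCKET/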