Transcode mainframe data remotely on Google Cloud

Transcoding data locally on a mainframe is a CPU-intensive process that results in high million instructions per second (MIPS) consumption. To avoid this, you can use Cloud Run to move and transcode mainframe data remotely on Google Cloud. This frees up your mainframe for business critical tasks and also reduces MIPS consumption.

The following diagram shows how you can move your mainframe data to a Cloud Storage bucket, transcode the data to the ORC format using Cloud Run, and then move the content to BigQuery.

Remotely transcode mainframe data
Remotely transcode mainframe data

Before you begin

  • Create a service account or identify an existing service account to use with Mainframe Connector. This service account must have permissions to access Cloud Storage buckets, BigQuery datasets, and any other Google Cloud resource that you want to use.
  • Ensure that the service account you created is assigned the Cloud Run Invoker role.

Move mainframe data to Google Cloud and transcode it remotely using Cloud Run

To move your mainframe data to Google Cloud and transcode it remotely using Cloud Run, you must perform the following tasks:

  1. Deploy Mainframe Connector remote service using Terraform. The Mainframe Connector remote service is based on Cloud Run that lets you perform remote transcoding. If you want to deploy multiple instances of Mainframe Connector remote service, see Deploy multiple instances of the Mainframe Connector.
  2. Read and transcode a dataset on a mainframe, and upload it to Cloud Storage in ORC format. Transcoding is done during the cp operation, where a mainframe extended binary coded decimal interchange code (EBCDIC) dataset is converted to the ORC format in UTF-8 during the copy to a Cloud Storage bucket.
  3. Load the dataset to a BigQuery table.
  4. (Optional) Execute a SQL query on the BigQuery table.
  5. (Optional) Export data from BigQuery into a binary file in Cloud Storage.

To perform these tasks, follow these steps:

  1. Download the Mainframe Connector deployment tar file.

  2. Extract the files in the deployment tar file.

    tar -xvf ./deployment.tar
    

    The main.tf and vars.tf files are extracted from deployment tar file.

  3. Review and edit the variables in the vars.tf file. Most of the variables already have default values. However, you must set the following values. The only mandatory variables you need to set are project and connector_service_account_email.

    • project: The Google Cloud project in which you want to install Mainframe Connector.
    • connector_service_account_email: The service account that has all the permissions for the operations you want to perform using Mainframe Connector.

    You can also set the network configuration using the connector_service_ingress and connector_service_vpc_access variables.

  4. Run the terraform init command with a Cloud Storage bucket and prefix as arguments. Using a Cloud Storage bucket and prefix helps you save the deployment state in the bucket. You can also reuse the same bucket and prefix when you upgrade Mainframe Connector.

    terraform init \
    -backend-config bucket=DEPLOYMENT_STATE_BUCKET \
    -backend-config prefix=BUCKET_PREFIX
    

    Replace the following:

    • DEPLOYMENT_STATE_BUCKET: The name of the Cloud Storage bucket.
    • BUCKET_PREFIX: The prefix that you want to use in the Cloud Storage bucket.
  5. Create a .tfvars file to define variables that you want Terraform to use during the Mainframe Connector deployment.

  6. Open the .tfvars file and define the following variables as key-value pairs.

    • instance_id: Define an instance_id to separate different workloads when you want to have multiple instances of Mainframe Connector, or to use different service accounts.
    • project: The project in which you want to deploy Mainframe Connector.
    • connector_service_ingress: The ingress type.
    • additional_labels: Additional labels if you want to test the deployment.
    • connector_service_account_email: The service account email ID of Mainframe Connector.

    Save your changes and close the file.

  7. Deploy Mainframe Connector.

    terraform apply -var-file=VARIABLE_FILE_NAME
    

    Replace VARIABLE_FILE_NAME with the variables file you created in the previous step.

  8. (Optional) To check if Mainframe Connector is deployed and running, go to the Cloud Run page, and select the Services tab. You should see your deployment listed in the table.

    Go to Cloud Run

  9. In your mainframe, create a job to read the dataset on your mainframe and transcode it to ORC format, as follows. Read the data from the INFILE dataset, and the record layout from the COPYBOOK DD. The input dataset must be a queued sequential access method (QSAM) file with fixed or variable record length.

    //STEP01 EXEC BQSH
    //INFILE DD DSN=<HLQ>.DATA.FILENAME,DISP=SHR
    //COPYBOOK DD DISP=SHR,DSN=<HLQ>.COPYBOOK.FILENAME
    //STDIN DD *
    gsutil cp --replace gs://mybucket/tablename.orc \
      --remoteHost <mainframe-connector-url>.a.run.app \
      --remotePort 443
    /*
    

    If you want to log the commands executed during this process, you can enable load statistics.

  10. (Optional) Create and submit a BigQuery query job that executes a SQL read from the QUERY DD file. Typically the query will be a MERGE or SELECT INTO DML statement that results in transformation of a BigQuery table. Note that Mainframe Connector logs in job metrics but doesn't write query results to a file.

    You can query BigQuery in various ways-inline, with a separate dataset using DD, or with a separate dataset using DSN.

    Example JCL
    //STEP03 EXEC BQSH
    //QUERY DD DSN=<HLQ>.QUERY.FILENAME,DISP=SHR
    //STDIN DD *
    PROJECT=PROJECT_NAME
    LOCATION=LOCATION
    bq query --project_id=$PROJECT \
      --location=$LOCATION/*
    /*
    

    Replace the following:

    • PROJECT_NAME: The name of the project in which you want to execute the query.
    • LOCATION: The location for where the query will be executed. We recommended that you execute the query in a location close to the data.
  11. (Optional) Create and submit an export job that executes a SQL read from the QUERY DD file, and exports the resulting dataset to Cloud Storage as a binary file.

    Example JCL
    //STEP04 EXEC BQSH
    //OUTFILE DD DSN=<HLQ>.DATA.FILENAME,DISP=SHR
    //COPYBOOK DD DISP=SHR,DSN=<HLQ>.COPYBOOK.FILENAME
    //QUERY DD DSN=<HLQ>.QUERY.FILENAME,DISP=SHR
    //STDIN DD *
    PROJECT=PROJECT_NAME
    DATASET_ID=DATASET_ID
    DESTINATION_TABLE=DESTINATION_TABLE
    BUCKET=BUCKET
    bq export --project_id=$PROJECT \
      --dataset_id=$DATASET_ID \
      --destination_table=$DESTINATION_TABLE \
      --location="US" \
      --bucket=$BUCKET \
      --remoteHost <mainframe-connector-url>.a.run.app \
      --remotePort 443
    /*
    

    Replace the following:

    • PROJECT_NAME: The name of the project in which you want to execute the query.
    • DATASET_ID: The BigQuery dataset ID that contains the table that you want to export.
    • DESTINATION_TABLE: The BigQuery table that you want to export.
    • BUCKET: The Cloud Storage bucket that will contain the output binary file.

Deploy multiple instances of the Mainframe Connector

To limit access for specific jobs, you might need to deploy multiple instances of the Mainframe Connector. You can do this by deploying the Mainframe Connector multiple times with different variables and service accounts. Since the Mainframe Connector remote service is based on Cloud Run you will only be billed when each service is actually running. You also don't need to set up high availability (HA) as each instance is already load balanced and highly available.