Export and import files in parallel

This page describes how to export data from Cloud SQL instances and import data into Cloud SQL instances by using multiple files in parallel.

Before you begin

Before you begin an export or import operation:

  • Ensure that your database has adequate free space.
  • Export and import operations use database resources, but they don't interfere with typical database operations unless the instance is under-provisioned.

  • Follow the best practices for exporting and importing data.
  • After completing an import operation, verify the results.

Export data from Cloud SQL for PostgreSQL to multiple files in parallel

The following sections contain information about exporting data from Cloud SQL for PostgreSQL to multiple files in parallel.

Required roles and permissions for exporting data from Cloud SQL for PostgreSQL to multiple files in parallel

To export data from Cloud SQL into Cloud Storage, the user initiating the export must have one of the following roles:

Additionally, the service account for the Cloud SQL instance must have one of the following roles:

  • The storage.objectAdmin Identity and Access Management (IAM) role
  • A custom role, including the following permissions:
    • storage.objects.create
    • storage.objects.list (for exporting files in parallel only)
    • storage.objects.delete (for exporting files in parallel only)

For help with IAM roles, see Identity and Access Management.
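
As a rough sketch, the custom role described above could be created with the gcloud iam roles create command. The role ID cloudsqlParallelExport and the PROJECT_ID value are hypothetical names chosen for illustration. The snippet only prints the command so that you can review it before running it yourself.

```shell
# Sketch: build the command that creates a custom role with the three
# permissions listed above. The role ID and project ID are hypothetical.

# Permissions that the service account needs for a parallel export.
EXPORT_PERMISSIONS="storage.objects.create,storage.objects.list,storage.objects.delete"

# Print the gcloud command for the given role ID and project ID.
build_role_command() {
  local role_id="$1"
  local project_id="$2"
  echo "gcloud iam roles create ${role_id} --project=${project_id} --permissions=${EXPORT_PERMISSIONS}"
}

# Review the printed command, then run it in your own shell.
build_role_command "cloudsqlParallelExport" "PROJECT_ID"
```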

Export data to multiple files in parallel

You can export data from Cloud SQL to multiple files in Cloud Storage in parallel. To do this, use the pg_dump utility with the --jobs option.

If you plan to import your data into Cloud SQL, then follow the instructions provided in Exporting data from an external database server so that your files are formatted correctly for Cloud SQL.

gcloud

To export data from Cloud SQL to multiple files in parallel, complete the following steps:

  1. Create a Cloud Storage bucket.
  2. To find the service account for the Cloud SQL instance that you're exporting files from, use the gcloud sql instances describe command:
    gcloud sql instances describe INSTANCE_NAME
    
  3. Replace INSTANCE_NAME with the name of your Cloud SQL instance.

    In the output, look for the value that's associated with the serviceAccountEmailAddress field.

  4. To grant the storage.objectAdmin IAM role to the service account, use the gsutil iam utility. For help with setting IAM permissions, see Use IAM permissions.
  5. To export data from Cloud SQL to multiple files in parallel, use the gcloud sql export sql command:
    gcloud sql export sql INSTANCE_NAME gs://BUCKET_NAME/BUCKET_PATH/FOLDER_NAME \
    --offload \
    --parallel \
    --threads=THREAD_NUMBER \
    --database=DATABASE_NAME \
    --table=TABLE_EXPRESSION
    

    Make the following replacements:

    • INSTANCE_NAME: the name of the Cloud SQL instance from which you're exporting files in parallel.
    • BUCKET_NAME: the name of the Cloud Storage bucket.
    • BUCKET_PATH: the path to the bucket where the export files are stored.
    • FOLDER_NAME: the folder where the export files are stored.
    • THREAD_NUMBER: the number of threads that Cloud SQL uses to export files in parallel. For example, if you want to export three files at a time in parallel, then specify 3 as the value for this parameter.
    • DATABASE_NAME: the name of the database inside of the Cloud SQL instance from which the export is made. You must specify only one database.
    • TABLE_EXPRESSION: the tables to export from the specified database.

    The gcloud sql export sql command doesn't export triggers or stored procedures, but it does export views. To export triggers or stored procedures, use a single thread for the export. A single-threaded export uses the pg_dump tool.

    After the export completes, you should have files in a folder in the Cloud Storage bucket in the pg_dump directory format.

  6. If you don't need the IAM role that you set in Required roles and permissions for exporting from Cloud SQL for PostgreSQL, then revoke it.
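
The gcloud steps above can be strung together roughly as follows. This is a sketch under stated assumptions, not a tested script: the instance, bucket, path, database, and thread values are hypothetical, and the run helper only prints each command unless you set DRY_RUN=0.

```shell
# Hypothetical values; replace them with your own.
INSTANCE_NAME="my-instance"
BUCKET_NAME="my-bucket"
EXPORT_URI="gs://${BUCKET_NAME}/exports/run-1"
DATABASE_NAME="mydb"
THREAD_NUMBER=3

# Print each command by default; set DRY_RUN=0 to execute it instead.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Look up the instance's service account. In dry-run mode nothing is
# queried, so a placeholder stands in for the real email address.
if [ "${DRY_RUN:-1}" = "1" ]; then
  SERVICE_ACCOUNT="SERVICE_ACCOUNT_EMAIL"
else
  SERVICE_ACCOUNT="$(gcloud sql instances describe "$INSTANCE_NAME" \
    --format='value(serviceAccountEmailAddress)')"
fi

# Grant storage.objectAdmin on the bucket to the service account.
run gsutil iam ch "serviceAccount:${SERVICE_ACCOUNT}:objectAdmin" "gs://${BUCKET_NAME}"

# Export in parallel. Add --table=TABLE_EXPRESSION to limit the tables.
run gcloud sql export sql "$INSTANCE_NAME" "$EXPORT_URI" \
  --offload --parallel "--threads=${THREAD_NUMBER}" "--database=${DATABASE_NAME}"

# Revoke the role when it's no longer needed.
run gsutil iam ch -d "serviceAccount:${SERVICE_ACCOUNT}:objectAdmin" "gs://${BUCKET_NAME}"
```

Printing the commands first makes it easier to check the bucket path and thread count before anything touches the instance.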

REST v1

To export data from Cloud SQL to multiple files in parallel, complete the following steps:

  1. Create a Cloud Storage bucket:
    gsutil mb -p PROJECT_NAME -l LOCATION_NAME gs://BUCKET_NAME
    
    Make the following replacements:
    • PROJECT_NAME: the name of the Google Cloud project that contains the Cloud Storage bucket you're creating.
    • LOCATION_NAME: the location of the bucket where you want to store the files you're exporting. For example, us-east1.
    • BUCKET_NAME: the name of the bucket, subject to naming requirements. For example, my-bucket.
  2. Provide your instance with the legacyBucketWriter IAM role for your bucket. For help with setting IAM permissions, see Use IAM permissions.
  3. Export data from Cloud SQL to multiple files in parallel:

    Before using any of the request data, make the following replacements:

    • PROJECT_NAME: the name of the Google Cloud project that contains the Cloud Storage bucket you created.
    • INSTANCE_NAME: the name of the Cloud SQL instance from which you're exporting files in parallel.
    • BUCKET_NAME: the name of the Cloud Storage bucket.
    • BUCKET_PATH: the path to the bucket where the export files are stored.
    • FOLDER_NAME: the folder where the export files are stored.
    • DATABASE_NAME: the name of the database inside of the Cloud SQL instance from which the export is made. You must specify only one database.
    • THREAD_NUMBER: the number of threads that Cloud SQL uses to export files in parallel. For example, if you want to export three files at a time in parallel, then specify 3 as the value for this parameter.

    HTTP method and URL:

    POST https://sqladmin.googleapis.com/v1/projects/PROJECT_NAME/instances/INSTANCE_NAME/export

    Request JSON body:

    {
     "exportContext":
       {
          "fileType": "SQL",
          "uri": "gs://BUCKET_NAME/BUCKET_PATH/FOLDER_NAME",
          "databases": ["DATABASE_NAME"],
          "offload": [TRUE|FALSE],
          "sqlExportOptions": {
            "parallel": [TRUE|FALSE],
            "threads": [THREAD_NUMBER]
           }
       }
    }
    

  4. After the export completes, you should have files in a folder in the Cloud Storage bucket in the pg_dump directory format.

  5. If you don't need the IAM role that you set in Required roles and permissions for exporting from Cloud SQL for PostgreSQL, then revoke it.
For the complete list of parameters for the request, see the Cloud SQL Admin API page.
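
For illustration, the request above might be sent with curl along the following lines. This is a sketch: the project, instance, bucket, and database names are hypothetical, offload and parallel are both set to true as one possible choice, and the [TRUE|FALSE] and [THREAD_NUMBER] placeholders are written as plain JSON booleans and an integer.

```shell
# Build a concrete request body from a URI, a database name, and a
# thread count. Field names follow the request body shown above.
build_export_body() {
  local uri="$1"
  local database="$2"
  local threads="$3"
  cat <<EOF
{
  "exportContext": {
    "fileType": "SQL",
    "uri": "${uri}",
    "databases": ["${database}"],
    "offload": true,
    "sqlExportOptions": {
      "parallel": true,
      "threads": ${threads}
    }
  }
}
EOF
}

# Send the request, authenticating as the active gcloud account.
send_export() {
  local project="$1"
  local instance="$2"
  local body="$3"
  curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json; charset=utf-8" \
    -d "$body" \
    "https://sqladmin.googleapis.com/v1/projects/${project}/instances/${instance}/export"
}

# Hypothetical values; only the body is printed here, nothing is sent.
build_export_body "gs://my-bucket/exports/run-1" "mydb" 3
```

To send the request, you could call send_export my-project my-instance "$(build_export_body gs://my-bucket/exports/run-1 mydb 3)".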

REST v1beta4

To export data from Cloud SQL to multiple files in parallel, complete the following steps:

  1. Create a Cloud Storage bucket:
    gsutil mb -p PROJECT_NAME -l LOCATION_NAME gs://BUCKET_NAME
    
    Make the following replacements:
    • PROJECT_NAME: the name of the Google Cloud project that contains the Cloud Storage bucket you're creating.
    • LOCATION_NAME: the location of the bucket where you want to store the files you're exporting. For example, us-east1.
    • BUCKET_NAME: the name of the bucket, subject to naming requirements. For example, my-bucket.
  2. Provide your instance with the storage.objectAdmin IAM role for your bucket. For help with setting IAM permissions, see Use IAM permissions.
  3. Export data from Cloud SQL to multiple files in parallel:

    Before using any of the request data, make the following replacements:

    • PROJECT_NAME: the name of the Google Cloud project that contains the Cloud Storage bucket you created.
    • INSTANCE_NAME: the name of the Cloud SQL instance from which you're exporting files in parallel.
    • BUCKET_NAME: the name of the Cloud Storage bucket.
    • BUCKET_PATH: the path to the bucket where the export files are stored.
    • FOLDER_NAME: the folder where the export files are stored.
    • DATABASE_NAME: the name of the database inside of the Cloud SQL instance from which the export is made. You must specify only one database.
    • THREAD_NUMBER: the number of threads that Cloud SQL uses to export files in parallel. For example, if you want to export three files at a time in parallel, then specify 3 as the value for this parameter.

    HTTP method and URL:

    POST https://sqladmin.googleapis.com/sql/v1beta4/projects/PROJECT_NAME/instances/INSTANCE_NAME/export

    Request JSON body:

    {
     "exportContext":
       {
          "fileType": "SQL",
          "uri": "gs://BUCKET_NAME/BUCKET_PATH/FOLDER_NAME",
          "databases": ["DATABASE_NAME"],
          "offload": [TRUE|FALSE],
          "sqlExportOptions": {
            "parallel": [TRUE|FALSE],
            "threads": [THREAD_NUMBER]
           }
       }
    }
    

  4. After the export completes, you should have files in a folder in the Cloud Storage bucket in the pg_dump directory format.

  5. If you don't need the IAM role that you set in Required roles and permissions for exporting from Cloud SQL for PostgreSQL, then revoke it.
For the complete list of parameters for the request, see the Cloud SQL Admin API page.
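
One way to sanity-check a finished export is to look for toc.dat in the target folder, because pg_dump writes that file at the top level of a directory-format dump. The sketch below assumes that the Cloud SQL output follows the standard pg_dump directory layout; the object names in the sample listing are hypothetical.

```shell
# Return success if the object listing on stdin contains a toc.dat entry.
has_toc() {
  grep -q '/toc\.dat$'
}

# Example with a hypothetical listing. In practice, pipe in the output of:
#   gsutil ls gs://BUCKET_NAME/BUCKET_PATH/FOLDER_NAME/
if printf '%s\n' \
  "gs://my-bucket/exports/run-1/3001.dat.gz" \
  "gs://my-bucket/exports/run-1/toc.dat" | has_toc; then
  echo "toc.dat found"
else
  echo "toc.dat missing"
fi
```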

Import data from multiple files in parallel to Cloud SQL for PostgreSQL

The following sections contain information about importing data from multiple files in parallel to Cloud SQL for PostgreSQL.

Required roles and permissions for importing data from multiple files in parallel to Cloud SQL for PostgreSQL

To import data from Cloud Storage into Cloud SQL, the user initiating the import must have one of the following roles:

Additionally, the service account for the Cloud SQL instance must have one of the following roles:

  • The storage.objectAdmin IAM role
  • A custom role, including the following permissions:
    • storage.objects.get
    • storage.objects.list (for importing files in parallel only)

For help with IAM roles, see Identity and Access Management.

Import data to Cloud SQL for PostgreSQL

You can import data in parallel from multiple files that reside in Cloud Storage to your database. To do this, use the pg_restore utility with the --jobs option.

gcloud

To import data from multiple files in parallel into Cloud SQL, complete the following steps:

  1. Create a Cloud Storage bucket.
  2. Upload the files to your bucket.

    For help with uploading files to buckets, see Upload objects from files.

  3. To find the service account for the Cloud SQL instance that you're importing files to, use the gcloud sql instances describe command:
    gcloud sql instances describe INSTANCE_NAME
    
  4. Replace INSTANCE_NAME with the name of your Cloud SQL instance.

    In the output, look for the value that's associated with the serviceAccountEmailAddress field.

  5. To grant the storage.objectAdmin IAM role to the service account, use the gsutil iam utility. For help with setting IAM permissions, see Use IAM permissions.
  6. To import data from multiple files in parallel into Cloud SQL, use the gcloud sql import sql command:
    gcloud sql import sql INSTANCE_NAME gs://BUCKET_NAME/BUCKET_PATH/FOLDER_NAME \
    --offload \
    --parallel \
    --threads=THREAD_NUMBER \
    --database=DATABASE_NAME
    

    Make the following replacements:

    • INSTANCE_NAME: the name of the Cloud SQL instance to which you're importing files in parallel.
    • BUCKET_NAME: the name of the Cloud Storage bucket.
    • BUCKET_PATH: the path to the bucket where the import files are stored.
    • FOLDER_NAME: the folder where the import files are stored.
    • THREAD_NUMBER: the number of threads that Cloud SQL uses to import files in parallel. For example, if you want to import three files at a time in parallel, then specify 3 as the value for this parameter.
    • DATABASE_NAME: the name of the database inside of the Cloud SQL instance into which the data is imported. You must specify only one database.

    If the command returns an error like ERROR_RDBMS, then review the permissions; this error is often due to permissions issues.

  7. If you don't need the IAM permissions that you set in Required roles and permissions for importing to Cloud SQL for PostgreSQL, then use gsutil iam to remove them.
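
The upload and import steps above might look roughly like this in practice. It is a sketch with hypothetical names (the local dump folder, bucket path, instance, and database), and it uses gsutil rsync as one possible way to upload a folder. The run helper prints each command unless you set DRY_RUN=0.

```shell
# Hypothetical values; replace them with your own.
LOCAL_DUMP_DIR="./dump"
IMPORT_URI="gs://my-bucket/imports/run-1"
INSTANCE_NAME="my-instance"
DATABASE_NAME="mydb"
THREAD_NUMBER=3

# Print each command by default; set DRY_RUN=0 to execute it instead.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Upload the contents of the local dump folder to the bucket folder.
run gsutil -m rsync -r "$LOCAL_DUMP_DIR" "$IMPORT_URI"

# Import the uploaded files in parallel.
run gcloud sql import sql "$INSTANCE_NAME" "$IMPORT_URI" \
  --offload --parallel "--threads=${THREAD_NUMBER}" "--database=${DATABASE_NAME}"
```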

REST v1

To import data from multiple files in parallel into Cloud SQL, complete the following steps:

  1. Create a Cloud Storage bucket:
    gsutil mb -p PROJECT_NAME -l LOCATION_NAME gs://BUCKET_NAME
    
    Make the following replacements:
    • PROJECT_NAME: the name of the Google Cloud project that contains the Cloud Storage bucket you're creating.
    • LOCATION_NAME: the location of the bucket where you want to store the files you're importing. For example, us-east1.
    • BUCKET_NAME: the name of the bucket, subject to naming requirements. For example, my-bucket.
  2. Upload the files to your bucket.

    For help with uploading files to buckets, see Upload objects from files.

  3. Provide your instance with the storage.objectAdmin IAM role for your bucket. For help with setting IAM permissions, see Use IAM permissions.
  4. Import data from multiple files in parallel into Cloud SQL:

    Before using any of the request data, make the following replacements:

    • PROJECT_NAME: the name of the Google Cloud project that contains the Cloud Storage bucket you created.
    • INSTANCE_NAME: the name of the Cloud SQL instance to which you're importing files in parallel.
    • BUCKET_NAME: the name of the Cloud Storage bucket.
    • BUCKET_PATH: the path to the bucket where the import files are stored.
    • FOLDER_NAME: the folder where the import files are stored.
    • DATABASE_NAME: the name of the database inside of the Cloud SQL instance into which the data is imported. You must specify only one database.
    • THREAD_NUMBER: the number of threads that Cloud SQL uses to import files in parallel. For example, if you want to import three files at a time in parallel, then specify 3 as the value for this parameter.

    HTTP method and URL:

    POST https://sqladmin.googleapis.com/v1/projects/PROJECT_NAME/instances/INSTANCE_NAME/import

    Request JSON body:

    {
     "importContext":
       {
          "fileType": "SQL",
          "uri": "gs://BUCKET_NAME/BUCKET_PATH/FOLDER_NAME",
          "databases": ["DATABASE_NAME"],
          "offload": [TRUE|FALSE],
          "sqlImportOptions": {
            "parallel": [TRUE|FALSE],
            "threads": [THREAD_NUMBER]
           }
       }
    }
    

    To use a different user for the import, specify the importContext.importUser property.

    For the complete list of parameters for the request, see the Cloud SQL Admin API page.
  5. If you don't need the IAM permissions that you set in Required roles and permissions for importing to Cloud SQL for PostgreSQL, then use gsutil iam to remove them.
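
As an illustration of the importUser property mentioned above, the following sketch builds an import request body with an optional user. The field names mirror the request body shown in this section. The bucket path, database name, and user name are hypothetical, and offload and parallel are set to true as one possible choice.

```shell
# Print an import request body. The fourth argument (import user) is
# optional; when it's empty, the importUser field is left out.
build_import_body() {
  local uri="$1"
  local database="$2"
  local threads="$3"
  local import_user="${4:-}"
  printf '{\n  "importContext": {\n'
  printf '    "fileType": "SQL",\n'
  printf '    "uri": "%s",\n' "$uri"
  printf '    "databases": ["%s"],\n' "$database"
  if [ -n "$import_user" ]; then
    printf '    "importUser": "%s",\n' "$import_user"
  fi
  printf '    "offload": true,\n'
  printf '    "sqlImportOptions": {\n'
  printf '      "parallel": true,\n'
  printf '      "threads": %d\n' "$threads"
  printf '    }\n  }\n}\n'
}

# Hypothetical values; the body is only printed, not sent.
build_import_body "gs://my-bucket/imports/run-1" "mydb" 3 "app_user"
```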

REST v1beta4

To import data from multiple files in parallel into Cloud SQL, complete the following steps:

  1. Create a Cloud Storage bucket:
    gsutil mb -p PROJECT_NAME -l LOCATION_NAME gs://BUCKET_NAME
    
    Make the following replacements:
    • PROJECT_NAME: the name of the Google Cloud project that contains the Cloud Storage bucket you're creating.
    • LOCATION_NAME: the location of the bucket where you want to store the files you're importing. For example, us-east1.
    • BUCKET_NAME: the name of the bucket, subject to naming requirements. For example, my-bucket.
  2. Upload the files to your bucket.

    For help with uploading files to buckets, see Upload objects from files.

  3. Provide your instance with the storage.objectAdmin IAM role for your bucket. For help with setting IAM permissions, see Use IAM permissions.
  4. Import data from multiple files in parallel into Cloud SQL:

    Before using any of the request data, make the following replacements:

    • PROJECT_NAME: the name of the Google Cloud project that contains the Cloud Storage bucket you created.
    • INSTANCE_NAME: the name of the Cloud SQL instance to which you're importing files in parallel.
    • BUCKET_NAME: the name of the Cloud Storage bucket.
    • BUCKET_PATH: the path to the bucket where the import files are stored.
    • FOLDER_NAME: the folder where the import files are stored.
    • DATABASE_NAME: the name of the database inside of the Cloud SQL instance into which the data is imported. You must specify only one database.
    • THREAD_NUMBER: the number of threads that Cloud SQL uses to import files in parallel. For example, if you want to import three files at a time in parallel, then specify 3 as the value for this parameter.

    HTTP method and URL:

    POST https://sqladmin.googleapis.com/sql/v1beta4/projects/PROJECT_NAME/instances/INSTANCE_NAME/import

    Request JSON body:

    {
     "importContext":
       {
          "fileType": "SQL",
          "uri": "gs://BUCKET_NAME/BUCKET_PATH/FOLDER_NAME",
          "databases": ["DATABASE_NAME"],
          "offload": [TRUE|FALSE],
          "sqlImportOptions": {
            "parallel": [TRUE|FALSE],
            "threads": [THREAD_NUMBER]
           }
       }
    }
    

    To use a different user for the import, specify the importContext.importUser property.

    For the complete list of parameters for the request, see the Cloud SQL Admin API page.
  5. If you don't need the IAM permissions that you set in Required roles and permissions for importing to Cloud SQL for PostgreSQL, then use gsutil iam to remove them.
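
The v1 and v1beta4 requests on this page differ only in the URL prefix. The helper below is a small sketch that makes the difference explicit; the function name admin_api_url and the sample project and instance names are hypothetical.

```shell
# Print the Admin API endpoint for an export or import request.
# Arguments: API version (v1 or v1beta4), project, instance, and
# action (export or import).
admin_api_url() {
  local api_version="$1"
  local project="$2"
  local instance="$3"
  local action="$4"
  local prefix
  case "$api_version" in
    v1)      prefix="v1" ;;
    v1beta4) prefix="sql/v1beta4" ;;
    *)
      echo "unsupported API version: ${api_version}" >&2
      return 1
      ;;
  esac
  echo "https://sqladmin.googleapis.com/${prefix}/projects/${project}/instances/${instance}/${action}"
}

admin_api_url v1 my-project my-instance import
admin_api_url v1beta4 my-project my-instance import
```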

Limitations

  • If you specify too many threads when you import or export data from multiple files in parallel, then you might use more memory than your Cloud SQL instance has. If this occurs, then an internal error message appears. Check the memory usage of your instance and increase the instance's size, as needed. For more information, see About instance settings.
  • When performing an export, commas in database names or table names in the databases or tables fields aren't supported.
  • Make sure that you have enough disk space for the initial dump file download. Otherwise, a no space left on disk error appears.
  • If your instance has only one virtual CPU (vCPU), then you can't import or export multiple files in parallel. The number of vCPUs for your instance can't be smaller than the number of threads that you're using for the import or export operation, and the number of threads must be at least two.
  • The pg_dump utility can't chunk any tables that you export. Therefore, if you have one very large table, then it can become a bottleneck for the speed of the export operation.
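
The vCPU and thread constraints above can be checked before you start an operation. The following is a minimal sketch; the function name threads_allowed is hypothetical, and you supply the vCPU count of your instance yourself.

```shell
# Return success if the thread count is allowed for an instance with the
# given number of vCPUs: at least two threads, and no more threads than
# vCPUs.
threads_allowed() {
  local vcpus="$1"
  local threads="$2"
  [ "$threads" -ge 2 ] && [ "$threads" -le "$vcpus" ]
}

# Try a few vCPU and thread combinations.
for pair in "1 2" "4 3" "4 8" "4 1"; do
  set -- $pair
  if threads_allowed "$1" "$2"; then
    echo "vCPUs=$1 threads=$2: allowed"
  else
    echo "vCPUs=$1 threads=$2: not allowed"
  fi
done
```

Of these combinations, only four vCPUs with three threads passes. A one-vCPU instance always fails, because the minimum of two threads already exceeds its vCPU count.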

What's next