Exporting Databases

This page describes how to export Cloud Spanner databases using the Google Cloud Platform Console.

The process uses Cloud Dataflow and exports data to a folder in a Cloud Storage bucket. The resulting folder contains a set of Avro files and JSON manifest files.

Before you begin

To export a Cloud Spanner database, first you need to enable the Cloud Spanner, Cloud Storage, Compute Engine, and Cloud Dataflow APIs:

Enable the APIs
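
If you prefer to enable these APIs from a script rather than with the button above, the Service Usage API can do the same thing. The following is a minimal sketch, assuming the google-api-python-client package and application default credentials are set up; the project ID is a placeholder:

```python
from googleapiclient import discovery

PROJECT_ID = "my-project"  # placeholder: replace with your GCP project ID
SERVICES = [
    "spanner.googleapis.com",
    "storage.googleapis.com",
    "compute.googleapis.com",
    "dataflow.googleapis.com",
]

# Build a Service Usage API client with application default credentials.
serviceusage = discovery.build("serviceusage", "v1")

for service in SERVICES:
    # services.enable starts a long-running operation for each API.
    operation = serviceusage.services().enable(
        name=f"projects/{PROJECT_ID}/services/{service}", body={}
    ).execute()
    print(f"Requested enablement of {service}: {operation.get('name', 'done')}")
```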

You also need enough quota and the required Cloud IAM permissions.

Quota requirements

The quota requirements for export jobs, by GCP service, are as follows:

  • Cloud Spanner: No additional nodes are required to export a database, though you might need to add more nodes so that your job finishes in a reasonable amount of time. See Optimizing slow-running export jobs for more details.
  • Cloud Storage: You must create a bucket for your exported files if you do not already have one. You can do this in the GCP Console, either through the Cloud Storage page or while creating your export through the Cloud Spanner page. You do not need to set a size for your bucket.
  • Cloud Dataflow: Export jobs are subject to the same CPU, disk usage, and IP address Compute Engine quotas as other Cloud Dataflow jobs.
  • Compute Engine: Before running your export job, you must set up initial quotas for Compute Engine, which Cloud Dataflow uses. These quotas represent the maximum number of resources that you allow Cloud Dataflow to scale your job to. Recommended starting values are:

    • CPUs: 200
    • In-use IP addresses: 200
    • Standard persistent disk: 50 TB

    Generally, you do not have to make any other adjustments. Cloud Dataflow provides autoscaling so that you only pay for the actual resources used during the export. If your job can make use of more resources, the Cloud Dataflow UI displays a warning icon. The job should finish even if there is a warning icon.
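
To confirm that a region's Compute Engine quotas are at or above these recommended starting values before you run the export, you can read the regional quotas from the Compute Engine API. This is a minimal sketch, assuming the google-api-python-client package and application default credentials; the project ID and region are placeholders:

```python
from googleapiclient import discovery

PROJECT_ID = "my-project"  # placeholder
REGION = "us-central1"     # placeholder: the region you plan to run the export job in

compute = discovery.build("compute", "v1")

# regions.get returns the regional quotas, including CPUs, in-use IP addresses,
# and persistent disk (reported in GB; 50 TB = 51,200 GB).
region_info = compute.regions().get(project=PROJECT_ID, region=REGION).execute()

of_interest = {"CPUS", "IN_USE_ADDRESSES", "DISKS_TOTAL_GB"}
for quota in region_info.get("quotas", []):
    if quota["metric"] in of_interest:
        print(f'{quota["metric"]}: {quota["usage"]} used of {quota["limit"]}')
```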

Cloud IAM requirements

To export a database, you also need to have Cloud IAM roles with sufficient permissions to use all of the services involved in an export job. For information on granting roles and permissions, see Applying IAM Roles.

To export a database, you need the following roles:

  • At the GCP project level:
    • Cloud Spanner Viewer
    • Cloud Dataflow Admin
    • Storage Admin
  • At the Cloud Spanner database or instance level, or at the GCP project level:
    • Cloud Spanner Reader
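
You can grant these roles on the IAM page of the GCP Console, as described in Applying IAM Roles. If you grant them from a script instead, a sketch against the Cloud Resource Manager API might look like the following; the member address is a placeholder, and the role IDs are assumed equivalents of the role names above, so verify them in the IAM documentation:

```python
from googleapiclient import discovery

PROJECT_ID = "my-project"                     # placeholder
MEMBER = "user:export-operator@example.com"   # placeholder: the account that runs the export

# Assumed role IDs for the roles listed above; verify them before granting.
ROLES = [
    "roles/spanner.viewer",
    "roles/dataflow.admin",
    "roles/storage.admin",
    "roles/spanner.databaseReader",  # can instead be granted at the instance or database level
]

crm = discovery.build("cloudresourcemanager", "v1")

# Read-modify-write of the project's IAM policy.
policy = crm.projects().getIamPolicy(resource=PROJECT_ID, body={}).execute()
bindings = policy.setdefault("bindings", [])
for role in ROLES:
    binding = next((b for b in bindings if b["role"] == role), None)
    if binding is None:
        binding = {"role": role, "members": []}
        bindings.append(binding)
    if MEMBER not in binding["members"]:
        binding["members"].append(MEMBER)

crm.projects().setIamPolicy(resource=PROJECT_ID, body={"policy": policy}).execute()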

Exporting a database

After you satisfy the quota and IAM requirements described above, you can export an existing Cloud Spanner database.

To export your Cloud Spanner database to a Cloud Storage bucket, follow these steps:

  1. Go to the Cloud Spanner Instances page.
    Go to the instances page
  2. Click the name of the instance that contains your database.
  3. Click Export.
  4. Under Choose where to store your export, click Browse.
  5. If you do not already have a Cloud Storage bucket for your export:

    1. Click New bucket.
    2. Enter a name for your bucket. Bucket names must be unique across Cloud Storage.
    3. Select a default storage class and location, then click Create.
    4. Click your bucket to select it.

    If you already have a bucket, either select the bucket from the initial list or click Search to filter the list, then click your bucket to select it.

  6. Click Select.

  7. Select the database that you want to export in the Choose a database to export drop-down menu.
  8. Select a region in the Choose a region for the export job drop-down menu.

  9. Select the checkbox under Confirm charges to acknowledge that there are charges in addition to those incurred by your existing Cloud Spanner nodes.

  10. Click Export.

    The GCP Console displays the Database details page, which now shows a box describing your export job, including the job's elapsed time.
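
The console steps above launch a Cloud Dataflow job behind the scenes. If you would rather start an export from a script, one possible approach is to launch Google's Cloud Spanner to Avro Dataflow template directly. The sketch below assumes the template path and parameter names shown (instanceId, databaseId, outputDir), so verify them against the Dataflow template documentation before relying on it:

```python
from googleapiclient import discovery

PROJECT_ID = "my-project"    # placeholder
REGION = "us-central1"       # placeholder: the region for the export job
INSTANCE_ID = "my-instance"  # placeholder
DATABASE_ID = "my-database"  # placeholder
OUTPUT_DIR = "gs://my-export-bucket/my-export-folder"  # placeholder

dataflow = discovery.build("dataflow", "v1b3")

response = dataflow.projects().locations().templates().launch(
    projectId=PROJECT_ID,
    location=REGION,
    # Assumed path of the Google-provided Cloud Spanner to Avro export template.
    gcsPath="gs://dataflow-templates/latest/Cloud_Spanner_to_GCS_Avro",
    body={
        "jobName": "spanner-export-example",
        "parameters": {
            # Assumed template parameter names; check the template's documentation.
            "instanceId": INSTANCE_ID,
            "databaseId": DATABASE_ID,
            "outputDir": OUTPUT_DIR,
        },
    },
).execute()

print("Launched job:", response["job"]["id"])
```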

When the job finishes or terminates, the GCP Console displays a message on the Database details page. If the job succeeds, a success message appears; if it does not succeed, a failure message appears.

If your job fails, check the job's Cloud Dataflow logs for error details.

To avoid Cloud Storage charges for files your failed export job created, delete the folder and its files. See Viewing your export for information on how to find the folder.
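
If the failed job created many files, it can be easier to delete the folder from a script than through the browser. This is a minimal sketch using the google-cloud-storage Python client; the bucket and folder names are placeholders:

```python
from google.cloud import storage

BUCKET_NAME = "my-export-bucket"                       # placeholder
FOLDER_PREFIX = "my-instance-my-database-2019-01-01/"  # placeholder: the failed export folder

client = storage.Client()

# Cloud Storage has no real folders; delete every object that shares the folder prefix.
for blob in client.list_blobs(BUCKET_NAME, prefix=FOLDER_PREFIX):
    print("Deleting", blob.name)
    blob.delete()
```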

Viewing your export in Cloud Storage

To view the folder that contains your exported database in the GCP Console, navigate to the Cloud Storage browser and click on the bucket you previously selected:
Go to the Cloud Storage browser

The bucket now contains a folder with the exported database inside. The folder name begins with your instance's ID, database name, and the timestamp of your export job. The folder contains:

  • A spanner-export.json file
  • A TableName-manifest.json file for each table in the database you exported.
  • One or more TableName.avro-#####-of-##### files. The first number in the extension .avro-#####-of-##### represents the index of the Avro file, starting at zero, and the second represents the number of Avro files generated for each table.

    For example, Songs.avro-00001-of-00002 is the second of two files that contain the data for the Songs table.
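
To inspect the export from a script, for example to confirm that every table's manifest and Avro files were written, you can list the folder's objects with the google-cloud-storage Python client. A small sketch with placeholder bucket and folder names:

```python
import json

from google.cloud import storage

BUCKET_NAME = "my-export-bucket"                       # placeholder
FOLDER_PREFIX = "my-instance-my-database-2019-01-01/"  # placeholder: your export folder

client = storage.Client()
bucket = client.bucket(BUCKET_NAME)

# List everything the export job wrote under the folder prefix.
for blob in client.list_blobs(BUCKET_NAME, prefix=FOLDER_PREFIX):
    print(blob.name, blob.size, "bytes")

# The top-level manifest is plain JSON, so it can be read directly.
manifest = bucket.blob(FOLDER_PREFIX + "spanner-export.json").download_as_text()
print(json.dumps(json.loads(manifest), indent=2))
```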

Choosing a region for your export job

The region you choose for your export job depends on whether your Cloud Spanner instance uses a regional or multi-region configuration. To avoid network egress charges, choose a region that overlaps with your Cloud Spanner instance's location.
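
If you are not sure which configuration your instance uses, you can check its overview page in the console or read the configuration name with the google-cloud-spanner Python client; a minimal sketch with placeholder project and instance IDs:

```python
from google.cloud import spanner

client = spanner.Client(project="my-project")  # placeholder project ID
instance = client.instance("my-instance")      # placeholder instance ID
instance.reload()

# Prints something like projects/my-project/instanceConfigs/regional-us-central1
# for a regional configuration, or a multi-region configuration name otherwise.
print(instance.configuration_name)
```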

Regional instance configurations

If your Cloud Spanner instance configuration is regional, choose the same region for your export job to take advantage of free egress within the same region.

If the same region is not available, egress charges will apply. Refer to the Cloud Spanner network egress pricing to choose a region that will incur the lowest network egress charges.

Multi-region instance configurations

If your Cloud Spanner instance configuration is multi-region, choose one of the regions that make up the multi-region configuration to take advantage of free egress within the same region.

If an overlapping region is not available, egress charges will apply. Refer to the Cloud Spanner network egress pricing to choose a region that will incur the lowest network egress charges.

Viewing or troubleshooting jobs in the Cloud Dataflow UI

After you start an export job, you can view details of the job, including logs, in the Cloud Dataflow section of the GCP Console.

Viewing Cloud Dataflow job details

To see details for a currently running job:

  1. Navigate to the Database details page for the database you exported.
  2. Click View job details in Cloud Dataflow in the job status message.

    The GCP Console displays details of the Cloud Dataflow job.

To view a job that you ran recently:

  1. Navigate to the Database details page for the database you exported.
  2. Click the Import/Export tab.
  3. Click your job's name in the list.

    The GCP Console displays details of the Cloud Dataflow job.

To view a job that you ran more than one week ago:

  1. Go to the Cloud Dataflow jobs page in the GCP Console.
    Go to the jobs page
  2. Find your job in the list, then click its name.

    The GCP Console displays details of the Cloud Dataflow job.
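
The same job list is available through the Cloud Dataflow API if you want to find jobs from a script. A minimal sketch, assuming the google-api-python-client package; the project ID and region are placeholders:

```python
from googleapiclient import discovery

PROJECT_ID = "my-project"  # placeholder
REGION = "us-central1"     # placeholder: the region you ran the export job in

dataflow = discovery.build("dataflow", "v1b3")

# List jobs in the region; export jobs started from the console appear here as well.
response = dataflow.projects().locations().jobs().list(
    projectId=PROJECT_ID, location=REGION
).execute()

for job in response.get("jobs", []):
    print(job["id"], job["name"], job["currentState"], job["createTime"])
```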

Viewing Cloud Dataflow logs for your job

To view a Cloud Dataflow job's logs, navigate to the job's details page as described above, then click Logs to the right of the job's name.

If a job fails, look for errors in the logs. If there are errors, the error count displays next to Logs.

To view job errors:

  1. Click on the error count next to Logs.

    The GCP Console displays the job's logs. You may need to scroll to see the errors.

  2. Locate entries with the error icon.

  3. Click on an individual log entry to expand its contents.

For more information about troubleshooting Cloud Dataflow jobs, see Troubleshooting Your Pipeline.
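
Job log messages can also be pulled through the Cloud Dataflow API, which is useful when you want to capture errors from a script. A sketch with placeholder IDs, assuming you already know the job's ID (for example, from the job list shown earlier):

```python
from googleapiclient import discovery

PROJECT_ID = "my-project"  # placeholder
REGION = "us-central1"     # placeholder
JOB_ID = "2019-01-01_00_00_00-1234567890123456789"  # placeholder job ID

dataflow = discovery.build("dataflow", "v1b3")

# Request only error-level messages, similar to filtering the Logs panel in the UI.
response = dataflow.projects().locations().jobs().messages().list(
    projectId=PROJECT_ID,
    location=REGION,
    jobId=JOB_ID,
    minimumImportance="JOB_MESSAGE_ERROR",
).execute()

for message in response.get("jobMessages", []):
    print(message["time"], message["messageText"])
```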

Optimizing slow-running export jobs

If you have followed the suggestions under Quota requirements, you should generally not have to make any other adjustments. If your job is running slowly, there are a few other optimizations you can try:

  • Optimize the job and data location: Run your Cloud Dataflow job in the same region where your Cloud Spanner instance and Cloud Storage bucket are located.

  • Ensure sufficient Cloud Dataflow resources: If the relevant Compute Engine quotas limit your Cloud Dataflow job's resources, the job's Cloud Dataflow page in the Google Cloud Platform Console displays a warning icon and related log messages.

    In this situation, increasing the quotas for CPUs, in-use IP addresses, and standard persistent disk might shorten the run time of the job, but you might incur more Compute Engine charges.

  • Check the Cloud Spanner CPU utilization: If you see that the CPU utilization for the instance is over 75%, you can increase the number of nodes in that instance. The extra nodes add more Cloud Spanner resources and the job should speed up, but you incur more Cloud Spanner charges.
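
If you decide to add nodes for the duration of the export, you can change the node count on the instance's page in the console or with the google-cloud-spanner Python client. A minimal sketch, with placeholder names, that adds a single node; remember to scale back down afterwards to limit charges:

```python
from google.cloud import spanner

client = spanner.Client(project="my-project")  # placeholder project ID
instance = client.instance("my-instance")      # placeholder instance ID
instance.reload()
print("current node count:", instance.node_count)

# Add one node and apply the change; update() returns a long-running operation.
instance.node_count += 1
operation = instance.update()
operation.result()  # block until the resize finishes
print("new node count:", instance.node_count)
```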

Factors affecting export job performance

Several factors influence the time it takes to complete an export job.

  • Cloud Spanner database size: Processing more data takes more time and resources.

  • Cloud Spanner database schema (including indexes): The number of tables, the size of the rows, and the number of secondary indexes influence the time it takes to run an export job.

  • Data location: Data is transferred between Cloud Spanner and Cloud Storage using Cloud Dataflow. Ideally all three components are located in the same region. If the components are not in the same region, moving the data across regions slows the job down.

  • Number of Cloud Dataflow workers: By using autoscaling, Cloud Dataflow chooses the number of workers for the job depending on the amount of work that needs to be done. The number of workers will, however, be capped by the quotas for CPUs, in-use IP addresses, and standard persistent disk. The Cloud Dataflow UI displays a warning icon if it encounters quota caps. In this situation, progress is slower, but the job should still complete.

  • Existing load on Cloud Spanner: An export job typically adds a light load on a Cloud Spanner instance. If the instance already has a substantial existing load, then the export job runs more slowly.

  • Number of Cloud Spanner nodes: If the CPU utilization for the instance is over 75%, then the job runs more slowly.
