Exporting to BigQuery


This topic shows you how to export the asset metadata for your organization, folder, or project to a BigQuery table, and then run data analysis on your inventory. BigQuery provides a SQL-like experience for users to analyze data and produce meaningful insights without the use of custom scripts.

Before you begin

Before you begin, complete the following steps.

  1. Enable the Cloud Asset Inventory API on the project where you'll be running the API commands.
    Enable the Cloud Asset Inventory API

  2. Configure the permissions that are required to call the Cloud Asset Inventory API using either the gcloud CLI or the API.

  3. Complete the following steps to set up your environment.

    gcloud

    To set up your environment to use the gcloud CLI to call the Cloud Asset Inventory API, install the Google Cloud CLI on your local client.

    API

    To set up your environment to call the Cloud Asset Inventory API with the Unix curl command, complete the following steps.

    1. Install oauth2l on your local machine so you can interact with the Google OAuth system.
    2. Confirm that you have access to the Unix curl command.
    3. Ensure that you grant your account one of the following roles on your project, folder, or organization.

      • Cloud Asset Viewer role (roles/cloudasset.viewer)
      • Owner basic role (roles/owner)

    Note that the gcloud CLI uses the billing project as the consumer project. If you receive a permission denied message, you can check if the billing project is different from the core project:

    gcloud config list
    

    To set the billing project to the consumer project:

    gcloud config set billing/quota_project CONSUMER_PROJECT_NUMBER
    
  4. If you're exporting to a BigQuery dataset in a project that does not have the Cloud Asset Inventory API enabled, you must also grant the following roles to the service-${CONSUMER_PROJECT_NUMBER}@gcp-sa-cloudasset.iam.gserviceaccount.com service account in the destination project.

    • BigQuery Data Editor role (roles/bigquery.dataEditor)
    • BigQuery User role (roles/bigquery.user)

    The service account is created automatically the first time you call the API. Alternatively, you can create it and grant the service agent role manually with the following commands:

      gcloud beta services identity create --service=cloudasset.googleapis.com --project=PROJECT_ID
      gcloud projects add-iam-policy-binding PROJECT_ID --member=serviceAccount:service-PROJECT_NUMBER@gcp-sa-cloudasset.iam.gserviceaccount.com --role=roles/cloudasset.serviceAgent
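
    As an illustration, once the service account exists, the two BigQuery roles listed above can be granted on the destination project with commands like the following (PROJECT_ID and PROJECT_NUMBER are placeholders for the destination project and its number):

    ```shell
    # Grant the Cloud Asset Inventory service agent the BigQuery roles it
    # needs to write to the destination project. PROJECT_ID and
    # PROJECT_NUMBER are placeholders.
    gcloud projects add-iam-policy-binding PROJECT_ID \
      --member=serviceAccount:service-PROJECT_NUMBER@gcp-sa-cloudasset.iam.gserviceaccount.com \
      --role=roles/bigquery.dataEditor

    gcloud projects add-iam-policy-binding PROJECT_ID \
      --member=serviceAccount:service-PROJECT_NUMBER@gcp-sa-cloudasset.iam.gserviceaccount.com \
      --role=roles/bigquery.user
    ```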
    

  5. Create a BigQuery dataset.
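
    One way to create the dataset, sketched here with the bq command-line tool (PROJECT_ID, DATASET_ID, and the location are placeholders; adjust them for your project):

    ```shell
    # Create a BigQuery dataset to receive the export.
    # PROJECT_ID, DATASET_ID, and the location are placeholders.
    bq mk --dataset --location=US PROJECT_ID:DATASET_ID
    ```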

Limitations

  • BigQuery tables encrypted with custom Cloud Key Management Service (Cloud KMS) keys are not supported.

  • Appending the export output to an existing table is not supported unless you are exporting to a partitioned table. The destination table must be empty or you must overwrite it. To overwrite it, use the --output-bigquery-force flag with the gcloud CLI, or use force with the API.

  • Google Kubernetes Engine (GKE) resource types, except for container.googleapis.com/Cluster and container.googleapis.com/NodePool, are not supported when exporting to separate tables per resource type.

Setting the BigQuery schema for the export

Every BigQuery table is defined by a schema that describes the column names, data types, and other information. Setting the content type during the export determines the schema for your table.

  • Resource or unspecified: When you set the content type to RESOURCE or do not specify it, and you set the per-asset-type flag to false or do not use it, you create a BigQuery table that has the schema shown in figure 1. Resource.data is the resource metadata represented as a JSON string.

    When you set the content type to RESOURCE or do not set the content type, and set the per-asset-type flag to true, you create separate tables per asset type. The schema of each table includes RECORD-type columns mapped to the nested fields in the Resource.data field of that asset type (up to the 15 nested levels that BigQuery supports). For per-type BigQuery example tables, see projects/export-assets-examples/datasets/structured_export.

  • IAM policy: When you set the content type to IAM_POLICY in the REST API or iam-policy in the gcloud CLI, you create a BigQuery table that has the schema shown in figure 2. The iam_policy RECORD is fully expanded.

  • Organization policy: When you set the content type to ORG_POLICY in the REST API or org-policy in the gcloud CLI, you create a BigQuery table that has the schema shown in figure 3.

  • VPC Service Controls policy: When you set the content type to ACCESS_POLICY in the REST API or access-policy in the gcloud CLI, you create a BigQuery table that has the schema shown in figure 4.

  • OSConfig instance inventory: When you set content type to OS_INVENTORY in the REST API or os-inventory in the gcloud CLI, you create a BigQuery table that has the schema shown in figure 5.

  • Relationship: When you set the content type to RELATIONSHIP in the REST API or relationship in the gcloud CLI, you create a BigQuery table that has the schema shown in figure 6.

Separate tables per resource type

To export an asset snapshot at a given timestamp, complete the following steps.

gcloud

To export assets in a project, run the following command. This command stores the exported snapshot in a BigQuery table at BIGQUERY_TABLE.

  gcloud asset export \
     --content-type CONTENT_TYPE \
     --project 'PROJECT_ID' \
     --snapshot-time 'SNAPSHOT_TIME' \
     --bigquery-table 'BIGQUERY_TABLE' \
     --output-bigquery-force

Where:

  • CONTENT_TYPE is the asset content type.
  • PROJECT_ID is the ID of the project whose metadata is being exported. This project can be the one from which you're running the export or a different project.
  • SNAPSHOT_TIME (Optional) is the time at which you want to take a snapshot of your assets. The value must be the current time or a time in the past. By default, a snapshot is taken at the current time. For information on time formats, see gcloud topic datetimes.
  • BIGQUERY_TABLE is the table to which you're exporting your metadata, in the format projects/PROJECT_ID/datasets/DATASET_ID/tables/TABLE_NAME.
  • --output-bigquery-force overwrites the destination table if it exists.

To export the assets of an organization or folder, use the --organization or --folder flag in place of --project.

The access-policy content type can only be exported with the --organization flag.
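
As an illustration, a resource-content export for an organization might look like the following (the organization ID, project, dataset, and table names are placeholders):

```shell
# Export a resource snapshot for an entire organization. The
# organization ID, dataset, and table name below are placeholders.
gcloud asset export \
   --content-type resource \
   --organization 123456789012 \
   --bigquery-table 'projects/my-project/datasets/my_dataset/tables/asset_snapshot' \
   --output-bigquery-force
```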

API

To export the asset metadata in your project, run the following command. This command stores the exported snapshot in a BigQuery table named TABLE_NAME. Learn more about the exportAssets method.

gcurl -d '{"contentType":"CONTENT_TYPE", \
  "outputConfig":{ \
    "bigqueryDestination": { \
      "dataset": "projects/PROJECT_ID/datasets/DATASET_ID",\
      "table": "TABLE_NAME", \
      "force": true \
    } \
  }}' \
  https://cloudasset.googleapis.com/v1/projects/PROJECT_NUMBER:exportAssets

Exporting separate tables for each resource type

To export assets in a project to separate BigQuery tables for each resource type, use the --per-asset-type flag. Each table's name is BIGQUERY_TABLE concatenated with _ (underscore) and ASSET_TYPE_NAME. Non-alphanumeric characters are replaced with _.
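
The name derivation can be sketched in plain shell: with BIGQUERY_TABLE set to mytable, the asset type compute.googleapis.com/Disk lands in a table named mytable_compute_googleapis_com_Disk.

```shell
# Sketch of the table-name derivation: replace every non-alphanumeric
# character in the asset type name with an underscore, then append it
# to the base table name.
ASSET_TYPE="compute.googleapis.com/Disk"
BIGQUERY_TABLE="mytable"
SUFFIX=$(echo "$ASSET_TYPE" | sed 's/[^A-Za-z0-9]/_/g')
echo "${BIGQUERY_TABLE}_${SUFFIX}"   # prints mytable_compute_googleapis_com_Disk
```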

Note that GKE resource types, except for container.googleapis.com/Cluster and container.googleapis.com/NodePool, are not supported for this type of export.

To export assets to separate BigQuery tables for each resource type, run the following command.

gcloud

  gcloud asset export \
     --content-type CONTENT_TYPE \
     --project 'PROJECT_ID' \
     --snapshot-time 'SNAPSHOT_TIME' \
     --bigquery-table 'BIGQUERY_TABLE' \
     --output-bigquery-force \
     --per-asset-type

Where:

  • CONTENT_TYPE is the asset content type. This value also determines the schema for the export.
  • PROJECT_ID is the ID of the project whose metadata is being exported. This project can be the one from which you're running the export, or a different project.
  • SNAPSHOT_TIME (Optional) is the time at which you want to take a snapshot of your assets. The value must be the current time or a time in the past. By default, a snapshot is taken at the current time. See the gcloud topic datetimes for more information on valid time formats.
  • BIGQUERY_TABLE is the table to which you're exporting your metadata, in the format projects/PROJECT_ID/datasets/DATASET_ID/tables/TABLE_NAME.
  • --output-bigquery-force overwrites the destination table if it exists.
  • --per-asset-type exports to multiple BigQuery tables per resource type.

API

gcurl -d '{"contentType":"CONTENT_TYPE", \
  "outputConfig":{ \
    "bigqueryDestination": { \
      "dataset": "projects/PROJECT_ID/datasets/DATASET_ID",\
      "table": "TABLE_NAME", \
      "force": true, \
      "separateTablesPerAssetType": true \
    } \
  }}' \
  https://cloudasset.googleapis.com/v1/projects/PROJECT_NUMBER:exportAssets

Learn more about the exportAssets method.

If exporting to any table fails, the entire export operation fails and returns the first error. The results of previously successful exports persist.

The following types are packed in a JSON string to overcome the compatibility issue between JSON3 and BigQuery types.

  • google.protobuf.Timestamp
  • google.protobuf.Duration
  • google.protobuf.FieldMask
  • google.protobuf.ListValue
  • google.protobuf.Value
  • google.protobuf.Struct
  • google.api.*

Exporting to a partitioned table

To export assets in a project to partitioned tables, use the --partition-key flag. The exported snapshot is stored in a BigQuery table at BIGQUERY_TABLE with daily partition granularity and two additional timestamp columns, readTime and requestTime. The column you specify with --partition-key is used as the partition key.

To export assets in a project to partitioned tables, run the following command.

gcloud

  gcloud asset export \
     --content-type CONTENT_TYPE \
     --project 'PROJECT_ID' \
     --snapshot-time 'SNAPSHOT_TIME' \
     --bigquery-table 'BIGQUERY_TABLE' \
     --partition-key 'PARTITION_KEY' \
     --output-bigquery-force

Where:

  • CONTENT_TYPE is the asset content type. This value also determines the schema for the export.
  • PROJECT_ID is the ID of the project whose metadata is being exported. This project can be the one from which you're running the export or a different project.
  • SNAPSHOT_TIME (Optional) is the time at which you want to take a snapshot of your assets. The value must be the current time or a time in the past. By default, a snapshot is taken at the current time. For information on time formats, see gcloud topic datetimes.
  • BIGQUERY_TABLE is the table to which you're exporting your metadata, in the format projects/PROJECT_ID/datasets/DATASET_ID/tables/TABLE_NAME.
  • PARTITION_KEY is the partition key column when exporting to BigQuery partitioned tables.
  • --output-bigquery-force overwrites the destination table if it exists.

API

gcurl -d '{"contentType":"CONTENT_TYPE", \
  "outputConfig":{ \
    "bigqueryDestination": { \
      "dataset": "projects/PROJECT_ID/datasets/DATASET_ID",\
      "table": "TABLE_NAME", \
      "force": true, \
      "partitionSpec": {"partitionKey": "PARTITION_KEY"} \
    } \
  }}' \
  https://cloudasset.googleapis.com/v1/projects/PROJECT_NUMBER:exportAssets

Learn more about the exportAssets method.

If the destination table already exists, its schema is updated as necessary by appending additional columns. The schema update fails if any column changes its type or mode, for example from optional to repeated. If the --output-bigquery-force flag is set to TRUE, the corresponding partition is overwritten by the snapshot results, while data in other partitions remains intact. If --output-bigquery-force is unset or FALSE, the data is appended to the corresponding partition.

The export operation fails if the schema update or attempt to append data fails.

Checking the status of an export

To check the status of an export, run the following commands.

gcloud

To check the status of the export, run the following command with the operation ID, which is displayed in the gcloud CLI output after you run the export command.

gcloud asset operations describe OPERATION_ID

API

To view the status of your export, run the following command with the operation ID returned in the response to your export.

  1. You can find the OPERATION_ID in the name field of the response to the export, which is formatted as follows:

    "name": "projects/PROJECT_NUMBER/operations/ExportAssets/CONTENT_TYPE/OPERATION_ID"
    
  2. To check the status of your export, run the following command with the OPERATION_ID:

    gcurl https://cloudasset.googleapis.com/v1/projects/PROJECT_NUMBER/operations/ExportAssets/CONTENT_TYPE/OPERATION_ID
    

Viewing an asset snapshot

To view the table containing the asset snapshot metadata, complete the following steps.

Console

  1. Go to the BigQuery page in the Google Cloud console.
    Go to the BigQuery page

  2. To display the tables and views in the dataset, open the navigation panel. In the Resources section, select your project to expand it, and then select a dataset.

  3. From the list, select your table.

  4. Select Details and note the value in Number of rows. You may need this value to control the starting point for your results using the gcloud CLI or API.

  5. To view a sample set of data, select Preview.

API

To browse your table's data, call tabledata.list. In the tableId parameter, specify the name of your table.

You can configure the following optional parameters to control the output.

  • maxResults is the maximum number of results to return.
  • selectedFields is a comma-separated list of columns to return; if unspecified, all columns are returned.
  • startIndex is the zero-based index of the starting row to read.

Values are returned wrapped in a JSON object that you must parse, as described in the tabledata.list reference documentation.
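
A minimal sketch of such a call with curl (PROJECT_ID, DATASET_ID, and TABLE_NAME are placeholders; this assumes an OAuth token obtained through oauth2l, as set up earlier):

```shell
# Read the first 10 rows of the snapshot table through the BigQuery
# tabledata.list endpoint. PROJECT_ID, DATASET_ID, and TABLE_NAME are
# placeholders; the access token comes from oauth2l.
curl -H "$(oauth2l header cloud-platform)" \
  "https://bigquery.googleapis.com/bigquery/v2/projects/PROJECT_ID/datasets/DATASET_ID/tables/TABLE_NAME/data?maxResults=10&startIndex=0"
```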

The export lists the assets and their resource names.

Querying an asset snapshot

After you export your snapshot to BigQuery, you can run queries on your asset metadata. See Exporting to BigQuery Sample Queries to learn more about several typical use cases.
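
For instance, a query that counts the exported assets by type could be run from the command line like this (DATASET_ID and TABLE_NAME are placeholders for the table you exported to):

```shell
# Count exported assets by type. DATASET_ID and TABLE_NAME are
# placeholders for the table you exported to.
bq query --use_legacy_sql=false \
  'SELECT asset_type, COUNT(*) AS asset_count
   FROM `DATASET_ID.TABLE_NAME`
   GROUP BY asset_type
   ORDER BY asset_count DESC'
```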

By default, BigQuery runs interactive, or on-demand, query jobs, which means that the query is executed as soon as possible. Interactive queries count towards your concurrent rate limit and your daily limit.

Query results are saved to either a temporary or permanent table. You can choose to append or overwrite data in an existing table or to create a new table, if none exists with the same name.

To run an interactive query that writes the output to a temporary table, complete the following steps.

Console

  1. Go to the BigQuery page in the Google Cloud console.
    Go to the BigQuery page

  2. Select Compose new query.

  3. In the Query editor text area, enter a valid BigQuery SQL query.

  4. (Optional) To change the data processing location, complete the following steps.

    1. Select More, and then select Query settings.
    2. Under Processing location, select Auto-select, and then choose your data's location.
    3. To update the query settings, select Save.
  5. Select Run.

API

  1. To start a new job, call the jobs.insert method. In the job resource, set the following parameters.

    • In the configuration field, set the query field to a JobConfigurationQuery that describes the BigQuery query job.

    • In the jobReference field, set the location field appropriately for your job.

  2. To poll for results, call getQueryResults. Poll until jobComplete equals true. You can check for errors and warnings in the errors list.
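
The two steps above can be sketched with curl (PROJECT_ID and JOB_ID are placeholders, the token comes from oauth2l, and the trivial SELECT 1 query stands in for your own):

```shell
# Step 1: submit an interactive query job via jobs.insert.
# PROJECT_ID is a placeholder; the access token comes from oauth2l.
curl -X POST -H "$(oauth2l header cloud-platform)" \
  -H "Content-Type: application/json" \
  -d '{"configuration": {"query": {"query": "SELECT 1", "useLegacySql": false}},
       "jobReference": {"location": "US"}}' \
  "https://bigquery.googleapis.com/bigquery/v2/projects/PROJECT_ID/jobs"

# Step 2: poll getQueryResults with the job ID returned by step 1,
# until the response contains "jobComplete": true.
curl -H "$(oauth2l header cloud-platform)" \
  "https://bigquery.googleapis.com/bigquery/v2/projects/PROJECT_ID/queries/JOB_ID?location=US"
```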