Exporting to BigQuery

This topic shows you how to export the asset metadata for your organization, folder, or project to a BigQuery table, and then run data analysis on your inventory. BigQuery provides a SQL-like experience for users to analyze data and produce meaningful insights without the use of custom scripts.

Before you begin

Before you begin, complete the following steps.

  1. Enable the Cloud Asset Inventory API on the project where you'll be running the API commands.

  2. Configure the permissions that are required to call the Cloud Asset Inventory API using either the gcloud tool or the API.

  3. Complete the following steps to set up your environment.

    gcloud

    To set up your environment to use the gcloud tool to call the Cloud Asset Inventory API, install the Cloud SDK on your local client.

    API

    To set up your environment to call the Cloud Asset Inventory API with the Unix curl command, complete the following steps.

    1. Install oauth2l on your local machine so you can interact with the Google OAuth system. The API examples in this topic use a gcurl alias built on oauth2l; a sketch of one possible definition appears at the end of this step.
    2. Confirm that you have access to the Unix curl command.
    3. Ensure that your account has been granted one of the following roles on your project, folder, or organization.

      • Cloud Asset Viewer role (roles/cloudasset.viewer)
      • Owner basic role (roles/owner)

    Note that the gcloud tool uses the billing project as the consumer project. If you receive a permission-denied message, check whether the billing project differs from the core project:

    gcloud config list
    

    To set the billing project to the consumer project:

    gcloud config set billing/quota_project CONSUMER_PROJECT_NUMBER
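
    The gcurl command used in the API examples in this topic isn't a standalone tool; it's commonly defined as a curl alias that attaches an OAuth access token produced by oauth2l. The following sketch shows one possible definition. It assumes that oauth2l is installed on your PATH and can locate your credentials.

    # Hypothetical alias: send authenticated JSON requests with curl and oauth2l.
    alias gcurl='curl -H "$(oauth2l header --scope cloud-platform)" -H "Content-Type: application/json"'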
    
  4. If you're exporting to a BigQuery dataset in a project that does not have the Cloud Asset Inventory API enabled, you must also grant the following roles to the service-${CONSUMER_PROJECT_NUMBER}@gcp-sa-cloudasset.iam.gserviceaccount.com service account in the destination project.

    • BigQuery Data Editor role (roles/bigquery.dataEditor)
    • BigQuery User role (roles/bigquery.user)

    The service account is created automatically the first time the API is called. Alternatively, you can create the service account and grant it the service agent role manually with the following commands:

      gcloud beta services identity create \
          --service=cloudasset.googleapis.com \
          --project=PROJECT_ID
      gcloud projects add-iam-policy-binding PROJECT_ID \
          --member=serviceAccount:service-PROJECT_NUMBER@gcp-sa-cloudasset.iam.gserviceaccount.com \
          --role=roles/cloudasset.serviceAgent
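
    For example, the following commands are one way to grant the two BigQuery roles listed above to the service account in the destination project. DESTINATION_PROJECT_ID and CONSUMER_PROJECT_NUMBER are placeholders for your own values.

      gcloud projects add-iam-policy-binding DESTINATION_PROJECT_ID \
          --member=serviceAccount:service-CONSUMER_PROJECT_NUMBER@gcp-sa-cloudasset.iam.gserviceaccount.com \
          --role=roles/bigquery.dataEditor
      gcloud projects add-iam-policy-binding DESTINATION_PROJECT_ID \
          --member=serviceAccount:service-CONSUMER_PROJECT_NUMBER@gcp-sa-cloudasset.iam.gserviceaccount.com \
          --role=roles/bigquery.user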
    

  5. Create a BigQuery dataset.
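
    If you don't already have a dataset, one way to create it is with the bq command-line tool included in the Cloud SDK. The following sketch assumes a hypothetical dataset named asset_inventory in the US location; replace both values to suit your setup.

    bq mk --dataset --location=US PROJECT_ID:asset_inventory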

Exporting an asset snapshot

To export an asset snapshot at a given timestamp, complete the following steps.

gcloud

To export assets in a project, run the following command. This command stores the exported snapshot in a BigQuery table at BIGQUERY_TABLE.

  gcloud asset export \
     --content-type CONTENT_TYPE \
     --project 'PROJECT_ID' \
     --snapshot-time 'SNAPSHOT_TIME' \
     --bigquery-table 'BIGQUERY_TABLE' \
     --output-bigquery-force

Where:

  • CONTENT_TYPE is the asset content type, such as resource or iam-policy. For the available values, see Setting the content type.
  • PROJECT_ID is the ID of the project whose metadata is being exported. This project can be the one from which you're running the export or a different project.
  • SNAPSHOT_TIME (Optional) is the time at which you want to take a snapshot of your assets. The value must be the current time or a time in the past. By default, a snapshot is taken at the current time. For information on time formats, see gcloud topic datetimes.
  • BIGQUERY_TABLE is the table to which you're exporting your metadata, in the format projects/PROJECT_ID/datasets/DATASET_ID/tables/TABLE_NAME.
  • --output-bigquery-force overwrites the destination table if it exists.

To export the assets of an organization or folder, use the --organization ORGANIZATION_ID flag or the --folder FOLDER_ID flag in place of --project.

The access-policy content type can be exported only with --organization.
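
For example, a sketch of an organization-wide export of resource metadata might look like the following; the organization ID and table path are placeholder values:

  gcloud asset export \
     --content-type resource \
     --organization 'ORGANIZATION_ID' \
     --bigquery-table 'projects/PROJECT_ID/datasets/DATASET_ID/tables/TABLE_NAME' \
     --output-bigquery-force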

API

To export the asset metadata in your project, run the following command. This command stores the exported snapshot in a BigQuery table named TABLE_NAME. Learn more about the exportAssets method.

gcurl -d '{"contentType":"CONTENT_TYPE", \
  "outputConfig":{ \
    "bigqueryDestination": { \
      "dataset": "projects/PROJECT_ID/datasets/DATASET_ID",\
      "table": "TABLE_NAME", \
      "force": true \
    } \
  }}' \
  https://cloudasset.googleapis.com/v1/projects/PROJECT_NUMBER:exportAssets
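
The exportAssets call returns a long-running operation. A simplified response looks like the following; note the operation name, which you use to check the status of the export later in this topic:

{
  "name": "projects/PROJECT_NUMBER/operations/ExportAssets/CONTENT_TYPE/OPERATION_ID"
}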

Setting the content type

Every BigQuery table is defined by a schema that describes the column names, data types, and other information. Setting the content type during the export determines the schema for your table.

  • Resource or unspecified: When you set the content type to RESOURCE, or don't set a content type, you create a BigQuery table that uses the resource schema. The resource.data column contains the resource metadata represented as a JSON string.

  • IAM policy: When you set the content type to IAM_POLICY in the REST API or iam-policy in the gcloud tool, you create a BigQuery table that uses the IAM policy schema. The iam_policy RECORD is fully expanded.

  • Organization policy: When you set the content type to ORG_POLICY in the REST API or org-policy in the gcloud tool, you create a BigQuery table that uses the organization policy schema.

  • Access policy (VPC Service Controls): When you set the content type to ACCESS_POLICY in the REST API or access-policy in the gcloud tool, you create a BigQuery table that uses the access policy schema.

  • OS Config instance inventory: When you set the content type to OS_INVENTORY in the REST API or os-inventory in the gcloud tool, you create a BigQuery table that uses the OS inventory schema.
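
Because resource.data is a JSON string in the resource schema, you can parse it in queries with BigQuery's JSON functions. The following query sketch assumes a table exported with the RESOURCE content type and a hypothetical $.location field inside the metadata; the table path is a placeholder:

  bq query --nouse_legacy_sql \
  'SELECT
     name,
     asset_type,
     JSON_EXTRACT_SCALAR(resource.data, "$.location") AS location  -- hypothetical field
   FROM `PROJECT_ID.DATASET_ID.TABLE_NAME`
   LIMIT 10'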

Separate tables per resource type

To export an asset snapshot at a given timestamp per resource type, complete the following steps:

gcloud

To export assets in a project with one table per asset type, run the following command. If the snapshot results are empty, no tables are created; otherwise, this command stores the exported snapshot in multiple BigQuery tables, one per asset type. Each table's name is BIGQUERY_TABLE concatenated with an underscore (_) and the asset type name, and non-alphanumeric characters are replaced with underscores. For example, assets of type compute.googleapis.com/Disk are written to a table named BIGQUERY_TABLE_compute_googleapis_com_Disk.

  gcloud asset export \
     --content-type CONTENT_TYPE \
     --project 'PROJECT_ID' \
     --snapshot-time 'SNAPSHOT_TIME' \
     --bigquery-table 'BIGQUERY_TABLE' \
     --output-bigquery-force \
     --per-asset-type

Where:

  • CONTENT_TYPE is the asset content type.
  • PROJECT_ID is the ID of the project whose metadata is being exported. This project can be the one from which you're running the export, or a different project.
  • SNAPSHOT_TIME (Optional) is the time at which you want to take a snapshot of your assets. The value must be the current time or a time in the past. By default, a snapshot is taken at the current time. See the gcloud topic datetimes for more information on valid time formats.
  • BIGQUERY_TABLE is the table to which you're exporting your metadata, in the format projects/PROJECT_ID/datasets/DATASET_ID/tables/TABLE_NAME.
  • --output-bigquery-force overwrites the destination table if it exists.
  • --per-asset-type exports to multiple BigQuery tables per resource type.

API

To export assets in a project with one table per asset type, run the following command. If the snapshot results are empty, no tables are created; otherwise, this command stores the exported snapshot in multiple BigQuery tables, one per asset type. Each table's name is BIGQUERY_TABLE concatenated with an underscore (_) and the asset type name, and non-alphanumeric characters are replaced with underscores. See the exportAssets method for more information.

gcurl -d '{"contentType":"CONTENT_TYPE", \
  "outputConfig":{ \
    "bigqueryDestination": { \
      "dataset": "projects/PROJECT_ID/datasets/DATASET_ID",\
      "table": "TABLE_NAME", \
      "force": true \
      "separateTablesPerAssetType": true \
    } \
  }}' \
  https://cloudasset.googleapis.com/v1/projects/PROJECT_NUMBER:exportAssets

If the export to any table fails, the entire export operation fails and returns the first error. However, the results of the exports that succeeded persist.

Setting the content type during the export determines the schema for each table.

  • Resource: When you set the content type to RESOURCE, the schema of each table includes RECORD-type columns that map to the nested fields in the Resource.data field of that asset type, up to the 15 levels of nesting that BigQuery supports. For example per-type schemas, see the tables in projects/export-assets-examples/datasets/structured_export.

  • IAM policy, organization policy, access policy, or unspecified: When you set the content type to IAM_POLICY, ORG_POLICY, or ACCESS_POLICY, or don't set a content type, each table has the same schema as when --per-asset-type is not set. For details, see the schemas in the Setting the content type section.

The following types are packed into a JSON string to overcome the compatibility issue between proto3 and BigQuery types.

  • google.protobuf.Timestamp
  • google.protobuf.Duration
  • google.protobuf.FieldMask
  • google.protobuf.ListValue
  • google.protobuf.Value
  • google.protobuf.Struct
  • google.api.*

Export to partitioned tables

To export an asset snapshot at a given timestamp in a partitioned table, complete the following steps.

gcloud

To export assets in a project to partitioned tables, run the following command. This command stores the exported snapshot in a BigQuery table at BIGQUERY_TABLE, partitioned with daily granularity, and adds two timestamp columns, readTime and requestTime. One of these columns is the partition key, according to your --partition-key flag.

  gcloud asset export \
     --content-type CONTENT_TYPE \
     --project 'PROJECT_ID' \
     --snapshot-time 'SNAPSHOT_TIME' \
     --bigquery-table 'BIGQUERY_TABLE' \
     --partition-key 'PARTITION_KEY' \
     --output-bigquery-force

Where:

  • CONTENT_TYPE is the asset content type.
  • PROJECT_ID is the ID of the project whose metadata is being exported. This project can be the one from which you're running the export or a different project.
  • SNAPSHOT_TIME (Optional) is the time at which you want to take a snapshot of your assets. The value must be the current time or a time in the past. By default, a snapshot is taken at the current time. For information on time formats, see gcloud topic datetimes.
  • BIGQUERY_TABLE is the table to which you're exporting your metadata, in the format projects/PROJECT_ID/datasets/DATASET_ID/tables/TABLE_NAME.
  • PARTITION_KEY is the partition key column to use when exporting to BigQuery partitioned tables. The valid values are read-time and request-time.
  • --output-bigquery-force overwrites the destination table if it exists.

API

To export assets in a project to partitioned tables, run the following command. This command stores the exported snapshot in a BigQuery table at BIGQUERY_TABLE, partitioned with daily granularity, and adds two timestamp columns, readTime and requestTime. One of these columns is the partition key, according to the partitionKey field in partitionSpec. Learn more about the exportAssets method.

gcurl -d '{"contentType":"CONTENT_TYPE", \
  "outputConfig":{ \
    "bigqueryDestination": { \
      "dataset": "projects/PROJECT_ID/datasets/DATASET_ID",\
      "table": "TABLE_NAME", \
      "force": true \
      "partitionSpec": {"partitionKey": "PARTITION_KEY"} \
    } \
  }}' \
  https://cloudasset.googleapis.com/v1/projects/PROJECT_NUMBER:exportAssets

If the destination table already exists, its schema is updated as necessary by appending additional columns. This schema update fails if any column changes its type or mode, such as from NULLABLE (optional) to REPEATED. Then, if the --output-bigquery-force flag is set to TRUE, the corresponding partition is overwritten by the snapshot results, while data in other partitions remains intact. If --output-bigquery-force is unset or FALSE, the data is appended to the corresponding partition.

The export operation fails if either the schema update or the attempt to append data fails.

Checking the status of an export

To check the status of an export, run the following commands.

gcloud

To check the status of the export, run the following command. The OPERATION_ID is displayed in the gcloud tool output after you run the export command.

gcloud asset operations describe OPERATION_ID

API

To view the status of your export, run the following command with the operation ID returned in the response to your export.

  1. You can find the OPERATION_ID in the name field of the response to the export, which is formatted as follows:

    "name": "projects/PROJECT_NUMBER/operations/ExportAssets/CONTENT_TYPE/OPERATION_ID"
    
  2. To check the status of your export, run the following command with the OPERATION_ID:

    gcurl https://cloudasset.googleapis.com/v1/projects/PROJECT_NUMBER/operations/ExportAssets/CONTENT_TYPE/OPERATION_ID
    

Viewing an asset snapshot

To view the table containing the asset snapshot metadata, complete the following steps.

Console

  1. Go to the BigQuery page in the Cloud Console.

  2. To display the tables and views in the dataset, open the navigation panel. In the Resources section, select your project to expand it, and then select a dataset.

  3. From the list, select your table.

  4. Select Details and note the value in Number of rows. You may need this value to control the starting point for your results using the gcloud tool or API.

  5. To view a sample set of data, select Preview.

API

To browse your table's data, call tabledata.list. In the tableId parameter, specify the name of your table.

You can configure the following optional parameters to control the output.

  • maxResults is the maximum number of results to return.
  • selectedFields is a comma-separated list of columns to return. If unspecified, all columns are returned.
  • startIndex is the zero-based index of the starting row to read.
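
For example, using the same gcurl alias as the export examples, a sketch of a call that reads the first 10 rows of two columns might look like the following; every path segment is a placeholder:

  gcurl "https://bigquery.googleapis.com/bigquery/v2/projects/PROJECT_ID/datasets/DATASET_ID/tables/TABLE_NAME/data?maxResults=10&selectedFields=name,asset_type"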

Values are returned wrapped in a JSON object that you must parse, as described in the tabledata.list reference documentation.

The export lists the assets and their resource names.

Querying an asset snapshot

After you export your snapshot to BigQuery, you can run queries on your asset metadata. See Exporting to BigQuery Sample Queries to learn more about several typical use cases.
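
For example, a common starting point is to count assets by type. The following query sketch assumes a snapshot table whose schema includes the asset_type column; the table path is a placeholder:

  bq query --nouse_legacy_sql \
  'SELECT asset_type, COUNT(*) AS asset_count
   FROM `PROJECT_ID.DATASET_ID.TABLE_NAME`
   GROUP BY asset_type
   ORDER BY asset_count DESC'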

By default, BigQuery runs interactive, or on-demand, query jobs, which means that the query is executed as soon as possible. Interactive queries count towards your concurrent rate limit and your daily limit.

Query results are saved to either a temporary or permanent table. You can choose to append or overwrite data in an existing table or to create a new table, if none exists with the same name.

To run an interactive query that writes the output to a temporary table, complete the following steps.

Console

  1. Go to the BigQuery page in the Cloud Console.

  2. Select Compose new query.

  3. In the Query editor text area, enter a valid BigQuery SQL query.

  4. (Optional) To change the data processing location, complete the following steps.

    1. Select More, and then select Query settings.
    2. Under Processing location, select Auto-select, and then choose your data's location.
    3. To update the query settings, select Save.
  5. Select Run.

API

  1. To start a new job, call the jobs.insert method. In the job resource, set the following parameters.

    • In the configuration field, set the query field to a JobConfigurationQuery that describes the BigQuery query job.

    • In the jobReference field, set the location field appropriately for your job.

  2. To poll for results, call getQueryResults. Poll until jobComplete equals true. You can check for errors and warnings in the errors list.
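
As a sketch, the two calls can look like the following with the gcurl alias; the project ID, job ID, and query text are placeholders:

  # 1. Start a query job (jobs.insert). Set jobReference.location if
  #    BigQuery can't infer your data's location.
  gcurl -d '{"configuration": {"query": {
      "query": "SELECT asset_type, COUNT(*) AS n FROM `PROJECT_ID.DATASET_ID.TABLE_NAME` GROUP BY asset_type",
      "useLegacySql": false
    }}}' \
    https://bigquery.googleapis.com/bigquery/v2/projects/PROJECT_ID/jobs

  # 2. Poll for results (jobs.getQueryResults) with the jobId returned in the
  #    jobReference field of the response.
  gcurl "https://bigquery.googleapis.com/bigquery/v2/projects/PROJECT_ID/queries/JOB_ID"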