Exporting to BigQuery

This topic shows you how to export the asset metadata for your organization, folder, or project to a BigQuery table, and then run data analysis on your inventory. BigQuery provides a SQL-like experience for users to analyze data and produce meaningful insights without the use of custom scripts.

Before you begin

Before you begin, complete the following steps.

  1. Enable the Cloud Asset Inventory API on the project where you'll be running the API commands.
    Enable the Cloud Asset Inventory API

  2. Configure the permissions that are required to call the Cloud Asset Inventory API using either the gcloud tool or the API.

  3. Complete the following steps to set up your environment.

    gcloud

    To set up your environment to use the gcloud tool to call the Cloud Asset Inventory API, install the Cloud SDK on your local client.

    API

    To set up your environment to call the Cloud Asset Inventory API with the Unix curl command, complete the following steps.

    1. Install oauth2l on your local machine so you can interact with the Google OAuth system.
    2. Confirm that you have access to the Unix curl command.
    3. Ensure that you grant your account one of the following roles on your project, folder, or organization.

      • Cloud Asset Viewer role (roles/cloudasset.viewer)
      • Owner basic role (roles/owner)
  4. If you're exporting to a BigQuery dataset in a project that does not have the Cloud Asset Inventory API enabled, you must also grant the following roles to the service-${CONSUMER_PROJECT_NUMBER}@gcp-sa-cloudasset.iam.gserviceaccount.com service account in the destination project.

    • BigQuery Data Editor role (roles/bigquery.dataEditor)
    • BigQuery User role (roles/bigquery.user)

    The service account will be created by calling the API once, or you can use the following command:

      gcloud beta services identity create --service=cloudasset.googleapis.com --project=PROJECT_ID
    

  5. Create a BigQuery dataset.

Exporting an asset snapshot

To export an asset snapshot at a given timestamp, complete the following steps.

gcloud

To export assets in a project, run the following command. This command stores the exported snapshot in a BigQuery table at BIGQUERY_TABLE.

  gcloud asset export \
     --content-type CONTENT_TYPE \
     --project 'PROJECT_ID' \
     --snapshot-time 'SNAPSHOT_TIME' \
     --bigquery-table 'BIGQUERY_TABLE' \
     --output-bigquery-force

Where:

  • CONTENT_TYPE is the asset content type.
  • PROJECT_ID is the ID of the project that is having its metadata exported. This project can be the one from which you're running the export or a different project.
  • SNAPSHOT_TIME (Optional) is the time at which you want to take a snapshot of your assets. The value must be the current time or a time in the past. By default, a snapshot is taken at the current time. For information on time formats, see gcloud topic datetimes.
  • BIGQUERY_TABLE is the table to which you're exporting your metadata, in the format projects/PROJECT_ID/datasets/DATASET_ID/tables/TABLE_NAME.
  • --output-bigquery-force overwrites the destination table if it exists.

To export the assets of an organization or folder, you can use one of the following flags in place of --project.

access-policy can only be exported for an --organization.

API

To export the asset metadata in your project, run the following command. This command stores the exported snapshot in a BigQuery table named TABLE_NAME. Learn more about the exportAssets method.

gcurl -d '{"contentType":"CONTENT_TYPE", \
  "outputConfig":{ \
    "bigqueryDestination": { \
      "dataset": "projects/PROJECT_ID/datasets/DATASET_ID",\
      "table": "TABLE_NAME", \
      "force": true \
    } \
  }}' \
  https://cloudasset.googleapis.com/v1/projects/PROJECT_NUMBER:exportAssets

Setting the content type

Every BigQuery table is defined by a schema that describes the column names, data types, and other information. Setting the content type during the export determines the schema for your table.

  • Resource or unspecified: When you set the content type to RESOURCE or do not set the content type, you create a BigQuery table that has the schema shown in figure 1. Resource.data is the resource metadata represented as a JSON string.

  • IAM policy: When you set the content type to IAM_POLICY in the REST API or iam-policy in the gcloud tool, you create a BigQuery table that has the schema shown in figure 2. The iam_policy RECORD is fully expanded.

  • Organization policy: When you set the content type to ORG_POLICY in the REST API or org-policy in the gcloud tool, you create a BigQuery table that has the schema shown in figure 3.

  • VPCSC policy: When you set content type to ACCESS_POLICY in the REST API or access-policy in the gcloud tool, you create a BigQuery table that has the schema shown in figure 4.

Checking the status of an export

To check the status of an export, run the following commands.

gcloud

To check the status of the export, you can run the following command. It is displayed in the gcloud tool after running the export command.

gcloud asset operations describe OPERATION_ID

API

To view the status of your export, run the following command with the operation ID returned in the response to your export.

  1. You can find the OPERATION_ID in the name field of the response to the export, which is formatted as follows:

    "name": "projects/PROJECT_NUMBER/operations/ExportAssets/CONTENT_TYPE/OPERATION_ID"
    
  2. To check the status of your export, run following command with the OPERATION_ID:

    gcurl https://cloudasset.googleapis.com/v1/projects/PROJECT_NUMBER/operations/ExportAssets/CONTENT_TYPE/OPERATION_ID
    

Viewing an asset snapshot

To view the table containing the asset snapshot metadata, complete the following steps.

Console

  1. Go to the BigQuery page in the Cloud Console.
    Go to the BigQuery page

  2. To display the tables and views in the dataset, open the navigation panel. In the Resources section, select your project to expand it, and then select a dataset.

  3. From the list, select your table.

  4. Select Details and note the value in Number of rows. You may need this value to control the starting point for your results using the gcloud tool or API.

  5. To view a sample set of data, select Preview.

API

To browse your table's data, call tabledata.list. In the tableId parameter, specify the name of your table.

You can configure the following optional parameters to control the output.

  • maxResults is the maximum number of results to return.
  • selectedFields is a comma-separated list of columns to return; If unspecified, all columns are returned.
  • startIndex is the zero-based index of the starting row to read.

Values are returned wrapped in a JSON object that you must parse, as described in the tabledata.list reference documentation.

The export lists the assets and their resource names.

Querying an asset snapshot

After you export your snapshot to BigQuery, you can run queries on your asset metadata.

By default, BigQuery runs interactive, or on-demand, query jobs, which means that the query is executed as soon as possible. Interactive queries count towards your concurrent rate limit and your daily limit.

Query results are saved to either a temporary or permanent table. You can choose to append or overwrite data in an existing table or to create a new table, if none exists with the same name.

To run an interactive query that writes the output to a temporary table, complete the following steps.

Console

  1. Go to the BigQuery page in the Cloud Console.
    Go to the BigQuery page

  2. Select Compose new query.

  3. In the Query editor text area, enter a valid BigQuery SQL query.

  4. (Optional) To change the data processing location, complete the following steps.

    1. Select More, and then select Query settings.
    2. Under Processing location, select Auto-select, and then choose your data's location.
    3. To update the query settings, select Save.
  5. Select Run.

API

  1. To start a new job, call the jobs.insert method. In the job resource, set the following parameters.

    • In the configuration field, set the query field to a JobConfigurationQuery that describes the BigQuery query job.

    • In the jobReference field, set the location field appropriately for your job.

  2. To poll for results, call getQueryResults. Poll until jobComplete equals true. You can check for errors and warnings in the errors list.

Example queries

This section lists example SQL queries. For more information on BigQuery syntax, see Standard SQL Query Syntax.

Query directly on available columns

To find the quantity of each asset type, run the following query.

SELECT asset_type, COUNT(*) AS asset_count
FROM `PROJECT_ID.DATASET_ID.TABLE_NAME`
GROUP BY asset_type
ORDER BY asset_count DESC

Query on repeated fields

To find Identity and Access Management (IAM) policies that contain contain Gmail accounts as a member, run the following query. BigQuery uses UNNEST to flatten repeated fields into a table that you can query directly.

SELECT name, asset_type, bindings.role
FROM `PROJECT_ID.DATASET_ID.TABLE_NAME`
JOIN UNNEST(iam_policy.bindings) AS bindings
JOIN UNNEST(bindings.members) AS members
WHERE members like "%@gmail.com"

To find an organization, folder, or project that allows creating using a public IP, run the following query. This query is useful because allowing public IPs with Cloud SQL instances can introduce vulnerabilities unless SSL or a proxy is configured.

SELECT name
FROM `PROJECT_ID.DATASET_ID.TABLE_NAME`
JOIN UNNEST(org_policy) AS op
WHERE
  op.constraint = "constraints/sql.restrictPublicIp"
  AND (op.boolean_policy IS NULL OR op.boolean_policy.enforced = FALSE);

To find an organization, folder, or project within the same VPC Service Controls service perimeter of a project, run the following query.

SELECT service_perimeter.title, service_perimeter.status.resources
FROM `PROJECT_ID.DATASET_ID.TABLE_NAME`
CROSS JOIN UNNEST(service_perimeter.status.resources) as resource
WHERE resource = "projects/PROJECT_NUMBER";

Query on extracted JSON string

To find open firewall rules, run the following query. Learn more about JSON functions used in BigQuery.

CREATE TEMP FUNCTION json2array(json STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
  return JSON.parse(json).map(x=>JSON.stringify(x));
""";

SELECT firewall.name, firewall.resource.parent, JSON_EXTRACT(firewall.resource.data, '$.sourceRanges') AS sourceRanges
FROM `PROJECT_ID.DATASET_ID.TABLE_NAME` AS firewall
JOIN UNNEST(json2array(JSON_EXTRACT(firewall.resource.data, '$.sourceRanges'))) AS source_ranges
WHERE asset_type="compute.googleapis.com/Firewall" AND JSON_EXTRACT(firewall.resource.data, '$.sourceRanges') IS NOT NULL AND JSON_EXTRACT_SCALAR(source_ranges, '$') = "0.0.0.0/0"

Query on joined resource types

To join tables of different resource types, run the following query. The following example shows how to find all subnetworks that have no VM attached. First, the query finds all subnetworks. Then, from that list, it selects those subnetworks whose selfLinks are not present.

SELECT name, JSON_EXTRACT(subnetwork.resource.data, '$.selfLink') AS selflink
FROM `PROJECT_ID.DATASET_ID.TABLE_NAME` AS subnetwork
WHERE asset_type = "compute.googleapis.com/Subnetwork" AND (JSON_EXTRACT(subnetwork.resource.data, '$.selfLink') NOT IN
(SELECT DISTINCT JSON_EXTRACT(network_interfaces, '$.subnetwork')
FROM `PROJECT_ID.DATASET_ID.TABLE_NAME` as instance
JOIN UNNEST(json2array(JSON_EXTRACT(instance.resource.data, '$.networkInterfaces'))) AS network_interfaces
WHERE asset_type ="compute.googleapis.com/Instance"
AND JSON_EXTRACT(instance.resource.data, '$.networkInterfaces') IS NOT NULL
)) IS NULL