Exporting assets to BigQuery

This page explains how to export all the asset metadata of your project to a BigQuery table and run data analysis against your inventory.

Overview

BigQuery provides a SQL-like experience for users to analyze data and produce meaningful insights without the need of custom scripts. You can export Cloud Asset Inventory asset snapshots to BigQuery, enabling deeper data analysis of you exported metadata.

This allows you to get answers to questions like:

  • What instances have the label "prod"?
  • How many firewalls in my organization apply to a specific network tag?
  • What Cloud Storage buckets are open to the public?

Before you begin

gcloud

  1. Enable the Cloud Asset API before you can use the gcloud tool to access Cloud Asset Inventory. Note that the API only needs to be enabled on the project you'll be running Cloud Asset API commands from.
    Enable the Cloud Asset Inventory API
  2. Install the Cloud SDK on your local client.
  3. Create a BigQuery dataset under your consumer project.
  4. If you're exporting to a BigQuery table in another project, grant the Cloud Asset Inventory agent the following roles.
    • roles/bigquery.dataEditor
    • roles/bigquery.user

api

  1. Enable the Cloud Asset API to access Cloud Asset Inventory. Note that the API only needs to be enabled on the project you'll be running Cloud Asset API commands from.
    Enable the Cloud Asset Inventory API
  2. Install oauth2l on your local machine for interacting with the Google OAuth system.
  3. Confirm that you have access to the Unix curl command.
  4. Ensure that you grant your account one of the following roles on your project, folder, or organization.
    • roles/owner
    • roles/cloudasset.viewer
  5. Create a BigQuery dataset under your consumer project. If you're exporting to a BigQuery in another project, give the Cloud Asset Inventory agent the role of BigQuery Data Editor and BigQuery User role.

Initial setup

To set up your environment to call the Cloud Asset API:

gcloud

Set up the gcloud command-line tool

To get started with the gcloud tool, review the Cloud SDK Documentation. You can get help for the tool, resources, and commands by using the --help flag:

gcloud asset --help

The help displayed with the --help flag is also available in the Cloud SDK reference for gcloud asset.

Configure an account

To call the Cloud Asset API, you need to configure either a user account or a service account.

Configuring a user account

  1. Log in with your user account using the following command.

    gcloud auth login USER_ACCOUNT_EMAIL
    

  2. Optional. If the target project you want to call the Cloud Asset API on isn't the same as your Cloud Asset Inventory enabled project, specify your project with the following command.

    gcloud asset --billing-project PROJECT_ID
    

  3. Grant your user account the cloudasset.viewer Cloud IAM role on the project whose metadata you want to export. This project can be the same as your Cloud Asset API enabled project.

    gcloud projects add-iam-policy-binding EXPORT_TARGET_PROJECT_ID \
           --member user:USER_ACCOUNT_EMAIL \
           --role roles/cloudasset.viewer
    

Configuring a service account

This service account should be created for the project you're running Cloud Asset API commands from.

  1. If you don't already have a service account, in the project that is Cloud Asset API enabled, create a new service account with the following command.

    gcloud iam service-accounts create SERVICE_ACCOUNT_NAME \
           --display-name "SERVICE_ACCOUNT_DISPLAY_NAME"
    

  2. Create a private key for your service account.

    gcloud iam service-accounts keys create YOUR_FILE_PATH/key.json \
           --iam-account SERVICE_ACCOUNT_NAME@PROJECT_ID.iam.gserviceaccount.com
    

  3. Activate your service account for use with the gcloud tool with the following command.

    gcloud auth activate-service-account SERVICE_ACCOUNT_NAME@PROJECT_ID.iam.gserviceaccount.com \
           --key-file=YOUR_FILE_PATH/key.json
    

  4. Grant your new service account the cloudasset.viewer Cloud IAM role on a project whose metadata you want to export. This project can be the same as your Cloud Asset API enabled project.

    gcloud projects add-iam-policy-binding EXPORT_TARGET_PROJECT_ID \
           --member serviceAccount:SERVICE_ACCOUNT_NAME@PROJECT_ID.iam.gserviceaccount.com \
           --role roles/cloudasset.viewer
    

api

Downloading the credentials file

A JSON credentials file is needed to call the Cloud Asset API. Download the file by following the process below.

  1. Go to the Credentials page.

  2. Open the Create Credentials dropdown and select OAuth client ID.

  3. If you are creating a Client ID for a new project, you must set up the OAuth consent screen. The consent screen is displayed any time an application using your Client ID requests access to private data. If prompted:

    1. Click Configure consent screen and enter in the required information for your consent screen.

    2. Save your changes to return to creating your Client ID.

  4. On the Create client ID page under Application type, select Other.

  5. Enter a name for the credential, then click Create. A confirmation dialog appears with a client ID and client secret.

  6. Close the confirmation dialog and click the download icon on the right to save your new Client ID JSON file.

  7. Name and move the downloaded JSON file so that the path is ~/credentials.json.

Pereparing your environment

Prepare you environment for making calls to the Cloud Asset API by following the process below.

  1. Verify your initial setup with the following command.

    oauth2l header --json ~/credentials.json cloud-platform
    

    You should see an output similar to the following:

    Authorization: Bearer y29.xxxxxxx
    
  2. Define a shell alias to call Google REST APIs from with the following command.

    alias gcurl='curl -H "$(oauth2l header --json ~/credentials.json \
    cloud-platform)" -H "Content-Type: application/json" '
    

Setting permissions

If you're running the export command on your Cloud Asset Inventory-enabled project, no addition permissions are needed. By enabling the Cloud Asset API on your project a service account, named service-${CONSUMER_PROJECT_NUMBER}@gcp-sa-cloudasset.iam.gserviceaccount.com is automatically granted the permissions for BigQuery export with the Cloud Asset Service Agent role.

If you're exporting a snapshot to a BigQuery table in another project , grant the Cloud Asset Inventory service account of your Cloud Asset Inventory-enabled project the BigQuery Data Editor and BigQuery User roles.

Exporting an asset snapshot

To export an asset snapshot at a given timestamp:

gcloud

The following example exports assets within a project. To export the assets of an organization or folder, use the --organization=ORGANIZATION_ID or --organization=FOLDER_ID flag in your command.

  1. Export asset metadata within your project with the following command. This stores the exported snapshot in a BigQuery table at BIGQUERY_TABLE.

    gcloud asset export --project='PROJECT_ID' --snapshot-time='SNAPSHOT_TIME'  --bigquery-table='BIGQUERY_TABLE' --output-bigquery-force --content-type='resource'
    

    Where:

    • PROJECT_ID is the ID of the project that is having its metadata exported. This project can be either the Cloud Asset API enabled project you're running the export from, or a different project.
    • SNAPSHOT_TIME is optional. The value must be current time or a time in the past that you want to take a snapshot of your assets at. By default, a snapshot is taken at the current time.
    • BIGQUERY_TABLE is the table you're exporting metadata to , in format of projects/PROJECT_ID/datasets/DATASET_ID/tables/TABLE_NAME.
  2. Optional. Run the command displayed in the gcloud tool that appears after running the export command to check the status of the export.

    gcloud asset operations describe OPERATION_NUMBER
    

API

To export an asset snapshot for an organization, use the https://cloudasset.googleapis.com/v1/organizations/ORGANIZATION_NUMBER:exportAssets REST method in the following gcurl commands.

To export an asset snapshot for a folder, use the https://cloudasset.googleapis.com/v1/folders/FOLDER_NUMBER:exportAssets REST method.

  1. Export asset metadata within your project with the following command. This stores the exported snapshot in a BigQuery table named TABLE_NAME.

    gcurl -d '{"outputConfig":{"bigqueryDestination": \
          {"dataset":"projects/PROJECT_ID/datasets/DATASET_ID" ,\
            "table": "TABLE_NAME", "force": true}}}' \
           https://cloudasset.googleapis.com/v1/projects/PROJECT_NUMBER:exportAssets
    
  2. Optional. Use the operation number returned in the response of your export to view the status of your export.

    1. Get the operation number from the name field of the response.
    "name": "projects/PROJECT_NUMBER/operations/ExportAssets/CONTENT_TYPE/OPERATION_NUMBER"
    
    1. Enter the operation number into the following command.
    gcurl https://cloudasset.googleapis.com/v1/projects/PROJECT_NUMBER/operations/ExportAssets/CONTENT_TYPE/OPERATION_NUMBER

BigQuery table schema

  • Resource/Unspecified If you set the content type to RESOURCE or do not set content type, the resulting BigQuery table will have the following schema: Resource_schema Resource.data is the resource metadata as a Json string.

  • IAM Policy If you set content type to IAM_POLICY, the resulting BigQuery table will have the following schema: Iam_schema1 Iam_schema2 Iam_schema3

With the RECORD iam_policy fully expanded.

Viewing an asset snapshot

To view the table your asset snapshot was exported to:

Console

  1. Open the BigQuery web UI in the Cloud Console.
    Go to the Cloud Console

  2. In the navigation panel, in the Resources section, expand your project and select a dataset. This displays the tables and views in the dataset.

  3. Click your table in the list.

  4. Cick Details and note the value in Number of rows. You may need this value to control the starting point for your results using the CLI or API.

  5. Click Preview. A sample set of data is displayed.

API

Browse through your table's data by calling tabledata.list. Specify the name of your table in the tableId parameter.

Configure these optional parameters to control the output:

  • maxResults — Maximum number of results to return
  • selectedFields — Comma-separated list of columns to return; If unspecified, all columns are returned
  • startIndex — Zero-based index of the starting row to read

Values are returned wrapped in a JSON object that you must parse, as described in the tabledata.list reference documentation.

The export lists the assets and their resource names.

Querying an asset snapshot

You can run queries on your asset metadata once you've exported your snapshot to BigQuery.

By default, BigQuery runs interactive (on-demand) query jobs, which means that the query is executed as soon as possible. Interactive queries count towards your concurrent rate limit and your daily limit.

Query results are always saved to either a temporary or permanent table. You can choose whether to append or overwrite data in an existing table or whether to create a new table if none exists with the same name.

To run an interactive query that writes to a temporary table:

Console

  1. Open the BigQuery web UI in the Cloud Console.
    Go to the Cloud Console

  2. Click Compose new query.

    Compose new query

  3. Enter a valid BigQuery SQL query in the Query editor text area.

  4. (Optional) To change the data processing location, click More, then Query settings. Under Processing location, click Auto-select and choose your data's location. Finally, click Save to update the query settings.

  5. Click Run.

This creates a query job that writes the output to a temporary table.

API

To run a query using the API, insert a new job and populate the jobs#configuration.query property. Specify your location in the location property in the jobReference section of the job resource.

Poll for results by calling getQueryResults. Poll until jobComplete equals true. Check for errors and warnings in the errors list.

Example queries

The following are example SQL queries that you can use to provide insight into your asset metadata.

Query directly on available columns

For more information on BigQuery syntax, see Standard SQL Query Syntax.

To find the quantity of each asset type:

SELECT asset_type, COUNT(*) AS asset_count
FROM ’PROJECT_ID.DATASET_ID.TABLE_NAME’
GROUP BY asset_type
ORDER BY asset_count DESC

Query on repeated fields (array)

BigQuery uses UNNEST to flatten repeated fields into a table that you can query directly.

To find Cloud IAM policies that contain contain Gmail accounts as a member:

SELECT name, asset_type, bindings.role
FROM `PROJECT_ID.DATASET_ID.TABLE_NAME`
JOIN UNNEST(iam_policy.bindings) AS bindings
JOIN UNNEST(bindings.members) AS members
WHERE members like "%@gmail.com"

Query on extracted JSON string

You can find JSON functions used in BigQuery from the public link.

To find open firewall rules, use the following query:

CREATE TEMP FUNCTION json2array(json STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
  return JSON.parse(json).map(x=>JSON.stringify(x));
""";

SELECT firewall.name, firewall.resource.parent, JSON_EXTRACT(firewall.resource.data, '$.sourceRange') AS sourceRange
FROM `PROJECT_ID.DATASET_ID.TABLE_NAME` AS firewall
JOIN UNNEST(json2array(JSON_EXTRACT(firewall.resource.data, '$.sourceRange'))) AS source_range
WHERE asset_type="compute.googleapis.com/Firewall" AND JSON_EXTRACT(firewall.resource.data, '$.sourceRange') IS NOT NULL AND JSON_EXTRACT_SCALAR(source_range, '$') = "0.0.0.0/0"

Query on joined resource types

Sometimes you need to join tables of different resource types to find insight info. The example below demonstrates how to find all subnetworks that have no VM attached by first trying to find all subnetworks from an instance's metadata as Table A. Then, for each subnetwork, select those whose selfLinks are not in Table A:

SELECT name, JSON_EXTRACT(subnetwork.resource.data, '$.selfLink') AS selflink
FROM `PROJECT_ID.DATASET_ID.TABLE_NAME` AS subnetwork
WHERE asset_type = "compute.googleapis.com/Subnetwork" AND (JSON_EXTRACT(subnetwork.resource.data, '$.selfLink') NOT IN
(SELECT DISTINCT JSON_EXTRACT(network_interface, '$.subnetwork')
FROM `PROJECT_ID.DATASET_ID.TABLE_NAME` as instance
JOIN UNNEST(json2array(JSON_EXTRACT(instance.resource.data, '$.networkInterface'))) AS network_interface
WHERE asset_type ="compute.googleapis.com/Instance"
AND JSON_EXTRACT(instance.resource.data, '$.networkInterface') IS NOT NULL
)) IS NULL
Oliko tästä sivusta apua? Kerro mielipiteesi

Palautteen aihe:

Tämä sivu
Cloud Asset Inventory Documentation