Querying Cloud DLP findings in BigQuery

When your findings are transferred to BigQuery, the data is written to either a new or existing table. For more information about actions, see the Actions conceptual topic.

Sample queries

You can use the following sample queries to analyze your findings. You can also use the queries in a visualization tool such as Google Data Studio. These queries are provided to help you get started querying your findings data.

In each of the following queries, replace [DATASET] with the BigQuery dataset name, [TABLE_ID] with the table ID, and [PROJECT_ID] with the project identifier.

Select the count of each infoType

GCP Console

SELECT info_type.name,
COUNT(info_type.name) AS count
FROM [PROJECT_ID].[DATASET].[TABLE_ID],
GROUP BY info_type.name
ORDER BY count DESC;

Command-line

bq query --use_legacy_sql=false ' SELECT info_type.name,
COUNT(info_type.name) AS count
FROM [PROJECT_ID].[DATASET].[TABLE_ID],
GROUP BY info_type.name ORDER BY count DESC;'

Select the count of each infoType by day

GCP Console

SELECT info_type.name, cast(TIMESTAMP_SECONDS(create_time.seconds) as date) as day,
COUNT(locations.container_name) AS count
FROM [PROJECT_ID].[DATASET].[TABLE_ID],
UNNEST(location.content_locations) AS locations
GROUP BY info_type.name, day
ORDER BY count DESC;

Command-line

bq query --use_legacy_sql=false ' SELECT info_type.name,
cast(TIMESTAMP_SECONDS(create_time.seconds) as date) as day,
COUNT(locations.container_name) AS count FROM [PROJECT_ID].[DATASET].[TABLE_ID],
UNNEST(location.content_locations) AS locations
GROUP BY info_type.name, day ORDER BY count DESC;'

Selects the count of each infoType in each container

GCP Console

SELECT info_type.name, locations.container_name,
COUNT(locations.container_name) AS count
FROM [PROJECT_ID].[DATASET].[TABLE_ID],
UNNEST(location.content_locations) AS locations
GROUP BY locations.container_name, info_type.name
ORDER BY count DESC;

Command-line

bq query --use_legacy_sql=false ' SELECT info_type.name, locations.container_name,
COUNT(locations.container_name) AS count FROM [PROJECT_ID].[DATASET].[TABLE_ID],
UNNEST(location.content_locations) AS locations
GROUP BY locations.container_name,info_type.name ORDER BY count DESC;'

Selects the finding types found for each column of a table

This query will group all the findings by column name and is intended to work on findings from a BigQuery inspection job, useful if you are trying to identify the likely types for a given column. You can adjust settings by modifying the WHERE and HAVING clauses. For example, if multiple table results are included in your findings table, you can limit these to just one job run or one table name.

GCP Console

SELECT
  table_counts.field_name,
  STRING_AGG( CONCAT(" ",table_counts.name," [count: ",CAST(table_counts.count_total AS String),"]")
  ORDER BY
    table_counts.count_total DESC) AS infoTypes
FROM (
  SELECT
    locations.record_location.field_id.name AS field_name,
    info_type.name,
    COUNT(*) AS count_total
  FROM
    [PROJECT_ID].[DATASET].[TABLE_ID],
    UNNEST(location.content_locations) AS locations
  WHERE
    (likelihood = 'LIKELY'
      OR likelihood = 'VERY_LIKELY'
      OR likelihood = 'POSSIBLE')
  GROUP BY
    locations.record_location.field_id.name,
    info_type.name
  HAVING
    count_total>200 ) AS table_counts
GROUP BY
  table_counts.field_name
ORDER BY
  table_counts.field_name

The above query might produce a result like this for a sample table, where the infoTypes column tells us how many instances of each infoType was found for that given column.

field_name infoTypes
field1 CUSTOM_USER_US [count: 7004], CUSTOM_USER_EU [count: 2996]
field2 US_VEHICLE_IDENTIFICATION_NUMBER [count: 9597]
field3 EMAIL_ADDRESS [count: 10000]
field4 IP_ADDRESS [count: 10000]
field5 PHONE_NUMBER [count: 7815]
field6 US_SOCIAL_SECURITY_NUMBER [count: 10000]
field7 CREDIT_CARD_NUMBER [count: 10000]
Hai trovato utile questa pagina? Facci sapere cosa ne pensi:

Invia feedback per...

Cloud Data Loss Prevention