当结果转移到 BigQuery 时,数据会写入新表或现有表。如需详细了解操作,请参阅操作概念主题。
查询示例
您可以使用以下示例查询来分析您的结果。您还可以在 Google Data Studio 等可视化工具中使用查询。这些查询可帮助您开始查询您的结果数据。
在以下每个查询中,将 [DATASET]
替换为 BigQuery 数据集名称,将 [TABLE_ID]
替换为表 ID,将 [PROJECT_ID]
替换为项目标识符。
选择每个 infoType 的数量
Google Cloud Console
SELECT info_type.name, COUNT(info_type.name) AS count FROM [PROJECT_ID].[DATASET].[TABLE_ID], GROUP BY info_type.name ORDER BY count DESC;
命令行
bq query --use_legacy_sql=false ' SELECT info_type.name, COUNT(info_type.name) AS count FROM [PROJECT_ID].[DATASET].[TABLE_ID], GROUP BY info_type.name ORDER BY count DESC;'
按天选择每个 infoType 的数量
Google Cloud Console
SELECT info_type.name, cast(TIMESTAMP_SECONDS(create_time.seconds) as date) as day, COUNT(locations.container_name) AS count FROM [PROJECT_ID].[DATASET].[TABLE_ID], UNNEST(location.content_locations) AS locations GROUP BY info_type.name, day ORDER BY count DESC;
命令行
bq query --use_legacy_sql=false ' SELECT info_type.name, cast(TIMESTAMP_SECONDS(create_time.seconds) as date) as day, COUNT(locations.container_name) AS count FROM [PROJECT_ID].[DATASET].[TABLE_ID], UNNEST(location.content_locations) AS locations GROUP BY info_type.name, day ORDER BY count DESC;'
选择每个容器中各个 infoType 的数量
Google Cloud Console
SELECT info_type.name, locations.container_name, COUNT(locations.container_name) AS count FROM [PROJECT_ID].[DATASET].[TABLE_ID], UNNEST(location.content_locations) AS locations GROUP BY locations.container_name, info_type.name ORDER BY count DESC;
命令行
bq query --use_legacy_sql=false ' SELECT info_type.name, locations.container_name, COUNT(locations.container_name) AS count FROM [PROJECT_ID].[DATASET].[TABLE_ID], UNNEST(location.content_locations) AS locations GROUP BY locations.container_name,info_type.name ORDER BY count DESC;'
选择为表的每一列查找的结果类型
此查询将按列名称对所有结果进行分组,旨在处理来自 BigQuery 检查作业的结果,对于识别给定列的可能性类型会很有用。您可以通过修改 WHERE 和 HAVING 子句来调整设置。例如,如果多个表结果包含在结果表中,您可以将其限制为仅一个作业运行或一个表名称。
Google Cloud Console
SELECT table_counts.field_name, STRING_AGG( CONCAT(" ",table_counts.name," [count: ",CAST(table_counts.count_total AS String),"]") ORDER BY table_counts.count_total DESC) AS infoTypes FROM ( SELECT locations.record_location.field_id.name AS field_name, info_type.name, COUNT(*) AS count_total FROM [PROJECT_ID].[DATASET].[TABLE_ID], UNNEST(location.content_locations) AS locations WHERE (likelihood = 'LIKELY' OR likelihood = 'VERY_LIKELY' OR likelihood = 'POSSIBLE') GROUP BY locations.record_location.field_id.name, info_type.name HAVING count_total>200 ) AS table_counts GROUP BY table_counts.field_name ORDER BY table_counts.field_name
上面的查询可能为示例表生成类似这样的结果,其中 infoType 列显示为该指定列找到的每个 infoType 的实例数量。
field_name | infoType |
---|---|
field1 |
CUSTOM_USER_US [count: 7004] , CUSTOM_USER_EU [count: 2996] |
field2 |
US_VEHICLE_IDENTIFICATION_NUMBER [count: 9597] |
field3 |
EMAIL_ADDRESS [count: 10000] |
field4 |
IP_ADDRESS [count: 10000] |
field5 |
PHONE_NUMBER [count: 7815] |
field6 |
US_SOCIAL_SECURITY_NUMBER [count: 10000] |
field7 |
CREDIT_CARD_NUMBER [count: 10000] |