If you configured the sensitive data discovery service to send all successfully generated data profiles to BigQuery, you can query those data profiles to gain insights about your data. You can also use visualization tools like Looker Studio to build custom reports tailored to your business needs. Alternatively, you can use a premade report that Sensitive Data Protection provides, tune it, and share it as needed.
This page provides example SQL queries that you can use to learn more about your data profiles. It also shows you how you can visualize data profiles in in Looker Studio.
For more information on data profiles, see Data profiles.
Before you begin
This page assumes that you have configured profiling at the organization, folder, or project level. In your discovery scan configuration, make sure to turn on data profile exporting by clicking the Save data profile copies to BigQuery toggle. For more information about how to create a discovery scan configuration, see Create a scan configuration.
In this document, the table that contains the exported data profiles is called the output table.
Make sure you have the project ID, dataset ID, and table ID of the output table readily available. You need them to perform the procedures on this page.
The latest
view
When Sensitive Data Protection exports data profiles to your output table, it
also creates the latest
view. This view is a
prefiltered virtual table that includes only the latest snapshots of your data
profiles. The latest
view has the same schema as the output table, so you can
use the two interchangeably in your SQL queries and Looker Studio
reports. The results can differ because the output table contains older
snapshots of the data profiles.
The latest
view is stored in the same location as the output table. Its name
has the following format:
OUTPUT_TABLE_latest_VERSION
Replace the following:
- OUTPUT_TABLE: the ID of the table that contains the exported data profiles.
- VERSION: the version number of the view.
For example, if your output table's name is table-profile
, then the latest
view has a name like table-profile_latest_v1
.
When using the latest
view in SQL queries, use the view's full name, which
includes the project ID, dataset ID, table ID, and suffix—for example,
myproject.mydataset.table-profile_latest_v1
.
PROJECT_ID.DATASET_ID.OUTPUT_TABLE_latest_VERSION
Choose between the output table and the latest
view
The latest
view includes only the latest data profile snapshots, while the
output table has all data profile snapshots, including snapshots that are
outdated. For example, a query on the output table can return multiple
column data profiles
for the same column—one for each time that column was profiled.
When choosing between using the output table and the latest
view in your SQL queries or Looker Studio reports, consider the
following:
The
latest
view is useful if you have data assets that have been reprofiled and you only want to see the latest profiles—not their earlier versions. That is, you want to see the current state of your profiled data.The output table is useful if you want to get a historical view of your profiled data. For example, you're trying to determine whether your organization has ever stored a particular infoType, or you want to see the changes a particular data profile has undergone.
Sample SQL queries
This section provides example queries that you can use when analyzing data profiles. To run these queries, see Running interactive queries.
In the following examples, replace TABLE_OR_VIEW with either of the following:
- The name of the output table, which is the table that contains the exported
data profiles—for example,
myproject.mydataset.table-profile
. - The name of the
latest
view of the output table—for example,myproject.mydataset.table-profile_latest_v1
.
In either case, you must include the project ID and the dataset ID.
For more information, see
Choose between the output table and the latest
view on this
page.
To troubleshoot any errors you encounter, see Error messages.
List all columns that have a high free-text score and evidence of other infoType matches
SELECT
column_profile.table_full_resource,
column_profile.COLUMN,
other_matches.info_type.name,
column_profile.profile_last_generated
FROM
`TABLE_OR_VIEW`
LEFT JOIN UNNEST(column_profile.other_matches) AS other_matches
WHERE
column_profile.free_text_score = 1
AND ( column_profile.column_info_type.info_type.name>""
OR ARRAY_LENGTH(column_profile.other_matches)>0 )
For information about how to remediate these findings, see Recommended strategies for mitigating data risk.
For more information about the Free text score and Other infoTypes metrics, see Column data profiles.
List all tables that contain a column of credit card numbers
SELECT
column_profile.table_full_resource,
column_profile.profile_last_generated
FROM
`TABLE_OR_VIEW`
WHERE
column_profile.column_info_type.info_type.name="CREDIT_CARD_NUMBER"
CREDIT_CARD_NUMBER
is a built-in infoType
that represents a credit card number.
For information about how to remediate these findings, see Recommended strategies for mitigating data risk.
List table profiles that contain columns of credit card numbers, US Social Security numbers, and person names
SELECT
table_full_resource,
COUNT(*) AS count_findings
FROM (
SELECT
DISTINCT column_profile.table_full_resource,
column_profile.column_info_type.info_type.name
FROM
`TABLE_OR_VIEW`
WHERE
column_profile.column_info_type.info_type.name IN ('PERSON_NAME',
'CREDIT_CARD_NUMBER',
'US_SOCIAL_SECURITY_NUMBER')
ORDER BY
column_profile.table_full_resource ) ot1
GROUP BY
table_full_resource
#increase this number to match the total distinct infoTypes that must be present
HAVING
count_findings>=3
This query uses the following built-in infoTypes:
CREDIT_CARD_NUMBER
: represents a credit card numberPERSON_NAME
: represents the full name of a personUS_SOCIAL_SECURITY_NUMBER
represents a US Social Security number
For information about how to remediate these findings, see Recommended strategies for mitigating data risk.
List buckets where the sensitivity score is SENSITIVITY_HIGH
SELECT file_store_profile.file_store_path, file_store_profile.resource_visibility, file_store_profile.sensitivity_score
FROM `TABLE_OR_VIEW`
WHERE file_store_profile.sensitivity_score.score ='SENSITIVITY_HIGH'
;
For more information, see File store data profiles.
List all bucket paths, clusters, and file extensions scanned where the sensitivity score is SENSITIVITY_HIGH
SELECT file_store_profile.file_store_path, summaries.file_cluster_type.cluster, STRING_AGG(scanned_file_extensions.file_extension) AS scanned_extensions, file_store_profile.profile_last_generated.timestamp
FROM `TABLE_OR_VIEW`
LEFT JOIN UNNEST(file_store_profile.file_cluster_summaries) as summaries
LEFT JOIN UNNEST(summaries.file_store_info_type_summaries) as info_types
LEFT JOIN UNNEST(summaries.file_extensions_scanned) as scanned_file_extensions
WHERE file_store_profile.data_source_type.data_source = 'google/storage/bucket'
AND summaries.sensitivity_score.score ='SENSITIVITY_HIGH'
GROUP BY 1, 2, 4
;
For more information, see File store data profiles.
List all bucket paths, clusters, and file extensions scanned where credit card numbers were detected
SELECT file_store_profile.file_store_path, summaries.file_cluster_type.cluster, STRING_AGG(scanned_file_extensions.file_extension) AS scanned_extensions
FROM `TABLE_OR_VIEW`
LEFT JOIN UNNEST(file_store_profile.file_cluster_summaries) as summaries
LEFT JOIN UNNEST(summaries.file_store_info_type_summaries) as info_types
LEFT JOIN UNNEST(summaries.file_extensions_scanned) as scanned_file_extensions
WHERE file_store_profile.data_source_type.data_source = 'google/storage/bucket'
AND info_types.info_type.name='CREDIT_CARD_NUMBER'
GROUP BY 1, 2
;
CREDIT_CARD_NUMBER
is a built-in infoType
that represents a credit card number.
For more information, see File store data profiles.
List all bucket paths, clusters, and file extensions scanned where a credit card number, person name, or US Social Security number was detected
SELECT file_store_profile.file_store_path, summaries.file_cluster_type.cluster, STRING_AGG(scanned_file_extensions.file_extension) AS scanned_extensions
FROM `TABLE_OR_VIEW`
LEFT JOIN UNNEST(file_store_profile.file_cluster_summaries) as summaries
LEFT JOIN UNNEST(summaries.file_store_info_type_summaries) as info_types
LEFT JOIN UNNEST(summaries.file_extensions_scanned) as scanned_file_extensions
WHERE file_store_profile.data_source_type.data_source = 'google/storage/bucket'
AND info_types.info_type.name IN ('CREDIT_CARD_NUMBER', 'PERSON_NAME', 'US_SOCIAL_SECURITY_NUMBER')
GROUP BY 1, 2
;
This query uses the following built-in infoTypes:
CREDIT_CARD_NUMBER
: represents a credit card numberPERSON_NAME
: represents the full name of a personUS_SOCIAL_SECURITY_NUMBER
represents a US Social Security number
For more information, see File store data profiles.
Work with data profiles in Looker Studio
To visualize your data profiles in Looker Studio, you can use a premade report, or you can create your own.
Use a premade report
Sensitive Data Protection provides a premade Looker Studio report that highlights the rich insights of data profiles. The Sensitive Data Protection Dashboard is a multi-page report that gives you a quick high-level view of your data profiles, including breakdowns by risk, by infoType, and by location. Explore the other tabs to see views by geographic region and posture risk, or drill down to specific metrics. You can use this premade report as is, or you can customize it as needed. This is the recommended version of the premade report.
To view the premade report with your data, enter the required values in the following URL. Then, copy the resulting URL to your browser.
https://lookerstudio.google.com/c/u/0/reporting/create?c.reportId=c9826374-e016-4c96-a495-7281328375c6&ds.connector=BIG_QUERY&ds.projectId=PROJECT_ID&ds.datasetId=DATASET_ID&ds.tableId=TABLE_OR_VIEW&ds.type=TABLE&ds.useFreshSchema=false
Replace the following:
- PROJECT_ID: the project that contains the output table.
- DATASET_ID: the dataset that contains the output table.
TABLE_OR_VIEW: either of the following:
- The name of the output table, which is the table that contains the exported
data profiles—for example,
myproject.mydataset.table-profile
. - The name of the
latest
view of the output table—for example,myproject.mydataset.table-profile_latest_v1
.
For more information, see Choose between the output table and the
latest
view on this page.- The name of the output table, which is the table that contains the exported
data profiles—for example,
It can take a few minutes for Looker Studio to load the report with your data. If you encounter errors or if the report doesn't load, see Troubleshoot errors with the premade report on this page.
In the following example, the dashboard shows that low-sensitivity and high-sensitivity data are present in multiple countries around the world.
Earlier version of the premade report
The first version of the premade report is still available at the following address:
https://lookerstudio.google.com/c/u/0/reporting/create?c.reportId=907a2b73-ffe4-40b2-b9a1-c2aa0bbd69fd&ds.connector=BIG_QUERY&ds.projectId=PROJECT_ID&ds.datasetId=DATASET_ID&ds.tableId=TABLE_OR_VIEW&ds.type=TABLE&ds.useFreshSchema=false
Create a report
Looker Studio lets you create interactive reports. In this section, you create a simple table report in Looker Studio that is based on the data profiles exported to your output table in BigQuery.
Make sure you have the
project ID, dataset ID, and table ID of the output table or the
latest
view readily available. You need them to perform this
procedure.
This example shows you how to create a report containing a table that shows each infoType reported in your data profiles and its corresponding frequency.
In general, you incur BigQuery usage costs when accessing BigQuery through Looker Studio. For more information, see Visualizing BigQuery data using Looker Studio.
To create a report, do the following:
- Open Looker Studio and log in.
- Click Blank Report.
- On the Connect to data tab, click the BigQuery card.
- If prompted, authorize Looker Studio to access your BigQuery projects.
Connect to your BigQuery data:
- For Project, select the project that contains the output table. You can search for the project in the Recent projects, My projects, and Shared projects tabs.
- For Dataset, select the dataset that contains the output table.
For Table, select either the output table or the
latest
view of the output table.For more information, see Choose between the output table and the
latest
view on this page.Click Add.
In the dialog that appears, click Add to report.
To add a table that shows each infoType reported and its corresponding frequency (record count), follow these steps:
- Click Add a chart.
- Select a table style.
Click the area where you want to position the chart.
The chart appears in table format.
Resize the table as needed.
As long as the table is selected, its properties appear in the Chart pane.
In the Chart pane, on the Setup tab, remove any preselected dimensions and metrics.
For Dimension, add
column_profile.column_info_type.info_type.name
orfile_store_profile.file_cluster_summaries.file_store_info_type_summaries.info_type.name
.These examples provide data at the column and file cluster levels. You can try other dimensions as well. For example, you can use table-level and bucket-level dimensions.
For Metric, add Record Count.
The resulting table looks similar to the following:
Learn more about tables in Looker Studio.
Troubleshoot errors with the premade report
If you see any errors, missing controls, or missing charts when loading the premade report, make sure that the premade report is using the latest fields:
If your premade report is connected to the output table, confirm that this table is attached to an active discovery scan configuration. To view the settings of your scan configurations, see View a scan configuration.
If your premade report is connected to the
latest
view, confirm that this view is still present in BigQuery. If it is present, try making a change to the view. Alternatively, make a copy of the view and connect the premade report to that copy. For more information about thelatest
view, see Thelatest
view on this page.
If you continue to see errors after trying these steps, contact Cloud Customer Care.
What's next
Learn about actions you can take to remediate data profile findings.