Data governance in Google Cloud–new ways to securely access and discover data

November 17, 2020
Uri Gilad

Product Manager, Google Cloud Data Governance

BigQuery column-level security now GA 

As organizations bring more, and more sensitive, data analytics workloads to the cloud, they rely on BigQuery to provide fine-grained security and governance controls that help them satisfy the principle of least privilege. Today, we are pleased to announce the general availability (GA) of BigQuery column-level security. Introduced earlier this year, column-level security enables you to mark data as sensitive and to set specific access controls (policy tags) on who can query the columns marked as sensitive. This protects data classes, such as personally identifiable information (PII), that may be spread across multiple tables. With column-level security, you can effectively define a data dictionary of all your organization’s data classes, of all sensitivities, and later categorize individual columns according to who should be able to access them.
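To make the idea of categorizing individual columns concrete, here is a minimal sketch of a table schema in which one column carries a policy tag. It builds the JSON form of a BigQuery schema, where a field references policy tags by resource name under `policyTags.names`. The project, location, and numeric IDs are placeholders, and the helper names `policy_tag_name` and `tagged_field` are hypothetical, introduced only for illustration.

```python
import json


def policy_tag_name(project: str, location: str, taxonomy_id: str, policy_tag_id: str) -> str:
    """Build a policy tag resource name from its parts (all values here are placeholders)."""
    return (
        f"projects/{project}/locations/{location}"
        f"/taxonomies/{taxonomy_id}/policyTags/{policy_tag_id}"
    )


def tagged_field(name: str, field_type: str, policy_tag: str = None) -> dict:
    """Return a schema field as a dict, optionally marked with one policy tag."""
    field = {"name": name, "type": field_type, "mode": "NULLABLE"}
    if policy_tag is not None:
        # Only columns that carry a policy tag get column-level access control.
        field["policyTags"] = {"names": [policy_tag]}
    return field


# Assumed placeholder identifiers; substitute the values from your own taxonomy.
pii_tag = policy_tag_name("example-project", "us", "1234567890", "9876543210")

schema = [
    tagged_field("order_id", "INTEGER"),        # not sensitive, no policy tag
    tagged_field("email", "STRING", pii_tag),   # marked as PII
]

print(json.dumps(schema, indent=2))
```

Because the tag lives in the taxonomy rather than on the table, the same PII tag can be attached to columns in any number of tables, and access to all of them is governed in one place.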

Beyond being available in all Google Cloud regions where you can access BigQuery, column-level policy tags are now indexed and searchable within Data Catalog. Plus, Data Catalog’s integration into the main BigQuery UI now allows for much faster, governed discovery of BigQuery resources. Let’s review these new capabilities in more detail:

Searching for your policy tags 

Your data dictionary consists of a hierarchy of policy tags, which you use to tag, or describe, data in your data warehouse. Tagging a column not only attaches meaning to that column, but also lets you enforce an access policy on it. Those tags are now indexed, so you can search for all columns tagged with a certain policy tag. Just run a faceted search with policytag:[search expression] as a predicate. The search expression is matched as a substring of any policy tag display name, and the results include columns tagged with descendants of the matching policy tag in the hierarchy.

https://storage.googleapis.com/gweb-cloudblog-publish/images/Search_results_data_catalog_with_policy_tags.max-900x900.jpg

Drilling into a search result shows the columns and the policy tags attached to them.

https://storage.googleapis.com/gweb-cloudblog-publish/images/Data_catalog__columns_policy_tags_detail.max-1300x1300.jpg
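The matching rule described above (substring match on the display name, with descendants included) can be modeled locally. The sketch below is not Data Catalog's implementation; it is a small in-memory illustration with a hypothetical `PolicyTag` class and a made-up hierarchy, and it assumes matching is case-insensitive, which the text does not state.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PolicyTag:
    """One node in a hypothetical policy tag hierarchy."""
    display_name: str
    children: list[PolicyTag] = field(default_factory=list)


def policytag_query(expression: str) -> str:
    """Build the faceted search predicate in the form given in the text."""
    return f"policytag:{expression}"


def matching_tags(root: PolicyTag, expression: str) -> list[str]:
    """Return display names that contain `expression`, plus all their descendants."""
    results: list[str] = []
    needle = expression.lower()  # assumption: case-insensitive matching

    def walk(tag: PolicyTag, ancestor_matched: bool) -> None:
        matched = ancestor_matched or needle in tag.display_name.lower()
        if matched:
            results.append(tag.display_name)
        for child in tag.children:
            walk(child, matched)

    walk(root, False)
    return results


# Made-up hierarchy for illustration only.
taxonomy = PolicyTag("Sensitive", [
    PolicyTag("PII", [PolicyTag("Email"), PolicyTag("Phone")]),
    PolicyTag("Financial"),
])

print(policytag_query("PII"))
print(matching_tags(taxonomy, "PII"))
```

In this model, searching for `PII` also covers `Email` and `Phone`, because they sit below `PII` in the hierarchy, so columns tagged with any of the three would appear in the results.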

Efficient, ACL'd search in BigQuery

As mentioned, Data Catalog is now integrated into the main BigQuery UI. This means you can use the powerful search capabilities of Data Catalog in context, while working within the BigQuery UI. Enable the search/autocomplete preview to bring Data Catalog search into the resources panel. When you type a partial table name, Data Catalog shows a list of matches, without the need to explicitly run a search and wait for results.

https://storage.googleapis.com/gweb-cloudblog-publish/images/Data_catalog_list_of_matches.max-300x300.jpg

Data Catalog search respects the resource permissions assigned to the user, and can surface tables shared explicitly with the user even if the containing dataset is not shared with that user. This new method of surfacing search results interactively should streamline work for data analysts, and it also requires less compute than listing out (and selecting) all the tables in a project or dataset.

Logging access to classified data

All BigQuery requests involving data governance policy tags are also captured in Cloud Logging, our scalable logging solution, so you can easily review all access attempts spanning sensitive or classified data. Simply enter policytags in the Logging search bar, and narrow down your search by selecting a specific policy tag identifier (which you can copy from Data Catalog) as the search term. 

https://storage.googleapis.com/gweb-cloudblog-publish/images/Policy_tags_query_preview_cloud_logging.max-1000x1000.jpg
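If you prefer to assemble the search text programmatically, the two-step narrowing described above can be sketched as a small helper. It only builds a string of quoted search terms joined with `AND`, which is how free-text terms are combined in the Logging query language. The function names and the sample identifier are hypothetical, and the sketch makes no assumption about specific log field names.

```python
from typing import Optional


def quote_term(term: str) -> str:
    """Wrap a search term in double quotes, escaping backslashes and quotes."""
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def policy_tag_log_filter(policy_tag_id: Optional[str] = None) -> str:
    """Build a text-search filter for log entries that mention policy tags.

    With no argument the filter matches any entry containing "policytags";
    passing an identifier copied from Data Catalog narrows it to that tag.
    """
    terms = [quote_term("policytags")]
    if policy_tag_id:
        terms.append(quote_term(policy_tag_id))
    return " AND ".join(terms)


# Placeholder identifier for illustration only.
example_tag = "projects/example-project/locations/us/taxonomies/1234567890/policyTags/9876543210"

print(policy_tag_log_filter())
print(policy_tag_log_filter(example_tag))
```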

Next steps

Data analytics workloads commonly span multiple data designations, and some of the data may require additional access controls. BigQuery can now, in all supported regions, apply additional protections to your data, and it provides an end-to-end solution that includes access-based discovery of the data, less unnecessary query execution time, and logging of all user access to classified data. If you’d like to learn more, check out the following articles: Introduction to BigQuery column-level security, Best practices for BigQuery column-level security, Introduction to BigQuery table-level access controls, or our approach to data governance at Google Cloud.
