This document in the Google Cloud Architecture Framework provides best practices for implementing data security.
As part of your deployment architecture, you must consider what data you plan to process and store in Google Cloud, and the sensitivity of the data. Design your controls to help secure the data during its lifecycle, to identify data ownership and classification, and to help protect data from unauthorized use.
For a security blueprint that deploys a BigQuery data warehouse with the security best practices described in this document, see Secure a BigQuery data warehouse that stores confidential data.
Automatically classify your data
Perform data classification as early in the data management lifecycle as possible, ideally when the data is created. Usually, data classification efforts require only a few categories, such as the following:
- Public: Data that has been approved for public access.
- Internal: Non-sensitive data that isn't released to the public.
- Confidential: Sensitive data that's available for general internal distribution.
- Restricted: Highly sensitive or regulated data that requires restricted distribution.
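If you track classification in code, for example in a pipeline that tags new datasets when they are created, you can model the four example categories as an ordered enumeration. The following Python sketch is a hypothetical illustration and is not part of any Google Cloud API. It makes two assumptions: a dataset inherits the most sensitive classification of any of its fields, and a dataset with no classified fields defaults to Internal.

```python
from enum import IntEnum


class Classification(IntEnum):
    """Hypothetical encoding of the four example categories.

    IntEnum gives the levels a natural ordering: a higher value means
    more sensitive data.
    """
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


def effective_classification(field_levels: dict[str, Classification]) -> Classification:
    """Return the most sensitive level of any field in a dataset.

    Defaulting an empty dataset to INTERNAL (never PUBLIC) is an assumption
    of this sketch, not a rule from the framework.
    """
    if not field_levels:
        return Classification.INTERNAL
    return max(field_levels.values())


if __name__ == "__main__":
    customer_table = {
        "customer_id": Classification.INTERNAL,
        "email": Classification.CONFIDENTIAL,
        "ssn": Classification.RESTRICTED,
    }
    print(effective_classification(customer_table).name)  # RESTRICTED
```

Taking the maximum is the conservative choice. A single restricted column is enough to make the whole dataset restricted.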
Use Cloud DLP to discover and classify data across your Google Cloud environment. Cloud DLP has built-in support for scanning and classifying sensitive data in Cloud Storage, BigQuery, and Datastore. It also has a streaming API to support additional data sources and custom workloads.
Cloud DLP can identify sensitive data using built-in infoTypes. It can automatically classify, mask, tokenize, and transform sensitive elements (such as personally identifiable information, or PII) to help you manage the risk of collecting, storing, and using data. In other words, you can integrate it with your data lifecycle processes to help ensure that data is protected at every stage.
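The following Python sketch illustrates what infoType-based inspection and masking do. It runs two simplified local detectors and replaces each match character by character, similar in spirit to Cloud DLP's character masking transformation. It does not call the Cloud DLP API. The regular expressions are assumptions and are far cruder than Cloud DLP's built-in detectors. Only the infoType names (`EMAIL_ADDRESS`, `US_SOCIAL_SECURITY_NUMBER`) are borrowed from Cloud DLP.

```python
import re

# Hypothetical local detectors keyed by Cloud DLP-style infoType names.
# Real Cloud DLP detectors also use checksums, context, and likelihood scores.
DETECTORS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "US_SOCIAL_SECURITY_NUMBER": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def inspect(text: str) -> list[tuple[str, str]]:
    """Return (infoType, matched text) pairs found in text."""
    findings = []
    for info_type, pattern in DETECTORS.items():
        for match in pattern.finditer(text):
            findings.append((info_type, match.group()))
    return findings


def mask(text: str, masking_char: str = "*") -> str:
    """Replace every character of each finding with masking_char."""
    for pattern in DETECTORS.values():
        text = pattern.sub(lambda m: masking_char * len(m.group()), text)
    return text


if __name__ == "__main__":
    record = "Contact jane@example.com, SSN 123-45-6789."
    print(inspect(record))
    print(mask(record))
```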
For more information, see the following:
- Automating the classification of data uploaded to Cloud Storage
- Help secure the pipeline from your data lake to your data warehouse
- De-identification and re-identification of PII in large-scale datasets using Cloud DLP
Manage data governance using metadata
Data governance is a combination of processes that ensure that data is secure, private, accurate, available, and usable. Although you are responsible for defining a data governance strategy for your organization, Google Cloud provides tools and technologies to help you put your strategy into practice. Google Cloud also provides a framework for data governance (PDF) in the cloud.
Use Data Catalog to find, curate, and use metadata to describe your data assets in the cloud. You can use Data Catalog to search for data assets, then tag the assets with metadata. To help accelerate your data classification efforts, integrate Data Catalog with Cloud DLP to automatically identify confidential data. After data is tagged, you can use Google Identity and Access Management (IAM) to restrict which data users can query or use through Data Catalog views.
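You can think of tag-based restriction as a filter. It compares each asset's classification tag with the tags that a principal is cleared to read. The following Python sketch is a hypothetical local model, not the Data Catalog or IAM API. The `Asset` class, the `CLEARANCES` table, and the deny-by-default handling of untagged assets are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """Hypothetical data asset with metadata tags, such as those attached in a catalog."""
    name: str
    tags: dict[str, str] = field(default_factory=dict)


# Hypothetical mapping from a principal to the classification tags it may read.
CLEARANCES = {
    "analyst@example.com": {"public", "internal"},
    "dpo@example.com": {"public", "internal", "confidential", "restricted"},
}


def visible_assets(principal: str, assets: list[Asset]) -> list[str]:
    """Return the names of assets whose classification tag the principal may read.

    Untagged assets have no "classification" key, so tags.get() returns None,
    which is never in a clearance set. They are therefore hidden (deny by default).
    """
    allowed = CLEARANCES.get(principal, set())
    return [a.name for a in assets if a.tags.get("classification") in allowed]
```

Denying by default means a newly created but not-yet-classified asset stays hidden until someone tags it. That pairs naturally with classifying data as early as possible.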
Use Dataproc Metastore or Hive metastore to manage metadata for workloads. Data Catalog has a Hive connector that lets the service discover metadata that's inside a Hive metastore.
Use Dataprep by Trifacta to define and enforce data quality rules through a console. You can use Dataprep from within Cloud Data Fusion or use Dataprep as a standalone service.
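At their core, data quality rules are predicates that every record must satisfy. The following Python sketch shows that idea with two hypothetical rules. It is not Dataprep's rule syntax, and the rule names and field names are assumptions.

```python
from typing import Callable

# Each hypothetical rule pairs a name with a predicate that must hold for every row.
RULES: list[tuple[str, Callable[[dict], bool]]] = [
    ("email_not_empty", lambda r: bool(r.get("email"))),
    ("age_in_range", lambda r: isinstance(r.get("age"), int) and 0 <= r["age"] <= 130),
]


def validate(rows: list[dict]) -> list[tuple[int, str]]:
    """Return (row index, rule name) for every rule that a row violates."""
    violations = []
    for i, row in enumerate(rows):
        for name, check in RULES:
            if not check(row):
                violations.append((i, name))
    return violations
```

Keeping rules as named, declarative entries lets you report exactly which rule each record broke. You don't need to rely on a single pass/fail result.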
Protect data according to its lifecycle phase and classification
After you define data within the context of its lifecycle and classify it based on its sensitivity and risk, you can assign the right security controls to protect it. You must ensure that your controls deliver adequate protections, meet compliance requirements, and reduce risk. As you move to the cloud, review your current strategy and identify where you might need to change your processes.
The following table describes three characteristics of a data security strategy in the cloud.
| Characteristic | Description |
|---|---|
| Identification | Understand the identity of users, resources, and applications as they create, modify, store, use, share, and delete data. Use Cloud Identity and IAM to control access to data. If your identities require certificates, consider Certificate Authority Service. For more information, see Manage identity and access. |
| Boundary and access | Set up controls for how data is accessed, by whom, and under what circumstances. Access boundaries to data can be managed at these |
| Visibility | You can audit usage and create reports that demonstrate how data is controlled and accessed. Google Cloud Logging and Access Transparency provide insights into the activities of your own cloud administrators and Google personnel. For more information, see Monitor your data. |
Encrypt your data
By default, Google Cloud encrypts customer data stored at rest, with no action required from you. In addition to default encryption, Google Cloud provides options for envelope encryption and encryption key management. For example, Compute Engine persistent disks are automatically encrypted, but you can supply or manage your own keys.
You must identify the solutions that best fit your requirements for key generation, storage, and rotation, whether you're choosing the keys for your storage, for compute, or for big data workloads.
Google Cloud includes the following options for encryption and key management:
- Customer-managed encryption keys (CMEK). You can generate and manage your encryption keys using Cloud Key Management Service (Cloud KMS). Use this option if you have certain key management requirements, such as the need to rotate encryption keys regularly.
- Customer-supplied encryption keys (CSEK). You can create and manage your own encryption keys, and then provide them to Google Cloud when necessary. Use this option if you generate your own keys using your on-premises key management system to bring your own key (BYOK). If you provide your own keys using CSEK, Google replicates them and makes them available to your workloads. However, the security and availability of customer-supplied keys are your responsibility because these keys aren't stored in instance templates or in Google infrastructure. If you lose access to the keys, Google can't help you recover the encrypted data. Think carefully about which keys you want to create and manage yourself. You might use CSEK for only the most sensitive information. Another option is to perform client-side encryption on your data and then store the encrypted data in Google Cloud, where the data is encrypted again by Google.
- Third-party key management system with Cloud External Key Manager (Cloud EKM). Cloud EKM protects your data at rest by using encryption keys that are stored and managed in a third-party key management system that you control outside of the Google infrastructure. When you use this method, you have high assurance that your data can't be accessed by anyone outside of your organization. Cloud EKM lets you achieve a secure hold-your-own-key (HYOK) model for key management. For compatibility information, see the Cloud EKM enabled services list.
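When you use CSEK with Cloud Storage, each request carries the key and its hash in three request headers:

- `x-goog-encryption-algorithm`, set to `AES256`.
- `x-goog-encryption-key`, the base64-encoded 256-bit key.
- `x-goog-encryption-key-sha256`, the base64-encoded SHA-256 hash of the key.

The following Python sketch builds those headers. Generating the key locally with `secrets` is an assumption for illustration. In a BYOK setup the key would come from your on-premises key management system, and you would be responsible for storing it.

```python
import base64
import hashlib
import secrets


def new_csek_headers() -> tuple[bytes, dict[str, str]]:
    """Generate a random AES-256 key and the matching Cloud Storage CSEK headers.

    The caller must keep the raw key safe: if it is lost, objects encrypted
    with it can't be recovered.
    """
    key = secrets.token_bytes(32)  # AES-256 needs a 32-byte (256-bit) key
    key_sha256 = hashlib.sha256(key).digest()
    headers = {
        "x-goog-encryption-algorithm": "AES256",
        "x-goog-encryption-key": base64.b64encode(key).decode("ascii"),
        # Base64-encoded SHA-256 hash of the raw key, used to verify the key supplied.
        "x-goog-encryption-key-sha256": base64.b64encode(key_sha256).decode("ascii"),
    }
    return key, headers
```

You must send the same headers with every later read of an object encrypted this way. That requirement is why losing the key means losing the data.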
Cloud KMS also lets you encrypt your data with either software-backed encryption keys or keys protected by FIPS 140-2 Level 3 validated hardware security modules (HSMs). If you're using Cloud KMS, your cryptographic keys are stored in the location that you choose when you create the key ring that holds them. Cloud HSM distributes your key management needs across regions, providing redundancy and global availability of keys.
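A Cloud KMS key's resource name includes the location of its key ring, in the form `projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY_NAME`. You can therefore check key placement against data residency requirements by parsing key names. The following Python sketch assumes that you already have a list of key names. The allowed-location set used with it is hypothetical.

```python
import re

# Matches a Cloud KMS crypto key resource name and captures its parts.
KEY_NAME = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/keyRings/(?P<key_ring>[^/]+)/cryptoKeys/(?P<crypto_key>[^/]+)$"
)


def key_location(key_name: str) -> str:
    """Return the key ring location encoded in a crypto key resource name."""
    match = KEY_NAME.match(key_name)
    if match is None:
        raise ValueError(f"not a Cloud KMS crypto key name: {key_name!r}")
    return match.group("location")


def violates_residency(key_names: list[str], allowed_locations: set[str]) -> list[str]:
    """Return the keys whose location is outside the allowed set."""
    return [name for name in key_names if key_location(name) not in allowed_locations]
```

For example, with an assumed policy of `{"europe-west1", "europe"}`, a key under `locations/us-central1` would be reported.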
For information on how envelope encryption works, see Encryption at rest in Google Cloud.
Control cloud administrators' access to your data
You can control access by Google support and engineering personnel to your environment on Google Cloud. Access Approval requires your explicit approval before Google employees can access your data or resources on Google Cloud. This product complements the visibility provided by Access Transparency, which generates logs when Google personnel interact with your data. These logs include the office location and the reason for the access.
Using these products together, you can deny Google the ability to decrypt your data for any reason.
Configure where your data is stored and where users can access it from
You can control the network locations from which users can access data by using VPC Service Controls. This product lets you limit access to users in a specific region. You can enforce this constraint even if the user is authorized according to your IAM policy. Using VPC Service Controls, you create a service perimeter that defines the virtual boundaries from which a service can be accessed and that prevents data from being moved outside those boundaries.
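You can approximate this access decision as two layered checks. First, IAM must allow the request. Second, the request must come from a project inside the perimeter or satisfy an access level, such as a permitted IP range. The following Python sketch is a simplified, hypothetical model and not the VPC Service Controls API. The project numbers and the IP range (taken from a documentation range) are placeholders. The sketch uses an IP-range condition in place of a regional one, because resolving a region would require geolocation data.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    principal: str
    source_ip: str
    caller_project: str


# Hypothetical perimeter membership and access-level condition.
PERIMETER_PROJECTS = {"projects/123456789012", "projects/210987654321"}
ACCESS_LEVEL_RANGES = [ipaddress.ip_network("203.0.113.0/24")]


def allowed(req: Request, iam_allows: bool) -> bool:
    """Grant access only if IAM allows it AND the perimeter check passes."""
    if not iam_allows:
        return False  # the perimeter adds restrictions; it never grants access on its own
    if req.caller_project in PERIMETER_PROJECTS:
        return True
    ip = ipaddress.ip_address(req.source_ip)
    return any(ip in net for net in ACCESS_LEVEL_RANGES)
```

The key design point is that the perimeter check is applied after IAM. A principal with valid IAM permissions is still blocked when calling from outside the boundary.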
For more information, see the following:
- Data governance in the cloud
- Data warehouse to BigQuery data governance
- Data Lineage in a data warehouse
- Cloud Hives metastore now available
Manage secrets using Secret Manager
Secret Manager lets you store all of your secrets in a centralized place. Secrets are configuration information such as database passwords, API keys, or TLS certificates. You can automatically rotate secrets, and you can configure applications to automatically use the latest version of a secret. Every interaction with Secret Manager generates an audit log, so you can view every access to every secret.
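Secret Manager addresses secret versions by resource name, such as `projects/PROJECT_ID/secrets/SECRET_ID/versions/latest`. The `latest` alias refers to the most recently created version. Rotating a secret means adding a new version, so applications that read `latest` pick up the new value. The following Python sketch is a hypothetical in-memory stand-in that models only version numbering, the `latest` alias, and an access log. It doesn't model disabling or destroying versions, and it doesn't call the real service.

```python
from dataclasses import dataclass, field


@dataclass
class Secret:
    """Hypothetical in-memory stand-in for a Secret Manager secret."""
    name: str
    versions: list[bytes] = field(default_factory=list)
    access_log: list[str] = field(default_factory=list)

    def add_version(self, payload: bytes) -> int:
        """Store a new version (for example, after rotation) and return its number."""
        self.versions.append(payload)
        return len(self.versions)  # versions are numbered from 1

    def access(self, version: str = "latest") -> bytes:
        """Return a version's payload, resolving "latest" to the newest version."""
        if not self.versions:
            raise LookupError(f"{self.name} has no versions")
        number = len(self.versions) if version == "latest" else int(version)
        if not 1 <= number <= len(self.versions):
            raise LookupError(f"{self.name} has no version {version}")
        # Record every access, mirroring the audit trail the real service keeps.
        self.access_log.append(f"{self.name}/versions/{number}")
        return self.versions[number - 1]
```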
Cloud Data Loss Prevention also has a category of detectors to help you identify credentials and secrets in data that could be protected with Secret Manager.
Monitor your data
To view administrator activity and key use logs, use Cloud Audit Logs. To help secure your data, monitor logs using Cloud Monitoring to ensure proper use of your keys.
Cloud Logging captures Google Cloud events and lets you add additional sources if necessary. You can segment your logs by region, store them in buckets, and integrate custom code for processing logs. For an example, see Building your own analysis solution.
You can also export logs to BigQuery to perform security and access analytics to help identify unauthorized changes and inappropriate access to your organization's data.
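After you export audit logs, a typical check is to flag key use by principals that shouldn't be decrypting data. In BigQuery you would express this check in SQL. The following Python sketch shows the same logic over log entries that are already loaded as dictionaries. The `protoPayload.methodName` and `protoPayload.authenticationInfo.principalEmail` fields follow the Cloud Audit Logs format. The allowlist, the service account name, and the assumption that decrypt calls have method names ending in `Decrypt` are illustrative.

```python
from collections import Counter

# Hypothetical allowlist of principals expected to use the key.
EXPECTED_PRINCIPALS = {"etl-pipeline@my-project.iam.gserviceaccount.com"}


def unexpected_key_use(entries: list[dict]) -> Counter:
    """Count decrypt calls per principal that isn't on the allowlist."""
    counts: Counter = Counter()
    for entry in entries:
        payload = entry.get("protoPayload", {})
        method = payload.get("methodName", "")
        principal = payload.get("authenticationInfo", {}).get("principalEmail", "<unknown>")
        # Assumption: decrypt operations are identified by a method name ending in "Decrypt".
        if method.endswith("Decrypt") and principal not in EXPECTED_PRINCIPALS:
            counts[principal] += 1
    return counts
```

A non-empty result is a signal to investigate, not proof of misuse. You could feed it into an alert in Cloud Monitoring.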
Security Command Center can help you identify and resolve insecure access to sensitive organizational data that's stored in the cloud. Through a single management interface, you can scan for a wide variety of security vulnerabilities and risks to your cloud infrastructure. For example, you can monitor for data exfiltration, scan storage systems for confidential data, and detect which Cloud Storage buckets are open to the internet.
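Detecting buckets that are open to the internet comes down to looking for IAM bindings that grant a role to `allUsers` or `allAuthenticatedUsers`. The following Python sketch inspects a bucket IAM policy in its JSON form. It only illustrates the kind of check involved and is not how Security Command Center implements it. It also ignores object ACLs, which can make data public as well.

```python
# Special IAM principals that represent everyone on the internet
# or anyone with a Google account.
PUBLIC_MEMBERS = {"allUsers", "allAuthenticatedUsers"}


def public_bindings(policy: dict) -> list[tuple[str, str]]:
    """Return (role, member) pairs in an IAM policy that grant access to everyone."""
    return [
        (binding["role"], member)
        for binding in policy.get("bindings", [])
        for member in binding.get("members", [])
        if member in PUBLIC_MEMBERS
    ]
```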
Learn more about data security with the following resources:
- Deploy applications securely (next document in this series)
- Secure a BigQuery data warehouse that stores confidential data