Implementing HKMA’s Secure Tertiary Data Backup (STDB) on Google Cloud
Henry Cheng
Principal Architect
Benny Chan
Customer Engineer
In this post, we will discuss how Google Cloud can be used as a backup storage solution to support the objectives of the Secure Tertiary Data Backup (STDB) guideline developed by the Hong Kong Monetary Authority (HKMA) and the Hong Kong Association of Banks. We focus on the storage location and its surrounding security controls. The solution should work with any backup tool that can use Google Cloud Storage as its storage destination.
To help satisfy the STDB guideline, the backup destination (storage location) should meet the following requirements:
Immutable and controlled
1. Data should be immutable by the data writer. For instance, the backup software should not be able to update or delete data once it has been written to storage.
Secure and verifiable
2. All data transfer should go through a private and encrypted connection.
3. All data at rest must be encrypted with keys that are managed by the data owner.
4. All operations on the backup location, by both the owner and the solution provider, must be logged and auditable.
Air-gapped
5. The backup target should be “air-gapped”, that is, disconnected from the data source and the on-premises network when a backup is not taking place.
- Limit network connectivity by keeping the data source and the backup storage disconnected as much as possible.
- Grant read-only and write-only access rights based on the principle of least privilege, enforced through both role-based and time-based access control.
6. To minimize the attack surface, the solution should use as few components as possible, while enforcing multiple layers of control over network connections and resource access.
High performance
7. The storage location should provide instantaneous (millisecond) access to the backed-up data when needed, enabling rapid recovery.
Survivable
8. The backup destination should be highly durable and available.
9. The storage location should have capacity to support a long history of backups to ensure a clean copy is in place. This is especially critical for recovery from ransomware attacks.
10. The solution must be designed to be resilient against attacks.
Optionally, the following requirement can also be considered:
11. All interactions with the backup target should be possible through the API alone. In other words, losing access to the user interface should not interfere with backup operations or with management of the solution.
Design Details
1. To help ensure that all data transfer to and from Google Cloud Storage is private and secure, Cloud Interconnect (recommended for production) or Cloud VPN (sufficient for development or low-bandwidth requirements) should be established between the on-premises environment and Google Cloud. Cloud VPN traffic is IPsec-encrypted by design; with Interconnect, an IPsec tunnel can optionally be added to encrypt the connection.
- To support the air-gapped requirement, disconnect the Interconnect/VPN at the on-premises router, at the Cloud Router, or both. The disconnection can be done programmatically through the gcloud CLI or Terraform. Combine it with Cloud Scheduler, an on-premises cron job, or any other scheduler to disconnect the Interconnect at a specified time or on a recurring schedule, reconnecting it only when a backup is due.
2. To protect against data exfiltration risks, Google Cloud provides VPC Service Controls, which let you establish a security perimeter around projects, networks, and/or services so that network communication is constrained within that perimeter. The perimeter adds a layer of protection that complements the role-based access control provided by Identity and Access Management (IAM). More importantly, it works with managed services that do not have a fixed IP/port range, which a network firewall cannot easily protect.
- Using the example in the diagram above, a client or VM may hold an IAM role sufficient to access Cloud Storage. However, because Cloud Storage is protected by VPC Service Controls, that access is still blocked when the client is not inside the security perimeter.
- In this design, a “Writer” IAM role with the storage.objects.create permission, but without update or delete permissions, is granted to the backup software. The software therefore cannot modify or delete the bucket or the objects already written to Cloud Storage. Overwriting an existing object also requires storage.objects.delete, so a create-only role cannot replace existing backups either.
- To further support the air-gapped requirement, apply time-based controls to the “Writer” IAM role. For example, a conditional role binding that uses date/time attributes enables the write permission only at a specified time or on a recurring schedule.
- For the break-glass scenario, create an “Admin” IAM role, for example one containing the storage.objects.delete permission, and manage access to it through a PAM system (e.g. CyberArk).
- IAM recommendations: Google Cloud has built-in IAM policy intelligence that uses machine learning to predict access needs, so administrators can grant the right level of access and avoid over-granting. The tool lists recommendations, such as revoking unused roles (e.g. revoking a role that has not been used for more than N days) or replacing overly broad roles with more restricted ones.
4. For the backup software to copy data into Cloud Storage, it must hold the “Writer” IAM role and be admitted into the perimeter by allowlisting its private IP address. Data transfer then stays private between the on-premises data centers and Google Cloud.
5. Data written to Cloud Storage is encrypted by default. Customers can also enable Customer-Managed Encryption Keys (CMEK) for more control over the keys used to encrypt data, for example rotating the key or backing it with a Cloud HSM key. Customers can even disable or destroy a key so that Cloud Storage can no longer decrypt the objects, rendering them unretrievable.
- The Archive storage class is used to achieve ultra-low-cost, highly durable storage for the backup objects. Despite being an archival class, it still supports instantaneous retrieval and requires no thawing, allowing data to be recovered immediately when needed.
6. Access to and maintenance of the storage are logged and auditable, including operations performed by Google. If user interface access (i.e. the Google Cloud console) is not desired, customers can block it at the corporate proxy. The backup process and management of the service can still run programmatically through the gcloud CLI or client libraries.
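To make the layering in the design above concrete, here is a small conceptual model in Python. A write succeeds only if the caller is inside the perimeter, holds the permission, and is within the backup window. It is a sketch under stated assumptions, not how Google evaluates VPC Service Controls or IAM: real enforcement happens server-side. The class name `BackupTargetPolicy`, the permission sets, the CIDR range, and the window times are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, time
from ipaddress import ip_address, ip_network

# Hypothetical permission sets mirroring the "Writer" and "Admin" roles above.
WRITER_PERMISSIONS = frozenset({"storage.objects.create"})
ADMIN_PERMISSIONS = frozenset({"storage.objects.create", "storage.objects.delete"})


@dataclass(frozen=True)
class BackupTargetPolicy:
    """Conceptual model of the layered controls (illustration only)."""
    perimeter_cidrs: tuple[str, ...]  # allowlisted private ranges (assumption)
    window_start: time                # backup window start (inclusive)
    window_end: time                  # backup window end (exclusive)

    def in_perimeter(self, source_ip: str) -> bool:
        addr = ip_address(source_ip)
        return any(addr in ip_network(cidr) for cidr in self.perimeter_cidrs)

    def in_window(self, now: datetime) -> bool:
        t = now.time()
        if self.window_start <= self.window_end:
            return self.window_start <= t < self.window_end
        # The window wraps past midnight, e.g. 23:00 to 02:00.
        return t >= self.window_start or t < self.window_end

    def allows(self, granted: frozenset[str], permission: str,
               source_ip: str, now: datetime, time_bound: bool = True) -> bool:
        # Layer 1: the network perimeter.
        if not self.in_perimeter(source_ip):
            return False
        # Layer 2: the IAM permission held by the caller.
        if permission not in granted:
            return False
        # Layer 3: the time-based condition on the role binding.
        return self.in_window(now) if time_bound else True
```

The break-glass path is modelled by calling `allows(ADMIN_PERMISSIONS, ..., time_bound=False)`, on the assumption that the Admin role is gated by the PAM system rather than a schedule. The same `in_window` check could also drive a scheduler that decides when to reconnect the Interconnect.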
Security Controls
Fundamental security controls:
Default encryption of data at rest, using customer-managed encryption keys.
Fine-grained access control using IAM, supporting time-based control of permissions.
A security perimeter (VPC Service Controls) to protect against data exfiltration risks, in addition to IAM.
Private connectivity through a secure, dedicated communication channel with end-to-end network encryption for data transport.
Centralized logging for monitoring and auditing administrator and user activity. Access Transparency ensures that access by Google personnel is also logged and auditable.
Security Command Center to help detect misconfigurations and abnormal activity, such as service account abuse and data exfiltration.
Advanced security features:
Context-Aware Access for contextual access control (e.g. who, where, device) that can automatically prompt for re-authentication when required.
DLP protection integrated with Cloud Storage, which can automatically detect and flag the existence of PII using machine learning.
Mapping features to STDB goals
Immutable and controlled
Security Perimeter (VPC Service Controls)
VPC Service Controls adds an extra layer of control, following a defense-in-depth approach for multi-tenant services, that helps protect service access from both insider and outsider threats. It enforces a security perimeter that isolates the resources of multi-tenant Google Cloud services, reducing the risk of data exfiltration or data breach.
Bucket Lock
Enabling the Bucket Lock feature provides immutable storage on Cloud Storage. Once it is enabled, objects in the bucket can be deleted or replaced only when their age is greater than the retention period. We recommend using this feature together with Detailed Audit Logging mode when addressing compliance requirements.
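The retention rule is simple enough to express in a few lines. The sketch below builds a bucket `retentionPolicy` fragment in what we understand to be the Cloud Storage JSON API shape: `retentionPeriod` in seconds, serialized as a string because it is an int64. It also checks whether an object is old enough to be deleted or replaced. The function names and the 30-day period used in the example are illustrative assumptions. Note that locking a retention policy is irreversible: once locked, the policy cannot be removed or shortened.

```python
from datetime import datetime, timedelta

SECONDS_PER_DAY = 86_400


def retention_policy(days: int) -> dict:
    """Bucket retention policy fragment (JSON API shape, as we understand it)."""
    if days <= 0:
        raise ValueError("retention period must be positive")
    # retentionPeriod is expressed in seconds; int64 values are sent as strings.
    return {"retentionPolicy": {"retentionPeriod": str(days * SECONDS_PER_DAY)}}


def can_delete_or_replace(created: datetime, now: datetime, retention_seconds: int) -> bool:
    """True once the object's age is greater than the retention period."""
    return now - created > timedelta(seconds=retention_seconds)
```

For example, with a 30-day policy, an object created on 1 January is still protected on 15 January but can be deleted by 1 March.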
Secure and verifiable
IPsec over Interconnect
It provides customers with a managed solution to encrypt traffic over Interconnect, so that all data transfer goes through a private and encrypted connection.
Default Encryption at rest
Cloud Storage encrypts your data on the server side, before it is written to disk, at no additional charge. Customer-managed encryption keys (CMEK) and customer-supplied encryption keys (CSEK) are also supported as server-side encryption options, acting as an additional layer on top of the standard Cloud Storage encryption.
Cloud IAM manages resource permissions through granular access control policies based on attributes such as device security status, IP address, resource type, and date/time.
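Of the key options described above, CSEK keeps the key entirely with the data owner. The key travels with each request in HTTP headers, and Cloud Storage does not store it permanently. Losing the key therefore means losing access to the data. The sketch below builds the three request headers for a 256-bit AES key. The helper name is hypothetical, and the random key in the demo stands in for a key from the customer's own key management system.

```python
import base64
import hashlib
import os


def csek_headers(key: bytes) -> dict[str, str]:
    """Request headers carrying a customer-supplied AES-256 key."""
    if len(key) != 32:
        raise ValueError("a customer-supplied key must be 32 bytes (AES-256)")
    return {
        "x-goog-encryption-algorithm": "AES256",
        # The raw key, base64-encoded.
        "x-goog-encryption-key": base64.b64encode(key).decode("ascii"),
        # SHA-256 of the raw key, base64-encoded, so the server can verify it.
        "x-goog-encryption-key-sha256": base64.b64encode(
            hashlib.sha256(key).digest()
        ).decode("ascii"),
    }


if __name__ == "__main__":
    # Throwaway key for illustration only; a real key must be generated and
    # kept safe by the data owner.
    print(sorted(csek_headers(os.urandom(32))))
```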
Cloud Audit Logs with Cloud Storage help you answer the questions, "Who did what, where, and when?" by generating Admin Activity logs and Data Access logs. With Detailed Audit Logging mode enabled, data access logs will contain detailed request and response information including query parameters, path parameters, and request body parameters.
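As a starting point for the "who did what, where, and when" question, the sketch below builds a Cloud Logging query that selects audit log entries for a single bucket. The log names follow the standard `cloudaudit.googleapis.com` naming. Note that Data Access logs for Cloud Storage must be enabled explicitly before they are written. The function name and the example project and bucket names are hypothetical.

```python
_AUDIT_LOGS = {
    "activity": "cloudaudit.googleapis.com%2Factivity",
    "data_access": "cloudaudit.googleapis.com%2Fdata_access",
}


def audit_log_filter(project_id: str, bucket: str, kind: str = "data_access") -> str:
    """Cloud Logging query selecting one kind of audit log for one bucket."""
    if kind not in _AUDIT_LOGS:
        raise ValueError(f"unknown audit log kind: {kind!r}")
    return " AND ".join([
        f'logName="projects/{project_id}/logs/{_AUDIT_LOGS[kind]}"',
        'resource.type="gcs_bucket"',
        f'resource.labels.bucket_name="{bucket}"',
    ])


if __name__ == "__main__":
    # Hypothetical project and bucket names.
    print(audit_log_filter("my-project", "stdb-backups"))
```

You can paste the resulting string into Logs Explorer or pass it to `gcloud logging read`.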
Access Transparency
Access Transparency provides near real-time logs when Google administrators access your content, for example while working on a support ticket. Access Approval lets you approve or dismiss requests for access by Google Cloud administrators working to support your service.
Use the corporate proxy to block UI (console) access to Cloud Storage.
IAM policy intelligence uses machine learning to predict access needs, so administrators can grant the right level of access and avoid over-granting.
Data Integrity
Google Cloud Storage uses MD5 hashes (or ETags) and CRC32C checksums to validate data integrity. Composite objects have only a CRC32C checksum. If you supply an object's expected MD5 or CRC32C hash in an upload request, Cloud Storage creates the object only if the supplied hash matches the value it calculates. Likewise, you can perform a download integrity check by hashing the downloaded data on the fly and comparing the result with the server-supplied hashes. If gsutil is used for data transfer, these integrity checks are handled automatically.
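To illustrate the download-side check, the sketch below computes both hashes locally in the form Cloud Storage reports them: base64 of the MD5 digest, and base64 of the CRC32C as a big-endian 32-bit integer. The pure-Python CRC32C is table-driven and correct but slow, so it is for illustration only. In practice a compiled implementation such as the `google-crc32c` package would be used. The helper names are hypothetical.

```python
import base64
import hashlib
from typing import Optional

# Reflected Castagnoli polynomial used by CRC32C.
_POLY = 0x82F63B78


def _build_table() -> list[int]:
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ _POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _build_table()


def crc32c(data: bytes, crc: int = 0) -> int:
    """CRC32C of data; pass a previous result as crc to hash a stream in chunks."""
    crc ^= 0xFFFFFFFF
    for b in data:
        crc = _TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def gcs_crc32c(data: bytes) -> str:
    """Base64 of the big-endian CRC32C, the form Cloud Storage reports."""
    return base64.b64encode(crc32c(data).to_bytes(4, "big")).decode("ascii")


def gcs_md5(data: bytes) -> str:
    """Base64 of the MD5 digest, the form Cloud Storage reports as md5Hash."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def matches_server_hashes(data: bytes, crc32c_b64: str, md5_b64: Optional[str] = None) -> bool:
    """Compare local data with server hashes; MD5 is absent for composite objects."""
    if gcs_crc32c(data) != crc32c_b64:
        return False
    return md5_b64 is None or gcs_md5(data) == md5_b64
```

Because `crc32c` accepts a running value, a large backup file can be checked chunk by chunk without loading it into memory.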
Restoration Validation
We recommend regular data restoration drills to confirm that recovery succeeds and that a clean backup image is in place. Both drills and actual restores can be performed by running gsutil commands to copy the data back. If backup software is used, a restore operation in that software serves the same purpose.
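One way to make a restoration drill verifiable is to record a manifest of file hashes at backup time and compare the restored files against it. The sketch below assumes a hypothetical manifest format, `{relative_path: base64_md5}`, and reports missing and mismatched files. The function names and manifest layout are illustrative, not part of any backup product.

```python
import base64
import hashlib
from pathlib import Path
from typing import Iterable, Iterator, Mapping


def md5_b64(chunks: Iterable[bytes]) -> str:
    """Base64-encoded MD5 digest over a stream of chunks."""
    h = hashlib.md5()
    for chunk in chunks:
        h.update(chunk)
    return base64.b64encode(h.digest()).decode("ascii")


def read_chunks(path: Path, size: int = 1 << 20) -> Iterator[bytes]:
    """Yield a file's contents in fixed-size chunks to bound memory use."""
    with path.open("rb") as f:
        while chunk := f.read(size):
            yield chunk


def validate_restore(restore_dir: Path, manifest: Mapping[str, str]) -> dict[str, list[str]]:
    """Compare restored files with a manifest recorded at backup time."""
    missing, mismatched = [], []
    for rel_path, expected in sorted(manifest.items()):
        path = restore_dir / rel_path
        if not path.is_file():
            missing.append(rel_path)
        elif md5_b64(read_chunks(path)) != expected:
            mismatched.append(rel_path)
    return {"missing": missing, "mismatched": mismatched}
```

A drill passes only when both lists are empty. Any entry points either to an incomplete restore or to a backup image that is not clean.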
Air-gapped
Cloud IAM supports date/time attributes to enforce time-based controls when a given resource is accessed. For example, you can grant temporary access to a project that starts and stops at a specified time.
A simple architecture that sits in Google's privately owned and managed network (with no traffic over the public internet), protected by a full stack of security controls including VPC Service Controls, default encryption, Cloud IAM with time-based control, and Access Transparency.
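The time-based control described above can be expressed as an IAM policy binding with a condition written in CEL. The sketch below builds a binding whose condition holds only between two hours of the day, using the `request.time.getHours(timezone)` attribute function. Conditional bindings require IAM policy version 3, and using IAM Conditions on a bucket requires uniform bucket-level access. The member, custom role name, window hours, and helper name are hypothetical. A one-off window could instead compare `request.time` with two `timestamp("...")` values.

```python
import json


def time_windowed_binding(member: str, role: str, start_hour: int, end_hour: int,
                          tz: str = "Asia/Hong_Kong") -> dict:
    """IAM binding whose condition holds only between start_hour and end_hour (local time)."""
    if not 0 <= start_hour < end_hour <= 24:
        raise ValueError("expect 0 <= start_hour < end_hour <= 24")
    hours = f'request.time.getHours("{tz}")'
    return {
        "role": role,
        "members": [member],
        "condition": {
            "title": f"backup-window-{start_hour:02d}-{end_hour:02d}",
            "description": "Write access only during the scheduled backup window",
            "expression": f"{hours} >= {start_hour} && {hours} < {end_hour}",
        },
    }


if __name__ == "__main__":
    # Hypothetical service account and custom "Writer" role.
    binding = time_windowed_binding(
        "serviceAccount:backup-writer@my-project.iam.gserviceaccount.com",
        "projects/my-project/roles/stdbWriter",
        start_hour=1,
        end_hour=5,
    )
    print(json.dumps(binding, indent=2))
```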
High performance
All four Cloud Storage storage classes, including Archive Storage (the lowest-cost tier), offer low retrieval latency: time to first byte is typically tens of milliseconds.
Object Lifecycle Management
To optimize storage costs, it is possible to transition to a lower cost storage class when the backup artifact is no longer current. For instance, using Object Lifecycle Management, users can configure a rule to downgrade the storage class of objects older than 365 days to Coldline Storage.
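The Coldline rule described above can be written as a lifecycle configuration. The sketch below generates one with a `SetStorageClass` action and an `age` condition. It also adds a `matchesStorageClass` condition so the rule only touches objects that are still in a warmer class. The helper name and defaults are illustrative. The `{"lifecycle": {"rule": [...]}}` wrapper is the shape we believe `gcloud storage buckets update --lifecycle-file` accepts; check the current documentation for the exact file format your tool expects.

```python
import json


def downgrade_rule(age_days: int = 365, target_class: str = "COLDLINE",
                   from_classes: tuple[str, ...] = ("STANDARD", "NEARLINE")) -> dict:
    """Lifecycle configuration that moves older objects to a colder storage class."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    return {
        "lifecycle": {
            "rule": [
                {
                    "action": {"type": "SetStorageClass", "storageClass": target_class},
                    # Only objects older than age_days and still in a warmer class.
                    "condition": {"age": age_days, "matchesStorageClass": list(from_classes)},
                }
            ]
        }
    }


if __name__ == "__main__":
    print(json.dumps(downgrade_rule(), indent=2))
```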
Survivable
Archive Storage offers ultra low-cost, highly-durable, highly available archival storage. For data accessed less than once a year, Archive is a cost-effective storage option for long-term preservation of data.
Archive Storage Class description
Unlimited storage with no minimum object size.
Worldwide accessibility and worldwide storage locations.
Low latency (time to first byte typically tens of milliseconds).
High durability (99.999999999% annual durability).
Geo-redundancy if the data is stored in a multi-region or dual-region.
Typical monthly availability: 99.95% in multi-regions and dual-regions, and 99.9% in regions.
A uniform experience with Cloud Storage features, security, tools, and APIs.
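As a rough worked example, the typical availability figures above can be translated into expected unavailability per month. These are typical figures, not SLA commitments. The calculation assumes a 30-day month, which is 43,200 minutes:

```latex
\begin{aligned}
D &= (1 - A) \times 43\,200\ \text{min} \\
D_{\text{multi/dual-region}} &= (1 - 0.9995) \times 43\,200 = 21.6\ \text{min} \\
D_{\text{region}} &= (1 - 0.999) \times 43\,200 = 43.2\ \text{min}
\end{aligned}
```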