This document helps you assess the storage requirements of your cloud workload, understand the available storage options in Google Cloud, and design a storage strategy that provides optimal business value.
This document is for cloud architects responsible for designing and implementing storage for workloads in Google Cloud.
Process overview
As a cloud architect, when you plan storage for a cloud workload, first consider the workload's functional characteristics, security constraints, resilience requirements, performance expectations, and cost goals. Next, review the available storage services and features in Google Cloud. Then, based on your requirements and the available options, select the storage services and features that you need.
The following diagram shows this 3-phase design process:
Define your requirements
Use the questionnaires in this section to define the key storage requirements of the workload that you want to deploy in Google Cloud.
Guidelines for defining storage requirements
When answering the questionnaires, consider the following guidelines:
Define requirements granularly
For example, if your application needs Network File System (NFS)-based file storage, identify the required NFS version.
Consider future requirements
For example, your current deployment might serve users in countries within Asia, but you might plan to expand the business to other continents. In this case, consider any storage-related regulatory requirements of the new business territories.
Consider cloud-specific opportunities and requirements
Take advantage of cloud-specific opportunities.
For example, to optimize the storage cost for data stored in Cloud Storage, you can control the storage duration by using data retention policies and lifecycle configurations.
Consider cloud-specific requirements.
For example, the on-premises data might exist in a single data center, and you might need to replicate the migrated data across two Google Cloud locations for redundancy.
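As a rough illustration of the lifecycle configurations mentioned in the guidelines above, the following sketch builds a Cloud Storage lifecycle configuration as JSON. The function name, the age thresholds (30 and 365 days), and the target storage class are hypothetical values chosen for illustration. Confirm the exact schema against the Cloud Storage lifecycle documentation before applying a configuration to a bucket.

```python
import json


def build_lifecycle_config(nearline_after_days, delete_after_days):
    """Return a two-rule lifecycle configuration (illustrative values only)."""
    if nearline_after_days >= delete_after_days:
        raise ValueError("objects must change class before they are deleted")
    return {
        "rule": [
            {
                # Move objects to a cheaper class once they reach this age.
                "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
                "condition": {"age": nearline_after_days},
            },
            {
                # Delete objects once they reach the retention limit.
                "action": {"type": "Delete"},
                "condition": {"age": delete_after_days},
            },
        ]
    }


# Assumed thresholds: 30 days to Nearline, 365 days to deletion.
lifecycle_config = build_lifecycle_config(30, 365)
print(json.dumps(lifecycle_config, indent=2))
```

The printed JSON can be saved to a file and applied to a bucket with the tooling that you already use to manage Cloud Storage.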
Questionnaires
The questionnaires that follow are not exhaustive checklists for planning. Use them as a starting point to systematically analyze all the storage requirements of the workload that you want to deploy to Google Cloud.
Assess your workload's characteristics
What kind of data do you need to store?
Examples
- Static website content
- Backups and archives for disaster recovery
- Audit logs for compliance
- Large data objects that users download directly
- Transactional data
- Unstructured and heterogeneous data
How much capacity do you need? Consider your current and future requirements.
Should capacity scale automatically with usage?
What are the access requirements? For example, should the data be accessible from outside Google Cloud?
What are the expected read-write patterns?
Examples
- Frequent writes and reads
- Frequent writes, but occasional reads
- Occasional writes and reads
- Occasional writes, but frequent reads
Does the workload need file-based access, using NFS for example?
Should multiple clients be able to read or write data simultaneously?
Identify security constraints
What are your data-encryption requirements? For example, do you need to use keys that you control?
Are there any data-residency requirements?
Define data-resilience requirements
- Does your workload need low-latency caching or scratch space?
- Do you need to replicate the data in the cloud for redundancy?
- Do you need strict read-write consistency for replicated datasets?
Set performance expectations
What is the required I/O rate?
What levels of read and write throughput does your application need?
What environments do you need storage for? For a given workload, you might need high-performance storage for the production environment, but could choose a lower-performance option for the non-production environments.
Review the storage options
Google Cloud offers storage services for all the key storage formats: block, file, and object. Review and evaluate the features, design options, and relative advantages of the services available for each storage format.
Overview
Block storage
The data that you store in block storage is divided into chunks, each stored as a separate block with a unique address. Applications access data by referencing the appropriate block addresses. Block storage is optimized for high-IOPS workloads, such as transaction processing. It's similar to on-premises storage area network (SAN) and direct-attached storage (DAS) systems.
The block storage options in Google Cloud are a part of the Compute Engine service.
Option | Overview |
---|---|
Persistent Disk | Dedicated hard-disk drives (HDD) and solid-state drives (SSD) for enterprise and database applications deployed to Compute Engine VMs and Google Kubernetes Engine (GKE) clusters. |
Local SSD | Ephemeral, locally attached block storage for high-performance applications. |
File storage
Data is organized and represented in a hierarchy of files that are stored in folders, similar to on-premises network-attached storage (NAS). File systems can be mounted on clients using protocols such as NFS and Server Message Block (SMB). Applications access data using the relevant filename and directory path.
Google Cloud provides a range of fully managed and third-party solutions for file storage.
Solution | Overview |
---|---|
Google Cloud Filestore | NFSv3 file servers for Compute Engine VMs and Google Kubernetes Engine clusters. You can choose a service tier (Basic, High Scale, or Enterprise) that suits your use case. |
NetApp Cloud Volumes | File-based storage using NFS, SMB, or iSCSI. |
Dell Cloud PowerScale | File-based storage using NFS, SMB, or Hadoop Distributed File System (HDFS). |
More options | See Summary of file server options. |
Object storage
Data is stored as objects in a flat hierarchy of buckets. Each object is assigned a globally unique ID. Objects can have system-assigned and user-defined metadata, to help you organize and manage the data. Applications access data by referencing the object IDs, using REST APIs or client libraries. Object storage is similar to on-premises SAN in terms of the ability to scale, but is easier to manage and less expensive.
Cloud Storage provides low-cost, highly durable, no-limit object storage for diverse data types. The data you store in Cloud Storage can be accessed from anywhere, within and outside Google Cloud. Geo-redundant replication provides maximum reliability. You can select a storage class that suits your data-retention and access-frequency requirements.
Comparative analysis
The following table provides a comparative analysis of the key capabilities of the storage services in Google Cloud.
Capability | Persistent Disk | Local SSD | Filestore | Cloud Storage |
---|---|---|---|---|
Capacity | 10 GB to 64 TB per disk; 257 TB per VM | 375 GB per disk; 9 TB per VM | 1–100 TiB per Filestore instance (the minimum and maximum capacity and the scaling increments vary by service tier) | No lower or upper limit |
Scaling | | Not scalable | | Scales automatically based on usage |
Sharing | Limited sharing | Not shareable | Mountable on multiple Compute Engine VMs, remote clients, and GKE clusters | |
Encryption keys | Google-managed, customer-managed, or customer-supplied keys | Google-managed keys | | Google-managed, customer-managed, or customer-supplied keys |
Persistence | Lifetime of the disk | Ephemeral (data lives until the VM is stopped or deleted) | Lifetime of the Filestore instance | Lifetime of the bucket |
Availability | | Not supported | | |
Performance | Linearly scaling high performance, based on disk size and CPU count | Highest performance | | Autoscaling read-write rates, and dynamic load redistribution |
Management | Manually format and mount | Manually format, stripe, and mount | Fully managed | Fully managed |
Workloads | | | | |
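
The per-disk and per-VM capacity limits in the table translate into simple sizing arithmetic. The following sketch is a hypothetical helper, not part of any Google Cloud library. It assumes decimal units (1 TB = 1,000 GB) and uses the limits exactly as listed in the table, which can change over time.

```python
import math

# Limits copied from the comparison table (assumed to be decimal GB).
LIMITS_GB = {
    "persistent_disk": {"per_disk": 64_000, "per_vm": 257_000},
    "local_ssd": {"per_disk": 375, "per_vm": 9_000},
}


def disks_needed(option, required_gb):
    """Return the minimum number of disks that cover required_gb on one VM."""
    limits = LIMITS_GB[option]
    if required_gb <= 0:
        raise ValueError("required capacity must be positive")
    if required_gb > limits["per_vm"]:
        raise ValueError(
            f"{required_gb} GB exceeds the per-VM limit of {limits['per_vm']} GB"
        )
    # Round up, because a partially used disk still counts as a whole disk.
    return math.ceil(required_gb / limits["per_disk"])


print(disks_needed("local_ssd", 1_000))          # 3
print(disks_needed("persistent_disk", 100_000))  # 2
```

A request that exceeds the per-VM limit raises an error instead of returning a count, which signals that the capacity must be spread across multiple VMs or moved to a different service.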
Design a storage strategy
There are two parts to selecting a storage strategy:
- Deciding which storage services you need.
- Choosing the required features and design options in a given service.
Examples of service-specific features and design options
Persistent Disk
- Deployment region and zone
- Regional replication
- Disk type, size, count, and performance class
- Encryption keys: Google-managed, customer-managed, or customer-supplied
- Snapshot schedule
Cloud Storage
- Location: multi-region, dual-region, single region
- Storage class: Standard, Nearline, Coldline, Archive
- Access control: uniform or fine-grained
- Encryption keys: Google-managed, customer-managed, or customer-supplied
- Retention policy
Filestore
- Deployment region and zone
- Instance tier
- Capacity
- IP range: auto-allocated or custom
- Access control
Storage strategy recommendations
Use the following recommendations as a starting point to choose the storage services and features that meet your requirements. These recommendations are also presented as a decision tree later in this document.
For applications that need multi-writer file storage with predictable performance, choose a suitable file storage service based on the required access protocol.
Access protocol | Recommendation |
---|---|
NFSv3 | Use Filestore. Choose a service tier (Basic, High Scale, or Enterprise) that suits your use case. |
NFSv4, SMB, and other protocols | Review the available Compute Engine file-server options. |

For workloads that need primary storage with high IOPS and low latency, and the ability to start small and scale gradually, use Persistent Disks.
You can format each disk as a file system that the operating system of your VM supports (for example, Ext4 for Linux and NTFS for Windows).
Depending on your workload's performance and persistence requirements, choose between ephemeral and persistent disks.

Requirement | Recommendation |
---|---|
Fast scratch disk or cache | Use Local SSD disks (ephemeral). For data persistence, use Persistent Disks. |
Sequential IOPS | Use Persistent Disks with the pd-standard disk type. |
High rates of random IOPS | Use Local SSD or extreme Persistent Disks. |

Depending on your redundancy requirements, choose between zonal and regional disks.

Requirement | Recommendation |
---|---|
Redundancy within a single zone in a region | Use zonal Persistent Disks. |
Redundancy across multiple zones within a region | Use regional Persistent Disks. |

For a detailed comparative analysis, see Persistent Disk options.
For unlimited-scale, geo-redundant, high-throughput, and lowest-cost storage, use Cloud Storage.
Choose a suitable Cloud Storage class.
Requirement | Recommendation |
---|---|
Storage for data that's accessed frequently, including for high-throughput analytics, data lakes, websites, streaming videos, and mobile apps. | Use Standard Storage. To cache frequently accessed data and serve it from locations that are close to the clients, use Cloud CDN. |
Low-cost storage for infrequently accessed data that can be stored for at least 30 days (for example, backups and long-tail multimedia content). | Use Nearline Storage. |
Low-cost storage for infrequently accessed data that can be stored for at least 90 days (for example, disaster recovery). | Use Coldline Storage. |
Lowest-cost storage for infrequently accessed data that can be stored for at least 365 days, including regulatory archives. | Use Archive Storage. |

For a detailed comparative analysis, see Cloud Storage classes.
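
The recommendations above can be approximated in code. The following is a minimal sketch, assuming a hypothetical `StorageNeeds` record that summarizes your questionnaire answers. The field names, function names, and the order of the checks are illustrative simplifications, not an official selection algorithm.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StorageNeeds:
    """Hypothetical summary of the questionnaire answers."""
    multi_writer_files: bool = False
    file_protocol: Optional[str] = None  # for example "NFSv3", "NFSv4", "SMB"
    block_storage: bool = False
    scratch_or_cache: bool = False
    multi_zone_redundancy: bool = False
    frequent_access: bool = True
    min_storage_days: int = 0


def pick_storage_class(frequent_access, min_storage_days):
    """Map access frequency and minimum storage duration to a storage class."""
    if frequent_access:
        return "Standard"
    if min_storage_days >= 365:
        return "Archive"
    if min_storage_days >= 90:
        return "Coldline"
    if min_storage_days >= 30:
        return "Nearline"
    # Infrequently accessed data stored for less than 30 days stays in Standard.
    return "Standard"


def recommend(needs):
    """Return a coarse service recommendation for the given needs."""
    if needs.multi_writer_files:
        if needs.file_protocol == "NFSv3":
            return "Filestore"
        return "Compute Engine file-server options"
    if needs.block_storage:
        if needs.scratch_or_cache:
            return "Local SSD"
        if needs.multi_zone_redundancy:
            return "Regional Persistent Disk"
        return "Zonal Persistent Disk"
    storage_class = pick_storage_class(needs.frequent_access, needs.min_storage_days)
    return f"Cloud Storage ({storage_class})"


print(recommend(StorageNeeds(multi_writer_files=True, file_protocol="NFSv3")))
print(recommend(StorageNeeds(frequent_access=False, min_storage_days=120)))
```

With these inputs, the sketch prints `Filestore` and `Cloud Storage (Coldline)`. A real selection would also weigh the security, performance, and cost answers from the questionnaires, which this sketch leaves out.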
Decision tree
The following decision tree guides you through the recommendations discussed earlier:
What's next
- Estimate storage cost using the Google Cloud Pricing Calculator.
- Learn about the best practices for building a cloud topology that's optimized for security, resilience, cost, and performance.
- Review the storage options in Google Cloud.
- Learn about the differences between object, block, and file storage in Google Cloud (video).
- Learn when to use parallel file systems like Lustre for HPC workloads.