Google Cloud for data center professionals: Storage

This article discusses storage services on Google Cloud and how they relate to traditional storage models. It covers the following storage types and their cloud counterparts:

  • Disk storage, including direct-attached storage (DAS), network-attached storage (NAS), and storage area networks (SAN)
  • Object storage
  • Archival storage, including disk arrays and magnetic tape media

Core storage components

This section provides a brief survey of the core storage components provided by Google Cloud.

Block storage

Google Cloud provides two block storage types. Both disk types are integrated into Compute Engine, Google Cloud's infrastructure-as-a-service (IaaS) product, and can be mounted to Compute Engine virtual machine (VM) instances:

  • Persistent disks, which are network-attached volumes that you can attach to your VM instances.
  • Local SSDs, which are directly attached to the physical machine on which your VM instance is running.

Persistent disks

Persistent disks are virtual block storage volumes. When you create a VM instance, the instance comes with a single bootable persistent disk that contains the machine's operating system. You can attach up to 128 volumes or 257 TB of persistent disks, depending on your VM's machine type.
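As a rough illustration of these limits, the following Python sketch checks a planned set of persistent disks against the 128-volume and 257 TB figures quoted above. The function name and the decimal GB-to-TB conversion are assumptions made for this example, and the real ceiling for a given VM depends on its machine type.

```python
# Upper bounds quoted in this article; the limit for a specific VM
# depends on its machine type and may be lower.
MAX_ATTACHED_DISKS = 128
MAX_TOTAL_TB = 257
GB_PER_TB = 1000  # assumption: decimal units


def check_attachment_plan(disk_sizes_gb):
    """Return a list of problems with a planned set of disks (empty if none)."""
    problems = []
    if len(disk_sizes_gb) > MAX_ATTACHED_DISKS:
        problems.append(
            f"{len(disk_sizes_gb)} disks exceeds the "
            f"{MAX_ATTACHED_DISKS}-volume limit"
        )
    total_tb = sum(disk_sizes_gb) / GB_PER_TB
    if total_tb > MAX_TOTAL_TB:
        problems.append(
            f"{total_tb:.1f} TB exceeds the {MAX_TOTAL_TB} TB capacity limit"
        )
    return problems


if __name__ == "__main__":
    # Ten 500 GB disks fit comfortably within both limits.
    print(check_attachment_plan([500] * 10))
```

A check like this is only a planning aid. Compute Engine itself enforces the actual limits when you attach a disk.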

A persistent disk's performance depends on the type of the disk, the size of the disk, and the maximum throughput of the VM instance to which the disk is attached. Each persistent disk can be either an HDD or an SSD, and each type has the performance characteristics typically associated with its physical counterpart: standard HDD persistent disks are efficient and economical for handling sequential read/write operations, whereas SSD persistent disks are more appropriate for high rates of random IOPS. The performance of a given persistent disk increases with the disk's size until you hit the attached VM instance's throughput limits.
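The relationship between disk size and the VM limit can be sketched as a simple capped function. This is a simplified model, not Google's published formula: it assumes performance grows linearly with size, and the per-GB rate and VM cap are hypothetical parameters that you would replace with the figures documented for your disk type and machine type.

```python
def effective_iops(disk_size_gb, iops_per_gb, vm_iops_cap):
    """Estimate IOPS that grow with disk size and flatten at the VM's limit.

    Assumption: linear scaling with size. iops_per_gb and vm_iops_cap are
    placeholders supplied by the caller, not documented values.
    """
    if disk_size_gb <= 0:
        raise ValueError("disk_size_gb must be positive")
    return min(disk_size_gb * iops_per_gb, vm_iops_cap)


if __name__ == "__main__":
    # With placeholder figures, growing the disk helps only up to the VM cap.
    for size_gb in (100, 500, 1000):
        print(size_gb, effective_iops(size_gb, iops_per_gb=30, vm_iops_cap=15000))
```

Under this model, making a disk larger past the point where it reaches the VM cap adds capacity but no further performance.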

The machines containing the persistent disks are colocated in the same zone as the machines containing your VM instances, and are connected with Google's network fabric. Google's networks typically can deliver more than one petabit per second of total bisection bandwidth, helping to ensure that these networked persistent disks have comparable throughput and I/O properties to traditional locally attached disks.

While persistent disks are available within the zone in which they reside, they are not replicated across zones. If the VM instance attached to a persistent disk goes offline, that persistent disk retains its data but becomes inaccessible. To ensure high availability when using persistent disks, you must design for high availability between the regions and zones in which your workload is running. For more information about regions and zones, see Regions and zones.

Local SSD

A local SSD is physically attached to the same host machine as your Compute Engine VM instance. Local SSDs have higher throughput and lower latency than standard persistent disks or SSD persistent disks. However, local SSDs come with some caveats:

  • Local SSDs are less flexible than persistent disks. Unlike persistent disks, local SSDs are restricted to 375 GB in size. You can attach up to 8 volumes to a given non-shared-core VM instance, and you cannot attach local SSDs to a shared-core VM instance. In addition, you cannot use a local SSD as a boot device.
  • Local SSDs are not persistent in the general case. Your data survives a restart of your VM instance, but it is lost if you stop the instance.
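To make the sizing constraints concrete, the following sketch computes how many local SSD volumes a given amount of scratch space would need, using the 375 GB volume size and 8-volume limit stated above. The function name and parameters are hypothetical and exist only for this example.

```python
import math

LOCAL_SSD_SIZE_GB = 375  # fixed volume size stated in this article
MAX_LOCAL_SSDS = 8       # per non-shared-core VM, as stated in this article


def local_ssds_needed(scratch_gb, shared_core=False):
    """Return the number of local SSD volumes needed for scratch_gb of space."""
    if shared_core:
        raise ValueError("local SSDs cannot be attached to shared-core VMs")
    if scratch_gb <= 0:
        raise ValueError("scratch_gb must be positive")
    count = math.ceil(scratch_gb / LOCAL_SSD_SIZE_GB)
    if count > MAX_LOCAL_SSDS:
        raise ValueError(
            f"{scratch_gb} GB needs {count} volumes, "
            f"more than the {MAX_LOCAL_SSDS}-volume limit"
        )
    return count


if __name__ == "__main__":
    print(local_ssds_needed(1000))
```

Because volumes come in fixed 375 GB units, a request for 1000 GB rounds up to three volumes, and anything above 3000 GB does not fit on a single VM instance under these limits.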

Object storage

Cloud Storage is a hosted object storage service that allows you to store and access large numbers of binary objects, or blobs, of varying sizes. Cloud Storage buckets are the most scalable and durable storage option available on Google Cloud. If your applications do not require block storage, you should strongly consider storing your data in a Cloud Storage bucket.

Cloud Storage offers four main classes of storage. All Cloud Storage classes provide fast access to all data and support the same set of API calls:

  • Standard offers the highest availability of the Cloud Storage classes in a given location. This class is ideal for large-scale content storage and media file serving. When used in the same region as your compute resources, it's ideal for data analytics, machine learning, and compute workloads such as media processing.
  • Nearline, Coldline, and Archive provide cost-effective storage for data you don't intend to frequently access, such as backup data, disaster recovery data, and archival data.

Service model comparisons

This section maps the most common data center storage models to Google Cloud storage offerings, and discusses how cloud services depart from traditional data center technologies.

Direct-attached storage (DAS)

In a data center, direct-attached storage (DAS), sometimes called local disk, is a physical volume directly attached to a physical server. This volume can be internally attached, as with a boot disk, or externally attached, as with an external hard drive. If you want to allow other servers to access the data on a local disk, you must explicitly allow the server's operating system to share the disk across your network.

In the cloud, local disk and DAS aren't necessarily synonymous. On Google Cloud, for example, both local SSDs and persistent disks can support workloads that expect local disks:

  • Persistent disks, while network-attached behind the scenes, are the default analogue for DAS on Compute Engine. Both persistent disk types provide strong performance for their respective use cases, and both are less expensive than local SSDs, making them a good choice for most workloads.
  • Local SSDs are more directly analogous to traditional DAS, and are a good choice for workloads that have high storage-performance requirements. However, local SSDs are not persistent. They cannot be used as boot disks, and they should not be used for workloads that expect persistent local storage.

Network-attached storage (NAS)

In a data center, a network-attached storage (NAS) device, also called a filer, provides a way for applications to read and update files that are shared across machines. Typically, a filer uses a protocol that enables client machines to mount a file system and access the files as if they were hosted locally.

Google Cloud provides Filestore, a native filer solution as a service. You can also run a filer on Google Cloud in a variety of ways. For more information, see File servers on Compute Engine.

Storage area network (SAN)

In a data center, a storage area network (SAN) is a remote storage unit that provides both block-level access and an internal management layer through which you can provision individual logical unit numbers (LUNs) to resources. When connecting to a SAN, users mount the SAN itself as a disk rather than connecting to a server with attached disks, as with a filer.

On Google Cloud, you can use persistent disks to support workloads that expect SANs. Used in a SAN context, persistent disks are analogous to the logical disk volumes you would access through logical unit number (LUN) devices, and can be provisioned in a similar way. As with LUN-based logical disk volumes, you can mount multiple persistent disks to a single VM instance. You can also mount a single read-only persistent disk to multiple VM instances.
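The sharing rule described above (many disks per VM instance, but a disk shared across VM instances only when it is read-only) can be expressed as a small toy model. The class and method names here are hypothetical and are not part of any Compute Engine API. The sketch only illustrates the rule.

```python
class PersistentDisk:
    """Toy model of a persistent disk and the VM instances it is attached to."""

    def __init__(self, name):
        self.name = name
        self.attachments = {}  # VM name -> "rw" (read-write) or "ro" (read-only)

    def attach(self, vm_name, mode="rw"):
        if mode not in ("rw", "ro"):
            raise ValueError("mode must be 'rw' or 'ro'")
        if vm_name in self.attachments:
            raise ValueError(f"{self.name} is already attached to {vm_name}")
        # Sharing is allowed only if every attachment, old and new, is read-only.
        if self.attachments and (mode == "rw" or "rw" in self.attachments.values()):
            raise ValueError("a shared disk must be read-only on every VM")
        self.attachments[vm_name] = mode


if __name__ == "__main__":
    reference_data = PersistentDisk("reference-data")
    reference_data.attach("vm-a", mode="ro")
    reference_data.attach("vm-b", mode="ro")  # allowed: both are read-only
    print(sorted(reference_data.attachments))
```

In this model, a disk that is already attached read-write to one VM instance rejects any further attachment, which mirrors the constraint that only read-only disks are shared.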

Persistent disks are local to a given zone, which means that you can continue to use LUN zoning as a strategy for restricting machine access if necessary.

When you move your SAN to Google Cloud, you also get some additional benefits that are unique to the cloud:

  • A persistent-disk-based SAN has almost no storage ceiling. You can provision new persistent disks on the fly without worrying about running out of physical storage space. Individual VM instances, however, are limited to 128 persistent disks or 257 TB of block storage, depending on the machine type.
  • You don't have to worry about physical hardware considerations, such as optimizing the number of VM instances you can connect to a given LUN.
  • Because persistent disks are stored redundantly within their zone by default, you don't need to worry about choosing an optimal RAID level.

Archival storage

In a data center, you use standard archival storage media types, such as magnetic tape media, a storage disk array, or both, for archival data that must be retained long-term for business or legal purposes. Each of these storage types has its drawbacks:

  • Storage disk arrays can be expensive. In addition to the up-front cost of the hardware itself, if your backup policy requires that data be moved offsite, you might need to keep backups in a second data center. This situation brings the additional costs of establishing and maintaining connectivity to that data center.
  • Magnetic tape media is cheaper than a storage disk array, but requires more administrative overhead. To effectively manage your tape archives, you need a catalog server that allows you to track the history of your tapes, a tape library, and a sysadmin to manage the library and the accompanying imports/exports. Some businesses also require an external vendor who can perform secure tape pickups and deliveries regularly.

When you move to the cloud, this outlay of hardware and human resources is reduced. For example, on Google Cloud, you can replace both methods with Cloud Storage Nearline, Cloud Storage Coldline, and Cloud Storage Archive, which provide progressively "colder" storage solutions. Cloud Storage Nearline is designed for data you expect to access less than once a month. Cloud Storage Coldline is designed for data you expect to access less than once a quarter. Cloud Storage Archive is designed for data that you expect to access less than once a year.
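The access-frequency guidance above can be sketched as a simple lookup. The function name is hypothetical, and the thresholds are a direct reading of "less than once a month", "less than once a quarter", and "less than once a year". A real decision would also weigh retrieval fees and minimum storage periods.

```python
def suggest_storage_class(expected_reads_per_year):
    """Map an expected access frequency to a Cloud Storage class name."""
    if expected_reads_per_year < 0:
        raise ValueError("expected_reads_per_year must not be negative")
    if expected_reads_per_year < 1:    # less than once a year
        return "ARCHIVE"
    if expected_reads_per_year < 4:    # less than once a quarter
        return "COLDLINE"
    if expected_reads_per_year < 12:   # less than once a month
        return "NEARLINE"
    return "STANDARD"


if __name__ == "__main__":
    for reads in (0.5, 2, 6, 52):
        print(reads, suggest_storage_class(reads))
```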

Cloud Storage Nearline, Cloud Storage Coldline, and Cloud Storage Archive help address many of the issues that plague traditional archival storage methods. For example, unlike magnetic tape media, all three storage classes are durable and dependable. You don't have to worry about whether your tape is in working order, nor do you have to worry about retrieving your tape from an offsite facility or resolving data that spans multiple tapes. Moreover, you no longer have to worry about data replication: by default, every Cloud Storage storage class replicates your data to help ensure durability and availability.

In addition, Cloud Storage Nearline, Cloud Storage Coldline, and Cloud Storage Archive offer low latency that is comparable to an on-site storage disk array, typically returning the first byte of data in under a second. However, in contrast to a storage disk array, you don't have to pay for upfront hardware costs and expensive maintenance contracts. With Cloud Storage, you pay only for what you use.

Finally, Cloud Storage integrates with several popular catalog management systems. With this model, you can continue to use your current software and send your new archival backups to Cloud Storage Nearline, Cloud Storage Coldline, or Cloud Storage Archive. For more information, see Cloud Storage partners.

Pricing


Persistent disk and local SSD

Compute Engine persistent disks and local SSDs are priced per GB per month. For more information about persistent disk and local SSD pricing, see Persistent disk pricing.

Cloud Storage Standard

The Cloud Storage pricing model contrasts sharply with that of traditional data-center storage. In a data center, you have to purchase your NAS or SAN storage hardware up front. In contrast, Cloud Storage charges by usage. You are billed for the amount of data you store per month, the amount of network egress, the amount of data transfer between locations, and the number of API requests you make. You don't have to worry about maintenance contracts or the costs associated with potential hardware failures that come with purchasing your own hardware.
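The four billing dimensions listed above can be combined into a back-of-the-envelope estimate. The following sketch is an illustration only: the class and function names are hypothetical, and the unit prices in the example are placeholders rather than actual Cloud Storage rates, which vary by location and storage class.

```python
from dataclasses import dataclass


@dataclass
class UnitPrices:
    """Placeholder unit prices; look up current rates before relying on them."""
    storage_per_gb_month: float
    egress_per_gb: float
    transfer_per_gb: float
    per_1000_requests: float


def estimate_monthly_bill(prices, stored_gb, egress_gb, transfer_gb, requests):
    """Sum the four usage-based charges described in this section."""
    return (
        stored_gb * prices.storage_per_gb_month
        + egress_gb * prices.egress_per_gb
        + transfer_gb * prices.transfer_per_gb
        + (requests / 1000) * prices.per_1000_requests
    )


if __name__ == "__main__":
    # Hypothetical prices, chosen only to make the arithmetic easy to follow.
    example = UnitPrices(
        storage_per_gb_month=0.02,
        egress_per_gb=0.10,
        transfer_per_gb=0.01,
        per_1000_requests=0.005,
    )
    print(estimate_monthly_bill(example, 1000, 50, 100, 200_000))
```

Unlike an up-front hardware purchase, every term in this sum drops to zero when the corresponding usage does.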

For more information about Cloud Storage pricing, see Cloud Storage pricing.

Cloud Storage Nearline, Cloud Storage Coldline, and Cloud Storage Archive

As with the Cloud Storage Standard storage class, Cloud Storage Nearline, Cloud Storage Coldline, and Cloud Storage Archive are priced by amount of data stored per month, by network egress, and by the amount of data transfer between locations. As archival classes, all three also have a storage retrieval fee and a minimum storage period. If you delete or modify your data before the minimum storage period has elapsed, you are charged for the remainder of the period. For example, Cloud Storage Nearline has a 30-day minimum storage period, so if you delete an object 5 days after storing it in Cloud Storage Nearline, you are charged for the remaining 25 days of storage for that object.
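The early-deletion arithmetic in the example above can be sketched as follows. The function name is hypothetical. The 30-day Nearline minimum is implied by the example in this section, while the 90-day Coldline and 365-day Archive minimums are not stated in this article and should be checked against the current Cloud Storage pricing page.

```python
# Minimum storage periods in days. Only the Nearline value is implied by this
# article; verify the Coldline and Archive values against current pricing docs.
MIN_STORAGE_DAYS = {"NEARLINE": 30, "COLDLINE": 90, "ARCHIVE": 365}


def early_deletion_days(storage_class, days_stored):
    """Return the days still billed when an object is deleted after days_stored."""
    if days_stored < 0:
        raise ValueError("days_stored must not be negative")
    minimum = MIN_STORAGE_DAYS[storage_class]
    return max(0, minimum - days_stored)


if __name__ == "__main__":
    # The example from the text: deleted after 5 days in Nearline.
    print(early_deletion_days("NEARLINE", 5))  # prints 25
```

Once an object has been stored for at least the minimum period, the function returns 0 and no early-deletion charge applies.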

For more information about Cloud Storage Nearline, Cloud Storage Coldline, and Cloud Storage Archive pricing, see Cloud Storage pricing.

What's next?

Next: Management on Google Cloud