Google Cloud Platform for AWS Professionals: Storage

Updated June 29, 2016

Compare the storage services that Amazon and Google provide in their respective cloud environments. This article discusses the following service types:

  • Distributed object storage, or redundant key-value stores in which you can store data objects. Amazon and Google both offer standard, reduced-cost, and cold tiers of service for this storage type.
  • Block storage, or virtual disk volumes that you can attach to virtual machine instances.

Distributed object storage

Amazon Simple Storage Service (S3) and Google Cloud Storage are hosted services for storing and accessing large numbers of binary objects, or blobs, of varying sizes. Each service can be understood as a highly scalable key-value store, where the keys are strings and values are arbitrary binary objects.

This section discusses the Amazon S3 Standard and Cloud Storage Standard storage classes. However, Amazon S3 and Cloud Storage each offer several classes of service, allowing users to make a tradeoff between storage cost and speed of retrieval. For information on these other storage classes, see the Reduced-cost data storage and Cold data storage sections below.

Service model comparison

Cloud Storage and Amazon S3 have very similar service models. In both services, you store objects in a bucket. Each object within a bucket is identified by a unique key within that bucket, and each object has an associated metadata record. This metadata record contains information such as object size, date of last modification, and media type. If you have the appropriate permissions, you can view and modify some of this metadata. You can also add custom metadata, if needed.

Though buckets are key-value stores, the user experience for buckets is designed to emulate that of filesystems. As such, by convention, object keys are usually paths such as "/foo/bar.txt" or "/foo/subdir/baz.txt". Amazon S3 and Cloud Storage extend the filesystem metaphor further by providing filesystem-like APIs. For example, the provided list method lists all object keys with a common prefix, much as ls -R would on a Unix-like filesystem.
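
The following is a minimal sketch of that idea, not either service's actual API: a toy in-memory bucket that stores objects under flat string keys and answers a prefix query. The class name Bucket and the method names put and list_keys are hypothetical and exist only for illustration.

```python
class Bucket:
    """Toy in-memory model of a bucket: a flat mapping from string keys to bytes."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        # Keys are plain strings; the slashes carry no special meaning to the store.
        self._objects[key] = data

    def list_keys(self, prefix=""):
        # "Directory listing" is just a filter on a common key prefix.
        return sorted(k for k in self._objects if k.startswith(prefix))


bucket = Bucket()
bucket.put("/foo/bar.txt", b"a")
bucket.put("/foo/subdir/baz.txt", b"b")
bucket.put("/other/qux.txt", b"c")
print(bucket.list_keys("/foo/"))  # ['/foo/bar.txt', '/foo/subdir/baz.txt']
```

The point of the sketch is that there are no real directories: "/foo/" is only a string that two keys happen to share.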

In addition to their most obvious use, distributed object storage, both services can be used to host static web content and media.

Amazon S3's features and terminology map to those of Cloud Storage as follows:

Feature | Amazon S3 | Cloud Storage
Unit of deployment | Bucket | Bucket
Deployment identifier | Globally unique key | Globally unique key
File system emulation | Limited | Limited
Object metadata | Yes | Yes
Object versioning | Yes | Yes
Object lifecycle management | Yes | Yes
Update notifications | Event notifications | Object change notifications
Service classes | Standard, Reduced Redundancy, Infrequent Access, Amazon Glacier | Standard, Durable Reduced Availability, Nearline
Deployment locality | Regional | Regional and multi-regional

Object keys

Due to the nature of Amazon S3's indexing scheme, users with high-throughput use cases must be careful about the way they generate their object keys, to avoid potential hotspotting issues. For more information, see Request Rate and Performance Considerations.

In Cloud Storage, the name of an object does not affect the performance or scalability of accessing objects within a bucket.
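
One commonly described way to avoid hotspotting on Amazon S3 is to prepend a short hash-derived prefix to each key, so that sequentially named objects do not share a leading string. The sketch below illustrates that idea only. The function name spread_key and the four-character prefix length are assumptions made for this example, not values prescribed by Amazon. Per the paragraph above, Cloud Storage does not need this step.

```python
import hashlib


def spread_key(key, prefix_len=4):
    """Return the key with a short hash-derived prefix (hypothetical helper).

    The prefix is deterministic, so the same logical key always maps to
    the same stored key.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}-{key}"


for name in ("2016-06-29/log-0001", "2016-06-29/log-0002"):
    print(spread_key(name))
```

The tradeoff is that prefix listings by the original key order are no longer possible without a separate index.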

Object versioning

Amazon S3 and Cloud Storage both support object versioning, in which distinct versions of an object with a given key are stored under a distinct version ID. By enabling versioning, you can mitigate the risk of accidental data loss due to an object being overwritten.

In Cloud Storage, you can use preconditions to support conditional updates for PUT and DELETE operations. In a conditional update, the update request will succeed only if the object version being updated matches the object version specified in the request. This mechanism helps prevent the possibility of race conditions during updates. Amazon S3 does not support conditional updates.
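
To make the race-condition argument concrete, here is a toy in-memory model of a conditional update. It is a sketch of the mechanism only and does not call the Cloud Storage API. The names VersionedObject, PreconditionFailed, and if_version_match are hypothetical.

```python
class PreconditionFailed(Exception):
    """Raised when the caller's expected version is stale."""


class VersionedObject:
    def __init__(self, data):
        self.data = data
        self.version = 1

    def update(self, data, if_version_match):
        # Succeed only if the caller saw the current version.
        if if_version_match != self.version:
            raise PreconditionFailed(
                f"expected version {if_version_match}, found {self.version}"
            )
        self.data = data
        self.version += 1
        return self.version


obj = VersionedObject(b"original")
seen_by_a = seen_by_b = obj.version   # both writers read version 1

obj.update(b"from A", if_version_match=seen_by_a)       # succeeds
try:
    obj.update(b"from B", if_version_match=seen_by_b)   # stale, rejected
except PreconditionFailed as err:
    print("writer B must re-read and retry:", err)
```

Without the precondition, writer B would silently overwrite writer A's change.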

Object lifecycle management

Amazon S3 and Cloud Storage both allow you to automate object deletion according to user-specified lifecycle policies. In Amazon S3, you can also set policies to automatically migrate objects across storage classes.
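
As an illustration, a deletion policy can be expressed as a small configuration document. The sketch below builds one in the JSON shape used for Cloud Storage lifecycle configurations (a list of rules, each with an action and a condition). Treat the field names as an assumption to verify against the current documentation. The helper name delete_after_days is hypothetical.

```python
import json


def delete_after_days(days):
    """Build a lifecycle configuration that deletes objects older than `days` days."""
    return {
        "rule": [
            {
                "action": {"type": "Delete"},
                "condition": {"age": days},  # age in days since object creation
            }
        ]
    }


print(json.dumps(delete_after_days(365), indent=2))
```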

Update notifications

Amazon S3 and Cloud Storage both allow you to configure your buckets to issue notifications when objects are created, deleted, or updated. In Amazon S3, this feature is called event notifications, and in Cloud Storage, this feature is called object change notifications.

Amazon supports three possible destinations for notifications: an Amazon Simple Notification Service (SNS) topic, an Amazon Simple Queue Service (SQS) queue, or an AWS Lambda function. Cloud Storage lets you post notifications to a target URL, or webhook, that handles the notification payload.

Service level agreement

Amazon and Google both provide uptime guarantees, and have policies in place for crediting customer accounts in the event that these guarantees are not met. Amazon defines the guarantees and policies for Amazon S3 Standard in the standard Amazon S3 service level agreement (SLA). Google defines the guarantees and policies for Cloud Storage Standard in the Cloud Storage SLA.

Costs

Amazon S3 Standard

Amazon S3 Standard is priced by amount of data stored per month and by network egress. In addition, Amazon S3 Standard charges for common API requests.

Cloud Storage Standard

Like Amazon S3 Standard, Cloud Storage Standard is priced by amount of data stored per month and by network egress. Cloud Storage Standard also charges for common API requests.

Reduced-cost data storage

Cloud Storage and Amazon S3 each offer a reduced-cost storage class for data that does not require the robustness of the standard storage tier. Cloud Storage offers Cloud Storage Durable Reduced Availability (DRA), which offers less availability than the standard tier, and Amazon S3 offers S3 Reduced Redundancy Storage (RRS), which offers less redundancy than the standard tier.

Latency and throughput

Cloud Storage DRA and Amazon S3 RRS both have a first-byte latency period of no more than a few milliseconds. The throughput of both storage classes is identical to that of the Cloud Storage Standard and Amazon S3 Standard storage classes.

Service level agreement

Amazon and Google both provide uptime guarantees, and have policies in place for crediting customer accounts in the event that these guarantees are not met. Amazon defines the guarantees and policies for Amazon S3 RRS in the standard Amazon S3 SLA, and Google defines the guarantees and policies for Cloud Storage DRA in the Cloud Storage SLA.

Costs

Amazon S3 RRS

Amazon S3 RRS is priced by amount of data stored per month and by network egress. In addition, Amazon S3 RRS charges for common API requests, and charges for data retrieval on a per-GB basis.

Cloud Storage DRA

Like Amazon S3 RRS, Cloud Storage DRA is priced by amount of data stored per month and by network egress. Cloud Storage DRA also charges for common API requests.

Cold data storage

Google and Amazon each offer cold storage options for data that does not need to be accessed regularly or retrieved quickly. Cloud Storage offers an additional class called Cloud Storage Nearline, and Amazon offers two options:

  • Amazon Glacier, their initial cold storage offering.
  • Amazon S3 Standard - Infrequent Access Storage, their latest cold storage offering.

Latency

Amazon S3 Standard - Infrequent Access Storage has an advertised first-byte latency period of several milliseconds, and Amazon Glacier has a first-byte latency period of approximately four hours. Cloud Storage Nearline has a first-byte latency period of approximately three seconds.

Storage duration

Amazon and Google each have a minimum storage period for their respective cold storage classes:

  • Data objects stored in Amazon S3 Standard - Infrequent Access Storage must remain unmodified for 30 days.
  • Data objects stored in Amazon Glacier must remain unmodified for 90 days.
  • Data objects stored in Cloud Storage Nearline must remain unmodified for 30 days.

In each service, if you delete or overwrite a data object before the minimum storage period ends, you will incur additional charges.
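
The early-deletion charge is easiest to see as arithmetic: you are billed for the days remaining in the minimum storage period. The sketch below uses the minimum periods listed above. The function name early_deletion_days is hypothetical, and the sketch counts billable days only, not the price.

```python
# Minimum storage periods, in days, as listed above.
MIN_STORAGE_DAYS = {
    "Amazon S3 Standard - Infrequent Access": 30,
    "Amazon Glacier": 90,
    "Cloud Storage Nearline": 30,
}


def early_deletion_days(storage_class, days_stored):
    """Return the number of extra days billed if an object is deleted now."""
    minimum = MIN_STORAGE_DAYS[storage_class]
    return max(0, minimum - days_stored)


print(early_deletion_days("Cloud Storage Nearline", 5))   # 25
print(early_deletion_days("Amazon Glacier", 5))           # 85
print(early_deletion_days("Amazon Glacier", 120))         # 0
```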

Service level agreement

Amazon defines its uptime guarantees and account crediting policies for Amazon S3 Standard - Infrequent Access Storage in the standard Amazon S3 SLA.

Amazon Glacier has no SLA.

Google defines its uptime guarantees and account crediting policies for Cloud Storage Nearline in the Cloud Storage SLA.

Costs

Amazon S3 Standard - Infrequent Access Storage

Amazon S3 Standard - Infrequent Access Storage is priced by amount of data stored per month and by network egress. Because this storage class has a minimum billable object size of 128KB, objects smaller than 128KB are charged as if they are 128KB in size. If you delete or modify your data before the minimum storage period ends, you will be charged for the remainder of the period. For example, if you delete an object 5 days after storing the object, you will be charged for the remaining 25 days of storage for that object.

In addition, Amazon S3 Standard - Infrequent Access Storage charges for common API requests, and charges for data retrieval on a per-GB basis.

Amazon Glacier

Amazon Glacier pricing is more complex, taking several factors into account. These factors include:

  • The amount of data you've stored in Amazon Glacier.
  • The amount of data you're retrieving.
  • The peak amount of gigabytes you've retrieved in an hour in a given month.
  • The number of hours in the month in which the data is restored.
  • Whether the percentage of data you retrieve in a given period surpasses a prorated percentage of 5% of your total stored data in that month.

If the amount of data you retrieve in a given period exceeds the prorated percentage of 5% required to remain in the free-retrieval tier, retrieval costs can escalate sharply. For a detailed explanation of Amazon Glacier's data retrieval pricing formula, see the Amazon Glacier FAQ.
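
As a rough illustration of the free-retrieval threshold only, the sketch below prorates the 5% monthly allowance evenly across the days of the month. The daily proration, the 30-day month, and the function names are assumptions made for this example. It does not implement Amazon's retrieval pricing formula, which also depends on peak hourly retrieval rates.

```python
def daily_free_retrieval_gb(stored_gb, days_in_month=30, free_fraction=0.05):
    """Approximate free retrieval allowance per day, in GB (assumes daily proration)."""
    return stored_gb * free_fraction / days_in_month


def exceeds_free_tier(stored_gb, retrieved_today_gb, days_in_month=30):
    """True if today's retrieval is larger than the prorated free allowance."""
    return retrieved_today_gb > daily_free_retrieval_gb(stored_gb, days_in_month)


# With 12,000 GB stored, the allowance works out to roughly 20 GB per day.
print(daily_free_retrieval_gb(12000))
print(exceeds_free_tier(12000, retrieved_today_gb=500))
```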

In addition, if you delete or modify your data before the minimum storage period, you will be charged for the remainder of the period. For example, if you delete an object 5 days after storing the object, you will be charged for the remaining 85 days of storage for that object.

As with other Amazon S3 storage classes, Amazon Glacier also charges for common API requests.

Cloud Storage Nearline

Cloud Storage Nearline is priced by amount of data stored per month and by network egress. If you delete or modify your data before the minimum storage period, you will be charged for the remainder of the period. For example, if you delete an object 5 days after storing the object, you will be charged for the remaining 25 days of storage for that object.

Cloud Storage Nearline also charges for common API requests.

For more information about Cloud Storage Nearline pricing, see Cloud Storage Nearline pricing.

Block storage

Cloud Platform and Amazon Web Services both offer block storage options as part of their compute services. Google Compute Engine provides persistent disks, and Amazon Elastic Compute Cloud (EC2) provides Elastic Block Store (EBS). Each service has several block storage types that cover a range of price and performance characteristics.

Service model comparison

Compute Engine persistent disks and Amazon EBS are very similar in most ways. In both cases, disk volumes are network-attached, though both Compute Engine and Amazon EC2 also provide the ability to locally attach a disk if necessary. While networked disks have higher operational latency and less throughput than their locally attached counterparts, they have many benefits as well, including built-in redundancy, snapshotting, and ease of disk detachment and reattachment.

In both Compute Engine and Amazon EBS, you must create your disk volume in the same zone as the virtual machine instance to which the disk will be attached.

Each service provides redundancy within a single zone for increased durability. While this feature provides some protection against hardware failure, it does not protect against data corruption or accidental deletions due to user or application error. To protect important data, users should perform regular data backups and disk snapshots.

Compute Engine persistent disks map to Amazon EBS as follows:

Feature | Amazon EBS | Compute Engine
Volume types | EBS Provisioned IOPS SSD, EBS General Purpose SSD, Throughput Optimized HDD, Cold HDD | Standard persistent disk (HDD), SSD persistent disk
Volume attachment | Can be attached to only one instance at a time | Read-write volumes: can be attached to only one instance at a time. Read-only volumes: can be attached to multiple instances
Attached volumes per instance | Up to 40 | Up to 128
Maximum volume size | 16TiB | 64TB
Redundancy | Yes | Yes
Snapshotting | Yes | Yes
Snapshot locality | Regional | Global

Compute Engine's locally attached disks compare to those of Amazon EC2 as follows:

Feature | Amazon EC2 | Compute Engine
Service name | Instance store (also known as ephemeral store) | Local SSD
Volume attachment | Tied to instance type | Can be attached to any non-shared-core instance
Device type | Varies by instance type | SSD
Attached volumes per instance | Varies by instance type | Up to 8
Storage capacity | Varies by instance type | 375GB per volume
Live migration | No | Yes
Redundancy | None | None

Volume attachment and detachment

After creating a disk volume, you can attach the volume to a Compute Engine or Amazon EC2 virtual machine instance. The instance can then mount and format the disk volume like any other block device. Similarly, you can unmount and detach a volume from an instance, allowing it to be reattached to other instances.

An Amazon EBS volume can be attached to only one Amazon EC2 instance at a time. Compute Engine persistent disks in read/write mode have the same limitation. However, persistent disks in read-only mode can be attached to multiple instances simultaneously.

In Amazon EC2, you can attach up to 40 disk volumes to a Linux instance. In Compute Engine, you can attach up to 128 disk volumes.
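
The attachment rules above can be summarized as a small decision function. This is a sketch of the rules as stated in this section, not a call into either platform's API. The function name can_attach and the platform labels "ec2" and "gce" are hypothetical, and the Amazon EC2 limit shown is the one given for Linux instances.

```python
# Per-instance volume limits as stated above (the EC2 figure is for Linux instances).
MAX_VOLUMES = {"ec2": 40, "gce": 128}


def can_attach(platform, read_only, instances_already_attached, volumes_on_target):
    """Return True if the volume may be attached to the target instance."""
    if volumes_on_target >= MAX_VOLUMES[platform]:
        return False  # the target instance has no free attachment slots
    if instances_already_attached == 0:
        return True   # an unattached volume can always be attached
    # Sharing is allowed only for Compute Engine disks in read-only mode.
    return platform == "gce" and read_only


print(can_attach("ec2", read_only=False, instances_already_attached=1, volumes_on_target=3))  # False
print(can_attach("gce", read_only=True, instances_already_attached=2, volumes_on_target=3))   # True
```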

Volume snapshotting

Compute Engine and Amazon EBS both allow users to capture and store snapshots of disk volumes. These snapshots can be used to create new volumes at a later time.

In both services, snapshots are differential. The initial snapshot creates a full copy of the volume, but subsequent snapshots only copy the blocks that have changed since the previous snapshot.
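
The differential behavior can be sketched with a toy model in which a volume is a mapping from block index to block contents. This is a simplified illustration, not how either service stores snapshots internally. The function names are hypothetical, and the sketch ignores deleted blocks.

```python
def take_snapshot(volume, previous_state=None):
    """Return the blocks a snapshot must store.

    The first snapshot copies every block; later snapshots copy only
    blocks that differ from the state at the previous snapshot.
    """
    if previous_state is None:
        return dict(volume)
    return {i: b for i, b in volume.items() if previous_state.get(i) != b}


def restore(snapshots):
    """Rebuild a volume by applying a chain of snapshots in order."""
    volume = {}
    for snap in snapshots:
        volume.update(snap)
    return volume


volume = {0: b"boot", 1: b"data-1", 2: b"data-2"}
first = take_snapshot(volume)                      # full copy: 3 blocks

volume[1] = b"data-1-changed"
second = take_snapshot(volume, restore([first]))   # only block 1

print(len(first), len(second))                     # 3 1
```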

Amazon EBS snapshots are available in only one region by default, and must be explicitly copied to other regions if needed. This extra step incurs additional data transfer charges. In contrast, Compute Engine persistent disk snapshots are global, and can be used in any region without additional operations or charges.

Volume performance

For both Compute Engine persistent disks and Amazon EBS, disk performance depends on several factors, including:

  • Volume type: Each service offers several distinct volume types. Each has its own set of performance characteristics and limits.
  • Available bandwidth: The throughput of a networked volume depends on the network bandwidth available to the Compute Engine or Amazon EC2 instance to which it is attached.

This section discusses additional performance details for each service.

Amazon EBS

Amazon EC2 instance types vary widely in networking performance. Instance types with a small number of cores, such as the T2 instance type, might not have sufficient network capacity to achieve the advertised maximum IOPS or throughput for a given Amazon EBS disk type. See Amazon EC2 Instance Configuration for more information.

In addition, some Amazon EC2 instance types are EBS-optimized, which means that they have a dedicated connection to their attached Amazon EBS volumes. If you use an Amazon EC2 instance type that is not EBS-optimized, you have no guarantees as to how much network capacity will be available between the instance and its EBS volumes at a given time.

Compute Engine persistent disks

Compute Engine allocates throughput on a per-core basis. You get 2 Gbps of network egress per virtual CPU core, with a maximum of 10 Gbps for a single virtual machine instance. Because Compute Engine has a data redundancy factor of 3.3x, each logical write actually requires 3.3 writes’ worth of network bandwidth. As such, machine types with a small number of cores might not have sufficient network capacity to achieve the advertised maximum IOPS or throughput for a given persistent disk type. See Network egress caps for more information.
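
The figures above imply an upper bound on persistent disk write throughput for a given machine size. The sketch below works through that arithmetic, using the 2 Gbps per core, 10 Gbps per instance, and 3.3x redundancy values from this section. It assumes all egress is consumed by disk writes, which overstates what a real workload would see. The function names are hypothetical.

```python
def egress_cap_gbps(vcpus, per_core=2.0, instance_max=10.0):
    """Network egress cap for an instance, in Gbps."""
    return min(vcpus * per_core, instance_max)


def max_write_throughput_gbps(vcpus, redundancy=3.3):
    """Upper bound on logical disk write throughput, in Gbps.

    Each logical write consumes `redundancy` times its size in egress.
    """
    return egress_cap_gbps(vcpus) / redundancy


for cores in (1, 4, 8):
    print(cores, round(max_write_throughput_gbps(cores), 2))
```

A single-core instance is therefore limited to roughly 0.6 Gbps of logical writes, regardless of the disk type's advertised maximum.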

For each Compute Engine disk type, the total I/O available is related to the total size of the volumes that are connected to a given instance. For example, if you have two 2.5TB standard persistent disks connected to an instance, your total available I/O comes out to 3000 read IOPS and 7500 write IOPS.

Locally attached disks

In addition to standard networked block storage, Amazon EC2 and Compute Engine both allow users to use disks that are locally attached to the physical machine running the instance. These local disks offer much faster transfer rates. However, unlike networked block storage, they are not redundant and cannot be snapshotted.

On Amazon EC2, local disks are called instance store or ephemeral store. These disks can be either HDD or SSD, depending on the instance type family. The number and size of these disks depend on the specific instance type and are not adjustable.

On Compute Engine, local disks are referred to as local SSD. Local SSDs are, as their name implies, SSD-only, and can be attached to almost any machine type, with the exception of shared-core types like f1-micro and g1-small. Local SSDs have a fixed size of 375GB per disk, and a maximum of 8 can be attached to a single instance.

Compute Engine migrates local SSDs automatically and seamlessly when their host machines are down for maintenance. See the live migration blog post for more information.

While Amazon EC2 instance store comes at no additional cost, Compute Engine local SSDs do incur additional expenses. See Local SSD pricing for details.

Costs

Amazon EBS

Amazon EBS volumes are priced per GB per month and, for some volume types, per provisioned IOPS per month. In addition, Amazon EBS pricing varies by region. See Amazon EBS pricing for details.

Compute Engine persistent disks

Compute Engine persistent disks and disk snapshots are priced per GB per month. See Compute Engine Pricing for details.
