Deploying Hybrid Cloud Storage with SwiftStack

By Erik Pounds, Head of Product Marketing, SwiftStack

This article discusses integrating SwiftStack and Google Cloud Platform (GCP) to automatically and continuously synchronize data to GCP based on policies you define.

SwiftStack is a software-defined object storage solution that customers can deploy on-premises behind their firewall, running on standard server hardware.

Core to SwiftStack is the ability to scale a cluster with a single namespace across multiple geographic regions. While this makes excellent use of multiple data centers (if you have them), your workflows may benefit from a hybrid cloud strategy. With Cloud Sync in SwiftStack, data can be synchronized to GCP based on policies you define, and the data lives wherever it is needed by users and applications.

If you do not have a second site, Cloud Sync gives you a seamless way to further protect your vital business data. It also lets you provide access to specific data in a public bucket as an alternative to opening up your private cloud. It can also assist with cloud bursting and with archiving to Google Cloud Storage Coldline.

Highlights of a hybrid cloud strategy

  • Sync data from SwiftStack to Cloud Storage.
  • Policy-driven: the sync policy is a property of the container.
  • Many container-to-bucket sync relationships.
  • Contents of many containers can be stored in a single public bucket.
  • Native objects are replicated and are not in a proprietary archive.

Use cases

  • Second site for offsite data protection.
  • Provide data access to users in a public bucket.
  • Quickly stage data to high-performance storage and then archive off to Coldline.
  • Cloud bursting to utilize elastic compute resources.

After SwiftStack has been authorized to put data into your GCP account, mappings can be created between SwiftStack containers and Cloud Storage buckets. SwiftStack then continuously replicates data from the on-premises containers to buckets in the public cloud. All on-premises changes (new, changed, and deleted objects) are propagated to Cloud Storage.
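The propagation of new, changed, and deleted objects can be pictured as a one-way comparison of two listings. The following is a minimal sketch of that idea, not SwiftStack's actual implementation: the function name plan_one_way_sync and the use of a content hash per object are assumptions made for illustration.

```python
from typing import Dict, List, NamedTuple


class SyncPlan(NamedTuple):
    upload: List[str]  # new or changed objects to copy to the bucket
    delete: List[str]  # objects removed on-premises, to remove from the bucket


def plan_one_way_sync(local: Dict[str, str], remote: Dict[str, str]) -> SyncPlan:
    """Compare two listings that map object name -> content hash.

    The on-premises container is the source of truth, so the remote
    listing is only ever brought in line with the local one.
    """
    upload = sorted(name for name, digest in local.items()
                    if remote.get(name) != digest)
    delete = sorted(name for name in remote if name not in local)
    return SyncPlan(upload, delete)


# Hypothetical listings: b.txt changed, c.txt is new, d.txt was deleted locally.
local = {"a.txt": "111", "b.txt": "222", "c.txt": "333"}
remote = {"a.txt": "111", "b.txt": "OLD", "d.txt": "444"}
plan = plan_one_way_sync(local, remote)
print(plan.upload)  # ['b.txt', 'c.txt']
print(plan.delete)  # ['d.txt']
```

Because the comparison only flows from the container to the bucket, changes made directly in the bucket are never copied back on-premises in this sketch.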

This is unlike many cloud gateways. With SwiftStack, native objects are replicated to a bucket in the public cloud and are not in a proprietary format. This means that you can easily access and operate on the data in GCP.
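Because each replicated object is an ordinary Cloud Storage object, it can be addressed like any other. As a small illustration, the sketch below builds the URL of an object in a publicly readable bucket. The bucket name and key are hypothetical, and the URL only works if the object has actually been made public.

```python
from urllib.parse import quote


def public_object_url(bucket: str, key: str) -> str:
    """Return the storage.googleapis.com URL for an object in a public bucket."""
    # Percent-encode the key, but keep "/" so the key's path structure survives.
    return f"https://storage.googleapis.com/{bucket}/{quote(key, safe='/')}"


# Hypothetical bucket and key, for illustration only.
print(public_object_url("my-bucket", "62506b/AUTH_account/container/my report.pdf"))
# https://storage.googleapis.com/my-bucket/62506b/AUTH_account/container/my%20report.pdf
```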

See a demo video

Joe Arnold, founder and Chief Product Officer of SwiftStack, demonstrates how to synchronize data in a SwiftStack container with a bucket in Cloud Storage, which is the scope of this article. He then shows an example of what you can do with the data after it is in the public cloud, which this article does not cover. Extending your private storage infrastructure to the public cloud is most powerful when you can use the scalable compute capabilities that it offers.

Configuration and deployment

This article walks you through synchronizing data from an on-premises SwiftStack cluster to GCP while keeping the data in cloud-native format, which makes it accessible to applications and users operating in the public cloud.

This document assumes that you already have SwiftStack storage configured with a container of data and an available bucket in GCP. For assistance setting up SwiftStack for the first time, see the Quick Start section of the documentation at https://www.swiftstack.com/docs/.

After you configure and deploy a synchronization relationship, data changes made on-premises will automatically be replicated to the public bucket.

Obtaining SwiftStack

SwiftStack is offered for trial at no charge for up to 50 TB of data in non-commercial, non-production uses. A single x64-compatible server running standard Linux is required to set up and test this solution. To sign up, go to https://www.swiftstack.com/try-it-now.

Prerequisites

The processes doing the replication reside on the nodes that handle the container role. These nodes must have network access to GCP in order to replicate the data.

Cloud Sync in SwiftStack exclusively uses the S3 API. You will need to get a developer key for your Google Cloud account, which comprises the access key and secret key that are needed for configuration. Refer to the Cloud Storage Migration Guide for details.

Supplying SwiftStack with GCP credentials

Before mapping a SwiftStack container to a Cloud Storage bucket, you must add a set of S3 API credentials to SwiftStack. Cloud Sync integrates with Amazon S3, Google Cloud Storage, and any other object stores that support the S3 API.

To set up credentials, in the SwiftStack Controller UI, navigate to the Cloud Sync tab in the manage cluster interface. Follow these steps to configure specific providers:

  1. Click Manage S3 Credentials.
  2. Add credentials by specifying a friendly Label, Access Key ID, and Secret Access Key.
  3. Select Google Cloud Storage in the Provider dropdown menu.
  4. Click Add S3 Credentials.

Afterward, you can edit the credentials (for example, if the secret keys are rotated). Each set of credentials for a given cluster must be unique; that is, you cannot add multiple credentials with the same name.

After editing credentials, you must deploy the configuration for the changes to take effect.

You can add multiple credentials to SwiftStack, because it can synchronize on-premises data with multiple public cloud accounts.

Configuring mappings

After at least one set of credentials exists, you can create Swift container mappings. To do so, on the Cloud Sync tab, follow these steps:

  1. Click Add a container mapping.
  2. Enter a SwiftStack Account, a SwiftStack Container, and the name of the bucket in GCP, and then select the Credentials to use.

  3. Click Add container mapping.

Afterward, each mapping appears in the table on the Cloud Sync page.

Resetting a mapping

Objects may at some point be removed from the remote bucket, either by accident or through deliberate action. Cloud Sync can repopulate all of the missing objects and ensure that all of the data is replicated. To trigger this action, use the reset button next to the affected mapping.

The time needed to repopulate the remote data depends on how many objects have been removed. Cloud Sync continues to move objects in the background until it has iterated through all of the objects in the Swift container and ensured that they are replicated.

How Cloud Sync in SwiftStack works

This section explains how Cloud Sync in SwiftStack works and offers some tips for getting the most out of this hybrid cloud solution.

SwiftStack object representation in a public cloud bucket

To allow objects from multiple SwiftStack containers to appear in a public cloud bucket, the S3 keys include the private account and container. To prevent all keys from being stored with the same prefix for a given account, Cloud Sync also prepends a hashed prefix to each key. The prefixes for each mapping are listed in the Cloud Sync configuration table and are derived from the account and container. For example, if there is an object in a SwiftStack container under account AUTH_account, it will be stored in the S3 bucket as 62506b/AUTH_account/container/object.

The prefix for each mapping is listed on the Cloud Sync page to make it easier to locate data. The prefix is added for performance reasons: when many containers under one account are replicated to public cloud buckets, it keeps their keys from all sharing the same leading characters.
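The key layout described above can be sketched as a pair of helper functions. This is an illustration only: the function names are hypothetical, and the prefix is passed in as a value copied from the Cloud Sync page rather than computed, because this article does not specify how the hash is derived.

```python
from typing import NamedTuple


class RemoteKey(NamedTuple):
    prefix: str
    account: str
    container: str
    obj: str


def build_remote_key(prefix: str, account: str, container: str, obj: str) -> str:
    """Join the parts in the order prefix/account/container/object."""
    return "/".join((prefix, account, container, obj))


def parse_remote_key(key: str) -> RemoteKey:
    """Split a bucket key back into its parts.

    Object names may themselves contain "/", so only the first three
    separators are treated as boundaries.
    """
    prefix, account, container, obj = key.split("/", 3)
    return RemoteKey(prefix, account, container, obj)


print(build_remote_key("62506b", "AUTH_account", "container", "object"))
# 62506b/AUTH_account/container/object

print(parse_remote_key("62506b/AUTH_account/container/photos/2017/cat.jpg").obj)
# photos/2017/cat.jpg
```

Listing a bucket with the prefix/account/container/ string as the listing prefix is then enough to find everything that came from one mapping.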

Large objects

If you use large objects, the segments may be stored in a container that is not the one being synced. For example, by default, the swift-client will store the segments in a separate <container>_segments container. To ensure that the content of the large objects is also replicated, configure a mapping for the segments container.

Large objects are not converted into a multi-part upload when replicating to public cloud accounts using the S3 API, meaning that each segment will be a separate object in a public cloud bucket.
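A quick way to catch a forgotten segments mapping is to compare the list of mapped containers against the naming convention mentioned above. The sketch below assumes the default <container>_segments convention and a hypothetical list of (account, container) pairs copied from the Cloud Sync page. It flags every container without a mapped segments container, so a result only matters for containers that actually hold large objects.

```python
from typing import Iterable, List, Set, Tuple

Mapping = Tuple[str, str]  # (account, container)


def missing_segment_mappings(mappings: Iterable[Mapping],
                             suffix: str = "_segments") -> List[Mapping]:
    """Return the segments containers that have no mapping of their own."""
    mapped: Set[Mapping] = set(mappings)
    missing = []
    for account, container in sorted(mapped):
        if container.endswith(suffix):
            continue  # this entry is itself a segments container
        candidate = (account, container + suffix)
        if candidate not in mapped:
            missing.append(candidate)
    return missing


# Hypothetical mappings: "videos" has its segments container mapped, "logs" does not.
mappings = [
    ("AUTH_account", "videos"),
    ("AUTH_account", "videos_segments"),
    ("AUTH_account", "logs"),
]
print(missing_segment_mappings(mappings))
# [('AUTH_account', 'logs_segments')]
```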

Multiple SwiftStack clusters

Beware of using the same bucket in GCP for multiple SwiftStack clusters. In that case, if the same account and container exist on two clusters, one cluster may overwrite objects from the other in the bucket.
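The overwrite risk follows from the key layout: the prefix is derived from the account and container, so two clusters with the same account and container write to the same keys in a shared bucket. The following sketch checks a combined list of mappings for that situation. The tuple layout and cluster names are hypothetical and would need to be assembled by hand from each cluster's Cloud Sync page.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Mapping = Tuple[str, str, str, str]  # (cluster, account, container, bucket)
Target = Tuple[str, str, str]        # (bucket, account, container)


def find_bucket_collisions(mappings: Iterable[Mapping]) -> Dict[Target, List[str]]:
    """Return targets that more than one cluster replicates into."""
    clusters_by_target = defaultdict(set)
    for cluster, account, container, bucket in mappings:
        clusters_by_target[(bucket, account, container)].add(cluster)
    return {target: sorted(clusters)
            for target, clusters in clusters_by_target.items()
            if len(clusters) > 1}


# Hypothetical mappings from two clusters that share one bucket.
mappings = [
    ("cluster-a", "AUTH_account", "photos", "shared-bucket"),
    ("cluster-b", "AUTH_account", "photos", "shared-bucket"),
    ("cluster-b", "AUTH_account", "logs", "shared-bucket"),
]
print(find_bucket_collisions(mappings))
# {('shared-bucket', 'AUTH_account', 'photos'): ['cluster-a', 'cluster-b']}
```

Giving each cluster its own bucket avoids the problem entirely and makes a check like this unnecessary.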
