By Erik Pounds, Head of Product Marketing, SwiftStack
This article discusses integrating SwiftStack and Google Cloud Platform (GCP) to automatically and continuously synchronize data to GCP based on policies you define.
SwiftStack is a software-defined object storage solution that customers can deploy on-premises behind their firewall, running on standard server hardware.
Core to SwiftStack is the ability to scale a cluster with a single namespace across multiple geographic regions. While this makes excellent use of multiple data centers (if you have them), your workflows might benefit from a hybrid cloud strategy. With Cloud Sync in SwiftStack, data can be synchronized to GCP based on policies you define, and the data lives wherever it is needed by users and applications.
If you do not have a second site, Cloud Sync gives you a seamless way to protect vital business data offsite. It also lets you provide access to specific data in a public bucket as an alternative to opening up your private cloud, and it can assist with cloud bursting and with archiving to Cloud Storage Coldline.
Highlights of a hybrid-cloud strategy
- Sync data from SwiftStack to Cloud Storage.
- Policy-driven: syncing is configured as a property of each container.
- Many container-to-bucket sync relationships.
- Contents of many containers can be stored in a single public bucket.
- Native objects are replicated and are not in a proprietary archive.
- Second site for offsite data protection.
- Provide users with access to data through a public bucket.
- Quickly stage data to high-performance storage and then archive off to Coldline.
- Cloud bursting to utilize elastic compute resources.
The following diagram shows how objects move from your application, through SwiftStack, and to Cloud Storage.
After SwiftStack has been authorized to put data into your GCP account, mappings can be created between SwiftStack containers and Cloud Storage buckets.
SwiftStack then continuously replicates data from the on-premises containers to buckets in the public cloud. All on-premises changes (new, changed, and deleted objects) are propagated to Cloud Storage.
This is unlike many cloud gateways. With SwiftStack, native objects are replicated to a bucket in the public cloud and are not in a proprietary format. This means that you can access and operate on the data in GCP.
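To make the propagation rule concrete (new, changed, and deleted objects all flow from the on-premises container to Cloud Storage), here is a minimal sketch that diffs two listings into upload and delete actions. It is not a description of Cloud Sync's internals: `plan_sync`, `SyncPlan`, and the use of ETags to detect changes are hypothetical choices made for illustration, and the remote listing is assumed to contain only this mapping's objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncPlan:
    """Actions that would make a remote bucket mirror a local container."""
    uploads: tuple[str, ...]  # new or changed objects to PUT
    deletes: tuple[str, ...]  # objects deleted on-premises, to DELETE remotely


def plan_sync(local: dict[str, str], remote: dict[str, str]) -> SyncPlan:
    """Diff two {object_name: etag} listings into a one-way sync plan.

    Hypothetical helper: comparing ETags is an assumption made for
    illustration. `remote` is assumed to hold only this mapping's objects.
    """
    uploads = sorted(
        name for name, etag in local.items()
        if remote.get(name) != etag  # missing remotely, or content differs
    )
    # Anything remote that no longer exists locally was deleted on-premises.
    deletes = sorted(name for name in remote if name not in local)
    return SyncPlan(tuple(uploads), tuple(deletes))


if __name__ == "__main__":
    local = {"a.jpg": "etag-1", "b.jpg": "etag-2-new"}
    remote = {"b.jpg": "etag-2-old", "c.jpg": "etag-3"}
    plan = plan_sync(local, remote)
    print(plan.uploads)  # ('a.jpg', 'b.jpg')
    print(plan.deletes)  # ('c.jpg',)
```

Because the sync is one-way, nothing in the plan ever copies data from the bucket back to SwiftStack.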
See a demo video
Joe Arnold, founder and Chief Product Officer of SwiftStack, demonstrates how to synchronize data in a SwiftStack container with a bucket in Cloud Storage (the extent of this article). He then shows an example of what you can do with data after it's in the public cloud, because extending your private storage infrastructure to the public cloud is most powerful when you can use the scalable compute capabilities that it offers (not covered in this article).
Configuration and deployment
This article walks you through how to synchronize data from an on-premises SwiftStack cluster to GCP, keeping the data in cloud-native format, which allows it to be accessible by applications and users operating in the public cloud.
This document assumes that you already have SwiftStack storage configured with a container of data and an available bucket in GCP. If you need assistance setting up SwiftStack for the first time, see the Quick start guide for SwiftStack.
After you configure and deploy a synchronization relationship, data changes made on-premises will automatically be replicated to the public bucket.
SwiftStack is offered for trial at no charge for up to 50 TB of data in non-commercial, non-production uses. A single x64-compatible server running standard Linux is required to set up and test this solution. To sign up, try out the Test drive.
The processes doing the replication reside on the nodes that handle the container role. These nodes must have network access to GCP in order to replicate the data.
Cloud Sync in SwiftStack exclusively uses the S3 API. You will need to get a developer key for your Google Cloud account, which comprises the access key and secret key that are needed for configuration. For more information, refer to the Cloud Storage migration guide.
Supplying SwiftStack with GCP credentials
Before mapping a SwiftStack container to a Cloud Storage bucket, you must add a set of S3 API credentials to SwiftStack. Cloud Sync integrates with Amazon Simple Storage Service (Amazon S3), Cloud Storage, and any other object stores that support the S3 API.
To add credentials for a provider, follow these steps:
- In the SwiftStack Controller UI, go to the Cloud Sync tab in the manage cluster interface.
- Click Manage S3 Credentials.
- Add credentials by specifying a friendly Label, Access Key ID, and Secret Access Key.
- Select Cloud Storage in the Provider drop-down list.
- Click Add S3 Credentials.
Afterward, you can edit the credentials (for example, if the secret keys are rotated). Each set of credentials for a given cluster must be unique; that is, you cannot add multiple sets of credentials with the same name.
After editing credentials, you must deploy configuration for the changes to go into effect.
You can add multiple credentials to SwiftStack, because it can synchronize on-premises data with multiple public cloud accounts.
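The credential rules above (a unique label per cluster, a provider per entry, and edits that only take effect after a deploy) can be modeled as a small registry. This is a hypothetical sketch, not a SwiftStack API: `S3Credentials`, `CredentialStore`, and `pending_deploy` are illustrative names. The endpoints are the public S3-compatible endpoints for Cloud Storage and Amazon S3.

```python
from dataclasses import dataclass

# S3-compatible endpoints for providers offered in the Provider drop-down.
PROVIDER_ENDPOINTS = {
    "Cloud Storage": "https://storage.googleapis.com",
    "Amazon S3": "https://s3.amazonaws.com",
}


@dataclass
class S3Credentials:
    label: str              # the friendly Label; must be unique per cluster
    access_key_id: str
    secret_access_key: str
    provider: str           # a key of PROVIDER_ENDPOINTS

    @property
    def endpoint(self) -> str:
        return PROVIDER_ENDPOINTS[self.provider]


class CredentialStore:
    """Hypothetical per-cluster credential registry (not a SwiftStack API)."""

    def __init__(self) -> None:
        self._by_label: dict[str, S3Credentials] = {}
        self.pending_deploy = False  # True until edited credentials are deployed

    def add(self, creds: S3Credentials) -> None:
        if creds.label in self._by_label:
            raise ValueError(f"credentials labeled {creds.label!r} already exist")
        if creds.provider not in PROVIDER_ENDPOINTS:
            raise ValueError(f"unknown provider {creds.provider!r}")
        self._by_label[creds.label] = creds

    def rotate_secret(self, label: str, new_secret: str) -> None:
        # Edited credentials only take effect after a configuration deploy.
        self._by_label[label].secret_access_key = new_secret
        self.pending_deploy = True

    def deploy(self) -> None:
        self.pending_deploy = False

    def get(self, label: str) -> S3Credentials:
        return self._by_label[label]

    def labels(self) -> list[str]:
        return sorted(self._by_label)


if __name__ == "__main__":
    store = CredentialStore()
    store.add(S3Credentials("gcp-prod", "ACCESS_KEY", "SECRET", "Cloud Storage"))
    print(store.labels())                   # ['gcp-prod']
    print(store.get("gcp-prod").endpoint)   # https://storage.googleapis.com
```

Keying the store by label is what makes a duplicate name fail fast, which mirrors the uniqueness rule described above.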
After at least one set of credentials exists, you can create Swift container mappings.
- On the Cloud Sync tab, click Add a container mapping.
- Enter the SwiftStack Account, the SwiftStack Container, and the name of the S3 bucket in GCP, and then select the Credentials to use.
- Click Add container mapping.
Afterward, each mapping appears in the table on the Cloud Sync page.
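A mapping ties a SwiftStack account and container to a bucket and a set of credentials. The sketch below models that record and the prerequisite that credentials exist first. `ContainerMapping` and `add_mapping` are hypothetical. Rejecting only exact duplicates, while allowing several containers to share one bucket, is an assumption based on the highlights above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContainerMapping:
    account: str      # SwiftStack account, e.g. "AUTH_account"
    container: str    # SwiftStack container
    bucket: str       # Cloud Storage bucket name
    credentials: str  # label of an existing set of S3 credentials


def add_mapping(existing: list[ContainerMapping],
                credential_labels: set[str],
                new: ContainerMapping) -> list[ContainerMapping]:
    """Return a new list with `new` appended, after hypothetical checks."""
    if not credential_labels:
        raise ValueError("add S3 credentials before creating a mapping")
    if new.credentials not in credential_labels:
        raise ValueError(f"unknown credentials label {new.credentials!r}")
    if new in existing:
        raise ValueError("this mapping already exists")
    # Several containers may target the same bucket; their keys are kept
    # apart by per-mapping prefixes, so sharing a bucket is allowed here.
    return existing + [new]


if __name__ == "__main__":
    labels = {"gcp-prod"}
    mappings = add_mapping(
        [], labels, ContainerMapping("AUTH_account", "photos", "my-bucket", "gcp-prod"))
    mappings = add_mapping(
        mappings, labels, ContainerMapping("AUTH_account", "videos", "my-bucket", "gcp-prod"))
    for m in mappings:
        print(f"{m.account}/{m.container} -> {m.bucket} ({m.credentials})")
```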
Resetting a mapping
Objects in the remote bucket might be removed at some point, whether by accident or deliberately. Cloud Sync can repopulate all of the missing objects and ensure that all of the data is replicated. To trigger this, click the reset button next to the affected mapping.
How long it takes to repopulate the remote data depends on how many objects have been removed. Cloud Sync continues to move objects in the background until it has iterated through all of the objects in the Swift container and ensured that they are replicated.
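A reset amounts to walking every object in the container and re-uploading whatever the bucket is missing. The sketch below pages through the container the way Swift container listings do, resuming after the last name seen (a `marker`). It is a hypothetical model: `iter_container`, `repopulate`, and the `list_page` callback stand in for real listing and upload calls and are not Cloud Sync code.

```python
from typing import Callable, Iterator

# list_page(marker, limit) returns up to `limit` object names sorted after
# `marker`, mimicking a marker-based Swift container listing.
ListPage = Callable[[str, int], list[str]]


def iter_container(list_page: ListPage, limit: int = 1000) -> Iterator[str]:
    """Yield every object name in a container, one page at a time."""
    marker = ""
    while True:
        page = list_page(marker, limit)
        if not page:
            return
        yield from page
        marker = page[-1]  # resume after the last name seen


def repopulate(list_page: ListPage, remote: set[str],
               upload: Callable[[str], None], limit: int = 1000) -> int:
    """Re-upload objects missing from the remote bucket; return how many."""
    copied = 0
    for name in iter_container(list_page, limit):
        if name not in remote:
            upload(name)
            copied += 1
    return copied


if __name__ == "__main__":
    objects = sorted(f"obj-{i:03d}" for i in range(10))

    def list_page(marker: str, limit: int) -> list[str]:
        return [n for n in objects if n > marker][:limit]

    remote = set(objects) - {"obj-002", "obj-007"}  # two objects were removed
    uploaded: list[str] = []
    print(repopulate(list_page, remote, uploaded.append, limit=3))  # 2
    print(uploaded)  # ['obj-002', 'obj-007']
```

The walk always covers the whole container, but only missing objects generate uploads, which is why the cost of a reset grows with the number of removed objects.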
How Cloud Sync in SwiftStack works
This section describes how Cloud Sync in SwiftStack works and offers some tips for getting the best results from this hybrid cloud solution.
SwiftStack object representation in a public cloud bucket
To allow objects from multiple SwiftStack containers to appear in a public cloud bucket, the S3 keys include the private account and container. To prevent all keys for a given account from being stored under the same prefix, Cloud Sync also prepends a hashed prefix to each key. The prefixes for each mapping are listed in the Cloud Sync configuration table and are derived from the account and container. For example, if there is an object in a SwiftStack container under account AUTH_account, it will be stored in the S3 bucket as
Listing the prefix for each mapping on the Cloud Sync page makes it easier to locate your data in the bucket. The hashed prefix is added for performance: public cloud object stores handle request load better when keys are spread across many prefixes rather than concentrated under one. Cloud Sync follows these guidelines for the case where many containers under one account are replicated to public cloud buckets.
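The naming scheme can be sketched as a function from account, container, and object name to an S3 key. Everything specific here is an assumption, including the exact layout `<hash>/<account>/<container>/<object>`, hashing the string `"account/container"` with SHA-256, and keeping 6 hex characters. The source only states that a hashed prefix derived from the account and container is prepended. `mapping_prefix` and `remote_key` are hypothetical names.

```python
import hashlib


def mapping_prefix(account: str, container: str, hash_len: int = 6) -> str:
    """Key prefix shared by every object synced from one container.

    ASSUMPTION: the hash function (SHA-256), its input ("account/container"),
    the 6-character length, and the key layout are illustrative only.
    """
    digest = hashlib.sha256(f"{account}/{container}".encode("utf-8")).hexdigest()
    return f"{digest[:hash_len]}/{account}/{container}/"


def remote_key(account: str, container: str, obj: str) -> str:
    """S3 key under which a SwiftStack object would land in the bucket."""
    return mapping_prefix(account, container) + obj


if __name__ == "__main__":
    print(remote_key("AUTH_account", "photos", "2017/cat.jpg"))
    print(remote_key("AUTH_account", "videos", "2017/dog.mp4"))
```

The design keeps every object of one container under a single listable prefix while the leading hash scatters different containers across the key space.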
If you use large objects, the segments may be stored in a container other than the one being synced. For example, by default, the swift-client stores segments in a separate <container>_segments container. To ensure that the content of large objects is also replicated, configure a mapping for the segments container as well.
Large objects are not converted into multipart uploads when they are replicated to public cloud accounts using the S3 API, so each segment is stored as a separate object in the public cloud bucket.
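One way to catch a missing segments mapping is to read where each manifest points. For Swift Dynamic Large Objects, the manifest's `X-Object-Manifest` header holds `<container>/<prefix>`. The sketch below extracts those containers and reports any that no mapping covers. It is a hypothetical check (`segment_containers` and `unmapped_segment_containers` are illustrative names). It ignores Static Large Objects, whose segment paths live in a JSON manifest, and it does not URL-decode header values.

```python
def segment_containers(manifests: dict[str, str]) -> set[str]:
    """Containers referenced by Dynamic Large Object manifests.

    `manifests` maps a manifest object's name to its X-Object-Manifest
    header, whose value has the form "<container>/<prefix>".
    """
    return {value.split("/", 1)[0] for value in manifests.values()}


def unmapped_segment_containers(manifests: dict[str, str],
                                mapped: set[str]) -> list[str]:
    """Segment containers that no Cloud Sync mapping covers yet."""
    return sorted(segment_containers(manifests) - mapped)


if __name__ == "__main__":
    # Hypothetical manifest, following the "<container>_segments" default.
    manifests = {"movie.mp4": "videos_segments/movie.mp4/"}
    print(unmapped_segment_containers(manifests, {"videos"}))  # ['videos_segments']
```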
Multiple SwiftStack clusters
Beware of using the same bucket in GCP for multiple SwiftStack clusters. In that case, if the same account and container exist on two clusters, one cluster might overwrite objects from the other in the bucket.
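Because remote keys are built from the account, container, and object names rather than from the cluster, two clusters that map the same account and container into one bucket write to the same keys. A quick audit over all clusters' mappings can flag this before it happens. The tuple layout and `find_collisions` are hypothetical.

```python
from collections import defaultdict
from typing import Iterable


def find_collisions(
    mappings: Iterable[tuple[str, str, str, str]],
) -> dict[tuple[str, str, str], list[str]]:
    """Find (bucket, account, container) targets synced by more than one cluster.

    Each mapping is a hypothetical (cluster, account, container, bucket) tuple.
    """
    clusters_by_target: dict[tuple[str, str, str], set[str]] = defaultdict(set)
    for cluster, account, container, bucket in mappings:
        clusters_by_target[(bucket, account, container)].add(cluster)
    return {target: sorted(clusters)
            for target, clusters in clusters_by_target.items()
            if len(clusters) > 1}


if __name__ == "__main__":
    mappings = [
        ("east", "AUTH_a", "photos", "shared-bucket"),
        ("west", "AUTH_a", "photos", "shared-bucket"),  # same names, same bucket
        ("west", "AUTH_a", "photos", "west-bucket"),
    ]
    print(find_collisions(mappings))
    # {('shared-bucket', 'AUTH_a', 'photos'): ['east', 'west']}
```

Giving each cluster its own bucket makes this check return an empty result by construction.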