Google Cloud Platform
Running the same, everywhere part 2: getting started
In part one of this post, we looked at how to avoid lock-in with your cloud provider by selecting open-source software (OSS) that can run on a variety of clouds. Sounds good in theory, but I can hear engineers and operators out there saying, “OK, really, how do I do it?”
Moving from closed to open isn’t just a matter of knowing the names of the various OSS building blocks, after which, POOF! you’re magically relieved of making tech choices for the next hundred years. It’s a gradual process: as you adopt more and more open systems, you gain more and more leverage.
Let’s assume that you’re not starting from scratch (if you are, please use the open tools we’ve described here rather than more proprietary options). If you’ve already built an application that consumes proprietary components, the first step is to prioritize migrating from those components to open alternatives. Of course, that starts with knowing what the alternatives are (check!) and then following each product’s documentation for initialization, migration and operations.
But before we dive into specific OSS components, let’s put forth a few high-level principles.
- Applications that are uniformly distributed across distinct cloud providers can be complex to manage. It’s often substantially simpler and more robust to load-balance across entirely separate application stacks than to run one globally conjoined infrastructure. This is particularly true for services that store state, such as storage and database systems; in many cases, setting up replication across providers for high availability (HA) is the most direct path to value.
- The more you can minimize the manual work required to relocate services from one system to another, the better. This can require quite nuanced orchestration and automation, and its own set of skills. Your level of automated distribution may also vary between layers of your stack; most companies today can reach fully automated procedures for applications and well-instrumented, documented procedures for data relatively easily, but fully automating infrastructure moves may take more effort.
- No matter how well you think migrating these systems will work, you won’t know for sure until you try. Further, migration flexibility atrophies without regular exercise. Consider performing regular test migrations and failovers to prove that you’ve retained flexibility.
- Lock-in at your “edges” is easier to route around or resolve than lock-in at your “core.” Consider open versions of services like queues, workflow automation, authentication, identity and key management as particularly critical.
- Consider the difference in kind between “operational lock-in” and “developer lock-in.” The former is painful, but the latter can be lethal. Choose the software environments your developers build against especially carefully, so that a future migration doesn’t mean rewriting your applications.
Getting started
With that said, let’s get down to specifics and look at the various OSS services we recommend when building this kind of multi-cloud environment.
If you choose Kubernetes for container orchestration, start with a Hello World example, take an online training course, follow the setup guides for Google Container Engine and for Amazon Elastic Compute Cloud (EC2), familiarize yourself with the UI, or take the Docker image of an existing application and launch it. Do you have applications that require communication between all hosts? If you’re distributed across two cloud providers, you’re also distributed across two networks, and you’ll likely want to set up a VPN between the two environments to keep traffic moving. For a large number of hosts or high-bandwidth interactions, you can use Google Cloud Interconnect.
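As a sketch of that last step, a small Deployment manifest is enough to launch an existing Docker image on a Kubernetes cluster. Everything below is a placeholder, particularly the image path, names and port, which you’d replace with your own:

```yaml
# deployment.yaml -- illustrative only; names, image and port are placeholders
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3                 # run three copies for resilience
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: gcr.io/my-project/my-app:1.0   # your existing Docker image
        ports:
        - containerPort: 8080
```

Apply it with `kubectl apply -f deployment.yaml`, then expose it to traffic, for example with `kubectl expose deployment my-app --type=LoadBalancer --port=80 --target-port=8080`.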
If you’re using Google App Engine and AppScale for platform-as-a-service, the process is very similar. On the Google side, follow the App Engine documentation; for AppScale in another environment, follow its getting started guide. If you need cross-system networking, you can use a VPN, or Cloud Interconnect for scaled systems.
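One reason the process is so similar is that both systems deploy from the same application descriptor. A minimal `app.yaml` for a Python application might look like this sketch (the runtime and script handler here are assumptions about your app, not requirements):

```yaml
# app.yaml -- minimal sketch for a Python 2.7 standard-environment app;
# adjust the runtime and handlers to match your own application
runtime: python27
api_version: 1
threadsafe: true

handlers:
- url: /.*
  script: main.app   # WSGI application object defined in main.py
```

The same file then drives `gcloud app deploy` on the App Engine side and `appscale deploy` against an AppScale cluster.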
For shops running HBase and Google Cloud Bigtable as their big data store, follow the Cloud Bigtable cluster creation guide on the Cloud Platform side, and the HBase quickstart (as well as the longer-form not-so-quick-start guides). There’s some complexity in importing data from other sources into an HBase-compatible system; there’s a manual for that here.
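Because Cloud Bigtable speaks the HBase API, a quick way to sanity-check either system is an interactive HBase shell session along these lines (the table and column-family names are just examples):

```
create 'test-table', 'cf'
put 'test-table', 'row1', 'cf:greeting', 'hello'
scan 'test-table'
```

If the `scan` comes back with your row from both environments, your client configuration is pointed at the right cluster and you’re ready for real data.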
The Vitess relational database is an interesting example, in that the easiest way to get started is to run it inside the Kubernetes cluster we set up above. Instructions for that are here; the result is a horizontally scalable MySQL system.
For Apache Beam/Cloud Dataflow batch and stream data processing, take a look at the GCP documentation to learn about the service, and then follow it up with some practical exercises in the How-to guides and Quickstarts. You can also learn more about the open source Apache Beam project on the project website.
For TensorFlow, things couldn’t be simpler. This OSS machine learning library is available via pip and Docker, and plays nicely with Virtualenv and Anaconda. Once you’ve installed it, you can get started with Hello TensorFlow, or other tutorials such as MNIST For ML Beginners, or this one on state-of-the-art translation with recurrent neural networks.
The Minio object storage server is written in Go, and as such is portable across a wide variety of target platforms, including Linux, Windows, OS X and FreeBSD. To get started, head over to the Minio Quickstart Guide.
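The quickstart boils down to very little. Once you’ve downloaded the binary for your platform, an illustrative session looks like this (`/data` is a placeholder for whichever directory you want to serve):

```
$ chmod +x minio
$ ./minio server /data
```

The server then prints an endpoint and access credentials, and any S3-compatible client or SDK can talk to it, on any cloud or on-premises.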
Spinnaker is an open-source continuous delivery engine that allows you to build complex pipelines that take your code from a source repository to production through a series of stages, for example, waiting for code to pass unit testing and integration phases in parallel before pushing it to staging and production. To get started with continuous deployment with Spinnaker, have a look at the deployment guide.
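Under the hood, Spinnaker pipelines are JSON documents. This skeleton sketches the parallel-test-then-promote flow described above; the stage types are real Spinnaker stages, but the names are invented and the stage bodies are heavily simplified compared to what the UI generates:

```json
{
  "name": "build-test-promote",
  "stages": [
    { "refId": "1", "type": "jenkins", "name": "Unit tests",
      "requisiteStageRefIds": [] },
    { "refId": "2", "type": "jenkins", "name": "Integration tests",
      "requisiteStageRefIds": [] },
    { "refId": "3", "type": "deploy", "name": "Deploy to staging",
      "requisiteStageRefIds": ["1", "2"] },
    { "refId": "4", "type": "manualJudgment", "name": "Promote?",
      "requisiteStageRefIds": ["3"] },
    { "refId": "5", "type": "deploy", "name": "Deploy to production",
      "requisiteStageRefIds": ["4"] }
  ]
}
```

The `requisiteStageRefIds` lists encode the graph: the two test stages run in parallel because neither depends on the other, and staging waits for both.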
But launching and configuring these open systems is really just the beginning; you’ll also need to think about operations, maintenance and security management, whether they run in a single- or multi-cloud configuration. Multi-cloud systems are inherently more complex, and their operational workflows will take correspondingly more time and care.
Still, compared to doing this at any previous point in history, these open-source tools radically improve businesses’ capacity to operate free of lock-in. We hear from customers every day that OSS tools are an easy choice, particularly for scaled, production workloads. Our goal is to partner with customers, consultancies and the OSS community of developers to extend this framework and ensure this approach succeeds. Let us know if we can help you!