Why Google App Engine rocks: A Google engineer’s take
Director of Customer Reliability Engineering
In December 2011, I had been working for Google for nine years and was leading a team of 10 software developers, supporting the AdSense business. Our portfolio consisted of over 30 software systems, mostly web apps for business intelligence that had been built over the past decade, each on a stack that seemed like a good idea at the time. Some were state-of-the-art custom servers built on the (then) latest Google web server libraries and running directly on Borg. Some were a LAMP stack on a managed hosting service. Some were running as a cron job on someone’s workstation. Some were weird monsters, like a LAMP stack running on Borg with Apache customized to work with production load balancers and encryption. Things were breaking in new and wonderful ways every day. It was all we could do to keep the systems running — just barely.
The team was stressed out. The Product Managers and engineers were frustrated. A typical conversation went like this:
PM: “You thought it would be easy to add the foobar feature, but it’s been four
Eng: “I know, I know, but I had to upgrade the package manager version
first, and then migrate off some deprecated APIs. I’m almost done with that stuff.
I’m eager to start on the foobar, too.”
PM: “Well, now, that's disappointing.”
I surveyed the team to find the root cause of our inefficiency: we were spending 60% of our time on maintenance. I asked how much time would be appropriate, and the answer was a grudging 25%. We made a goal to reduce our maintenance to that point, which would free up the time equivalent of three and a half of our 10 developers.
Google App Engine had just come out of preview in September 2011. A friend recommended it heartily — he'd been using it for a personal site — and he raved that it was low-maintenance, auto-scaling and had built-in features like Google Cloud Datastore and user-management. Another friend, Alex Martelli, was using it for several personal projects. I myself had used it for a charity website since 2010. We decided to use it for all of our web serving. It was the team’s first step into PaaS.
Around the same time, we started using Dremel, Google’s internal version of BigQuery. It was incredibly fast compared to MapReduce, and it scaled almost as well. We decided to re-write all of our data processing to use it, even though there were still a few functional gaps between it and App Engine at the time, for example visualization and data pipelines. We whipped up solutions that are still in use by hundreds of projects at Google. Now Google Cloud Platform users can access similar functionality using Google Cloud Datalab.
What we saw next was an amazing transformation in the way that software developers worked. Yes, we had to re-write 30 systems, but they needed to be re-written anyway. WIth that finished, developing on the cloud was so much faster -- I recall being astonished at seeing the App Engine logs, that I had done 100 code, test, and deploy cycles in a single coding session. Once things were working, they kept working for a long time. We stopped debating what stack to choose for the next project. We just grabbed the most obvious one from Google Cloud Platform and started building. If we found a bug in the cloud infrastructure, it was promptly fixed by an expert. What a change from spending hours troubleshooting library compatibility!
Best of all, we quickly got the time we spent on maintenance down to 25%, and it kept going down. At the end of two years I repeated the survey; the team reported that they now only spent 5% of their time on maintenance.
We started having good and different problems. The business wasn’t generating ideas fast enough to keep us busy, and we had no backlog. We started to take two weeks at the end of every quarter for a “hackathon” to see what we could dream up. We transferred half of the developers to another, busier team outside of Cloud. We tackled larger projects and started out-pacing much larger development teams.
After seeing how using PaaS changed things for my team, I want everyone to experience it. Thankfully, these technologies are available not only to Google engineers, but to developers the world over. This is the most transformational technology I’ve seen since I first visited Google Search in 1999 — it lets developers stop doing dumb things and get on with developing the applications that add value to our lives.