With Amadeus, Cloud is in the Air

January 5, 2016

Today we hear from Olivier Favorel, Senior Manager, Airline IT at Amadeus. Operating in 195 countries, Amadeus is a leading technology company dedicated exclusively to the global travel industry. Because an increase in CPU consumption of just 100 microseconds per transaction can mean thousands of dollars in extra hosting costs, Amadeus turned to Google Cloud Platform to offer new alternatives to its airline customers.

At Amadeus, we develop the technology that will shape the future of travel. To understand the business needs of our customers and partners, we’re highly focused on the trends impacting airlines. One main trend is the exponential growth of consumers browsing and shopping for airline products across digital channels.

Airline “look-to-book” ratios, or the average number of search requests made before a flight reservation is actually booked, were previously as low as 10:1. Today, they can easily reach 1000:1. Moreover, demand is never constant, so managing its fluctuations requires anticipating strong traffic peaks and arranging the necessary capacity, which is a challenging task for airlines. To cope with the pressure of ever-increasing online shopping transactions, shopping engines have developed cache-based solutions. However, cache-based systems have limitations: they don’t accurately reflect an airline’s sophisticated revenue management policies.

Large network carriers are investing in advanced revenue management solutions to capture maximum traveler value and generate revenue. Maximizing revenue requires real-time capability to process every shopping request and make the right “flight availability” (availability of seats in a particular fare class) offer at an optimal price. Furthermore, it’s crucial for airlines to display consistent offers across various shopping platforms to capture every sales opportunity. Cache-based systems conflict with real-time revenue optimization, thus hindering airlines’ merchandising and personalization capabilities to make the right product offer to the right customer at the right time for the right price.

Given the challenge to maintain accurate and consistent airline offers across all distribution channels, how can we ensure high performance in dynamic content distribution for massive volumes?

With the help of Google Cloud Platform, Amadeus has developed a unique cloud-based solution, Amadeus Airline Cloud Availability. The solution offloads the processing of shopping transactions outside the airline reservation system, where the booking and payment are finally performed. This solution can be deployed in any public or private cloud, bringing airline offers closer to the shopping platform serving online travel agencies, meta searches or global distribution systems, while taking full advantage of more efficient solutions.

https://storage.googleapis.com/gweb-cloudblog-publish/images/amadeus2B1t19n.max-700x700.PNG

Figure 1: Amadeus Airline Cloud Availability architecture

This solution helps airlines efficiently manage the huge increase in search and shopping traffic.

We conducted a pilot of Amadeus Airline Cloud Availability in Google Cloud Platform from February to July 2015, together with Lufthansa. The objective of the pilot was two-fold:

Demonstrate the scalability and performance of flight availability requests using Google Compute Engine. Amadeus is currently handling requests for 4M+ flights per second in its private data center in Munich, for more than 140 airline carriers. This traffic increases by 50% every year.
Contain infrastructure cost of flight availability traffic.
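To give a rough sense of what 50% annual growth means for capacity planning, the following Python sketch compounds the 4M flights-per-second figure quoted above. The function name and the three-year horizon are illustrative assumptions, and the result is a simple extrapolation, not an Amadeus forecast.

```python
def projected_rate(current_rate: float, annual_growth: float, years: int) -> float:
    """Compound a request rate forward, assuming constant yearly growth."""
    return current_rate * (1 + annual_growth) ** years


# 4M flights/s and 50% yearly growth come from the text above;
# the horizon is an arbitrary choice for illustration.
for year in range(4):
    rate = projected_rate(4_000_000, 0.50, year)
    print(f"year +{year}: {rate / 1e6:.2f}M flights per second")
```

At this pace the load more than triples in three years, which is why containing the cost per request matters as much as raw scalability.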

The flight availability requests are handled by a farm of C++ backends accessing data through a Couchbase cluster, a distributed NoSQL store that hosts the airline flight and fare details. CPU consumption is a critical indicator for these kinds of large scale applications; an increase in CPU consumption of 100 microseconds per transaction translates into several thousands of dollars in extra hosting costs over a one-year period.
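The link between microseconds and dollars can be sketched with a back-of-the-envelope calculation. The transaction rate and the price per core-hour below are hypothetical placeholders, not figures from the pilot; only the 100-microsecond increase comes from the text.

```python
HOURS_PER_YEAR = 24 * 365


def extra_annual_cost(transactions_per_second: float,
                      extra_cpu_microseconds: float,
                      price_per_core_hour: float) -> float:
    """Yearly hosting cost of extra CPU time spent on every transaction.

    Extra CPU-seconds consumed per wall-clock second equals the number of
    additional cores that must be kept fully busy around the clock.
    """
    extra_cores = transactions_per_second * extra_cpu_microseconds * 1e-6
    return extra_cores * price_per_core_hour * HOURS_PER_YEAR


# Hypothetical inputs: 100,000 transactions/s and $0.03 per core-hour.
cost = extra_annual_cost(100_000, 100, 0.03)
print(f"extra cost: ${cost:,.0f} per year")
```

With these assumed inputs, 100 extra microseconds keep about ten additional cores busy, which comes to roughly $2,600 per year. Higher transaction rates scale the figure linearly.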

The initial deployment of our solution in Compute Engine was seamless thanks to the intuitive console and the vast set of pre-installed Linux images (CentOS 7.1 in our case). The first flight availability backend instances were ready to accept traffic only two hours after our initial connection.

The 1,500 cores challenge

Amadeus and Google engineering teams worked hand in hand to get the most out of a pre-allocated capacity of 1,500 cores spread over three regions (Central US, Western Europe and East Asia), with each region fed airline data through Couchbase’s Cross Datacenter Replication (XDCR) protocol.

Our mission was to increase the volume of flight availability requests processed per dollar. Several actions were undertaken:

Reducing the CPU path length per transaction through several low-level C++ optimizations and the use of Google’s tcmalloc memory allocator.
Increasing the I/O throughput to the Couchbase data store to keep our application cores busy. We were quite impressed by the stability and very low latency of the internal Compute Engine network (stable sub-millisecond round trips to Couchbase cluster nodes).
Enabling the NOOP I/O scheduler on the VMs hosting our Couchbase cluster (the optimal I/O scheduling pattern for increasing throughput to SSD drives).
Adjusting the VM size (CPU/memory ratio) so that our servers ran constantly at 85% to 90% CPU usage (n1-highcpu-16 for application servers and n1-highmem-4 for Couchbase cluster nodes).
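As an illustration of the scheduler change listed above, here is a minimal Python sketch that reads and sets the I/O scheduler through the standard Linux sysfs file `/sys/block/<device>/queue/scheduler`. The helper names and the device name `sda` are assumptions made for this example, and the post does not say how Amadeus actually applied the setting. On newer multi-queue kernels the equivalent choice is usually named `none` rather than `noop`.

```python
from pathlib import Path
from typing import List, Tuple


def parse_schedulers(line: str) -> Tuple[str, List[str]]:
    """Split a sysfs scheduler line such as 'noop deadline [cfq]'.

    The kernel marks the active scheduler with square brackets.
    Returns (active, available).
    """
    names = line.split()
    active = [n.strip("[]") for n in names if n.startswith("[")]
    if not active:
        raise ValueError(f"no active scheduler found in {line!r}")
    return active[0], [n.strip("[]") for n in names]


def set_scheduler(device: str, scheduler: str = "noop") -> None:
    """Select an I/O scheduler for a block device (requires root)."""
    path = Path("/sys/block") / device / "queue" / "scheduler"
    _, available = parse_schedulers(path.read_text())
    if scheduler not in available:
        raise ValueError(f"{scheduler!r} not offered for {device}: {available}")
    path.write_text(scheduler)


# Example (hypothetical device name, run as root on a Couchbase VM):
# set_scheduler("sda", "noop")
```

A setting written this way does not survive a reboot, so in practice it would also need to be applied at boot time.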

https://storage.googleapis.com/gweb-cloudblog-publish/images/amadeus2B3sjgb.max-400x400.PNG

https://storage.googleapis.com/gweb-cloudblog-publish/images/amadeus2B22f8v.max-400x400.PNG

Figure 2: GCP Console and Performance Reports

The results

Pilot objectives were achieved much faster than initially planned, thanks to the flexibility of GCP and the responsiveness of Google support teams.

The overall throughput of flight availability requests processed by 1,500 cores was doubled after only three months of joint effort.

Going further

We’re now engaging in the second phase of the pilot, aiming at dynamically adjusting the hardware capacity to the fluctuating shopping demand, further tuning the size of our VMs and leveraging the benefits of Compute Engine Preemptible VMs (“low-cost VMs” as we like to call them):

Dynamic capacity adjustment is being implemented with Kubernetes (Google’s container orchestration and cluster management solution), which is being rolled out in the pilot framework to dynamically spawn or shut down application VMs in line with fluctuations in flight availability traffic. Kubernetes is shipped by our PaaS partner, Red Hat, as part of their OpenShift offer (we’re building our internal application platform, Amadeus Cloud Services, on top of these strategic products to ensure our independence from the underlying IaaS provider). Per-minute billing of instances further optimizes the hosting costs.
Preemptible VMs, released in May 2015, run at a much lower price than standard VMs (70% off) but might be terminated, or preempted, by Compute Engine if it requires access to those resources for other tasks. Our plan is to oversize the number of computation VMs by 10% and use exclusively preemptible instance types, assuming that a fraction of those VMs will be terminated on a daily basis but still keeping our overall processing power at the required level to handle the flight availability traffic. Significant cost savings are anticipated with this new feature as well.
Custom machine types, released in November 2015, are being set up to replace our standard instance types (n1-highcpu-16 and n1-highmem-4). Custom VMs will be sized with only the required number of cores and the minimum amount of memory (specified per GB). The objective is to avoid any waste of CPU or memory.
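The capacity and preemptible-VM plans above can be sketched as simple arithmetic. The 10% oversizing and the 70% discount come from the text. The traffic level, the per-VM throughput, the 85% utilization target used for sizing and all function names are hypothetical values chosen for illustration.

```python
import math


def required_vms(traffic_rps: float, per_vm_rps: float,
                 target_utilization: float = 0.85) -> int:
    """VMs needed to serve the traffic while staying at the target CPU usage."""
    return math.ceil(traffic_rps / (per_vm_rps * target_utilization))


def preemptible_fleet(required: int, oversize_percent: int = 10) -> int:
    """Oversized fleet, rounded up (integer math avoids float rounding)."""
    return (required * (100 + oversize_percent) + 99) // 100


def relative_cost(required: int, fleet: int, discount_percent: int = 70) -> float:
    """Cost of the preemptible fleet relative to `required` standard VMs."""
    return fleet * (100 - discount_percent) / (100 * required)


# Hypothetical load: 50,000 requests/s, 1,000 requests/s per VM at full CPU.
needed = required_vms(50_000, 1_000)
fleet = preemptible_fleet(needed)
print(f"needed={needed} fleet={fleet} spare={fleet - needed}")
print(f"cost vs. standard VMs: {relative_cost(needed, fleet):.0%}")
```

Under these assumptions, paying for 10% more VMs at 70% off still costs about a third of the standard fleet, and the spare VMs are the margin available to absorb preemptions before capacity drops below the required level.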

Lessons learned

Our journey on GCP was very exciting, and the platform impressed us for the following reasons:

Performance: Network latency, throughput and stability are astonishing. Also, the ongoing migration of VMs to the next-generation Intel architecture (Haswell) in many regions will bring even more CPU gains to flight availability request processing.
Stability: We faced very few VM outages over the 6-month pilot duration. The maintenance notification process is working great and the live VM migration is really transparent.
Monitoring: The Stackdriver framework is excellent for reporting both system metrics (CPU, memory, I/O) and user-defined KPIs (like the rate of airline flights processed per second). Coupled with an efficient alerting system and the “Cloud Console” mobile app, it rapidly gave us a production-grade monitoring solution.
Pace of innovation: During the six-month pilot, three major announcements helped our project: the introduction of preemptible VMs, the rollout of custom machine types and, most importantly, a 15% price drop in May 2015.

Summary

The pilot on Google Cloud Platform changed our approach to performance optimization, from a pure CPU cost angle to an infrastructure-driven approach (efficiency is what matters in the end). GCP proved to be a very efficient sandbox environment for internal benchmarking, and we have no doubt that it will become a natural hosting solution for more Amadeus applications in the future.