Google shares data center security and design best practices
GCP NEXT 2016 — San Francisco — Attendees at Google Cloud Platform’s user conference this week got a chance to hear from two of the company’s leaders — Joe Kava, VP of data center operations and Niels Provos, distinguished engineer for security and privacy — on how the company designs, builds, operates and secures its data centers globally. They shared some of the secret sauce that makes Google's data centers so unique and what this means for GCP customers running inside them.
Security and data protection
Security and protection of data are among Google’s key design criteria. Our physical security features a layered security model, including safeguards like custom-designed electronic access cards, alarms, vehicle access barriers, perimeter fencing, metal detectors and biometrics. The data center floor features laser beam intrusion detection. Our data centers are monitored 24/7 by high-resolution interior and exterior cameras that can detect and track intruders. Access logs, activity records and camera footage are available in case an incident occurs.
Data centers are also routinely patrolled by experienced security guards who have undergone rigorous background checks and training (look closely and you can see a couple of them in this 360-degree data center tour). As you get closer to the data center floor, security measures increase. Access to the data center floor is only possible via a security corridor that implements multi-factor access control using security badges and biometrics. Only approved employees with specific roles may enter. Fewer than one percent of Google employees will ever set foot in one of our data centers.
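To make the security corridor rule concrete, here is a minimal sketch of an entry check that requires every factor at once: a valid badge, a biometric match and an approved role. The role names, the `EntryAttempt` type and the `may_enter_floor` function are hypothetical and chosen only for illustration; they do not describe Google's actual access systems.

```python
from dataclasses import dataclass

# Hypothetical set of roles approved for floor access (illustrative names only).
APPROVED_ROLES = frozenset({"datacenter_technician", "security_officer"})


@dataclass(frozen=True)
class EntryAttempt:
    badge_valid: bool       # the security badge was read and is currently valid
    biometric_match: bool   # the biometric scan matched the badge holder
    role: str               # the employee's assigned role


def may_enter_floor(attempt: EntryAttempt) -> bool:
    """Grant access only when all factors pass and the role is approved."""
    return (
        attempt.badge_valid
        and attempt.biometric_match
        and attempt.role in APPROVED_ROLES
    )


if __name__ == "__main__":
    print(may_enter_floor(EntryAttempt(True, True, "datacenter_technician")))
    print(may_enter_floor(EntryAttempt(True, False, "datacenter_technician")))
```

The point of the sketch is that the factors are combined with a logical AND: failing any single check denies entry, which is what distinguishes multi-factor control from a badge-only door.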
We employ a very strict end-to-end chain of custody for storage, tracking everything from cradle to grave, from the first time a hard drive goes into a machine until it’s verified clean/erased or destroyed. Information security and physical security go hand-in-hand. Data is most vulnerable to unauthorized access as it travels across the Internet or within networks. For this reason, securing data in transit is a high priority for Google. Data traveling between a customer’s device and Google is encrypted using HTTPS/TLS (Transport Layer Security). Google was the first major cloud provider to enable HTTPS/TLS by default.
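From the customer's side, encryption in transit looks roughly like the sketch below, which uses Python's standard `ssl` module to open a certificate-verified TLS connection. This is an illustrative client, not Google's implementation. The function names and the TLS 1.2 minimum are assumptions made for the example.

```python
import socket
import ssl
from typing import Optional


def make_client_context() -> ssl.SSLContext:
    """Build a client-side TLS context with certificate and hostname checks on."""
    # create_default_context() enables certificate verification and
    # hostname checking using the system's trusted CA certificates.
    context = ssl.create_default_context()
    # Assumption for this sketch: refuse anything older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def negotiated_tls_version(host: str, port: int = 443) -> Optional[str]:
    """Connect to host over TLS and return the protocol version negotiated."""
    context = make_client_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        # server_hostname drives both SNI and hostname verification.
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()
```

Calling `negotiated_tls_version("www.google.com")` on a machine with network access would return the protocol version the two sides agreed on. The handshake fails with an `ssl.SSLError` if the server's certificate cannot be verified.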
We build our own hardware and monitoring systems
Google servers don’t include unnecessary components such as video cards, chipsets or peripheral connectors, which can introduce vulnerabilities. Our production servers run a custom-designed operating system (OS) based on a stripped-down and hardened version of Linux. Google’s servers and their OS are designed for the sole purpose of providing Google services. Server resources are dynamically allocated, allowing for flexibility in growth and the ability to adapt quickly and efficiently, adding or reallocating resources based on customer demand.
For our data center teams to be successful, they must have advanced, real-time visibility into the status and functionality of our infrastructure. As you might know, saying that Google is obsessed with data is a bit of an understatement. To aid our teams, we've built monitoring and controls systems for all functional areas, from the servers, storage and networking systems, to the electrical distribution, mechanical cooling systems and security systems. We're monitoring all aspects of performance and operations from “chip to chiller.”
Using machine learning to optimize data center operations
To help in this endeavor, we’re using our machine learning / deep learning algorithms for data center operations. As you can imagine, our data centers are large and complex, with electrical, mechanical and controls systems all working together to deliver optimal performance. Because of the sheer number of interactions and possible settings for these systems, it's impossible for mere mortals to visualize how best to optimize the data center in real time. However, it's fairly trivial for computers to crunch through these possible scenarios and find the optimal settings.
Over the past couple of years we've developed this algorithm and trained it with billions of data points from our sites all over the world. We now use this machine learning model to help visualize the data so the operations teams can set up the data center electrical and cooling plants for the optimal, most efficient performance on any given day, considering up to 19 independent variables that affect performance. This helps the team identify discontinuities or efficiency inflection points that aren't intuitive.
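The idea of letting a computer crunch through candidate settings can be sketched in a few lines. The example below is a toy: `predicted_overhead` is an invented formula standing in for a trained model, it takes two inputs rather than up to 19, and every constant in it is an assumption made for illustration. It only shows the general pattern of scoring each combination of settings with a model and keeping the best one.

```python
from itertools import product

HEAT_LOAD = 70.0  # hypothetical heat load to remove, in arbitrary units


def predicted_overhead(chillers_on: int, fan_speed_pct: int) -> float:
    """Toy stand-in for a trained model; a lower score means more efficient."""
    # Running more chillers or faster fans costs more energy.
    energy_cost = 0.05 * chillers_on + 0.002 * fan_speed_pct
    # Cooling capacity delivered by this combination of settings.
    capacity = 20.0 * chillers_on + 0.4 * fan_speed_pct
    # Heavily penalize settings that cannot remove the full heat load.
    shortfall = max(0.0, HEAT_LOAD - capacity)
    return energy_cost + 0.1 * shortfall


def best_settings(chiller_options, fan_options):
    """Score every combination and return the one with the lowest overhead."""
    return min(
        product(chiller_options, fan_options),
        key=lambda settings: predicted_overhead(*settings),
    )


if __name__ == "__main__":
    best = best_settings(range(1, 5), (40, 60, 80, 100))
    print(best, predicted_overhead(*best))
```

Even in this toy, the best choice (three chillers with fans at 40%) is not the one with the fewest chillers running. It is a small-scale version of the non-intuitive inflection points described above.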
Powered by renewable energy
On the energy side, we're committed to powering our infrastructure with renewable energy. We're the world's largest private investor in renewable energy. To date we've invested more than $2 billion in renewable energy Power Purchase Agreements (PPAs). These PPAs are very important because (1) we're buying the entire output of wind and solar farms for long periods, typically 10-20 years, (2) these wind farms are on the same power grids as our data centers, and (3) these long-term purchases give the project developer the financial commitment they need to get the project built, so we know our investment is adding renewable power to the grid that wouldn’t otherwise have been added.
For cooling, we've redesigned our fundamental cooling technology on average about every 12-18 months. Along the way, we've developed and pioneered innovations in water-based cooling systems such as seawater cooling, industrial canal water cooling, recycled / grey water cooling, stormwater capture and reuse, rainwater harvesting and thermal energy storage. We've designed data centers that don't use water-based solutions, instead using 100% outside air cooling. The point is there's no "one size fits all" model here. Each data center is designed for the highest performance and highest efficiency for that specific location.