AI Platform Prediction goes GA with improved reliability & ML workflow integration
Robbie Haertel
Staff Engineer
Bhupesh Chandra
Senior Engineer
Machine learning (ML) is transforming businesses and lives alike. Whether it be finding rideshare partners, recommending products or playlists, identifying objects in images, or optimizing marketing campaigns, ML and prediction are at the heart of these experiences. To support businesses like yours that are revolutionizing the world using ML, AI Platform is committed to providing a world-class, enterprise-ready platform for hosting all of your transformative ML models.
As a part of our continued commitment, we are pleased to announce the general availability of AI Platform Prediction based on a Google Kubernetes Engine (GKE) backend. The new backend architecture is designed for improved reliability, more flexibility via new hardware options (Compute Engine machine types and NVIDIA accelerators), reduced overhead latency, and improved tail latency. In addition to standard features such as autoscaling, access logs, and request/response logging available during our Beta period, we've introduced several updates that improve robustness, flexibility, and usability:
XGBoost / scikit-learn models on high-mem/high-cpu machine types: Many data scientists like the simplicity and power of XGBoost and scikit-learn models for predictions in production. AI Platform makes it simple to deploy models trained using these frameworks with just a few clicks -- we'll handle the complexity of the serving infrastructure on the hardware of your choice.
Resource Metrics: An important part of maintaining models in production is understanding their performance characteristics such as GPU, CPU, RAM, and network utilization. These metrics can help you decide what hardware to use to minimize latencies and optimize performance. For example, you can view your model's replica count over time to understand how your autoscaling model responds to changes in traffic, then adjust minReplicas to optimize cost and/or latency. Resource metrics are now visible for models deployed on Compute Engine (GCE) machine types from Cloud Console and Stackdriver Metrics.
Regional Endpoints: We have introduced new endpoints in three regions (us-central1, europe-west4, and asia-east1) with better regional isolation for improved reliability. Models deployed on the regional endpoints stay within the specified region.
VPC-Service Controls (Beta): Users can define a security perimeter and deploy Online Prediction models that have access only to resources and services within the perimeter, or within another bridged perimeter. Calls to the AI Platform Online Prediction APIs are made from within the perimeter. Private IP will allow VMs and services within the restricted networks or security perimeters to access the AI Platform APIs without having to traverse the public internet.
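To make the regional endpoints a little more concrete, here is a minimal sketch of how a client might assemble an online prediction request for one of the three regions listed above. It only builds the URL and JSON body and does not send anything. The helper name build_predict_request and the project, model, and version names are hypothetical, and the host pattern (region followed by "-ml.googleapis.com") and the "instances" request body are assumptions you should check against the current AI Platform Prediction documentation before relying on them.

```python
import json

# Regions named in this post as having regional endpoints.
REGIONAL_ENDPOINT_REGIONS = ("us-central1", "europe-west4", "asia-east1")


def build_predict_request(project, model, version, instances, region):
    """Return (url, body) for an online prediction call on a regional endpoint.

    Hypothetical helper for illustration; it performs no network I/O.
    """
    if region not in REGIONAL_ENDPOINT_REGIONS:
        raise ValueError(f"no regional endpoint listed for {region!r}")
    # Assumed host pattern: <region>-ml.googleapis.com
    url = (
        f"https://{region}-ml.googleapis.com/v1/"
        f"projects/{project}/models/{model}/versions/{version}:predict"
    )
    # Assumed body shape: a JSON object with an "instances" list.
    body = json.dumps({"instances": instances})
    return url, body


# Placeholder names; substitute your own project, model, and version.
url, body = build_predict_request(
    project="my-project",
    model="my_model",
    version="v1",
    instances=[[1.0, 2.0, 3.0]],
    region="europe-west4",
)
print(url)
print(body)
```

In practice you would send this request with an authenticated HTTP client or an official client library; keeping the region in the host name is what lets the request, and the model serving it, stay within the region you chose.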
But prediction doesn't just stop with serving trained models. Typical ML workflows involve analyzing and understanding models and predictions. Our platform integrates with other important AI technologies to simplify your ML workflows and make you more productive:
Explainable AI. To better understand your business, you need to better understand your model. Explainable AI provides information about the predictions from each request and is available exclusively on AI Platform.
What-If Tool. Visualize your datasets and better understand the output of your models deployed on the platform.
Continuous Evaluation. Obtain metrics about the performance of your live model based on ground-truth labelling of requests sent to your model. Make decisions to retrain or improve the model based on performance over time.
"[AI Platform Prediction] greatly increases our velocity by providing us with an immediate, managed and robust serving layer for our models and allows us to focus on improving our features and modelling," said Philippe Adjiman, data scientist tech lead at Waze. Read more about Waze's experience adopting the platform here.
All of these features are available in a fully managed, cluster-less environment with enterprise support -- no need to stand up or manage your own highly available GKE clusters. We also handle quota management and protect your model from being overloaded by clients sending too much traffic. These features of our managed platform allow your data scientists and engineers to focus on business problems instead of managing infrastructure.