AI & Machine Learning

5 ways Vertex Vizier hyperparameter tuning improves ML models

June 3, 2021

https://storage.googleapis.com/gweb-cloudblog-publish/images/networking_nedNYmJ.max-2600x2600.jpg

Sagi Perel

ML Research Engineer

We recently launched Vertex AI to help you move machine learning (ML) from experimentation into production faster and manage your models with confidence—speeding up your ability to improve outcomes at your organization.

But we know many of you are just getting started with ML and there’s a lot to learn! In tandem with building the Vertex AI platform, our teams are dropping as much best practices content as we can to help you come up to speed. Plus, we have a dedicated event on June 10th, Applied ML Summit, with sessions on how to apply ML technology in your projects, as well as grow your skills in this field.

In the meantime, we couldn’t resist a quick lesson on hyperparameter tuning, because (a) it’s incredibly cool (b) you will impress your coworkers (c) Google Cloud has some unique battle tested tech in this area and (d) you will save time by getting better ML models into production faster. Vertex Vizier, on average, finds optimal parameters for complex functions in over 80% fewer trials than traditional methods.

So it’s incredibly cool, but what is it?

While machine learning models automatically learn from data, they still require user-defined knobs which guide the learning process. These knobs, commonly known as hyperparameters, control, for example, the tradeoff between training accuracy and generalizability. Examples of hyperparameters are the optimizer being used, its learning rate, regularization parameters, the number of hidden layers in a DNN, and their sizes.

Setting hyperparameters to their optimal values for a given dataset can make a huge difference in model quality. Typically, optimal hyperparameter values are found via grid searching a small number of combinations, or tedious manual experimentation. Hyperparameter tuning automates this work for you by searching for the best configuration of hyperparameters for optimal model performance.

Vertex Vizier enables automated hyperparameter tuning in several ways:

"Traditional" hyperparameter tuning: by this we mean finding the optimal value of hyperparameters by measuring a single objective metric which is the output of an ML model. For example, Vizier selects the number of hidden layers and their sizes, an optimizer and its learning rate, with the goal of maximizing model accuracy.
When hyperparameters are evaluated, models are trained and evaluated on splits of the data set. If evaluation metrics are streamed to Vizier (e.g. as a function of epoch) as the model is trained, Vizier’s early stopping algorithms can predict the final objective value, and recommend which unpromising trials should be early stopped. This conserves compute resources and speeds up convergence.
Oftentimes, models are tuned sequentially on different data sets. Vizier’s built in transfer learning learns priors from previous hyperparameter tuning studies, and leverages them to converge faster on subsequent hyperparameter tuning studies.
AutoML is a variant of #1, where Vertex Vizier performs both model selection, and also tunes architectures/non-architecture modifying hyperparameters. AutoML usually requires more code on top of Vertex Vizier (to ingest data etc), but Vizier is in most cases the "engine" behind the process. AutoML is implemented by defining a tree like (DAG) search space, rather than a “flat” search space (like in #1). Note that you can use DAG search spaces for any other purpose where searching over a hierarchical space makes sense.
There are times when you may wish to optimize more than one metric. For example, we would like to optimize model accuracy, while minimizing model latency. Vizier can find the Pareto frontier, which presents tradeoffs for multiple metrics, allowing users to choose the appropriate tradeoff. Simple example: I want to make a more accurate model, but would like to minimize serving latency. I do not know ahead of time what's the tradeoff between the two metrics. Vizier can be used to explore and plot a tradeoff curve, so users can select on the most appropriate one. For example, "a latency decrease of 200ms will only decrease accuracy by 0.5%”

Google Vizier is all yours with Vertex AI

Google published the Vizier research paper in 2017, sharing our work and use cases for black-box optimization—i.e. The process of finding the best settings for a bunch of parameters or knobs when you can’t peer inside a system to see how well the knobs are working. The paper discusses our requirements, infrastructure design, underlying algorithms, and advanced features such as transfer learning that the service provides. Vizier has been essential to our progress with machine learning at Google, which is why we are so excited to make it available to you on Vertex AI.

Vizier has already tuned millions of ML models at Google, and its algorithms are continuously improved for faster convergence and handling of real-life edge cases. Vertex Vizier’s models are very well calibrated and are self-tuning (they adapt to user data), and offer unique power features, such as hierarchical search spaces and multi-objective optimization. We believe Vertex Vizier’s set of features is a unique capability to Google Cloud, and look forward to optimizing the quality of your models by automatically tuning hyperparameters for you.

To learn more about Vertex Vizier, check out these docs and if you are interested in what’s coming in machine learning over the next five years, tune in to our Applied ML Summit on June 10th, or watch the sessions on demand in your own time.

AI & Machine Learning