AutoML

Bringing Google AutoML to 3.5 million data scientists on Kaggle

November 4, 2019

Devvret Rishi

Product Manager, Kaggle

Kaggle recently hit a significant milestone: more than 3.5 million users now use our platform to learn and apply machine learning. AI is one of the world’s most powerful emerging technologies, but even as its use grows, adoption has been hampered by the limited number of data scientists who have access to the tools and expertise to leverage it effectively. Kaggle’s mission is to empower our community of data scientists by providing them with the skills and tools they need to lead in their field, and now we’re advancing that mission by integrating AutoML into our platform.

Why we’re excited about AutoML

AutoML stepped into our spotlight earlier this year, when it led for most of our all-day machine learning competition at Kaggle Days at Cloud Next ’19 before being narrowly edged out by a team of data scientists in the closing moments of the event. The strong performance even made headlines and generated excitement about its future.

What especially drew our interest was that the team using AutoML was able to get strong results quickly, with little effort and no domain expertise or supervision. What’s more, they spent very little time on data prep, and virtually no time on feature engineering, model selection, or hyperparameter tuning. The time efficiency of AutoML became even clearer during the IEEE competition, where it took thousands of teams several weeks to beat the AutoML benchmark by a significant margin on our private leaderboard.

https://storage.googleapis.com/gweb-cloudblog-publish/images/competitiion_submission.max-800x800.png

This figure shows submission scores (individual points) during the first four weeks of the competition, compared to the AutoML Tables benchmark score posted at the start of the competition (green line). The dashed blue line represents the 90th percentile of daily submission scores. The AutoML Tables benchmark beats >90% of daily submissions for approximately the first two weeks of the competition.
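To make the comparison in the figure concrete, here is a minimal sketch of how a fixed benchmark can be checked against the 90th percentile of each day's submissions. The scores below are made up for illustration and are not competition data; the function names are hypothetical, and the sketch assumes a metric where a higher score is better.

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def days_benchmark_leads(daily_scores, benchmark, pct=90):
    """Days on which the benchmark is at or above the pct-th percentile of that day's scores."""
    return [
        day
        for day, scores in sorted(daily_scores.items())
        if benchmark >= percentile(scores, pct)
    ]

# Made-up submission scores for three days, purely for illustration.
daily_scores = {
    1: [0.80, 0.85, 0.88, 0.90, 0.91],
    2: [0.86, 0.90, 0.92, 0.93, 0.93],
    3: [0.90, 0.94, 0.95, 0.96, 0.97],
}
benchmark = 0.93  # hypothetical fixed benchmark score

leading_days = days_benchmark_leads(daily_scores, benchmark)
print(leading_days)
```

In this toy data the benchmark holds its lead on the first two days and is overtaken on the third, which mirrors the shape of the curve described in the caption.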

The simplicity and efficacy of the tool make it promising for people with data science problems, but not necessarily deep data science backgrounds, to create powerful models.

How it works

Automated machine learning tools (AMLTs) have been on the market for several years and come in varying flavors, but they all generally look to automate the end-to-end process of training a machine learning model based on minimally preprocessed input data. Google Brain published its seminal paper on automated machine learning in 2016, and the exciting research results, combined with the potential to make machine learning more accessible, led Google Cloud to invest in making AutoML available through its AI Platform.

Cloud AutoML is now a suite of products that helps users build custom machine learning models for a diverse set of tasks on data ranging from vision to language to structured data. The exact use varies by product, but all follow the general pattern of ingesting your data through the SDK or web UI, giving you a few knobs to adjust, and then outputting a trained model that can be deployed to GCP with one click. Today’s release focuses on enabling our community to use the SDK directly within Kaggle Notebooks.
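As a rough intuition for the "data in, a few knobs, trained model out" pattern, here is a toy sketch of automated model selection in plain Python. It is not how Cloud AutoML works internally and it uses no Google API; the function names, the three candidate models, and the `holdout` knob are all hypothetical choices made for illustration.

```python
import random

def fit_mean(xs, ys):
    """Baseline: always predict the average target."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_line(xs, ys):
    """Ordinary least squares for a single feature."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def fit_nearest(xs, ys):
    """1-nearest-neighbour: return the target of the closest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

CANDIDATES = {"mean": fit_mean, "line": fit_line, "nearest": fit_nearest}

def mse(model, xs, ys):
    """Mean squared error of a model on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def auto_fit(xs, ys, holdout=0.25, seed=0):
    """Try every candidate, keep the one with the lowest validation error.

    `holdout` is the one knob: the fraction of rows reserved for validation.
    """
    rows = list(zip(xs, ys))
    random.Random(seed).shuffle(rows)
    n_val = max(1, int(len(rows) * holdout))
    val, train = rows[:n_val], rows[n_val:]
    train_x, train_y = zip(*train)
    val_x, val_y = zip(*val)
    scores = {
        name: mse(fit(train_x, train_y), val_x, val_y)
        for name, fit in CANDIDATES.items()
    }
    best = min(scores, key=scores.get)
    # Refit the winning candidate on all of the data before returning it.
    return best, CANDIDATES[best](xs, ys), scores

# Synthetic data that follows y = 2x + 1 exactly.
xs = list(range(20))
ys = [2 * x + 1 for x in xs]
best_name, model, scores = auto_fit(xs, ys)
print(best_name, round(model(25), 6))
```

A real system searches far larger spaces of features, architectures, and hyperparameters, but the outer loop has the same shape: propose candidates, score them on held-out data, and return the winner.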

Getting started with AutoML on Kaggle

Kaggle’s integration with AutoML follows in the footsteps of our prior work bringing BigQuery to Kaggle Notebooks.

To get started, simply link your GCP account and authorize access to the cloud services you’d like to use. Enabling Cloud Storage at the same time will make it easy for AutoML to access your data.

https://storage.googleapis.com/gweb-cloudblog-publish/images/python_unknown.max-700x700.png

https://storage.googleapis.com/gweb-cloudblog-publish/images/AutoML_for_Kaggle.max-900x900.png

Once you link your Google account, you’ll want to double-check that your cloud account is ready for you to start using AutoML. To do this, make sure you’ve enabled the ML APIs and billing for your GCP project. AutoML is a paid service, and free tier limits and charges vary by the individual product you are using. To make it accessible to more Kagglers, we plan to offer GCP credits throughout the year to subsidize the costs of using the service, and all new Google accounts that sign up for GCP get a $300 credit.

From there, you’re ready to get started!

You can now easily run AutoML using the built-in client SDKs within your Kaggle Notebook, or by using the web interface within the cloud console. To get started with AutoML in your Notebook, check out the documentation or one of our tutorials. To learn more about the topic of automated machine learning and how it can improve your data science workflow, take a look at our explainer video.
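Because enabling Cloud Storage is what lets AutoML reach your data, a common first step in a Notebook is to copy a training file into a bucket. The sketch below assumes the `google-cloud-storage` package is available, that your GCP account is linked, and that the bucket already exists; the helper names and the project, bucket, and file values are placeholders. The AutoML calls themselves are left to the documentation and tutorials mentioned above.

```python
def gcs_uri(bucket_name, blob_name):
    """Build the gs:// URI for an object in a bucket."""
    return f"gs://{bucket_name}/{blob_name}"

def upload_csv(project_id, bucket_name, local_path, blob_name):
    """Upload a local CSV to an existing Cloud Storage bucket and return its gs:// URI.

    Needs the google-cloud-storage package and a linked, billing-enabled GCP project.
    """
    # Imported lazily so gcs_uri() can be used without the package installed.
    from google.cloud import storage

    client = storage.Client(project=project_id)
    bucket = client.bucket(bucket_name)
    bucket.blob(blob_name).upload_from_filename(local_path)
    return gcs_uri(bucket_name, blob_name)

# Placeholder values; replace with your own project, bucket, and file.
# uri = upload_csv("my-gcp-project", "my-bucket", "train.csv", "automl/train.csv")
print(gcs_uri("my-bucket", "automl/train.csv"))
```

The returned `gs://` URI is the kind of location you would then hand to the service as the input data source.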

Keep up with the latest from Kaggle

We’re looking forward to seeing what you think about these new tools, and we’ll continue to invest in new ways to make our platform and machine learning more accessible. Follow Kaggle’s YouTube channel to keep up with the latest on what Kaggle’s doing, including an upcoming workshop on model selection, weekly live coding, and more.
