Bringing Google AutoML to 3.5 million data scientists on Kaggle
Kaggle recently passed a significant milestone: more than 3.5 million users now use our platform to learn and apply machine learning. AI is one of the world’s most powerful emerging technologies, but even as the community of practitioners grows, its adoption has been hampered by the limited number of data scientists who have access to the tools and expertise to use it effectively. Kaggle’s mission is to empower our community of data scientists by providing them with the skills and tools they need to lead in their field, and now we’re advancing that mission by integrating AutoML into our platform.
Why we’re excited about AutoML
AutoML stepped into our spotlight earlier this year, when it led for most of our all-day machine learning competition at Kaggle Days at Cloud Next ’19 before being narrowly edged out by a team of data scientists in the closing moments of the event. The strong performance made headlines and generated excitement about its future.
What especially drew our interest was that the team using AutoML got strong results quickly, with low effort and no domain expertise or supervision. What’s more, they spent very little time on data prep and virtually no time on feature engineering, model selection, or hyperparameter tuning. The time efficiency of AutoML became even clearer during the IEEE competition, where it took thousands of teams several weeks to beat the AutoML benchmark by a significant margin on our private leaderboard.
The simplicity and efficacy of the tool give people with data science problems, but not necessarily deep data science backgrounds, a promising way to create powerful models.
How it works
Automated machine learning tools (AMLTs) have been on the market for several years and come in varying flavors, but they generally aim to automate the end-to-end process of training a machine learning model on minimally preprocessed input data. Google Brain published its seminal paper on automated machine learning in 2016, and the exciting research results, combined with the potential to make machine learning more accessible, led Google Cloud to invest in making AutoML available through its AI Platform.
Cloud AutoML is now a suite of products that helps users build custom machine learning models for a diverse set of tasks, on data ranging from vision to language to structured data. The exact usage varies by product, but all follow the same general pattern: you ingest your data through the SDK or web UI, adjust a few knobs, and get back a trained model that can be deployed to GCP with one click. Today’s release focuses on enabling our community to use the SDK directly within Kaggle Notebooks.
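To make the ingest, adjust, train, deploy pattern concrete, here is a minimal Python sketch. It is an illustration only and not the AutoML SDK: `AutoMLJobSpec`, `plan_job`, and the knob names (`target_column`, `train_budget_hours`) are hypothetical stand-ins, the `gs://` requirement is an assumption that your data is staged in Cloud Storage, and nothing here calls a Google Cloud service.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AutoMLJobSpec:
    """Hypothetical description of one AutoML run (not an SDK type)."""
    data_uri: str                    # where the training data lives
    target_column: str               # knob: the column the model should predict
    train_budget_hours: float = 1.0  # knob: upper bound on training time


def plan_job(spec: AutoMLJobSpec) -> List[str]:
    """Return the ordered steps the general pattern implies for this spec."""
    # Assumption for this sketch: data has been staged in Cloud Storage.
    if not spec.data_uri.startswith("gs://"):
        raise ValueError("this sketch assumes a Cloud Storage URI (gs://...)")
    if spec.train_budget_hours <= 0:
        raise ValueError("train_budget_hours must be positive")
    return [
        f"ingest data from {spec.data_uri}",
        f"set target column to {spec.target_column!r}",
        f"train for at most {spec.train_budget_hours} hour(s)",
        "deploy the trained model",
    ]


if __name__ == "__main__":
    spec = AutoMLJobSpec(
        data_uri="gs://example-bucket/train.csv",  # placeholder bucket and file
        target_column="label",
    )
    for step in plan_job(spec):
        print(step)
```

In a real notebook each step would map to a call in the product-specific SDK; the point of the sketch is only that the user supplies data and a handful of settings, and the service handles the rest.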
Getting started with AutoML on Kaggle
Kaggle’s integration with AutoML follows in the footsteps of our prior work bringing BigQuery to Kaggle Notebooks.
To get started, simply link your GCP account and authorize access to the cloud services you’d like to use. Enabling Cloud Storage at the same time will make it easy for AutoML to access your data.
Once you have linked your Google account, double-check that your cloud account is ready for AutoML by making sure you’ve enabled the ML APIs and billing for your GCP project. AutoML is a paid service, and free tier limits and charges vary by product. To make it accessible to more Kagglers, we plan to offer GCP credits throughout the year to subsidize the costs of using the service, and all new Google accounts that sign up for GCP get a $300.
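The setup steps described in this section can be written down as a small pre-flight checklist. The sketch below is illustrative only: `missing_prerequisites` and the flag names are hypothetical, and the flags are values you set by hand after checking your GCP project. The code does not query Kaggle or Google Cloud.

```python
from typing import Dict, List

# Prerequisites named in this post, in the order you would complete them.
# The keys are hypothetical labels chosen for this sketch.
PREREQUISITES = {
    "account_linked": "Link your GCP account to Kaggle",
    "services_authorized": "Authorize the cloud services you want to use",
    "ml_apis_enabled": "Enable the ML APIs for your GCP project",
    "billing_enabled": "Enable billing for your GCP project",
}


def missing_prerequisites(status: Dict[str, bool]) -> List[str]:
    """Return the description of every prerequisite not yet marked done."""
    return [
        description
        for key, description in PREREQUISITES.items()
        if not status.get(key, False)
    ]


if __name__ == "__main__":
    # Example: everything is done except billing.
    status = {
        "account_linked": True,
        "services_authorized": True,
        "ml_apis_enabled": True,
    }
    for item in missing_prerequisites(status):
        print("TODO:", item)
```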