The Google Cloud Prediction API has been deprecated and will be shut down on April 30, 2018.

Smart Autofill Spreadsheets Add On

Smart Autofill is a Google Spreadsheets Add On that uses the Prediction API for performing Machine Learning directly in a Google Spreadsheet.

Click here to install and enable the Add On. After the add-on is installed, you should see a menu item under "Add-ons" called "Smart Autofill".

Examples

Estimating car prices

Suppose you want to price a used vehicle based on a few simple attributes: year, milage, number of doors and vehicle type (Car or Truck). Of course, accurate pricing will depend on many other variables as well, but we will use these four simple features to illustrate the use of Smart Autofill.

  • First, we gather some data on several used vehicles - note that we have the price for some of them, but for some of the vehicles the price is missing (these are the examples we want to fill in). Then we open the Smart Autofill sidebar - "Add-ons > Smart Autofill > Start".




  • Select the block of data that you wish to autofill. Note: select only rows that you wish to autofill, or rows that contain values in the column you wish to autofill - do not select the header row. Click "Next".




  • From the dropdown menu, select the column to autofill and click "Start".




  • Depending on the size of the data, it may take a few minutes to autofill.




  • After it is done, the empty cells will be filled in with predicted values. Smart Autofill also gives some idea about how accurate the predicted values are by reporting an Estimated accuracy or Average error. See the FAQ for more information on how to interpret these values.




  • Using SmartAutofill with text data

    Smart Autofill is designed to specially handle free-form text and can be used to help fill in missing numeric/categorical values.

    As an example, suppose you sent out a survey using Google Forms to a multi-lingual audience that could potentially respond with comments in either English, Spanish, or French. When someone responds to the Form, their answer will automatically be populated into a Google Sheets spreadsheet.

    Now, you wish to read some of the comments in the form, but suppose you only speak English. Instead of sifting through the responses looking for the ones that are in English, you can use Smart Autofill to simplify your task. If you would like to follow along this example, the fully labeled data is available here.

  • First you create an empty column that will store the output labels (in this case “English”, “Spanish”, or “French”), and hand-label the first several instances (in this example we label the first 100 instances).




  • Next, (assuming you have already installed Smart Autofill) open up the Smart Autofill sidebar by clicking on "Add-ons > Smart Autofill > Start". Once the sidebar is open, select the full data (2 columns, 406 rows) and click "Next".




  • Now, select the column to autofill (column A in the screenshot) and click "Start". Note that it may take a couple of minutes before SmartAutofill starts filling entries.




  • Once Smart Autofill is done, it will report an "Estimated accuracy" (see the FAQ for details) as well as the number of pre-labeled rows and filled rows (should be 100 and 306 in this case respectively). Also, keep in mind, the more rows that you label manually, the more accurate the predictions will be.




  • Although this example only has one filled column and one column with missing data, SmartAutofill can also handle the case where there is more than one filled column. Some of the filled columns can contain text and others can contain numeric or categorical data (like in the used vehicle example).

    There are many other text applications for Smart Autofill that you may think of, here are just a few more examples:

  • Labeling reviews of products with the overall sentiment (e.g. "positive", "neutral", "negative").
  • Automatically classify comments into categories (e.g. "feature_request", "bug_report", "positive_feedback", etc) or scoring them with a numeric value to indicate their urgency.
  • Collecting survey results in a spreadsheet that contains categorical responses as well as text and then using SmartAutofill to fill in missing values in one of the categorical or numeric columns.
  • We note that the larger the vocabulary of the text then the larger the number of non-missing examples you will generally need in order to get more accurate filled values. SmartAutofill may not be well suited for all data, but we hope you can use these examples as a guide to find many interesting use cases.

    FAQ

    1. What do all the fields in Autofill Info mean?
      • Estimated Accuracy / Average Error

        If the autofill column contains categorical values (ex. True or False) then Smart Autofill will report the Estimated accuracy and if the autofill column contains numeric values it will report the Average error.

        These metrics are computed by measuring how well Smart Autofill would do if it hypothetically were to fill in the cells with non-missing values by comparing predicted values with the actual values (Smart Autofill of course cannot measure the accuracy of the autofilled entries since we do not know what the missing values truly are in that case). In the categorical scenario, Estimated accuracy is the fraction of cells Smart Autofill gets right (100% is best), and in the numeric scenario, Average error is the average difference between the predicted value and the actual value (0 is best).

        Smart Autofill does not guarantee the accuracy of the autofilled entries. The Estimated accuracy and Average error metrics are only a guideline and may not be representative of the actual accuracy of the autofilled entries especially if (a) the number of non-missing entries is small and (b) if the rows with non-missing values have very different characteristics than the rows with missing values.

      • Number of rows filled

        The number of rows in the selection that were autofilled by SmartAutofill.

      • Number of labeled rows

        The number of rows in the selection that were already labeled and therefore were not modified by SmartAutofill.

      • Number of empty rows

        The number of rows in the selection that were completely empty and therefore skipped by SmartAutofill.

    2. What if I have missing values in any of the columns other than the autofilled column?
    3. Smart Autofill will work even if a few cells are not filled in the non-autofilled columns, however, the autofilled/predicted value for those rows may be less accurate.

    4. If one of my columns contains text, how should I separate the words?
    5. If the column value is just free form text, then it is ok to just leave it as regular text. However, if the column value represents a single entity, then you wil most likely get better performance if you replace spaces with underscores.

      Examples:

      Suppose you wish to add another column that represents a comment/review about the vehicle, then you should keep the text as is - ex. "This car is really fast!".

      Suppose you wish to add another column to the data that represents the city in which the vehicle is being sold, you should replace all the spaces with underscores - ex. "NEW_YORK_CITY".

    6. What should I do if I run into any other issues?
    7. If you run into any other issues, please email kdavydov@google.com.

    Monitor your resources on the go

    Get the Google Cloud Console app to help you manage your projects.

    Send feedback about...

    Prediction API Documentation