Developers & Practitioners
Make informed decisions with Google Trends data
A few weeks ago, we launched a new dataset into Google Cloud’s public dataset program: Google Trends. If you’re not familiar with our datasets program, we host a variety of datasets in BigQuery and Cloud Storage for you to access and integrate into your analytics. Google pays for the storage of these datasets and provides public access to the data, e.g., via the bigquery-public-data project. You only pay for queries against the data. Plus, the first 1 TB per month is free! Even better, all of these public datasets will soon be accessible and shareable via Analytics Hub.
The Google Trends dataset represents the first time we’re adding Google-owned Search data into the program. The Trends data allows users to measure interest in a particular topic or search term across Google Search, from around the United States, down to the city-level. You can learn more about the dataset here, and check out the Looker dashboard here! These tables are super valuable in their own right, but when you blend them with other actionable data you can unlock whole new areas of opportunity for your team. You can view and run the queries we demonstrate here.
Focusing on areas that matter
Each day, the top 25 search terms are added to the top_terms table. Additionally, information about how that term has fluctuated over time for each region, Nielsen’s Designated Market Area® (DMA), is recorded with a score. A value of 100 is the peak popularity for the term. This regional information can offer further insight into trends for your organization.
Let’s say I have a BigQuery table that contains information about each one of my physical retail locations. Like we mentioned in our previous blog post, depending on how that data is brought into BigQuery we might enhance the base table by using the Google Maps Geocoding API to convert text-based addresses into lat-lon coordinates.
So now how do I join this data with the Google Trends data? This is where BigQuery GIS functions, plus the public boundaries dataset comes into play. Here I can use the DMA table to determine which DMA each store is in. From there I can simply join back onto the trends data using the DMA ID and focus on the top three terms for each store, which is based on terms with the highest score for that area within the past week. Note that DMA boundaries need to be licensed from Nielsen for use in analysis.
SELECT stores.id as store_id, array_agg(term ORDER BY score DESC LIMIT 3) as rising_terms
FROM `looker-private-demo.retail.us_stores` as stores
JOIN `looker-private-demo.boundaries.dma_boundaries` as dma
ON ST_WITHIN(ST_GEOGPOINT(stores.Longitude, stores.Latitude),dma.dma_geom)
JOIN `bigquery-public-data.google_trends.top_terms` as trends
ON dma.dma_id = CAST(trends.dma_id as STRING)
WHERE trends.refresh_date = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
AND stores.id IS NOT NULL
AND trends.week = DATE_TRUNC(CURRENT_DATE(), WEEK)
GROUP BY 1;
With this information, you can figure out what trends are most important to customers in the areas you care about, which can help you optimize marketing efforts, stock levels, and employee coverage. You may even want to compare across your stores to see how similar term interest is, which may offer new insight into localized product development.
Filtering for relevant search terms
Search terms are constantly changing and it might not be practical for your team to dig into each and every one. Instead, you might want to focus your analysis on terms that are relevant to you. Let's imagine that you have a table that contains all your product names. These names can be long and may contain lots of words or phrases that aren’t necessary for this analysis. For example:
“10oz Authentic Ham and Sausages from Spain”
Like most text problems, you should probably start with some preprocessing. Here, we’re using a simple user-defined function that converts the string to lowercase, tokenizes it, and removes words with numbers, and stop words or adjectives that we’ve hard-coded.
CREATE TEMP FUNCTION pre_process(text STRING) as (
FROM UNNEST(SPLIT(LOWER(REGEXP_REPLACE(text, r'[\.\",*:()\[\]/|\n]', ' ')), ' ')) as word
WHERE word NOT IN UNNEST(["a", "an", "the", "and", "but", "if", "or", "as", "of", "at", "by", "for", "with",
"to", "from", "nice","yummy","authentic","tasty","gooey","delicious","fresh"])
AND NOT REGEXP_CONTAINS(word, "[0-9]"))
WITH product as (SELECT "10oz Authentic Ham and Sausages from Spain" as product_name)
SELECT pre_process(product_name) as name
For a more robust solution, you might want to leverage a natural language processing package, for example NLTK in Python. You can even process words to use only the stem or find some synonyms to include in your search. Next, you can join the products table onto the trends data, selecting search terms that contain one of the words from the product name.
,UNNEST(pre_process(product_name)) as processed_name
,(SELECT DISTINCT term FROM `bigquery-public-data.google_trends.top_rising_terms`) as terms
WHERE processed_name in UNNEST(SPLIT(terms.term,' '))
It looks like `Spain vs Croatia` was recently trending because of the Euro Cup. This might be a great opportunity to create a new campaign and capitalize on momentum: “Spain beat Croatia and is on to the next round, show your support by celebrating with some authentic Spanish ham!”
Now going a bit further, if we take a look at the top rising search terms from yesterday (as of writing this on 6/30), we can see that there are a lot of names for people. But it’s unclear who these people are or why they’re trending. What we do know is we’re looking for a singer to strike up a brand deal with. More specifically, we have a great new jingle for our authentic ham and we’re looking for some trendy singers to bring attention to our company.
Using the Wikipedia Open API you can perform an open search for the term, for example “Jamie Lynn Spears”:
This gives you a JSON response that contains the name of first wikipedia page returned in the search, which you can then use to perform a query against the API:
From here you can grab the first sentence on the page (hint: this usually tells us if the person in question is a singer or not): “Jamie Lynn Marie Spears (born April 4, 1991) is an American actress and singer.”
Putting this together, we might create a Google Cloud function that selects new BigQuery search terms from the table, calls the wikipedia API for each of them, grabs that first sentence and searches for the word “singer.” If we have a hit, then we simply add the search term to the table. Check out some sample code here! Not only does this help us keep track of who the most trendy singers are, but we can use the historical scores to see how their influence has changed over time.
These queries, plus many more, can be used to make various business decisions. Aside from looking at product names, you might want to keep tabs on competitor names so that you can begin a competitive analysis against rising challengers in your industry. Or maybe you’re interested in a brand deal with a sports player instead of a singer, so you want to make sure you’re aware of any rising stars in the athletic world. Either way you probably want to be notified when new trends might influence your decision making.
With another Google Cloud Function, you can programmatically run any interesting SQL queries and return the results in an email. With Cloud Scheduler, you can make sure the function runs each morning, so you stay alert as new trends data is added to the public dataset. Check out the details on how to implement this solution here.
Ready to get started?
You can explore the new Google Trends dataset in your own project, or if you’re new to BigQuery spin up a project using the BigQuery sandbox. The trends data, along with all the other Google Cloud Public Datasets, will be available in Analytics Hub - so make sure to sign up for the preview, which is scheduled to be available in the third quarter of 2021, by going to cloud.google.com/analytics-hub.