
Twitter: gaining insights from Tweets with an API for Google Cloud


Editor’s note: Although Twitter has long been considered a treasure trove of data, the task of analyzing Tweets in order to understand what’s happening in the world, what people are talking about right now, and how this information can support business use cases has historically been highly technical and time-consuming. Not anymore. Twitter recently launched an API toolkit for Google Cloud which helps developers to harness insights from Tweets, at scale, within minutes. This blog is based on a conversation with the Twitter team who’ve made this possible. The authors would like to thank Prasanna Selvaraj and Nikki Golding from Twitter for contributions to this blog. 


Businesses and brands consistently monitor Twitter for a variety of reasons: from tracking the latest consumer trends and analyzing competitors, to staying ahead of breaking news and responding to customer service requests. With 229 million monetizable daily active users, it’s no wonder companies, small and large, consider Twitter a treasure trove of data with huge potential to support business intelligence. 

But language is complex, and the journey towards transforming social media conversations into insightful data first involves processing large numbers of Tweets by organizing, sorting, and filtering them. Crucial to this process are Twitter APIs: a set of programmatic endpoints that allow developers to find, retrieve, and engage with real-time public conversations happening on the platform. 

In this blog, we learn from the Twitter Developer Platform Solutions Architecture team about the Twitter API toolkit for Google Cloud, a new framework for quickly ingesting, processing, and analyzing high volumes of Tweets to help developers harness the power of Twitter. 

Making it easier for developers to surface valuable insights from Tweets 

Two versions of the toolkit are currently available: The Twitter API Toolkit for Google Cloud Filtered Stream and the Twitter API Toolkit for Google Cloud Recent Search.

The Twitter API Toolkit for Google Cloud Filtered Stream supports developers with a trend detection framework that can be installed on Google Cloud in 60 minutes or less. It automates the data pipeline process to ingest Tweets into Google Cloud, and offers visualization of trends in an easy-to-use dashboard that illustrates real-time trends for configured rules as they unfold on Twitter. This tool can be used to detect macro- and micro-level trends across domains and industry verticals, and it can scale horizontally to process millions of Tweets per day. 
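The "configured rules" are the filters that decide which Tweets reach the stream. In Twitter API v2 they are registered by POSTing an `add` list of `value`/`tag` pairs to the filtered stream rules endpoint. The sketch below only builds that request with the standard library; it assumes a valid bearer token, and the helper names and the example `crypto`/`weather` rules are hypothetical illustrations, not the toolkit's own configuration.

```python
from __future__ import annotations

import json
import urllib.request

# Twitter API v2 endpoint for managing filtered stream rules.
RULES_URL = "https://api.twitter.com/2/tweets/search/stream/rules"


def build_rules_payload(rules: dict[str, str]) -> dict:
    """Turn a {tag: rule_value} mapping into an 'add rules' request body."""
    return {"add": [{"value": value, "tag": tag} for tag, value in rules.items()]}


def make_rules_request(payload: dict, bearer_token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that registers the rules."""
    return urllib.request.Request(
        RULES_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {bearer_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical rules: English, non-retweet Tweets on two topics.
    payload = build_rules_payload({
        "crypto": "(bitcoin OR ethereum) lang:en -is:retweet",
        "weather": "(storm OR hurricane) lang:en -is:retweet",
    })
    print(json.dumps(payload, indent=2))
    # urllib.request.urlopen(make_rules_request(payload, "YOUR_TOKEN")) would send it.
```

Each `tag` travels with every matching Tweet on the stream, which is what lets a dashboard group trends per rule.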

“Detecting trends from Twitter requires listening to real-time Twitter APIs and processing Tweets on the fly,” explains Prasanna Selvaraj, Solutions Architect at Twitter and author of this toolkit. “And while trend detection can be complex work, in order to categorize trends, tweet themes and topics must also be identified. This is another complex endeavor as it involves integrating with NER (Named Entity Recognition) and/or NLP (Natural Language Processing) services. This toolkit helps solve these challenges.”
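The toolkit's actual trend-detection logic isn't described here, so the following is only a generic illustration of what processing Tweets on the fly can look like. It is a minimal sketch that assumes each streamed Tweet arrives as a v2 filtered stream payload carrying `data.created_at` (returned only when requested through `tweet.fields`) and a `matching_rules` list. It counts Tweets per rule tag per minute and flags a minute as a spike when it reaches a hypothetical threshold of three times the average of the preceding five minutes. The function names and every threshold are illustrative assumptions, not part of the toolkit.

```python
from collections import Counter, defaultdict
from datetime import datetime


def bucket_counts(tweets, bucket_seconds=60):
    """Count streamed Tweets per rule tag per fixed-size time bucket."""
    counts = defaultdict(Counter)
    for tweet in tweets:
        # v2 timestamps end in "Z"; swap it for an offset older Pythons can parse.
        created = tweet["data"]["created_at"].replace("Z", "+00:00")
        bucket = int(datetime.fromisoformat(created).timestamp()) // bucket_seconds
        for rule in tweet.get("matching_rules", []):
            counts[rule["tag"]][bucket] += 1
    return counts


def detect_spikes(counts, window=5, factor=3.0, min_count=10):
    """Flag buckets whose count is at least `factor` times the mean of the
    previous `window` buckets (and at least `min_count`). Hypothetical heuristic."""
    spikes = []
    for tag, per_bucket in counts.items():
        first, last = min(per_bucket), max(per_bucket)
        # Fill gaps with zero so quiet minutes still count toward the baseline.
        series = [per_bucket[b] for b in range(first, last + 1)]
        for i in range(window, len(series)):
            baseline = sum(series[i - window:i]) / window
            current = series[i]
            if current >= min_count and current >= factor * max(baseline, 1.0):
                spikes.append((tag, first + i, current))
    return spikes
```

A production pipeline would keep these counters in a rolling window rather than recomputing over the full history, and would pair spike detection with the entity and topic extraction the quote mentions.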

Meanwhile, the Twitter API Toolkit for Google Cloud Recent Search returns Tweets from the last seven days that match a specific search query. “Anyone with 30 minutes to spare can learn the basics about this Twitter API and, as a side benefit, also learn about Google Cloud Analytics and the foundations of data science,” says Prasanna. 
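Under the hood, Recent Search is a single authenticated GET to the v2 `tweets/search/recent` endpoint, with the search expression in the `query` parameter and `max_results` between 10 and 100 per page. The sketch below assumes a valid bearer token; the helper names and the chosen `tweet.fields` are illustrative assumptions rather than what the toolkit itself sends.

```python
import json
import urllib.parse
import urllib.request

# Twitter API v2 Recent Search endpoint (Tweets from the last seven days).
SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"


def build_search_request(query, bearer_token, max_results=10,
                         tweet_fields=("created_at", "public_metrics")):
    """Build (but do not send) a Recent Search request."""
    if not 10 <= max_results <= 100:
        raise ValueError("max_results must be between 10 and 100")
    params = {
        "query": query,
        "max_results": max_results,
        # Extra Tweet fields to return alongside the default id and text.
        "tweet.fields": ",".join(tweet_fields),
    }
    url = f"{SEARCH_URL}?{urllib.parse.urlencode(params)}"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {bearer_token}"})


def recent_search(query, bearer_token, **kwargs):
    """Send the request and return the decoded JSON (needs network access)."""
    with urllib.request.urlopen(build_search_request(query, bearer_token, **kwargs)) as resp:
        return json.load(resp)
```

For example, `recent_search("google cloud -is:retweet", token, max_results=50)` would return a JSON object whose `data` list holds the matching Tweets; larger result sets are paged with the `next_token` value from the response's `meta` object.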

The toolkits leverage Twitter’s new API v2 (Recent Search & Filtered Stream) and use BigQuery for Tweet storage, Data Studio for business intelligence and visualizations, and App Engine for the data pipeline on Google Cloud Platform. 

“We needed a solution that is not only serverless but also can support multi-cardinality, because all Twitter APIs that return Tweets provide data encoded using JavaScript Object Notation (JSON). This has a complex structure, and we needed a database that can easily translate it into its own schema. BigQuery is the perfect solution for this,” says Prasanna. “Once in BigQuery, one can visualize that data in under 10 minutes with Data Studio, be it in a graphic, spreadsheet, or Tableau form. This eliminates friction in Twitter data API consumption and significantly improves the developer experience.” 
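The point about JSON structure is easiest to see with BigQuery's nested (`RECORD`) and repeated (`REPEATED`) columns, which can hold a Tweet's hashtag list and metrics object directly instead of splitting them into extra tables. The sketch below assumes a v2 response with `entities.hashtags` and `public_metrics` requested; the schema, column choices, and helper names are hypothetical, not the toolkit's actual table layout. The resulting rows could then be loaded with the BigQuery client library's `Client.insert_rows_json`.

```python
# Hypothetical BigQuery table schema in the JSON schema-file format.
SCHEMA = [
    {"name": "id", "type": "STRING", "mode": "REQUIRED"},
    {"name": "text", "type": "STRING"},
    {"name": "created_at", "type": "TIMESTAMP"},
    {"name": "author_id", "type": "STRING"},
    {"name": "hashtags", "type": "STRING", "mode": "REPEATED"},
    {"name": "public_metrics", "type": "RECORD", "fields": [
        {"name": "retweet_count", "type": "INTEGER"},
        {"name": "like_count", "type": "INTEGER"},
        {"name": "reply_count", "type": "INTEGER"},
    ]},
]


def tweet_to_row(tweet):
    """Map one v2 Tweet object onto the schema above, keeping nesting intact."""
    metrics = tweet.get("public_metrics", {})
    return {
        "id": tweet["id"],
        "text": tweet.get("text"),
        "created_at": tweet.get("created_at"),
        "author_id": tweet.get("author_id"),
        # Repeated column: a plain list of hashtag strings.
        "hashtags": [h["tag"] for h in tweet.get("entities", {}).get("hashtags", [])],
        # Record column: a nested object with its own typed fields.
        "public_metrics": {
            "retweet_count": metrics.get("retweet_count", 0),
            "like_count": metrics.get("like_count", 0),
            "reply_count": metrics.get("reply_count", 0),
        },
    }


def response_to_rows(response):
    """Convert a v2 API response body into a list of insertable rows."""
    return [tweet_to_row(t) for t in response.get("data", [])]
```

Keeping hashtags as a repeated column means a query can `UNNEST(hashtags)` to count topics without any join, which is the kind of schema translation the quote credits BigQuery with.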

Accelerating time to value from 60 hours to 60 minutes

Historically, Twitter API developers have often grappled with processing, analyzing, and visualizing high volumes of Tweets to derive insights from Twitter data. They’ve had to build data pipelines, select storage solutions, and choose analytics and visualization tools before they could even start validating the value of Twitter data. 

“The whole process of choosing technologies and building data pipelines to look for insights that can support a business use case can take more than 60 hours of a developer’s time,” explains Prasanna. “And after investing that time in setting up the stack they still need to sort through the data to see if what they are looking for actually exists.”

Now, the toolkit enables data processing automation at the click of a button because it provisions the underlying infrastructure it needs to work, such as BigQuery as a database and the compute layer with App Engine. This enables developers to install, configure, and visualize Tweets in a business intelligence tool using Data Studio in less than 60 minutes.

“While we have partners who are very well equipped to connect, consume, store, and analyze data, we also collaborate with developers from organizations who don’t have a myriad of resources to work with. This toolkit is aimed at helping them to rapidly prototype and realize value from Tweets before making a commitment,” explains Nikki Golding, Head of Solutions Architecture at Twitter.

Continuing to build what’s next for developers

As they collaborated with Google Cloud to bring the toolkit to life, the Twitter team started to think about what public datasets exist within the Google Cloud Platform and how they can complement some of the topics that Twitter has a lot of conversations about, from crypto to weather. “We thought, what are some interesting ways developers can access and leverage what both platforms have to offer?” shares Nikki. “Twitter data on its own has high value, but there’s also data that is resident in Google Cloud Platform that can further support users of the toolkit. The combination of Google Cloud Platform infrastructure and application as a service with Twitter’s data as a service is the vision we’re marching towards.”

Next, the Twitter team aims to place these data analytics tools in the hands of any decision-maker, in both technical and non-technical teams. “To help brands visualize, slice, and dice data on their own, we’re looking at self-serve tools that are tailored to the non-technical person to democratize the value of data across organizations,” explains Nikki. “Google Cloud was the platform that allowed us to build the easiest low-code solution relative to others in the market so far, so our aim is to continue collaborating with Google Cloud to eventually launch a no-code solution that helps people to find the content and information they need without depending on developers. Watch this space!”