ShareThis Drives Insights for Publishers and Advertisers with Google BigQuery
ShareThis makes social data actionable. Advertisers work with ShareThis to reach relevant audiences with highly targeted messaging across mobile and web. Publishers benefit because ShareThis drives traffic and revenue to their sites, while advertisers benefit not only through targeted placement of their marketing messages, but also through insight into customers’ interests across the Internet. Both publishers and advertisers can use the ShareThis platform to understand customers and engage them more effectively.
The company gathers about a billion social data points each day, adding the information to its database and making it available for analysis using its Insights data analysis portal. Through Insights, ShareThis adds a layer of intelligence that helps marketers understand their customers, spot patterns and plan campaigns.
The company has been gathering social data since 2007 – collecting about two terabytes per day. Because of the large amount of information stored, queries by publishers and advertisers across that aggregate data were too slow, sometimes taking two or three days to code and execute – and some types of analysis couldn’t be run at all. As the amount of data gathered by ShareThis increased, the company needed to rewrite its Insights portal to be faster and more efficient.
A separate challenge for the 90-person company was to lower the operational cost of the back-end services driving Insights, which were running on four separate Hadoop map/reduce clusters hosted by Amazon Web Services, with more than 160 servers in those clusters. Depending on workload, three to six engineers were focused on maintaining those clusters. As it gained both publishers and advertisers, ShareThis could only see the size, complexity, and cost of its Hadoop clusters growing – and meanwhile, the existing systems were falling behind in handling customer queries.
ShareThis chose to migrate to Google Cloud Storage to host its data, and to Google BigQuery to power the analytics for its Insights software, which publishers and advertisers use to understand their customers. “We went through an architectural process where we investigated different querying engines, including Amazon Redshift and Apache Spark,” explained Ishika Paul, engineering lead at ShareThis. “When considering cost, responsiveness and functionality, BigQuery was the best choice. Insight analysis took 20+ machines using alternate solutions, but worked flawlessly on BigQuery.”
“We are no longer concerned about data growing, because Google handles the scalability,” she continued, adding that the ShareThis social data uses a complex nested object structure, and it was critical that the new solution be able to accommodate that structure: “We found that BigQuery handles that structure quite easily without flattening the data.”
Not only does BigQuery handle the data, but the new Insights platform has significantly more capacity. “Queries can run concurrently because it’s a managed service,” Paul said.
The rewritten Insights software, based on Google Cloud Platform, is expected to save ShareThis 50-60 percent in operational costs over the version hosted by Amazon Web Services. Part of the savings can be attributed to lower fixed costs of the service compared to AWS. Another factor: the staff who previously maintained the ShareThis map/reduce clusters in AWS are now assigned to more strategic tasks that push the business forward, such as software development.
The BigQuery-powered Insights can run more customer queries, and run them faster: simple requests that took minutes now take seconds, and very complex queries that took days now take only minutes. “Because of the SQL-like interface of BigQuery, we can also enable non-technical users to get data themselves,” said Paul. “Before, we’d have to populate a custom cluster based on date ranges, for example, but now marketing analysts can get the information they need without engineering intervention.”
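To make the date-range example concrete, here is a minimal sketch of the kind of query an analyst might run against nested data without flattening it first. It is illustrative only: the table name, the `event_date` column, and the repeated `shares` record with a `channel` field are hypothetical and do not describe the actual ShareThis schema. The SQL is written in BigQuery's current standard SQL dialect, which may differ from what ShareThis used. The helper only builds the query text; it does not call BigQuery.

```python
from datetime import date


def build_share_count_query(table: str, start: date, end: date, limit: int = 10) -> str:
    """Build a BigQuery standard SQL query counting shares per channel.

    Hypothetical schema (an assumption, not the ShareThis schema):
      event_date  DATE
      shares      repeated record with a `channel` field
    """
    if start > end:
        raise ValueError("start must not be after end")
    if limit < 1:
        raise ValueError("limit must be a positive integer")

    # The table name is interpolated directly, so it must come from
    # trusted configuration, never from end-user input.
    return (
        "SELECT s.channel AS channel, COUNT(*) AS share_count\n"
        # UNNEST reads the nested, repeated field in place; no separate
        # flattened copy of the data is required.
        f"FROM `{table}`, UNNEST(shares) AS s\n"
        # The date range is a filter in the query rather than a reason
        # to populate a dedicated cluster.
        f"WHERE event_date BETWEEN DATE '{start.isoformat()}' "
        f"AND DATE '{end.isoformat()}'\n"
        "GROUP BY channel\n"
        "ORDER BY share_count DESC\n"
        f"LIMIT {int(limit)}"
    )


if __name__ == "__main__":
    # Hypothetical table name, used only for illustration.
    print(build_share_count_query(
        "my-project.social.events", date(2015, 1, 1), date(2015, 1, 31)
    ))
```

The resulting string could be pasted into the BigQuery console or passed to a client library. Changing the date range means changing two arguments instead of provisioning new infrastructure.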
The new Insights software also helps ShareThis with its own customer acquisition and onboarding, for example by making it possible to run real-time queries on a laptop during a sales meeting. With the old platform, setting up some of those queries might have taken days. With the new platform, it takes seconds.
Previously, ShareThis required several software development cycles to provide live-data examples of what customers would gain from Insights. Now, the company’s on-staff analysts can build BigQuery-based reports in a couple of hours and show those reports to new customers. ShareThis can also take the same query the analyst ran and productize it.
With Google Cloud Platform, Paul said, “We’re not simply building products faster. We’re building the right products faster.”