ThinkData Works: Solving the data variety problem with fast, scalable processing

About ThinkData Works

ThinkData Works is the Toronto-based creator of Namara, a data management platform that enables businesses to access, manage, enhance, and integrate data to develop new products and gain insight. Built for data professionals, Namara is a data refinery for the enterprise, designed to let organizations and individuals access high-value data in standard formats.

Industries: Technology
Location: Canada

Seeking a modern cloud solution for its public-facing open data platform and new customer deployments, ThinkData Works moved to Google Cloud Platform to optimize performance, costs, and manageability.

Google Cloud Results

  • Gives customers faster, more reliable access to more than 250,000 datasets
  • Reduces hosting costs by 50%, directing more resources to product development
  • Enables 10x faster networking and infrastructure changes for greater agility
  • Simplifies deployments and data warehousing for customers, scaling to meet their needs
  • Improves security, collaboration, resource management, and billing

Up to 3x savings in DevOps resources

Data variety sounds like a good thing, but for most data science teams, having too much data from too many sources can be a substantial barrier to results. Despite the widespread availability of open datasets from government institutions, many organizations still struggle to source, cleanse, and prepare that data so it is ready for analysis. Many data scientists spend as much as 80 percent of their time finding and prepping data, leaving only one day a week for actual data science.

ThinkData Works (ThinkData) is helping to solve this problem by offering solutions that make open data, third-party data, and internal datasets accessible to the people who need them. It is one of the first companies in Canada to unlock the nation's data for some of its biggest corporations, and it also serves companies in the United States and 45 other countries.

ThinkData's public-facing data platform, called Namara, provides easy access to more than 250,000 open datasets from thousands of sources through one common API, as well as premium datasets from partner organizations such as Geotab. ThinkData also offers data management solutions that enable customers to search and standardize their own internal data for faster insights.

As a startup, ThinkData wanted to host Namara on a cost-effective public cloud platform that would free the company to spend more time on development and innovation rather than on operational tasks. It also wanted to give enterprise customers an easily deployable solution that could scale with their growing data needs. After trying various cloud and managed hosting providers, ThinkData began looking for a modern cloud that could offer greater agility, more personal interactions, and a better selection of managed services. To that end, ThinkData joined the Google Cloud for Startups program, receiving mentorship, training, and $100,000 in Google Cloud Platform (GCP) credits.

"Google Cloud Platform just felt like a modern cloud," says Brendan Stennett, Chief Technology Officer at ThinkData Works. "Google also gave us a lot of valuable support, allowing us to get up and running quickly without consultants. It was such a great experience that we decided to deploy all our customers on Google Cloud Platform."

Working smarter with managed services

ThinkData began by moving its stateless, container-based services to Google Kubernetes Engine (GKE), a reliable managed service for Kubernetes. Database virtual machines run on Compute Engine, and Cloud Storage provides object storage. ThinkData's fast-growing data pipeline is built on Cloud Dataproc, which automatically handles Hadoop cluster creation, management, monitoring, and job orchestration. Soon, ThinkData plans to retire its Compose databases in favor of PostgreSQL databases running on Cloud SQL, another GCP managed service, to simplify scaling and make sure that all 250,000 datasets are always up to date.

"One of the most attractive aspects of Google Cloud is the rich variety of managed services for Kubernetes, databases, and data processing," says Brendan. "Suddenly, managing infrastructure wasn't a full-time job anymore. We could do a lot more with a lot less people, saving time and money."

With ThinkData's previous hosting provider, setting up a production-ready Kubernetes deployment took two weeks. With GKE, it now takes less than two hours. Making changes to firewalls and other GCP infrastructure components is likewise streamlined due to software-defined networking and workflows.

"Everything is just easier on Google Cloud," says Brendan. "We can make changes 10x faster and move on. The way Google Cloud Platform resources are organized and isolated into projects is also really convenient, improving resource management and security while simplifying our billing."

Giving customers more options

ThinkData is constantly updating its datasets to give customers the most current information available. Data is ingested either directly into Cloud Dataproc or into Vertica Analytics Platform (deployed on GCP Marketplace), giving customers fast, reliable access to the data they need. ThinkData also plans to make its own solution available on GCP Marketplace to give customers an even faster deployment option.

"Deploying on GCP Marketplace will put our product into the hands of users faster than ever before, fully deployed and integrated into their existing cloud infrastructure," says Brendan.

Soon, ThinkData will give customers the option of using BigQuery as a cloud-based data warehouse, offering them a convenient, pay-per-use model to quickly analyze large datasets.

"BigQuery will streamline data warehousing for certain customers while giving them high performance and elastic scalability," says Brendan. "It's another good example of how Google Cloud simplifies just about everything we do."

Bringing it all together

Since the company was founded, ThinkData has used G Suite to keep employees connected: using Gmail and Calendar to schedule Hangouts Meet video calls, collaborating in real time on Docs, Sheets, and Slides, and sharing files on Drive.

"G Suite brings everything together for us," says Brendan. "Nothing else compares. Plus, we get the added benefit of not managing separate licenses for other pieces of software that Google already has a better solution for."

More Dev, less Ops

In addition to giving ThinkData greater operational agility and the ability to offer customers new service options, moving to GCP brought financial benefits through consolidation and optimization.

"Since moving to Google Cloud, we reduced our hosting costs by 50 percent, allowing us to direct more of our capital towards product development," says Brendan. "We're also saving 2x to 3x in DevOps resources. If we hadn't switched to Google Cloud Platform when we did, we wouldn't be nearly as far along."
