
QuillBot cuts writing time for over 10 million users using Google Cloud

September 30, 2021
Chelsie Czop

Product Manager, Google Cloud Compute

David Silin

Co-founder and Chief Science Officer at QuillBot

QuillBot, founded in Chicago in 2017 and recently acquired by coursehero.com, is a natural language processing (NLP) company. Its platform of tools leverages state-of-the-art NLP to assist users with paraphrasing, summarization, grammar checking, and citation generation. With over ten million monthly active users around the world, QuillBot continues to improve writers' efficiency and productivity on a global scale.

QuillBot differentiates itself from competitors through the breadth of its writing services. Beyond grammar and spelling checks, it offers paraphrasing that preserves a text's meaning while, for example, shifting its tone from formal to informal or making it longer or shorter. Together, these complementary tools let users complete their writing workflows end to end.

Google Cloud has been QuillBot's infrastructure partner from the start. The two companies' interests converged: QuillBot constantly pushes the envelope of state-of-the-art natural language processing, with computing demands that test the limits of hardware, so it was drawn to Google's roadmap for hardware that can support ever-larger AI models.

SADA, a Google Cloud Premier Partner and 2020 North America Partner of the Year, provided key insights to QuillBot on Google Cloud's advanced GPU instance types and on optimizing its complex machine learning (ML) workloads. QuillBot also leverages SADA's consulting and Technical Account Management services to plan new releases and scale effectively for growth.

Preparing for huge growth in capacity without a huge increase in costs

QuillBot experiments with new artificial intelligence models whose compute requirements grow exponentially, and traffic can spike in an instant, potentially choking the infrastructure. Building spare capacity in-house, or even purchasing on-demand computing capacity sized for peak demand, would have been prohibitively expensive.

Instead, QuillBot needed the flexibility to deploy and pay for infrastructure proportionate to usage without changing purchase plans mid-course. "From an economics perspective, we needed our cloud computing partner to have the hardware capacity to scale as much as 100X and remain stable without a proportionate increase in costs," Rohan Gupta, CEO and Co-founder of QuillBot, stated.

QuillBot's entrepreneurial team needed to make ML easy to understand and execute. At the time, they were not using Terraform, nor did they have a DevOps engineer to simplify and automate model development and testing. "Our priority was to keep the approach simple and avoid downtime when we upgraded our models, to ensure a seamless deployment. Our past migration efforts, such as our multi-cloud deployment, were fraught with the risk of a painful transition," revealed David Silin, Co-founder and Chief Science Officer at QuillBot.

Google Cloud Compute Engine solutions support scalability

Google Cloud uses the latest hardware to scale to millions of users and meet the computing needs of the state-of-the-art AI models that QuillBot builds and deploys. It also has sufficient redundant capacity to distribute computing loads when traffic spikes.

"Google Cloud’s user interface blew us away. It was unbelievably superior to other cloud providers. We used virtual machine (VM) instance groups and easily distributed load across them with Pub/Sub," David Silin gushed.

QuillBot uses Google Cloud's A2 VMs with NVIDIA A100 Tensor Core GPUs for model training and inference, and N1 VMs with NVIDIA T4 Tensor Core GPUs for serving. With A2 VMs, Google remains the only public cloud provider to offer up to 16 NVIDIA A100 GPUs in a single VM, making it possible to train the largest AI models used for high-performance computing. In addition, users can start with one NVIDIA A100 GPU and scale to 16 GPUs without configuring multiple VMs for single-node ML training. A single VM delivers effective performance of up to 10 petaFLOPS of FP16 or 20 petaOPS of INT8 when using NVIDIA A100 GPUs with the sparsity feature. Containerized, pre-configured software shortens the lead time for running on Compute Engine A100 instances, making seamless scaling possible.
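
For a concrete sense of what provisioning such a machine can look like, here is a hedged sketch using the google-cloud-compute Python client to request an a2-megagpu-16g VM, the machine type that carries 16 A100s. The project, zone, and boot image are assumptions for illustration, not QuillBot's actual configuration.

    # Sketch: create a 16x A100 training VM with the google-cloud-compute client.
    # Project, zone, and boot image are assumptions made for illustration.
    from google.cloud import compute_v1

    PROJECT, ZONE = "quillbot-demo", "us-central1-a"  # hypothetical

    instance = compute_v1.Instance()
    instance.name = "a100-training-16g"
    # The a2-megagpu-16g machine type includes 16 NVIDIA A100 GPUs implicitly.
    instance.machine_type = f"zones/{ZONE}/machineTypes/a2-megagpu-16g"

    instance.disks = [
        compute_v1.AttachedDisk(
            boot=True,
            auto_delete=True,
            initialize_params=compute_v1.AttachedDiskInitializeParams(
                source_image="projects/debian-cloud/global/images/family/debian-11",
                disk_size_gb=200,
            ),
        )
    ]
    instance.network_interfaces = [
        compute_v1.NetworkInterface(network="global/networks/default")
    ]
    # GPU VMs cannot live-migrate, so host maintenance must terminate them.
    instance.scheduling = compute_v1.Scheduling(
        on_host_maintenance="TERMINATE", automatic_restart=True
    )

    operation = compute_v1.InstancesClient().insert(
        project=PROJECT, zone=ZONE, instance_resource=instance
    )
    operation.result()  # wait for provisioning to finish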

Google Cloud also provides a choice of N1 VMs with NVIDIA T4 Tensor Core GPUs, in varying sizes and pricing plans, to help control costs. T4 GPUs support advanced networking of up to 100 Gbps, and they have a worldwide footprint, so users can choose capacity in individual regions based on market size. As a result, QuillBot has the flexibility to serve demand incrementally as it grows, using smaller, cheaper GPUs, and to install more than one in areas where T4 GPUs are available in proximity, preserving stability while keeping latencies low.
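
Unlike A2 machine types, N1 VMs attach GPUs explicitly. Continuing the hypothetical compute_v1 sketch above, only the machine type and an accelerator entry change:

    # Sketch: a T4 serving VM in a zone chosen close to users (names hypothetical).
    instance.machine_type = f"zones/{ZONE}/machineTypes/n1-standard-8"
    instance.guest_accelerators = [
        compute_v1.AcceleratorConfig(
            accelerator_count=1,  # start with one cheap T4 per VM
            accelerator_type=f"zones/{ZONE}/acceleratorTypes/nvidia-tesla-t4",
        )
    ]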

QuillBot implements best practices when rolling out GPUs

Users need to consider the trade-off between going directly to 16 NVIDIA GPUs or starting small and growing incrementally. "For the bigger models, it makes sense to go straight to 16. To be sure, it is not always easy to figure out how to optimize for that level of scaling," David Silin cautioned. "We experimented and learned that 16 works best for our core models."

Similarly, Silin noted, "Serving and distributing preemptible VMs across regions and in production was not something we did immediately." QuillBot leverages preemptible VMs primarily for their unit economics. Because preemptible instances can be shut down in a given region when capacity is full, distributing them across regions diversifies that risk and prevents all of them from going down at once.
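
That spreading pattern is easy to sketch with the same hypothetical compute_v1 setup as above: mark each VM preemptible and create one per zone across several regions.

    # Sketch: spread preemptible T4 serving VMs across regions (zones hypothetical).
    for zone in ["us-central1-a", "us-east1-c", "europe-west4-b"]:
        instance.name = f"t4-serve-{zone}"
        instance.machine_type = f"zones/{zone}/machineTypes/n1-standard-8"
        instance.guest_accelerators[0].accelerator_type = (
            f"zones/{zone}/acceleratorTypes/nvidia-tesla-t4"
        )
        # Preemptible VMs are cheap but reclaimable; they must not auto-restart.
        instance.scheduling = compute_v1.Scheduling(
            preemptible=True, on_host_maintenance="TERMINATE", automatic_restart=False
        )
        compute_v1.InstancesClient().insert(
            project=PROJECT, zone=zone, instance_resource=instance
        )

If one region reclaims its preemptible capacity, the remaining regions keep serving.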

Silin and team use Google Kubernetes Engine (GKE) to manage their NVIDIA GPUs on Google Cloud for model training and serving. This lightens the load of managing their platform, gives time back to their engineers, and helps realize cost savings from the resulting efficiencies.
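
The article doesn't detail QuillBot's GKE configuration; as a general sketch of the pattern, the snippet below uses the official kubernetes Python client to schedule an inference pod onto a T4 node pool. The container image name is hypothetical.

    # Sketch: run an inference pod on a GKE T4 node pool via the kubernetes
    # Python client. The container image name is hypothetical.
    from kubernetes import client, config

    config.load_kube_config()  # assumes kubectl already points at the cluster

    pod = client.V1Pod(
        metadata=client.V1ObjectMeta(name="t4-inference"),
        spec=client.V1PodSpec(
            restart_policy="Never",
            # GKE labels GPU nodes with their accelerator type.
            node_selector={"cloud.google.com/gke-accelerator": "nvidia-tesla-t4"},
            containers=[
                client.V1Container(
                    name="server",
                    image="gcr.io/quillbot-demo/serve:latest",  # hypothetical
                    resources=client.V1ResourceRequirements(
                        limits={"nvidia.com/gpu": "1"}  # one T4 per pod
                    ),
                )
            ],
        ),
    )
    client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)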

Scaling with Google Cloud is easy and saves on costs

QuillBot found the trade-offs of scaling with Google Cloud to be favorable: the downsides cost far less than the benefits deliver. "With Google Cloud, we can scale 4X and maintain the customer experience knowing that enough spare capacity is available without increasing costs disproportionately. As a result, we are comfortable trying larger models," Rohan Gupta said.

Because training and deployment run on an integrated stack on Google Cloud, over-provisioning capacity does not increase unit costs, and time-to-market is shorter.

"We can scale with ease with Google Cloud because the unit cost increase of GPU is lower than the speed of scaling as they increase from one to sixteen. The gains in the rate of training are 3X faster compared with only 2X higher unit cost," David Silin reported. "We grabbed NVIDIA A100 GPUs with 40 Gigabytes of memory as soon as we could, and we can't wait for what's next for Google Cloud GPU offering," he added.

This combination of scale and relatively low downside has proved an overwhelming advantage for QuillBot over its competitors. As a result, QuillBot has experienced hockey-stick growth that it expects to maintain. "We could afford freemium services to acquire customers very rapidly, because our unit costs are low. Both the A2 VM family and NVIDIA T4 GPUs on Google Cloud contribute to our business growth. A2 VMs enable us to build state-of-the-art technology in-house," Rohan Gupta explained.

QuillBot looks ahead to super-scaling

With the success QuillBot has experienced so far on Google Cloud, the company is planning its future growth on Google's competitive hardware. "Provisioning our clusters and scaling them is a big priority over the next three months; we will calibrate based on the traffic on our site," Rohan Gupta revealed.

"Our efficiencies will improve because we have a DevOps person on board. We expect cost savings from predictive auto-scaling implemented through Managed Instance Groups. We are also encouraged by our tests of Google Cloud against competitors that show it has better cold start times  than other clouds we tested—a key consideration for super-scaling," David Silin said.
