How ShareChat built scalable data-driven social media with Google Cloud
Co-founder and CTO at ShareChat
Editor’s note: Today’s guest post comes from Indian social media platform ShareChat. Here’s the story of how they improved performance, app development, and analytics for serving regional content to millions of users using Google Cloud.
How do you create a social network when your country has 22 major official languages and countless active regional dialects? At ShareChat, we serve more than 160 million monthly active users who share and view videos, images, GIFs, songs, and more in 15 different Indian languages. We also launched a short video platform in 2020, Moj, which already supports over 80 million monthly active users.
Connecting with people in the language they understand
As mobile data and smartphones have become more affordable in India, we noticed a large new segment of people, many in rural areas, being welcomed onto the internet. However, many of them didn’t speak English, and when it comes to accessing content and information, language plays a significant role. Instead of joining other social media sites where English reigned supreme, new internet users chose language- or dialect-specific WhatsApp groups where they felt more comfortable.
So, we set out to build a platform where people can share their opinions, document their lives, and make new friends, all in their native language. ShareChat simplifies content and people discovery by using a personalized content newsfeed to deliver language-specific content to the right audience.
Given the high volume of data, content, and traffic, we rely heavily on our IT infrastructure. On top of that, a large number of our users are on 2G networks when they post, like, view, or follow each other. Our platform needs to deliver a great experience to people spread across the country, on every kind of network, without any reduction in performance.
The right cloud partner to support future growth
ShareChat was born in the cloud—we already knew how to scale systems to serve a large customer base with our existing cloud provider. But like many companies, we struggled with over-provisioning compute and storage to accommodate unpredictable traffic and avoid running out of storage. With demand rising for local language content and an increase in online interactions in response to the COVID-19 crisis, we realized that we would need a more efficient way to scale dynamically and allocate resources as needed.
Google Cloud was a natural choice for us. We wanted to partner with a technology-first company that would make it easy (and cost-effective) to manage a strong portfolio of services and build whatever we wanted. Google is at the forefront of technology innovation and provided everything we needed to build, run, and manage our applications, including an efficient DevOps pipeline for fixing issues and releasing new features quickly.
We had a few issues in mind at the start of discussions with the Google Cloud team, but over time, as we got information and support from them, we realized that these were the partners we wanted in our corner when it came time to tackle our most challenging problems. In the end, we decided to take our entire infrastructure to Google Cloud.
To support millions of users, we deploy and scale using Google Kubernetes Engine. We analyze our data using a combination of managed data cloud services: Pub/Sub for data pipelines, BigQuery for analytics, Cloud Spanner for real-time app serving workloads, and Cloud Bigtable for less-indexed databases. We also rely on Cloud CDN to deliver high-quality content to our users reliably and at low latency.
We now use just half the total core consumption of our legacy environment to run ShareChat’s existing workloads.
Google Cloud delivers better outcomes at every level
By moving to Google Cloud, we saw major benefits in several key areas:
Zero-downtime migration for users
At the time of migration, we had over 70 terabytes of data, consisting of 220 tables—some of which were up to 14 terabytes with nearly 50 billion rows. Due to our data’s interdependencies, moving services over one at a time wasn’t an option for us.
Even though we were migrating such large volumes of data, we didn’t want to impact any of our customers. Latency spikes or out-of-sync data could affect message delivery; for instance, if a message or notification was delayed, the poor experience might cause someone to abandon ShareChat.
To prepare for the move, we ran a proof-of-concept cluster for over four months to test database performance in a real-world scenario for handling more than a million queries per second. Using an open-source API gateway, we replicated our legacy data environment into Google Cloud for performance testing and capacity analysis. As soon as we were confident Google Cloud could handle the same traffic as our previous cloud environment, we were ready to execute.
Using wrappers, we were able to migrate without having to change anything in our existing application code. The entire migration of 60 million users to Google Cloud took five hours—without any data loss or downtime. Today, ShareChat has grown to 160 million users, and Google Cloud continues to give us the support we need.
Scaling globally to meet unexpected demand
Real-time data drives everything on ShareChat: we track every event in the app, from messages and new groups to the content people like and who they follow. Our users create more than a million posts per day, so it’s critical that our systems can process massive amounts of data efficiently.
We chose to migrate to Spanner for its global consistency and secondary indexes. Unlike with our legacy NoSQL database, we could scale without rethinking existing tables or schema definitions, and keep our data systems in sync across multiple locations. It’s also cost-effective: moving over 120 tables with 17 indexes into Cloud Spanner reduced our costs by 30%.
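The value of a secondary index can be sketched in plain Python (the table and field names below are hypothetical, purely for illustration; in Spanner the index is declared in the schema and maintained server-side):

```python
# Hypothetical "posts" rows: without a secondary index on author_id,
# finding one user's posts means scanning every row.
posts = [
    {"post_id": 1, "author_id": "u42", "lang": "hi"},
    {"post_id": 2, "author_id": "u7",  "lang": "ta"},
    {"post_id": 3, "author_id": "u42", "lang": "hi"},
]

# Full scan: cost grows with the number of rows.
scan_hits = [p["post_id"] for p in posts if p["author_id"] == "u42"]

# Secondary index: author_id -> post_ids, built once, cheap lookups after.
index = {}
for p in posts:
    index.setdefault(p["author_id"], []).append(p["post_id"])

indexed_hits = index.get("u42", [])
print(indexed_hits)  # [1, 3] — same result, no scan
```

Both paths return the same rows; the index simply avoids touching rows that can’t match, which is what makes indexed queries fast at billions of rows.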
Spanner also replicates data seamlessly in multiple locations in real time, enabling us to retrieve documents if one region fails. For instance, when our traffic unexpectedly grew by 500% over just a few days, we were able to scale horizontally with zero lines of code change. We were also launching our Moj video app simultaneously, and we were able to move it to another region without a single issue.
Simplifying development and deployment
On average, we handle about 80,000 requests per second (RPS), which works out to nearly 7 billion requests per day. Push notifications sent to the entire user base about daily trending topics can spike traffic to 130,000 RPS within a few seconds.
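The figures above work out as a simple back-of-the-envelope check (illustrative arithmetic, not production telemetry):

```python
baseline_rps = 80_000
seconds_per_day = 24 * 60 * 60           # 86,400 seconds
requests_per_day = baseline_rps * seconds_per_day
print(f"{requests_per_day:,}")           # 6,912,000,000 — roughly 7 billion/day

# A notification-driven spike to 130k RPS is ~1.6x the steady baseline.
spike_rps = 130_000
print(spike_rps / baseline_rps)
```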
Instead of over-provisioning, Google Kubernetes Engine (GKE) enables us to pre-scale for traffic spikes around scheduled events, such as holidays like Diwali, when millions of Indians send each other greetings.
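Kubernetes' Horizontal Pod Autoscaler sizes a workload with the rule desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric); pre-scaling for a known event like Diwali amounts to raising the minimum replica count ahead of time instead of waiting for this rule to react. A minimal sketch of the formula (the numbers are illustrative, not ShareChat's real configuration):

```python
import math

def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float) -> int:
    """Kubernetes HPA scaling rule: ceil(current * metric / target)."""
    return math.ceil(current_replicas * current_metric / target_metric)

# Per-pod load jumps from the 800-RPS target to 1,300 RPS during a spike:
print(desired_replicas(100, current_metric=1300, target_metric=800))  # 163 pods
```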
Migrating to GKE has also enabled us to adopt more agile ways of working, such as automating deployments and reducing the time we spend writing scripts. Even though we were already using container-based solutions, they lacked transparency and coverage across the entire deployment pipeline.
Kubernetes features such as sidecar containers allow us to attach peripheral tasks like logging to an application without making code changes. Kubernetes upgrades are managed by default, so we don’t have to worry about maintenance and can stay focused on more valuable work. Clusters and nodes automatically upgrade to the latest version, minimizing security risks and ensuring we always have access to the latest features.
Low latency and real-time ML predictions
Even though many of our users may be accessing ShareChat outside of metropolitan areas, it doesn’t mean they’re more patient if the app loads slowly or their messages are delayed. We strive to deliver a high-performance experience, regardless of where our users are.
We use Cloud CDN to cache data in five Google Cloud Point of Presence (PoP) locations at the edge in India, allowing us to bring content as close as possible to people and speeding up load times. Since moving to Cloud CDN, our cache hit ratio has improved from 90% to 98.5%, meaning the cache now serves 98.5% of content requests directly from the edge.
As we expand globally, we’d like to use machine learning to reach new people with content in different languages. We want to build new algorithms to process real-time datasets in regional languages and accurately predict what people want to see. Google Cloud gives us an infrastructure optimized to handle compute-intensive workloads that will be useful to us both now—and in the future.
The confidence to build the best platform
Our current system now performs better than before we migrated, but we are continuously building new features on top of it. Google’s data cloud has provided us with an elegant ecosystem of services that allows us to build whatever we want, more easily and faster than ever before.
Perhaps the biggest advantage of partnering with Google Cloud has been the connection we have with the engineers at Google. If we’re working on a specific problem and find a solution in a library or a piece of code, we can immediately connect with the team responsible for it.
As a result, we have experienced a massive boost in our confidence. We know that we can build a really good system because we not only have a good process in place to solve problems—we have the right support behind us.