Tokopedia: Scaling to accommodate major shopping events with Google Kubernetes Engine

About Tokopedia

Tokopedia, one of Indonesia's leading tech companies with one of the country's largest marketplace platforms, serves 5.9 million merchants and 90 million active users every month. With 5% ecommerce penetration in Indonesia, there are still major opportunities and exciting challenges ahead for Tokopedia to further democratize commerce through technology for the 260 million people of Indonesia.

Industries: Retail & Consumer Goods
Location: Indonesia

Tokopedia modernized its Live Engagement Platform for an optimal retail experience during large-scale shopping events.

Google Cloud results

  • Enables Tokopedia Play to handle a 20x increase in traffic thanks to Google Cloud
  • Supports growth by optimally scaling up to 30x with automated provisioning
  • Reduces operating costs by 90% by migrating to Google Cloud

How do you prepare for an epic online shopping event that rivals Black Friday in the U.S. and Singles’ Day in China? According to Tahir Hashmi, Vice President of Engineering at Tokopedia, a successful delivery depends on tech support and a reliable network.

Capable of serving up to 1.5M concurrent Live Platform users on Google Cloud

In May 2018, Tokopedia launched Ramadan Ekstra, the first-ever online shopping festival in Indonesia. The event attracted over 332 million visits to the Tokopedia platform during the Muslim holy month of Ramadan. Ramadan Ekstra was so successful that the transactions from May 25 alone equaled the total transactions from Tokopedia’s first five years of operations. Over the course of the month, 73 million visitors came to the platform.

Although high-profile online events like the Ramadan Ekstra campaign help Tokopedia acquire many new users at once, they require careful planning to minimize disruptions. Any glitch in Tokopedia’s network can affect millions of users and result in complaints from sellers and online shoppers as well as negative publicity. According to a survey by Unbounce Research in 2018, nearly 70% of consumers admit that page load speed influences their willingness to buy from an online retailer.

“We benefited from the knowledge of Google Cloud engineers who have the experience of running large-scale events. If we had to roll out the project with our limited resources, we would have to read a lot of documentation, run many experiments, and perhaps still end up in blind alleys.”

Tahir Hashmi, Vice President of Engineering, Tokopedia

To cope with the increase in website visitors and transactions, Tokopedia turned to Google Cloud Platform to deliver uninterrupted service to shoppers and merchants alike.

“Internally, we did a lot of prep work, with help from Google Cloud, to make sure that our system could handle peak demand without slowing down performance,” says Tahir Hashmi, Vice President of Engineering at Tokopedia. “It’s critical that we offer a frictionless shopping experience to turn new customers into return shoppers.”

“We benefited from the knowledge of Google Cloud engineers who have the experience of running large-scale events,” says Tahir. “We were able to roll out the new technology faster, and with more confidence than we would have if we were doing it without their support.”

For big events and promotions, Tokopedia reviews its overall design with the Google Cloud team to confirm that it fits the Google Cloud infrastructure. In the preparation stage, Tokopedia sets up load and performance tests that simulate large-scale traffic on the application, which gives the team plenty of time to uncover and resolve bottlenecks. Before the event, Tokopedia coordinates with Google Cloud to freeze changes for the duration of the promotion so that network performance isn’t affected by software updates or bug fixes.
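
A change freeze like this is often enforced in the deployment pipeline rather than by convention alone. The Python sketch below is a hypothetical gate, not Tokopedia’s or Google Cloud’s actual tooling. `FreezeWindow` and `deploy_allowed` are illustrative names, and the window dates are made up.

```python
from datetime import datetime, timezone
from typing import List, Tuple

# Hypothetical type: a change-freeze window as (start, end) in UTC; end is exclusive.
FreezeWindow = Tuple[datetime, datetime]


def deploy_allowed(now: datetime, freezes: List[FreezeWindow]) -> bool:
    """Return False if `now` falls inside any change-freeze window."""
    for start, end in freezes:
        if start <= now < end:
            return False
    return True


if __name__ == "__main__":
    # Hypothetical freeze covering a two-day promotion.
    promo_freeze = [(datetime(2024, 3, 1, tzinfo=timezone.utc),
                     datetime(2024, 3, 3, tzinfo=timezone.utc))]
    now = datetime.now(timezone.utc)
    print("deploy allowed:", deploy_allowed(now, promo_freeze))
```

A CI job could run a check like this before each rollout and fail the pipeline while a freeze is active.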

Minimizing downtime with autoscaling on Google Kubernetes Engine

According to a recent study by McKinsey, electronic retailing (“e-tailing”) revenue in Indonesia is expected to grow from $5 billion in 2017 to $40 billion in 2022, driven by tech-savvy customers who are willing to pay for convenience.

“Our mission at Tokopedia is to democratize commerce through technology. We want to transform lives by reducing distances between merchants and consumers in this vast country we call home,” says Tahir. “Running our ecommerce platform on Google Kubernetes Engine (GKE) helps us to improve user experience and keeps shoppers coming back.”

“We used to experience partial downtime after adding a new VM that wasn’t configured correctly. Such headaches have been pretty much eliminated since moving to application clusters on Google Kubernetes Engine.”

Tahir Hashmi, Vice President of Engineering, Tokopedia

Before moving to Google Cloud, Tokopedia experienced scalability and reliability issues with its previous service provider. One major challenge was that Tokopedia Play, Tokopedia’s largest-scale interactive product, could only support 55,000 concurrent users. The application was rebuilt as microservices on GKE in five weeks and can now support 1.5 million concurrent users. Tokopedia manages and secures the microservices with the Istio service mesh and configures global load balancing on GKE for resiliency.
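
Going from 55,000 to 1.5 million concurrent users is roughly a 27x increase. The sketch below shows how such a target might be turned into a replica count. The per-pod capacity and the 20% headroom are hypothetical planning inputs, not figures from Tokopedia.

```python
OLD_CAPACITY = 55_000      # concurrent users on the previous platform (from the case study)
NEW_CAPACITY = 1_500_000   # concurrent users after the rebuild on GKE (from the case study)


def pods_needed(concurrent_users: int, users_per_pod: int, headroom_percent: int = 20) -> int:
    """Replicas needed to serve `concurrent_users`, plus spare headroom.

    `users_per_pod` and `headroom_percent` are hypothetical planning inputs.
    Integer ceiling division keeps the result exact.
    """
    if users_per_pod <= 0:
        raise ValueError("users_per_pod must be positive")
    required = concurrent_users * (100 + headroom_percent)
    return -(-required // (users_per_pod * 100))


if __name__ == "__main__":
    print(f"capacity increase: {NEW_CAPACITY / OLD_CAPACITY:.1f}x")  # capacity increase: 27.3x
    # Assuming (hypothetically) 5,000 concurrent users per pod:
    print(pods_needed(NEW_CAPACITY, users_per_pod=5_000))  # 360
```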

“Unlike our previous VM-based environment, adding and removing compute capacity is extremely reliable on Kubernetes,” says Tahir. “We had to put in a lot of effort to avoid partial downtime after adding or removing VMs due to complicated configuration changes. Such headaches have been pretty much eliminated since moving to application clusters on Google Kubernetes Engine.”

Autoscaling comes in handy when Tokopedia runs limited-time campaigns such as Semarak Maret Mantap, or “Great March,” which encouraged users to open the Tokopedia Play app on their phones and shake them to win prizes. After the dual-screen event on TV and in the Tokopedia app ended, the application, running on Google Cloud, scaled its servers down by 30x. According to Tahir, Tokopedia saved money by not having to provision hardware just for that purpose.

“Scalability, at a very basic level, means your application can handle a bigger load if you add more hardware to it,” Tahir explains. “By moving to GKE, we have more than just scalability, we have reliable scalability, better known as elasticity. We can scale up and down as many times as needed, without having to laboriously configure VMs.”
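
On GKE, this kind of elasticity is typically driven by the Kubernetes Horizontal Pod Autoscaler (HPA). The case study doesn’t say which metrics or settings Tokopedia uses, so the sketch below only reproduces the core formula from the Kubernetes documentation, `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, with the default 10% tolerance and min/max clamping. It deliberately omits the stabilization windows and pod-readiness handling that the real controller applies, and the metric values are hypothetical.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 100,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA calculation (no stabilization window, no readiness handling)."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    # Skip scaling when usage is already close to the target.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured replica bounds.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Hypothetical numbers: average CPU at 90% against a 60% target scales out.
    print(desired_replicas(10, current_metric=90, target_metric=60))  # 15
    # When traffic drops to 15% CPU, the same deployment scales back in.
    print(desired_replicas(15, current_metric=15, target_metric=60, min_replicas=2))  # 4
```

The same rule works in both directions, which is what makes repeated scale-up and scale-down cycles routine rather than a manual reconfiguration.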

Achieving redundancy with global load balancing

Tokopedia uses Cloud Load Balancing to distribute traffic across service instances in Google Cloud regions around the world. This capability supports Tokopedia’s multi-region business continuity planning. The load balancer doesn’t need to be pre-warmed to handle spikes in traffic. If servers in one region fail because of a man-made or natural disaster, the load balancer seamlessly shifts the traffic to servers with the most available capacity.
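
The failover behavior described above can be modeled, very roughly, as choosing among healthy regions by spare capacity. This is a simplified illustration of the behavior the case study describes, not how Cloud Load Balancing is implemented (the real service also takes the user’s proximity to each region into account). The region names and capacity figures are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Region:
    healthy: bool   # result of health checks
    capacity: int   # requests per second the region can serve (hypothetical)
    load: int       # requests per second currently being served

    @property
    def spare(self) -> int:
        return self.capacity - self.load


def pick_region(regions: Dict[str, Region]) -> str:
    """Return the healthy region with the most spare capacity."""
    candidates = {name: r for name, r in regions.items() if r.healthy and r.spare > 0}
    if not candidates:
        raise RuntimeError("no healthy region has spare capacity")
    return max(candidates, key=lambda name: candidates[name].spare)


if __name__ == "__main__":
    regions = {
        "region-a": Region(healthy=True, capacity=1000, load=900),
        "region-b": Region(healthy=True, capacity=1000, load=400),
    }
    print(pick_region(regions))  # region-b
    regions["region-b"].healthy = False  # simulate a regional outage
    print(pick_region(regions))  # region-a
```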

“One of the features that I particularly like on Google Cloud is global load balancing because our application is available via a single global IP address,” says Tahir. “Previously, we used DNS policy for application load balancing by configuring multiple IP addresses on the same domain. That method worked, but global load balancing offers a more scalable and robust DNS setup.”

Reducing complexity with Istio on Google Kubernetes Engine

The concept of a service mesh came about from the need to manage and deploy huge numbers of microservices. “At Tokopedia, we run a few hundred microservices, and all of them interact with each other in a way that isn’t immediately obvious. It’s hard to tell which microservice depends on which other microservices without auditing the code and the traffic,” Tahir notes. “Istio on GKE makes it easy to manage the overall microservices ecosystem and observe the telemetry from the containers.” Tokopedia is the first company in Indonesia to deploy Istio for such a high volume of traffic.
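
Istio’s standard request metrics label each call with the calling and receiving workloads (for example, the `source_workload` and `destination_workload` labels on `istio_requests_total`), which is what makes the dependency question answerable from telemetry rather than code audits. The sketch below assumes those (caller, callee) pairs have already been exported, and computes every direct and indirect dependency of a service. The service names are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set, Tuple


def build_graph(calls: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each caller to the set of services it calls directly."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for source, destination in calls:
        graph[source].add(destination)
    return graph


def transitive_dependencies(graph: Dict[str, Set[str]], service: str) -> Set[str]:
    """Breadth-first search over the call graph starting from `service`."""
    seen: Set[str] = set()
    queue = deque(graph.get(service, ()))
    while queue:
        dep = queue.popleft()
        if dep in seen:
            continue
        seen.add(dep)
        queue.extend(graph.get(dep, ()))
    return seen  # includes `service` itself only if it sits on a call cycle


if __name__ == "__main__":
    # Hypothetical (source_workload, destination_workload) pairs.
    observed = [
        ("checkout", "cart"),
        ("checkout", "payment"),
        ("payment", "fraud-check"),
        ("cart", "catalog"),
    ]
    g = build_graph(observed)
    print(sorted(transitive_dependencies(g, "checkout")))
    # ['cart', 'catalog', 'fraud-check', 'payment']
```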

Tokopedia plans to leverage Istio's identity and access control policies to help secure microservices running on GKE beyond the current security provided by IAM. Istio helps to authenticate services and provide the right level of access to data.
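
Istio expresses these controls with `AuthorizationPolicy` resources that match on attributes such as the caller’s mTLS identity (a principal of the form `cluster.local/ns/<namespace>/sa/<service-account>`) and the request method. The Python sketch below models a simplified version of their evaluation order: a matching DENY rule rejects the request, a workload with no ALLOW rules accepts it, and otherwise a request is accepted only if some ALLOW rule matches. It ignores CUSTOM actions and most of Istio’s match fields, and it treats every rule field as required, whereas Istio treats an omitted field as matching anything. The namespace and service-account names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Sequence


@dataclass(frozen=True)
class Rule:
    principals: FrozenSet[str]  # caller identities, e.g. "cluster.local/ns/shop/sa/checkout"
    methods: FrozenSet[str]     # HTTP methods such as "GET" or "POST"

    def matches(self, principal: str, method: str) -> bool:
        return principal in self.principals and method in self.methods


def is_allowed(principal: str, method: str,
               deny_rules: Sequence[Rule], allow_rules: Sequence[Rule]) -> bool:
    """Simplified Istio-style evaluation: DENY first, then ALLOW."""
    if any(rule.matches(principal, method) for rule in deny_rules):
        return False
    if not allow_rules:
        return True  # no ALLOW policy applies to this workload
    return any(rule.matches(principal, method) for rule in allow_rules)


if __name__ == "__main__":
    # Hypothetical policy for a payment service: only checkout may POST to it.
    allow = [Rule(frozenset({"cluster.local/ns/shop/sa/checkout"}), frozenset({"POST"}))]
    print(is_allowed("cluster.local/ns/shop/sa/checkout", "POST", [], allow))  # True
    print(is_allowed("cluster.local/ns/shop/sa/catalog", "POST", [], allow))   # False
```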

“Running our ecommerce platform on Google Kubernetes Engine (GKE) helps us to improve user experience and keeps shoppers coming back.”

Tahir Hashmi, Vice President of Engineering, Tokopedia

Democratizing commerce through big data with Google BigQuery

Tokopedia currently uses BigQuery to analyze traffic and transactional data, such as logistics and billing records, and to create reports on customer insights. For example, product managers use sales forecasts to anticipate how much promotion budget is needed on a daily, weekly, or monthly basis.
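
The case study doesn’t say which forecasting method Tokopedia’s product managers use. As a stand-in, the sketch below applies a simple moving average to daily sales figures and rolls daily values up into weekly totals. It assumes the figures have already been queried from BigQuery (for example, with the `google-cloud-bigquery` client library); `rollup`, `moving_average_forecast`, and the sample numbers are hypothetical.

```python
from statistics import mean
from typing import List, Sequence


def rollup(daily: Sequence[float], period: int) -> List[float]:
    """Sum daily values into consecutive buckets of `period` days (7 = weekly).

    A trailing partial bucket is dropped.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    return [sum(daily[i:i + period]) for i in range(0, len(daily) - period + 1, period)]


def moving_average_forecast(history: Sequence[float], window: int) -> float:
    """Forecast the next value as the mean of the last `window` observations."""
    if window <= 0 or len(history) < window:
        raise ValueError("need at least `window` observations")
    return mean(history[-window:])


if __name__ == "__main__":
    # Hypothetical daily sales for two weeks.
    daily_sales = [120, 130, 125, 140, 160, 210, 220,
                   125, 135, 130, 145, 170, 215, 230]
    print(rollup(daily_sales, 7))                    # [1105, 1150]
    print(moving_average_forecast(daily_sales, 7))   # next-day estimate
```

A production forecast would likely account for seasonality and promotion calendars; the moving average only shows where such a model would plug in.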

Because BigQuery integrates with G Suite, which Tokopedia uses across the organization, employees and partners can easily share and collaborate on reports in Google Sheets.

Moving forward, Tokopedia is looking to improve customer satisfaction with data science. Analyzing transactional data in BigQuery would allow Tokopedia to predict demand, making logistics delivery times and costs more efficient. “This would allow merchants and customers from different islands to enjoy same-day delivery,” says Herman Widjaja, Senior Vice President of Engineering at Tokopedia.
