How Meesho migrated a petabyte of data into Cloud CDN with zero downtime
Kishore Jagannath
Cloud Infrastructure Engineer
Siddharth Gupta
Architect, Common Platform, Meesho
Meesho is an Indian online marketplace that serves millions of customers every day. Recently, the company decided to adopt a multi-cloud strategy, using Google Cloud’s scalable and reliable infrastructure to improve operational efficiency, modernize, and scale for growth. To do so, Meesho needed to migrate to Google Cloud the billions of static files and images that its web and mobile applications render. But with over a petabyte of data in its object storage system and 10 billion requests per day, Meesho had to perform this huge migration gradually and with zero downtime, which was a major challenge.
In this blog post, we look at how Meesho did this using Storage Transfer Service, Cloud Storage, and Cloud CDN. We also look at how it saved storage capacity by using Cloud Run to resize static images on the fly, as needed.
CDN migration requirements
Migrating from one cloud to another isn’t easy. To pull it off, Meesho identified the following requirements:
- Petabyte-scale data transfer: Meesho needed to migrate billions of image files from their existing object storage server to Cloud Storage.
- Dynamic image resizing: To save on storage costs, Meesho wanted the ability to dynamically resize the images based on the end user platform and store the smaller images in the Cloud CDN cache.
- High-throughput image serving: To meet consumer demand, Meesho needed images to be served at a throughput of thousands of requests per second.
- Zero downtime: Since any downtime involves potential loss of revenue, Meesho needed to perform the migration without taking any systems offline.
Migration architecture
Cloud CDN architecture in Google Cloud
The figure above depicts the CDN migration architecture that Meesho implemented. The existing DNS server points to both the source load balancer and the Google Cloud External HTTP Load Balancer, with weighted distribution between the two. The source load balancer points to the source object storage. Images were transferred from the source object storage to Cloud Storage.
The Google Cloud External HTTP Load Balancer was deployed with Cloud CDN enabled, so that static images stored in the CDN cache are served directly to users. The load balancer’s public IP address is configured as an endpoint on Meesho’s existing DNS server. The load balancer is connected to Cloud Run, which reads from the Cloud Storage bucket. When a request reaches the load balancer at the edge, it first checks whether the content is available in Cloud CDN; if it is, the object is returned from the closest edge location. If the image is not in the Cloud CDN cache, the request is sent to Cloud Run, which fetches the image from the Cloud Storage bucket and resizes it dynamically if necessary.
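The hit-or-forward request path can be modeled in a few lines. The following is a minimal Python sketch, not Cloud CDN’s actual implementation: the `EdgeCache` class and the `origin` callable are hypothetical, the origin stands in for Cloud Run, and least-recently-used eviction is a simplifying assumption (real CDN caches also honor TTLs and cache-control headers).

```python
from collections import OrderedDict
from typing import Callable


class EdgeCache:
    """Toy model of a CDN edge: serve from cache, otherwise fetch from the origin."""

    def __init__(self, origin: Callable[[str], bytes], capacity: int) -> None:
        self.origin = origin      # stands in for the Cloud Run backend
        self.capacity = capacity  # maximum number of objects held at this edge
        self.store: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, path: str) -> bytes:
        if path in self.store:
            # Cache hit: mark as most recently used and answer from the edge.
            self.store.move_to_end(path)
            self.hits += 1
            return self.store[path]
        # Cache miss: forward to the origin, then keep a copy at the edge.
        self.misses += 1
        body = self.origin(path)
        self.store[path] = body
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict the least recently used object
        return body


# Usage: record which paths actually reach the origin.
origin_calls = []


def origin(path: str) -> bytes:
    origin_calls.append(path)
    return f"image bytes for {path}".encode()


edge = EdgeCache(origin, capacity=2)
for p in ["/a.jpeg", "/a.jpeg", "/b.jpeg", "/a.jpeg", "/c.jpeg", "/b.jpeg"]:
    edge.get(p)
```

Only cache misses reach the origin, which is why the hit ratio largely determines how much traffic Cloud Run has to absorb.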
Data transfer
Meesho used Google Cloud’s Storage Transfer Service to transfer data from its existing object storage to a Cloud Storage bucket over the internet. Because the number of files and the total volume of data were so large, Meesho ran multiple transfers in parallel, specifying folders and subfolders as prefixes in each Storage Transfer Service job.
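One way to plan such parallel transfers is to balance the prefixes across jobs by object count. The sketch below assumes you already know how many objects sit under each prefix (the `prefix_counts` input and both helper functions are hypothetical) and uses the longest-processing-time greedy heuristic. Each resulting prefix list fills the `objectConditions.includePrefixes` field of a Storage Transfer Service job’s `transferSpec`; the source section of the job is left out because it depends on the original object store.

```python
from __future__ import annotations

import heapq


def plan_transfer_jobs(prefix_counts: dict[str, int], num_jobs: int) -> list[list[str]]:
    """Assign prefixes to jobs so that each job gets a similar number of objects.

    Largest prefixes are placed first, each into the currently lightest job
    (the longest-processing-time greedy heuristic).
    """
    heap = [(0, i) for i in range(num_jobs)]  # min-heap of (objects assigned, job index)
    jobs: list[list[str]] = [[] for _ in range(num_jobs)]
    for prefix, count in sorted(prefix_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        load, idx = heapq.heappop(heap)
        jobs[idx].append(prefix)
        heapq.heappush(heap, (load + count, idx))
    return jobs


def transfer_job_body(prefixes: list[str], sink_bucket: str) -> dict:
    """Build the sink and prefix-filter parts of a transfer job's transferSpec.

    The data source is intentionally omitted: it depends on the original store.
    """
    return {
        "transferSpec": {
            "gcsDataSink": {"bucketName": sink_bucket},
            "objectConditions": {"includePrefixes": prefixes},
        }
    }


# Usage with made-up object counts per prefix and a hypothetical bucket name.
counts = {"catalog/2019/": 400, "catalog/2020/": 300, "catalog/2021/": 250,
          "banners/": 100, "icons/": 50}
jobs = plan_transfer_jobs(counts, num_jobs=2)
bodies = [transfer_job_body(p, "hypothetical-images-bucket") for p in jobs]
```

Balancing by object count rather than by bytes is a judgment call: with billions of small images, per-object overhead may matter more than raw size, but byte counts can be passed in instead without changing the code.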
Dynamic image resizing
Meesho delivers static images to multiple end-user platforms, such as mobile and laptop, at multiple resolutions. Rather than store each image at every resolution, Meesho opted to store a single high-resolution mezzanine image. It then attached Cloud Run to the Cloud Load Balancer as a serverless network endpoint group. Application requests for images specify the object name, the image format, and the resolution (for example, abc.jpeg at 750×450). If the image already exists at the requested resolution, it is returned from the Cloud Storage bucket to the end user and stored in the Cloud CDN cache. If an image for the requested resolution and/or format is not found, the mezzanine image (abc.jpeg in this example) is resized and converted to the requested resolution and format, stored in the Cloud Storage bucket, and returned to the end user. As a result, resizing and formatting are performed only the first time a given resolution is requested.
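The resize-on-miss logic can be sketched as follows. This is a minimal illustration under several assumptions: the bucket is an in-memory dict standing in for Cloud Storage, `variant_key` uses a hypothetical naming scheme (for example, `abc_750x450.webp`), and a nearest-neighbor resize over a pixel grid stands in for a real image library; format conversion is reduced to recording the target format.

```python
from dataclasses import dataclass


@dataclass
class Image:
    width: int
    height: int
    pixels: list  # row-major grid of pixel values; a stand-in for decoded image data
    fmt: str


def resize_nearest(img: Image, width: int, height: int, fmt: str) -> Image:
    """Nearest-neighbor resize; a real service would use an image library."""
    rows = [
        [img.pixels[y * img.height // height][x * img.width // width] for x in range(width)]
        for y in range(height)
    ]
    return Image(width, height, rows, fmt)


def variant_key(name: str, width: int, height: int, fmt: str) -> str:
    """Hypothetical object name for a derived variant, e.g. abc_750x450.webp."""
    stem = name.rsplit(".", 1)[0]
    return f"{stem}_{width}x{height}.{fmt}"


def get_image(bucket: dict, name: str, width: int, height: int, fmt: str) -> Image:
    """Serve a stored variant if present; otherwise derive it once from the mezzanine."""
    key = variant_key(name, width, height, fmt)
    if key in bucket:
        return bucket[key]  # generated on an earlier cache miss
    mezzanine = bucket[name]  # the single high-resolution original
    variant = resize_nearest(mezzanine, width, height, fmt)
    bucket[key] = variant  # persist so the resize happens only the first time
    return variant


# Usage: a 4x4 "mezzanine" whose pixel values are 0..15.
bucket = {"abc.jpeg": Image(4, 4, [[y * 4 + x for x in range(4)] for y in range(4)], "jpeg")}
small = get_image(bucket, "abc.jpeg", 2, 2, "webp")
```

In a deployment like the one described, this handler would run inside Cloud Run and return the encoded bytes with cache headers, so that Cloud CDN can keep the result at the edge.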
In this architecture, Cloud Run must be configured to scale appropriately, because it handles the bulk of CDN cache-miss requests. Meesho performed the following configuration steps:
- Configured the number of concurrent requests that a single instance of Cloud Run can handle
- Ensured a sufficient minimum of Cloud Run instances were available to serve user traffic to avoid cold-start latency
- Reviewed the regional Cloud Run limits on the maximum number of instances and increased them where necessary to handle peak load
- Reduced Cloud Run container start-up times, so that the application could autoscale quickly during a surge in traffic
- Tuned the memory and CPU configuration to meet the image-processing requirements
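A rough way to size these settings is Little’s law: the number of requests in flight equals the arrival rate multiplied by the time each request spends in the system. The Python sketch below turns that into an instance count. The peak cache-miss rate, per-request latency, and 60% utilization headroom in the example are hypothetical values, not Meesho’s figures; 80 is Cloud Run’s default maximum concurrency per instance.

```python
import math


def cloud_run_instances(peak_rps: float, latency_s: float, concurrency: int,
                        target_utilization: float = 0.6) -> int:
    """Estimate how many instances are needed to absorb cache-miss traffic.

    Little's law: requests in flight = arrival rate x time in system.
    Dividing by per-instance concurrency, scaled down by a utilization target
    that leaves headroom for bursts, gives an instance count.
    """
    in_flight = peak_rps * latency_s
    return math.ceil(in_flight / (concurrency * target_utilization))


# Hypothetical: 3,000 misses/s at peak, 200 ms per resize, default concurrency 80.
print(cloud_run_instances(peak_rps=3000, latency_s=0.2, concurrency=80))  # prints 13
```

Running the same estimate with off-peak rates gives a starting point for the minimum instance count, and with peak rates a starting point for the maximum instance limit.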
CDN configuration
Cloud CDN was configured to achieve a cache hit ratio above 99%. This not only sped up image rendering but also reduced the load on Cloud Run, lowering cost and improving performance.
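The effect of the hit ratio on Cloud Run is simple arithmetic. Using the figures from this post (10 billion requests per day and a 99% hit ratio), the sketch below computes the average request rate at the edge and the average rate of cache misses that reach Cloud Run. These are daily averages; peak rates will be higher.

```python
from __future__ import annotations

SECONDS_PER_DAY = 86_400


def origin_load(requests_per_day: float, hit_ratio: float) -> tuple[float, float]:
    """Return (average edge requests/s, average origin requests/s)."""
    edge_rps = requests_per_day / SECONDS_PER_DAY
    miss_rps = edge_rps * (1.0 - hit_ratio)  # only cache misses reach Cloud Run
    return edge_rps, miss_rps


edge_rps, miss_rps = origin_load(10e9, 0.99)
print(f"{edge_rps:,.0f} edge rps, {miss_rps:,.0f} origin rps")
# prints: 115,741 edge rps, 1,157 origin rps
```

By the same arithmetic, a 95% hit ratio would send five times as many requests to Cloud Run as a 99% hit ratio.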
Achieving zero downtime
Meesho followed well-established DevOps principles to achieve a zero-downtime migration:
- Metrics and alerts were configured in Cloud Monitoring to track the load balancer.
- The DNS server was configured to point to the Cloud Load Balancer IP addresses in addition to the existing load balancer, which served static assets.
- Weight-based DNS load balancing was employed to gradually shift the traffic to Google Cloud, while monitoring application performance and HTTP response codes.
- The initial phase routed 0.1% of traffic to Google Cloud during off-peak hours. The metrics, end-user performance, and response codes were continuously monitored.
- Traffic was increased gradually over a two-week period by raising the weight of the Google Cloud Load Balancer in DNS. Shifting traffic gradually let the Cloud CDN cache warm up and maintain a healthy cache hit ratio, and gave Cloud Run time to adapt to the traffic patterns and scale seamlessly.
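The ramp-up can be expressed as a schedule plus a health gate. The sketch below is hypothetical throughout: it assumes a geometric ramp from 0.1% to 100% over 14 daily steps, integer DNS weights that sum to 1000 (so 0.1% maps to a weight of 1), and an error budget of 0.1% on HTTP 5xx responses. DNS providers differ in how they express weights, so `dns_weights` would need adapting to the actual provider.

```python
from __future__ import annotations

TOTAL_WEIGHT = 1000  # hypothetical: integer weights summing to 1000 allow 0.1% steps


def ramp_schedule(start_pct: float, days: int) -> list[float]:
    """Geometric ramp of Google Cloud's traffic share from start_pct to 100%."""
    if days < 2:
        raise ValueError("need at least two steps")
    factor = (100.0 / start_pct) ** (1.0 / (days - 1))
    return [min(100.0, start_pct * factor**d) for d in range(days)]


def dns_weights(google_pct: float) -> tuple[int, int]:
    """Split TOTAL_WEIGHT into (Google Cloud LB weight, source LB weight)."""
    google = round(google_pct / 100.0 * TOTAL_WEIGHT)
    return google, TOTAL_WEIGHT - google


def next_step(step: int, error_rate: float, budget: float = 0.001) -> int:
    """Advance while the 5xx error rate stays within budget; otherwise step back."""
    return step + 1 if error_rate <= budget else max(0, step - 1)


schedule = ramp_schedule(0.1, 14)
for day, pct in enumerate(schedule):
    print(day, f"{pct:.2f}%", dns_weights(pct))
```

A geometric ramp spends more days at low percentages, where a problem affects few users, and takes larger steps once the cache is warm and Cloud Run has scaled out.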
Meesho learned a lot through this experience, and has the following advice for anyone undertaking a similar migration:
- When transferring data with Google Cloud’s Storage Transfer Service, split the data into multiple parallel transfers.
- Ensure that applications do not pin certificates; pinned certificates can cause failures when traffic moves to the new certificates in Google Cloud.
- Plan a gradual migration that increases traffic to Google Cloud in steps.
Summary
When all is said and done, Meesho considers its migration to Google Cloud a big success. After migrating its static images to Cloud CDN, Meesho held two major sales, each with three times the normal peak traffic, without any issues. The CDN migration helped Meesho reduce costs, improve performance, and reduce load balancer errors when fetching static images. To learn more about Cloud CDN and how you can use it in your environment, check out the documentation.