Namshi: Making the most of microservices with Google Kubernetes Engine

About Namshi

Namshi is one of the Middle East's leading online fashion retailers. The fast-growing business offers more than 700 brands to 1.2 million active customers in the United Arab Emirates, Saudi Arabia, and across the region.

Industries: Retail & Consumer Goods
Location: United Arab Emirates

Online fashion retailer Namshi saves time and money by simplifying the management of its microservices with a migration to Google Kubernetes Engine

Google Cloud results

  • Eliminates 50% of the SRE workload with simple scaling to match demand on Google Kubernetes Engine
  • Upgrades infrastructure at the click of a button, without lengthy migrations
  • Creates dashboards with Google Stackdriver to improve transparency and display key metrics
  • Cuts infrastructure costs by 30%

Founded in 2011, Namshi is now one of the foremost fashion sites in the Middle East, posting a sales increase of 16 percent in 2018. From the beginning, the online retailer embraced cloud computing, building its platform on a range of distinct, containerized services. But over time, this diverse microservices architecture became more and more complicated to handle. Kubernetes clusters had to be managed manually, making life difficult for Namshi's two Site Reliability Engineers (SREs).

"We'd spend at least half our time troubleshooting, just to keep the site stable and running," remembers Abdelrahman Shiddo, the company's SRE Manager. "As Namshi grew, the task of managing our microservices became really complicated. We were firefighting. We'd start fixing one issue, only to be interrupted by a new issue somewhere else."

The team identified networking as the root cause of the problems. "Because our networking was suboptimal, any time we scaled up or down we'd get a lot of errors," says Abdelrahman. "In effect, keeping the clusters up and running was taking up 30 percent of the team's time." Cost was another factor, as every additional cluster they requested meant new budget demands. "We realized that we needed new nodes to keep things running, but we'd have to calculate how much they would cost at the same time," continues Abdelrahman. "That added another layer of stress to the decisions we had to make."

Instead of spending time on maintenance, Namshi wanted to focus on developing and deploying new releases to add value to the business. The team looked for a cost-effective way to simplify management of its microservices: something they felt they could rely on, so they could worry less, and innovate more.

"It was frustrating to spend half our time maintaining infrastructure," says Abdelrahman. "We had tried various management solutions with no success, so when we tested Google Kubernetes Engine (GKE), it was a revelation. With one click, we could provision a cluster and have it running within minutes. Our biggest day-to-day problem disappeared overnight."

Matching microservices with the right management tools

Organizations typically choose a microservices architecture for the improved scalability, resilience, and speed of development that it can deliver. In practice, however, the wrong management tools can make those benefits hard to realize. Namshi built up its microservices architecture over time, but running the system had become an increasingly difficult manual task, especially when scaling.

"Traffic to our site might spike by 30 percent to 50 percent in just a few minutes," says Abdelrahman. "That's when we'd see errors, because our system wasn't ready for that traffic. And then we would research ways to fix the problem and manually tweak our system. It was a long-winded way of doing things. Often we would try to predict peaks and provide extra capacity ahead of time, because we just didn't trust that things would scale properly in the moment."

With Namshi's previous cloud provider, infrastructure could take several minutes to respond to spikes in traffic. Users would receive error notifications during that time, or be unable to access the website. And because it was difficult to migrate to new clusters, development was held back, too. "Migrating to a new cluster would take so much time that we would try to keep upgrades to a minimum," says Abdelrahman. "We would limit upgrades to once every four months, but that meant we were always using outdated versions of products, which created other problems."

Abdelrahman and his team looked to migrate to the best cloud provider available for orchestrating Kubernetes. "We ran a benchmark, and the alternatives just could not compare to GKE," says Abdelrahman. "It was so much easier to manage networking and scaling, and cluster upgrades could be done at the click of a button."
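
To make the scaling behavior concrete, here is a minimal sketch of the proportional rule documented for the Kubernetes Horizontal Pod Autoscaler. It is an illustration only: the case study does not say which autoscaling features Namshi uses, and the function name, the 60 percent CPU target, and the replica counts are assumptions chosen for the example.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    tolerance: float = 0.1,
) -> int:
    """Proportional scaling rule in the style of the Horizontal Pod Autoscaler.

    desired = ceil(current_replicas * current_metric / target_metric)
    No change is made while the ratio stays within the tolerance band.
    """
    if current_replicas < 1 or target_metric <= 0:
        raise ValueError("need at least one replica and a positive target")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: avoid scaling on small fluctuations.
        return current_replicas
    return math.ceil(current_replicas * ratio)


# Hypothetical numbers: 10 replicas sized for 60% average CPU. A 50% traffic
# spike pushes average CPU to about 90%, so the rule asks for more replicas.
print(desired_replicas(current_replicas=10, current_metric=90.0, target_metric=60.0))
```

With these assumed numbers, a spike of the size Abdelrahman describes would be absorbed by adding replicas automatically, rather than by provisioning extra capacity ahead of time.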

As well as moving its microservices to GKE, the team migrated its primary database to Cloud SQL to take advantage of competitive pricing.

"We had millions of rows of data to move to Cloud SQL, and it would have taken forever to do it ourselves," says Abdelrahman. "Google Cloud made it easy when they released an API that copied everything across from the database we had with our previous cloud provider."

Communicating key metrics with dashboards on Stackdriver

With the time Namshi saves thanks to faster upgrades and migrations, the team is looking at new ways to optimize its platform, such as building internal dashboards for key metrics. "We want to make metrics as transparent as possible for teams inside Namshi," says Abdelrahman. "That's especially important for making sure we meet our service-level agreements on error rates or latencies. But we also want people in the company to feel more connected to what happens online and the work of our technical team."

To do that, Namshi sends logs and metrics from GKE to Stackdriver, which streams performance and diagnostics data to dashboards.
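
As a rough illustration of the kind of figures such a dashboard might show, the sketch below computes an error rate and a 95th-percentile latency from a batch of request records and checks them against thresholds. Everything here is hypothetical: the `Request` type, the function names, and the threshold values are assumptions for the example, not Namshi's actual service-level agreements, and the code does not call any Stackdriver API.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Request:
    status: int        # HTTP status code returned to the client
    latency_ms: float  # time taken to serve the request


def error_rate(requests: Sequence[Request]) -> float:
    """Share of requests that ended in a server error (5xx)."""
    if not requests:
        return 0.0
    errors = sum(1 for r in requests if r.status >= 500)
    return errors / len(requests)


def latency_percentile(requests: Sequence[Request], pct: int) -> float:
    """Nearest-rank percentile of latency; pct is an integer from 1 to 100."""
    if not requests or not 1 <= pct <= 100:
        raise ValueError("need at least one request and 1 <= pct <= 100")
    ordered = sorted(r.latency_ms for r in requests)
    rank = (pct * len(ordered) + 99) // 100  # integer ceiling of pct% of n
    return ordered[rank - 1]


def meets_targets(
    requests: Sequence[Request], max_error_rate: float, max_p95_ms: float
) -> bool:
    """True when both the error rate and the p95 latency are within limits."""
    return (
        error_rate(requests) <= max_error_rate
        and latency_percentile(requests, 95) <= max_p95_ms
    )


# Hypothetical sample: 20 requests, latencies 10..200 ms, one server error.
sample = [Request(status=200, latency_ms=10.0 * i) for i in range(1, 20)]
sample.append(Request(status=500, latency_ms=200.0))
print(error_rate(sample), latency_percentile(sample, 95))
print(meets_targets(sample, max_error_rate=0.05, max_p95_ms=200.0))
```

The nearest-rank method is used because it always returns a latency that was actually observed, which keeps the number easy to explain to non-technical teams.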

"It took me less than 15 minutes to set up a dashboard with Stackdriver," says Abdelrahman. "With our previous solution, that would have taken a week and a half. The dashboards are a great extra benefit, alongside our main use of Stackdriver as a fast, reliable, automatically updated monitoring tool."

Cutting costs and building trust

By migrating to GKE and Cloud SQL, Namshi has cut infrastructure costs by 30 percent. "Our 30 percent saving is just the start of what we expect to achieve," says Abdelrahman. "We haven't explored optimizing our costs on Google Cloud yet, but we expect to increase that saving to more than 40 percent. One of our favorite aspects of Google Cloud pricing is its committed usage discount. Instead of having to plan ahead and reserve instances, capacity is there when we need it, and the more we use it, the more we save."

Now Namshi is exploring ways to use Google Cloud products to make its site more resilient, too. "As we become more successful, we also become a target for hackers," says Abdelrahman. "We're really interested in using Google Cloud Armor to help protect ourselves from DDoS attacks. In the past, we had limited ways of filtering traffic coming into our cluster, and they didn't integrate seamlessly with the cloud. Thanks to Cloud Armor we're able to set up filters to spot hostile traffic and cut it off before it can affect our services."

Meanwhile, the improved performance of GKE is making waves throughout the organization, as teams learn that they can trust the new infrastructure to deliver.

"With Google Kubernetes Engine, managing our clusters is painless," says Abdelrahman. "We don't worry about things going wrong. Marketing used to check with us before sending push notifications to ensure the clusters were ready to handle it. Now they no longer feel they need to check. We know the system will work."
