Sky: Scaling for success with Sky Q diagnostics

About Sky

Based in London, Sky is one of Europe’s leading media and telecommunications companies, operating in the UK, Ireland, Germany, Austria, Italy, and Spain.

Industries: Media & Entertainment
Location: United Kingdom

About Datatonic

Google Cloud partner Datatonic delivers big data and machine learning solutions for telecommunication, media, retail, and finance clients.

Using managed services on Google Cloud, Sky replaces its on-premises big data platform in record time to meet the increased needs of its next-generation Sky Q box, now in millions of homes.

Google Cloud results

  • Replaces out-of-capacity on-premises platform in only six weeks
  • Captures diagnostic data from all Sky Q boxes, scaling to match demand with no additional DevOps
  • Establishes a hub for all diagnostic data used to enhance the Sky Q customer experience

Zero diagnostic data loss from millions of Sky Q boxes

Sky is one of Europe’s leading media and communications companies, providing Sky TV, streaming, mobile TV, broadband, talk, and line rental services to millions of customers in seven countries. Delivering customer service at such scale is a major challenge, so to help ensure the best possible user experience, Sky collects diagnostic data from its millions of TV boxes, ready for analysis, insight, and action to help ensure service uptime and delivery.

For many years, that meant gathering data on an old Hadoop cluster, as Oliver Tweedie, Director of Data Engineering at Sky, explains: “Sky’s Hadoop cluster was built in 2013 to the specifications of its time, but things moved on fast, both in terms of diagnostics data volumes and in what companies want to do with data. With the introduction of the new Sky Q boxes, we started to see bottlenecks in the diagnostic data collection setup.”

Those bottlenecks had serious consequences, leading to processing backlogs of up to 50 percent of daily data, which compromised feedback and limited the usability of the entire dataset. Sky looked for a solution that could handle the data volumes, but also collect additional usage information to create a rich dataset for analysis.

“By collecting diagnostic data, we can create an essential feedback loop from inside the home,” says Oliver. “The data will sit right at the heart of Sky's future strategy. It will help ensure that our products are intuitive and easy to use and that we can keep seamlessly connecting customers with the content and services they know and love from Sky.”

Cost-effective scaling with zero DevOps

On-premises infrastructure can create problems for businesses looking to put big data at the heart of company strategy. Providing for peaks in demand may mean maintaining idle servers, while failing to meet those peaks can cause data loss that compromises entire datasets. At the same time, increasingly diverse data processing possibilities add to a DevOps burden that may make solutions difficult to scale.

For Sky, an increased flow of diagnostic data from its TV boxes ran up against the limitations of its existing, on-premises Hadoop cluster. “As we rolled out Sky Q, we had more traffic and more diagnostic metrics from the TV boxes,” says Oliver. “The on-premises infrastructure was struggling to meet increased demand. We were chasing our tails to fix bottlenecks in the network stack as they emerged. Up to 50 percent of the data on any given day was held up waiting to be processed. Because the boxes would report back in bursts, we would get spikes in flows of data. That made the biggest problem one of infrastructure scale.”

As data reliability became a critical issue, Sky began mirroring diagnostic data in the cloud. “Without diagnostic data, we cannot assess service quality,” explains Oliver. “Sky Q is a sophisticated system that relies on several features working well, so if some of our boxes are invisible to us, it means we are blind to the customers’ experience.” However, mirroring the data proved an expensive workaround rather than a true solution. Sky looked to create a new, cloud-based architecture that could scale, with the potential to handle not only diagnostics from millions of Sky Q boxes, but data from all Sky products to work towards improving the entire Sky experience.

To do that, Sky worked with Google Cloud partner Datatonic to create a solution on Google Cloud. By landing diagnostic data from set-top boxes directly in Cloud Pub/Sub, Sky eliminates data loss caused by bottlenecks in server capacity. Data is then parsed through Cloud Dataflow to Cloud Storage and BigQuery, monitored on its way by Stackdriver, which triggers email and Slack alerts should issues occur.
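The effect of placing an elastic buffer between the set-top boxes and the processing stage can be illustrated with a small simulation. This is a simplified sketch, not Sky's or Datatonic's code: the function name, the arrival counts, the processing capacity, and the buffer limit are all hypothetical, and it models only the buffering step, not the autoscaling of the processing stage.

```python
def simulate(arrivals, capacity_per_tick, buffer_limit=None):
    """Return (processed, dropped, backlog) for a list of per-tick arrival counts.

    buffer_limit=None models an elastic, managed buffer that never rejects events.
    An integer models a fixed-size server queue that drops events once it is full.
    All values are illustrative, not measurements from Sky's platform.
    """
    backlog = 0
    processed = 0
    dropped = 0
    for arriving in arrivals:
        backlog += arriving
        # A fixed-size queue loses whatever does not fit.
        if buffer_limit is not None and backlog > buffer_limit:
            dropped += backlog - buffer_limit
            backlog = buffer_limit
        # Process as much as the capacity allows in this tick.
        done = min(backlog, capacity_per_tick)
        backlog -= done
        processed += done
    return processed, dropped, backlog


# Hypothetical burst: most boxes report in the second tick.
burst = [100, 900, 100, 0, 0, 0]

fixed = simulate(burst, capacity_per_tick=200, buffer_limit=300)
elastic = simulate(burst, capacity_per_tick=200, buffer_limit=None)

print("fixed queue   (processed, dropped, backlog):", fixed)
print("elastic buffer (processed, dropped, backlog):", elastic)
```

With these made-up numbers, the fixed queue drops 600 of the 1,100 events during the burst, while the elastic buffer holds them until the processor catches up, so nothing is lost.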

“Sky publishes between 200 and 300 million events per day, and up to 600 million at peak, with that number doubling next year,” says Louis Decuypere, Founder at Datatonic. “Cloud Pub/Sub can handle that volume straight out of the box with no need for manual interference. You just publish your event and Pub/Sub makes sure it gets delivered. It's massively scalable and globally distributed, so Sky could launch the system in another country with very minimal additional setup required.”
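To put those daily volumes in perspective, they can be converted to average per-second rates. The daily figures below are taken from the quote above; the helper name and labels are illustrative. Because set-top boxes report in bursts, instantaneous rates would be higher than these daily averages.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_events_per_second(events_per_day: int) -> float:
    """Convert a daily event count into an average per-second rate."""
    return events_per_day / SECONDS_PER_DAY


# Daily volumes quoted in the case study.
daily_volumes = {
    "typical low": 200_000_000,
    "typical high": 300_000_000,
    "peak": 600_000_000,
}

for label, volume in daily_volumes.items():
    rate = average_events_per_second(volume)
    print(f"{label}: about {rate:,.0f} events/s on average")
```

That works out to roughly 2,300 to 3,500 events per second on a typical day and about 6,900 per second averaged over a peak day.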

Because the entire pipeline is built on Google managed services, the solution scales automatically to match the peaks caused when set-top boxes report in bursts. In addition, should Sky choose to move away from batch processing in the future, the solution’s combination of Cloud Dataflow and Apache Beam will help ensure an almost seamless transition. “With changes to a couple of lines of code, we could use the same framework to switch from batch to real-time processing,” says Louis. “At this level, that's unique to this technology.”
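The idea of reusing one set of processing logic for both bounded (batch) and unbounded (streaming) input can be illustrated in plain Python. The sketch below is an analogy only: it does not use the Apache Beam API, and the event format, field names, and function names are hypothetical.

```python
import json
from typing import Dict, Iterable, Iterator, List


def parse_events(lines: Iterable[str]) -> Iterator[Dict]:
    """Shared processing logic: parse JSON lines and skip malformed records."""
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore records that are not valid JSON
        if isinstance(event, dict) and "box_id" in event:
            yield event


def batch_source() -> List[str]:
    """Bounded input: a complete, already-collected set of records."""
    return [
        '{"box_id": "a1", "metric": "signal", "value": 71}',
        "not json",
        '{"box_id": "b2", "metric": "signal", "value": 64}',
    ]


def streaming_source() -> Iterator[str]:
    """Unbounded-style input: records are yielded one at a time as they arrive."""
    for line in batch_source():
        yield line


# Only the source changes; the processing function is reused unchanged.
batch_ids = [event["box_id"] for event in parse_events(batch_source())]
stream_ids = [event["box_id"] for event in parse_events(streaming_source())]
print(batch_ids, stream_ids)
```

In this toy version, switching modes means swapping the source passed to `parse_events`, which mirrors the "couple of lines of code" described in the quote, though a real pipeline would also need windowing and delivery guarantees that this sketch leaves out.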

BigQuery lies at the heart of the solution. As well as supplying dashboards in Tableau, BigQuery acts as a hub for all of the collected data, making it available on a self-serve basis for a range of teams and tools. Both raw and enriched events are also stored in the near-infinite capacity of Cloud Storage, made especially cost-effective with the Coldline and Nearline archival storage classes.

“This project went into production with less than three months’ development time, due to the serverless architecture and NoOps design,” says Louis. “Since the launch, there's been no data loss and no noteworthy incidents, and we continue to scale out to more set-top boxes and more countries without friction or rework.”

Forming a strategic relationship

Sky switched cloud provider when it moved from the data mirroring measure to the Google Cloud solution. “With Google, we could see the opportunity for a true strategic relationship,” says Oliver. “We established a framework agreement and a Platinum support package. On our own, it wasn’t cost effective to build and maintain our own infrastructure. Technology was changing too quickly. Google Cloud has allowed us to capitalize on that technological change, and not worry about the scale of the infrastructure. When we had issues with the way data was ingested into Pub/Sub, we were put through to the Pub/Sub support team who came up with a resolution really quickly.” For a company of Sky’s scale, the proven performance of Google offered additional reassurance.

“Google uses a version of this architecture to process the diagnostic data that comes back from Android mobile phones,” says Oliver. “We’re using a technology that had been battle tested with that number of devices and on a global scale. It's mature technology that is now commoditized and opened up to the rest of the world.”

Collecting the data to drive decisions

It took Sky and Datatonic just six weeks to develop, test, and go live with the new solution. Since then, Sky reports that all diagnostic data has been successfully collected from Sky Q boxes. Sky now plans to collect all of its diagnostic reporting on the new solution, either by expanding its current pipeline or by extending the architecture with Cloud IoT Core. By combining set-top box diagnostic and viewing data with streamed and batched information from reference feeds and other resources, Sky will create a data warehouse on BigQuery as a one-stop shop for all queries.

“This fully functioning Google Cloud solution will act as a blueprint for future Sky projects,” says Oliver. “We can capture all diagnostic data in Google Cloud and use it to inform our future strategy. Sky management sees this as the beginning of a new era in data management, analytics, and data science.”
