Cloud and consequences: Internet censorship data enters the transformation age

October 26, 2023
Seth Rosenblatt

Security Editor, Google Cloud

Censored Planet Observatory is transforming the way we analyze censorship data to be ‘more informative’

As internet access has spread across the world, so has the desire by governments to control that access. Tactics once reserved for repressive regimes have been implemented in countries where they might not be expected, including Italy, where the government has blocked access to the free eBook website Project Gutenberg.

Rome’s objection to no-cost digital editions of Romeo and Juliet, Moby Dick, A Room with a View, and 70,000 other books qualifies as internet censorship according to Censored Planet Observatory, a team of University of Michigan researchers who measure and track how governments block content on the internet. Censored Planet takes that data and, in collaboration with Google's Jigsaw and powered in part by Google Cloud infrastructure, offers a public dashboard to explore the processed data.

Their goal is to make censorship data accessible and useful, said Vinicius Fortuna, engineering manager at Jigsaw. “Knowing how information is being controlled helps us develop our societies. Relying on governments to be transparent is not enough,” he said. “We’re empowering researchers to use these public data sets to take action.”

The growing urge to censor the internet is not just a problem for activists and journalists. It can affect all manner of organizations, enterprises, and entire industries, even in countries where internet censorship might be least expected. While it can be challenging to place a price tag on internet censorship, at least one estimate, based on a tool from internet monitoring organization Netblocks, puts the cost of internet disruptions and restrictions to the global economy at more than $44 billion since 2019.

Organizations keeping tabs on internet censorship, including Censored Planet, the Open Observatory of Network Interference (OONI), Internet Society, and AccessNow, have found that while internet censorship originally centered around political events such as elections, governments have begun to rely on censorship techniques to restrict internet access to topics including health care, education, and commerce. In Turkmenistan, for example, nearly 20% of all websites are inaccessible because the government there blocks the web host and infrastructure provider Cloudflare.

The art of censorship data analysis evolves with cloud

During the Arab Spring in 2011, the only way to collect measurements of internet censorship was labor-intensive. Researchers had to get participants to install an app that would share relevant data with researchers' servers. Analysts would then compare data from many users in the same country, in the hope of better understanding which specific websites were censored, or whether internet access as a whole was blocked.

“Years ago, the value of having credible and scalable data on censorship by governments at times of unrest became super-clear to everyone. We wanted to know who’s being blocked, and how,” said Roya Ensafi, founder and director of Censored Planet. Recruiting volunteers was an arduous, manual process, and it put the very people the organization relied on for its data at risk of incarceration and physical harm.

“I was not comfortable asking my friends and family to take risks,” she said. “So over the course of a decade of research, we developed a less-risky approach to data collection.” With each remote measurement technique that Censored Planet and its collaborators developed, the amount of data collected increased, sometimes exponentially. The organization now collects censorship data using multiple remote measurement techniques in more than 200 countries, with more than 80 terabytes of data from the past six years behind it.

Evolving from disruption detection to censorship insight required processing power, Ensafi said. Censored Planet had to rethink how it analyzed the data it was collecting, but as a small research lab it had limited capacity to focus on long-term infrastructure.

“The data wasn’t useful for our users because it was too much for them,” she said. “We wanted to extract insights and democratize the censorship analysis. This is where Jigsaw came in.”

The rise of cloud computing helped Censored Planet collaborate with Jigsaw to develop the Censored Planet Dashboard, which lets users explore longitudinal censorship data to quickly understand censors' behavior and censorship events. Censored Planet also published a report in February on how internet censorship analysis has changed since 2011.

“Taking literal measurements on the phone in Iran was hard and dangerous,” said Sarah Laplante, lead engineer for censorship measurement at Jigsaw. “Now, you can filter out what sites are broken, and get down to the question of what’s blocked in a specific country, or if there was a specific change in that country. This can potentially help put pressure on a government or ISP that has blocked a site.”

It also creates an opportunity for internet users who believe their access is censored to use circumvention tools, such as Jigsaw’s open-source Outline, a tool that anyone can use to set up a virtual private network and share access to the free and open internet.
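
The distinction Laplante draws, between sites that are simply broken and sites that are blocked in a specific country, can be illustrated with a small sketch. This is not Censored Planet's or Jigsaw's actual pipeline. The function name `classify_sites`, the `(site, country, reachable)` tuple layout, and the 0.8 failure threshold are all assumptions made for illustration.

```python
from collections import defaultdict


def classify_sites(measurements, failure_threshold=0.8):
    """Label each site from (site, country, reachable) measurement tuples.

    Hypothetical helper: the input layout and threshold are assumptions.
    """
    # Count failures and total probes for every (site, country) pair.
    counts = defaultdict(lambda: [0, 0])
    for site, country, reachable in measurements:
        tally = counts[(site, country)]
        tally[1] += 1
        if not reachable:
            tally[0] += 1

    # Failure rate per country, grouped by site.
    rates_by_site = defaultdict(dict)
    for (site, country), (failures, total) in counts.items():
        rates_by_site[site][country] = failures / total

    labels = {}
    for site, rates in rates_by_site.items():
        failing = sorted(c for c, r in rates.items() if r >= failure_threshold)
        if not failing:
            labels[site] = ("reachable", [])
        elif len(rates) == 1:
            # Only one vantage country: broken and blocked look the same.
            labels[site] = ("inconclusive", failing)
        elif len(failing) == len(rates):
            # Fails everywhere it was measured, so likely a site outage.
            labels[site] = ("broken", [])
        else:
            # Fails only in some countries, a candidate for blocking.
            labels[site] = ("possibly blocked", failing)
    return labels


sample = [
    ("a.example", "IR", False), ("a.example", "US", True),
    ("b.example", "IR", False), ("b.example", "US", False),
    ("c.example", "IR", True), ("c.example", "US", True),
]
labels = classify_sites(sample)
```

In this toy data, `a.example` fails only from one country and is flagged as possibly blocked there, while `b.example` fails from every country and is treated as broken. Real measurement data is far noisier, so a fixed threshold like this would only be a starting point.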

Censored Planet does not track website takedowns or on-device censorship, but the network-level censorship it does track nevertheless provides a rich trove of information. Censorship signals and data have become more detailed, and cloud technology has matured. Together, these changes have led Censored Planet to shift its mission.

When Censored Planet launched in 2018, they were focused on asking if a website was blocked. Now they want answers to next-level questions: Why was a site blocked? How was it blocked? When was it blocked, has it been blocked before, and who blocked it? Importantly for researchers, they want those answers to be available from a single, searchable, publicly accessible analytical platform.

“Our collaboration has been a game changer for censorship data analysis,” Ensafi said. “Google Cloud and processing clouds are the only way you can process this data. While we had resources maintained on Google Cloud and other Google services at the time, we found that our mission was a good match for Jigsaw. To be more open to collaboration, Google Cloud was a better fit for us.”

Digging into the data for deeper lessons

Businesses can measure what is happening to their websites and services when they are under the thumb of internet censorship, and they can provide that data to censorship-monitoring organizations to help them understand how and why they are being blocked. Even so, the utility of internet censorship analysis to businesses may not be readily apparent. That’s where the new data analysis techniques come in, said Maria Xynou, director of research and partnerships at OONI, and a frequent Censored Planet collaborator.

“Long-term measurement coverage can help us understand what kind of data is censored over time, and it can help us understand the political environment as censorship happens,” Xynou said. “OONI’s goal is to provide an historical archive of what’s been censored on the internet for future generations. But we also need to scale our infrastructure to meet growing demands of measurements, and publish data in real-time.”

Part of the challenge in monitoring and exposing internet censorship is that the censors can use many different technologies to achieve their goals. Like a game of digital whack-a-mole, the more censorship technologies that are in use, the harder it is to accurately analyze the multiple streams of censorship data.

Newer tracking efforts include studying the deployment of TLS middleboxes around the world, Xynou said. And she is hopeful that new analysis technologies driven by machine learning can help.

“We are developing a new analysis process that uses more machine learning, where we aggregate the different types of anomalies and the likelihood of those anomalies,” she said. “Some of the latest experiments have shown a huge improvement in accuracy.”

Jigsaw’s Fortuna echoed that sentiment. “We are transforming the way we analyze censorship data,” he said. “We are shifting from the question of whether a site is blocked, to something way more informative.”
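
To make Xynou's description of aggregating different anomaly types and their likelihoods more concrete, here is a minimal sketch. It is not OONI's method, which the article does not detail. It swaps in a simple "noisy-OR" combination, which assumes each anomaly type is an independent signal, and the name `combined_blocking_likelihood` and the example anomaly labels are hypothetical.

```python
def combined_blocking_likelihood(anomaly_likelihoods):
    """Combine per-anomaly likelihoods into one score using noisy-OR.

    anomaly_likelihoods maps an anomaly type (for example "dns" or "tls")
    to the estimated probability that it reflects real blocking.
    Assumes the signals are independent, which is a simplification.
    """
    probability_none = 1.0
    for anomaly_type, likelihood in anomaly_likelihoods.items():
        if not 0.0 <= likelihood <= 1.0:
            raise ValueError(f"likelihood for {anomaly_type!r} must be in [0, 1]")
        # Probability that this anomaly is NOT evidence of blocking.
        probability_none *= 1.0 - likelihood
    # Probability that at least one anomaly reflects real blocking.
    return 1.0 - probability_none


score = combined_blocking_likelihood({"dns": 0.5, "tls": 0.5})
```

Two weak signals of 0.5 each combine to 0.75 under this rule, which shows why aggregating several anomaly types can be more informative than judging each one alone. A learned model, as Xynou describes, would replace the independence assumption with weights fitted to data.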

More than 60 organizations in the internet freedom and technology community have requested access to Censored Planet data for their research or advocacy. While Censored Planet takes into consideration that legitimate content moderation and control are needed in some cases, including malware detection and parental controls, it’s clear that internet censorship is increasing everywhere.

Research conducted with Censored Planet’s data, combined with data from other censorship-monitoring organizations, proves that internet censorship is a global problem, said Xynou.

“Ten years ago it was mainly China and Iran doing the censoring. Now, there’s not a country in the world that doesn’t engage in some form of internet censorship,” she said.
