OnCrawl: Pushing websites up the ranks with data-based SEO analysis, powered by Google Cloud

About OnCrawl

There’s no crawl that OnCrawl can’t handle. The enterprise technical SEO platform from Cogniteev helps customers analyze their websites with a powerful web crawler and a suite of related tools. With interactive, data-based analysis of custom data segments, OnCrawl helps websites of any size improve their rankings, boosting conversions and online revenues.

Industries: Technology
Location: France

Leveraging the scalability and processing power of Google Cloud, OnCrawl helps customers boost their SEO rankings across search engines by analyzing even the largest sites, on demand.

Google Cloud results

  • Puts customers first, delivering actionable SEO results on demand for any project size by autoscaling with Google Kubernetes Engine
  • Speeds time to market for new features and third-party integrations, increasing deployment frequency by more than 200%
  • Frees up IT teams from maintenance tasks, allowing faster development and more innovation

Data-based analysis boosts traffic for customers by up to 70%

Any business that wants to establish its brand needs to impress its potential customers—but before it can do that, it needs to be discoverable. And that means impressing web crawlers first. Crawlers help search engines to quickly compile information on the web, categorize it into an index, and send it to the results page. Every business with a website wants its URLs to rank as highly as possible on that page so that customers can find it easily. That’s where cloud-based search engine optimization (SEO) platform OnCrawl comes in.

Businesses small and large use OnCrawl to understand how crawlers analyze web pages, so they can optimize them for better results. The service can help customers discover errors in on-page optimization, HTML, or linking structures that can compromise search-engine rankings and cause drops in traffic. A furniture store, for example, might use OnCrawl to evaluate which product categories, such as “bedroom” or “cabinets,” typically bring searchers to their website and study correlations to establish which characteristics make these categories most successful in search engines. Online magazines can boost the reach of their articles by monitoring the time it takes for them to be listed in search engines and by optimizing their content for keywords and their site structures for better search snippets.

With connectors to all the major analytics platforms and other data sources, customers can then add the gathered data to the visualization dashboards of their choice, making actionable SEO insights available on the fly. Today, the company handles 7,000 crawls per month, or more than 250 million monthly crawled URLs.

“Our customers need to be able to use our platform for whatever they need, whenever they want,” says Philippe David, Chief Technology Officer at OnCrawl. “To make that possible, we need technology that’s automated, always-on, and able to support resource-intensive tasks. Choosing Google Cloud early on allowed us to scale from zero to where we are today.”

Delivering faster crawls for customers, on demand

Different websites bring different challenges for crawlers. JavaScript websites, for example, are more dynamic and resource-heavy, which makes them harder to crawl than regular websites. But even regular crawls require a set of tasks to run in perfectly automated sequences, including fetching the data and processing it with Apache Spark. To make its crawls more reliable and more powerful, OnCrawl started adopting a more microservices-focused approach with Google Kubernetes Engine (GKE) in 2017.

GKE enables OnCrawl’s developers to deploy new features more regularly and automate tasks that used to require manual maintenance. Instead of provisioning and maintaining a cluster for every Spark job, OnCrawl uses Dataproc to spawn ephemeral clusters automatically and remove them once the job is complete. That automates crawls further and makes clusters easier and cheaper to maintain.
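The create, run, remove lifecycle of an ephemeral cluster can be sketched as follows. This is a minimal illustration only: `FakeClusterService`, `ephemeral_cluster`, and `run_job` are hypothetical names, and the in-memory service stands in for a real cluster API rather than reproducing Dataproc's.

```python
from contextlib import contextmanager


class FakeClusterService:
    """Hypothetical in-memory stand-in for a cluster service (not the Dataproc API)."""

    def __init__(self):
        self.active = set()  # names of clusters that currently exist
        self.log = []        # ordered record of lifecycle events

    def create(self, name):
        self.active.add(name)
        self.log.append(("create", name))

    def delete(self, name):
        self.active.discard(name)
        self.log.append(("delete", name))


@contextmanager
def ephemeral_cluster(service, name):
    """Create a cluster for one job and always remove it afterwards."""
    service.create(name)
    try:
        yield name
    finally:
        # Runs whether the job succeeded or raised, so no cluster is left behind.
        service.delete(name)


def run_job(service, job_name, job):
    """Run `job` on a cluster that exists only for the duration of the call."""
    with ephemeral_cluster(service, f"cluster-{job_name}") as cluster:
        service.log.append(("run", cluster))
        return job(cluster)


service = FakeClusterService()
result = run_job(service, "crawl-42", lambda cluster: f"processed on {cluster}")
```

Putting the deletion in a `finally` clause is the design point: the cluster is removed even when the job fails, which is what keeps per-job clusters cheap.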

For customers, more frequent deployments and automatic scaling translate to a better user experience with faster crawls, available on demand. Because capacity scales with demand, smaller clients don’t lose out on quality when large customers need more resources. “Google Kubernetes Engine enables us to tackle gigantic projects immediately, allowing our customers to crawl even hundreds of millions of JavaScript pages without prior notice,” says Philippe. “We’ve seen our JavaScript crawler scale up from 10 to 750 pods in just a few minutes without issue.”
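To make the scaling arithmetic concrete, here is a simplified sketch of the replica rule documented for the Kubernetes Horizontal Pod Autoscaler: the desired count is the current count multiplied by the ratio of observed to target load, rounded up and clamped to configured bounds. The source does not say which autoscaling mechanism or metric OnCrawl uses, so the metric (queued pages per pod) and all numbers below are assumptions, and the real autoscaler also applies tolerances and stabilization windows that are left out here.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas, max_replicas):
    """Replica count from the ratio of observed load to target load.

    desired = ceil(current_replicas * current_metric / target_metric),
    then clamped to [min_replicas, max_replicas].
    """
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


# Hypothetical burst: 10 pods, 1,500 queued pages per pod, target of 20 per pod.
burst = desired_replicas(10, 1500, 20, min_replicas=10, max_replicas=750)

# Hypothetical overload: the raw result (2,500) exceeds the ceiling of 750.
capped = desired_replicas(10, 5000, 20, min_replicas=10, max_replicas=750)

# Hypothetical wind-down: load drops to 1 queued page per pod across 750 pods.
wind_down = desired_replicas(750, 1, 20, min_replicas=10, max_replicas=750)

# Idle: no load at all, so the floor of 10 pods applies.
idle = desired_replicas(750, 0, 20, min_replicas=10, max_replicas=750)
```

With these assumed inputs, a sudden backlog takes the crawler from its floor of 10 pods to the ceiling of 750, and the same rule shrinks it again once the queue drains.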

Enabling smoother, more integrated data processes

Large or small, what OnCrawl’s customers increasingly want is data access. By building a 500 TB data lake with Cloud Storage, OnCrawl created a singular data culture throughout the company, processing data at scale with BigQuery. With its ready-made libraries, BigQuery is easy to integrate with analytics and visualization tools, which enables OnCrawl to provide customers with exactly the data they want, in the tools they’re using. On top of this, the connectivity of the Google Cloud data solutions makes it easy to integrate new tools as the platform evolves.

In 2020, OnCrawl responded to the growing popularity of Looker Studio with a dedicated connector for the service. With the infrastructure already in place, OnCrawl’s developers just needed to leverage API integrations between the Google Cloud services. In today’s setup, BigQuery automatically fetches data stored in Cloud Storage in the Apache Parquet format and exports it to Looker Studio.
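One way BigQuery can read Parquet files that live in Cloud Storage is through an external table. The source does not say which mechanism OnCrawl uses, so the sketch below is only an illustration: it builds the DDL statement as a string, and the helper name, project, dataset, table, and bucket are all hypothetical.

```python
def parquet_external_table_ddl(table, uris):
    """Build a BigQuery DDL statement for an external table over Parquet files."""
    if not uris:
        raise ValueError("at least one Cloud Storage URI is required")
    for uri in uris:
        if not uri.startswith("gs://"):
            raise ValueError(f"not a Cloud Storage URI: {uri}")
    uri_list = ", ".join(f"'{uri}'" for uri in uris)
    return (
        f"CREATE OR REPLACE EXTERNAL TABLE `{table}`\n"
        f"OPTIONS (\n"
        f"  format = 'PARQUET',\n"
        f"  uris = [{uri_list}]\n"
        f")"
    )


# Hypothetical table and bucket names, for illustration only.
ddl = parquet_external_table_ddl(
    "example-project.crawls.pages",
    ["gs://example-crawl-lake/crawl-0001/*.parquet"],
)
print(ddl)
```

Once such a table exists, a reporting tool connected to BigQuery can query it like any other table while the data itself stays in Cloud Storage.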

With the Looker Studio connector, customers can send crawl data to their dashboard automatically, where it’s updated in real time as new data comes in. This makes large amounts of data immediately digestible and converts it into actionable insights for marketing departments, managers, or clients.

“Making the data available for our customers in the formats they need has become an integral part of our SEO work,” says Rebecca Berbel, Content Manager at OnCrawl. “Once we saw a huge increase in the use of Looker Studio as a visualization device, our Google Cloud infrastructure made it incredibly easy to respond to that demand and give our customers the features they’re asking for.”

Preparing, analyzing, and exporting data quickly

The more data OnCrawl’s customers gather, the more effort it takes to make sense of it. With advanced segmentation, OnCrawl empowers customers to analyze their websites more systematically, for example to discover trends or anomalies based on types of pages or specific metrics. With custom segmentation, users can cross-analyze data at the most meaningful intersections for their business, revealing actionable patterns other SEO platforms might miss.
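Cross-analyzing two segmentations amounts to grouping pages by a pair of labels and aggregating a metric per pair. The sketch below shows the idea on a handful of made-up records; the field names, segment rules, and numbers are hypothetical and do not reflect OnCrawl's schema or implementation.

```python
from collections import defaultdict

# Hypothetical crawl records; field names are illustrative only.
pages = [
    {"url": "/bedroom/oak-bed", "depth": 2, "organic_visits": 120},
    {"url": "/bedroom/pine-bed", "depth": 4, "organic_visits": 10},
    {"url": "/cabinets/tall", "depth": 2, "organic_visits": 80},
    {"url": "/cabinets/wide", "depth": 5, "organic_visits": 0},
]


def page_type(page):
    """Segment by the first path component, e.g. 'bedroom'."""
    return page["url"].strip("/").split("/")[0]


def depth_band(page):
    """Segment by click depth: shallow (at most 3 clicks) or deep."""
    return "shallow" if page["depth"] <= 3 else "deep"


def cross_segment(pages, seg_a, seg_b, metric):
    """Average `metric` for every (seg_a, seg_b) intersection."""
    buckets = defaultdict(list)
    for page in pages:
        buckets[(seg_a(page), seg_b(page))].append(page[metric])
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}


report = cross_segment(pages, page_type, depth_band, "organic_visits")
```

In this toy data set, the intersection view shows that deep pages underperform in both categories, a pattern that neither segmentation would reveal on its own.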

“The processing power and on-demand scalability of Google Cloud enable us to prepare, analyze, and export data quickly, even in custom segmentations our users created,” says Vincent Terrasi, Head of Data and Product at OnCrawl. “That’s why we’re the only solution on the market that can analyze segmentation data in real time.”

OnCrawl’s ongoing quest to make SEO data more actionable has earned the company many awards, including Best Global SEO Software 2020. But the real reward is the success of its customers. “Some have increased organic traffic to their websites by up to 70% or more by improving their site structures based on accessible SEO insights,” says Rebecca.

Meanwhile, today’s web crawlers are exploring new territories as well. Using machine learning models, search engine algorithms are becoming ever more sophisticated and difficult to analyze. To keep scoring front-row seats in the search engine arena, OnCrawl’s customers will need even better SEO data in the future. That’s why OnCrawl has begun to revamp components to process all the data it gathers with machine learning algorithms. Having the Google Cloud infrastructure in place gives OnCrawl the confidence to adopt computationally heavy machine learning technologies quickly.

“Machine learning is the future of SEO,” says Vincent. “With all the processing power we need and automatic scaling, Google Cloud enabled us to transform OnCrawl into a machine learning platform. That has put us well on our way to become the global leader for data-based SEO analysis in machine learning environments.”
