Mito.ai: AI-powered text analytics built by Norwegian startup

About Mito.ai

Seeded by a grant from the Norwegian Research Council in 2016, Mito.ai developed a machine learning solution that can extract meaning from millions of daily news articles, social media posts, reports, and so on, across languages. The small Trondheim-based firm uses its innovative approach to natural language processing to provide clients with actionable business insights in batch or on demand from a single API.

Industries: Technology
Location: Norway

Mito.ai, a media intelligence and analytics startup in Trondheim, Norway, created a language-agnostic data processing platform that uses machine learning and knowledge graph engineering to make sense of tens of millions of daily news articles, blogs, social media posts, reports, and other unstructured data documents in multiple languages.

Google Cloud Results

  • Gains significant savings on computing power via reliable auto-scaling up and down
  • Boosts throughput 3x in less than 5 minutes, 6x in less than 15 minutes
  • Experiments at any scale without building or maintaining one-off build/deploy environments
  • Shrinks the native build environment to a single-node cluster with 2 CPUs, which runs Jenkins during off-peak hours

Dramatically minimizes DevOps resources

The demand for AI-assisted analytics is rising sharply. As retailers, publishers, financial services companies, and others look to capitalize on new business opportunities, text analytics can cue timely business insights and reveal new strategies for reaching and serving end-users. Scaling quickly to sort and reliably analyze vast amounts of unstructured data in content worldwide is key.

Mito.ai relies on Google Cloud Platform (GCP) to deploy, operate, and deliver results in real time. The company, which has brought together a small development team to focus on creating powerful AI solutions, is well positioned to ride a market projected to double in size to nearly $8 billion by 2022.

Marit Rødevand, Patrick Skjennum, and Sigve Søråsen, the company's co-founders, all met through the Norwegian University of Science and Technology (NTNU). Keen on AI, the Department of Computer Science championed Patrick's Master's project. His work laid out a promising path for extracting meaning from unstructured data in multiple languages by using a knowledge graph to abstract, chunk, and categorize content.

"Building DevOps and production environments from the ground up for our startup was not an option. Google Cloud Platform is a vital enabling resource."

Marit Rødevand, CEO and Co-founder, Mito.ai

Marit quickly saw its commercial potential for the financial and media industries. With a grant from the Norwegian Research Council and a prominent early adopter, Norway's leading financial news outlet (Dagens Næringsliv — dn.no), Mito.ai was launched.

Mito.ai traverses the huge labyrinth of unstructured data using state-of-the-art natural language processing (NLP) and machine learning (ML), making insights available through intuitive and powerful customer APIs.

The Mito.ai GraphQL APIs, secured by Auth0, are designed to support extremely nuanced queries. According to Marit, one such query may result in fetching "the most relevant buying signals from pharmaceutical companies headquartered in New York City whose earnings exceeded $10M in revenue." The ability to apply filters and constraints at this human level of abstraction makes querying Mito.ai's system simple yet powerful, and often reduces the number of returned documents by several orders of magnitude compared with traditional media monitoring systems. This can create value in numerous areas of application, but the first product Mito.ai is bringing to market is a tool that helps sales organizations prioritize and understand their B2B customers and prospects.
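To make the idea concrete, here is a minimal sketch of how a client might package such a query as a GraphQL request. The schema (the `signals` field, its filter arguments, and the returned fields) is hypothetical and invented for illustration; the source does not describe Mito.ai's actual schema or endpoint. Only the general request shape, a JSON body with `query` and `variables`, follows the common GraphQL-over-HTTP convention.

```python
import json

# Hypothetical query: field and argument names are assumptions,
# not Mito.ai's published schema.
QUERY = """
query BuyingSignals($industry: String!, $city: String!, $minRevenue: Float!) {
  signals(industry: $industry, headquarters: $city, minRevenue: $minRevenue) {
    company
    headline
    relevance
  }
}
"""


def build_request(industry: str, city: str, min_revenue: float) -> str:
    """Serialize a GraphQL request body (query plus variables) as JSON."""
    payload = {
        "query": QUERY,
        "variables": {
            "industry": industry,
            "city": city,
            "minRevenue": min_revenue,
        },
    }
    return json.dumps(payload)


# Mirrors the example in the text: pharma companies in New York City
# with revenue above $10M.
body = build_request("pharmaceuticals", "New York City", 10_000_000)
```

In practice the body would be sent over HTTPS together with an access token, since the APIs are secured by Auth0; that step is left out here because the endpoint and token flow are not given in the source.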

Example of how Mito.ai's pipeline filters down search results for news articles, matching a humanly abstracted query with the corresponding content represented as stories, for one of its customers.

Customers, for example, can specify a Story object that traverses semantically chunked articles to return only the most relevant story points across blogs and articles while eliminating duplicates.
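One simple way to picture duplicate elimination is near-duplicate grouping by word overlap. The sketch below is not Mito.ai's method, which relies on semantic chunking and a knowledge graph; it swaps in Jaccard similarity over word sets, a textbook technique, and the 0.6 threshold and sample headlines are assumptions chosen for illustration.

```python
def tokens(text: str) -> set[str]:
    """Lowercased word set; punctuation is stripped crudely for illustration."""
    return {w.strip(".,!?").lower() for w in text.split() if w.strip(".,!?")}


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union (0.0 to 1.0)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_into_stories(articles: list[str], threshold: float = 0.6) -> list[list[str]]:
    """Greedy grouping: attach each article to the first story whose
    representative (its first article) is similar enough, else start a new story."""
    stories: list[list[str]] = []
    for article in articles:
        for story in stories:
            if jaccard(tokens(article), tokens(story[0])) >= threshold:
                story.append(article)
                break
        else:
            # No existing story matched, so this article opens a new one.
            stories.append([article])
    return stories


# Hypothetical headlines: the first two are near-duplicates.
articles = [
    "Acme Pharma opens new research lab in Oslo",
    "Acme Pharma opens a new research lab in Oslo",
    "Nordic Bank reports record quarterly earnings",
]
stories = group_into_stories(articles)
```

Here the first two headlines land in one story and the third in another, so a client would see two story points instead of three articles.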

Since Mito.ai technology is language agnostic, the company is well positioned in linguistically fragmented markets like those in Europe and Asia. The solutions currently support Norwegian, Swedish, Danish, English, French, Spanish, and German, with support for more languages underway.

Customers specify a Story object to refine and filter content queries.

Scaling the Opportunity

From the outset, the challenge was the scale required to achieve Mito.ai's potential. To provide deep insights to its clients, Mito.ai needed to analyze millions of documents from day one.

"Building DevOps and production environments from the ground up for our startup was not an option," says Marit, noting Mito.ai's ingestion of tens of millions of published news articles, blogs, and reports daily. "Google Cloud Platform is a vital enabling resource."

The company's initial proof-of-concept ran on Ubuntu VMs in Compute Engine. It consisted of a single Apache Spark Streaming pipeline that continuously read and parsed textual content from RSS feeds. A simple NLP pipeline, including a knowledge graph held in an in-memory Redis store, made sense of the content.

Making it extensible was the next milestone. "We started splitting the system into separate modules for content ingestion, content analysis, and knowledge enrichment — all of which were connected through messaging queues using Cloud Pub/Sub," explains Patrick. The company's knowledge base was moved from Redis into Elasticsearch, and the Spark clusters were orchestrated by YARN and ZooKeeper. Mito.ai swapped its early NLP pipeline for a more sophisticated and modular microservice setup.
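The modular layout Patrick describes can be sketched with in-process queues standing in for Cloud Pub/Sub topics. Everything here is illustrative: the stage functions, the toy entity table, and the message fields are assumptions, and Python's `queue.Queue` only mimics the decoupling that a real messaging service provides.

```python
import queue

# Toy knowledge base mapping a surface form to a canonical entity
# (hypothetical data, not Mito.ai's knowledge graph).
KNOWLEDGE_BASE = {
    "dn": "Dagens Næringsliv",
    "ntnu": "Norwegian University of Science and Technology",
}


def ingest(raw_docs, out_q):
    """Ingestion stage: wrap raw text in a message and publish it."""
    for i, text in enumerate(raw_docs):
        out_q.put({"id": i, "text": text})


def analyze(in_q, out_q):
    """Analysis stage: tokenize each message and pass it downstream."""
    while not in_q.empty():
        msg = in_q.get()
        msg["tokens"] = [t.strip(".,").lower() for t in msg["text"].split()]
        out_q.put(msg)


def enrich(in_q):
    """Enrichment stage: link tokens to entities in the knowledge base."""
    results = []
    while not in_q.empty():
        msg = in_q.get()
        msg["entities"] = [KNOWLEDGE_BASE[t] for t in msg["tokens"] if t in KNOWLEDGE_BASE]
        results.append(msg)
    return results


# Each queue plays the role of one topic between two modules.
ingested, analyzed = queue.Queue(), queue.Queue()
ingest(["DN covers a startup from NTNU.", "No known entities here."], ingested)
analyze(ingested, analyzed)
enriched = enrich(analyzed)
```

Because each stage only reads from one queue and writes to the next, any stage could be replaced or scaled independently, which is the property the split into modules was meant to provide.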

"Our initial reason for choosing Google Cloud Platform over AWS and Azure was the superior support for Apache Spark through Cloud Dataproc. Managing and running Spark jobs went from being a constant struggle with high costs of operations to becoming automatically managed and scalable."

Patrick Skjennum, CTO and Co-founder, Mito.ai

Though the system was now extensible, responsiveness, maintainability, and deployment quickly emerged as stumbling blocks. The solution requires running thousands of concurrent ML models to consume and sort ingested data. Prior to adopting GCP, Mito.ai had to manually deploy 10 services to different machines, each with different requirements, dependencies, and configurations.

This created two related problems for developers: the complexity of managing physical resources and the lack of convenient scaling. "Not only was it expensive, but also hard to maintain," says Marit. "Developers spent a considerable portion of their time manually operating, monitoring, and maintaining the system."

Automating the AI Pipe

An obvious approach for Mito.ai was outsourcing as much infrastructure and administration as possible to the cloud. "After having burned through the free credits of the Google Cloud Platform trial program, our minds were made up," says Marit.

Explains Patrick, "Our initial reason for choosing Google Cloud Platform over AWS and Azure was the superior support for Apache Spark through Cloud Dataproc. Managing and running Spark jobs went from being a constant struggle with high costs of operations to becoming automatically managed and scalable."

Google Kubernetes Engine (GKE) and Cloud Dataproc were crucial to Mito.ai's successful revamp. Cloud Dataproc, which launches and tears down clusters backed by Compute Engine VMs on the fly to meet processing loads, lets the team stay focused on analytics, not IT. The GKE container environment further accelerates system deployment and greatly streamlines Mito.ai's IT administration.

Mito.ai uses GCP and Kubernetes to help automate its deployment and build pipeline.

"We re-architected the system so we could have everything automatically deployed, managed, and monitored in Google Cloud Platform," explains Patrick. The solution's content processors, ML services, and APIs are conveniently contained in stateless microservices running in Kubernetes. Mito.ai runs its data ingestion (millions of articles a day) through Cloud Storage and the messaging queues use Cloud Pub/Sub.

When analyzing all of those blogs, news articles, reports, and social media posts while constantly adding new sources, the system's ability to scale is paramount. The redesign around GKE produced a horizontally scalable system. By using the autoscaling features of GCP, Mito.ai could triple or even quadruple its processing power within minutes. GCP is proving agile all around for the startup, and it saves money by scaling cloud resources down when traffic drops.

An example is Mito.ai's self-hosted NLP service based on state-of-the-art natural language frameworks. While the frameworks are optimized for batch processing multiple documents, GKE helps Mito.ai serve them through a low-latency single-document API. When experiencing an increase in load, the solution scales the number of workers, deployed as pods on Kubernetes, until demand is met.
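A rough sketch of how "scale until demand is met" can be computed follows. It uses the proportional rule documented for the Kubernetes Horizontal Pod Autoscaler, where the desired replica count is the current count multiplied by the ratio of observed to target load, rounded up. Whether Mito.ai uses that autoscaler, and the load figures and replica bounds below, are assumptions.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_load: float,
    target_load: float,
    min_replicas: int = 1,
    max_replicas: int = 50,
) -> int:
    """Proportional scaling: ceil(current * observed / target), clamped to bounds.

    Loads are per-pod averages in any consistent unit (for example requests/s).
    """
    if target_load <= 0:
        raise ValueError("target_load must be positive")
    raw = math.ceil(current_replicas * current_load / target_load)
    return max(min_replicas, min(max_replicas, raw))


# Hypothetical spike: 4 pods each seeing 150 req/s against a 50 req/s target.
scaled_up = desired_replicas(4, 150.0, 50.0)

# Hypothetical lull: 12 pods each seeing 10 req/s against the same target.
scaled_down = desired_replicas(12, 10.0, 50.0)
```

With these numbers the spike triples the worker count from 4 to 12, and the lull shrinks it from 12 to 3, which is the same up-and-down behavior the text credits for both throughput and cost savings.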

Cloud Dataproc offers Mito.ai a simple, efficient way to run Apache Spark clusters to support the company's demanding data processing and ML requirements. Google Kubernetes Engine powers rapid deployments, manages containers, and supports API and client services. It also helps move data to and from distributed GCP storage. Built on GCP, the system scales in real time to meet peak demand and tears down Compute Engine VMs as demand shrinks.

All of Mito.ai's code is hosted on GitHub and automatically deployed in the Kubernetes environment by Jenkins using Helm, a package manager for Kubernetes. Because GCP has also enabled the team to develop a system that can be managed easily via the internet, they have more flexibility to deal with any issue at any time. "Deploying a new service from concept to production can usually be done within an hour," says Patrick. "It's so easy that team members have sometimes deployed a critical bug fix from public transportation or the occasional bar."
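As an illustration of what a Jenkins job might run at the deploy step, the sketch below assembles a `helm upgrade --install` command. The release name, chart path, namespace, and the `image.tag` value key are hypothetical; the source does not show Mito.ai's charts or pipeline scripts. The command is only built, not executed.

```python
import shlex


def helm_deploy_command(release: str, chart: str, namespace: str, image_tag: str) -> list[str]:
    """Build an argv list for an idempotent Helm deploy.

    `upgrade --install` installs the release if it is missing and upgrades it
    otherwise; `--wait` blocks until the release's resources are ready.
    """
    return [
        "helm", "upgrade", "--install", release, chart,
        "--namespace", namespace,
        "--set", f"image.tag={image_tag}",  # assumes the chart exposes image.tag
        "--wait",
    ]


# Hypothetical service and commit hash, for illustration only.
cmd = helm_deploy_command("nlp-service", "./charts/nlp-service", "production", "abc1234")
printable = shlex.join(cmd)
```

Passing the list to `subprocess.run(cmd, check=True)` would perform the deploy on a machine with Helm and cluster credentials configured; keeping the command as a list avoids shell quoting problems.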

Beyond the GCP benefits already noted by Mito.ai's developers, the company found another advantage to operating on Google Cloud and a containerized platform: greater freedom to experiment. It became easier to explore alternate setups and run environments with new features because these could be reverted quickly and reliably. "Our intellectual property is continuously expanding largely because GCP makes experimenting easy," says Marit.

"You could say that Google Cloud Platform has become our operations department, and that's been like adding two full-time developers to our team."

Patrick Skjennum, CTO and Co-founder, Mito.ai

More Focus on IP

One of the biggest takeaways from moving from bare-bones machines to a more supported and comprehensive environment built on GCP is that developers can spend practically all their time writing code. That gives the team more time to create solutions and get them to market sooner.

"The ability to spin up dozens of machines with the click of a button has made it possible to test and deploy ML and big data analytical services in a matter of minutes, which previously could have taken hours or days," explains Marit.

Developing and delivering services has become so manageable that Mito.ai does not even have an operations department. "With Google Kubernetes Engine and powerful templating tools in Helm, DevOps has been reduced to modifying YAML files, pushing them to GitHub, and watching it build and deploy on the Jenkins monitor at our office," says Patrick. "You could say that Google Cloud Platform has become our operations department, and that's been like adding two full-time developers to our team."

Contributors to this story

Marit Rødevand: Mito.ai CEO and Co-founder. Second-time founder. Co-founded Rendra, a construction SaaS company acquired by JDMT. Marit founded Mito.ai while working as entrepreneur-in-residence at the Norwegian University of Science and Technology (NTNU), where she earned her MSc in Engineering Cybernetics and Entrepreneurship.

Patrick Skjennum: Mito.ai CTO and Co-founder. Patrick earned his MSc in Computer Science from NTNU, with a focus on multilingual news article classification using embedded words.
