Semi-Supervised Learning with Graphs

This experiment takes in a graph with a small percentage of labeled instances (“seeds”) and propagates labels to all unlabeled items, based on user-provided similarity weights between related items. It is especially useful when customers have a graph of their data (or can easily generate one) but find labeling each node expensive or time-consuming.
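To make the idea of propagating seed labels concrete, here is a minimal sketch of one textbook approach: iterative label propagation with clamped seeds. It is not the experiment's actual algorithm, which is not described here. The function name `propagate_labels`, the `iterations` parameter, and the toy graph are all assumptions made for illustration.

```python
from collections import defaultdict


def propagate_labels(edges, seeds, iterations=10):
    """Spread seed labels across a weighted, undirected similarity graph.

    Illustrative only: a simple clamped label propagation, not the
    experiment's own method.
    """
    # Adjacency list: node -> list of (neighbor, weight).
    neighbors = defaultdict(list)
    for u, v, weight in edges:
        neighbors[u].append((v, weight))
        neighbors[v].append((u, weight))
    nodes = set(neighbors) | set(seeds)

    # Each node holds a label -> score map; seeds start fully confident.
    scores = {node: {} for node in nodes}
    for node, label in seeds.items():
        scores[node] = {label: 1.0}

    for _ in range(iterations):
        updated = {}
        for node in nodes:
            if node in seeds:
                # Seeds are clamped to their known label every round.
                updated[node] = {seeds[node]: 1.0}
                continue
            # Weighted vote over the neighbors' current label scores.
            totals = defaultdict(float)
            for nbr, weight in neighbors.get(node, []):
                for label, score in scores[nbr].items():
                    totals[label] += weight * score
            norm = sum(totals.values())
            updated[node] = (
                {label: s / norm for label, s in totals.items()} if norm else {}
            )
        scores = updated

    # Report the highest-scoring label for every node that received one.
    return {node: max(dist, key=dist.get) for node, dist in scores.items() if dist}


# Hypothetical toy graph: a chain with one weak link between "c" and "d".
edges = [("a", "b", 0.9), ("b", "c", 0.8), ("c", "d", 0.1),
         ("d", "e", 0.9), ("e", "f", 0.8)]
seeds = {"a": "cat", "f": "dog"}
labels = propagate_labels(edges, seeds)
```

In this toy example the weak edge between "c" and "d" acts as the boundary: "b" and "c" take the label of seed "a", while "d" and "e" take the label of seed "f".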


Intended use

This technology is based on an abstract data structure, the similarity graph, which consists of nodes and weighted edges. The nodes can represent any kind of object or entity, and the nodes of a graph can be either homogeneous or heterogeneous. As a result, the technology has been applied across a number of different domains, including images, text, videos, and apps. We expect external customers to take advantage of the general-purpose nature of the system by applying it to a diverse range of use cases.

For more context, read about semi-supervised graph-powered machine learning in this blog post.

Inputs and outputs:

  • Users provide:

    • Data set in TSV format that represents the weighted edges of a similarity graph
    • Seed and validation labels in TSV format associated with some subset of node IDs
  • Users receive:

    • Data set in TSV format with all graph nodes labeled
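The inputs and outputs above can be sketched with Python's standard `csv` module. The exact column layout is not specified here, so the sketch assumes `source<TAB>target<TAB>weight` for edge rows and `node_id<TAB>label` for label rows, with no header line. The helper names `read_edges`, `read_labels`, and `write_labels` are hypothetical.

```python
import csv
import io


def read_edges(tsv_file):
    """Parse edge rows; assumed layout: source, target, weight."""
    return [(src, dst, float(weight))
            for src, dst, weight in csv.reader(tsv_file, delimiter="\t")]


def read_labels(tsv_file):
    """Parse seed or validation rows; assumed layout: node_id, label."""
    return {node: label for node, label in csv.reader(tsv_file, delimiter="\t")}


def write_labels(labels, tsv_file):
    """Write one node_id, label row per node, sorted by node ID."""
    writer = csv.writer(tsv_file, delimiter="\t", lineterminator="\n")
    for node, label in sorted(labels.items()):
        writer.writerow([node, label])


# In-memory stand-ins for the TSV files a user would provide and receive.
edges = read_edges(io.StringIO("a\tb\t0.9\nb\tc\t0.4\n"))
seeds = read_labels(io.StringIO("a\tcat\n"))

output = io.StringIO()
write_labels({"a": "cat", "b": "cat", "c": "cat"}, output)
```

Using `io.StringIO` keeps the example self-contained; with real data the same functions would take file objects opened with `newline=""`, as the `csv` module recommends.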

Technical challenges: This technology is distinctive in its ability to handle very large graphs. The system scales to large numbers of nodes (tens of billions), edges (trillions), and labels (millions).

What data do I need?

Data and label types: Data are submitted as a graph represented in a TSV file. All nodes have unique IDs, some of the nodes are labeled, and every edge between two nodes is given a numerical weight that reflects how similar those nodes are.

While only 1-2% of the nodes need to be labeled, all possible labels should be represented among the labeled seeds.
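Before submitting data, it may help to check both conditions: what fraction of nodes carry a seed label, and whether every label appears at least once among the seeds. The sketch below is one way to do that. The function name `check_seeds` and its arguments are assumptions for illustration, not part of the experiment's interface.

```python
def check_seeds(node_ids, seeds, expected_labels):
    """Return (fraction of nodes that are seeded, labels with no seed)."""
    # Ignore seed entries whose node ID does not occur in the graph.
    labeled = {node: label for node, label in seeds.items() if node in node_ids}
    fraction = len(labeled) / len(node_ids) if node_ids else 0.0
    missing = set(expected_labels) - set(labeled.values())
    return fraction, missing


# Hypothetical data: 100 nodes, two seeds, three labels expected overall.
node_ids = {f"n{i}" for i in range(100)}
seeds = {"n0": "cat", "n1": "dog"}
fraction, missing = check_seeds(node_ids, seeds, {"cat", "dog", "bird"})
```

Here 2% of the nodes are seeded, which falls in the stated range, but "bird" is reported as missing, so at least one "bird" seed would need to be added.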

Specifications: The graph can contain up to:

  • Tens of billions of nodes
  • Trillions of edges
  • Millions of unique labels

What skills do I need?

As with all AI Workshop experiments, successful users are likely to be comfortable with core AI concepts and skills, which are needed both to deploy the experiment technology and to interact with our AI researchers and engineers.

In particular, users of this experiment should:

  • Have experience using graph data structures
  • Be familiar with accessing Google APIs