Scanning User-generated Content Using the Cloud Video Intelligence and Cloud Vision APIs

This solution describes how to use Google Cloud Platform (GCP) services to deploy a scalable system that filters image and video submissions using the Cloud Video Intelligence and Cloud Vision APIs.

Social marketing campaigns often invite consumers to submit user-generated images and videos, for example as contest entries, product testimonials, or content for public campaign websites. Processing these submissions at scale requires considerable resources. The Cloud Video Intelligence and Cloud Vision APIs offer a scalable, serverless way to implement intelligent image and video filtering, accelerating submission processing. By using the safe-search feature of the Cloud Vision API and the explicit content detection feature of the Cloud Video Intelligence API, you can eliminate images and videos that are identified as unsafe or undesirable before further processing.
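As a sketch of how such filtering might be applied downstream, the following pure-Python function interprets the likelihood ratings that the Vision API's safe-search annotation returns (fields such as adult, spoof, medical, violence, and racy); the rejection threshold used here is an assumption for illustration, not a value prescribed by the API:

```python
# Hypothetical post-processing of Cloud Vision safe-search results.
# The category names mirror the SafeSearchAnnotation fields; the
# "LIKELY" threshold below is an assumption, not an API default.

LIKELIHOOD_ORDER = [
    "UNKNOWN", "VERY_UNLIKELY", "UNLIKELY",
    "POSSIBLE", "LIKELY", "VERY_LIKELY",
]

def is_safe(safe_search, threshold="LIKELY"):
    """Return True if no category meets or exceeds the threshold."""
    limit = LIKELIHOOD_ORDER.index(threshold)
    return all(
        LIKELIHOOD_ORDER.index(rating) < limit
        for rating in safe_search.values()
    )

annotation = {"adult": "VERY_UNLIKELY", "violence": "POSSIBLE", "racy": "UNLIKELY"}
print(is_safe(annotation))  # True
```

A submission whose annotation passes this check would continue to normal processing; anything else would be dropped before it reaches reviewers.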


Building an application to process user-generated images and videos poses several unique challenges:

  • Scalability

    At the start of a campaign, the number of submissions grows quickly, but as the campaign winds down, submissions drop to almost zero. A service built to handle this process must scale in response to user activity.

  • Performance

    Processing each image and video requires an intelligent flow. At scale, the application must store and process each submitted image and video efficiently.

  • Intelligence

    Reducing the number of images or videos that must be evaluated or reviewed before processing greatly increases efficiency. The application needs to classify each submission and immediately stop processing any submission detected as inappropriate.


Scalable and intelligent processing using GCP

GCP provides a scalable platform with the benefits of pre-trained machine learning (ML) models available by using straightforward API calls. The following figure shows the architecture of a system that intelligently classifies images and videos, and filters out inappropriate results.

Image and video processing architecture

Cloud Storage

In this architecture, you store all content in Cloud Storage, which provides durable and scalable object storage. One useful feature of Cloud Storage is the ability to generate notification messages based on events in a Cloud Storage bucket. With this feature, you can specify an action for each uploaded file. As your application uploads files to Cloud Storage, notification messages trigger processing.
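One way to register such a notification configuration is with the gsutil tool; in this sketch the bucket name is a placeholder, and the topic name follows the example used later in this solution:

```shell
# Publish OBJECT_FINALIZE events (new uploads) from the bucket to a
# Pub/Sub topic. Replace my-uploads-bucket with your own bucket name.
gsutil notification create -t intelligentcontentfileupload -f json \
    -e OBJECT_FINALIZE gs://my-uploads-bucket
```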

The following image shows the JSON payload of a Cloud Storage notification message.

JSON payload of a Cloud Storage notification message
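As an illustration, a payload of this shape can be parsed to recover the uploaded object's location; the field names match the storage#object resource that Cloud Storage places in the message body, while the values here are invented examples:

```python
import json

# Illustrative notification payload (values are examples only).
payload = json.dumps({
    "kind": "storage#object",
    "bucket": "my-uploads-bucket",   # example bucket name
    "name": "submissions/cat.jpg",   # example object path
    "contentType": "image/jpeg",
    "size": "183940",
})

message = json.loads(payload)
gcs_uri = "gs://{}/{}".format(message["bucket"], message["name"])
print(gcs_uri)  # gs://my-uploads-bucket/submissions/cat.jpg
```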

Cloud Pub/Sub

Cloud Pub/Sub offers a scalable and reliable messaging service. In this architecture, when content is uploaded to the Cloud Storage bucket, Cloud Storage publishes a notification message to the configured Cloud Pub/Sub topic, which delivers the message to subscribers. By separating the upload functionality from the processing functionality, Cloud Pub/Sub decouples the application into a microservices-style backend architecture.

Cloud Functions

Cloud Functions provides a lightweight and serverless application environment integrated with a range of advanced APIs such as the Cloud Vision and Video Intelligence APIs. In addition, storage services such as BigQuery, Cloud Storage, Cloud Spanner, and Cloud Datastore are integrated with Cloud Functions, making it a useful way to process on-demand events.
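A minimal sketch of such a function, modeled on the GCStoPubSub function in this architecture, is shown below. In the Python runtime, a background function triggered by Cloud Pub/Sub receives the message body base64-encoded in the event's data field; here the function is invoked locally with a fabricated event, and a real implementation would republish the parsed location to a downstream topic rather than return it:

```python
import base64
import json

def gcs_to_pubsub(event, context=None):
    """Simplified sketch of a Pub/Sub-triggered background function."""
    # Pub/Sub delivers the message body base64-encoded in event["data"].
    message = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    # A real function would publish to the image or video topic here.
    return "gs://{}/{}".format(message["bucket"], message["name"])

# Local invocation with a fabricated event, for illustration only.
fake_data = base64.b64encode(
    json.dumps({"bucket": "my-uploads-bucket", "name": "clip.mp4"}).encode()
)
print(gcs_to_pubsub({"data": fake_data}))  # gs://my-uploads-bucket/clip.mp4
```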

The following figure shows the Cloud Pub/Sub topic triggering the GCStoPubSub Cloud Function for each message sent to the intelligentcontentfileupload topic.

Cloud Pub/Sub topic triggering the Cloud Function


A set of Cloud Functions that listen to Cloud Pub/Sub topics provides backend processing.

  1. Each time a file is uploaded to the Cloud Storage bucket, a Cloud Pub/Sub topic receives a message.
  2. The backend application parses the Cloud Storage location from each notification message and determines whether the content is an image or a video.
  3. The backend application then publishes a Cloud Pub/Sub message that triggers one of two Cloud Functions, which calls the Cloud Vision or Cloud Video Intelligence API depending on whether the file is an image or a video.
  4. Using the API call results, the backend application classifies the content based on logos, labels, text, and safe search results.
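The routing decision in step 3 can be sketched as a small function that maps a file's content type to the topic of the appropriate downstream Cloud Function; the topic names used here are assumptions for illustration:

```python
# Hypothetical topic names for the two downstream Cloud Functions.
IMAGE_TOPIC = "visionapiservice"
VIDEO_TOPIC = "videointelligenceservice"

def choose_topic(content_type):
    """Map a MIME content type to a downstream Pub/Sub topic."""
    if content_type.startswith("image/"):
        return IMAGE_TOPIC
    if content_type.startswith("video/"):
        return VIDEO_TOPIC
    return None  # unsupported type: skip API processing

print(choose_topic("image/png"))  # visionapiservice
print(choose_topic("video/mp4"))  # videointelligenceservice
```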

The following figure shows the processing steps that each Cloud Function performs. In this example, the results are stored in BigQuery, but you can store the results in any GCP data-storage service.

Cloud Function processing steps


BigQuery

In this architecture, you store the results of the image and video processing in BigQuery. As GCP's scalable analytics engine, BigQuery provides SQL-based access to petabyte-scale datasets, giving your application a simple and scalable way to analyze the labels generated for submitted content.
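For illustration, the API results for each file might be flattened into rows suitable for a streaming insert (for example with the BigQuery client's insert_rows_json method); the column names below are assumptions, not a schema this solution mandates:

```python
def to_bigquery_row(gcs_uri, labels, safe_search):
    """Flatten API results into one row for a streaming insert.

    Column names are illustrative; adapt them to your own schema.
    """
    return {
        "fileLocation": gcs_uri,
        "labels": ",".join(labels),
        "safeSearch": safe_search,
    }

row = to_bigquery_row("gs://my-uploads-bucket/cat.jpg",
                      ["cat", "pet"], "VERY_UNLIKELY")
# With the google-cloud-bigquery client this row could then be
# streamed with: client.insert_rows_json(table, [row])
print(row["labels"])  # cat,pet
```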

Data Studio

Data Studio provides a visual dashboard where you can filter submitted content by the tags that the machine learning APIs generate, making it easy to filter content dynamically across a set of dashboards.

The following image shows a simple Data Studio dashboard that is driven by real-time data from BigQuery.

Simple Data Studio dashboard

As additional content is processed and stored in BigQuery, the dashboard reflects the changes. You can filter the dashboard by content, labels, or inappropriate content count.
