Processing User-Generated Content Using the Video Intelligence and Cloud Vision APIs (tutorial)

This tutorial shows you how to deploy a set of Cloud Functions to process images and videos with the Cloud Vision API and Cloud Video Intelligence API. The functionality is described in Processing User-Generated Content Using the Video Intelligence and Cloud Vision APIs.

Follow this tutorial to deploy Cloud Functions and other Google Cloud Platform (GCP) components necessary for the solution.

Objectives

  • Create Cloud Storage buckets for uploaded, filtered, and flagged files.
  • Create Cloud Pub/Sub topics and a Cloud Storage notification.
  • Create a BigQuery dataset and table to store the analysis results.
  • Deploy the Cloud Functions that call the Vision and Video Intelligence APIs.
  • Test the flow by uploading files and querying the results in BigQuery.

Costs

This tutorial uses billable components of Cloud Platform, including:

  • Cloud Functions
  • Cloud Pub/Sub
  • Cloud Storage
  • BigQuery
  • Vision API
  • Video Intelligence API

You might be able to complete this tutorial within the free tier limits. Without the free tier, the cost is approximately $5 for testing 10 images and 10 videos. Use the Pricing Calculator to generate a cost estimate based on your projected usage. New Cloud Platform users might be eligible for a free trial.

Before you begin

  1. Create a Google Cloud Platform (GCP) project, or use an existing one.

    Go to the Projects Page

  2. Make sure that billing is enabled for your Google Cloud Platform project.

    Learn how to enable billing

  3. Enable the Cloud Functions, Cloud Storage, BigQuery, Vision, and Video Intelligence APIs.

    Enable the APIs

  4. Install and initialize the Google Cloud SDK.

  5. Update and install the gcloud components:

    gcloud components update && gcloud components install beta
  6. Prepare your environment for Node.js development.

    Go to the setup guide
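
If you prefer the command line, you can also enable the required APIs with the gcloud services enable command instead of using the console. The service IDs below are the standard identifiers for the APIs listed above; adjust the list if some are already enabled in your project:

    gcloud services enable cloudfunctions.googleapis.com storage-component.googleapis.com bigquery.googleapis.com vision.googleapis.com videointelligence.googleapis.com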

Architecture

The following diagram outlines the high-level architecture:

high-level architecture

Creating Cloud Storage buckets

Cloud Storage buckets provide a storage location for uploading your images and videos. Follow these steps to create four different Cloud Storage buckets:

  1. Create a bucket for storing your images and video files. Replace [IV_BUCKET_NAME] with a valid Cloud Storage bucket name.

    gsutil mb gs://[IV_BUCKET_NAME]
  2. Create a bucket for storing your filtered image and video files. Replace [FILTERED_BUCKET_NAME] with a valid Cloud Storage bucket name.

    gsutil mb gs://[FILTERED_BUCKET_NAME]
  3. Create a bucket for storing your flagged image and video files. Replace [FLAGGED_BUCKET_NAME] with a valid Cloud Storage bucket name.

    gsutil mb gs://[FLAGGED_BUCKET_NAME]
  4. Create a bucket for your Cloud Functions to use as a staging location. Replace [STAGING_BUCKET_NAME] with a valid Cloud Storage bucket name.

    gsutil mb gs://[STAGING_BUCKET_NAME]
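
For example, with hypothetical bucket names (Cloud Storage bucket names must be globally unique, so choose your own), the four commands might look like this:

    gsutil mb gs://example-iv-bucket
    gsutil mb gs://example-filtered-bucket
    gsutil mb gs://example-flagged-bucket
    gsutil mb gs://example-staging-bucket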

Creating Cloud Pub/Sub topics

You use Cloud Pub/Sub topics for Cloud Storage notification messages and for messages between your Cloud Functions.

  1. Create a topic to receive Cloud Storage notifications whenever one of your files is uploaded to Cloud Storage. Replace [UPLOAD_NOTIFICATION_TOPIC] with a valid Cloud Pub/Sub topic name.

    gcloud pubsub topics create [UPLOAD_NOTIFICATION_TOPIC]
  2. Create a topic to receive your messages from the Vision API. Replace [VISION_TOPIC_NAME] with a valid topic name. The default value in the config.json file is visionapiservice.

    gcloud pubsub topics create [VISION_TOPIC_NAME]
  3. Create a topic to receive your messages from the Video Intelligence API. Replace [VIDEOIQ_TOPIC_NAME] with a valid topic name. The default value in the config.json file is videointelligenceservice.

    gcloud pubsub topics create [VIDEOIQ_TOPIC_NAME]
  4. Create a topic to receive your messages to store in BigQuery. Replace [BIGQUERY_TOPIC_NAME] with a valid topic name. The default value in the config.json file is bqinsert.

    gcloud pubsub topics create [BIGQUERY_TOPIC_NAME]
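
For example, using intelligentcontentfileupload for the upload notification topic (the name that appears in the sample deployment output later in this tutorial) and the config.json defaults for the other three topics:

    gcloud pubsub topics create intelligentcontentfileupload
    gcloud pubsub topics create visionapiservice
    gcloud pubsub topics create videointelligenceservice
    gcloud pubsub topics create bqinsert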

Creating Cloud Storage notifications

  1. Create a notification that is triggered only when a new object is placed in the Cloud Storage file upload bucket. Replace [UPLOAD_NOTIFICATION_TOPIC] with your topic name and [IV_BUCKET_NAME] with your file upload bucket name.

    gsutil notification create -t [UPLOAD_NOTIFICATION_TOPIC] -f json -e OBJECT_FINALIZE gs://[IV_BUCKET_NAME]
  2. Confirm that your notification has been created for the bucket:

    gsutil notification list gs://[IV_BUCKET_NAME]

    If the notification was created successfully, the output is similar to the following:

    Filters:
            Event Types: OBJECT_FINALIZE
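
For example, with the hypothetical names from the previous sections, the create command in step 1 would be:

    gsutil notification create -t intelligentcontentfileupload -f json -e OBJECT_FINALIZE gs://example-iv-bucket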

Creating the BigQuery dataset and table

The results of the Vision and Video Intelligence APIs are stored in BigQuery.

  1. Create your BigQuery dataset. Replace [PROJECT_ID] with your project ID and [DATASET_ID] with your dataset name. The [DATASET_ID] default value in the config.json file is intelligentcontentfilter.

    bq --project_id [PROJECT_ID] mk [DATASET_ID]
  2. Create your BigQuery table from the schema file. The bq_schema.json file is part of the source repository that you download in the Deploying the Cloud Functions section, so make sure you have a local copy before running this command. Replace [PROJECT_ID] with your project ID and [DATASET_ID].[TABLE_NAME] with your dataset ID and table name. In the config.json file, the default value for [DATASET_ID] is intelligentcontentfilter and the default value for [TABLE_NAME] is filtered_content.

    bq --project_id [PROJECT_ID] mk --schema bq_schema.json -t [DATASET_ID].[TABLE_NAME]
  3. Verify that your BigQuery table has been created. Replace [PROJECT_ID] with your project ID and [DATASET_ID].[TABLE_NAME] with your dataset ID and table name.

    bq --project_id [PROJECT_ID] show [DATASET_ID].[TABLE_NAME]

    Resulting output:

    BigQuery table
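
The table schema comes from bq_schema.json, which ships with the source code you download in the next section. As a rough, unofficial sketch only (the real file is authoritative), the fields that this tutorial queries later look approximately like the following:

    [
      { "name": "contentUrl",      "type": "STRING" },
      { "name": "insertTimestamp", "type": "TIMESTAMP" },
      { "name": "safeSearch", "type": "RECORD", "mode": "REPEATED",
        "fields": [
          { "name": "flaggedType", "type": "STRING" },
          { "name": "likelihood",  "type": "STRING" }
        ]
      }
    ]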

Deploying the Cloud Functions

All of this tutorial's Cloud Functions are available on GitHub. Download the code from GitHub using your choice of tools, or use the following command:

git clone https://github.com/GoogleCloudPlatform/cloud-functions-intelligentcontent-nodejs
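
Then change into the repository directory; the deployment commands in the following sections assume that you run them from the directory that contains the function source and config.json:

cd cloud-functions-intelligentcontent-nodejs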

Edit your JSON configuration file

After you download the code, edit the config.json file to use your specific Cloud Storage buckets, Cloud Pub/Sub topic names, and BigQuery dataset ID and table name.

{
  "VISION_TOPIC": "projects/[PROJECT-ID]/topics/visionapiservice",
  "VIDEOINTELLIGENCE_TOPIC": "projects/[PROJECT-ID]/topics/videointelligenceservice",
  "BIGQUERY_TOPIC": "projects/[PROJECT-ID]/topics/bqinsert",
  "REJECTED_BUCKET": "[FLAGGED_BUCKET_NAME]",
  "RESULT_BUCKET": "[FILTERED_BUCKET_NAME]",
  "DATASET_ID": "[DATASET_ID]",
  "TABLE_NAME": "[TABLE_NAME]",
  "GCS_AUTH_BROWSER_URL_BASE": "https://storage.cloud.google.com/" ,
  "API_Constants": {
  	"ADULT" : "adult",
  	"VIOLENCE" : "violence",
  	"SPOOF" : "spoof",
  	"MEDICAL" : "medical"
  }
}
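
For example, with a hypothetical project ID of my-sample-project and the example bucket names and default topic, dataset, and table names used earlier, a filled-in config.json might look like this (API_Constants omitted here; keep it exactly as shown above):

{
  "VISION_TOPIC": "projects/my-sample-project/topics/visionapiservice",
  "VIDEOINTELLIGENCE_TOPIC": "projects/my-sample-project/topics/videointelligenceservice",
  "BIGQUERY_TOPIC": "projects/my-sample-project/topics/bqinsert",
  "REJECTED_BUCKET": "example-flagged-bucket",
  "RESULT_BUCKET": "example-filtered-bucket",
  "DATASET_ID": "intelligentcontentfilter",
  "TABLE_NAME": "filtered_content",
  "GCS_AUTH_BROWSER_URL_BASE": "https://storage.cloud.google.com/"
}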

Deploy the GCStoPubsub function

Deploy the GCStoPubsub Cloud Function, which contains the logic to receive a Cloud Storage notification message from Cloud Pub/Sub and forward the message to the appropriate function with another Cloud Pub/Sub message.

  • Replace [STAGING_BUCKET_NAME] with the Cloud Storage staging bucket name and [UPLOAD_NOTIFICATION_TOPIC] with the file upload notification topic name.

    gcloud functions deploy GCStoPubsub --stage-bucket [STAGING_BUCKET_NAME] --trigger-topic [UPLOAD_NOTIFICATION_TOPIC] --entry-point GCStoPubsub

The command-line output is similar to the following for each of the Cloud Functions:

Copying file:///var/folders/69/wsyfjkld5fq1w_wf7d5pxbv80030br/T/tmphzfCsc/fun.zip [Content-Type=application/zip]...
/ [1 files][138.4 KiB/138.4 KiB]
Operation completed over 1 objects/138.4 KiB.
Deploying function (may take a while - up to 2 minutes)...
...............................................................done.
availableMemoryMb: 256
entryPoint: GCStoPubsub
eventTrigger:
  eventType: providers/cloud.pubsub/eventTypes/topic.publish
  failurePolicy: {}
  resource: projects/[PROJECT-ID]/topics/intelligentcontentfileupload
latestOperation: operations/c2VjcmV0LXplcGh5ci0xMTIxL3VzLWNlbnRyYWwxL0dDU3RvUHVic3ViL0tRaGxHeVFhZHdR
name: projects/[PROJECT-ID]/locations/us-central1/functions/GCStoPubsub
serviceAccount: [PROJECT-ID]@appspot.gserviceaccount.com
sourceArchiveUrl: gs://[STAGING_BUCKET_NAME]/us-central1-GCStoPubsub-bnnmzdzqtjoo.zip
status: READY
timeout: 60s
updateTime: '2017-09-01T14:59:03Z'
versionId: '01'
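
If the reported status is not READY, or if you want to inspect a deployed function again later, you can view its current configuration and status at any time:

    gcloud functions describe GCStoPubsub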

Deploy the visionAPI function

  • Deploy your visionAPI Cloud Function, which contains the logic to receive a message with Cloud Pub/Sub, call the Vision API, and forward the message to the insertIntoBigQuery Cloud Function with another Cloud Pub/Sub message. Replace [STAGING_BUCKET_NAME] with your Cloud Storage staging bucket name and [VISION_TOPIC_NAME] with your Vision API topic name.

    gcloud functions deploy visionAPI --stage-bucket [STAGING_BUCKET_NAME] --trigger-topic [VISION_TOPIC_NAME] --entry-point visionAPI

Deploy the videoIntelligenceAPI function

  • Deploy your videoIntelligenceAPI Cloud Function, which contains the logic to receive a message with Cloud Pub/Sub, call the Video Intelligence API, and forward the message to the insertIntoBigQuery Cloud Function with another Cloud Pub/Sub message. Replace [STAGING_BUCKET_NAME] with your Cloud Storage staging bucket name and [VIDEOIQ_TOPIC_NAME] with your Video Intelligence API topic name.

    gcloud functions deploy videoIntelligenceAPI --stage-bucket [STAGING_BUCKET_NAME] --trigger-topic [VIDEOIQ_TOPIC_NAME] --entry-point videoIntelligenceAPI --timeout 540

Deploy the insertIntoBigQuery function

  • Deploy your insertIntoBigQuery Cloud Function, which contains the logic to receive a message with Cloud Pub/Sub and call the BigQuery API to insert the data into your BigQuery table. Replace [STAGING_BUCKET_NAME] with your Cloud Storage staging bucket name and [BIGQUERY_TOPIC_NAME] with your BigQuery topic name.

    gcloud functions deploy insertIntoBigQuery --stage-bucket [STAGING_BUCKET_NAME] --trigger-topic [BIGQUERY_TOPIC_NAME] --entry-point insertIntoBigQuery
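
After this step, all four functions should be deployed. As a quick check before you move on to testing, you can list them and confirm that each one appears with a healthy status:

    gcloud functions list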

Testing the flow

The following diagram outlines the processing flow.

processing-flow

You test the process by uploading your files to Cloud Storage, checking your logs, and viewing your results in BigQuery.

  1. Upload an image file and a video file to [IV_BUCKET_NAME]. Replace [LOCAL_FILE_NAME] with the name of a local file, and run the command once for each file.

    gsutil cp [LOCAL_FILE_NAME] gs://[IV_BUCKET_NAME]
  2. Verify that your Cloud Functions were triggered and ran successfully by viewing the Cloud Functions logs captured in Cloud Logging:

    1. Test GCStoPubsub:

      gcloud functions logs read --filter "finished with status" "GCStoPubsub" --limit 100

      Resulting output:

      GCStoPubsub log

    2. Test insertIntoBigQuery:

      gcloud functions logs read --filter "finished with status" "insertIntoBigQuery" --limit 100

      Resulting output:

      insertIntoBigQuery log

  3. Create the SQL command to query BigQuery. Replace [PROJECT_ID], [DATASET_ID], and [TABLE_NAME] with your project ID, dataset ID, and BigQuery table name.

    echo "
    #standardSQL
    SELECT insertTimestamp,
    contentUrl,
    flattenedSafeSearch.flaggedType,
    flattenedSafeSearch.likelihood
    FROM \`[PROJECT_ID].[DATASET_ID].[TABLE_NAME]\`
    CROSS JOIN UNNEST(safeSearch) AS flattenedSafeSearch
    ORDER BY insertTimestamp DESC,
    contentUrl,
    flattenedSafeSearch.flaggedType
    LIMIT 1000
    " > sql.txt
    
  4. View your BigQuery results with the following command:

    bq --project_id [PROJECT_ID] query < sql.txt

    Resulting output:

    BigQuery results

    Alternatively, you can sign in to the BigQuery web UI and run your queries:

    1. Open https://bigquery.cloud.google.com in your browser.
    2. Click Compose Query to begin a query as shown here:

      Compose Query

    3. Enter the following SQL into the text box. Replace [PROJECT_ID], [DATASET_ID], and [TABLE_NAME] with your project ID, dataset ID, and BigQuery table name.

      #standardSQL
      SELECT insertTimestamp,
      contentUrl,
      flattenedSafeSearch.flaggedType,
      flattenedSafeSearch.likelihood
      FROM `[PROJECT_ID].[DATASET_ID].[TABLE_NAME]`
      CROSS JOIN UNNEST(safeSearch) AS flattenedSafeSearch
      ORDER BY insertTimestamp DESC,
      contentUrl,
      flattenedSafeSearch.flaggedType
      LIMIT 1000
      

      The following example shows what this SQL looks like in the UI:

      SQL query

      Resulting output:

      SQL query results
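
In addition to the BigQuery results, you can list the filtered and flagged buckets (created earlier for storing filtered and flagged files) to check where your processed files were stored. Substitute your own bucket names:

    gsutil ls gs://[FILTERED_BUCKET_NAME]
    gsutil ls gs://[FLAGGED_BUCKET_NAME]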

Cleaning up

To avoid incurring charges to your Google Cloud Platform account for the resources used in this tutorial, clean up the resources you created after you finish. The following sections describe how to delete or turn off these resources.

Delete the project

The easiest way to eliminate billing is to delete the project you created for the tutorial.

To delete the project:

  1. In the GCP Console, go to the Projects page.
  2. In the project list, select the project you want to delete and click Delete project.
  3. In the dialog, type the Project ID and click Shut down.

Delete all the components

  1. Delete the Cloud Functions:

    gcloud functions delete GCStoPubsub
    gcloud functions delete visionAPI
    gcloud functions delete videoIntelligenceAPI
    gcloud functions delete insertIntoBigQuery
  2. Delete the BigQuery table and dataset, replacing the variables with your values:

    bq --project_id [PROJECT_ID] rm -r -f [DATASET_ID]
  3. Delete the Cloud Storage buckets, replacing the variables with your values:

    gsutil -m rm -r gs://[IV_BUCKET_NAME]
    gsutil -m rm -r gs://[FLAGGED_BUCKET_NAME]
    gsutil -m rm -r gs://[FILTERED_BUCKET_NAME]
    gsutil -m rm -r gs://[STAGING_BUCKET_NAME]
  4. Delete the Cloud Pub/Sub topics, replacing the variables with your values:

    gcloud pubsub topics delete [UPLOAD_NOTIFICATION_TOPIC]
    gcloud pubsub topics delete [VISION_TOPIC_NAME]
    gcloud pubsub topics delete [VIDEOIQ_TOPIC_NAME]
    gcloud pubsub topics delete [BIGQUERY_TOPIC_NAME]

What's next
