Upload and query time-series data

This page shows you how to upload and query a time-series dataset by using the Google Cloud Inference API over REST.

The Cloud Inference API enables easy integration of Google Search and Analysis technologies for time-series data into your applications. The Cloud Inference API allows you to:

  • Process time-series datasets:
    • Ingest JSON data into query-efficient internal formats
    • Remove previously submitted datasets from the system
    • List active datasets in the system submitted by your project
  • Execute inference queries over loaded datasets:
    • How do values of different types correlate? For example: with a dataset of labeled news articles, what labels are correlated with articles about vacations?
    • How does event frequency vary across time? For example: which days have an unusually high number of events related to specific topics?
    • What is the background probability for an event in the system? For example: how often do images of various sports appear in the articles?

For more information, see "What is the Google Cloud Inference API?"

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

  3. Make sure that billing is enabled for your Cloud project. Learn how to check if billing is enabled on a project.

  4. Enable the Cloud Inference API.

  5. Create a service account:

    1. In the console, go to the Create service account page.

    2. Select your project.
    3. In the Service account name field, enter a name. The console fills in the Service account ID field based on this name.

      In the Service account description field, enter a description. For example, Service account for quickstart.

    4. Click Create and continue.
    5. To provide access to your project, grant the following role(s) to your service account: Project > Owner.

      In the Select a role list, select a role.

      For additional roles, click Add another role and add each additional role.

    6. Click Continue.
    7. Click Done to finish creating the service account.

      Do not close your browser window. You will use it in the next step.

  6. Create a service account key:

    1. In the console, click the email address for the service account that you created.
    2. Click Keys.
    3. Click Add key, and then click Create new key.
    4. Click Create. A JSON key file is downloaded to your computer.
    5. Click Close.
  7. Set the environment variable GOOGLE_APPLICATION_CREDENTIALS to the path of the JSON file that contains your service account key. This variable only applies to your current shell session, so if you open a new session, set the variable again.

  8. Install and initialize the Google Cloud CLI.

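Before you call the API, it can help to confirm that GOOGLE_APPLICATION_CREDENTIALS points at a usable key file. The following Python sketch is one way to do that, not part of the official setup. The function name check_credentials is hypothetical, and the check assumes the standard layout of a downloaded service account key, which includes the "type" and "client_email" fields.

```python
import json
import os


def check_credentials(path=None):
    """Return the service account email if the key file looks usable.

    Raises ValueError with a readable message otherwise.
    """
    # Fall back to the environment variable set in the steps above.
    path = path or os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise ValueError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    if not os.path.isfile(path):
        raise ValueError(f"key file not found: {path}")

    # A malformed file raises json.JSONDecodeError, a subclass of ValueError.
    with open(path, encoding="utf-8") as f:
        key = json.load(f)

    # Assumption: downloaded service account keys carry these two fields.
    if key.get("type") != "service_account":
        raise ValueError(f"not a service account key: {path}")
    if "client_email" not in key:
        raise ValueError(f"key file has no client_email: {path}")
    return key["client_email"]


if __name__ == "__main__":
    try:
        print("Using service account:", check_credentials())
    except ValueError as err:
        print("Credential problem:", err)
```

Run the script in the same shell session in which you set the variable, because the variable does not carry over to new sessions.
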
Upload a dataset

In this section, you create a Google Cloud Inference API dataset by using the createdataset REST method.

  1. Create a JSON request file with the following text, and save it as a plain text file named create-gdelt-dataset.json:

    {
      "name":"gdelt_2018_04_data",
      "data_names": [
        "PageURL",
        "PageDomain",
        "PageCountry",
        "PageLanguage",
        "PageTextTheme",
        "PageTextGeo",
        "ImageURL",
        "ImagePopularityRawScore",
        "ImagePopularity",
        "ImageSafeSearch",
        "ImageLabel",
        "ImageWebEntity",
        "ImageWebEntityBestGuessLabel",
        "ImageGeoLandmark",
        "ImageFaceToneHas"
      ],
      "data_sources": [
        { "uri":"gs://inference-gdelt-demo/inference-gdelt-demo.201804.json" }
      ]
    }
    

    This request creates a dataset of GDELT-annotated news articles to run Cloud Inference API queries over (thanks to Kalev Leetaru from GDELT for making this available!). The source data is publicly accessible, so you don't need authentication credentials to read it. You do, however, need authentication credentials to use the API.

  2. Check that you have an authorization token:

    gcloud auth application-default print-access-token
      
  3. Use curl to make a createdataset request, passing it the access token and the filename of the JSON request file that you created in step 1. Replace PROJECT_NUMBER with your project number:

    curl -s -H "Content-Type: application/json" \
      -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
      https://infer.googleapis.com/v1/projects/PROJECT_NUMBER/datasets \
      -d @create-gdelt-dataset.json
      

    To pass a filename to curl, use the -d option (for "data") and precede the filename with an @ sign. The file must be in the directory where you run the curl command.

    You should see a response similar to the following:

    {
      "name": "gdelt_2018_04_data",
      "state": "STATE_PENDING"
    }
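The upload steps above can also be scripted. The following Python sketch builds the same request body and sends it with the standard library. It is an illustration under stated assumptions, not an official client: the helper names (build_create_request, access_token, create_dataset) are hypothetical, and the endpoint and headers are copied from the curl command above. Building the body with json.dumps also avoids hand-written JSON mistakes such as trailing commas.

```python
import json
import subprocess
import urllib.request

API_ROOT = "https://infer.googleapis.com/v1"


def build_create_request(name, data_names, source_uris):
    """Build a createdataset body in the shape shown in step 1."""
    return {
        "name": name,
        "data_names": list(data_names),
        "data_sources": [{"uri": uri} for uri in source_uris],
    }


def access_token():
    """Get a token from the same gcloud command used in step 2."""
    result = subprocess.run(
        ["gcloud", "auth", "application-default", "print-access-token"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def create_dataset(project_number, body, token):
    """POST the body to the datasets collection and return the parsed reply."""
    request = urllib.request.Request(
        f"{API_ROOT}/projects/{project_number}/datasets",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


body = build_create_request(
    "gdelt_2018_04_data",
    ["PageURL", "ImageLabel", "ImageWebEntity"],  # shortened list, for illustration only
    ["gs://inference-gdelt-demo/inference-gdelt-demo.201804.json"],
)
print(json.dumps(body, indent=2))

# To send the request, uncomment the next line and use your own project number:
# print(create_dataset("PROJECT_NUMBER", body, access_token()))
```

The script only prints the request body by default, so you can inspect it before anything is sent to the API.
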
    

Get the status of uploaded datasets

You can get the status of every dataset that your project has submitted to the Cloud Inference API by using the listdatasets REST method.

  1. Get an access token, as you did in the Upload a dataset section.

  2. Use curl to make a listdatasets request, passing it the access token:

    curl -s -H "Content-Type: application/json" \
      -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
      https://infer.googleapis.com/v1/projects/PROJECT_NUMBER/datasets
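Because a new dataset starts in STATE_PENDING, you might want to wait until it reports STATE_LOADED before you query it. The following Python sketch polls the listing for that purpose. It rests on an assumption that this page does not confirm: that the listdatasets response has the form {"datasets": [{"name": ..., "state": ...}]}. The helper names are hypothetical, and the sketch does not handle failure states, because none are documented here.

```python
import json
import time
import urllib.request

API_ROOT = "https://infer.googleapis.com/v1"


def list_datasets(project_number, token):
    """GET the datasets collection, mirroring the curl command above."""
    request = urllib.request.Request(
        f"{API_ROOT}/projects/{project_number}/datasets",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def dataset_state(listing, name):
    """Return the state of the named dataset, or None if it is not listed.

    ASSUMPTION: the listing looks like {"datasets": [{"name", "state"}, ...]}.
    """
    for dataset in listing.get("datasets", []):
        if dataset.get("name") == name:
            return dataset.get("state")
    return None


def wait_until_loaded(fetch, name, attempts=30, delay_seconds=10.0):
    """Call fetch() repeatedly until the dataset reports STATE_LOADED.

    Returns True once the dataset is loaded, False if the attempts run out.
    """
    for attempt in range(attempts):
        if dataset_state(fetch(), name) == "STATE_LOADED":
            return True
        # Sleep between polls, but not after the final attempt.
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False
```

A call might look like wait_until_loaded(lambda: list_datasets("PROJECT_NUMBER", token), "gdelt_2018_04_data"), where token is the access token from step 1. Passing the fetch step as a callable keeps the polling logic testable without network access.
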
      

Query a loaded dataset

  1. Create a JSON request file with the following text, and save it as a plain text file named query-gdelt-dataset.json:

    {
      "name": "gdelt_2018_04_data",
      "queries": [{
        "query": {
          "type": "TYPE_TERM",
          "term": {
            "name": "ImageWebEntity",
            "value": "Vacation"
          }
        },
        "distribution_configs": {
          "bgprob_exp": 0.7,
          "data_name": "ImageLabel",
          "max_result_entries": 5
        }
      }]
    }

    This request queries the gdelt_2018_04_data dataset that you submitted earlier with a createdataset request and that is now reported as STATE_LOADED. The query aggregates ImageLabel values from all articles that contain Vacation as an ImageWebEntity and returns the 5 highest-scoring labels.

  2. Get an access token, as you did in the Upload a dataset section.

  3. Use curl to make a query request, passing it the access token and the filename of the JSON request file that you created in step 1. Replace PROJECT_NUMBER with your project number:

    curl -s -H "Content-Type: application/json" \
      -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
      https://infer.googleapis.com/v1/projects/PROJECT_NUMBER/datasets/gdelt_2018_04_data:query \
      -d @query-gdelt-dataset.json
      

    To pass a filename to curl, use the -d option (for "data") and precede the filename with an @ sign. The file must be in the directory where you run the curl command.

    You should see a response similar to the following:

    {
      "results": [
        {
          "distributions": [
            {
              "dataName": "ImageLabel",
              "matchedGroupCount": "39124",
              "totalGroupCount": "7616785",
              "entries": [
                {
                  "value": "ImageLabel=vacation",
                  "score": 31.515648,
                  "matchedGroupCount": "37806",
                  "totalGroupCount": "52331"
                },
                {
                  "value": "ImageLabel=beach",
                  "score": 15.222198,
                  "matchedGroupCount": "6825",
                  "totalGroupCount": "12527"
                },
                {
                  "value": "ImageLabel=summer",
                  "score": 13.984301,
                  "matchedGroupCount": "6704",
                  "totalGroupCount": "13780"
                },
                {
                  "value": "ImageLabel=travel",
                  "score": 13.344194,
                  "matchedGroupCount": "6837",
                  "totalGroupCount": "15158"
                },
                {
                  "value": "ImageLabel=sun_tanning",
                  "score": 12.208676,
                  "matchedGroupCount": "2048",
                  "totalGroupCount": "2999"
                }
              ]
            }
          ]
        }
      ]
    }
      

    The response is a distribution of ImageLabel events, scored by their correlation with ImageWebEntity=Vacation. The highest-scoring events are positively correlated with vacation-themed articles.
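The query body and the response can both be handled in a few lines of Python. The following sketch builds a request in the shape shown in step 1 and flattens a response into rows. The function names build_query and summarize are hypothetical. The "matched share" column is an illustrative ratio of matchedGroupCount to totalGroupCount, and it assumes that the first counts groups that match the query and the second counts all groups that contain the value. Note that the counts arrive as JSON strings, so the sketch converts them to integers.

```python
def build_query(dataset, term_name, term_value, data_name,
                max_entries=5, bgprob_exp=0.7):
    """Build a query body in the shape shown in step 1."""
    return {
        "name": dataset,
        "queries": [{
            "query": {
                "type": "TYPE_TERM",
                "term": {"name": term_name, "value": term_value},
            },
            "distribution_configs": {
                "bgprob_exp": bgprob_exp,
                "data_name": data_name,
                "max_result_entries": max_entries,
            },
        }],
    }


def summarize(response):
    """Return (label, score, matched share) rows for every distribution entry."""
    rows = []
    for result in response.get("results", []):
        for distribution in result.get("distributions", []):
            for entry in distribution.get("entries", []):
                # "ImageLabel=beach" becomes "beach".
                label = entry["value"].split("=", 1)[-1]
                # Counts are JSON strings in the response shown above.
                matched = int(entry["matchedGroupCount"])
                total = int(entry["totalGroupCount"])
                rows.append((label, entry["score"], matched / total))
    return rows


# Two entries copied from the sample response above.
sample = {
    "results": [{
        "distributions": [{
            "dataName": "ImageLabel",
            "entries": [
                {"value": "ImageLabel=vacation", "score": 31.515648,
                 "matchedGroupCount": "37806", "totalGroupCount": "52331"},
                {"value": "ImageLabel=beach", "score": 15.222198,
                 "matchedGroupCount": "6825", "totalGroupCount": "12527"},
            ],
        }],
    }],
}

for label, score, share in summarize(sample):
    print(f"{label:<10} score={score:7.3f} matched share={share:.1%}")
```

Calling build_query("gdelt_2018_04_data", "ImageWebEntity", "Vacation", "ImageLabel") reproduces the request body from step 1.
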

What's next