Tutorial


We use a small dataset provided by Kalev Leetaru to illustrate the Timeseries Insights API. The dataset is derived from The GDELT Project, a global database tracking world events and media coverage. This dataset contains entity mentions extracted from news URLs during April 2019.

Objectives

  • Learn the data format for the Timeseries Insights API.
  • Learn how to create, query, update, and delete datasets.

Before you begin

Set up a Cloud project and enable the Timeseries Insights API by following Setup for Full Access.

Tutorial dataset

The dataset includes entity annotations for locations, organizations, and persons, among other entity types.

The Timeseries Insights API takes input in JSON format. A sample Event from this dataset is:

{
  "groupId":"-6180929807044612746",
  "dimensions":[{"name":"EntityORGANIZATION","stringVal":"Medina Gazette"}],
  "eventTime":"2019-04-05T08:00:00+00:00"
}

Each event must have an eventTime field for the event timestamp. Each event should preferably also have a long-valued groupId that marks related events. Event properties are included as dimensions, each of which has a name and one of stringVal, boolVal, longVal, or doubleVal.

{"groupId":"-6180929807044612746","dimensions":[{"name":"EntityORGANIZATION","stringVal":"Medina Gazette"}],"eventTime":"2019-04-05T08:00:00+00:00"}

List datasets

projects.locations.datasets.list shows all datasets under ${PROJECT_ID}. gcurl is an alias and PROJECT_ID is an environment variable, both set up in Getting Started.
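
If you have not completed that setup, the alias and variable typically look like the following sketch (the exact commands in Getting Started are authoritative):

# Sketch of the setup assumed by the examples below.
alias gcurl='curl -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json"'
export PROJECT_ID=my-project-id   # replace with your Cloud project ID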

gcurl https://timeseriesinsights.googleapis.com/v1/projects/${PROJECT_ID}/datasets

The result is a JSON response like the following:

{
  "datasets": [
    {
      "name": "example",
      "state": "LOADED",
      ...
    },
    {
      "name": "dataset_tutorial",
      "state": "LOADING",
      ...
    }
  ]
}

The results show the datasets currently under the project. The state field indicates whether a dataset is ready to be used. A newly created dataset is in the LOADING state until indexing completes, and then transitions to the LOADED state. If any errors occur during creation or indexing, the dataset enters the FAILED state. The results also include the complete dataset information from the original create request.
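
For example, one minimal way to wait for indexing to finish is to poll the list endpoint until the dataset leaves the LOADING state. The sketch below assumes the response is formatted like the sample above, with the state line directly following the name line:

# Sketch: poll until dataset_tutorial is no longer LOADING.
while gcurl https://timeseriesinsights.googleapis.com/v1/projects/${PROJECT_ID}/datasets \
    | grep -A1 '"dataset_tutorial"' | grep -q '"LOADING"'; do
  echo "dataset_tutorial is still LOADING; waiting..."
  sleep 30
done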

Create dataset

projects.locations.datasets.create adds a new dataset to the project.

gcurl -X POST -d @create.json https://timeseriesinsights.googleapis.com/v1/projects/${PROJECT_ID}/datasets

where create.json contains:

{
  "name": "dataset_tutorial",
  "dataNames": [
    "EntityCONSUMER_GOOD",
    "EntityEVENT",
    "EntityLOCATION",
    "EntityORGANIZATION",
    "EntityOTHER",
    "EntityPERSON",
    "EntityUNKNOWN",
    "EntityWORK_OF_ART"
  ],
  "dataSources": [
    {"uri": "gs://data.gdeltproject.org/blog/2021-timeseries-insights-api/datasets/webnlp-201904.json"}
  ]
}

This request creates a dataset named dataset_tutorial from the given dataSources, Cloud Storage files containing Event data in JSON format. Only dimensions listed in dataNames are indexed and used by the system.

The create request returns success once it is accepted by the API server. The dataset remains in the LOADING state until indexing completes, then the state becomes LOADED and the dataset starts accepting queries and updates.

Query dataset

projects.locations.datasets.query performs anomaly detection queries.

gcurl -X POST -d @query.json https://timeseriesinsights.googleapis.com/v1/projects/${PROJECT_ID}/datasets/dataset_tutorial:query

where query.json contains:

{
  "detectionTime": "2019-04-15T00:00:00Z",
  "numReturnedSlices": 5,
  "slicingParams": {
    "dimensionNames": ["EntityLOCATION"]
  },
  "timeseriesParams": {
    "forecastHistory": "1209600s",
    "granularity": "86400s"
  },
  "forecastParams": {
    "noiseThreshold": 100.0
  }
}

This query asks for the 5 most anomalous slices along the EntityLOCATION dimension at the detection time 2019-04-15, comparing each slice's activity against a forecast built from the preceding 1209600s (14 days) of history aggregated at a daily (86400s) granularity. The query result looks like the following:

{
  "name": "projects/timeseries-staging/locations/us-central1/datasets/webnlp-201901-202104-dragosd",
  "slices": [
    {
      "dimensions": [
        {
          "name": "EntityLOCATION",
          "stringVal": "Notre Dame"
        }
      ],
      "detectionPointActual": 1514,
      "detectionPointForecast": 15.5,
      "expectedDeviation": 5.5,
      "anomalyScore": 14.203791469194313,
      "status": {}
    },
    {
      "dimensions": [
        {
          "name": "EntityLOCATION",
          "stringVal": "Seine"
        }
      ],
      "detectionPointActual": 1113,
      "detectionPointForecast": 14,
      "expectedDeviation": 15,
      "anomalyScore": 9.5565217391304351,
      "status": {}
    },
    {
      "dimensions": [
        {
          "name": "EntityLOCATION",
          "stringVal": "Ile de la Cite"
        }
      ],
      "detectionPointActual": 852,
      "detectionPointForecast": 0,
      "expectedDeviation": 1,
      "anomalyScore": 8.435643564356436,
      "status": {}
    },
    {
      "dimensions": [
        {
          "name": "EntityLOCATION",
          "stringVal": "Paris"
        }
      ],
      "detectionPointActual": 1461,
      "detectionPointForecast": 857,
      "expectedDeviation": 441,
      "anomalyScore": 1.1164510166358594,
      "status": {}
    },
    {
      "dimensions": [
        {
          "name": "EntityLOCATION",
          "stringVal": "France"
        }
      ],
      "detectionPointActual": 1098,
      "detectionPointForecast": 950.5,
      "expectedDeviation": 476.5,
      "anomalyScore": 0.25585429314830876,
      "status": {}
    }
  ]
}
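
In this response the slices are ordered by decreasing anomalyScore, and the score is consistent with |detectionPointActual - detectionPointForecast| / (noiseThreshold + expectedDeviation). For example, for the "Notre Dame" slice:

  |1514 - 15.5| / (100 + 5.5) = 1498.5 / 105.5 ≈ 14.2038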

Streaming update

projects.locations.datasets.appendEvents adds Event records in a streaming fashion.

gcurl -X POST -d @append.json https://timeseriesinsights.googleapis.com/v1/projects/${PROJECT_ID}/datasets/dataset_tutorial:appendEvents

where append.json contains (replace the eventTime values with timestamps close to the present time):

{
  "events": [
    {
      "groupId":"1324354349507023708",
      "dimensions":[{"name":"EntityPERSON","stringVal":"Jason Marsalis"}],
      "eventTime":"2022-02-16T15:45:00+00:00"
    },{
      "groupId":"1324354349507023708",
      "dimensions":[{"name":"EntityORGANIZATION","stringVal":"WAFA"}],
      "eventTime":"2022-02-16T04:00:00+00:00"
    }
  ]
}

Streamed updates are indexed in near real time, so changes are reflected in query results shortly after they are sent. All events sent in a single projects.locations.datasets.appendEvents request must have the same groupId.
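
One way to generate eventTime values near the current time is with standard shell tools, as in the following sketch; the groupId and dimension values are only illustrative:

# Sketch: build append.json with the current UTC time as the eventTime.
NOW=$(date -u +%Y-%m-%dT%H:%M:%S+00:00)
cat > append.json <<EOF
{
  "events": [
    {
      "groupId": "1324354349507023708",
      "dimensions": [{"name": "EntityPERSON", "stringVal": "Jason Marsalis"}],
      "eventTime": "${NOW}"
    }
  ]
}
EOF
gcurl -X POST -d @append.json https://timeseriesinsights.googleapis.com/v1/projects/${PROJECT_ID}/datasets/dataset_tutorial:appendEvents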

Delete dataset

projects.locations.datasets.delete marks the dataset for deletion.

gcurl -X DELETE https://timeseriesinsights.googleapis.com/v1/projects/${PROJECT_ID}/datasets/dataset_tutorial

The request returns immediately, and the dataset stops accepting queries or updates. It may take some time before the data is completely removed from the service; after that, List datasets no longer returns this dataset.
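
Once the removal completes, you can confirm with the same list call used earlier, for example with a quick check like this sketch:

# Sketch: verify that dataset_tutorial no longer appears in the list response.
gcurl https://timeseriesinsights.googleapis.com/v1/projects/${PROJECT_ID}/datasets \
    | grep '"dataset_tutorial"' || echo "dataset_tutorial is no longer listed"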

What's next

Additional examples can be found on The GDELT Project website by searching for "Timeseries Insights API".