Tutorial

We use a small dataset provided by Kalev Leetaru to illustrate the Timeseries Insights API. The dataset is derived from The GDELT Project, a global database tracking world events and media coverage. This dataset contains entity mentions in news URLs from April 2019.

Objectives

  • Learn the data format for Timeseries Insights API.
  • Learn how to create, query, update and delete datasets.

Costs

There is no cost for Preview.

Before you begin

Set up a Cloud project and enable the Timeseries Insights API by following Setup for Full Access.

Tutorial dataset

The dataset includes entity annotations for locations, organizations, and persons, among others.

The Timeseries Insights API takes JSON-formatted input. A sample Event from this dataset is:

{
  "groupId":"-6180929807044612746",
  "dimensions":[{"name":"EntityORGANIZATION","stringVal":"Medina Gazette"}],
  "eventTime":"2019-04-05T08:00:00+00:00"
}

Each event must have an eventTime field for the event timestamp. Each event should preferably also have a long-valued groupId to mark related events. Event properties are included as dimensions, each of which has a name and one of stringVal, boolVal, longVal, or doubleVal.
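The following Python sketch builds an Event with the same shape as the sample above. The helper names make_dimension and make_event are hypothetical and are not part of any client library. The mapping from Python types to stringVal, boolVal, longVal, and doubleVal follows the field list in the paragraph above.

```python
import json


def make_dimension(name, value):
    """Build one dimension: a name plus exactly one typed value field."""
    # Check bool before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return {"name": name, "boolVal": value}
    if isinstance(value, int):
        # int64 values are quoted as strings to avoid 53-bit precision loss.
        return {"name": name, "longVal": str(value)}
    if isinstance(value, float):
        return {"name": name, "doubleVal": value}
    if isinstance(value, str):
        return {"name": name, "stringVal": value}
    raise TypeError(f"unsupported dimension value: {value!r}")


def make_event(event_time, dimensions, group_id=None):
    """Build an Event dict; `dimensions` is a list of (name, value) pairs."""
    event = {}
    if group_id is not None:
        event["groupId"] = str(group_id)  # quoted int64, as in the sample
    event["dimensions"] = [make_dimension(n, v) for n, v in dimensions]
    event["eventTime"] = event_time  # RFC 3339 timestamp string
    return event


sample = make_event(
    "2019-04-05T08:00:00+00:00",
    [("EntityORGANIZATION", "Medina Gazette")],
    group_id=-6180929807044612746,
)
print(json.dumps(sample, indent=2))
```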

NOTE: Google Cloud APIs accept both camel case (like camelCase) and snake case (like snake_case) for JSON field names. This documentation mostly uses camel case.

NOTE: JSON numbers are commonly parsed as double-precision floating-point values, which represent integers exactly only up to 53 binary digits. If groupId or longVal is sent as a JSON number, it is therefore effectively limited to 53 bits. To provide full int64 data, quote the value as a string, as the groupId in the sample above does. A groupId is typically a numerical ID or is generated with a deterministic hash function, so it fits within the int64 range.
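The Python sketch below illustrates the precision issue. It simulates a double-based JSON parser with json.loads(..., parse_int=float) and shows that the sample groupId loses its low-order digits when sent as a number. The group_id_from_key helper is a hypothetical example of deriving a groupId with a deterministic hash. It takes the first 8 bytes of SHA-256 as a signed int64. It is not a scheme the API prescribes.

```python
import hashlib
import json

# Largest magnitude at which every integer is exactly representable as a double.
MAX_SAFE_JSON_INT = 2**53 - 1


def is_safe_json_number(value: int) -> bool:
    """True if the integer can be sent as an unquoted JSON number without precision risk."""
    return abs(value) <= MAX_SAFE_JSON_INT


def group_id_from_key(key: str) -> str:
    """Hypothetical helper: hash a key (for example a news URL) into a signed
    int64 and return it as a quoted decimal string suitable for groupId."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return str(int.from_bytes(digest[:8], byteorder="big", signed=True))


sample_group_id = -6180929807044612746
# Simulate a JSON parser that reads every number as a double.
as_double = json.loads(str(sample_group_id), parse_int=float)
print(is_safe_json_number(sample_group_id))  # False
print(int(as_double) == sample_group_id)     # False: low-order digits were lost
print(json.dumps({"groupId": str(sample_group_id)}))  # quoted string keeps all digits
```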

NOTE: The name field may contain only alphanumeric characters and underscores ('_'). Special characters, including spaces, are not supported. The stringVal field supports any valid Unicode characters.
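A client-side check for the naming rule might look like the following Python sketch. The rule above does not state whether empty names or leading digits are allowed. This sketch rejects empty names and accepts leading digits, which is an assumption. The sanitize_dimension_name helper is hypothetical and applies only to names, never to stringVal values.

```python
import re

# Letters, digits, and '_' only (ASCII classes written out explicitly, since
# Python's \w would also match non-ASCII letters).
_NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def is_valid_dimension_name(name: str) -> bool:
    """True if `name` is non-empty and uses only alphanumeric characters and underscores."""
    return _NAME_PATTERN.fullmatch(name) is not None


def sanitize_dimension_name(name: str) -> str:
    """Hypothetical helper: replace each unsupported character with '_'."""
    return re.sub(r"[^A-Za-z0-9_]", "_", name)


print(is_valid_dimension_name("EntityORGANIZATION"))   # True
print(is_valid_dimension_name("Entity ORGANIZATION"))  # False: contains a space
print(sanitize_dimension_name("Entity ORGANIZATION"))  # Entity_ORGANIZATION
```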

NOTE: When reading from a static Google Cloud Storage data source, each JSON event must be on a single line (also known as JSON Lines format), as follows:

{"groupId":"-6180929807044612746","dimensions":[{"name":"EntityORGANIZATION","stringVal":"Medina Gazette"}],"eventTime":"2019-04-05T08:00:00+00:00"}
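A minimal Python sketch for producing JSON Lines from a list of events follows. It serializes each event compactly and terminates each with a newline. Because json.dumps escapes newline characters inside strings, a stringVal containing line breaks still stays on one line.

```python
import json


def to_json_lines(events) -> str:
    """Serialize events as JSON Lines: one compact JSON object per line."""
    # separators=(",", ":") removes the spaces json.dumps inserts by default;
    # ensure_ascii=False writes Unicode stringVal values as-is instead of \u escapes.
    return "".join(
        json.dumps(event, separators=(",", ":"), ensure_ascii=False) + "\n"
        for event in events
    )


events = [
    {
        "groupId": "-6180929807044612746",
        "dimensions": [{"name": "EntityORGANIZATION", "stringVal": "Medina Gazette"}],
        "eventTime": "2019-04-05T08:00:00+00:00",
    },
]
print(to_json_lines(events), end="")
```

When writing the result to a file before uploading it to Cloud Storage, open the file with encoding="utf-8" so that non-ASCII stringVal values are stored correctly.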

List datasets

projects.datasets.list shows all datasets under ${PROJECT_ID}. Note that gcurl is an alias and PROJECT_ID is an environment variable, both set up in Getting Started.

gcurl https://timeseriesinsights.googleapis.com/v1/projects/${PROJECT_ID}/datasets

The result is a JSON string like the following:

{
  "datasets": [
    {
      "name": "example",
      "state": "LOADED",
      ...
    },
    {
      "name": "dataset_tutorial",
      "state": "LOADING",
      ...
    }
  ]
}

The results show the datasets currently under the project. The state field indicates whether a dataset is ready to be used. When a dataset is first created, it is in the LOADING state until indexing completes, and then transitions to the LOADED state. If any errors occur during creation or indexing, it will be in the FAILED state. The result also includes the complete dataset information from the original create request.
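Because a new dataset starts in LOADING, a client usually polls the list until the state changes. The Python sketch below assumes fetch_list is any callable that returns the parsed datasets.list response, for example a wrapper around the gcurl command above. The response example above shows short names, but the list may return full resource names (projects/.../datasets/NAME). The sketch therefore accepts either form. The poll interval and timeout defaults are arbitrary.

```python
import time


def dataset_state(list_response, name):
    """Return the state of dataset `name` in a datasets.list response, or None if it is not listed."""
    for dataset in list_response.get("datasets", []):
        listed = dataset.get("name", "")
        # Accept either the short name shown above or a full resource name ending in it.
        if listed == name or listed.endswith("/datasets/" + name):
            return dataset.get("state")
    return None


def wait_until_loaded(fetch_list, name, poll_seconds=30.0, timeout_seconds=3600.0, sleep=time.sleep):
    """Poll `fetch_list()` until `name` reaches LOADED; raise if it reaches
    FAILED or the timeout expires. `sleep` is injectable for testing."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        state = dataset_state(fetch_list(), name)
        if state == "LOADED":
            return state
        if state == "FAILED":
            raise RuntimeError(f"dataset {name!r} is in FAILED state")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"dataset {name!r} is still {state} after {timeout_seconds}s")
        sleep(poll_seconds)
```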

Create dataset

projects.datasets.create adds a new dataset to the project.

gcurl -X POST -d @create.json https://timeseriesinsights.googleapis.com/v1/projects/${PROJECT_ID}/datasets

where create.json contains:

{
  "name": "dataset_tutorial",
  "dataNames": [
    "EntityCONSUMER_GOOD",
    "EntityEVENT",
    "EntityLOCATION",
    "EntityORGANIZATION",
    "EntityOTHER",
    "EntityPERSON",
    "EntityUNKNOWN",
    "EntityWORK_OF_ART"
  ],
  "dataSources": [
    {"uri": "gs://data.gdeltproject.org/blog/2021-timeseries-insights-api/datasets/webnlp-201904.json"}
  ]
}

This request creates a dataset named dataset_tutorial from the GCS dataSources, which contain Event data in JSON format. Only the dimensions listed in dataNames are indexed and used by the system.

The create request returns success if it is accepted by the API server. The dataset remains in the LOADING state until indexing completes. The state then becomes LOADED, after which the dataset can start accepting queries and any updates.
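The create.json body can also be generated. The Python sketch below rebuilds the request body above. It also shows which dimensions of an event would be indexed, given that only names listed in dataNames are used. The function names are hypothetical. The "Sentiment" dimension in the test is an invented example of a name missing from dataNames.

```python
import json

# Dimension names to index, matching create.json above.
DATA_NAMES = [
    "EntityCONSUMER_GOOD",
    "EntityEVENT",
    "EntityLOCATION",
    "EntityORGANIZATION",
    "EntityOTHER",
    "EntityPERSON",
    "EntityUNKNOWN",
    "EntityWORK_OF_ART",
]
SOURCE_URI = "gs://data.gdeltproject.org/blog/2021-timeseries-insights-api/datasets/webnlp-201904.json"


def create_body(name, data_names, uris):
    """Request body for projects.datasets.create."""
    return {
        "name": name,
        "dataNames": list(data_names),
        "dataSources": [{"uri": uri} for uri in uris],
    }


def indexed_dimensions(event, data_names):
    """Dimensions of `event` whose names appear in dataNames; the others are not indexed."""
    wanted = set(data_names)
    return [d for d in event.get("dimensions", []) if d.get("name") in wanted]


# Redirect stdout to a file to produce create.json.
print(json.dumps(create_body("dataset_tutorial", DATA_NAMES, [SOURCE_URI]), indent=2))
```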

Query dataset

projects.datasets.query performs anomaly detection queries.

gcurl -X POST -d @query.json https://timeseriesinsights.googleapis.com/v1/projects/${PROJECT_ID}/datasets/dataset_tutorial:query

where query.json contains:

{
  "detectionTime": "2019-04-15T00:00:00Z",
  "slicingParams": {
    "dimensionNames": ["EntityLOCATION"]
  },
  "timeseriesParams": {
    "forecastHistory": "1209600s",
    "granularity": "86400s"
  },
  "forecastParams": {
    "sensitivity": 0.1,
    "noiseThreshold": 100.0
  },
  "returnNonAnomalies": true
}

Note: For information about how to configure your query and how to interpret the results, see the Query Building Guide.
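In query.json, the durations are expressed in seconds. forecastHistory is 1209600s, which is 14 × 86400 s, or 14 days. granularity is 86400s, or one day. The history window is therefore bucketed into roughly 14 daily points; see the Query Building Guide for the exact semantics. The Python sketch below rebuilds the same body from a datetime and timedelta values. The function names and keyword arguments are hypothetical.

```python
import json
from datetime import datetime, timedelta, timezone


def duration(td: timedelta) -> str:
    """Format a timedelta as a seconds string such as "86400s"."""
    return f"{int(td.total_seconds())}s"


def query_body(detection_time, dimension_names, history, granularity,
               sensitivity=0.1, noise_threshold=100.0, return_non_anomalies=True):
    """Request body for projects.datasets.query; `detection_time` must be timezone-aware."""
    return {
        "detectionTime": detection_time.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "slicingParams": {"dimensionNames": list(dimension_names)},
        "timeseriesParams": {
            "forecastHistory": duration(history),
            "granularity": duration(granularity),
        },
        "forecastParams": {"sensitivity": sensitivity, "noiseThreshold": noise_threshold},
        "returnNonAnomalies": return_non_anomalies,
    }


body = query_body(
    datetime(2019, 4, 15, tzinfo=timezone.utc),
    ["EntityLOCATION"],
    history=timedelta(days=14),     # 14 * 86400 = 1209600 seconds
    granularity=timedelta(days=1),  # 86400 seconds
)
print(json.dumps(body, indent=2))
```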

The query result looks like the following:

{
  "name": "projects/myproject/datasets/dataset_tutorial",
  "anomalyDetectionResult": {
    "anomalies": [
      {
        "dimensions": [
          {
            "name": "EntityLOCATION",
            "stringVal": "Saint Julien Les Pauvres"
          }
        ],
        "result": {
          "holdoutErrors": {},
          "trainingErrors": {},
          "forecastStats": {},
          "detectionPointActual": 167,
          "detectionPointForecastLowerBound": -100,
          "detectionPointForecastUpperBound": 100,
          "label": "ANOMALY"
        },
        "status": {}
      },
      {
        "dimensions": [
          {
            "name": "EntityLOCATION",
            "stringVal": "Steamfitters Local 449"
          }
        ],
        "result": {
          "holdoutErrors": {},
          "trainingErrors": {},
          "forecastStats": {},
          "detectionPointActual": 164,
          "detectionPointForecastLowerBound": -100,
          "detectionPointForecastUpperBound": 100,
          "label": "ANOMALY"
        },
        "status": {}
      },
      {
        "dimensions": [
          {
            "name": "EntityLOCATION",
            "stringVal": "Notre Dame"
          }
        ],
        "result": {
          "holdoutErrors": {
            "mdape": 0.10975609756097561,
            "rmd": 0.10975609756097561
          },
          "trainingErrors": {
            "mdape": 0.27244421380384021,
            "rmd": 0.73122529644268774
          },
          "forecastStats": {
            "density": "100"
          },
          "detectionPointActual": 1514,
          "detectionPointForecast": 13.666666666666666,
          "detectionPointForecastLowerBound": -212.33681840089878,
          "detectionPointForecastUpperBound": 239.67015173423209,
          "label": "ANOMALY"
        },
        "status": {}
      },
      {
        "dimensions": [
          {
            "name": "EntityLOCATION",
            "stringVal": "Ile de la Cite"
          }
        ],
        "result": {
          "holdoutErrors": {},
          "trainingErrors": {
            "mdape": 1,
            "rmd": 1.153846153846154
          },
          "forecastStats": {
            "density": "23"
          },
          "detectionPointActual": 852,
          "detectionPointForecastLowerBound": -171.60209104053922,
          "detectionPointForecastUpperBound": 171.60209104053922,
          "label": "ANOMALY"
        },
        "status": {}
      },
      {
        "dimensions": [
          {
            "name": "EntityLOCATION",
            "stringVal": "Notre Dame Cathedral"
          }
        ],
        "result": {
          "holdoutErrors": {
            "mdape": 1.5000000000000002,
            "rmd": 1.5
          },
          "trainingErrors": {
            "mdape": 0.20384615384615384,
            "rmd": 0.31250000000000006
          },
          "forecastStats": {
            "density": "92"
          },
          "detectionPointActual": 274,
          "detectionPointForecast": 1.3333333333333333,
          "detectionPointForecastLowerBound": -207.469454720719,
          "detectionPointForecastUpperBound": 210.13612138738566,
          "label": "ANOMALY"
        },
        "status": {}
      },
      {
        "dimensions": [
          {
            "name": "EntityLOCATION",
            "stringVal": "Seine River"
          }
        ],
        "result": {
          "holdoutErrors": {
            "mdape": 0.65789473684210531,
            "rmd": 0.65789473684210531
          },
          "trainingErrors": {
            "mdape": 0.56003289473684215,
            "rmd": 0.52863436123348029
          },
          "forecastStats": {
            "density": "85"
          },
          "detectionPointActual": 1113,
          "detectionPointForecast": 19,
          "detectionPointForecastLowerBound": -539.01045520269611,
          "detectionPointForecastUpperBound": 577.01045520269611,
          "label": "ANOMALY"
        },
        "status": {}
      }
    ]
  }
}
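To scan a result like this one, the anomalies can be flattened into rows. The Python sketch below sorts the rows by detectionPointActual. Some slices in the sample omit detectionPointForecast and have empty holdoutErrors, so every field is read with .get. The sample response embedded in the code is an excerpt of two entries copied from the result above. The function names are hypothetical.

```python
def slice_label(dimensions):
    """Join the value of each dimension, whichever typed field it uses."""
    values = []
    for d in dimensions:
        for field in ("stringVal", "longVal", "boolVal", "doubleVal"):
            if field in d:
                values.append(str(d[field]))
                break
    return " / ".join(values)


def summarize_anomalies(query_response):
    """Flatten the anomalies into rows sorted by detectionPointActual, largest first."""
    rows = []
    anomalies = query_response.get("anomalyDetectionResult", {}).get("anomalies", [])
    for item in anomalies:
        result = item.get("result", {})
        rows.append({
            "slice": slice_label(item.get("dimensions", [])),
            "actual": result.get("detectionPointActual", 0),
            "forecast": result.get("detectionPointForecast"),  # may be absent
            "lower": result.get("detectionPointForecastLowerBound"),
            "upper": result.get("detectionPointForecastUpperBound"),
            "label": result.get("label"),
        })
    rows.sort(key=lambda row: row["actual"], reverse=True)
    return rows


# Excerpt of the query result above (two of the six anomalies).
response = {"anomalyDetectionResult": {"anomalies": [
    {"dimensions": [{"name": "EntityLOCATION", "stringVal": "Saint Julien Les Pauvres"}],
     "result": {"detectionPointActual": 167, "detectionPointForecastLowerBound": -100,
                "detectionPointForecastUpperBound": 100, "label": "ANOMALY"}},
    {"dimensions": [{"name": "EntityLOCATION", "stringVal": "Notre Dame"}],
     "result": {"detectionPointActual": 1514, "detectionPointForecast": 13.666666666666666,
                "detectionPointForecastLowerBound": -212.33681840089878,
                "detectionPointForecastUpperBound": 239.67015173423209, "label": "ANOMALY"}},
]}}

for row in summarize_anomalies(response):
    print(f"{row['slice']}: actual={row['actual']} upper={row['upper']} {row['label']}")
```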

Streaming update

projects.datasets.appendEvents adds Event records in a streaming fashion.

gcurl -X POST -d @append.json https://timeseriesinsights.googleapis.com/v1/projects/${PROJECT_ID}/datasets/dataset_tutorial:appendEvents

where append.json contains the following (replace each eventTime with a timestamp close to the present time):

{
  "events": [
    {
      "groupId":"1324354349507023708",
      "dimensions":[{"name":"EntityPERSON","stringVal":"Jason Marsalis"}],
      "eventTime":"2022-02-16T15:45:00+00:00"
    },{
      "groupId":"1324354349507023708",
      "dimensions":[{"name":"EntityORGANIZATION","stringVal":"WAFA"}],
      "eventTime":"2022-02-16T04:00:00+00:00"
    }
  ]
}

Streamed updates are indexed in near real time, so changes are reflected quickly in query results. All events sent in a single projects.datasets.appendEvents request must have the same groupId.
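A single appendEvents request must not mix groupIds. A client holding events from several groups can therefore split them into one request body per groupId, as in the Python sketch below. now_timestamp produces a current eventTime in the same format as the example. The helper names are hypothetical. Events without a groupId are batched together here, which is an assumption about how such events should be sent.

```python
from collections import defaultdict
from datetime import datetime, timezone


def now_timestamp() -> str:
    """Current UTC time in the same format as the sample eventTime values."""
    return datetime.now(timezone.utc).isoformat(timespec="seconds")


def append_bodies(events):
    """Split events into appendEvents request bodies, one per groupId, because
    all events in a single request must share the same groupId."""
    batches = defaultdict(list)  # dict order keeps groupIds in first-seen order
    for event in events:
        batches[event.get("groupId")].append(event)
    return [{"events": batch} for batch in batches.values()]


events = [
    {"groupId": "1324354349507023708",
     "dimensions": [{"name": "EntityPERSON", "stringVal": "Jason Marsalis"}],
     "eventTime": now_timestamp()},
    {"groupId": "1324354349507023708",
     "dimensions": [{"name": "EntityORGANIZATION", "stringVal": "WAFA"}],
     "eventTime": now_timestamp()},
]
bodies = append_bodies(events)  # one body, since both events share a groupId
```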

Delete dataset

projects.datasets.delete marks the dataset for deletion.

gcurl -X DELETE https://timeseriesinsights.googleapis.com/v1/projects/${PROJECT_ID}/datasets/dataset_tutorial

The request returns immediately, and the dataset no longer accepts queries or updates. It may take some time before the data is completely removed from the service, after which List datasets no longer returns this dataset.
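The gcurl commands in this tutorial (list, create, query, appendEvents, and delete) can also be sent from Python. The sketch below assumes that gcurl is curl with an OAuth bearer token from gcloud auth print-access-token and a JSON content type, which is the usual pattern. Check the alias defined in Getting Started. The helper names are hypothetical, and only the standard library is used.

```python
import json
import subprocess
import urllib.request

BASE_URL = "https://timeseriesinsights.googleapis.com/v1"


def access_token() -> str:
    """Get an OAuth access token from the gcloud CLI (assumes gcloud is installed and authenticated)."""
    result = subprocess.run(
        ["gcloud", "auth", "print-access-token"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def build_request(method, path, body=None, token=None):
    """Build the equivalent of `gcurl -X METHOD [-d @body.json] BASE_URL/path`."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    headers = {
        "Authorization": "Bearer " + (token if token is not None else access_token()),
        "Content-Type": "application/json",
    }
    return urllib.request.Request(
        f"{BASE_URL}/{path}", data=data, headers=headers, method=method
    )


def call(method, path, body=None, token=None):
    """Send the request and return the decoded JSON response ({} for an empty body)."""
    with urllib.request.urlopen(build_request(method, path, body, token)) as response:
        payload = response.read()
    return json.loads(payload) if payload else {}
```

For example, call("DELETE", f"projects/{os.environ['PROJECT_ID']}/datasets/dataset_tutorial") corresponds to the delete command above. call("GET", f"projects/{os.environ['PROJECT_ID']}/datasets") corresponds to List datasets. Both usage lines need import os.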

What's next

Some other examples can be found on the GDELT website by searching for "Timeseries Insights API".