Quickstart

This page shows how to run basic queries against the Timeseries Insights API from a bash command line, using a preloaded, read-only public dataset kindly provided by Kalev Leetaru.

Before you begin

  1. Make sure gcloud and gsutil are installed on your system.
  2. As a Preview client, you should have received a service account key file at the email address you gave us. Save the key file. The service account key expires after 90 days. Contact us if you did not receive the key or if the key has expired. NOTE: You do not need this key for your own project; it is only for accessing the read-only demo project.
  3. Assuming the key file is saved as ~/timeseries-insights-api-demo-readonly-key.json, run the following commands to authenticate and to define a curl alias for the demo project:

    KEY_FILE=~/timeseries-insights-api-demo-readonly-key.json
    DEMO_PROJECT=timeseries-insights-api-demo
    gcloud auth activate-service-account --key-file ${KEY_FILE}
    alias gcurl-demo='curl -s -H "Content-Type: application/json" -H "Authorization: Bearer $(gcloud auth print-access-token)"'
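
By default, bash expands aliases only in interactive shells, so the gcurl-demo alias is not available inside a script unless `shopt -s expand_aliases` is set. The following is a minimal sketch of a script-friendly alternative; the function names missing_tools and gcurl_demo are illustrative and are not part of gcloud or of the API.

```shell
# Hypothetical helper: print the names of required tools that are not on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Hypothetical function equivalent of the gcurl-demo alias. The access token is
# fetched each time the function is called, not when it is defined.
gcurl_demo() {
  curl -s -H "Content-Type: application/json" \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" "$@"
}

# Report anything that still needs to be installed before continuing.
missing=$(missing_tools gcloud gsutil curl)
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing
else
  echo "All required tools found."
fi
```

In a script, gcurl_demo takes the same arguments as the gcurl-demo alias used on the rest of this page.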
    

Peek into the data

The dataset is derived from The GDELT Project, a global database tracking world events and media coverage. This dataset contains entity mentions in news URLs from January 2019 through April 2021. The Cloud Storage path is gs://timeseries-insights-api-demo/gdelt/webnlp-201901-202104/. The URL itself is hashed into a numerical groupId. For example, the URL https://www.atlantaleader.com/news/267366057/wiseman-warriors-continue-road-swing-at-bulls is converted to the following single-line JSON record:

{"groupId":"-3732616503047793456","eventTime":"2020-12-27T13:02:43+00:00","dimensions":[{"name":"EntityPERSON","stringVal":"James Wiseman"},{"name":"EntityORGANIZATION","stringVal":"Golden State Warriors"},{"name":"EntityORGANIZATION","stringVal":"Chicago Bulls"},{"name":"EntityPERSON","stringVal":"Patrick Williams"},{"name":"EntityPERSON","stringVal":"Draymond Green"},{"name":"EntityORGANIZATION","stringVal":"NBA"},{"name":"EntityLOCATION","stringVal":"Windy City"},{"name":"EntityPERSON","stringVal":"Steve Kerr"},{"name":"EntityPERSON","stringVal":"Rick Celebrini"},{"name":"EntityLOCATION","stringVal":"Milwaukee"},{"name":"EntityLOCATION","stringVal":"Brooklyn"},{"name":"EntityORGANIZATION","stringVal":"Indiana Pacers"},{"name":"EntityPERSON","stringVal":"Wilt Chamberlain"},{"name":"EntityORGANIZATION","stringVal":"Eastern Conference"},{"name":"EntityLOCATION","stringVal":"Memphis"},{"name":"EntityLOCATION","stringVal":"Atlanta"},{"name":"EntityORGANIZATION","stringVal":"Atlanta Hawks"},{"name":"EntityPERSON","stringVal":"Tomas Satoransky"},{"name":"EntityPERSON","stringVal":"Garrett Temple"}]}

which in a more human-readable form is:

{
  "groupId":"-3732616503047793456",
  "eventTime":"2020-12-27T13:02:43+00:00",
  "dimensions":[
    {"name":"EntityPERSON","stringVal":"James Wiseman"},
    {"name":"EntityORGANIZATION","stringVal":"Golden State Warriors"},
    {"name":"EntityORGANIZATION","stringVal":"Chicago Bulls"},
    {"name":"EntityPERSON","stringVal":"Patrick Williams"},
    {"name":"EntityPERSON","stringVal":"Draymond Green"},
    {"name":"EntityORGANIZATION","stringVal":"NBA"},
    {"name":"EntityLOCATION","stringVal":"Windy City"},
    {"name":"EntityPERSON","stringVal":"Steve Kerr"},
    {"name":"EntityPERSON","stringVal":"Rick Celebrini"},
    {"name":"EntityLOCATION","stringVal":"Milwaukee"},
    {"name":"EntityLOCATION","stringVal":"Brooklyn"},
    {"name":"EntityORGANIZATION","stringVal":"Indiana Pacers"},
    {"name":"EntityPERSON","stringVal":"Wilt Chamberlain"},
    {"name":"EntityORGANIZATION","stringVal":"Eastern Conference"},
    {"name":"EntityLOCATION","stringVal":"Memphis"},
    {"name":"EntityLOCATION","stringVal":"Atlanta"},
    {"name":"EntityORGANIZATION","stringVal":"Atlanta Hawks"},
    {"name":"EntityPERSON","stringVal":"Tomas Satoransky"},
    {"name":"EntityPERSON","stringVal":"Garrett Temple"}
  ]
}

The URL is annotated with the collection of entities mentioned in the news report. The entities are of different types, including PERSON, LOCATION, and ORGANIZATION, among others. Each annotated URL is converted to a JSON record known as an Event, with a groupId and an eventTime. The entities are represented as dimensions, each a name-value pair.
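
To make the Event structure concrete, the following sketch lists the dimensions of a record using only grep and sed. It assumes the compact one-line layout of the sample record (no whitespace between JSON tokens) and works on a shortened copy of it; the function name list_dimensions is illustrative, and a real JSON parser would be more robust.

```shell
# Shortened copy of the sample Event, keeping three of its dimensions.
record='{"groupId":"-3732616503047793456","eventTime":"2020-12-27T13:02:43+00:00","dimensions":[{"name":"EntityPERSON","stringVal":"James Wiseman"},{"name":"EntityORGANIZATION","stringVal":"Golden State Warriors"},{"name":"EntityLOCATION","stringVal":"Atlanta"}]}'

# Print one "name: value" line per dimension of the record passed as $1.
list_dimensions() {
  printf '%s\n' "$1" |
    grep -o '{"name":"[^"]*","stringVal":"[^"]*"}' |
    sed -E 's/^\{"name":"([^"]*)","stringVal":"([^"]*)"\}$/\1: \2/'
}

list_dimensions "$record"
```

Each output line pairs a dimension name such as EntityLOCATION with one value, which is the unit that the requests later on this page pin or slice on.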

List dataset

We can list the datasets loaded in the demo project:

gcurl-demo https://timeseriesinsights.googleapis.com/v1/projects/${DEMO_PROJECT}/datasets

The result includes, among others, a dataset named webnlp-201901-202104. It is loaded from the Cloud Storage JSON files gs://timeseries-insights-api-demo/gdelt/webnlp-201901-202104/webnlp-201901-202104.*. The dataNames field lists the dimension names of this dataset.

{
  "datasets": [
    ...
    {
      "name": "webnlp-201901-202104",
      "dataNames": [
        "EntityCONSUMER_GOOD",
        "EntityEVENT",
        "EntityLOCATION",
        "EntityORGANIZATION",
        "EntityOTHER",
        "EntityPERSON",
        "EntityUNKNOWN",
        "EntityWORK_OF_ART"
      ],
      "dataSources": [
        {
          "uri": "gs://timeseries-insights-api-demo/gdelt/webnlp-201901-202104/webnlp-201901-202104.*"
        }
      ],
      "state": "LOADED",
      "status": {
        "message": "name: \"processed-session\"\nvalue: 83611344\n,name: \"num-items-examined\"\nvalue: 83611344\n,name: \"num-items-ingested\"\nvalue: 83611344\n"
      }
    },
    ...
  ]
}

The status field also shows that the service processed over 83 million JSON records from the input files without encountering bad records: num-items-examined and num-items-ingested are both 83611344.
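
As a rough illustration, the counters can be pulled out of the status message and compared. This sketch assumes the message layout shown in the sample response (a name line followed by a value line), which may not be a stable format, and it assumes that the gap between the examined and ingested counters reflects records that were not ingested. The function name counter is illustrative.

```shell
# Status message copied from the sample response, with "\n" expanded to newlines.
status_message='name: "processed-session"
value: 83611344
,name: "num-items-examined"
value: 83611344
,name: "num-items-ingested"
value: 83611344'

# Print the value on the line that follows the counter with the given name.
counter() {
  printf '%s\n' "$status_message" |
    awk -v key="\"$1\"" 'index($0, key) { getline; sub(/^value: /, ""); print; exit }'
}

examined=$(counter num-items-examined)
ingested=$(counter num-items-ingested)
echo "examined=$examined ingested=$ingested not_ingested=$((examined - ingested))"
```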

Retrieve and evaluate time series

The daily number of news URLs matching a slice (the events that share the given dimension values) forms a time series.
To check whether there is an anomaly for a given slice, we can use the "evaluateSlice" request:

gcurl-demo -X POST -d @eval.json https://timeseriesinsights.googleapis.com/v1/projects/${DEMO_PROJECT}/datasets/webnlp-201901-202104:evaluateSlice

where eval.json contains

{
  "pinnedDimensions": [
    {"name": "EntityLOCATION", "stringVal": "Paris"}
  ],
  "detectionTime": "2020-12-27T00:00:00Z",
  "timeseriesParams": {
    "forecastHistory": "7776000s",
    "granularity": "86400s",
    "minDensity": 90.0
  },
  "forecastParams": {
    "sensitivity": 0.0,
    "noiseThreshold": 50.0,
    "seasonalityHint": "WEEKLY"
  }
}

This request asks whether the slice of news reports mentioning the location "Paris" (pinnedDimensions) is anomalous on 2020-12-27 (detectionTime), compared with the previous 90 days (forecastHistory), assuming there might be a weekly pattern (seasonalityHint). The other parameters control the sensitivity of anomaly detection.
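
The duration fields are expressed in seconds: 90 days of forecastHistory is 90 × 86400 = 7776000 seconds, and a daily granularity is 86400 seconds. The sketch below derives both values and prints a request body of the same shape as eval.json. The function name make_eval_json and its argument order are illustrative, and the values are inserted without JSON escaping.

```shell
# Hypothetical helper: print an evaluateSlice request body for one pinned value.
# Arguments: dimension name, dimension value, detection time, history in days.
# Note: arguments are inserted as-is, so they must not contain quotes.
make_eval_json() {
  day_seconds=$((24 * 60 * 60))            # 86400 seconds in a day
  history_seconds=$(($4 * day_seconds))    # 90 days -> 7776000 seconds
  cat <<EOF
{
  "pinnedDimensions": [
    {"name": "$1", "stringVal": "$2"}
  ],
  "detectionTime": "$3",
  "timeseriesParams": {
    "forecastHistory": "${history_seconds}s",
    "granularity": "${day_seconds}s",
    "minDensity": 90.0
  },
  "forecastParams": {
    "sensitivity": 0.0,
    "noiseThreshold": 50.0,
    "seasonalityHint": "WEEKLY"
  }
}
EOF
}

make_eval_json EntityLOCATION Paris 2020-12-27T00:00:00Z 90
```

Redirecting the output to eval.json gives a file that can be passed to the evaluateSlice command with -d @eval.json.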

NOTE: For information about how to configure your request (including the "query" API below) and how to interpret the results, see the Query Building Guide.

The result shows the history leading up to the detection time, the training and holdout errors, the predicted value on 2020-12-27 with its lower and upper bounds, the actual value, and whether the actual value is considered an anomaly.

{
  "holdoutErrors": {
    "mdape": 0.23611832116562845,
    "rmd": 0.24194373614302581
  },
  "trainingErrors": {
    "mdape": 0.09292655469885007,
    "rmd": 0.11135237258071833
  },
  "forecastStats": {
    "density": "100",
    "numAnomaliesInHoldout": 0
  },
  "history": {
    "point": [
      {
        "time": "2020-09-28T00:00:00Z",
        "value": 1232
      },
      {
        "time": "2020-09-29T00:00:00Z",
        "value": 1360
      },
      {
        "time": "2020-09-30T00:00:00Z",
        "value": 1504
      },
      {
        "time": "2020-10-01T00:00:00Z",
        "value": 992
      },
      {
        "time": "2020-10-02T00:00:00Z",
        "value": 1056
      },
      {
        "time": "2020-10-03T00:00:00Z",
        "value": 576
      },

      ......

      {
        "time": "2020-12-16T00:00:00Z",
        "value": 1408
      },
      {
        "time": "2020-12-17T00:00:00Z",
        "value": 1408
      },
      {
        "time": "2020-12-18T00:00:00Z",
        "value": 1216
      },
      {
        "time": "2020-12-19T00:00:00Z",
        "value": 1216
      },
      {
        "time": "2020-12-20T00:00:00Z",
        "value": 768
      },
      {
        "time": "2020-12-21T00:00:00Z",
        "value": 1344
      },
      {
        "time": "2020-12-22T00:00:00Z",
        "value": 960
      },
      {
        "time": "2020-12-23T00:00:00Z",
        "value": 1600
      },
      {
        "time": "2020-12-24T00:00:00Z",
        "value": 1328
      },
      {
        "time": "2020-12-25T00:00:00Z",
        "value": 1104
      },
      {
        "time": "2020-12-26T00:00:00Z",
        "value": 1088
      },
      {
        "time": "2020-12-27T00:00:00Z",
        "value": 1184
      }
    ]
  },
  "forecast": {
    "point": [
      {
        "time": "2020-12-27T00:00:00Z",
        "value": 1497.2710947663404
      }
    ]
  },
  "detectionPointActual": 1184,
  "detectionPointForecast": 1497.2710947663404,
  "detectionPointForecastLowerBound": -507.98518847755577,
  "detectionPointForecastUpperBound": 3502.5273780102366,
  "label": "WITHIN_EXPECTED_BOUNDS"
}
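
In this response the actual value 1184 lies between detectionPointForecastLowerBound and detectionPointForecastUpperBound, which matches the WITHIN_EXPECTED_BOUNDS label. The sketch below reproduces that comparison locally. It is only a reading of the sample output, not a statement of how the service computes the label, and the function name classify is illustrative.

```shell
# Hypothetical helper: compare an actual value with the forecast bounds.
# awk is used because the bounds are decimal numbers.
classify() {
  awk -v actual="$1" -v lower="$2" -v upper="$3" 'BEGIN {
    if (actual + 0 < lower + 0 || actual + 0 > upper + 0) { print "ANOMALY" }
    else { print "WITHIN_EXPECTED_BOUNDS" }
  }'
}

# Actual value and bounds copied from the response for "Paris".
classify 1184 -507.98518847755577 3502.5273780102366
```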

Varying the granularity or the time interval (detectionTime and forecastHistory) results in a different time series and therefore different evaluation results. The same holds for varying pinnedDimensions.

We can also pin multiple dimension values. To check how often "Joe Biden" and "Paris" are mentioned in the same news report, the request

{
  "pinnedDimensions": [
    {"name": "EntityLOCATION", "stringVal": "Paris"},
    {"name": "EntityPERSON", "stringVal": "Joe Biden"}
  ],
  "detectionTime": "2020-12-27T00:00:00Z",
  "timeseriesParams": {
    "forecastHistory": "7776000s",
    "granularity": "86400s",
    "minDensity": 90.0
  },
  "forecastParams": {
    "sensitivity": 0.0,
    "noiseThreshold": 50.0,
    "seasonalityHint": "WEEKLY"
  }
}

results in

{
  "holdoutErrors": {
    "mdape": 0.34768311893590653,
    "rmd": 0.4301765678313702
  },
  "trainingErrors": {
    "mdape": 0.23388093110500341,
    "rmd": 0.26978250028011735
  },
  "forecastStats": {
    "density": "100",
    "numAnomaliesInHoldout": 0
  },
  "history": {
    "point": [
      {
        "time": "2020-09-28T00:00:00Z",
        "value": 16
      },
      {
        "time": "2020-09-29T00:00:00Z",
        "value": 71
      },
      {
        "time": "2020-09-30T00:00:00Z",
        "value": 166
      },
      {
        "time": "2020-10-01T00:00:00Z",
        "value": 34
      },
      {
        "time": "2020-10-02T00:00:00Z",
        "value": 35
      },
      {
        "time": "2020-10-03T00:00:00Z",
        "value": 31
      },

      ......

      {
        "time": "2020-12-16T00:00:00Z",
        "value": 112
      },
      {
        "time": "2020-12-17T00:00:00Z",
        "value": 200
      },
      {
        "time": "2020-12-18T00:00:00Z",
        "value": 216
      },
      {
        "time": "2020-12-19T00:00:00Z",
        "value": 104
      },
      {
        "time": "2020-12-20T00:00:00Z",
        "value": 216
      },
      {
        "time": "2020-12-21T00:00:00Z",
        "value": 352
      },
      {
        "time": "2020-12-22T00:00:00Z",
        "value": 80
      },
      {
        "time": "2020-12-23T00:00:00Z",
        "value": 216
      },
      {
        "time": "2020-12-24T00:00:00Z",
        "value": 219
      },
      {
        "time": "2020-12-25T00:00:00Z",
        "value": 38
      },
      {
        "time": "2020-12-26T00:00:00Z",
        "value": 62
      },
      {
        "time": "2020-12-27T00:00:00Z",
        "value": 58
      }
    ]
  },
  "forecast": {
    "point": [
      {
        "time": "2020-12-27T00:00:00Z",
        "value": 322.59856295569853
      }
    ]
  },
  "detectionPointActual": 58,
  "detectionPointForecast": 322.59856295569853,
  "detectionPointForecastLowerBound": -449.45171203352561,
  "detectionPointForecastUpperBound": 1094.6488379449227,
  "label": "WITHIN_EXPECTED_BOUNDS"
}

Query for anomalies

Although we could iterate over every possible "EntityLOCATION" value and check each one for anomalies with the above API, it is more convenient to ask whether any "EntityLOCATION" value shows anomalous behavior. The main "query" API does this: replace "pinnedDimensions" with "slicingParams", listing "EntityLOCATION" under "dimensionNames" without specifying a value. The example below also raises "noiseThreshold" from 50.0 to 800.0.

gcurl-demo -X POST -d @query.json https://timeseriesinsights.googleapis.com/v1/projects/${DEMO_PROJECT}/datasets/webnlp-201901-202104:query

where query.json contains

{
  "detectionTime": "2020-12-27T00:00:00Z",
  "slicingParams": {
    "dimensionNames": ["EntityLOCATION"]
  },
  "timeseriesParams": {
    "forecastHistory": "7776000s",
    "granularity": "86400s",
    "minDensity": 90.0
  },
  "forecastParams": {
    "sensitivity": 0.0,
    "noiseThreshold": 800.0,
    "seasonalityHint": "WEEKLY"
  }
}
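
Because slicingParams names a dimension rather than a value, the same request shape can be reused for other dimension names listed in dataNames. The sketch below prints one request body per dimension. The function name make_query_json is illustrative, and whether the thresholds used here suit dimensions other than EntityLOCATION is an open assumption.

```shell
# Hypothetical helper: print a query request body that slices on one dimension.
# Arguments: dimension name, detection time, noise threshold.
make_query_json() {
  cat <<EOF
{
  "detectionTime": "$2",
  "slicingParams": {
    "dimensionNames": ["$1"]
  },
  "timeseriesParams": {
    "forecastHistory": "7776000s",
    "granularity": "86400s",
    "minDensity": 90.0
  },
  "forecastParams": {
    "sensitivity": 0.0,
    "noiseThreshold": $3,
    "seasonalityHint": "WEEKLY"
  }
}
EOF
}

# Print one request body per dimension of interest.
for dimension in EntityLOCATION EntityPERSON; do
  make_query_json "$dimension" 2020-12-27T00:00:00Z 800.0
done
```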

The query result looks as follows:

{
  "name": "projects/timeseries-insights-api-demo/datasets/webnlp-201901-202104",
  "anomalyDetectionResult": {
    "anomalies": [
      {
        "dimensions": [
          {
            "name": "EntityLOCATION",
            "stringVal": "Antioch"
          }
        ],
        "result": {
          "holdoutErrors": {
            "mdape": 0.51443463913957355,
            "rmd": 2.3282947076265006
          },
          "trainingErrors": {
            "mdape": 0.31158819685121231,
            "rmd": 0.69097328122559232
          },
          "forecastStats": {
            "density": "100",
            "numAnomaliesInHoldout": 0
          },
          "detectionPointActual": 1024,
          "detectionPointForecast": 5.4668274167407818,
          "detectionPointForecastLowerBound": -816.15511437348277,
          "detectionPointForecastUpperBound": 827.08876920696434,
          "label": "ANOMALY"
        },
        "status": {}
      },
      {
        "dimensions": [
          {
            "name": "EntityLOCATION",
            "stringVal": "Durham Region"
          }
        ],
        "result": {
          "holdoutErrors": {
            "mdape": 0.302325089101291,
            "rmd": 0.66534366135940781
          },
          "trainingErrors": {
            "mdape": 0.13565782176320854,
            "rmd": 0.20726891051334043
          },
          "forecastStats": {
            "density": "100",
            "numAnomaliesInHoldout": 0
          },
          "detectionPointActual": 852,
          "detectionPointForecast": 17.204086143207551,
          "detectionPointForecastLowerBound": -813.7661071641968,
          "detectionPointForecastUpperBound": 848.174279450612,
          "label": "ANOMALY"
        },
        "status": {}
      }
    ]
  }
}

The anomalies detected include "Antioch", which appears 1024 times (detectionPointActual) with a forecasted value of 5.47 (detectionPointForecast), and "Durham Region", which appears 852 times (detectionPointActual) with a forecasted value of 17.20 (detectionPointForecast). For descriptions of the remaining fields in the result, see the Tutorial and the REST API documentation.
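
To compare the two anomalies, one simple measure is how far each actual value exceeds its detectionPointForecastUpperBound. The sketch below computes that difference from values copied out of the sample result. The pipe-separated layout and the function name excess_report are illustrative choices, not an API format.

```shell
# One line per anomaly: name|detectionPointActual|detectionPointForecastUpperBound,
# copied from the sample query result.
anomalies='Antioch|1024|827.08876920696434
Durham Region|852|848.174279450612'

# Print each location with the amount by which it exceeds its upper bound.
excess_report() {
  printf '%s\n' "$anomalies" |
    awk -F'|' '{ printf "%s: %.2f above upper bound\n", $1, $2 - $3 }'
}

excess_report
```

With these values, "Antioch" exceeds its upper bound by about 196.91, while "Durham Region" exceeds its upper bound by only about 3.83.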

What's next

Some other examples can be found on the GDELT website by searching for "Timeseries Insights API".