Quickstart

This page shows how to run basic queries against the Timeseries Insights API from a bash command line, using a preloaded, read-only public dataset kindly provided by Kalev Leetaru.

Before you begin

  1. Make sure gcloud and gsutil are installed on your system.
  2. As a Preview client, you should have received a service account key file at the email address you gave us. Save the key file. If you have not received it, please contact us.
  3. Assuming the key file is saved as ~/timeseries-insights-api-demo-key.json, run the following commands:

    $ KEY_FILE=~/timeseries-insights-api-demo-key.json
    $ PROJECT=timeseries-insights-api-demo
    $ gcloud auth activate-service-account --key-file ${KEY_FILE}
    $ alias gcurl='curl -s -H "Content-Type: application/json" -H "Authorization: Bearer $(gcloud auth print-access-token)"'
    

Peek into the data

The dataset is derived from The GDELT Project, a global database tracking world events and media coverage. It contains entity mentions in news URLs from January 2019 through April 2021. The Cloud Storage path is gs://timeseries-insights-api-demo/gdelt/webnlp-201901-202104/. Each URL is hashed into a numeric groupId. For example, the URL https://www.atlantaleader.com/news/267366057/wiseman-warriors-continue-road-swing-at-bulls is converted to the following single-line JSON record:

{"groupId":"-3732616503047793456","eventTime":"2020-12-27T13:02:43+00:00","dimensions":[{"name":"EntityPERSON","stringVal":"JamesWiseman"},{"name":"EntityORGANIZATION","stringVal":"GoldenStateWarriors"},{"name":"EntityORGANIZATION","stringVal":"ChicagoBulls"},{"name":"EntityPERSON","stringVal":"PatrickWilliams"},{"name":"EntityPERSON","stringVal":"DraymondGreen"},{"name":"EntityORGANIZATION","stringVal":"NBA"},{"name":"EntityLOCATION","stringVal":"WindyCity"},{"name":"EntityPERSON","stringVal":"SteveKerr"},{"name":"EntityPERSON","stringVal":"RickCelebrini"},{"name":"EntityLOCATION","stringVal":"Milwaukee"},{"name":"EntityLOCATION","stringVal":"Brooklyn"},{"name":"EntityORGANIZATION","stringVal":"IndianaPacers"},{"name":"EntityPERSON","stringVal":"WiltChamberlain"},{"name":"EntityORGANIZATION","stringVal":"EasternConference"},{"name":"EntityLOCATION","stringVal":"Memphis"},{"name":"EntityLOCATION","stringVal":"Atlanta"},{"name":"EntityORGANIZATION","stringVal":"AtlantaHawks"},{"name":"EntityPERSON","stringVal":"TomasSatoransky"},{"name":"EntityPERSON","stringVal":"GarrettTemple"}]}

In a more human-readable form, the record is:

{
  "groupId":"-3732616503047793456",
  "eventTime":"2020-12-27T13:02:43+00:00",
  "dimensions":[
    {"name":"EntityPERSON","stringVal":"James Wiseman"},
    {"name":"EntityORGANIZATION","stringVal":"Golden State Warriors"},
    {"name":"EntityORGANIZATION","stringVal":"Chicago Bulls"},
    {"name":"EntityPERSON","stringVal":"Patrick Williams"},
    {"name":"EntityPERSON","stringVal":"Draymond Green"},
    {"name":"EntityORGANIZATION","stringVal":"NBA"},
    {"name":"EntityLOCATION","stringVal":"Windy City"},
    {"name":"EntityPERSON","stringVal":"Steve Kerr"},
    {"name":"EntityPERSON","stringVal":"Rick Celebrini"},
    {"name":"EntityLOCATION","stringVal":"Milwaukee"},
    {"name":"EntityLOCATION","stringVal":"Brooklyn"},
    {"name":"EntityORGANIZATION","stringVal":"Indiana Pacers"},
    {"name":"EntityPERSON","stringVal":"Wilt Chamberlain"},
    {"name":"EntityORGANIZATION","stringVal":"Eastern Conference"},
    {"name":"EntityLOCATION","stringVal":"Memphis"},
    {"name":"EntityLOCATION","stringVal":"Atlanta"},
    {"name":"EntityORGANIZATION","stringVal":"Atlanta Hawks"},
    {"name":"EntityPERSON","stringVal":"Tomas Satoransky"},
    {"name":"EntityPERSON","stringVal":"Garrett Temple"}
  ]
}

The URL is annotated with the collection of entities mentioned in the news report. The entities are of different types, including PERSON, LOCATION, and ORGANIZATION, among others. Each annotated URL is converted into a JSON record, known as an Event, with a groupId and an eventTime. The entities are represented as dimensions, each a name-value pair.
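To make the Event structure concrete, the following sketch parses one record and groups its entities by dimension name. It uses a shortened copy of the sample record (five of its dimensions); the helper name group_dimensions is made up for this illustration and is not part of the API.

```python
import json
from collections import defaultdict
from datetime import datetime

# A shortened copy of the sample Event above (only five of its dimensions).
record_line = (
    '{"groupId":"-3732616503047793456",'
    '"eventTime":"2020-12-27T13:02:43+00:00",'
    '"dimensions":['
    '{"name":"EntityPERSON","stringVal":"James Wiseman"},'
    '{"name":"EntityORGANIZATION","stringVal":"Golden State Warriors"},'
    '{"name":"EntityORGANIZATION","stringVal":"Chicago Bulls"},'
    '{"name":"EntityPERSON","stringVal":"Steve Kerr"},'
    '{"name":"EntityLOCATION","stringVal":"Atlanta"}]}'
)


def group_dimensions(event):
    """Map each dimension name to the list of its string values.

    Hypothetical helper for illustration only.
    """
    grouped = defaultdict(list)
    for dim in event["dimensions"]:
        grouped[dim["name"]].append(dim["stringVal"])
    return dict(grouped)


event = json.loads(record_line)
# eventTime is an ISO 8601 timestamp with an explicit UTC offset.
event_time = datetime.fromisoformat(event["eventTime"])
entities = group_dimensions(event)

print(event["groupId"], event_time.date())
for name, values in sorted(entities.items()):
    print(name, values)
```

Note that groupId is a string in the record, so it is kept as a string here rather than converted to an integer.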

List dataset

We can list the datasets loaded in the demo project:

gcurl https://timeseriesinsights.googleapis.com/v1/projects/${PROJECT}/datasets

The result includes, among others, a dataset named webnlp-201901-202104. It is loaded from the Cloud Storage JSON files gs://timeseries-insights-api-demo/gdelt/webnlp-201901-202104/webnlp-201901-202104.*. The dataNames field lists the dimension names of this dataset.

{
  "datasets": [
    ...
    {
      "name": "webnlp-201901-202104",
      "dataNames": [
        "EntityCONSUMER_GOOD",
        "EntityEVENT",
        "EntityLOCATION",
        "EntityORGANIZATION",
        "EntityOTHER",
        "EntityPERSON",
        "EntityUNKNOWN",
        "EntityWORK_OF_ART"
      ],
      "dataSources": [
        {
          "uri": "gs://timeseries-insights-api-demo/gdelt/webnlp-201901-202104/webnlp-201901-202104.*"
        }
      ],
      "state": "LOADED",
      "status": {
        "message": "name: \"processed-session\"\nvalue: 83611344\n,name: \"num-items-examined\"\nvalue: 83611344\n,name: \"num-items-ingested\"\nvalue: 83611344\n"
      }
    },
    ...
  ]
}

The status field also shows that the service processed over 83 million JSON records from the input files without encountering bad records: num-items-examined and num-items-ingested are both 83611344.
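The counters in the status message can be checked programmatically. The sketch below assumes the message always follows the name/value layout seen in the sample output above; that layout is inferred from this one example rather than from a documented format, and parse_counters is a hypothetical helper.

```python
import re

# The status message copied from the sample listing above.
status_message = (
    'name: "processed-session"\nvalue: 83611344\n,'
    'name: "num-items-examined"\nvalue: 83611344\n,'
    'name: "num-items-ingested"\nvalue: 83611344\n'
)


def parse_counters(message):
    """Extract {counter name: integer value} pairs from a status message.

    Assumes the 'name: "..."' / 'value: N' layout of the sample output.
    """
    pairs = re.findall(r'name: "([^"]+)"\s*value: (\d+)', message)
    return {name: int(value) for name, value in pairs}


counters = parse_counters(status_message)
# Records that were examined but not ingested would indicate bad input.
not_ingested = counters["num-items-examined"] - counters["num-items-ingested"]

print(counters)
print("records not ingested:", not_ingested)
```

For the sample message the difference is zero, which matches the observation that no bad records were encountered.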

Query for anomalies

For a simple query, try:

gcurl -X POST -d @query.json https://timeseriesinsights.googleapis.com/v1/projects/${PROJECT}/datasets/webnlp-201901-202104:query

where query.json contains:


{
  "dimensionNames": ["EntityLOCATION"],
  "testedInterval": {
    "startTime": "2020-12-27T00:00:00Z",
    "length": "86400s"
  },
  "forecastParams": {
    "forecastHistory": "2592000s",
    "seasonalityHint": "WEEKLY",
    "holdout": 10.0,
    "minDensity": 90.0,
    "maxPositiveRelativeChange": 50.0,
    "maxNegativeRelativeChange": 50.0
  }
}

This query asks whether there are anomalies on December 27, 2020 (testedInterval) among mentions of any location entity (dimensionNames: ["EntityLOCATION"]), compared with the previous 30 days (forecastHistory) and assuming there might be a weekly event pattern (seasonalityHint). The length of 86400s is one day, and the forecastHistory of 2592000s is 30 days. The other parameters control the sensitivity of anomaly detection. See the Tutorial for more detailed explanations of these and additional parameters.
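The duration strings in the query are easy to mistype, so it can help to derive them from days. The following sketch builds the same query body from a date and a history length; build_query and seconds_string are hypothetical helpers, and the sensitivity parameters are simply copied from the example above rather than recommended values.

```python
import json
from datetime import datetime, timedelta, timezone


def seconds_string(delta):
    """Format a timedelta as a whole number of seconds followed by 's'."""
    return f"{int(delta.total_seconds())}s"


def build_query(day, history_days=30):
    """Build a one-day anomaly query for location entities.

    Hypothetical helper; field names and values mirror the example query.
    """
    return {
        "dimensionNames": ["EntityLOCATION"],
        "testedInterval": {
            "startTime": day.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "length": seconds_string(timedelta(days=1)),
        },
        "forecastParams": {
            "forecastHistory": seconds_string(timedelta(days=history_days)),
            "seasonalityHint": "WEEKLY",
            "holdout": 10.0,
            "minDensity": 90.0,
            "maxPositiveRelativeChange": 50.0,
            "maxNegativeRelativeChange": 50.0,
        },
    }


query = build_query(datetime(2020, 12, 27, tzinfo=timezone.utc))
print(json.dumps(query, indent=2))
```

The printed JSON can be saved as query.json and passed to the gcurl command shown earlier.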

The query result looks as follows:


{
  "name": "projects/timeseries-insights-api-demo/datasets/webnlp-201901-202104",
  "anomalyDetectionResult": {
    "anomalies": [
      {
        "dimensions": [
          {
            "name": "EntityLOCATION",
            "stringVal": "Antioch"
          }
        ],
        "result": {
          "holdoutErrors": {
            "mdape": 0.2149094039290923,
            "rmd": 6.4704109664436622
          },
          "trainingErrors": {
            "mdape": 0.21925334386858764,
            "rmd": 0.61886732168403158
          },
          "forecastStats": {
            "density": "100",
            "numAnomalies": 1
          },
          "history": {},
          "testedIntervalActual": 1073,
          "testedIntervalForecast": 6.2144490651332651,
          "testedIntervalForecastLowerBound": -6.4410972440844825,
          "testedIntervalForecastUpperBound": 18.869995374351014,
          "forecast": {}
        },
        "status": {}
      }
    ]
  }
}

The anomaly detected is "Antioch", which appears 1073 times (testedIntervalActual), while the forecasted value is 6.21 (testedIntervalForecast) with an upper bound of 18.87 (testedIntervalForecastUpperBound). Other fields in the result are explained in the Tutorial and the REST API documentation.
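To get a feel for how far the observed count lies from the forecast, the sketch below compares the actual value with the forecast and its bounds, using the numbers from the sample result. The summarize helper is hypothetical, and comparing against the bounds is only an illustration; it is not meant to describe how the service itself decides what counts as an anomaly.

```python
# Values copied from the "result" object of the sample response above.
result = {
    "testedIntervalActual": 1073,
    "testedIntervalForecast": 6.2144490651332651,
    "testedIntervalForecastLowerBound": -6.4410972440844825,
    "testedIntervalForecastUpperBound": 18.869995374351014,
}


def summarize(result):
    """Compare the actual count with the forecast and its bounds.

    Hypothetical helper for illustration only.
    """
    actual = result["testedIntervalActual"]
    forecast = result["testedIntervalForecast"]
    lower = result["testedIntervalForecastLowerBound"]
    upper = result["testedIntervalForecastUpperBound"]
    return {
        "outside_bounds": actual < lower or actual > upper,
        "ratio_to_forecast": actual / forecast,
        "excess_over_upper_bound": actual - upper,
    }


summary = summarize(result)
print(summary)
```

For this result the actual count is well over a hundred times the forecast and far above the upper bound.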

What's next

Some other examples can be found on the GDELT website.