Quickstart

This page shows you how to run basic queries against the Timeseries Insights API from a bash command line, using a preloaded, read-only public dataset kindly provided by Kalev Leetaru.

Before you begin

  1. Make sure gcloud and gsutil are installed on your system.
  2. As a Preview client, you should have received a service account key file at the email address you provided to us. Save the key file. The service account key expires after 90 days. Contact us if you did not receive the key or if the key has expired. NOTE: You do not need this key for your own project; it is only for accessing the read-only demo project.
  3. Assuming the key file is saved as ~/timeseries-insights-api-demo-readonly-key.json, run the following commands to authenticate as the service account and define the gcurl-demo helper alias used in the rest of this page:

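    # Path to the demo service account key and the ID of the read-only demo project.
    KEY_FILE=~/timeseries-insights-api-demo-readonly-key.json
    DEMO_PROJECT=timeseries-insights-api-demo
    # Authenticate gcloud as the demo service account.
    gcloud auth activate-service-account --key-file "${KEY_FILE}"
    # curl wrapper that sends JSON and requests an access token each time it is invoked.
    alias gcurl-demo='curl -s -H "Content-Type: application/json" -H "Authorization: Bearer $(gcloud auth print-access-token)"'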
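
The alias above is convenient at an interactive prompt, but bash does not expand aliases in non-interactive scripts by default. If you want to run the later requests from a script, a shell function is one alternative. The following is a minimal sketch; the function name gcurl_demo is an assumption made for this example and is not part of the API or of gcloud.

```shell
# Function equivalent of the gcurl-demo alias, usable inside scripts.
# The name gcurl_demo is hypothetical; pick any name you like.
gcurl_demo() {
  curl -s \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "$@"
}

# Example call (requires the authentication step above and network access):
#   gcurl_demo "https://timeseriesinsights.googleapis.com/v1/projects/${DEMO_PROJECT}/datasets"
```

All arguments are passed through to curl unchanged, so the function accepts the same flags as the alias (for example -X POST -d @eval.json).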
    

Peek into the data

The dataset is derived from The GDELT Project, a global database tracking world events and media coverage. This dataset contains entity mentions in news URLs from January 2019 through April 2021. The Cloud Storage path is gs://timeseries-insights-api-demo/gdelt/webnlp-201901-202104/. Each URL is hashed into a numerical groupId. For example, the URL https://www.atlantaleader.com/news/267366057/wiseman-warriors-continue-road-swing-at-bulls is converted to the following single-line JSON record:

{"groupId":"-3732616503047793456","eventTime":"2020-12-27T13:02:43+00:00","dimensions":[{"name":"EntityPERSON","stringVal":"James Wiseman"},{"name":"EntityORGANIZATION","stringVal":"Golden State Warriors"},{"name":"EntityORGANIZATION","stringVal":"Chicago Bulls"},{"name":"EntityPERSON","stringVal":"Patrick Williams"},{"name":"EntityPERSON","stringVal":"Draymond Green"},{"name":"EntityORGANIZATION","stringVal":"NBA"},{"name":"EntityLOCATION","stringVal":"Windy City"},{"name":"EntityPERSON","stringVal":"Steve Kerr"},{"name":"EntityPERSON","stringVal":"Rick Celebrini"},{"name":"EntityLOCATION","stringVal":"Milwaukee"},{"name":"EntityLOCATION","stringVal":"Brooklyn"},{"name":"EntityORGANIZATION","stringVal":"Indiana Pacers"},{"name":"EntityPERSON","stringVal":"Wilt Chamberlain"},{"name":"EntityORGANIZATION","stringVal":"Eastern Conference"},{"name":"EntityLOCATION","stringVal":"Memphis"},{"name":"EntityLOCATION","stringVal":"Atlanta"},{"name":"EntityORGANIZATION","stringVal":"Atlanta Hawks"},{"name":"EntityPERSON","stringVal":"Tomas Satoransky"},{"name":"EntityPERSON","stringVal":"Garrett Temple"}]}

which in a more human-readable form is:

{
  "groupId":"-3732616503047793456",
  "eventTime":"2020-12-27T13:02:43+00:00",
  "dimensions":[
    {"name":"EntityPERSON","stringVal":"James Wiseman"},
    {"name":"EntityORGANIZATION","stringVal":"Golden State Warriors"},
    {"name":"EntityORGANIZATION","stringVal":"Chicago Bulls"},
    {"name":"EntityPERSON","stringVal":"Patrick Williams"},
    {"name":"EntityPERSON","stringVal":"Draymond Green"},
    {"name":"EntityORGANIZATION","stringVal":"NBA"},
    {"name":"EntityLOCATION","stringVal":"Windy City"},
    {"name":"EntityPERSON","stringVal":"Steve Kerr"},
    {"name":"EntityPERSON","stringVal":"Rick Celebrini"},
    {"name":"EntityLOCATION","stringVal":"Milwaukee"},
    {"name":"EntityLOCATION","stringVal":"Brooklyn"},
    {"name":"EntityORGANIZATION","stringVal":"Indiana Pacers"},
    {"name":"EntityPERSON","stringVal":"Wilt Chamberlain"},
    {"name":"EntityORGANIZATION","stringVal":"Eastern Conference"},
    {"name":"EntityLOCATION","stringVal":"Memphis"},
    {"name":"EntityLOCATION","stringVal":"Atlanta"},
    {"name":"EntityORGANIZATION","stringVal":"Atlanta Hawks"},
    {"name":"EntityPERSON","stringVal":"Tomas Satoransky"},
    {"name":"EntityPERSON","stringVal":"Garrett Temple"}
  ]
}

The URL is annotated with a collection of entities mentioned in the news report. The entities are of different types, including PERSON, LOCATION, and ORGANIZATION, among others. Each URL is converted to a JSON record known as an Event, identified by groupId and eventTime, and its entities are represented as dimensions, which are name-value pairs.
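
To get a feel for the Event structure, it can help to count how many dimensions of each type a record carries. The sketch below does this with standard text tools. It assumes the compact one-line layout shown above (no whitespace around the colons) and uses a shortened copy of the sample record; the helper name count_dimension_names is hypothetical.

```shell
# Shortened copy of the sample Event record above (first four dimensions only).
record='{"groupId":"-3732616503047793456","eventTime":"2020-12-27T13:02:43+00:00","dimensions":[{"name":"EntityPERSON","stringVal":"James Wiseman"},{"name":"EntityORGANIZATION","stringVal":"Golden State Warriors"},{"name":"EntityORGANIZATION","stringVal":"Chicago Bulls"},{"name":"EntityPERSON","stringVal":"Patrick Williams"}]}'

# Hypothetical helper: read Event records on stdin and print "<dimension name> <count>".
# It matches the literal text "name":"..." and therefore only works on the compact layout.
count_dimension_names() {
  grep -o '"name":"[^"]*"' | cut -d '"' -f 4 | sort | uniq -c | awk '{print $2, $1}'
}

printf '%s\n' "$record" | count_dimension_names
```

For the shortened record this prints "EntityORGANIZATION 2" and "EntityPERSON 2" on separate lines. For anything beyond a quick look, a real JSON parser is a more robust choice than pattern matching.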

List dataset

We can list the datasets loaded in the demo project:

gcurl-demo https://timeseriesinsights.googleapis.com/v1/projects/${DEMO_PROJECT}/datasets

The result includes, among others, a dataset named webnlp-201901-202104. It was loaded from the Cloud Storage JSON files gs://timeseries-insights-api-demo/gdelt/webnlp-201901-202104/webnlp-201901-202104.*. The dataNames field lists the dimension names of this dataset.

{
  "datasets": [
    ...
    {
      "name": "webnlp-201901-202104",
      "dataNames": [
        "EntityCONSUMER_GOOD",
        "EntityEVENT",
        "EntityLOCATION",
        "EntityORGANIZATION",
        "EntityOTHER",
        "EntityPERSON",
        "EntityUNKNOWN",
        "EntityWORK_OF_ART"
      ],
      "dataSources": [
        {
          "uri": "gs://timeseries-insights-api-demo/gdelt/webnlp-201901-202104/webnlp-201901-202104.*"
        }
      ],
      "state": "LOADED",
      "status": {
        "message": "name: \"processed-session\"\nvalue: 83611344\n,name: \"num-items-examined\"\nvalue: 83611344\n,name: \"num-items-ingested\"\nvalue: 83611344\n"
      }
    },
    ...
  ]
}

The status field also shows that the service processed over 83 million JSON records from the input files without encountering bad records: num-items-examined and num-items-ingested are both 83611344.
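
The comparison between examined and ingested records can also be made mechanically. The sketch below works on the status message from the response above after its JSON escapes (\n and \") have been decoded. The helper name status_value is hypothetical, and reading a difference between the two counters as "records that were not ingested" is an interpretation made for this example.

```shell
# Status message from the response above, with the JSON \n and \" escapes decoded.
status_message='name: "processed-session"
value: 83611344
,name: "num-items-examined"
value: 83611344
,name: "num-items-ingested"
value: 83611344'

# Hypothetical helper: print the value of the counter whose name is $1.
status_value() {
  awk -v want="$1" '
    /name:/  { split($0, parts, "\""); name = parts[2]; next }
    /value:/ && name == want { print $2 }
  '
}

examined=$(printf '%s\n' "$status_message" | status_value num-items-examined)
ingested=$(printf '%s\n' "$status_message" | status_value num-items-ingested)

if [ "$examined" -eq "$ingested" ]; then
  echo "all $examined examined records were ingested"
else
  echo "$((examined - ingested)) examined records were not ingested"
fi
```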

Retrieve and evaluate time series

The daily number of news URLs matching a slice (the set of events that share the given dimension values) forms a time series. To check whether a given slice is anomalous, we can use the "evaluateSlice" request:

gcurl-demo -X POST -d @eval.json https://timeseriesinsights.googleapis.com/v1/projects/${DEMO_PROJECT}/datasets/webnlp-201901-202104:evaluateSlice

where eval.json contains

{
  "detectionTime": "2020-12-27T00:00:00Z",
  "pinnedDimensions": [
    {"name": "EntityLOCATION", "stringVal": "Paris"}
  ],
  "timeseriesParams": {
    "forecastHistory": "7776000s",
    "granularity": "86400s"
  }
}

This request asks whether the number of news URLs mentioning the location Paris (pinnedDimensions) on 2020-12-27 (detectionTime) is anomalous compared with the previous 90 days (forecastHistory, 7776000s), with events counted in daily buckets (granularity, 86400s).
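
The duration fields in eval.json are given in seconds with an "s" suffix, which is easy to mistype. As a small convenience, the sketch below converts a whole number of days into that format; the helper name days_to_duration is hypothetical.

```shell
# Hypothetical helper: convert a whole number of days to the "<seconds>s" duration format.
days_to_duration() {
  echo "$(( $1 * 86400 ))s"
}

days_to_duration 90   # prints 7776000s, the forecastHistory used in eval.json
days_to_duration 1    # prints 86400s, the granularity used in eval.json
```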

The result shows the history leading up to the detection time, the forecast time series (which contains only one point because we didn't ask for a longer period), the forecast and actual values on 2020-12-27, and the anomaly score, which indicates how far the actual value deviates from the expected value (see the field's documentation for how to interpret the value).

{
  "dimensions": [
    {
      "name": "EntityLOCATION",
      "stringVal": "Paris"
    }
  ],
  "history": {
    "point": [
      {
        "time": "2020-09-28T00:00:00Z",
        "value": 318
      },
      {
        "time": "2020-09-29T00:00:00Z",
        "value": 315
      },
      {
        "time": "2020-09-30T00:00:00Z",
        "value": 368
      },

      ......
      {
        "time": "2020-12-25T00:00:00Z",
        "value": 285
      },
      {
        "time": "2020-12-26T00:00:00Z",
        "value": 326
      },
      {
        "time": "2020-12-27T00:00:00Z",
        "value": 300
      }
    ]
  },
  "forecast": {
    "point": [
      {
        "time": "2020-12-27T00:00:00Z",
        "value": 19.694006033275819
      }
    ]
  },
  "detectionPointActual": 300,
  "detectionPointForecast": 19.694006033275819,
  "expectedDeviation": 319.40540373277327,
  "anomalyScore": 0.87758688429595888,
  "status": {}
}

Varying the time series parameters, such as granularity or forecastHistory, produces a different time series and therefore different evaluation results. The same applies to varying pinnedDimensions.

We can also pin multiple dimension values. To check how many times "Joe Biden" and "Paris" are mentioned in the same news report, the following evaluateSlice request body

{
  "detectionTime": "2020-12-27T00:00:00Z",
  "pinnedDimensions": [
    {"name": "EntityLOCATION", "stringVal": "Paris"},
    {"name": "EntityPERSON", "stringVal": "Joe Biden"}
  ],
  "timeseriesParams": {
    "forecastHistory": "7776000s",
    "granularity": "86400s"
  }
}

results in

{
  "dimensions": [
    {
      "name": "EntityLOCATION",
      "stringVal": "Paris"
    },
    {
      "name": "EntityPERSON",
      "stringVal": "Joe Biden"
    }
  ],
  "history": {
    "point": [
      {
        "time": "2020-09-28T00:00:00Z",
        "value": 16
      },
      {
        "time": "2020-09-29T00:00:00Z",
        "value": 71
      },
      {
        "time": "2020-09-30T00:00:00Z",
        "value": 166
      },

      ......

      {
        "time": "2020-12-25T00:00:00Z",
        "value": 38
      },
      {
        "time": "2020-12-26T00:00:00Z",
        "value": 62
      },
      {
        "time": "2020-12-27T00:00:00Z",
        "value": 58
      }
    ]
  },
  "forecast": {
    "point": [
      {
        "time": "2020-12-27T00:00:00Z",
        "value": 247.30300528799364
      }
    ]
  },
  "detectionPointActual": 58,
  "detectionPointForecast": 247.30300528799364,
  "expectedDeviation": 194.72938872574315,
  "anomalyScore": 0.97213371620281852,
  "status": {}
}
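
When trying out many combinations of pinned values, editing the request file by hand gets tedious. The sketch below generates an evaluateSlice request body from NAME=VALUE arguments, reusing the forecastHistory and granularity values from the examples above. The helper name eval_request_body is hypothetical, and the function does not escape quotes or backslashes inside values, so it is only suitable for simple strings.

```shell
# Hypothetical helper: print an evaluateSlice request body.
# $1 is the detectionTime; each remaining argument is a NAME=VALUE pair to pin.
# Values are inserted verbatim, without JSON escaping.
eval_request_body() {
  detection_time=$1
  shift
  pinned=""
  for pair in "$@"; do
    name=${pair%%=*}    # text before the first "="
    value=${pair#*=}    # text after the first "="
    pinned="${pinned:+$pinned, }{\"name\": \"$name\", \"stringVal\": \"$value\"}"
  done
  cat <<EOF
{
  "detectionTime": "$detection_time",
  "pinnedDimensions": [$pinned],
  "timeseriesParams": {
    "forecastHistory": "7776000s",
    "granularity": "86400s"
  }
}
EOF
}

eval_request_body "2020-12-27T00:00:00Z" "EntityLOCATION=Paris" "EntityPERSON=Joe Biden"
```

Redirecting the output to eval.json lets you reuse the evaluateSlice command shown earlier without further changes.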

Query for anomalies

Although we could iterate over every possible "EntityLOCATION" value and check each one for anomalies with the API above, it is more convenient to ask directly whether any "EntityLOCATION" value shows anomalous behavior. The main "query" API does this: replace "pinnedDimensions" with "slicingParams", and list "EntityLOCATION" under "dimensionNames" without specifying a value.

gcurl-demo -X POST -d @query.json https://timeseriesinsights.googleapis.com/v1/projects/${DEMO_PROJECT}/datasets/webnlp-201901-202104:query

where query.json contains

{
  "detectionTime": "2020-12-27T00:00:00Z",
  "numReturnedSlices": 5,
  "slicingParams": {
    "dimensionNames": ["EntityLOCATION"]
  },
  "timeseriesParams": {
    "forecastHistory": "7776000s",
    "granularity": "86400s"
  },
  "forecastParams": {
    "noiseThreshold": 100.0
  }
}

Because we only requested the top 5 anomaly slices, we will get a list of 5 slices sorted in descending order by their anomaly score:

{
  "name": "projects/timeseries-staging/locations/us-central1/datasets/webnlp-201901-202104-dragosd",
  "slices": [
    {
      "dimensions": [
        {
          "name": "EntityLOCATION",
          "stringVal": "Antioch"
        }
      ],
      "detectionPointActual": 1073,
      "detectionPointForecast": 16.891770879460367,
      "expectedDeviation": 111.22323058991122,
      "anomalyScore": 4.9999624859964769,
      "status": {}
    },
    {
      "dimensions": [
        {
          "name": "EntityLOCATION",
          "stringVal": "Rockford"
        }
      ],
      "detectionPointActual": 857,
      "detectionPointForecast": 38.923463390434179,
      "expectedDeviation": 67.357212683307253,
      "anomalyScore": 4.8882060324321079,
      "status": {}
    },
    {
      "dimensions": [
        {
          "name": "EntityLOCATION",
          "stringVal": "Slovakia"
        }
      ],
      "detectionPointActual": 580,
      "detectionPointForecast": 29.237814627838507,
      "expectedDeviation": 179.47917938040683,
      "anomalyScore": 1.970673402552481,
      "status": {}
    },
    {
      "dimensions": [
        {
          "name": "EntityLOCATION",
          "stringVal": "Hungary"
        }
      ],
      "detectionPointActual": 643,
      "detectionPointForecast": 32.147998291311353,
      "expectedDeviation": 220.43006550702734,
      "anomalyScore": 1.9063504566655345,
      "status": {}
    },
    {
      "dimensions": [
        {
          "name": "EntityLOCATION",
          "stringVal": "Illinois"
        }
      ],
      "detectionPointActual": 428,
      "detectionPointForecast": 22.038357768135313,
      "expectedDeviation": 184.96320406379709,
      "anomalyScore": 1.4246107442734208,
      "status": {}
    }
  ]
}
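
As a sanity check on responses like these, the scores on this page can be reproduced from the other returned fields. The numbers are consistent with dividing the absolute gap between detectionPointActual and detectionPointForecast by expectedDeviation plus the noiseThreshold from the request (100.0 in query.json, and none in the evaluateSlice examples). This relationship is inferred from the sample values shown here, not taken from the API's definition, so the anomalyScore field documentation remains the authoritative source. The helper name recompute_score is hypothetical.

```shell
# Hypothetical helper: recompute a score from the response fields.
# Args: actual forecast expectedDeviation [noiseThreshold, defaults to 0]
# The formula is inferred from the sample responses on this page.
recompute_score() {
  awk -v a="$1" -v f="$2" -v d="$3" -v n="${4:-0}" 'BEGIN {
    gap = a - f
    if (gap < 0) gap = -gap          # absolute difference
    printf "%.5f\n", gap / (d + n)
  }'
}

# Antioch slice from the query response (noiseThreshold 100.0 in query.json).
recompute_score 1073 16.891770879460367 111.22323058991122 100.0
# Paris slice from the first evaluateSlice response (no noiseThreshold in the request).
recompute_score 300 19.694006033275819 319.40540373277327
```

The two calls print 4.99996 and 0.87759, which agree with the anomalyScore values reported for those slices when rounded to five decimal places.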

What's next

Some other examples can be found on the GDELT website by searching for "Timeseries Insights API".