Quickstart
This page shows you how to run basic queries against the Timeseries Insights API from the bash command line, using a preloaded, read-only public dataset kindly provided by Kalev Leetaru.
Before you begin
- Make sure gcloud, gsutil, and curl are installed on your system.
- As a Preview client, you should have received a service account key file at the email address you gave us. Save the key file. The service account key expires after 90 days. Contact us if you did not receive the key or if it has expired. NOTE: You do not need this key for your own project. It only grants access to the read-only demo project.
Assuming the key file is saved as ~/timeseries-insights-api-demo-readonly-key.json, activate the service account and define a gcurl-demo alias that attaches an access token to each request:
KEY_FILE=~/timeseries-insights-api-demo-readonly-key.json
DEMO_PROJECT=timeseries-insights-api-demo
gcloud auth activate-service-account --key-file="${KEY_FILE}"
alias gcurl-demo='curl -s -H "Content-Type: application/json" -H "Authorization: Bearer $(gcloud auth print-access-token)"'
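Bash expands aliases only in interactive shells unless the expand_aliases option is set, so the alias above will not work inside a script. If you prefer to keep the quickstart commands in a script, a sketch like the following may help. The names check_tools and gcurl_demo are hypothetical (not part of any Google tooling): one checks the prerequisites, the other sends the same authenticated curl requests as the alias.

```shell
# Hypothetical helper: report any of the given commands that are not on PATH.
# Returns 0 if all of them are available, 1 otherwise.
check_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Hypothetical script-friendly equivalent of the gcurl-demo alias: fetches a
# fresh access token on every call and passes all arguments (URL, -X POST,
# -d @file.json, ...) through to curl unchanged.
gcurl_demo() {
  curl -s \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "$@"
}
```

For example, a script could start with check_tools gcloud gsutil curl || exit 1 and then call gcurl_demo https://timeseriesinsights.googleapis.com/v1/projects/${DEMO_PROJECT}/datasets wherever the interactive examples use gcurl-demo.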
Peek into the data
The dataset is derived from The GDELT Project, a global database tracking world events and media coverage. It contains entity mentions in news URLs from January 2019 through April 2021.
The Cloud Storage path is gs://timeseries-insights-api-demo/gdelt/webnlp-201901-202104/.
The URL itself is hashed into a numerical groupId. For example, the URL https://www.atlantaleader.com/news/267366057/wiseman-warriors-continue-road-swing-at-bulls is converted into the following one-line JSON record:
{"groupId":"-3732616503047793456","eventTime":"2020-12-27T13:02:43+00:00","dimensions":[{"name":"EntityPERSON","stringVal":"James Wiseman"},{"name":"EntityORGANIZATION","stringVal":"Golden State Warriors"},{"name":"EntityORGANIZATION","stringVal":"Chicago Bulls"},{"name":"EntityPERSON","stringVal":"Patrick Williams"},{"name":"EntityPERSON","stringVal":"Draymond Green"},{"name":"EntityORGANIZATION","stringVal":"NBA"},{"name":"EntityLOCATION","stringVal":"Windy City"},{"name":"EntityPERSON","stringVal":"Steve Kerr"},{"name":"EntityPERSON","stringVal":"Rick Celebrini"},{"name":"EntityLOCATION","stringVal":"Milwaukee"},{"name":"EntityLOCATION","stringVal":"Brooklyn"},{"name":"EntityORGANIZATION","stringVal":"Indiana Pacers"},{"name":"EntityPERSON","stringVal":"Wilt Chamberlain"},{"name":"EntityORGANIZATION","stringVal":"Eastern Conference"},{"name":"EntityLOCATION","stringVal":"Memphis"},{"name":"EntityLOCATION","stringVal":"Atlanta"},{"name":"EntityORGANIZATION","stringVal":"Atlanta Hawks"},{"name":"EntityPERSON","stringVal":"Tomas Satoransky"},{"name":"EntityPERSON","stringVal":"Garrett Temple"}]}
which in a more human-readable form is:
{
"groupId":"-3732616503047793456",
"eventTime":"2020-12-27T13:02:43+00:00",
"dimensions":[
{"name":"EntityPERSON","stringVal":"James Wiseman"},
{"name":"EntityORGANIZATION","stringVal":"Golden State Warriors"},
{"name":"EntityORGANIZATION","stringVal":"Chicago Bulls"},
{"name":"EntityPERSON","stringVal":"Patrick Williams"},
{"name":"EntityPERSON","stringVal":"Draymond Green"},
{"name":"EntityORGANIZATION","stringVal":"NBA"},
{"name":"EntityLOCATION","stringVal":"Windy City"},
{"name":"EntityPERSON","stringVal":"Steve Kerr"},
{"name":"EntityPERSON","stringVal":"Rick Celebrini"},
{"name":"EntityLOCATION","stringVal":"Milwaukee"},
{"name":"EntityLOCATION","stringVal":"Brooklyn"},
{"name":"EntityORGANIZATION","stringVal":"Indiana Pacers"},
{"name":"EntityPERSON","stringVal":"Wilt Chamberlain"},
{"name":"EntityORGANIZATION","stringVal":"Eastern Conference"},
{"name":"EntityLOCATION","stringVal":"Memphis"},
{"name":"EntityLOCATION","stringVal":"Atlanta"},
{"name":"EntityORGANIZATION","stringVal":"Atlanta Hawks"},
{"name":"EntityPERSON","stringVal":"Tomas Satoransky"},
{"name":"EntityPERSON","stringVal":"Garrett Temple"}
]
}
The URL is annotated with the entities mentioned in the news report. The entities are of different types, including PERSON, LOCATION, and ORGANIZATION, among others. Each annotated URL is converted into a JSON record known as an Event, identified by its groupId and timestamped by its eventTime. The entities are represented as dimensions, that is, name-value pairs whose name is the entity type prefixed with Entity (for example, EntityPERSON).
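Because each dimension name is the entity type prefixed with Entity, a quick way to summarize a record is to count its dimension names. The sketch below uses only grep, sed, sort, and uniq on the first five dimensions of the example record. It assumes the compact one-line layout (no whitespace between "name": and the value), so it is a convenience for eyeballing files, not a general JSON parser; count_dimensions is a hypothetical name.

```shell
export LC_ALL=C  # predictable sort order

# First five dimensions of the example Event record above.
record='{"groupId":"-3732616503047793456","eventTime":"2020-12-27T13:02:43+00:00","dimensions":[{"name":"EntityPERSON","stringVal":"James Wiseman"},{"name":"EntityORGANIZATION","stringVal":"Golden State Warriors"},{"name":"EntityORGANIZATION","stringVal":"Chicago Bulls"},{"name":"EntityPERSON","stringVal":"Patrick Williams"},{"name":"EntityPERSON","stringVal":"Draymond Green"}]}'

# Hypothetical helper: read one-line Event records on stdin and print
# "<count> <dimension name>", most frequent first.
count_dimensions() {
  grep -o '"name":"[^"]*"' \
    | sed 's/^"name":"\(.*\)"$/\1/' \
    | sort | uniq -c | sort -k1,1nr -k2,2 \
    | awk '{ print $1, $2 }'
}

# Prints "3 EntityPERSON" followed by "2 EntityORGANIZATION".
printf '%s\n' "$record" | count_dimensions
```

The same helper can be pointed at a downloaded shard (for example, one fetched with gsutil cp) to get a rough feel for which entity types dominate.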
List datasets
We can list the datasets loaded in the demo project:
gcurl-demo https://timeseriesinsights.googleapis.com/v1/projects/${DEMO_PROJECT}/datasets
Among others, the result shows a dataset named webnlp-201901-202104. It is loaded from the Cloud Storage JSON files gs://timeseries-insights-api-demo/gdelt/webnlp-201901-202104/webnlp-201901-202104.*. The dataNames field lists the dimension names of this dataset:
{
"datasets": [
...
{
"name": "webnlp-201901-202104",
"dataNames": [
"EntityCONSUMER_GOOD",
"EntityEVENT",
"EntityLOCATION",
"EntityORGANIZATION",
"EntityOTHER",
"EntityPERSON",
"EntityUNKNOWN",
"EntityWORK_OF_ART"
],
"dataSources": [
{
"uri": "gs://timeseries-insights-api-demo/gdelt/webnlp-201901-202104/webnlp-201901-202104.*"
}
],
"state": "LOADED",
"status": {
"message": "name: \"processed-session\"\nvalue: 83611344\n,name: \"num-items-examined\"\nvalue: 83611344\n,name: \"num-items-ingested\"\nvalue: 83611344\n"
}
},
...
]
}
The status field also shows that the service processed over 83 million JSON records from the input files without encountering bad records: the counts of items examined and items ingested are equal.
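The status message packs its counters into a single string of alternating name: and value: lines. As a rough sketch, assuming the message keeps exactly the layout shown above (this is not a documented format), the counters can be extracted with awk and the examined and ingested counts compared:

```shell
# The status.message string from the listing above, with its JSON escapes
# (\" and \n) decoded into quotes and newlines.
status_message=$(printf 'name: "processed-session"\nvalue: 83611344\n,name: "num-items-examined"\nvalue: 83611344\n,name: "num-items-ingested"\nvalue: 83611344\n')

# Turn the alternating name/value lines into "name=value" pairs; the quotes
# and the leading comma of each name line are stripped.
counters=$(printf '%s\n' "$status_message" | awk -F': ' '
  /name:/  { gsub(/[",]/, "", $2); name = $2 }
  /value:/ { print name "=" $2 }')

examined=$(printf '%s\n' "$counters" | sed -n 's/^num-items-examined=//p')
ingested=$(printf '%s\n' "$counters" | sed -n 's/^num-items-ingested=//p')

if [ "$examined" = "$ingested" ]; then
  echo "all $examined examined records were ingested"
else
  echo "$((examined - ingested)) records were not ingested"
fi
```

With the values above this prints "all 83611344 examined records were ingested".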
Retrieve and evaluate time series
The daily number of news URLs matching a slice forms a time series. To check whether there is an anomaly for a given slice, we can use the evaluateSlice request:
gcurl-demo -X POST -d @eval.json https://timeseriesinsights.googleapis.com/v1/projects/${DEMO_PROJECT}/datasets/webnlp-201901-202104:evaluateSlice
where eval.json contains:
{
"detectionTime": "2020-12-27T00:00:00Z",
"pinnedDimensions": [
{"name": "EntityLOCATION", "stringVal": "Paris"}
],
"timeseriesParams": {
"forecastHistory": "7776000s",
"granularity": "86400s"
}
}
This request asks whether the number of news URLs mentioning the location entity "Paris" (pinnedDimensions) on 2020-12-27 (detectionTime) is anomalous compared with the previous 90 days (forecastHistory, 7776000 seconds), with counts aggregated per day (granularity, 86400 seconds).
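One way to create eval.json without a text editor is a shell here-document. This sketch also shows where the numbers come from: the durations are written as a number of seconds followed by s, so 90 days is 90 × 86400 = 7776000 seconds and one day is 86400 seconds. The variable names are illustrative only.

```shell
SECONDS_PER_DAY=86400
HISTORY_DAYS=90       # forecastHistory: how far back to look
GRANULARITY_DAYS=1    # granularity: aggregate counts per day

# Write the evaluateSlice request body. The $((...)) expressions are expanded
# because the here-document delimiter (EOF) is unquoted.
cat > eval.json <<EOF
{
  "detectionTime": "2020-12-27T00:00:00Z",
  "pinnedDimensions": [
    {"name": "EntityLOCATION", "stringVal": "Paris"}
  ],
  "timeseriesParams": {
    "forecastHistory": "$((HISTORY_DAYS * SECONDS_PER_DAY))s",
    "granularity": "$((GRANULARITY_DAYS * SECONDS_PER_DAY))s"
  }
}
EOF
```

The resulting file is then sent with the same gcurl-demo -X POST -d @eval.json ...:evaluateSlice command as above.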
The result shows the history leading up to the detection time, the forecast for the detection time (a single point, because we didn't ask for a longer period), the predicted and actual values on 2020-12-27, and the anomaly score, which shows how much the actual value differs from the expected value (see the field's documentation for how to interpret the value).
{
"dimensions": [
{
"name": "EntityLOCATION",
"stringVal": "Paris"
}
],
"history": {
"point": [
{
"time": "2020-09-28T00:00:00Z",
"value": 1290
},
{
"time": "2020-09-29T00:00:00Z",
"value": 1226
},
{
"time": "2020-09-30T00:00:00Z",
"value": 1474
},
......
......
......
{
"time": "2020-12-25T00:00:00Z",
"value": 1100
},
{
"time": "2020-12-26T00:00:00Z",
"value": 1304
},
{
"time": "2020-12-27T00:00:00Z",
"value": 1190
}
]
},
"detectionPointActual": 1190,
"detectionPointForecast": 1070.5972848121648,
"expectedDeviation": 384.575069781912,
"anomalyScore": 0.310479602708172,
"status": {}
}
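The fields in this response are related in a simple way. The anomalyScore field documentation describes the score as |detectionPointActual − detectionPointForecast| / (expectedDeviation + noiseThreshold); this request does not set noiseThreshold, and treating it as 0 reproduces the returned score. The sketch below (anomaly_score is a hypothetical helper) is only a way to check your reading of the fields under that assumption, not a reimplementation of the forecasting model.

```shell
export LC_ALL=C  # use "." as the decimal separator in awk output

# Hypothetical helper: anomaly_score ACTUAL FORECAST EXPECTED_DEVIATION [NOISE_THRESHOLD]
# Assumed formula: |actual - forecast| / (expectedDeviation + noiseThreshold),
# printed with four decimal places. NOISE_THRESHOLD defaults to 0.
anomaly_score() {
  awk -v a="$1" -v f="$2" -v d="$3" -v n="${4:-0}" 'BEGIN {
    diff = a - f
    if (diff < 0) diff = -diff
    printf "%.4f\n", diff / (d + n)
  }'
}

# Values from the "Paris" response above; the API returned 0.310479602708172.
anomaly_score 1190 1070.5972848121648 384.575069781912   # prints 0.3105
```

Here the actual value exceeds the forecast by about 119 mentions, roughly a third of the expected deviation, which is why the score is well below 1.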
Varying granularity or forecastHistory results in different time series and thus different evaluation results. The same is true for varying pinnedDimensions.
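To see this effect, you might generate one request per granularity and evaluate each in turn. The sketch below writes hypothetical files eval-1d.json and eval-7d.json that differ only in granularity (daily versus weekly buckets) and keep the same 90-day forecastHistory. Whether a given combination yields a useful evaluation depends on the data, so treat these values as a starting point.

```shell
SECONDS_PER_DAY=86400
HISTORY_SECONDS=$((90 * SECONDS_PER_DAY))

for days in 1 7; do
  # One request file per granularity, e.g. eval-7d.json for weekly buckets.
  cat > "eval-${days}d.json" <<EOF
{
  "detectionTime": "2020-12-27T00:00:00Z",
  "pinnedDimensions": [
    {"name": "EntityLOCATION", "stringVal": "Paris"}
  ],
  "timeseriesParams": {
    "forecastHistory": "${HISTORY_SECONDS}s",
    "granularity": "$((days * SECONDS_PER_DAY))s"
  }
}
EOF
  echo "wrote eval-${days}d.json"
done
```

Each file can then be posted to the same :evaluateSlice URL, for example in a loop over eval-1d.json and eval-7d.json, and the returned anomalyScore values compared.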
We can also pin multiple dimension values. For example, to check how many times "Joe Biden" and "Paris" are mentioned in the same news report, use:
{
"detectionTime": "2020-12-27T00:00:00Z",
"pinnedDimensions": [
{"name": "EntityLOCATION", "stringVal": "Paris"},
{"name": "EntityPERSON", "stringVal": "Joe Biden"}
],
"timeseriesParams": {
"forecastHistory": "7776000s",
"granularity": "86400s"
}
}
which results in:
{
"dimensions": [
{
"name": "EntityLOCATION",
"stringVal": "Paris"
},
{
"name": "EntityPERSON",
"stringVal": "Joe Biden"
}
],
"history": {
"point": [
{
"time": "2020-09-28T00:00:00Z",
"value": 16
},
{
"time": "2020-09-29T00:00:00Z",
"value": 71
},
{
"time": "2020-09-30T00:00:00Z",
"value": 166
},
......
......
......
{
"time": "2020-12-25T00:00:00Z",
"value": 38
},
{
"time": "2020-12-26T00:00:00Z",
"value": 62
},
{
"time": "2020-12-27T00:00:00Z",
"value": 58
}
]
},
"forecast": {
"point": [
{
"time": "2020-12-27T00:00:00Z",
"value": 247.30300528799364
}
]
},
"detectionPointActual": 58,
"detectionPointForecast": 247.30300528799364,
"expectedDeviation": 194.72938872574315,
"anomalyScore": 0.97213371620281852,
"status": {}
}
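When the set of pinned dimensions changes often, writing the pinnedDimensions array by hand gets tedious. The hypothetical helper below builds it from NAME=VALUE arguments. It does no JSON escaping, so it is only safe for values without double quotes or backslashes, such as the entity names used here.

```shell
# Hypothetical helper: pinned_dimensions NAME=VALUE...
# Prints a JSON array of {"name": ..., "stringVal": ...} objects.
# Values are NOT JSON-escaped.
pinned_dimensions() {
  local pair sep=""
  printf '['
  for pair in "$@"; do
    # ${pair%%=*} is the text before the first "=", ${pair#*=} the text after it.
    printf '%s{"name": "%s", "stringVal": "%s"}' "$sep" "${pair%%=*}" "${pair#*=}"
    sep=", "
  done
  printf ']\n'
}

# Rebuild the "Paris" + "Joe Biden" request from above.
cat > eval.json <<EOF
{
  "detectionTime": "2020-12-27T00:00:00Z",
  "pinnedDimensions": $(pinned_dimensions "EntityLOCATION=Paris" "EntityPERSON=Joe Biden"),
  "timeseriesParams": {"forecastHistory": "7776000s", "granularity": "86400s"}
}
EOF
```

Adding a third pinned dimension is then just one more argument, for example "EntityORGANIZATION=NBA".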
Query for anomalies
Although we could iterate through all possible EntityLOCATION values and check each one for anomalies with evaluateSlice, it is more convenient to simply ask whether any EntityLOCATION value shows anomalous behavior. The main query API does exactly this: replace pinnedDimensions with slicingParams, listing EntityLOCATION in dimensionNames without specifying a value.
gcurl-demo -X POST -d @query.json https://timeseriesinsights.googleapis.com/v1/projects/${DEMO_PROJECT}/datasets/webnlp-201901-202104:query
where query.json contains:
{
"detectionTime": "2020-12-27T00:00:00Z",
"numReturnedSlices": 5,
"slicingParams": {
"dimensionNames": ["EntityLOCATION"]
},
"timeseriesParams": {
"forecastHistory": "7776000s",
"granularity": "86400s"
},
"forecastParams": {
"noiseThreshold": 100.0
}
}
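The same question can be asked about any other dimension name listed in dataNames. As an illustration (the query-*.json file names are hypothetical), this loop writes one query file per entity type, each with the same detection time and parameters as query.json:

```shell
# A few of the dimension names reported in dataNames for this dataset.
for dim in EntityLOCATION EntityPERSON EntityORGANIZATION; do
  cat > "query-${dim}.json" <<EOF
{
  "detectionTime": "2020-12-27T00:00:00Z",
  "numReturnedSlices": 5,
  "slicingParams": {
    "dimensionNames": ["${dim}"]
  },
  "timeseriesParams": {
    "forecastHistory": "7776000s",
    "granularity": "86400s"
  },
  "forecastParams": {
    "noiseThreshold": 100.0
  }
}
EOF
done
```

Each file is then posted to the :query endpoint just like query.json, for example gcurl-demo -X POST -d @query-EntityPERSON.json https://timeseriesinsights.googleapis.com/v1/projects/${DEMO_PROJECT}/datasets/webnlp-201901-202104:query.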
Because we requested only the top 5 anomalous slices (numReturnedSlices), we get a list of 5 slices sorted by anomaly score in descending order:
{
"name": "projects/timeseries-staging/locations/us-central1/datasets/webnlp-201901-202104-dragosd",
"slices": [
{
"dimensions": [
{
"name": "EntityLOCATION",
"stringVal": "Antioch"
}
],
"detectionPointActual": 1073,
"detectionPointForecast": 16.891770879460367,
"expectedDeviation": 111.22323058991122,
"anomalyScore": 4.9999624859964769,
"status": {}
},
{
"dimensions": [
{
"name": "EntityLOCATION",
"stringVal": "Rockford"
}
],
"detectionPointActual": 857,
"detectionPointForecast": 38.923463390434179,
"expectedDeviation": 67.357212683307253,
"anomalyScore": 4.8882060324321079,
"status": {}
},
{
"dimensions": [
{
"name": "EntityLOCATION",
"stringVal": "Slovakia"
}
],
"detectionPointActual": 580,
"detectionPointForecast": 29.237814627838507,
"expectedDeviation": 179.47917938040683,
"anomalyScore": 1.970673402552481,
"status": {}
},
{
"dimensions": [
{
"name": "EntityLOCATION",
"stringVal": "Hungary"
}
],
"detectionPointActual": 643,
"detectionPointForecast": 32.147998291311353,
"expectedDeviation": 220.43006550702734,
"anomalyScore": 1.9063504566655345,
"status": {}
},
{
"dimensions": [
{
"name": "EntityLOCATION",
"stringVal": "Illinois"
}
],
"detectionPointActual": 428,
"detectionPointForecast": 22.038357768135313,
"expectedDeviation": 184.96320406379709,
"anomalyScore": 1.4246107442734208,
"status": {}
}
]
}
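This ranking is where forecastParams.noiseThreshold matters. Assuming the score is |actual − forecast| / (expectedDeviation + noiseThreshold), as the field documentation describes, plugging in noiseThreshold = 100 reproduces the scores above (Antioch comes out at 5.0000). Re-ranking the same five slices with the threshold set to 0 shows what the penalty does: Rockford, whose expected deviation is smallest, would move ahead of Antioch. This only re-sorts the five returned slices; a real query without the threshold could return a different set. rank_slices is a hypothetical helper.

```shell
export LC_ALL=C  # "." decimal separator for awk and sort

# Hypothetical helper: rank_slices NOISE_THRESHOLD
# Reads "name actual forecast deviation" lines on stdin and prints
# "name score" sorted by score, highest first.
# Assumed formula: |actual - forecast| / (deviation + noiseThreshold).
rank_slices() {
  awk -v n="$1" '{
    diff = $2 - $3
    if (diff < 0) diff = -diff
    printf "%s %.4f\n", $1, diff / ($4 + n)
  }' | sort -k2,2nr
}

# detectionPointActual, detectionPointForecast, expectedDeviation from above.
slices='Antioch 1073 16.891770879460367 111.22323058991122
Rockford 857 38.923463390434179 67.357212683307253
Slovakia 580 29.237814627838507 179.47917938040683
Hungary 643 32.147998291311353 220.43006550702734
Illinois 428 22.038357768135313 184.96320406379709'

echo "noiseThreshold = 100:"; printf '%s\n' "$slices" | rank_slices 100
echo "noiseThreshold = 0:";   printf '%s\n' "$slices" | rank_slices 0
```

The threshold is added to every denominator, so it damps slices whose absolute change is small relative to it, which is useful when low-volume locations would otherwise dominate the ranking.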
What's next
- Timeseries Insights API Concepts
- Follow Setup for Full Access to create your own project
- A more detailed Tutorial
- A Query Building Guide
- Learn more about the REST API
More examples can be found on the GDELT website by searching for "Timeseries Insights API".