Concepts

In this article we go through the common concepts used in the Timeseries Insights API and try to provide an intuitive explanation of what they represent.

Event

An event is a data point and the raw input that the Timeseries Insights API works with. Conceptually it represents either an action being carried out by some agent (e.g. a transaction by a client or the publishing of a news article) or an observation (e.g. the readings of a temperature sensor, or CPU usage on a machine).

An event contains:

  • A set of values across different dimensions, representing properties which describe the event, such as labels or numerical measurements.
  • A timestamp representing the time when the event occurred. This timestamp will be used when placing events onto a time series.
  • A group id.
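
To make the three parts concrete, here is a minimal sketch of an event as a plain Python data class. The class itself, the field types, and the sample values are illustrative assumptions; this is not the wire format of the API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Union


@dataclass
class Event:
    # Values across dimensions: labels (str) or numerical measurements (float).
    dimensions: Dict[str, Union[str, float]]
    # Time when the event occurred; used to place the event on a time series.
    event_time: datetime
    # Group id (an integer here by assumption).
    group_id: int


# Hypothetical observation: a CPU usage reading from one machine.
cpu_reading = Event(
    dimensions={"machine": "web-01", "cpu_usage": 0.73},
    event_time=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
    group_id=1,
)
```

In this sketch "machine" plays the role of a categorical dimension and "cpu_usage" a numerical one.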

Dimension

A dimension represents a property type for the events in a data set and the domain of values it can take. A dimension can be:

  • Categorical. An event property on this dimension can hold one of a finite set of values, usually strings. Examples include the country or publisher name in a dataset of news articles, or the machine name in a dataset of production monitoring data.
  • Numerical. A measurement or a general numerical property for an event. Examples: number of page views for news articles, CPU usage or number of errors for production monitoring data.

Dataset

A dataset is a collection of events.

Group

Events can be grouped together by specifying the same group id (see Event.group_id).

The purpose of a group is to compute correlations between events from the same group, but the current version of the API does not expose this functionality. For example, if your dataset holds monitoring data (such as CPU%, RAM, etc.), then a group could hold all the monitoring data from one process. That would eventually allow us to detect that an increase in CPU% is correlated with another event, such as a binary version update at an earlier moment in time.

If you are unsure, or not interested in computing these types of correlations, give each event a globally unique group id.

Slice

A slice is the subset of all events from a dataset which have the same values across some categorical dimensions.

For example, consider a dataset with the sales of an international retailer, where each event is a sale that has these categorical dimensions: the country where the sale occurred, the name of the product, and the name of the company that made the product. Examples of slices in this case are: all the sales for a given product, or all the sales from a given country for all the products made by a given company.
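
The retailer example can be sketched in a few lines of Python. The helper name select_slice, the dictionary representation of events, and the sample values are assumptions made for illustration only.

```python
from typing import Dict, List


def select_slice(events: List[Dict[str, str]], pinned: Dict[str, str]) -> List[Dict[str, str]]:
    """Return the events whose categorical dimensions match every pinned value."""
    return [
        event
        for event in events
        if all(event.get(name) == value for name, value in pinned.items())
    ]


# Hypothetical sales events with three categorical dimensions.
sales = [
    {"country": "FR", "product": "kettle", "company": "Acme"},
    {"country": "FR", "product": "toaster", "company": "Acme"},
    {"country": "DE", "product": "kettle", "company": "Acme"},
    {"country": "FR", "product": "kettle", "company": "Globex"},
]

# Slice: all the sales for a given product.
kettle_sales = select_slice(sales, {"product": "kettle"})

# Slice: all the sales from a given country for products made by a given company.
acme_in_france = select_slice(sales, {"country": "FR", "company": "Acme"})
```

Dimensions that are not pinned are left free, which is why the second slice still contains sales of different products.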

Time series

A time series is a sequence of aggregated events, placed in equally sized time buckets. It is computed by taking as input:

  • A slice, and, thus, all the events in that slice.
  • A time interval that defines where the time series begins and ends. For a given QueryDataSetRequest, these limits are [tested_interval.start_time - forecast_params.forecast_history, tested_interval.start_time + tested_interval.length]. Only events from the slice whose Event.event_time falls within these limits are used to form the time series.
  • The length in time of each time bucket in the time series. For a given QueryDataSetRequest, this length is equal to tested_interval.length.
  • An aggregation method for the events. Two aggregation methods are currently supported: counting the events, or summing up a numerical dimension that is present in all events (specified by forecast_params.aggregated_dimension).
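
The four inputs above can be combined as in the following sketch. It is not the API's implementation: the function name, the dictionary representation of events, the half-open upper limit, and the requirement that the history length be a whole number of buckets are all simplifying assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


def build_time_series(
    events: List[Dict],
    tested_start: datetime,
    tested_length: timedelta,
    forecast_history: timedelta,
    aggregated_dimension: Optional[str] = None,
) -> List[float]:
    """Bucket the events of one slice and aggregate each bucket."""
    # Assumption: the history spans a whole number of buckets.
    if forecast_history % tested_length:
        raise ValueError("forecast_history must be a multiple of tested_length")

    series_start = tested_start - forecast_history
    series_end = tested_start + tested_length
    num_buckets = (series_end - series_start) // tested_length
    buckets = [0.0] * num_buckets

    for event in events:
        event_time = event["event_time"]
        # Assumption: the upper limit is treated as exclusive.
        if not (series_start <= event_time < series_end):
            continue
        index = (event_time - series_start) // tested_length
        if aggregated_dimension is None:
            buckets[index] += 1  # aggregation by counting events
        else:
            buckets[index] += event[aggregated_dimension]  # aggregation by sum
    return buckets


def at(hour: int, minute: int) -> datetime:
    return datetime(2024, 1, 1, hour, minute, tzinfo=timezone.utc)


# Hypothetical events from a single slice, with a numerical "amount" dimension.
slice_events = [
    {"event_time": at(0, 10), "amount": 2.0},
    {"event_time": at(0, 50), "amount": 3.0},
    {"event_time": at(2, 30), "amount": 5.0},
    {"event_time": at(3, 15), "amount": 7.0},
    {"event_time": at(5, 0), "amount": 11.0},  # outside the limits, ignored
]

counts = build_time_series(slice_events, at(3, 0), timedelta(hours=1), timedelta(hours=3))
sums = build_time_series(
    slice_events, at(3, 0), timedelta(hours=1), timedelta(hours=3), "amount"
)
```

With a one-hour tested interval and three hours of history, both series have four buckets, and the last bucket corresponds to the tested interval.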

Forecasting

Forecasting is the process of predicting future values for a given time series.

Holdout

The holdout is the last portion of the time series (usually the last 5%-10%) that is used to evaluate how well our forecasting model performs. If we have higher forecast errors during the holdout period, we will reduce the confidence of our forecast by widening the forecast bounds.
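
As a rough illustration of the idea, the sketch below splits a series into a training part and a holdout part, then measures the error of a deliberately naive forecast on the holdout. The function names, the 10% default, and the "repeat the last training value" forecast are assumptions for the example; the model actually used by the API is not described here.

```python
from typing import List, Tuple


def split_holdout(series: List[float], holdout_fraction: float = 0.1) -> Tuple[List[float], List[float]]:
    """Split a series into (training, holdout), keeping at least one holdout point."""
    if not 0 < holdout_fraction < 1:
        raise ValueError("holdout_fraction must be between 0 and 1")
    if len(series) < 2:
        raise ValueError("series needs at least two points")
    holdout_size = max(1, round(len(series) * holdout_fraction))
    return series[:-holdout_size], series[-holdout_size:]


def mean_absolute_error(actual: List[float], predicted: List[float]) -> float:
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


series = [float(v) for v in range(1, 21)]  # 20 hypothetical bucket values
training, holdout = split_holdout(series, holdout_fraction=0.1)

# Naive stand-in forecast: repeat the last training value over the holdout.
naive_forecast = [training[-1]] * len(holdout)
holdout_error = mean_absolute_error(holdout, naive_forecast)
```

A larger holdout_error would be the signal for widening the forecast bounds.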

Horizon

We will forecast the values of a time series starting from the tested interval up to the time horizon (given by the ForecastParams.horizon_time field).

Tested interval

The tested interval (QueryDataSetRequest.tested_interval) is the time interval in which we want to detect slices of the dataset whose values are unexpected compared to their historical time series.

Anomaly

A slice is marked as an anomaly if, after forecasting, we have a predicted value during the tested interval that is outside the expected range by a configurable threshold.
