# Overview

*Last updated: 2025-09-04 (UTC).*

Introduction
------------

Forecasting and anomaly detection over billions of time series is
computationally intensive. Most existing systems run forecasting and anomaly
detection as batch jobs (for example, risk pipelines, traffic forecasting,
demand planning, and so on). This severely limits the type of analysis that you
can perform online, such as deciding whether to alert based on a sudden
increase or decrease across a set of event dimensions.

The main goals of the Timeseries Insights API are to:

- Scale to billions of time series that are dynamically constructed from raw events and their properties, based on query parameters.
- Provide real-time forecasting and anomaly detection results. That is, within a few seconds, detect trends and seasonality across all time series and decide whether any slices are spiking or decreasing unexpectedly.

API functionality
-----------------

- Manage datasets
  - Index and load a dataset consisting of multiple data sources stored on Cloud Storage. Allow appending new events in a streaming fashion.
  - Unload a dataset that is no longer needed.
  - Ask for the processing status of a dataset.
- Query datasets
  - Retrieve the time series that matches the given property values. The time series is forecast up to a specified time horizon.
    The time series is also evaluated for anomalies.
  - Automatically detect combinations of property values for anomalies.
- Update datasets
  - Ingest newly occurring events and incorporate them into the index in near real time (a delay of seconds to minutes).

Disaster recovery
-----------------

The Timeseries Insights API does not serve as a backup for Cloud Storage or
return raw streaming updates. Clients are responsible for storing and backing
up data separately.

After a regional outage, the service performs a best-effort recovery. Metadata
(information about datasets and operational status) and streamed user data
updated within 24 hours of the start of the outage might not be recovered.

During recovery, queries and streaming updates to datasets might not be
available.

Input data
----------

It is common for numerical and categorical data to be collected over time. For
example, the following figure shows the CPU usage, memory usage, and status of
a single running job in a data center for every minute over a period of time.
The CPU usage and memory usage are numerical values, and the status is a
categorical value.

### Event

The Timeseries Insights API uses events as the basic data entry. Each event has
a timestamp and a collection of dimensions, that is, key-value pairs where the
key is the dimension name. This simple representation makes it possible to
handle data at the scale of trillions of events. For example, the data center,
user, job name, and task number are included to fully represent a single event.
The above figure shows a series of events recorded for a single job,
illustrating a subset of dimensions.
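An event, then, is nothing more than a timestamp plus typed key/value dimensions. As a minimal illustration (plain Python, no API calls; the `eventTime` field name and the `stringVal`/`longVal`/`doubleVal` typing mirror the dimension example below, while the exact request shape for streaming appends is defined in the REST reference), an event could be assembled like this:

```python
import json
from datetime import datetime, timezone

def make_event(timestamp, **dims):
    """Build one event: a timestamp plus name/value dimensions,
    typed as stringVal, longVal, or doubleVal."""
    dimensions = []
    for name, value in dims.items():
        if isinstance(value, bool):
            # Guard: bool is a subclass of int and is not illustrated here.
            raise ValueError("boolean dimensions are not illustrated here")
        if isinstance(value, int):
            dimensions.append({"name": name, "longVal": value})
        elif isinstance(value, float):
            dimensions.append({"name": name, "doubleVal": value})
        else:
            dimensions.append({"name": name, "stringVal": str(value)})
    return {"eventTime": timestamp.isoformat(), "dimensions": dimensions}

event = make_event(
    datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
    user="user_64194", job="job_45835", task_num=19, cpu=3840787.52,
)
print(json.dumps(event, indent=2))
```

This is only a sketch of the data shape; consult the REST reference for the actual append-request envelope.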
For example, the dimensions of one such event might look like the following:

    {"name":"user","stringVal":"user_64194"},
    {"name":"job","stringVal":"job_45835"},
    {"name":"data_center","stringVal":"data_center_30389"},
    {"name":"task_num","longVal":19},
    {"name":"cpu","doubleVal":3840787.5207877564},
    {"name":"ram","doubleVal":1067.01},
    {"name":"state","stringVal":"idle"}

### DataSet

A [DataSet](/timeseries-insights/docs/reference/rest/v1/projects.locations.datasets#DataSet)
is a collection of events. Queries are performed within the same dataset. Each
project can have multiple datasets.

A dataset is built from batch and streaming data. A batch data build reads from
multiple Cloud Storage URIs as data sources. After the batch build completes,
the dataset can be updated with streaming data. By using the batch build for
historical data, the system can avoid cold-start problems.

A dataset needs to be built, or indexed, before it can be queried or updated.
Indexing starts when the dataset is created and typically takes minutes to
hours to complete, depending on the amount of data. More specifically, the data
sources are scanned once during the initial indexing. If the contents of the
Cloud Storage URIs change after the initial indexing completes, they are not
scanned again. Use streaming updates for additional data. Streaming updates are
indexed continuously in near real time.

| **Note:** The Timeseries Insights API cannot return the raw streaming updates, so clients should store this data separately if the raw data is needed.

| **Note:** Streaming data is expected to have timestamps close to real time, so streaming updates cannot be used to incrementally add historical data.

Timeseries and anomaly detection
--------------------------------

For the Timeseries Insights API, a [slice](/timeseries-insights/docs/concept#slice)
is a collection of events with a certain combination of dimension values.
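To make the notion of a slice concrete, the following sketch (plain Python; the events and dimension values are hypothetical) enumerates every slice induced by a handful of events:

```python
from itertools import combinations

# Hypothetical events, each a mapping of dimension name -> value
# (echoing the "user", "job", and "data_center" dimensions above).
events = [
    {"user": "alice", "job": "job_1", "data_center": "dc_east"},
    {"user": "alice", "job": "job_2", "data_center": "dc_east"},
    {"user": "bob",   "job": "job_1", "data_center": "dc_west"},
]

def slices(events, dims):
    """Enumerate every slice: a distinct combination of values
    for some non-empty subset of the given dimensions."""
    result = set()
    for r in range(1, len(dims) + 1):
        for subset in combinations(dims, r):
            for e in events:
                result.add(tuple((d, e[d]) for d in subset))
    return result

all_slices = slices(events, ["user", "job", "data_center"])
# Each slice, e.g. (("user", "alice"), ("job", "job_1")),
# identifies the set of events sharing those dimension values.
```

The API constructs such slices dynamically from raw events based on query parameters, so the number of candidate time series grows combinatorially with the dimensions.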
We are interested in a measure of the events falling into these slices over
time.

For a given slice, the events are aggregated into numerical values per
user-specified resolution of time intervals; these are the time series on which
anomalies are detected. The preceding figure illustrates different choices of
slices resulting from different combinations of the "user", "job", and
"data_center" dimensions.

An anomaly happens for a certain slice if the numerical value from the time
interval of interest is significantly different from the values in the past.
The above figure illustrates a time series based on temperatures measured
across the world over 10 years. Suppose we are interested in whether the last
month of 2015 is an anomaly. A query to the system specifies the time of
interest, `detectionTime`, to be "2015/12/01" and the `granularity` to be
"1 month". The retrieved time series before the `detectionTime` is partitioned
into an earlier **training** period followed by a **holdout** period. The
system uses data from the training period to train a model, and uses the
holdout period to verify that the model can reliably predict the next values.
For this example, the holdout period is 1 year. The picture shows the actual
data and the predicted values from the model with upper and lower bounds. The
temperature for 2015/12 is marked as an anomaly because the actual value is
outside the predicted bounds.

What's next
-----------

- Timeseries Insights API [Concepts](/timeseries-insights/docs/concept)
- A more detailed [Tutorial](/timeseries-insights/docs/tutorial)
- Learn more about the [REST API](/timeseries-insights/docs/reference/rest/v1/projects.locations.datasets)
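As a closing illustration, the training/holdout procedure described in the anomaly detection section can be sketched in deliberately simplified form. The trivial mean-and-spread predictor below is a toy stand-in for the service's forecasting models (which also capture trend and seasonality); the function name and thresholds are illustrative only:

```python
from statistics import mean, stdev

def detect_anomaly(series, n_holdout=12, num_stddevs=3.0):
    """Toy version of the detection procedure: fit a trivial model
    (global mean and spread) on the training period, sanity-check it
    on the holdout period, then test whether the final value falls
    outside the predicted bounds."""
    *history, actual = series
    training, holdout = history[:-n_holdout], history[-n_holdout:]
    center, spread = mean(training), stdev(training)
    lower = center - num_stddevs * spread
    upper = center + num_stddevs * spread
    # Holdout check: the model should predict the recent past reliably.
    holdout_ok = all(lower <= v <= upper for v in holdout)
    is_anomaly = holdout_ok and not (lower <= actual <= upper)
    return is_anomaly, (lower, upper)

# A stable series ending in a spike is flagged; a normal value is not.
flat = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 10.1, 10.2, 9.9, 10.0] * 3
anomalous, bounds = detect_anomaly(flat + [25.0], n_holdout=5)
normal, _ = detect_anomaly(flat + [10.05], n_holdout=5)
```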