Train a forecast model

This page shows you how to train a forecast model from a tabular dataset using the Google Cloud console or the Vertex AI API.

Before you begin

Before you can train a forecast model, you must complete the following:

Train a model

Google Cloud console

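In the Google Cloud console, in the Vertex AI section, go to the [Datasets] page.

Go to the Datasets page
Click the name of the dataset you want to use to train your model to open its details page.
If your data type uses annotation sets, select the annotation set you want to use for this model.
Click [Train new model].
Select [Other].
On the [Training method] page, configure as follows:
1. Select the model training method. For more information, see Model training methods.
2. Click [Continue].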
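On the [Model details] page, configure as follows:
1. Enter a display name for your new model.
2. Select your target column.
  
  The target column is the value that the model will forecast. Learn more about target column requirements.
3. If you did not set your [Series identifier] and [Timestamp] columns on your dataset, select them now.
4. Select your [Data granularity]. Select Daily if you want to use holiday effect modeling. Learn how to choose the data granularity.
5. Optional: In the [Holiday regions] dropdown, select one or more geographical regions to enable holiday effect modeling. During training, Vertex AI creates holiday categorical features within the model based on the date from the [Timestamp] column and the specified geographical regions. You can select this option only when [Data granularity] is set to Daily. By default, holiday effect modeling is disabled. To learn about the geographical regions used for holiday effect modeling, see Holiday regions.
6. Enter your [Context window] and [Forecast horizon].
  
  The forecast horizon determines how far into the future the model forecasts the target value for each row of prediction data. The [Forecast horizon] is specified in units of [Data granularity].
  
  The context window sets how far back the model looks during training (and for forecasts). In other words, for each training data point, the context window determines how far back the model looks for predictive patterns. The [Context window] is specified in units of [Data granularity].
  
  Learn more.
7. If you want to export your test dataset to BigQuery, check [Export test dataset to BigQuery] and provide the name of the table.
8. If you want to manually control your data split or configure the forecasting window, open [Advanced options].
9. The default data split is chronological, with the standard 80/10/10 percentages. If you want to manually specify which rows are assigned to which split, select [Manual] and specify your data split column.
  
  Learn more about data splits.
10. Select your rolling window strategy for forecast window generation. The default strategy is [Count].
  - Count: Set the maximum number of windows in the text box that appears.
  - Stride: Set the stride length in the text box that appears.
  - Column: Select the appropriate column name from the dropdown that appears.
  To learn more, see Rolling window strategies.
11. Click [Continue].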
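On the [Training options] page, configure as follows:
1. If you haven't already, click [GENERATE STATISTICS].
  
  Generating statistics populates the [Transformation] dropdown menus.
2. Review your column list and exclude any columns that should not be used to train the model.
  
  If you are using a data split column, it must be included.
3. Review the transformations selected for your included features and make any required updates.
  
  Rows containing data that is invalid for the selected transformation are excluded from training. Learn more about transformations.
4. For each column you included for training, specify the [Feature type] for how that feature relates to its time series, and whether it is available at forecast time. Learn more about feature type and availability.
5. If you want to specify a weight column, change your optimization objective from the default, or enable hierarchical forecasting, open [Advanced options].
6. Optional: If you want to specify a weight column, select it from the dropdown list. Learn more about weight columns.
7. Optional: If you want to select an optimization objective, select it from the list. Learn more about optimization objectives.
8. Optional: If you want to use hierarchical forecasting, select [Enable hierarchical forecasting]. You can choose from three grouping options:
  - No grouping
  - Group by columns
  - Group all
  You can also set the following aggregated loss weights:
  - Group total weight. This field can be set only if you select the Group by columns or Group all option.
  - Temporal total weight.
  - Group temporal total weight. This field can be set only if you select the Group by columns or Group all option.
  Learn more about hierarchical forecasting.
9. Click [Continue].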
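On the [Compute and pricing] page, configure as follows:
1. Enter the maximum number of hours you want your model to train for. This setting helps you put a cap on training costs. The actual elapsed time can be longer than this value, because other operations are involved in creating a new model.
  
  Suggested training time is related to the size of your forecast horizon and your training data. The following table shows some sample forecasting training runs and the range of training time that was needed to train a high-quality model.
  
  Rows	Features	Forecast horizon	Training time
  12 million	10	6	3-6 hours
  20 million	50	13	6-12 hours
  16 million	30	365	24-48 hours
  
  For information about training pricing, see the pricing page.
2. Click [Start training].
  Model training can take many hours, depending on the size and complexity of your data and your training budget, if you specified one. You can close this tab and return to it later. You will receive an email when your model has finished training.
  
  Your tabular training data in Cloud Storage or BigQuery is not imported into Vertex AI. (When you import from local files, they are imported into Cloud Storage.) When you create a dataset with tabular data, the data is associated with the dataset. Changes you make to your data source in Cloud Storage or BigQuery after creating the dataset are incorporated into models subsequently trained with that dataset. A snapshot of the dataset is taken when model training begins.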

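The following minimal Python sketch makes the [Context window] and [Forecast horizon] settings concrete. It slices one evenly spaced time series into (context, horizon) training pairs, and it assumes one row per data-granularity unit. `make_windows` is a hypothetical helper for illustration, not a Vertex AI API. Vertex AI's actual window generation also depends on the rolling window strategy you choose.

```python
from typing import List, Sequence, Tuple


def make_windows(
    series: Sequence[float], context_window: int, forecast_horizon: int
) -> List[Tuple[List[float], List[float]]]:
    """Split one series into (context, horizon) pairs.

    Each pair uses `context_window` past observations as model input and the
    next `forecast_horizon` observations as the values to forecast. Both are
    counted in data-granularity units (one row per unit in this sketch).
    """
    if context_window < 0 or forecast_horizon < 1:
        raise ValueError("context_window must be >= 0 and forecast_horizon >= 1")
    windows = []
    # The forecast start index t must leave room for the full horizon.
    for t in range(context_window, len(series) - forecast_horizon + 1):
        context = list(series[t - context_window : t])
        horizon = list(series[t : t + forecast_horizon])
        windows.append((context, horizon))
    return windows


# Daily sales with a 3-day context window and a 2-day forecast horizon.
sales = [10, 12, 13, 15, 14, 16]
for context, horizon in make_windows(sales, context_window=3, forecast_horizon=2):
    print(context, "->", horizon)
# [10, 12, 13] -> [15, 14]
# [12, 13, 15] -> [14, 16]
```

A longer context window gives the model more history per example, but it also reduces the number of complete windows that fit in a series of a given length.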

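The hierarchical forecasting loss weights (Group total weight, Temporal total weight, and Group temporal total weight) are each relative to the individual loss. The following Python sketch is a conceptual illustration only. It assumes mean squared error as the base loss and a plain weighted sum, and Vertex AI's internal loss computation may differ. Each aggregated term sums actuals and predictions in one of three ways:

- across the series in a group, per time step;
- across the forecast horizon, per series;
- across both, per group.

A weight of 0.0 disables that term. The data and names are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Tuple

# Hypothetical results keyed by (series_id, group, horizon_step) -> (actual, predicted).
Key = Tuple[str, str, int]
Results = Dict[Key, Tuple[float, float]]


def mse(pairs: List[Tuple[float, float]]) -> float:
    return sum((actual - predicted) ** 2 for actual, predicted in pairs) / len(pairs)


def aggregate(results: Results, key_fn: Callable[[Key], Hashable]) -> List[Tuple[float, float]]:
    """Sum actuals and predictions for all results that share key_fn(key)."""
    sums: Dict[Hashable, List[float]] = defaultdict(lambda: [0.0, 0.0])
    for key, (actual, predicted) in results.items():
        bucket = sums[key_fn(key)]
        bucket[0] += actual
        bucket[1] += predicted
    return [(a, p) for a, p in sums.values()]


def hierarchical_loss(
    results: Results,
    group_total_weight: float = 0.0,
    temporal_total_weight: float = 0.0,
    group_temporal_total_weight: float = 0.0,
) -> float:
    # Individual loss: every (series, step) pair on its own.
    loss = mse(list(results.values()))
    # Each aggregated term is skipped when its weight is 0.0 (disabled).
    if group_total_weight:
        # Group total: sum over the series in a group, per time step.
        loss += group_total_weight * mse(aggregate(results, lambda k: (k[1], k[2])))
    if temporal_total_weight:
        # Temporal total: sum over the forecast horizon, per series.
        loss += temporal_total_weight * mse(aggregate(results, lambda k: k[0]))
    if group_temporal_total_weight:
        # Group temporal total: sum over both series and horizon, per group.
        loss += group_temporal_total_weight * mse(aggregate(results, lambda k: k[1]))
    return loss


results = {
    ("store_a", "west", 0): (1.0, 2.0),
    ("store_b", "west", 0): (3.0, 2.0),
}
print(hierarchical_loss(results))                             # 1.0
print(hierarchical_loss(results, group_total_weight=1.0))     # 1.0: the group sums match
print(hierarchical_loss(results, temporal_total_weight=0.5))  # 1.5
```

In this example the two stores' errors cancel at the group level. A nonzero group weight therefore adds nothing, while the per-series temporal term still contributes.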
API

Select a tab for your language or environment:

REST

Use the trainingPipelines.create command to train a model.

Before using any of the request data, make the following replacements:

LOCATION: Your region.
PROJECT: Your project ID.
TRAINING_PIPELINE_DISPLAY_NAME: Display name for the training pipeline created for this operation.
TRAINING_TASK_DEFINITION: The model training method.
- Time series Dense Encoder (TiDE)
  gs://google-cloud-aiplatform/schema/trainingjob/definition/time_series_dense_encoder_forecasting_1.0.0.yaml
- Temporal Fusion Transformer (TFT)
  gs://google-cloud-aiplatform/schema/trainingjob/definition/temporal_fusion_transformer_time_series_forecasting_1.0.0.yaml
- AutoML (L2L)
  gs://google-cloud-aiplatform/schema/trainingjob/definition/automl_forecasting_1.0.0.yaml
- Seq2Seq+
  gs://google-cloud-aiplatform/schema/trainingjob/definition/seq2seq_plus_time_series_forecasting_1.0.0.yaml
For more information, see Model training methods.
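TARGET_COLUMN: The column (value) you want this model to predict.
TIME_COLUMN: The time column. Learn more.
TIME_SERIES_IDENTIFIER_COLUMN: The time series identifier column. Learn more.
WEIGHT_COLUMN: (Optional) The weight column. Learn more.
TRAINING_BUDGET: The maximum amount of time you want the model to train, in milli node hours (1,000 milli node hours equals one node hour).
GRANULARITY_UNIT: The unit to use for the granularity of your training data and for your forecast horizon and context window. Can be minute, hour, day, week, month, or year. Select day if you want to use holiday effect modeling. Learn how to choose the data granularity.
GRANULARITY_QUANTITY: The number of granularity units that make up the interval between observations in your training data. Must be 1 for all units except minute, which can be 1, 5, 10, 15, or 30. Learn how to choose the data granularity.
GROUP_COLUMNS: Column names in your training input table that identify the grouping for the hierarchy level. These columns must be time_series_attribute_columns. Learn more.
GROUP_TOTAL_WEIGHT: Weight of the group aggregated loss relative to the individual loss. Disabled if set to 0.0 or not set. If no group column is set, all time series are treated as part of the same group and the loss is aggregated over all time series. Learn more.
TEMPORAL_TOTAL_WEIGHT: Weight of the time aggregated loss relative to the individual loss. Disabled if set to 0.0 or not set. Learn more.
GROUP_TEMPORAL_TOTAL_WEIGHT: Weight of the total (group × time) aggregated loss relative to the individual loss. Disabled if set to 0.0 or not set. If no group column is set, all time series are treated as part of the same group and the loss is aggregated over all time series. Learn more.
HOLIDAY_REGIONS: (Optional) You can select one or more geographical regions to enable holiday effect modeling. During training, Vertex AI creates holiday categorical features within the model based on the date from TIME_COLUMN and the specified geographical regions. To enable it, set GRANULARITY_UNIT to day and specify one or more regions in the HOLIDAY_REGIONS field. By default, holiday effect modeling is disabled. To learn more, see Holiday regions.
FORECAST_HORIZON: The forecast horizon determines how far into the future the model forecasts the target value for each row of prediction data. The forecast horizon is specified in units of data granularity (GRANULARITY_UNIT). Learn more.
CONTEXT_WINDOW: The context window sets how far back the model looks during training (and for forecasts). In other words, for each training data point, the context window determines how far back the model looks for predictive patterns. The context window is specified in units of data granularity (GRANULARITY_UNIT). Learn more.
OPTIMIZATION_OBJECTIVE: By default, Vertex AI minimizes the root-mean-squared error (RMSE). If you want a different optimization objective for your forecast model, choose one of the options in Optimization objectives for forecast models. If you choose to minimize the quantile loss, you must also specify a value for QUANTILES.
PROBABILISTIC_INFERENCE: (Optional) If set to true, Vertex AI models the probability distribution of the forecast. Probabilistic inference can improve model quality by handling noisy data and quantifying uncertainty. If QUANTILES are specified, Vertex AI also returns the quantiles of the probability distribution. Probabilistic inference is compatible only with the Time series Dense Encoder (TiDE) and AutoML (L2L) training methods. It is incompatible with hierarchical forecasting and the minimize-quantile-loss optimization objective.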
QUANTILES: Quantiles to use for the minimize-quantile-loss optimization objective and probabilistic inference. Provide a list of up to five unique numbers between 0 and 1, exclusive.
TIME_SERIES_ATTRIBUTE_COL: The name or names of the columns that are time series attributes. Learn more.
AVAILABLE_AT_FORECAST_COL: The name or names of the covariate columns whose value is known at forecast time. Learn more.
UNAVAILABLE_AT_FORECAST_COL: The name or names of the covariate columns whose value is unknown at forecast time. Learn more.
TRANSFORMATION_TYPE: The transformation type for each column used to train the model. Learn more.
COLUMN_NAME: The name of the column with the specified transformation type. Every column used to train the model must be specified.
MODEL_DISPLAY_NAME: Display name for the newly trained model.
DATASET_ID: ID for the training Dataset.
You can provide a Split object to control your data split. For information about controlling data split, see Control the data split using REST.
You can provide a windowConfig object to configure a rolling window strategy for forecast window generation. For further information, see Configure the rolling window strategy using REST.
PROJECT_NUMBER: Your project's automatically generated project number.

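Several of the replacement values above have documented constraints that are easy to get wrong. The following Python sketch checks them on the client before you send the request:

- the granularity unit and quantity rules;
- the day-only requirement for HOLIDAY_REGIONS;
- the QUANTILES limits;
- the probabilistic inference restrictions;
- the milli node hour conversion for TRAINING_BUDGET.

The function names and the `training_method` labels ("tide", "tft", "automl", "seq2seq") are hypothetical. The service performs its own validation, so treat this only as a pre-flight check.

```python
from typing import List, Optional, Sequence

GRANULARITY_UNITS = {"minute", "hour", "day", "week", "month", "year"}
MINUTE_QUANTITIES = {1, 5, 10, 15, 30}
# Hypothetical labels for the methods that support probabilistic inference (TiDE, AutoML L2L).
PROBABILISTIC_METHODS = {"tide", "automl"}


def hours_to_milli_node_hours(hours: float) -> int:
    """TRAINING_BUDGET is in milli node hours: 1,000 equal one node hour."""
    return round(hours * 1000)


def check_forecasting_inputs(
    granularity_unit: str,
    granularity_quantity: int,
    holiday_regions: Optional[Sequence[str]] = None,
    optimization_objective: str = "minimize-rmse",
    quantiles: Optional[Sequence[float]] = None,
    probabilistic_inference: bool = False,
    hierarchical: bool = False,
    training_method: str = "automl",
) -> List[str]:
    """Return the documented rules that the inputs break (empty list if none)."""
    problems: List[str] = []
    if granularity_unit not in GRANULARITY_UNITS:
        problems.append(f"unknown granularity unit: {granularity_unit}")
    elif granularity_unit == "minute":
        if granularity_quantity not in MINUTE_QUANTITIES:
            problems.append("minute quantity must be 1, 5, 10, 15, or 30")
    elif granularity_quantity != 1:
        problems.append("quantity must be 1 for units other than minute")
    if holiday_regions and granularity_unit != "day":
        problems.append("holiday regions require the day unit")
    if quantiles is not None:
        if len(quantiles) > 5 or len(set(quantiles)) != len(quantiles):
            problems.append("provide up to five unique quantiles")
        if any(not 0 < q < 1 for q in quantiles):
            problems.append("quantiles must be between 0 and 1, exclusive")
    if optimization_objective == "minimize-quantile-loss" and not quantiles:
        problems.append("minimize-quantile-loss requires QUANTILES")
    if probabilistic_inference:
        if training_method not in PROBABILISTIC_METHODS:
            problems.append("probabilistic inference supports only TiDE and AutoML (L2L)")
        if hierarchical:
            problems.append("probabilistic inference is incompatible with hierarchical forecasting")
        if optimization_objective == "minimize-quantile-loss":
            problems.append("probabilistic inference is incompatible with minimize-quantile-loss")
    return problems


print(hours_to_milli_node_hours(8))  # 8000, the default budget in the SDK samples below
print(check_forecasting_inputs("minute", 7, holiday_regions=["US"]))
```

In this sketch, `hierarchical` should be True whenever your hierarchyConfig sets a nonzero weight.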
HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/trainingPipelines

Request JSON body:

{
    "displayName": "TRAINING_PIPELINE_DISPLAY_NAME",
    "trainingTaskDefinition": "TRAINING_TASK_DEFINITION",
    "trainingTaskInputs": {
        "targetColumn": "TARGET_COLUMN",
        "timeColumn": "TIME_COLUMN",
        "timeSeriesIdentifierColumn": "TIME_SERIES_IDENTIFIER_COLUMN",
        "weightColumn": "WEIGHT_COLUMN",
        "trainBudgetMilliNodeHours": TRAINING_BUDGET,
        "dataGranularity": {"unit": "GRANULARITY_UNIT", "quantity": GRANULARITY_QUANTITY},
        "hierarchyConfig": {"groupColumns": GROUP_COLUMNS, "groupTotalWeight": GROUP_TOTAL_WEIGHT, "temporalTotalWeight": TEMPORAL_TOTAL_WEIGHT, "groupTemporalTotalWeight": GROUP_TEMPORAL_TOTAL_WEIGHT},
        "holidayRegions" : ["HOLIDAY_REGIONS_1", "HOLIDAY_REGIONS_2", ...],
        "forecast_horizon": FORECAST_HORIZON,
        "context_window": CONTEXT_WINDOW,
        "optimizationObjective": "OPTIMIZATION_OBJECTIVE",
        "quantiles": QUANTILES,
        "enableProbabilisticInference": PROBABILISTIC_INFERENCE,
        "time_series_attribute_columns": ["TIME_SERIES_ATTRIBUTE_COL_1", "TIME_SERIES_ATTRIBUTE_COL_2", ...],
        "available_at_forecast_columns": ["AVAILABLE_AT_FORECAST_COL_1", "AVAILABLE_AT_FORECAST_COL_2", ...],
        "unavailable_at_forecast_columns": ["UNAVAILABLE_AT_FORECAST_COL_1", "UNAVAILABLE_AT_FORECAST_COL_2", ...],
        "transformations": [
            {"TRANSFORMATION_TYPE_1":  {"column_name" : "COLUMN_NAME_1"} },
            {"TRANSFORMATION_TYPE_2":  {"column_name" : "COLUMN_NAME_2"} },
            ...
        ]
    },
    "modelToUpload": {"displayName": "MODEL_DISPLAY_NAME"},
    "inputDataConfig": {
      "datasetId": "DATASET_ID"
    }
}
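If you generate request.json from code rather than editing it by hand, a minimal Python sketch like the following builds the required subset of the body shown above. It writes the result to request.json for the curl or PowerShell commands. The field names are copied from the request body above, including the snake_case forecast_horizon and context_window keys. The following are placeholder assumptions:

- the `build_request_body` helper;
- the column names;
- the dataset ID;
- the transformation types ("timestamp", "numeric", "categorical").

Optional fields such as weightColumn, hierarchyConfig, and holidayRegions are omitted.

```python
import json
from typing import Any, Dict, Optional

TIDE_DEFINITION = (
    "gs://google-cloud-aiplatform/schema/trainingjob/definition/"
    "time_series_dense_encoder_forecasting_1.0.0.yaml"
)


def build_request_body(
    pipeline_display_name: str,
    model_display_name: str,
    dataset_id: str,
    target_column: str,
    time_column: str,
    time_series_identifier_column: str,
    forecast_horizon: int,
    context_window: int,
    budget_milli_node_hours: int,
    granularity_unit: str = "day",
    granularity_quantity: int = 1,
    transformations: Optional[Dict[str, str]] = None,  # column name -> transformation type
    training_task_definition: str = TIDE_DEFINITION,
) -> Dict[str, Any]:
    inputs: Dict[str, Any] = {
        "targetColumn": target_column,
        "timeColumn": time_column,
        "timeSeriesIdentifierColumn": time_series_identifier_column,
        "trainBudgetMilliNodeHours": budget_milli_node_hours,
        "dataGranularity": {"unit": granularity_unit, "quantity": granularity_quantity},
        "forecast_horizon": forecast_horizon,
        "context_window": context_window,
        # One {TRANSFORMATION_TYPE: {"column_name": COLUMN_NAME}} entry per training column.
        "transformations": [
            {t_type: {"column_name": column}}
            for column, t_type in (transformations or {}).items()
        ],
    }
    return {
        "displayName": pipeline_display_name,
        "trainingTaskDefinition": training_task_definition,
        "trainingTaskInputs": inputs,
        "modelToUpload": {"displayName": model_display_name},
        "inputDataConfig": {"datasetId": dataset_id},
    }


body = build_request_body(
    pipeline_display_name="sales-forecast-pipeline",
    model_display_name="sales-forecast-model",
    dataset_id="1234567890",
    target_column="sales",
    time_column="date",
    time_series_identifier_column="store_id",
    forecast_horizon=14,
    context_window=28,
    budget_milli_node_hours=1000,
    transformations={"date": "timestamp", "sales": "numeric", "store_id": "categorical"},
)
with open("request.json", "w", encoding="utf-8") as f:
    json.dump(body, f, indent=2)
```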

To send your request, choose one of these options:

curl (Linux, macOS, or Cloud Shell)

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you into the gcloud CLI. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/trainingPipelines"

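For environments without curl, the same POST can be sketched with Python's standard library. This sketch assumes you supply an access token, for example the output of gcloud auth print-access-token. It also assumes a body such as the contents of request.json. The helper names are hypothetical, and the request itself is not sent unless you call `create_training_pipeline`.

```python
import json
import subprocess
import urllib.request
from typing import Any, Dict


def training_pipelines_url(project: str, location: str) -> str:
    """Regional endpoint used by the curl command above."""
    return (
        f"https://{location}-aiplatform.googleapis.com/v1/"
        f"projects/{project}/locations/{location}/trainingPipelines"
    )


def gcloud_access_token() -> str:
    """Same token that the curl command obtains with gcloud auth print-access-token."""
    result = subprocess.run(
        ["gcloud", "auth", "print-access-token"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def create_training_pipeline(
    project: str, location: str, body: Dict[str, Any], access_token: str
) -> Dict[str, Any]:
    request = urllib.request.Request(
        training_pipelines_url(project, location),
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for non-2xx responses.
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Example (not run here):
# with open("request.json", encoding="utf-8") as f:
#     pipeline = create_training_pipeline("PROJECT", "LOCATION", json.load(f), gcloud_access_token())
```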




  PowerShell (Windows)

  
  
    
      Note:
        
          The following command assumes that you have logged in to
          the gcloud CLI with your user account by running
          gcloud init
          or
          gcloud auth login
            .
          You can check the currently active account by running
          gcloud auth list.
        
      
    
  

  
    
      Save the request body in a file named request.json,
      and execute the following command:
    
    

  

  
  
    
  

  
  

  
  
    
    
  

  
  

  
  

  
  

  
  

  

  
  
    
  

  
  
    
  

  
  
    
  

  
  
  
    
  

  
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/trainingPipelines" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION/trainingPipelines/TRAINING_PIPELINE_ID",
  "displayName": "myModelName",
  "trainingTaskDefinition": "gs://google-cloud-aiplatform/schema/trainingjob/definition/automl_tabular_1.0.0.yaml",
  "modelToUpload": {
    "displayName": "myModelName"
  },
  "state": "PIPELINE_STATE_PENDING",
  "createTime": "2020-08-18T01:22:57.479336Z",
  "updateTime": "2020-08-18T01:22:57.479336Z"
}
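The name field in the response identifies the new training pipeline. The following Python sketch pulls the project number, location, and pipeline ID out of that name. You can then use those parts, for example, to check the pipeline's state later with trainingPipelines.get. The response values below are made-up placeholders.

```python
import re
from typing import Dict

# projects/PROJECT_NUMBER/locations/LOCATION/trainingPipelines/TRAINING_PIPELINE_ID
_PIPELINE_NAME = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/trainingPipelines/(?P<pipeline_id>[^/]+)$"
)


def parse_training_pipeline_name(name: str) -> Dict[str, str]:
    """Split a training pipeline resource name into its path components."""
    match = _PIPELINE_NAME.match(name)
    if match is None:
        raise ValueError(f"not a training pipeline resource name: {name!r}")
    return match.groupdict()


# Placeholder values shaped like the response above.
response = {
    "name": "projects/123456789012/locations/us-central1/trainingPipelines/987654321",
    "state": "PIPELINE_STATE_PENDING",
}
parts = parse_training_pipeline_name(response["name"])
print(parts["pipeline_id"], response["state"])  # 987654321 PIPELINE_STATE_PENDING
```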

Vertex AI SDK for Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Vertex AI SDK for Python API reference documentation.

Time series Dense Encoder (TiDE):

from typing import List, Optional

from google.cloud import aiplatform


def create_training_pipeline_forecasting_time_series_dense_encoder_sample(
    project: str,
    display_name: str,
    dataset_id: str,
    location: str = "us-central1",
    model_display_name: str = "my_model",
    target_column: str = "target_column",
    time_column: str = "date",
    time_series_identifier_column: str = "time_series_id",
    unavailable_at_forecast_columns: List[str] = [],
    available_at_forecast_columns: List[str] = [],
    forecast_horizon: int = 1,
    data_granularity_unit: str = "week",
    data_granularity_count: int = 1,
    training_fraction_split: float = 0.8,
    validation_fraction_split: float = 0.1,
    test_fraction_split: float = 0.1,
    budget_milli_node_hours: int = 8000,
    timestamp_split_column_name: str = "timestamp_split",
    weight_column: str = "weight",
    time_series_attribute_columns: List[str] = [],
    context_window: int = 0,
    export_evaluated_data_items: bool = False,
    export_evaluated_data_items_bigquery_destination_uri: Optional[str] = None,
    export_evaluated_data_items_override_destination: bool = False,
    quantiles: Optional[List[float]] = None,
    enable_probabilistic_inference: bool = False,
    validation_options: Optional[str] = None,
    predefined_split_column_name: Optional[str] = None,
    sync: bool = True,
):
    aiplatform.init(project=project, location=location)

    # Create training job
    forecasting_tide_job = aiplatform.TimeSeriesDenseEncoderForecastingTrainingJob(
        display_name=display_name,
        optimization_objective="minimize-rmse",
    )

    # Retrieve existing dataset
    dataset = aiplatform.TimeSeriesDataset(dataset_id)

    # Run training job
    model = forecasting_tide_job.run(
        dataset=dataset,
        target_column=target_column,
        time_column=time_column,
        time_series_identifier_column=time_series_identifier_column,
        unavailable_at_forecast_columns=unavailable_at_forecast_columns,
        available_at_forecast_columns=available_at_forecast_columns,
        forecast_horizon=forecast_horizon,
        data_granularity_unit=data_granularity_unit,
        data_granularity_count=data_granularity_count,
        training_fraction_split=training_fraction_split,
        validation_fraction_split=validation_fraction_split,
        test_fraction_split=test_fraction_split,
        predefined_split_column_name=predefined_split_column_name,
        timestamp_split_column_name=timestamp_split_column_name,
        weight_column=weight_column,
        time_series_attribute_columns=time_series_attribute_columns,
        context_window=context_window,
        export_evaluated_data_items=export_evaluated_data_items,
        export_evaluated_data_items_bigquery_destination_uri=export_evaluated_data_items_bigquery_destination_uri,
        export_evaluated_data_items_override_destination=export_evaluated_data_items_override_destination,
        quantiles=quantiles,
        enable_probabilistic_inference=enable_probabilistic_inference,
        validation_options=validation_options,
        budget_milli_node_hours=budget_milli_node_hours,
        model_display_name=model_display_name,
        sync=sync,
    )

    model.wait()

    print(model.display_name)
    print(model.resource_name)
    print(model.uri)
    return model

Temporal Fusion Transformer (TFT):

from typing import List, Optional

from google.cloud import aiplatform


def create_training_pipeline_forecasting_temporal_fusion_transformer_sample(
    project: str,
    display_name: str,
    dataset_id: str,
    location: str = "us-central1",
    model_display_name: str = "my_model",
    target_column: str = "target_column",
    time_column: str = "date",
    time_series_identifier_column: str = "time_series_id",
    unavailable_at_forecast_columns: List[str] = [],
    available_at_forecast_columns: List[str] = [],
    forecast_horizon: int = 1,
    data_granularity_unit: str = "week",
    data_granularity_count: int = 1,
    training_fraction_split: float = 0.8,
    validation_fraction_split: float = 0.1,
    test_fraction_split: float = 0.1,
    budget_milli_node_hours: int = 8000,
    timestamp_split_column_name: str = "timestamp_split",
    weight_column: str = "weight",
    time_series_attribute_columns: List[str] = [],
    context_window: int = 0,
    export_evaluated_data_items: bool = False,
    export_evaluated_data_items_bigquery_destination_uri: Optional[str] = None,
    export_evaluated_data_items_override_destination: bool = False,
    validation_options: Optional[str] = None,
    predefined_split_column_name: Optional[str] = None,
    sync: bool = True,
):
    aiplatform.init(project=project, location=location)

    # Create training job
    forecasting_tft_job = aiplatform.TemporalFusionTransformerForecastingTrainingJob(
        display_name=display_name,
        optimization_objective="minimize-rmse",
    )

    # Retrieve existing dataset
    dataset = aiplatform.TimeSeriesDataset(dataset_id)

    # Run training job
    model = forecasting_tft_job.run(
        dataset=dataset,
        target_column=target_column,
        time_column=time_column,
        time_series_identifier_column=time_series_identifier_column,
        unavailable_at_forecast_columns=unavailable_at_forecast_columns,
        available_at_forecast_columns=available_at_forecast_columns,
        forecast_horizon=forecast_horizon,
        data_granularity_unit=data_granularity_unit,
        data_granularity_count=data_granularity_count,
        training_fraction_split=training_fraction_split,
        validation_fraction_split=validation_fraction_split,
        test_fraction_split=test_fraction_split,
        predefined_split_column_name=predefined_split_column_name,
        timestamp_split_column_name=timestamp_split_column_name,
        weight_column=weight_column,
        time_series_attribute_columns=time_series_attribute_columns,
        context_window=context_window,
        export_evaluated_data_items=export_evaluated_data_items,
        export_evaluated_data_items_bigquery_destination_uri=export_evaluated_data_items_bigquery_destination_uri,
        export_evaluated_data_items_override_destination=export_evaluated_data_items_override_destination,
        validation_options=validation_options,
        budget_milli_node_hours=budget_milli_node_hours,
        model_display_name=model_display_name,
        sync=sync,
    )

    model.wait()

    print(model.display_name)
    print(model.resource_name)
    print(model.uri)
    return model

AutoML (L2L):

from typing import List, Optional

from google.cloud import aiplatform


def create_training_pipeline_forecasting_sample(
    project: str,
    display_name: str,
    dataset_id: str,
    location: str = "us-central1",
    model_display_name: str = "my_model",
    target_column: str = "target_column",
    time_column: str = "date",
    time_series_identifier_column: str = "time_series_id",
    unavailable_at_forecast_columns: List[str] = [],
    available_at_forecast_columns: List[str] = [],
    forecast_horizon: int = 1,
    data_granularity_unit: str = "week",
    data_granularity_count: int = 1,
    training_fraction_split: float = 0.8,
    validation_fraction_split: float = 0.1,
    test_fraction_split: float = 0.1,
    budget_milli_node_hours: int = 8000,
    timestamp_split_column_name: str = "timestamp_split",
    weight_column: str = "weight",
    time_series_attribute_columns: List[str] = [],
    context_window: int = 0,
    export_evaluated_data_items: bool = False,
    export_evaluated_data_items_bigquery_destination_uri: Optional[str] = None,
    export_evaluated_data_items_override_destination: bool = False,
    quantiles: Optional[List[float]] = None,
    enable_probabilistic_inference: bool = False,
    validation_options: Optional[str] = None,
    predefined_split_column_name: Optional[str] = None,
    sync: bool = True,
):
    aiplatform.init(project=project, location=location)

    # Create training job
    forecasting_job = aiplatform.AutoMLForecastingTrainingJob(
        display_name=display_name, optimization_objective="minimize-rmse"
    )

    # Retrieve existing dataset
    dataset = aiplatform.TimeSeriesDataset(dataset_id)

    # Run training job
    model = forecasting_job.run(
        dataset=dataset,
        target_column=target_column,
        time_column=time_column,
        time_series_identifier_column=time_series_identifier_column,
        unavailable_at_forecast_columns=unavailable_at_forecast_columns,
        available_at_forecast_columns=available_at_forecast_columns,
        forecast_horizon=forecast_horizon,
        data_granularity_unit=data_granularity_unit,
        data_granularity_count=data_granularity_count,
        training_fraction_split=training_fraction_split,
        validation_fraction_split=validation_fraction_split,
        test_fraction_split=test_fraction_split,
        predefined_split_column_name=predefined_split_column_name,
        timestamp_split_column_name=timestamp_split_column_name,
        weight_column=weight_column,
        time_series_attribute_columns=time_series_attribute_columns,
        context_window=context_window,
        export_evaluated_data_items=export_evaluated_data_items,
        export_evaluated_data_items_bigquery_destination_uri=export_evaluated_data_items_bigquery_destination_uri,
        export_evaluated_data_items_override_destination=export_evaluated_data_items_override_destination,
        quantiles=quantiles,
        enable_probabilistic_inference=enable_probabilistic_inference,
        validation_options=validation_options,
        budget_milli_node_hours=budget_milli_node_hours,
        model_display_name=model_display_name,
        sync=sync,
    )

    model.wait()

    print(model.display_name)
    print(model.resource_name)
    print(model.uri)
    return model

Seq2Seq+:

from typing import List, Optional

from google.cloud import aiplatform


def create_training_pipeline_forecasting_seq2seq_sample(
    project: str,
    display_name: str,
    dataset_id: str,
    location: str = "us-central1",
    model_display_name: str = "my_model",
    target_column: str = "target_column",
    time_column: str = "date",
    time_series_identifier_column: str = "time_series_id",
    unavailable_at_forecast_columns: List[str] = [],
    available_at_forecast_columns: List[str] = [],
    forecast_horizon: int = 1,
    data_granularity_unit: str = "week",
    data_granularity_count: int = 1,
    training_fraction_split: float = 0.8,
    validation_fraction_split: float = 0.1,
    test_fraction_split: float = 0.1,
    budget_milli_node_hours: int = 8000,
    timestamp_split_column_name: str = "timestamp_split",
    weight_column: str = "weight",
    time_series_attribute_columns: List[str] = [],
    context_window: int = 0,
    export_evaluated_data_items: bool = False,
    export_evaluated_data_items_bigquery_destination_uri: Optional[str] = None,
    export_evaluated_data_items_override_destination: bool = False,
    validation_options: Optional[str] = None,
    predefined_split_column_name: Optional[str] = None,
    sync: bool = True,
):
    aiplatform.init(project=project, location=location)

    # Create training job
    forecasting_seq2seq_job = aiplatform.SequenceToSequencePlusForecastingTrainingJob(
        display_name=display_name, optimization_objective="minimize-rmse"
    )

    # Retrieve existing dataset
    dataset = aiplatform.TimeSeriesDataset(dataset_id)

    # Run training job
    model = forecasting_seq2seq_job.run(
        dataset=dataset,
        target_column=target_column,
        time_column=time_column,
        time_series_identifier_column=time_series_identifier_column,
        unavailable_at_forecast_columns=unavailable_at_forecast_columns,
        available_at_forecast_columns=available_at_forecast_columns,
        forecast_horizon=forecast_horizon,
        data_granularity_unit=data_granularity_unit,
        data_granularity_count=data_granularity_count,
        training_fraction_split=training_fraction_split,
        validation_fraction_split=validation_fraction_split,
        test_fraction_split=test_fraction_split,
        predefined_split_column_name=predefined_split_column_name,
        timestamp_split_column_name=timestamp_split_column_name,
        weight_column=weight_column,
        time_series_attribute_columns=time_series_attribute_columns,
        context_window=context_window,
        export_evaluated_data_items=export_evaluated_data_items,
        export_evaluated_data_items_bigquery_destination_uri=export_evaluated_data_items_bigquery_destination_uri,
        export_evaluated_data_items_override_destination=export_evaluated_data_items_override_destination,
        validation_options=validation_options,
        budget_milli_node_hours=budget_milli_node_hours,
        model_display_name=model_display_name,
        sync=sync,
    )

    model.wait()

    print(model.display_name)
    print(model.resource_name)
    print(model.uri)
    return model
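The four SDK samples above differ mainly in the training job class they construct. Each class corresponds to one TRAINING_TASK_DEFINITION URI from the REST section. The following lookup table collects those pairings from this page; it is a hypothetical helper, not part of the SDK. A script can use it to pick the class by name, for example with getattr(aiplatform, TRAINING_METHODS["TFT"].sdk_job_class).

```python
from typing import Dict, NamedTuple

_SCHEMA_PREFIX = "gs://google-cloud-aiplatform/schema/trainingjob/definition/"


class TrainingMethod(NamedTuple):
    sdk_job_class: str    # class name in google.cloud.aiplatform
    task_definition: str  # TRAINING_TASK_DEFINITION value for the REST API


TRAINING_METHODS: Dict[str, TrainingMethod] = {
    "TiDE": TrainingMethod(
        "TimeSeriesDenseEncoderForecastingTrainingJob",
        _SCHEMA_PREFIX + "time_series_dense_encoder_forecasting_1.0.0.yaml",
    ),
    "TFT": TrainingMethod(
        "TemporalFusionTransformerForecastingTrainingJob",
        _SCHEMA_PREFIX + "temporal_fusion_transformer_time_series_forecasting_1.0.0.yaml",
    ),
    "AutoML (L2L)": TrainingMethod(
        "AutoMLForecastingTrainingJob",
        _SCHEMA_PREFIX + "automl_forecasting_1.0.0.yaml",
    ),
    "Seq2Seq+": TrainingMethod(
        "SequenceToSequencePlusForecastingTrainingJob",
        _SCHEMA_PREFIX + "seq2seq_plus_time_series_forecasting_1.0.0.yaml",
    ),
}

# Per PROBABILISTIC_INFERENCE above, only these methods support probabilistic inference.
SUPPORTS_PROBABILISTIC_INFERENCE = {"TiDE", "AutoML (L2L)"}
```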

Control the data split using REST

You can control how your training data is split between the training,
validation, and test sets. Use a split column to manually specify the data
split for each row and provide it as part of a PredefinedSplit
Split object in the
inputDataConfig of the JSON request.

DATA_SPLIT_COLUMN is the column containing the data split values
(TRAIN, VALIDATION, TEST).
 "predefinedSplit": {
   "key": "DATA_SPLIT_COLUMN"
 },

Learn more about data splits.
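You may want a time-ordered split that you control yourself. One approach is to write TRAIN, VALIDATION, and TEST values into your own split column and reference that column with predefinedSplit. The following Python sketch assigns the earliest 80% of distinct timestamps to TRAIN, the next 10% to VALIDATION, and the rest to TEST, matching the default 80/10/10 percentages mentioned earlier. It is an approximation under stated assumptions, not Vertex AI's exact default algorithm. The helper names and the data_split column name are hypothetical.

```python
from typing import Dict, List, Sequence


def chronological_split_column(
    timestamps: Sequence[str], train: float = 0.8, validation: float = 0.1
) -> List[str]:
    """Label each row TRAIN, VALIDATION, or TEST by the order of its timestamp.

    Timestamps are compared as strings, so use a sortable format such as
    ISO 8601. Rows that share a timestamp always land in the same split, and
    whatever the train and validation fractions leave over goes to TEST.
    """
    distinct = sorted(set(timestamps))
    n_train = int(len(distinct) * train)
    n_validation = int(len(distinct) * validation)
    split_of: Dict[str, str] = {}
    for i, ts in enumerate(distinct):
        if i < n_train:
            split_of[ts] = "TRAIN"
        elif i < n_train + n_validation:
            split_of[ts] = "VALIDATION"
        else:
            split_of[ts] = "TEST"
    return [split_of[ts] for ts in timestamps]


def predefined_split(data_split_column: str) -> Dict[str, Dict[str, str]]:
    """The predefinedSplit object to place in inputDataConfig."""
    return {"predefinedSplit": {"key": data_split_column}}


dates = [f"2024-01-{day:02d}" for day in range(1, 11)]  # ten daily timestamps
print(chronological_split_column(dates).count("TRAIN"))  # 8
print(predefined_split("data_split"))  # {'predefinedSplit': {'key': 'data_split'}}
```

As noted in the console steps, the split column must also be included among the columns used for training.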

Configure the rolling window strategy using REST

You can provide a windowConfig object to configure a rolling window strategy for
forecast window generation. The default strategy is maxCount.

To use the maxCount option, add the following to trainingTaskInputs of
the JSON request. MAX_COUNT_VALUE refers to the maximum number
of windows.
 "windowConfig": {
   "maxCount": MAX_COUNT_VALUE
 },

To use the strideLength option, add the following to trainingTaskInputs
of the JSON request. STRIDE_LENGTH_VALUE refers to the value of
the stride length.
 "windowConfig": {
   "strideLength": STRIDE_LENGTH_VALUE
 },

To use the column option, add the following to trainingTaskInputs of
the JSON request. COLUMN_NAME refers to the name of the column
with True or False values.
 "windowConfig": {
   "column": "COLUMN_NAME"
 },

To learn more, see Rolling window strategies.
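The three windowConfig variants above are presented as alternatives. The following minimal Python sketch builds the object and rejects calls that set zero or several strategies. This assumes the options are mutually exclusive, as the three separate variants suggest. `window_config` is a hypothetical helper, and its result is merged into trainingTaskInputs.

```python
from typing import Any, Dict, Optional


def window_config(
    max_count: Optional[int] = None,
    stride_length: Optional[int] = None,
    column: Optional[str] = None,
) -> Dict[str, Dict[str, Any]]:
    """Build a windowConfig that sets exactly one rolling window strategy."""
    options = {"maxCount": max_count, "strideLength": stride_length, "column": column}
    chosen = {key: value for key, value in options.items() if value is not None}
    if len(chosen) != 1:
        raise ValueError("set exactly one of max_count, stride_length, or column")
    return {"windowConfig": chosen}


training_task_inputs: Dict[str, Any] = {"targetColumn": "TARGET_COLUMN"}
training_task_inputs.update(window_config(stride_length=7))
print(training_task_inputs)
# {'targetColumn': 'TARGET_COLUMN', 'windowConfig': {'strideLength': 7}}
```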

What's next

Evaluate your model.