创建和管理数据集

本页面介绍了创建和管理 AML AI 数据集的步骤。数据集用作引擎配置、训练、回测和预测流水线的输入。AML AI 数据集包含对与 Google Cloud 项目中的 AML AI 输入数据模型匹配的 BigQuery 表的引用。

前提条件

  • 如需获取创建和管理数据集所需的权限,请让管理员向您授予项目的 Financial Services Admin (financialservices.admin) IAM 角色。如需详细了解如何授予角色,请参阅管理访问权限

    您也可以通过自定义角色或其他预定义角色来获取所需的权限。

  • 创建实例

创建数据集

某些 API 方法会返回长时间运行的操作 (LRO)。这些方法是异步的。当该方法返回响应时,操作可能尚未完成。对于这些方法,请发送请求,然后检查结果。

发送请求

如需创建数据集,请使用 projects.locations.instances.datasets.create 方法。

在使用任何请求数据之前,请先进行以下替换:

  • PROJECT_IDIAM 设置中列出的 Google Cloud 项目 ID
  • LOCATION:实例的位置;请使用某个受支持的区域
    显示位置
    • us-central1
    • us-east1
    • asia-south1
    • europe-west1
    • europe-west2
    • europe-west4
    • northamerica-northeast1
    • southamerica-east1
  • INSTANCE_ID:用户定义的实例标识符
  • DATASET_ID:AML AI 数据集的用户定义的标识符;只能使用小写字母、数字、短划线和下划线(例如 train_jan2018_apr2020
  • BQ_INPUT_DATASET_NAME:BigQuery 输入数据集名称
  • PARTY_TABLE:BigQuery 输入数据集中的 Party
  • ACCOUNT_PARTY_LINK_TABLE:BigQuery 输入数据集中的 AccountPartyLink
  • TRANSACTION_TABLE:BigQuery 输入数据集中的 Transaction
  • RISK_CASE_EVENT_TABLE:BigQuery 输入数据集中的 RiskCaseEvent
  • PARTY_SUPPLEMENTARY_DATA:BigQuery 输入数据集中的 PartySupplementaryData 表;此表为可选表,可以从请求 JSON 中移除
  • DATA_START_DATE:要在数据集中使用的数据的开始日期和时间;使用 RFC3339 世界协调时间 (UTC)(即“祖鲁时”)格式(例如 2014-10-02T15:01:23Z
  • DATA_END_DATE:要在数据集中使用的数据的结束日期和时间;使用 RFC3339 世界协调时间 (UTC)(即“祖鲁时”)格式(例如 2014-10-02T15:01:23Z

请求 JSON 正文:

{
  "tableSpecs": {
    "party": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_TABLE",
    "account_party_link": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.ACCOUNT_PARTY_LINK_TABLE",
    "transaction": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.TRANSACTION_TABLE",
    "risk_case_event": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.RISK_CASE_EVENT_TABLE",
    "party_supplementary_data": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_SUPPLEMENTARY_DATA"
  },
  "dateRange": {
    "startTime": "DATA_START_DATE",
    "endTime": "DATA_END_DATE"
  },
  "timeZone": {
    "id": "UTC"
  }
}

如需发送请求,请选择以下方式之一:

curl

将请求正文保存在名为 request.json 的文件中。在终端中运行以下命令,在当前目录中创建或覆盖此文件:

cat > request.json << 'EOF'
{
  "tableSpecs": {
    "party": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_TABLE",
    "account_party_link": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.ACCOUNT_PARTY_LINK_TABLE",
    "transaction": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.TRANSACTION_TABLE",
    "risk_case_event": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.RISK_CASE_EVENT_TABLE",
    "party_supplementary_data": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_SUPPLEMENTARY_DATA"
  },
  "dateRange": {
    "startTime": "DATA_START_DATE",
    "endTime": "DATA_END_DATE"
  },
  "timeZone": {
    "id": "UTC"
  }
}
EOF

然后,执行以下命令以发送 REST 请求:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets?dataset_id=DATASET_ID"

PowerShell

将请求正文保存在名为 request.json 的文件中。在终端中运行以下命令,在当前目录中创建或覆盖此文件:

@'
{
  "tableSpecs": {
    "party": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_TABLE",
    "account_party_link": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.ACCOUNT_PARTY_LINK_TABLE",
    "transaction": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.TRANSACTION_TABLE",
    "risk_case_event": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.RISK_CASE_EVENT_TABLE",
    "party_supplementary_data": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_SUPPLEMENTARY_DATA"
  },
  "dateRange": {
    "startTime": "DATA_START_DATE",
    "endTime": "DATA_END_DATE"
  },
  "timeZone": {
    "id": "UTC"
  }
}
'@  | Out-File -FilePath request.json -Encoding utf8

然后,执行以下命令以发送 REST 请求:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets?dataset_id=DATASET_ID" | Select-Object -Expand Content

您应该收到类似以下内容的 JSON 响应:

{
  "name": "projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.financialservices.v1.OperationMetadata",
    "createTime": CREATE_TIME,
    "target": "projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID",
    "verb": "create",
    "requestedCancellation": false,
    "apiVersion": "v1"
  },
  "done": false
}

查看结果

使用 projects.locations.operations.get 方法检查是否已创建数据集。如果响应包含 "done": false,请重复该命令,直到响应包含 "done": true。这些操作可能需要几分钟到几个小时才能完成。

在使用任何请求数据之前,请先进行以下替换:

  • PROJECT_IDIAM 设置中列出的 Google Cloud 项目 ID
  • LOCATION:实例的位置;请使用某个受支持的区域
    显示位置
    • us-central1
    • us-east1
    • asia-south1
    • europe-west1
    • europe-west2
    • europe-west4
    • northamerica-northeast1
    • southamerica-east1
  • OPERATION_ID:操作的标识符

如需发送请求,请选择以下方式之一:

curl

执行以下命令:

curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID"

PowerShell

执行以下命令:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID" | Select-Object -Expand Content

您应该收到类似以下内容的 JSON 响应:

{
  "name": "projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.financialservices.v1.OperationMetadata",
    "createTime": CREATE_TIME,
    "endTime": END_TIME,
    "target": "projects/PROJECT_ID/locations/LOCATION/datasets/DATASET_ID",
    "verb": "create",
    "requestedCancellation": false,
    "apiVersion": "v1"
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.cloud.financialservices.v1.Dataset",
    "name": "projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID",
    "createTime": CREATE_TIME,
    "updateTime": UPDATE_TIME,
    "tableSpecs": {
      "party": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_TABLE",
      "account_party_link": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.ACCOUNT_PARTY_LINK_TABLE",
      "transaction": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.TRANSACTION_TABLE",
      "risk_case_event": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.RISK_CASE_EVENT_TABLE",
      "party_supplementary_data": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_SUPPLEMENTARY_DATA"
    },
    "state": "ACTIVE",
    "dateRange": {
      "start_time": "DATA_START_DATE",
      "end_time": "DATA_END_DATE"
    },
    "timeZone": {
      "id": "UTC"
    }
  }
}

获取数据集

如需获取数据集,请使用 projects.locations.instances.datasets.get 方法。

在使用任何请求数据之前,请先进行以下替换:

  • PROJECT_IDIAM 设置中列出的 Google Cloud 项目 ID
  • LOCATION:实例的位置;请使用某个受支持的区域
    显示位置
    • us-central1
    • us-east1
    • asia-south1
    • europe-west1
    • europe-west2
    • europe-west4
    • northamerica-northeast1
    • southamerica-east1
  • INSTANCE_ID:用户定义的实例标识符
  • DATASET_ID:用户定义的数据集标识符

如需发送请求,请选择以下方式之一:

curl

执行以下命令:

curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID"

PowerShell

执行以下命令:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID" | Select-Object -Expand Content

您应该收到类似以下内容的 JSON 响应:

{
  "name": "projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID",
  "createTime": CREATE_TIME,
  "updateTime": UPDATE_TIME,
  "tableSpecs": {
    "party": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_TABLE",
    "account_party_link": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.ACCOUNT_PARTY_LINK_TABLE",
    "transaction": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.TRANSACTION_TABLE",
    "risk_case_event": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.RISK_CASE_EVENT_TABLE",
    "party_supplementary_data": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_SUPPLEMENTARY_DATA"
  },
  "state": "ACTIVE",
  "dateRange": {
    "start_time": "DATA_START_DATE",
    "end_time": "DATA_END_DATE"
  },
  "timeZone": {
    "id": "UTC"
  }
}

更新数据集

如需更新数据集,请使用 projects.locations.instances.datasets.patch 方法。

唯一可以更新的字段是 AML AI 中的标签字段。以下示例会更新与数据集关联的键值对用户标签

在使用任何请求数据之前,请先进行以下替换:

  • PROJECT_IDIAM 设置中列出的 Google Cloud 项目 ID
  • LOCATION:实例的位置;请使用某个受支持的区域
    显示位置
    • us-central1
    • us-east1
    • asia-south1
    • europe-west1
    • europe-west2
    • europe-west4
    • northamerica-northeast1
    • southamerica-east1
  • INSTANCE_ID:用户定义的实例标识符
  • DATASET_ID:用户定义的数据集标识符
  • KEY:键值对中用于组织数据集的键。如需了解详情,请参阅 labels
  • VALUE:用于整理数据集的键值对中的值。如需了解详情,请参阅 labels

请求 JSON 正文:

{
  "labels": {
    "KEY": "VALUE"
  }
}

如需发送请求,请选择以下方式之一:

curl

将请求正文保存在名为 request.json 的文件中。在终端中运行以下命令,在当前目录中创建或覆盖此文件:

cat > request.json << 'EOF'
{
  "labels": {
    "KEY": "VALUE"
  }
}
EOF

然后,执行以下命令以发送 REST 请求:

curl -X PATCH \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID?updateMask=labels"

PowerShell

将请求正文保存在名为 request.json 的文件中。在终端中运行以下命令,在当前目录中创建或覆盖此文件:

@'
{
  "labels": {
    "KEY": "VALUE"
  }
}
'@  | Out-File -FilePath request.json -Encoding utf8

然后,执行以下命令以发送 REST 请求:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method PATCH `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID?updateMask=labels" | Select-Object -Expand Content

您应该收到类似以下内容的 JSON 响应:

{
  "name": "projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.financialservices.v1.OperationMetadata",
    "createTime": CREATE_TIME,
    "target": "projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID",
    "verb": "update",
    "requestedCancellation": false,
    "apiVersion": "v1"
  },
  "done": false
}

如需详细了解如何获取长时间运行的操作 (LRO) 的结果,请参阅检查结果

列出数据集

如需列出给定实例的数据集,请使用 projects.locations.instances.datasets.list 方法。

在使用任何请求数据之前,请先进行以下替换:

  • PROJECT_IDIAM 设置中列出的 Google Cloud 项目 ID
  • LOCATION:实例的位置;请使用某个受支持的区域
    显示位置
    • us-central1
    • us-east1
    • asia-south1
    • europe-west1
    • europe-west2
    • europe-west4
    • northamerica-northeast1
    • southamerica-east1
  • INSTANCE_ID:用户定义的实例标识符

如需发送请求,请选择以下方式之一:

curl

执行以下命令:

curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets"

PowerShell

执行以下命令:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets" | Select-Object -Expand Content

您应该收到类似以下内容的 JSON 响应:

{
  "datasets": [
    {
      "name": "projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID",
      "createTime": CREATE_TIME,
      "updateTime": UPDATE_TIME,
      "tableSpecs": {
        "party": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_TABLE",
        "account_party_link": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.ACCOUNT_PARTY_LINK_TABLE",
        "transaction": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.TRANSACTION_TABLE",
        "risk_case_event": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.RISK_CASE_EVENT_TABLE",
        "party_supplementary_data": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_SUPPLEMENTARY_DATA"
      },
      "state": "ACTIVE",
      "dateRange": {
        "start_time": "DATA_START_DATE",
        "end_time": "DATA_END_DATE"
      },
      "timeZone": {
        "id": "UTC"
      }
    }
  ]
}

删除数据集

如需删除数据集,请使用 projects.locations.instances.datasets.delete 方法。

在使用任何请求数据之前,请先进行以下替换:

  • PROJECT_IDIAM 设置中列出的 Google Cloud 项目 ID
  • LOCATION:实例的位置;请使用某个受支持的区域
    显示位置
    • us-central1
    • us-east1
    • asia-south1
    • europe-west1
    • europe-west2
    • europe-west4
    • northamerica-northeast1
    • southamerica-east1
  • INSTANCE_ID:用户定义的实例标识符
  • DATASET_ID:用户定义的数据集标识符

如需发送请求,请选择以下方式之一:

curl

执行以下命令:

curl -X DELETE \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID"

PowerShell

执行以下命令:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method DELETE `
-Headers $headers `
-Uri "https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID" | Select-Object -Expand Content

您应该收到类似以下内容的 JSON 响应:

{
  "name": "projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.financialservices.v1.OperationMetadata",
    "createTime": CREATE_TIME,
    "target": "projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID",
    "verb": "delete",
    "requestedCancellation": false,
    "apiVersion": "v1"
  },
  "done": false
}

如需详细了解如何获取长时间运行的操作 (LRO) 的结果,请参阅检查结果