This page shows you the steps to create and manage AML AI datasets. A dataset is used as an input for the engine configuration, training, backtest, and prediction pipelines. An AML AI dataset contains references to BigQuery tables matching the AML AI input data model in a Google Cloud project.
Prerequisites
-
To get the permissions that you need to create and manage datasets, ask your administrator to grant you the Financial Services Admin (
financialservices.admin
) IAM role on your project. For more information about granting roles, see Manage access to projects, folders, and organizations.You might also be able to get the required permissions through custom roles or other predefined roles.
- Create an instance
-
Some API methods return a long-running operation (LRO). These methods are asynchronous and return an Operation object; for details, see REST Reference. The operation might not be completed when the method returns a response. For these methods, send the request and then check for the result. In general, all POST, PUT, UPDATE, and DELETE operations are long-running.
Create a dataset
To create a dataset, send the create request and then check for the result of the LRO.
Send the request
To create a dataset, use the
projects.locations.instances.datasets.create
method.
Before using any of the request data, make the following replacements:
PROJECT_ID
: your Google Cloud project ID listed in the IAM SettingsLOCATION
: the location of the instance; use one of the supported regionsShow locationsus-central1
us-east1
asia-south1
europe-west1
europe-west2
europe-west4
northamerica-northeast1
southamerica-east1
INSTANCE_ID
: the user-defined identifier for the instanceDATASET_ID
: a user-defined identifier for the AML AI dataset; use only lowercase letters, numbers, dashes, and underscores (for example,train_jan2018_apr2020
)BQ_INPUT_DATASET_NAME
: the BigQuery input dataset namePARTY_TABLE
: the Party table in the BigQuery input datasetACCOUNT_PARTY_LINK_TABLE
: the AccountPartyLink table in the BigQuery input datasetTRANSACTION_TABLE
: the Transaction table in the BigQuery input datasetRISK_CASE_EVENT_TABLE
: the RiskCaseEvent table in the BigQuery input datasetPARTY_SUPPLEMENTARY_DATA
: the PartySupplementaryData table in the BigQuery input dataset; this table is optional and can be removed from the request JSONDATA_START_DATE
: the start date and time of the data to use in the dataset; use RFC3339 UTC "Zulu" format (for example,2014-10-02T15:01:23Z
)DATA_END_DATE
: the end date and time of the data to use in the dataset; use RFC3339 UTC "Zulu" format (for example,2014-10-02T15:01:23Z
)
Request JSON body:
{ "tableSpecs": { "party": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_TABLE", "account_party_link": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.ACCOUNT_PARTY_LINK_TABLE", "transaction": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.TRANSACTION_TABLE", "risk_case_event": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.RISK_CASE_EVENT_TABLE", "party_supplementary_data": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_SUPPLEMENTARY_DATA" }, "dateRange": { "startTime": "DATA_START_DATE", "endTime": "DATA_END_DATE" }, "timeZone": { "id": "UTC" } }
To send your request, choose one of these options:
curl
Save the request body in a file named request.json
.
Run the following command in the terminal to create or overwrite
this file in the current directory:
cat > request.json << 'EOF' { "tableSpecs": { "party": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_TABLE", "account_party_link": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.ACCOUNT_PARTY_LINK_TABLE", "transaction": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.TRANSACTION_TABLE", "risk_case_event": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.RISK_CASE_EVENT_TABLE", "party_supplementary_data": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_SUPPLEMENTARY_DATA" }, "dateRange": { "startTime": "DATA_START_DATE", "endTime": "DATA_END_DATE" }, "timeZone": { "id": "UTC" } } EOF
Then execute the following command to send your REST request:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets?dataset_id=DATASET_ID"
PowerShell
Save the request body in a file named request.json
.
Run the following command in the terminal to create or overwrite
this file in the current directory:
@' { "tableSpecs": { "party": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_TABLE", "account_party_link": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.ACCOUNT_PARTY_LINK_TABLE", "transaction": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.TRANSACTION_TABLE", "risk_case_event": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.RISK_CASE_EVENT_TABLE", "party_supplementary_data": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_SUPPLEMENTARY_DATA" }, "dateRange": { "startTime": "DATA_START_DATE", "endTime": "DATA_END_DATE" }, "timeZone": { "id": "UTC" } } '@ | Out-File -FilePath request.json -Encoding utf8
Then execute the following command to send your REST request:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets?dataset_id=DATASET_ID" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
{ "name": "projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID", "metadata": { "@type": "type.googleapis.com/google.cloud.financialservices.v1.OperationMetadata", "createTime": CREATE_TIME, "target": "projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID", "verb": "create", "requestedCancellation": false, "apiVersion": "v1" }, "done": false }
Copy the returned OPERATION_ID
to use in the next section.
Check for the result
Use the
projects.locations.operations.get
method to check if the dataset has been created. If the response contains
"done": false
, repeat the command until the response contains "done": true
.
These operations can take a few minutes to several hours to complete.
Before using any of the request data, make the following replacements:
PROJECT_ID
: your Google Cloud project ID listed in the IAM SettingsLOCATION
: the location of the instance; use one of the supported regionsShow locationsus-central1
us-east1
asia-south1
europe-west1
europe-west2
europe-west4
northamerica-northeast1
southamerica-east1
OPERATION_ID
: the identifier for the operation
To send your request, choose one of these options:
curl
Execute the following command:
curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID"
PowerShell
Execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
{ "name": "projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID", "metadata": { "@type": "type.googleapis.com/google.cloud.financialservices.v1.OperationMetadata", "createTime": "2023-03-14T15:52:55.358979323Z", "endTime": "2023-03-14T16:52:55.358979323Z", "target": "projects/PROJECT_ID/locations/LOCATION/datasets/DATASET_ID", "verb": "create", "requestedCancellation": false, "apiVersion": "v1" }, "done": true, "response": { "@type": "type.googleapis.com/google.cloud.financialservices.v1.Dataset", "name": "projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID", "createTime": CREATE_TIME, "updateTime": UPDATE_TIME, "tableSpecs": { "party": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_TABLE", "account_party_link": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.ACCOUNT_PARTY_LINK_TABLE", "transaction": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.TRANSACTION_TABLE", "risk_case_event": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.RISK_CASE_EVENT_TABLE", "party_supplementary_data": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_SUPPLEMENTARY_DATA" }, "state": "ACTIVE", "dateRange": { "start_time": "DATA_START_DATE", "end_time": "DATA_END_DATE" }, "timeZone": { "id": "UTC" } } }
Get a dataset
To get a dataset, use the
projects.locations.instances.datasets.get
method.
Before using any of the request data, make the following replacements:
PROJECT_ID
: your Google Cloud project ID listed in the IAM SettingsLOCATION
: the location of the instance; use one of the supported regionsShow locationsus-central1
us-east1
asia-south1
europe-west1
europe-west2
europe-west4
northamerica-northeast1
southamerica-east1
INSTANCE_ID
: the user-defined identifier for the instanceDATASET_ID
: the user-defined identifier for the dataset
To send your request, choose one of these options:
curl
Execute the following command:
curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID"
PowerShell
Execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
{ "name": "projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID", "createTime": CREATE_TIME, "updateTime": UPDATE_TIME, "tableSpecs": { "party": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_TABLE", "account_party_link": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.ACCOUNT_PARTY_LINK_TABLE", "transaction": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.TRANSACTION_TABLE", "risk_case_event": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.RISK_CASE_EVENT_TABLE", "party_supplementary_data": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_SUPPLEMENTARY_DATA" }, "state": "ACTIVE", "dateRange": { "start_time": "DATA_START_DATE", "end_time": "DATA_END_DATE" }, "timeZone": { "id": "UTC" } }
Update a dataset
To update a dataset, use the
projects.locations.instances.datasets.patch
method.
The only fields which can be updated are label fields in AML AI. The following example updates the key-value pair user labels associated with the dataset.
Before using any of the request data, make the following replacements:
PROJECT_ID
: your Google Cloud project ID listed in the IAM SettingsLOCATION
: the location of the instance; use one of the supported regionsShow locationsus-central1
us-east1
asia-south1
europe-west1
europe-west2
europe-west4
northamerica-northeast1
southamerica-east1
INSTANCE_ID
: a user-defined identifier for the instanceDATASET_ID
: the user-defined identifier for the datasetKEY
: The key in a key-value pair used to organize datasets. Seelabels
for more information.VALUE
: The value in a key-value pair used to organize datasets. Seelabels
for more information.
Request JSON body:
{ "labels": { "KEY": "VALUE" } }
To send your request, choose one of these options:
curl
Save the request body in a file named request.json
.
Run the following command in the terminal to create or overwrite
this file in the current directory:
cat > request.json << 'EOF' { "labels": { "KEY": "VALUE" } } EOF
Then execute the following command to send your REST request:
curl -X PATCH \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID?updateMask=labels"
PowerShell
Save the request body in a file named request.json
.
Run the following command in the terminal to create or overwrite
this file in the current directory:
@' { "labels": { "KEY": "VALUE" } } '@ | Out-File -FilePath request.json -Encoding utf8
Then execute the following command to send your REST request:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method PATCH `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID?updateMask=labels" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
{ "name": "projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID", "metadata": { "@type": "type.googleapis.com/google.cloud.financialservices.v1.OperationMetadata", "createTime": CREATE_TIME, "target": "projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID", "verb": "update", "requestedCancellation": false, "apiVersion": "v1" }, "done": false }
For more information on how to get the result of the long-running operation (LRO), see Check for the result.
List the datasets
To list the datasets for a given instance, use the
projects.locations.instances.datasets.list
method.
Before using any of the request data, make the following replacements:
PROJECT_ID
: your Google Cloud project ID listed in the IAM SettingsLOCATION
: the location of the instance; use one of the supported regionsShow locationsus-central1
us-east1
asia-south1
europe-west1
europe-west2
europe-west4
northamerica-northeast1
southamerica-east1
INSTANCE_ID
: the user-defined identifier for the instance
To send your request, choose one of these options:
curl
Execute the following command:
curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets"
PowerShell
Execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
{ "datasets": [ { "name": "projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID", "createTime": CREATE_TIME, "updateTime": UPDATE_TIME, "tableSpecs": { "party": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_TABLE", "account_party_link": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.ACCOUNT_PARTY_LINK_TABLE", "transaction": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.TRANSACTION_TABLE", "risk_case_event": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.RISK_CASE_EVENT_TABLE", "party_supplementary_data": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_SUPPLEMENTARY_DATA" }, "state": "ACTIVE", "dateRange": { "start_time": "DATA_START_DATE", "end_time": "DATA_END_DATE" }, "timeZone": { "id": "UTC" } } ] }
Delete a dataset
To delete a dataset, use the
projects.locations.instances.datasets.delete
method.
Before using any of the request data, make the following replacements:
PROJECT_ID
: your Google Cloud project ID listed in the IAM SettingsLOCATION
: the location of the instance; use one of the supported regionsShow locationsus-central1
us-east1
asia-south1
europe-west1
europe-west2
europe-west4
northamerica-northeast1
southamerica-east1
INSTANCE_ID
: the user-defined identifier for the instanceDATASET_ID
: the user-defined identifier for the dataset
To send your request, choose one of these options:
curl
Execute the following command:
curl -X DELETE \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID"
PowerShell
Execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method DELETE `
-Headers $headers `
-Uri "https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
{ "name": "projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID", "metadata": { "@type": "type.googleapis.com/google.cloud.financialservices.v1.OperationMetadata", "createTime": CREATE_TIME, "target": "projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID", "verb": "delete", "requestedCancellation": false, "apiVersion": "v1" }, "done": false }
For more information on how to get the result of the long-running operation (LRO), see Check for the result.