This page shows you how to create and manage AML AI datasets. A dataset is used as an input for the training, prediction, and backtesting pipelines. A dataset contains references to BigQuery tables in a Google Cloud project.
You only need to create the dataset at this point. The other dataset methods are provided as a convenience.
Before you begin
-
To get the permissions that you need to create and manage datasets, ask your administrator to grant you the Financial Services Admin (
financialservices.admin
) IAM role on your project. For more information about granting roles, see Manage access.You might also be able to get the required permissions through custom roles or other predefined roles.
- Create an instance
Create a dataset
Some API methods return a long-running operation (LRO). These methods are asynchronous. The operation might not be completed when the method returns a response. For these methods, send the request and then check for the result.
Send the request
To create a dataset, use the
projects.locations.instances.datasets.create
method.
Before using any of the request data, make the following replacements:
PROJECT_ID
: your Google Cloud project ID listed in the IAM SettingsLOCATION
: the location of the instance; use one of the supported regions:us-central1
us-east1
asia-south1
europe-west1
europe-west2
europe-west4
northamerica-northeast1
southamerica-east1
INSTANCE_ID
: the user-defined identifier for the instanceDATASET_ID
: a user-defined identifier for the AML AI dataset; use only lowercase letters, numbers, dashes, and underscores (for example,train_jan2018_apr2020
)BQ_INPUT_DATASET_NAME
: the BigQuery input dataset namePARTY_TABLE
: the Party table in the BigQuery input datasetACCOUNT_PARTY_LINK_TABLE
: the AccountPartyLink table in the BigQuery input datasetTRANSACTION_TABLE
: the Transaction table in the BigQuery input datasetRISK_CASE_EVENT_TABLE
: the RiskCaseEvent table in the BigQuery input datasetPARTY_SUPPLEMENTARY_DATA
: the PartySupplementaryData table in the BigQuery input dataset; this table is optional and can be removed from the request JSONDATA_START_DATE
: the start date and time of the data to use in the dataset; use RFC3339 UTC "Zulu" format (for example,2014-10-02T15:01:23Z
)DATA_END_DATE
: the end date and time of the data to use in the dataset; use RFC3339 UTC "Zulu" format (for example,2014-10-02T15:01:23Z
)
Request JSON body:
{ "tableSpecs": { "party": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_TABLE", "account_party_link": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.ACCOUNT_PARTY_LINK_TABLE", "transaction": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.TRANSACTION_TABLE", "risk_case_event": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.RISK_CASE_EVENT_TABLE", "party_supplementary_data": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_SUPPLEMENTARY_DATA" }, "dateRange": { "startTime": "DATA_START_DATE", "endTime": "DATA_END_DATE" }, "timeZone": { "id": "UTC" } }
To send your request, choose one of these options:
curl
Save the request body in a file named request.json
.
Run the following command in the terminal to create or overwrite
this file in the current directory:
cat > request.json << 'EOF' { "tableSpecs": { "party": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_TABLE", "account_party_link": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.ACCOUNT_PARTY_LINK_TABLE", "transaction": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.TRANSACTION_TABLE", "risk_case_event": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.RISK_CASE_EVENT_TABLE", "party_supplementary_data": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_SUPPLEMENTARY_DATA" }, "dateRange": { "startTime": "DATA_START_DATE", "endTime": "DATA_END_DATE" }, "timeZone": { "id": "UTC" } } EOF
Then execute the following command to send your REST request:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets?dataset_id=DATASET_ID"
PowerShell
Save the request body in a file named request.json
.
Run the following command in the terminal to create or overwrite
this file in the current directory:
@' { "tableSpecs": { "party": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_TABLE", "account_party_link": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.ACCOUNT_PARTY_LINK_TABLE", "transaction": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.TRANSACTION_TABLE", "risk_case_event": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.RISK_CASE_EVENT_TABLE", "party_supplementary_data": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_SUPPLEMENTARY_DATA" }, "dateRange": { "startTime": "DATA_START_DATE", "endTime": "DATA_END_DATE" }, "timeZone": { "id": "UTC" } } '@ | Out-File -FilePath request.json -Encoding utf8
Then execute the following command to send your REST request:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets?dataset_id=DATASET_ID" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
{ "name": "projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID", "metadata": { "@type": "type.googleapis.com/google.cloud.financialservices.v1.OperationMetadata", "createTime": CREATE_TIME, "target": "projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID", "verb": "create", "requestedCancellation": false, "apiVersion": "v1" }, "done": false }
Check for the result
Use the
projects.locations.operations.get
method to check if the dataset has been created. If the response contains
"done": false
, repeat the command until the response contains "done": true
.
These operations can take a few minutes to several hours to complete.
Before using any of the request data, make the following replacements:
PROJECT_ID
: your Google Cloud project ID listed in the IAM SettingsLOCATION
: the location of the instance; use one of the supported regions:us-central1
us-east1
asia-south1
europe-west1
europe-west2
europe-west4
northamerica-northeast1
southamerica-east1
OPERATION_ID
: the identifier for the operation
To send your request, choose one of these options:
curl
Execute the following command:
curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID"
PowerShell
Execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
{ "name": "projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID", "metadata": { "@type": "type.googleapis.com/google.cloud.financialservices.v1.OperationMetadata", "createTime": CREATE_TIME, "endTime": END_TIME, "target": "projects/PROJECT_ID/locations/LOCATION/datasets/DATASET_ID", "verb": "create", "requestedCancellation": false, "apiVersion": "v1" }, "done": true, "response": { "@type": "type.googleapis.com/google.cloud.financialservices.v1.Dataset", "name": "projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID", "createTime": CREATE_TIME, "updateTime": UPDATE_TIME, "tableSpecs": { "party": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_TABLE", "account_party_link": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.ACCOUNT_PARTY_LINK_TABLE", "transaction": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.TRANSACTION_TABLE", "risk_case_event": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.RISK_CASE_EVENT_TABLE", "party_supplementary_data": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_SUPPLEMENTARY_DATA" }, "state": "ACTIVE", "dateRange": { "start_time": "DATA_START_DATE", "end_time": "DATA_END_DATE" }, "timeZone": { "id": "UTC" } } }
Optional methods
The following dataset methods are provided as a convenience.
Get a dataset
To get a dataset, use the
projects.locations.instances.datasets.get
method.
Before using any of the request data, make the following replacements:
PROJECT_ID
: your Google Cloud project ID listed in the IAM SettingsLOCATION
: the location of the instance; use one of the supported regions:us-central1
us-east1
asia-south1
europe-west1
europe-west2
europe-west4
northamerica-northeast1
southamerica-east1
INSTANCE_ID
: the user-defined identifier for the instanceDATASET_ID
: the user-defined identifier for the dataset
To send your request, choose one of these options:
curl
Execute the following command:
curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID"
PowerShell
Execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
{ "name": "projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID", "createTime": CREATE_TIME, "updateTime": UPDATE_TIME, "tableSpecs": { "party": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_TABLE", "account_party_link": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.ACCOUNT_PARTY_LINK_TABLE", "transaction": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.TRANSACTION_TABLE", "risk_case_event": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.RISK_CASE_EVENT_TABLE", "party_supplementary_data": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_SUPPLEMENTARY_DATA" }, "state": "ACTIVE", "dateRange": { "start_time": "DATA_START_DATE", "end_time": "DATA_END_DATE" }, "timeZone": { "id": "UTC" } }
Update a dataset
To update a dataset, use the
projects.locations.instances.datasets.patch
method.
Not all fields in a dataset can be updated. The following example updates the key-value pair user labels associated with the dataset.
Before using any of the request data, make the following replacements:
PROJECT_ID
: your Google Cloud project ID listed in the IAM SettingsLOCATION
: the location of the instance; use one of the supported regions:us-central1
us-east1
asia-south1
europe-west1
europe-west2
europe-west4
northamerica-northeast1
southamerica-east1
INSTANCE_ID
: a user-defined identifier for the instanceDATASET_ID
: the user-defined identifier for the datasetKEY
: The key in a key-value pair used to organize datasets. Seelabels
for more information.VALUE
: The value in a key-value pair used to organize datasets. Seelabels
for more information.
Request JSON body:
{ "labels": { "KEY": "VALUE" } }
To send your request, choose one of these options:
curl
Save the request body in a file named request.json
.
Run the following command in the terminal to create or overwrite
this file in the current directory:
cat > request.json << 'EOF' { "labels": { "KEY": "VALUE" } } EOF
Then execute the following command to send your REST request:
curl -X PATCH \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID?updateMask=labels"
PowerShell
Save the request body in a file named request.json
.
Run the following command in the terminal to create or overwrite
this file in the current directory:
@' { "labels": { "KEY": "VALUE" } } '@ | Out-File -FilePath request.json -Encoding utf8
Then execute the following command to send your REST request:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method PATCH `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID?updateMask=labels" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
{ "name": "projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID", "metadata": { "@type": "type.googleapis.com/google.cloud.financialservices.v1.OperationMetadata", "createTime": CREATE_TIME, "target": "projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID", "verb": "update", "requestedCancellation": false, "apiVersion": "v1" }, "done": false }
For more information on how to get the result of the long-running operation (LRO), see Check for the result.
List the datasets
To list the datasets for a given instance, use the
projects.locations.instances.datasets.list
method.
Before using any of the request data, make the following replacements:
PROJECT_ID
: your Google Cloud project ID listed in the IAM SettingsLOCATION
: the location of the instance; use one of the supported regions:us-central1
us-east1
asia-south1
europe-west1
europe-west2
europe-west4
northamerica-northeast1
southamerica-east1
INSTANCE_ID
: the user-defined identifier for the instance
To send your request, choose one of these options:
curl
Execute the following command:
curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets"
PowerShell
Execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
{ "datasets": [ { "name": "projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID", "createTime": CREATE_TIME, "updateTime": UPDATE_TIME, "tableSpecs": { "party": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_TABLE", "account_party_link": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.ACCOUNT_PARTY_LINK_TABLE", "transaction": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.TRANSACTION_TABLE", "risk_case_event": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.RISK_CASE_EVENT_TABLE", "party_supplementary_data": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_SUPPLEMENTARY_DATA" }, "state": "ACTIVE", "dateRange": { "start_time": "DATA_START_DATE", "end_time": "DATA_END_DATE" }, "timeZone": { "id": "UTC" } } ] }
Delete a dataset
To delete a dataset, use the
projects.locations.instances.datasets.delete
method.
Before using any of the request data, make the following replacements:
PROJECT_ID
: your Google Cloud project ID listed in the IAM SettingsLOCATION
: the location of the instance; use one of the supported regions:us-central1
us-east1
asia-south1
europe-west1
europe-west2
europe-west4
northamerica-northeast1
southamerica-east1
INSTANCE_ID
: the user-defined identifier for the instanceDATASET_ID
: the user-defined identifier for the dataset
To send your request, choose one of these options:
curl
Execute the following command:
curl -X DELETE \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID"
PowerShell
Execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method DELETE `
-Headers $headers `
-Uri "https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
{ "name": "projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID", "metadata": { "@type": "type.googleapis.com/google.cloud.financialservices.v1.OperationMetadata", "createTime": CREATE_TIME, "target": "projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID", "verb": "delete", "requestedCancellation": false, "apiVersion": "v1" }, "done": false }
For more information on how to get the result of the long-running operation (LRO), see Check for the result.