Configure and use entity resolution in BigQuery
This document shows how to implement entity resolution in BigQuery. It is intended for entity resolution end users (hereafter referred to as end users) and identity providers.
End users can use this document to connect with an identity provider and use the provider's service to match records. Identity providers can use this document to set up and configure services to share with end users on the Google Cloud Marketplace.
Workflow for end users
The following sections show end users how to configure entity resolution in BigQuery. For a visual representation of the complete setup, see the architecture for entity resolution.
Before you begin
- Contact and establish a relationship with an identity provider. BigQuery supports entity resolution with LiveRamp.
- Acquire the following items from the identity provider:
  - Service account credentials
  - Remote function signature
- Create two datasets in your project:
  - Input dataset
  - Output dataset
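As a minimal sketch of the dataset step above, the following Python helper builds the two BigQuery DDL statements that you could run in the query editor. The dataset names `entity_resolution_input` and `entity_resolution_output` and the `US` location are illustrative assumptions, not names that BigQuery or any identity provider requires.

```python
def create_dataset_statements(project_id: str, location: str = "US") -> list[str]:
    """Build CREATE SCHEMA statements for an input and an output dataset.

    The dataset IDs below are hypothetical placeholders; use your own names.
    """
    dataset_ids = ("entity_resolution_input", "entity_resolution_output")
    return [
        f"CREATE SCHEMA IF NOT EXISTS `{project_id}.{dataset_id}` "
        f"OPTIONS (location = '{location}')"
        for dataset_id in dataset_ids
    ]


# Print the statements so that they can be pasted into the BigQuery console.
for statement in create_dataset_statements("my-project"):
    print(statement + ";")
```

Keeping the input and output datasets separate lets you grant the identity provider read access to one and write access to the other.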
Required roles
To get the permissions that you need to run entity resolution jobs, ask your administrator to grant you the following IAM roles:
- For the identity provider's service account to read the input dataset and write to the output dataset:
  - BigQuery Data Viewer (roles/bigquery.dataViewer) on the input dataset
  - BigQuery Data Editor (roles/bigquery.dataEditor) on the output dataset
For more information about granting roles, see Manage access to projects, folders, and organizations.
You might also be able to get the required permissions through custom roles or other predefined roles.
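One way to apply the dataset-level roles listed above is with BigQuery's `GRANT` statement. The following sketch only assembles the statements. The project ID, dataset IDs, and service account email are hypothetical values that you would replace with your own and with the account that your identity provider gives you.

```python
def provider_grant_statements(
    project_id: str,
    input_dataset: str,
    output_dataset: str,
    provider_service_account: str,
) -> list[str]:
    """Build GRANT statements for the identity provider's service account.

    Data Viewer is granted on the input dataset and Data Editor on the
    output dataset, matching the roles listed in this section.
    """
    member = f"serviceAccount:{provider_service_account}"
    return [
        f"GRANT `roles/bigquery.dataViewer` "
        f"ON SCHEMA `{project_id}.{input_dataset}` TO \"{member}\"",
        f"GRANT `roles/bigquery.dataEditor` "
        f"ON SCHEMA `{project_id}.{output_dataset}` TO \"{member}\"",
    ]


# Hypothetical values, for illustration only.
for statement in provider_grant_statements(
    "my-project",
    "entity_resolution_input",
    "entity_resolution_output",
    "provider-sa@partner-project.iam.gserviceaccount.com",
):
    print(statement + ";")
```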
Translate or resolve entities
For specific identity provider instructions, refer to the following sections.
LiveRamp
Prerequisites
- Configure LiveRamp Embedded Identity in BigQuery. For more information, see Enabling LiveRamp Embedded Identity in BigQuery.
- Coordinate with LiveRamp to enable API credentials for use with Embedded Identity. For more information, see Authentication.
Setup
The following steps are required when you use LiveRamp Embedded Identity for the first time. After setup is complete, only the input table and metadata table need to be modified between runs.
Create an input table
Create a table in the input dataset. Populate the table with RampIDs, target domains, and target types. For details and examples, see Input Table Columns and Descriptions.
Create a metadata table
The metadata table is used to control the execution of LiveRamp Embedded Identity on BigQuery. Create a metadata table in the input dataset. Populate the metadata table with client IDs, execution modes, target domains, and target types. For details and examples, see Metadata Table Columns and Descriptions.
Share tables with LiveRamp
Grant the LiveRamp Google Cloud service account access to view and process data in your input dataset. For details and examples, see Share Tables and Datasets with LiveRamp.
Run an embedded identity job
To run an embedded identity job with LiveRamp in BigQuery, do the following:
- Confirm that all RampIDs that were encoded in your domain are in your input table.
- Confirm that your metadata table is still accurate before you run the job.
- Contact LiveRampIdentitySupport@liveramp.com with a job process request. Include the project ID, dataset ID, and table ID (if applicable) for your input table, metadata table, and output dataset. For more information, see Notify LiveRamp to Initiate Transcoding.
Results are generally delivered to your output dataset within three business days.
LiveRamp support
For support issues, contact LiveRamp Identity Support.
LiveRamp billing
LiveRamp handles billing for entity resolution.
Workflow for identity providers
The following sections show identity providers how to configure entity resolution in BigQuery. For a visual representation of the complete setup, see the architecture for entity resolution.
Before you begin
- Create a Cloud Run job or a Cloud Run function to integrate with the remote function. Both options are suitable for this purpose.
- Note the name of the service account that's associated with the Cloud Run job or Cloud Run function:
  - In the Google Cloud console, go to the Cloud Functions page.
  - Click the function's name, and then click the Details tab.
  - In the General Information pane, find and note the service account name for the remote function.
- Create a remote function.
- Collect end-user principals from the end user.
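To illustrate what the Cloud Run side of a remote function has to do, the following sketch shows the request and response shape that BigQuery remote functions use. BigQuery sends a JSON body with a `calls` array, one entry per row, and expects a JSON body with a `replies` array of the same length. The lookup table and the function name `handle_remote_function_request` are hypothetical stand-ins for a provider's real matching service, and the HTTP framework wiring is omitted.

```python
# Hypothetical lookup table standing in for a provider's identity graph.
_IDENTITY_GRAPH = {
    "alice@example.com": "ID-001",
    "bob@example.com": "ID-002",
}


def handle_remote_function_request(body: dict) -> tuple[dict, int]:
    """Process one batched remote function request.

    Returns the JSON response payload and the HTTP status code.
    """
    calls = body.get("calls")
    if not isinstance(calls, list):
        # A non-200 status with "errorMessage" reports the failure to BigQuery.
        return {"errorMessage": "Request body has no 'calls' array."}, 400

    replies = []
    for args in calls:
        # Each call is the argument list for one row; this sketch matches
        # on the first argument only.
        key = args[0] if args else None
        if isinstance(key, str):
            replies.append(_IDENTITY_GRAPH.get(key.strip().lower()))
        else:
            replies.append(None)  # Serialized as JSON null (SQL NULL).
    return {"replies": replies}, 200
```

In a deployed function, you would parse the incoming HTTP request body as JSON, pass it to this handler, and return the payload with the given status code.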
Required roles
To get the permissions that you need to run entity resolution jobs, ask your administrator to grant you the following IAM roles:
- For the service account that's associated with your function to read and write on associated datasets and launch jobs:
  - BigQuery Data Editor (roles/bigquery.dataEditor) on the project
  - BigQuery Job User (roles/bigquery.jobUser) on the project
- For the end-user principal to see and connect to the remote function:
  - BigQuery Connection User (roles/bigquery.connectionUser) on the connection
  - BigQuery Data Viewer (roles/bigquery.dataViewer) on the control plane dataset with the remote function
For more information about granting roles, see Manage access to projects, folders, and organizations.
You might also be able to get the required permissions through custom roles or other predefined roles.
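The end-user grants above amount to adding members to role bindings in an IAM policy. As a rough sketch, the following helper adds a member to a policy document in the standard IAM JSON shape (`bindings`, each with a `role` and `members`). How you read and write the policy (console, command line, or API) is left out, and the principal shown is a hypothetical example.

```python
import copy


def add_binding(policy: dict, role: str, member: str) -> dict:
    """Return a copy of an IAM policy with `member` added to `role`.

    Existing unconditional bindings for the role are reused so that the
    policy doesn't accumulate duplicate bindings.
    """
    updated = copy.deepcopy(policy)
    bindings = updated.setdefault("bindings", [])
    for binding in bindings:
        if binding.get("role") == role and "condition" not in binding:
            members = binding.setdefault("members", [])
            if member not in members:
                members.append(member)
            return updated
    bindings.append({"role": role, "members": [member]})
    return updated


# Hypothetical end-user principal, for illustration only.
connection_policy = add_binding(
    {}, "roles/bigquery.connectionUser", "user:analyst@example.com"
)
print(connection_policy)
```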
Share entity resolution remote function
Modify and share the following remote interface code with the end user. The end user needs this code to start the entity resolution job.
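`PARTNER_PROJECT_ID.DATASET_ID`.match(LIST_OF_PARAMETERS)
Replace PARTNER_PROJECT_ID and DATASET_ID with the project and dataset that contain the remote function, and replace LIST_OF_PARAMETERS with the list of parameters that are passed to the remote function.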
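As an illustration of how an end user might fill in this interface, the following sketch assembles a query that calls the shared function. The parameter values are hypothetical. The actual parameter list is whatever the identity provider defines for its `match` function.

```python
def build_match_query(
    partner_project_id: str, dataset_id: str, parameters: list[str]
) -> str:
    """Build a query that invokes the provider's `match` remote function.

    Each entry in `parameters` must already be a valid SQL expression,
    such as a quoted string literal or a column reference.
    """
    arguments = ", ".join(parameters)
    return f"SELECT `{partner_project_id}.{dataset_id}`.match({arguments})"


# Hypothetical parameters: fully qualified input table and output dataset.
print(
    build_match_query(
        "partner-project",
        "control_plane",
        ["'my-project.entity_resolution_input.records'",
         "'my-project.entity_resolution_output'"],
    )
)
```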
Optional: Provide job metadata
You can optionally provide job metadata by using a separate remote function or by writing a new status table in the user's output dataset. Examples of metadata include job statuses and metrics.
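If you choose the status table approach, a sketch of the table definition might look like the following. The table name and every column here are hypothetical. Choose the statuses and metrics that make sense for your service.

```python
def status_table_ddl(
    project_id: str, output_dataset: str, table_id: str = "job_status"
) -> str:
    """Build DDL for a hypothetical job status table in the output dataset."""
    return (
        f"CREATE TABLE IF NOT EXISTS `{project_id}.{output_dataset}.{table_id}` (\n"
        "  job_id STRING NOT NULL,\n"
        "  status STRING NOT NULL,\n"        # for example: PENDING, RUNNING, DONE
        "  records_processed INT64,\n"
        "  records_matched INT64,\n"
        "  updated_at TIMESTAMP NOT NULL\n"
        ")"
    )


print(status_table_ddl("my-project", "entity_resolution_output"))
```

Because the table lives in the user's output dataset, the service account needs only the write access that it already has on that dataset.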
Billing for identity providers
To streamline customer billing and onboarding, we recommend that you integrate your entity resolution service with the Google Cloud Marketplace. This lets you set up a pricing model based on the entity resolution job usage, with Google handling the billing for you. For more information, see Offering software as a service (SaaS) products.