
Use the Salesforce Batch Source plugin to analyze leads data in BigQuery

Learn how to use the Salesforce Batch Source plugin in Cloud Data Fusion to analyze leads data in BigQuery.


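Scenario

Suppose you are a marketing manager planning a highly targeted email campaign to promote a new product. You have a list of leads in Salesforce Sales Cloud. Before creating the targeted campaign, you want to understand your target audience better, so you use the Salesforce Batch Source plugin in Cloud Data Fusion to extract specific lead data.

Before you begin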

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Roles required to select or create a project

Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

Go to project selector

Verify that billing is enabled for your Google Cloud project.

Enable the Cloud Data Fusion, BigQuery, Cloud Storage, and Dataproc APIs.
Roles required to enable APIs
To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.
Enable the APIs

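Create a Cloud Data Fusion instance.
Configure a connection to the Salesforce API by creating a Salesforce connected app for Cloud Data Fusion.

Manage permissions

Create and assign the required custom roles and permissions.

Create a custom role and add permissions

In the Google Cloud console, go to the Roles page:

Go to Roles
Click Create role.
In the Title field, enter Custom Role-Tutorial.
Click Add permissions.
In the Add permissions window, select the following permissions, and then click Add:
- bigquery.datasets.create
- bigquery.jobs.create
- storage.buckets.create
Click Create.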
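
The role created in the steps above can also be captured as a file for review or version control. The following is a minimal Python sketch that builds a definition with the `title`, `description`, `stage`, and `includedPermissions` fields used by IAM custom roles. The description text, the `GA` launch stage, and the helper name `build_role_definition` are assumptions made for illustration and are not part of this tutorial.

```python
import json

# Permissions listed in this tutorial for the custom role.
REQUIRED_PERMISSIONS = [
    "bigquery.datasets.create",
    "bigquery.jobs.create",
    "storage.buckets.create",
]


def build_role_definition(title, permissions):
    """Return a dict shaped like an IAM custom role definition."""
    return {
        "title": title,
        # Assumed description and launch stage; adjust to your own policies.
        "description": "Permissions for the Salesforce leads tutorial",
        "stage": "GA",
        # Sort and de-duplicate so the output is stable across runs.
        "includedPermissions": sorted(set(permissions)),
    }


role = build_role_definition("Custom Role-Tutorial", REQUIRED_PERMISSIONS)
role_json = json.dumps(role, indent=2)
print(role_json)
```

A file with this shape can typically be supplied to `gcloud iam roles create` through its `--file` flag, although the console steps above are sufficient for this tutorial.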

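Assign the custom role to the default Compute Engine service account

Go to the Cloud Data Fusion Instances page:
Go to Instances
Click the name of your instance.
Make a note of the default Compute Engine service account. The instance details page contains this information.

The Cloud Data Fusion default Compute Engine service account name has the format CUSTOMER_PROJECT_NUMBER-compute@developer.gserviceaccount.com.
Go to the IAM page:

Go to IAM
In the Filter bar, enter the name of your default Compute Engine service account.
For your default Compute Engine service account, click Edit.
Click Add another role.
In the Select a role field, select Custom Role-Tutorial.
Click Save.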
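
To make the service account name format concrete, here is a small Python sketch that derives the default Compute Engine service account email from a project number and pairs it with a project-level custom role in an IAM-style binding. The project ID `my-project`, the project number `123456789012`, the role ID `customRoleTutorial`, and both function names are hypothetical values chosen for illustration.

```python
def default_compute_service_account(project_number):
    """Build the default Compute Engine service account email."""
    number = str(project_number)
    if not number.isdigit():
        raise ValueError("project number must contain only digits")
    return f"{number}-compute@developer.gserviceaccount.com"


def custom_role_binding(project_id, role_id, project_number):
    """Return an IAM-style binding that grants a project-level custom role."""
    return {
        # Project-level custom roles are named projects/PROJECT_ID/roles/ROLE_ID.
        "role": f"projects/{project_id}/roles/{role_id}",
        "members": [
            "serviceAccount:" + default_compute_service_account(project_number)
        ],
    }


# Hypothetical project values, for illustration only.
binding = custom_role_binding("my-project", "customRoleTutorial", 123456789012)
print(binding)
```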

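Configure the Cloud Data Fusion Salesforce Batch Source plugin

Go to the Cloud Data Fusion Instances page:
Go to Instances
For your instance, click View instance. The Cloud Data Fusion web interface opens.
Go to the Studio page.
Click Hub.
In the search bar, enter Salesforce.
Click Salesforce plugins, and then click Deploy.
In the Salesforce plugins deploy window, click Finish.

When the deployment completes, a dialog with a success message is displayed.
In that dialog, click Create a pipeline.

The Cloud Data Fusion Studio page is displayed.
Select Data pipeline - batch as the type of data pipeline.
In the Source menu, click Salesforce.
Go to the Salesforce node, and then click Properties. This opens the Salesforce plugin properties page.
In the Reference name field, enter a name for your source. For example, Leads_generated.
In the Connection section, click the Use connection toggle.
Click Browse connections. The Browse connections window opens.
Click Add connection, and then select Salesforce.
In the Create a Salesforce connection window, click the Configuration tab, and then do the following:
1. In the Name field, enter a name that identifies your connection, such as Salesforce_connection.
2. In the Credentials section, enter the following details of your Salesforce account:
  - Username
  - Password
  - Consumer key
  - Consumer secret
  - Security token
3. Click Test connection. If the details you entered are correct, the test succeeds and a "Successfully connected" message is displayed.
4. Click Create.
5. Select Salesforce_connection and return to the Salesforce plugin properties page.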

Extract data from the Salesforce Batch Source plugin

On the Salesforce plugin properties page, in the SOQL query section, enter the following query:
```
Select LastName,FirstName,Company,Email,Phone,LeadSource,Industry,OwnerId,CreatedDate,LastModifiedDate,LastActivityDate from Lead where Status like '%Open%'
```
This query fetches, from the Lead sObject, the details of the leads that are needed to run the campaign.
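
If you adjust the field list or the status filter often, it can help to assemble the query text programmatically. The sketch below rebuilds the query above from a list of fields; the function name `build_lead_query` and its quote-escaping step are assumptions added for illustration. SOQL keywords are case-insensitive, so the uppercase keywords are equivalent to the mixed-case form shown in the query.

```python
# Fields that the tutorial's query reads from the Lead sObject.
LEAD_FIELDS = [
    "LastName", "FirstName", "Company", "Email", "Phone", "LeadSource",
    "Industry", "OwnerId", "CreatedDate", "LastModifiedDate",
    "LastActivityDate",
]


def build_lead_query(fields, status_contains):
    """Assemble a SOQL query for leads whose Status contains the given text."""
    if not fields:
        raise ValueError("at least one field is required")
    # Escape backslashes and single quotes so the value stays inside the literal.
    escaped = status_contains.replace("\\", "\\\\").replace("'", "\\'")
    return (
        f"SELECT {','.join(fields)} "
        f"FROM Lead WHERE Status LIKE '%{escaped}%'"
    )


query = build_lead_query(LEAD_FIELDS, "Open")
print(query)
```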

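Important: Follow your organization's policies for handling private data in the extracted fields. For more information about redacting private data in Cloud Data Fusion pipelines, see Redact confidential data.
To check that the object schema is valid, click Get schema.
To filter the records by a specific date or time for your campaign, use the following fields:
- Last modified after
- Last modified before
- Duration
- Offset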

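Transform data using the Wrangler plugin

Use the Wrangler plugin in Cloud Data Fusion to clean and enrich your data:

Go back to the Studio page.
In the Transform menu, click Wrangler.
Connect Wrangler to the Salesforce Batch Source plugin.
Go to the Wrangler plugin, and then click Properties. This opens the Wrangler plugin properties page.
Ensure that the Input schema is populated.
Click Wrangle.
In the Connections pane, select a valid connection.
Select the sObject that you want to transform, such as Lead.

Transform the data with the required directives: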

keep :LastName,:FirstName,:Company,:Phone,:Email,:LeadSource,:OwnerId,:CreatedDate,:LastModifiedDate,:LastActivityDate
merge :FirstName :LastName :Name ' '
fill-null-or-empty :Email 'no email found'
mask-number :Phone ########xxxxxxxx
format-date :LastActivityDate yyyy-MM-dd HH:mm:ss
drop :LastName,:FirstName
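
To see roughly what these directives do to a single record, the following Python sketch approximates four of them (keep, merge, fill-null-or-empty, and drop) on a made-up lead. It is a simplified emulation for illustration, not Wrangler's implementation; mask-number and format-date are deliberately not modeled, and the sample values and the name `transform_lead` are hypothetical.

```python
KEPT_COLUMNS = [
    "LastName", "FirstName", "Company", "Phone", "Email", "LeadSource",
    "OwnerId", "CreatedDate", "LastModifiedDate", "LastActivityDate",
]


def transform_lead(record):
    """Approximate keep, merge, fill-null-or-empty, and drop on one record."""
    # keep: retain only the listed columns (Industry is not in the list).
    row = {name: record.get(name) for name in KEPT_COLUMNS}
    # merge :FirstName :LastName :Name ' '
    row["Name"] = f"{row['FirstName']} {row['LastName']}"
    # fill-null-or-empty :Email 'no email found'
    if row["Email"] is None or row["Email"] == "":
        row["Email"] = "no email found"
    # drop :LastName,:FirstName
    del row["LastName"], row["FirstName"]
    return row


# Made-up sample lead, for illustration only.
sample = {
    "LastName": "Lovelace", "FirstName": "Ada", "Company": "Example Corp",
    "Email": "", "Phone": "5550100", "LeadSource": "Web",
    "Industry": "Technology", "OwnerId": "005000000000001",
    "CreatedDate": "2024-01-15", "LastModifiedDate": "2024-02-01",
    "LastActivityDate": None,
}
result = transform_lead(sample)
print(result)
```

The order matters: merge runs before drop, so the combined Name column is built while FirstName and LastName still exist.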

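Load data into BigQuery

Go back to the Studio page.
In the Sink menu, click BigQuery.
Go to the BigQuery node, and then click Properties. This opens the BigQuery plugin properties page.
In the Basic section, in the Reference name field, enter a name that identifies this sink. For example, Leads_generated.
In the Dataset field, enter the dataset that the table belongs to. For example, Salesforce_Leads.
In the Table field, enter the table in which to store the extracted records. For example, Incoming_Open_Leads.
To validate the plugin, click Validate.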

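Deploy, schedule, and run the pipeline

To deploy the pipeline, click Deploy.
To set up an appropriate refresh schedule using the scheduler, follow these steps:
1. Click Schedule.
2. Enter the following details:
  - Pipeline run repeats
  - Repeats every
  - Starting at
  - Max concurrent runs
  - Compute profiles
3. Click Save and start schedule.
Note: By default, a full refresh of the data extension runs every 24 hours. In the settings, the hourly schedule frequency refers to how often the Salesforce Batch Source plugin looks for incremental data from Salesforce Sales Cloud.
To run the pipeline, click Run.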

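Verify data extraction and ingestion

In the Google Cloud console, go to the BigQuery page:

Go to BigQuery
Search for the dataset Salesforce_Leads and the table name Incoming_Open_Leads to view the extracted records.
To run a query, click Query.

Analyze the leads data to better understand your audience and deliver tailored campaigns at scale.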
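
As one possible starting point for that analysis, the sketch below builds a GoogleSQL query that counts the loaded leads per lead source, using the dataset and table names from this tutorial. The project ID `my-project` and the function name `leads_by_source_query` are hypothetical; the query assumes the LeadSource column was kept by the Wrangler step.

```python
def leads_by_source_query(project_id, dataset="Salesforce_Leads",
                          table="Incoming_Open_Leads"):
    """Build a GoogleSQL query that counts loaded leads per lead source."""
    # Backticks are needed because project IDs can contain hyphens.
    table_ref = f"`{project_id}.{dataset}.{table}`"
    return (
        "SELECT LeadSource, COUNT(*) AS lead_count\n"
        f"FROM {table_ref}\n"
        "GROUP BY LeadSource\n"
        "ORDER BY lead_count DESC"
    )


# "my-project" is a hypothetical project ID.
sql = leads_by_source_query("my-project")
print(sql)
```

The printed text can be pasted into the BigQuery query editor after you replace the project ID with your own.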

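Clean up

To avoid incurring charges to your Google Cloud account for the resources used on this page, follow these steps.

Delete the Cloud Data Fusion instance

Follow these instructions to delete your Cloud Data Fusion instance.

Delete the project

The easiest way to avoid charges is to delete the project that you created for this tutorial.

To delete the project, do the following: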

In the Google Cloud console, go to the Manage resources page.
Go to Manage resources
In the project list, select the project that you want to delete, and then click Delete.
In the dialog, type the project ID, and then click Shut down to delete the project.

What's next

Explore the Cloud Data Fusion plugins.