Cortex Framework: integration with YouTube (with DV360)

This page describes the required configurations to bring data from YouTube with Display & Video 360 (DV360) as a data source of the marketing workload of Cortex Framework Data Foundation.

DV360 is an advertising platform by Google for managing YouTube advertising alongside other digital channels, offering advertisers greater control and efficiency in their marketing efforts.

The following diagram describes how DV360 data is available through the marketing workload of Cortex Framework Data Foundation:

DV360 data source

Figure 1. DV360 data source.

Configuration file

The config.json file configures the settings required to connect to data sources for transferring data from various workloads. This file contains the following parameters for DV360:

   "marketing": {
        "deployDV360": true,
        "DV360": {
            "deployCDC": true,
            "datasets": {
                "cdc": "",
                "raw": "",
                "reporting": "REPORTING_DV360"
            }
        }
   }

The following table describes the value for each marketing parameter:

Parameter Meaning Default Value Description
marketing.deployDV360 Deploy DV360 true Execute the deployment for DV360 data source.
marketing.DV360.deployCDC Deploy CDC scripts for DV360 true Generate DV360 CDC processing scripts to run as DAGs in Cloud Composer.
marketing.DV360.datasets.cdc CDC dataset for DV360 - CDC dataset for DV360.
marketing.DV360.datasets.raw Raw dataset for DV360 - Raw dataset for DV360.
marketing.DV360.datasets.reporting Reporting dataset for DV360 REPORTING_DV360 Reporting dataset for DV360.

Data Model

This section describes the Youtube (with DV360) Data Model using the Entity Relationship Diagram (ERD). For a better view of the diagram, see the repository source.

DV360)

Figure 2. Youtube (with DV360): Entity Relationship Diagram.

Reporting views

The blue objects in the ERD represent reporting views that contain aggregate metrics. See scripts in src/marketing/src/DV360/src/reporting/ddls.

Set up DV360 Raw data extraction

Cortex Framework integrates with DV360 using its Instant Reporting feature. Instant Reporting is an out-of-the-box feature of DV360 which allows larger and more complex reports to be saved, scheduled and downloaded to files and different locations. Each report periodically creates export data tables in BigQuery with the same configured table name prefix, that contains data for the selected date range, at the selected frequency.

Permissions

To successfully export raw data from DV360 using Instant Reporting, persons with the following permissions are required:

  • DV360:
  • Google Cloud:
    • BigQuery User role assigned to the DV360 service account.
    • BigQuery Data Editor role assigned to the DV360 service account.

Set up

Follow these steps to set up raw table export from DV360.

  1. Identify the service account required by the DV360 BigQuery Exporter by looking at the Link BigQuery Exporter Account, permission details section. See Figure 2 for more reference. Identify the service account required by the DV360
    Figure 2. Identifying the service account in the Link BigQuery Exporter Account
  2. Get Google Cloud permissions. A person with BigQuery Administrator role needs to grant BigQuery User and BigQuery Data Editor roles to the DV360 service account identified in the previous step.
  3. Link BigQuery to DV360, specifying the Raw dataset you intend to use in the process. Follow the instructions in Offline reporting > Enable BigQuery section.
  4. Go into Instant Reporting and create one report for each row in the following table, with the columns exactly as shown. For more information about detailed column schema definitions files under src/DV360/config/table_schema, see the Raw to CDC table schema section.

    1. Add at least one Partner or Advertiser filter. Ensure this aligns with the permissions of your BigQuery link in step 3.
    1. Select Advertiser time zone as the report's time zone setting. For more information, see the following section Additional considerations.
    Table Prefix Fields to include
    lineitem_details
    • Date
    • Line Item ID
    • Line Item
    • Campaign ID
    • Campaign
    • Line Item Type
    • Line Item Start Date
    • Line Item End Date
    • Impressions
    lineitem_insights
    • Date
    • Line Item ID
    • Device Type
    • Browser
    • Environment
    • Country
    • Partner ID
    • Partner
    • Partner Currency
    • Advertiser ID
    • Advertiser
    • Advertiser Currency
    • Campaign ID
    • Campaign
    • Insertion Order ID
    • Insertion Order
    • Line Item
    • Line Item Type
    • Line Item Start Date
    • Line Item End Date
    • Impressions
    • Clicks
    • Revenue (USD)
    • Engagements
    • Revenue (Partner Currency)
    • Revenue (Adv Currency)
    • TrueView: Views
    adgroup_insights_by_age_gender
    • Date
    • YouTube Ad Group ID
    • Age (YouTube)
    • Gender
    • Line Item ID
    • Partner ID
    • Partner
    • Partner Currency
    • AdvertiserID
    • Advertiser
    • Advertiser Currency
    • Insertion Order ID
    • Insertion Order
    • Line Item
    • YouTube AdGroup
    • Revenue (USD)
    • Impressions
    • Clicks
    • Engagements (YouTube)
    • Revenue (Partner Currency)
    • Revenue (Adv Currency)
    • TrueView: Views
    adgroup_insights_by_audience
    • Date
    • YouTube Ad Group ID
    • Audience segment
    • Audience segment type
    • Line Item ID
    • Partner ID
    • Partner
    • PartnerCurrency
    • Advertiser ID
    • Advertiser
    • Advertiser Currency
    • Insertion Order ID
    • Insertion Order
    • Line Item
    • Impressions
    • Clicks
    • YouTube Ad Group
    • Revenue (USD)
    • Engagements (YouTube)
    • Revenue (PartnerCurrency)
    • Revenue (Adv Currency)
    • TrueView: Views
    adgroup_insights_by_adformat
    • Date
    • YouTube Ad Group ID
    • YouTube Ad Format
    • Line Item ID
    • Partner ID
    • Partner
    • Partner Currency
    • Advertiser ID
    • Advertiser
    • Advertiser Currency
    • Insertion Order ID
    • Insertion Order
    • Line Item
    • Impressions
    • Clicks
    • YouTube AdGroup
    • Revenue (USD)
    • Engagements (YouTube)
    • Revenue (Partner Currency)
    • Revenue (Adv Currency)
    • TrueView: Views
    adgroup_insights_by_placement
    • Date
    • YouTube Ad Group ID
    • Placement (All YouTube Channels)
    • Placement Name (All YouTube Channels)
    • Line Item ID
    • PartnerID
    • Partner
    • Partner Currency
    • Advertiser ID
    • Advertiser
    • Advertiser Currency
    • Insertion Order ID
    • Insertion Order
    • Line Item
    • Impressions
    • YouTube Ad Group
    • Revenue (USD)
    • Engagements (YouTube)
    • Revenue (PartnerCurrency)
    • Revenue (Adv Currency)
    • TrueView: Views
    adgroup_insights_by_adtype
    • Date
    • YouTube Ad Group ID
    • YouTube Ad Type
    • Line Item ID
    • Partner ID
    • Partner
    • Partner Currency
    • Advertiser ID
    • Advertiser
    • Advertiser Currency
    • Insertion Order ID
    • Insertion Order
    • Line Item
    • Impressions
    • Clicks
    • YouTube AdGroup
    • Revenue (USD)
    • Engagements (YouTube)
    • Revenue (Partner Currency)
    • Revenue (Adv Currency)
    • TrueView: Views
    ad_insights
    • Date
    • YouTube Ad ID
    • Line Item ID
    • Partner ID
    • Partner
    • Partner Currency
    • Advertiser ID
    • Advertiser
    • AdvertiserCurrency
    • Insertion Order ID
    • Insertion Order
    • Line Item
    • YouTube Ad Group ID
    • YouTube AdGroup
    • YouTube Ad
    • Impressions
    • Clicks
    • Revenue (USD)
    • Engagements (YouTube)
    • Revenue (Partner Currency)
    • Revenue (AdvCurrency)
    • TrueView: Views
  5. Set up the schedule and fill in historical data. Make sure you use the same BigQuery link as configured in the step 1, as well as the table prefix exactly as shown in the previous table. For more information, see the following section Additional considerations.

    • Manually trigger a backfill run, or wait for the scheduled export to start. Either way, your data automatically lands in the Raw dataset configured for your BigQuery link.

Additional considerations

Regarding time zone selection:

  • For some report types, you have the option to choose Advertiser time zone or UTC time zone for dates at the time of setting up the export. However, Youtube-related reports supports Advertiser time zone only. Therefore, make sure to select Advertiser time zone setting when setting up the exports.

  • As the report export is pre-aggregated to daily level, Cortex Framework uses the date provided directly without timezone conversions. You are responsible for interpreting the reported time zone information.

Regarding scheduling and backfilling historical data:

  • We recommend to select last 7 days as the export date range and Daily as export frequency. Selecting an export date range and an export frequency is necessary to strike a balance between data accuracy and storage cost, although any setting will work with our CDC process.

  • Instant Reporting supports exporting up to 14 days of data when exporting to BigQuery, although DV360 may occasionally make minor updates to data, up to 31 days after the report date. If you set up export for multiple Advertisers, each advertiser's date will be in their own time zones.

  • If historical data is required, as non-scheduled one-offs, Instant Reporting allows for exporting of up to 2 years worth of data into BigQuery tables directly. Make sure to do this at least once, manually from the DV360 UI.

Configurations

This section describes the configurations for the data process.

Cloud Composer connections

Create the following connections in Cloud Composer. See more details in Cloud Composer documentation.

Connection Name Purpose
dv360_cdc_bq For Raw dataset > CDC dataset transfer.
dv360_reporting_bq For CDC dataset > Reporting dataset transfer.

Raw to CDC table schema

The directory src/DV360/config/table_schema contains one schema file per set of tables (identified by the same prefix), that is exported from DV360. Each schema file contains two columns: ColumnName and ColumnDataType.

Ingestion settings

The file src/DV360/config/ingestion_settings.yaml contains further settings that control Source to CDC data pipelines.

Source to CDC tables

This section has entries that control how DV360 exported tables are ingested and relevant CDC process behavior. Each entry corresponds with one Instant Reporting report, identified by export prefix.

The following parameters control the settings for Source to CDC for each entry:

Parameter Description
base_table Table in CDC dataset where the raw data after CDC transformation is stored (for example, customer).
load_frequency How frequently a DAG for this entity runs to populate the CDC table. For more information about possible values, see Airflow documentation.
raw_table_prefix Prefix used by the corresponding Instant Reporting report to set up this BigQuery Export.
row_identifiers Columns (separated by comma) that form a unique record for this table.
partition_details Optional: If you want this table to be partitioned for performance considerations. For more information, see Table Partition.
cluster_details Optional: If you want this table to be clustered for performance considerations, see more information in Cluster Settings.

Reporting settings

You can configure and control how Cortex Framework generates data for the DV360 final reporting layer using the reporting settings file src/DV360/config/reporting_settings.yaml. This file controls how reporting layer BigQuery objects (tables, views,functions or stored procedures) are generated.

For more information, see Customizing reporting settings file.

What's next?