Integration with TikTok

This page describes the required configurations to bring data from TikTok as a data source of the marketing workload of Cortex Framework Data Foundation.

TikTok is a popular social media app known for short-form videos. Cortex Framework can bring in TikTok data so that you can analyze overall marketing performance. By combining data from TikTok with data from other sources, you can gain a more comprehensive understanding of your target audience and of how effective your social media campaigns are across platforms.

The following diagram describes how TikTok data is available through the marketing workload of Cortex Framework Data Foundation:


Figure 1. TikTok data source.

Configuration file

The config.json file configures the settings required to connect to data sources for transferring data from various workloads. This file contains the following parameters for TikTok:

   "marketing": {
        "deployTikTok": true,
        "TikTok": {
            "deployCDC": true,
            "datasets": {
                "cdc": "",
                "raw": "",
                "reporting": "REPORTING_TikTok"
            }
        }
    }

The following table describes the value for each marketing parameter:

| Parameter | Meaning | Default Value | Description |
| --- | --- | --- | --- |
| marketing.deployTikTok | Deploy TikTok | true | Execute the deployment for TikTok data source. |
| marketing.TikTok.deployCDC | Deploy CDC scripts for TikTok | true | Generate TikTok CDC processing scripts to run as DAGs in Cloud Composer. |
| marketing.TikTok.datasets.cdc | CDC dataset for TikTok | (none) | CDC dataset for TikTok. |
| marketing.TikTok.datasets.raw | Raw dataset for TikTok | (none) | Raw dataset for TikTok. |
| marketing.TikTok.datasets.reporting | Reporting dataset for TikTok | "REPORTING_TikTok" | Reporting dataset for TikTok. |
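
Before deploying, it can help to sanity-check the TikTok portion of config.json. The following is a minimal sketch, not part of Cortex Framework: the helper name check_tiktok_config is hypothetical, and the rule that every dataset name must be filled in is an assumption made for illustration.

```python
import json


# Hypothetical helper (not part of Cortex Framework): checks the TikTok
# settings nested under "marketing" in a parsed config.json.
def check_tiktok_config(config: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    marketing = config.get("marketing", {})
    if not marketing.get("deployTikTok", False):
        return problems  # TikTok is not being deployed; nothing to check.

    tiktok = marketing.get("TikTok")
    if not isinstance(tiktok, dict):
        return ["marketing.TikTok is missing"]

    if not isinstance(tiktok.get("deployCDC"), bool):
        problems.append("marketing.TikTok.deployCDC must be true or false")

    datasets = tiktok.get("datasets", {})
    # Assumption: every dataset name must be filled in before deploying.
    for key in ("cdc", "raw", "reporting"):
        if not datasets.get(key):
            problems.append(f"marketing.TikTok.datasets.{key} is empty")
    return problems


# The default values from this page, with "cdc" and "raw" still blank.
sample = json.loads("""
{
  "marketing": {
    "deployTikTok": true,
    "TikTok": {
      "deployCDC": true,
      "datasets": {"cdc": "", "raw": "", "reporting": "REPORTING_TikTok"}
    }
  }
}
""")

for problem in check_tiktok_config(sample):
    print(problem)
```

With the defaults shown on this page, the sketch reports the two blank dataset names (cdc and raw) and nothing else.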

Data Model

This section describes the TikTok Data Model using the Entity Relationship Diagram (ERD).


Figure 2. TikTok: Entity Relationship Diagram.

Base views

These are the blue objects in the ERD and are views on CDC tables with no transforms other than some column name aliases. See scripts in src/marketing/src/TikTok/src/reporting/ddls.

Reporting views

These are the green objects in the ERD and are reporting views that contain aggregate metrics. See scripts in src/marketing/src/TikTok/src/reporting/ddls.

API connection

Cortex Framework uses TikTok Reporting APIs, version v1.3, as the authoritative source for TikTok data. Cortex Framework uses the synchronous mode and calls Basic Reporting APIs to retrieve performance metrics for advertisements and ad groups. This ensures that Cortex Framework has access to up-to-date and accurate information from TikTok, enabling effective data analysis and reporting.

For more information about the API connection, see TikTok Reporting APIs.
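
To make the synchronous Basic Reporting call more concrete, the following sketch only builds a request URL and does not send it. It is not the code Cortex Framework runs. The endpoint path, the parameter names, the example dimensions and metrics, and the helper name build_basic_report_url are assumptions based on the public TikTok API for Business v1.3 reference; verify them against the TikTok documentation before relying on them.

```python
import json
from urllib.parse import urlencode

# Assumption: synchronous reporting endpoint of TikTok API for Business v1.3.
REPORT_URL = "https://business-api.tiktok.com/open_api/v1.3/report/integrated/get/"


def build_basic_report_url(advertiser_id, data_level, dimensions, metrics,
                           start_date, end_date):
    """Build the URL for a Basic report request (hypothetical helper)."""
    params = {
        "advertiser_id": advertiser_id,
        "report_type": "BASIC",
        "data_level": data_level,             # for example, ad or ad group level
        "dimensions": json.dumps(dimensions),  # list values are JSON-encoded
        "metrics": json.dumps(metrics),
        "start_date": start_date,
        "end_date": end_date,
    }
    return f"{REPORT_URL}?{urlencode(params)}"


url = build_basic_report_url(
    advertiser_id="1234567890",  # placeholder, not a real account
    data_level="AUCTION_AD",
    dimensions=["ad_id", "stat_time_day"],
    metrics=["spend", "impressions", "clicks"],
    start_date="2024-01-01",
    end_date="2024-01-07",
)
print(url)
```

In an actual request, the long-term access token would accompany the call (to the best of our knowledge, in an Access-Token HTTP header), which is why the token is stored in Secret Manager rather than in configuration files.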

Account authentication

To configure a TikTok account and account authentication, follow these steps:

  1. Set up a TikTok Developer Account, if you don't already have one.
  2. Create an app for the Cortex Framework integration. See TikTok API for Business for more information. Select the following two scopes for the app:
    • Ad Account Management/Ad Account Information
    • Reporting/All
  3. Get the app ID, secret, and long-term access token as described in the TikTok guide, and store each of them in Secret Manager under the following names:
    • App ID: cortex_tiktok_app_id
    • Secret: cortex_tiktok_app_secret
    • Long term access token: cortex_tiktok_access_token
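
Once the three secrets exist, you can confirm that they are readable with a short script. This is a minimal sketch that assumes the google-cloud-secret-manager client library is installed and that the caller has access to the secrets. The helper names and the project ID "my-project" are placeholders, not part of Cortex Framework.

```python
# Secret names from the steps above.
TIKTOK_SECRET_IDS = {
    "app_id": "cortex_tiktok_app_id",
    "app_secret": "cortex_tiktok_app_secret",
    "access_token": "cortex_tiktok_access_token",
}


def secret_version_name(project_id: str, secret_id: str,
                        version: str = "latest") -> str:
    """Build the Secret Manager resource name for one secret version."""
    return f"projects/{project_id}/secrets/{secret_id}/versions/{version}"


def read_tiktok_secrets(project_id: str) -> dict:
    """Read all three TikTok secrets (hypothetical helper)."""
    # Imported lazily so the helper above works without the client library.
    from google.cloud import secretmanager

    client = secretmanager.SecretManagerServiceClient()
    values = {}
    for key, secret_id in TIKTOK_SECRET_IDS.items():
        response = client.access_secret_version(
            request={"name": secret_version_name(project_id, secret_id)}
        )
        values[key] = response.payload.data.decode("UTF-8")
    return values


# "my-project" is a placeholder project ID.
print(secret_version_name("my-project", TIKTOK_SECRET_IDS["access_token"]))
```

Calling read_tiktok_secrets with your project ID should return a dictionary with three non-empty values. A permission error at this point usually means the caller lacks access to the secrets.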

Data Freshness and Delay

As a general rule, data freshness for Cortex Framework data sources is limited by what the upstream connection allows, as well as by the frequency of your DAG execution. Adjust your DAG execution frequency to align with the upstream frequency, resource constraints, and your business needs.

With the TikTok Marketing API, most data (excluding conversions) is available in near real time.

Cloud Composer connections

Create the following connections in Cloud Composer. For more details, see Manage Airflow connections documentation.

| Connection Name | Purpose |
| --- | --- |
| tiktok_raw_dataflow | For TikTok API > BigQuery Raw Dataset |
| tiktok_cdc_bq | For Raw dataset > CDC dataset transfer |
| tiktok_reporting_bq | For CDC dataset > Reporting dataset transfer |

Cloud Composer service account permissions

Grant Dataflow permissions to the service account used in Cloud Composer (as configured in the tiktok_raw_dataflow connection). See instructions in Dataflow documentation.

The same service account must also have the Secret Manager Secret Accessor role (roles/secretmanager.secretAccessor).

Ingestion settings

Control the Source to Raw and Raw to CDC data pipelines through the settings in the file src/TikTok/config/ingestion_settings.yaml. This section describes the parameters of each data pipeline.

Source to raw tables

This section has entries that control how data is fetched from TikTok and where that data ends up in the raw dataset. Each entry corresponds to one raw table that holds data fetched from the TikTok API for that entity. Based on these configuration parameters, Cortex Framework creates Airflow DAGs that run Dataflow pipelines to process data from TikTok APIs.

The following parameters control the settings for Source to Raw for each entry:

| Parameter | Description |
| --- | --- |
| base_table | Table in the raw dataset where the data for an entity is stored (for example, 'Ad' data). |
| load_frequency | How often a DAG is run for this entity to process data. See Airflow documentation for details on possible values. |
| schema_file | Schema file in the src/table_schema directory that maps API response fields to the destination table's column names. |
| partition_details | Optional: Set this if you want the table to be partitioned for performance considerations. For more information, see Table Partition. |
| cluster_details | Optional: Set this if you want the table to be clustered for performance considerations. For more information, see Cluster Settings. |
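
The required and optional parameters above can be checked mechanically before you deploy. The sketch below is illustrative only: it represents one entry as a Python dictionary, as it might look after the YAML file is parsed. The helper name, the table name "ad", and the schema file name "ad.csv" are hypothetical. "@daily" is one of the preset schedule values that Airflow accepts.

```python
REQUIRED = ("base_table", "load_frequency", "schema_file")
OPTIONAL = ("partition_details", "cluster_details")


def check_source_to_raw_entry(entry: dict) -> list[str]:
    """Return problems found in one Source to Raw entry (hypothetical helper)."""
    problems = [
        f"missing required parameter: {name}"
        for name in REQUIRED
        if not entry.get(name)
    ]
    # Flag names that are not in the parameter table, which often means a typo.
    unlisted = sorted(set(entry) - set(REQUIRED) - set(OPTIONAL))
    problems += [f"parameter not listed in the table: {name}" for name in unlisted]
    return problems


entry = {
    "base_table": "ad",          # illustrative table name
    "load_frequency": "@daily",  # Airflow preset schedule
    "schema_file": "ad.csv",     # hypothetical schema file name
}

print(check_source_to_raw_entry(entry))
print(check_source_to_raw_entry({"base_table": "ad", "load_frequncy": "@daily"}))
```

The first call returns an empty list. The second reports the two missing required parameters and the misspelled load_frequncy key.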

Raw to CDC tables

This section has entries that control how data moves from raw tables to CDC tables. Each entry corresponds to a CDC table, which in turn corresponds to an entity listed in the Source to Raw section.

The following parameters control the settings for Raw to CDC for each entry:

| Parameter | Description |
| --- | --- |
| base_table | Table in the CDC dataset where the raw data is stored after CDC transformation (for example, auction_ad_performance). |
| load_frequency | How frequently a DAG for this entity runs to populate the CDC table. See Airflow documentation for details on possible values. |
| row_identifiers | Comma-separated list of columns that together form a unique record for this table. |
| partition_details | Optional: Set this if you want the table to be partitioned for performance considerations. For more information, see Table Partition. |
| cluster_details | Optional: Set this if you want the table to be clustered for performance considerations. For more information, see Cluster Settings. |
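
To see why row_identifiers matters, consider how a CDC step collapses repeated loads of the same record. The sketch below is a conceptual illustration in plain Python, not the SQL that Cortex Framework generates. The column names (ad_id, date, clicks, loaded_at) and the "keep the most recently loaded row" rule are assumptions.

```python
def latest_per_key(rows, row_identifiers, order_by):
    """Keep one row per unique combination of row_identifiers.

    When several rows share the same identifier values, the row with the
    largest order_by value wins.
    """
    latest = {}
    for row in rows:
        key = tuple(row[column] for column in row_identifiers)
        if key not in latest or row[order_by] > latest[key][order_by]:
            latest[key] = row
    return list(latest.values())


# Two loads of the same ad and day, plus one row for a different ad.
raw_rows = [
    {"ad_id": "1", "date": "2024-01-01", "clicks": 10, "loaded_at": 1},
    {"ad_id": "1", "date": "2024-01-01", "clicks": 12, "loaded_at": 2},
    {"ad_id": "2", "date": "2024-01-01", "clicks": 5, "loaded_at": 1},
]

# row_identifiers is written as a comma-separated list of column names.
row_identifiers = [name.strip() for name in "ad_id, date".split(",")]

cdc_rows = latest_per_key(raw_rows, row_identifiers, order_by="loaded_at")
for row in cdc_rows:
    print(row)
```

Here the two loads for ad 1 collapse into the later one (12 clicks), so two rows remain. If row_identifiers omitted date, rows for different days of the same ad would be merged incorrectly, which is why the list must identify a record uniquely.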

Reporting settings

Configure and control how Cortex Framework generates data for the final TikTok reporting layer using the reporting settings file src/TikTok/config/reporting_settings.yaml. This file controls how the reporting layer BigQuery objects (tables, views, functions, or stored procedures) are generated.

For more information, see Customizing reporting settings file.

What's next?