Integration with Google Analytics 4
This page describes the configuration required to use Google Analytics 4 (GA4) as a data source for the marketing workload of Cortex Framework Data Foundation.
GA4 is the latest version of Google Analytics. It provides a holistic view of user behavior, focusing on event-based tracking and machine learning to offer deeper insights. Cortex Framework lets you extract data from GA4 and integrate it into BigQuery for further analysis and reporting. You can gain valuable insights and drive better business outcomes.
The following diagram describes how GA4 data is available through the marketing workload of Cortex Framework Data Foundation:
Configuration file
The config.json file configures the settings required to connect to data sources for transferring data from various workloads. This file contains the following parameters for GA4:

    "marketing": {
        "deployGA4": true,
        "GA4": {
            "datasets": {
                "cdc": [
                    {"property_id": 0, "name": ""}
                ],
                "reporting": "REPORTING_GA4"
            }
        }
    }
The following table describes the value for each marketing parameter:
Parameter | Meaning | Default Value | Description |
marketing.deployGA4 | Deploy GA4 | true | Execute the deployment for GA4 data source. |
marketing.GA4.datasets.cdc | BigQuery Export datasets for GA4 | [{"property_id": 0, "name": ""}] | Array of Google Analytics 4 BigQuery Export datasets. Each element specifies the Property ID as INT, as well as its corresponding BigQuery Export dataset name. |
marketing.GA4.datasets.reporting | Reporting dataset for GA4 | REPORTING_GA4 | Reporting dataset for GA4. |
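The parameters above can be checked before deployment with a small script. The following is a minimal sketch, not part of Cortex Framework: the function name `validate_ga4_config` is hypothetical, and the rule that a `property_id` of `0` or an empty `name` is an unfilled placeholder is an assumption based on the default values shown in the table.

```python
import json


def validate_ga4_config(config: dict) -> list[str]:
    """Return a list of problems found in the GA4 part of a config dict.

    Hypothetical helper; the checks are assumptions drawn from the
    default values documented above, not an official validation.
    """
    problems = []
    marketing = config.get("marketing", {})
    if not marketing.get("deployGA4", False):
        return problems  # GA4 is not deployed, so there is nothing to check

    datasets = marketing.get("GA4", {}).get("datasets", {})
    cdc = datasets.get("cdc", [])
    if not cdc:
        problems.append("marketing.GA4.datasets.cdc is empty")
    for i, entry in enumerate(cdc):
        prop = entry.get("property_id")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(prop, bool) or not isinstance(prop, int) or prop <= 0:
            problems.append(f"cdc[{i}].property_id must be a positive integer")
        if not entry.get("name"):
            problems.append(f"cdc[{i}].name must be a non-empty dataset name")
    if not datasets.get("reporting"):
        problems.append("marketing.GA4.datasets.reporting is not set")
    return problems


# The default values from the table, wrapped in braces to form valid JSON.
DEFAULT_CONFIG = json.loads("""
{
  "marketing": {
    "deployGA4": true,
    "GA4": {
      "datasets": {
        "cdc": [{"property_id": 0, "name": ""}],
        "reporting": "REPORTING_GA4"
      }
    }
  }
}
""")

for problem in validate_ga4_config(DEFAULT_CONFIG):
    print(problem)
```

Run against the unmodified defaults, the sketch reports the placeholder `property_id` and the empty dataset name, which is a reminder that both must be filled in for each GA4 property.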
Data Model
This section describes the GA4 Data Model using the Entity Relationship Diagram (ERD).
Base views
These are the blue objects in the ERD and are views on CDC tables with minimal transformations to unpack complex data structures. See scripts in src/marketing/src/GA4/src/reporting/ddls.
Reporting views
These are the green objects in the ERD and are reporting views that contain aggregate metrics. See scripts in src/marketing/src/GA4/src/reporting/ddls.
Configure integration for GA4
Cortex Framework Data Foundation integrates with GA4 by creating a Reporting layer on top of GA4's BigQuery Export datasets (treated as CDC datasets in Cortex Framework architecture). This is accomplished by creating runtime views on top of CDC tables or running Cloud Composer DAGs for materialized data in BigQuery tables, depending on the reporting settings configuration.
Set up GA4 BigQuery Export
Cortex Framework uses GA4's BigQuery Export feature to load data from the source system into BigQuery. Follow the instructions for setting up BigQuery Export for each GA4 property in this GA4 Help article: GA4 - Set up BigQuery Export.
Known issues, limitations, and other considerations
Consider the following when setting up GA4 BigQuery Export:
- Backfilling: GA4 BigQuery Export starts from the day it is set up and there is no backfilling.
- Difference between GA4 UI and Cortex Framework reported numbers: Multiple factors, including but not limited to sampling, data collection delay, and high-cardinality reports, may cause minor discrepancies between the Google Analytics UI and Cortex Framework. This is a known and innate limitation of Google Analytics. For more information, see Bridge the gap between the Google Analytics UI and BigQuery export.
- Event export volume restrictions: Depending on your Google Analytics edition, you may face varying degrees of BigQuery export volume restrictions per day. For more information, see GA4 - Set up BigQuery Export.
- Time zone: In BigQuery Export, event_date is set in the property's reporting time zone, while event_timestamp is the UTC timestamp in microseconds. As a result, if event_timestamp is used, make sure to adjust for the correct reporting time zone when comparing with UI numbers.
- Daily versus Streaming (real-time) Event exports: For Event exports, Cortex Framework only supports the events_YYYYMMDD tables created by full daily export. For more information, see GA4 - BigQuery Export.
- GA4 360 Service Level Agreement (SLA) for BigQuery Export: While Cortex Framework doesn't support the events_fresh_ tables created by Fresh Daily exports as separate source tables, you can follow the ##CORTEX-CUSTOMER customization comments in the Events Reporting view to replace the source tables with these, to take advantage of the SLA provided by this feature. All Reporting views will continue to work after this substitution.
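The time zone consideration in the list above can be illustrated with a short conversion. This is a minimal sketch using only the Python standard library; the function name `event_date_in_reporting_tz` is hypothetical, and the fixed UTC-8 offset stands in for a real reporting time zone (for a zone with daylight saving time, passing a `zoneinfo.ZoneInfo` object instead is the usual choice).

```python
from datetime import datetime, timedelta, timezone, tzinfo

# Start of the Unix epoch as a timezone-aware UTC datetime.
EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)


def event_date_in_reporting_tz(event_timestamp_micros: int,
                               reporting_tz: tzinfo) -> str:
    """Convert a UTC timestamp in microseconds to a YYYYMMDD date string
    in the given reporting time zone (hypothetical helper)."""
    utc_dt = EPOCH_UTC + timedelta(microseconds=event_timestamp_micros)
    return utc_dt.astimezone(reporting_tz).strftime("%Y%m%d")


# 2024-01-01 02:30:00 UTC, expressed in microseconds since the epoch.
ts = 1_704_076_200_000_000

# Assumed reporting time zone for illustration: a fixed UTC-8 offset.
reporting_tz = timezone(timedelta(hours=-8))

print(event_date_in_reporting_tz(ts, timezone.utc))  # 20240101
print(event_date_in_reporting_tz(ts, reporting_tz))  # 20231231
```

The same event falls on different calendar days depending on the zone, which is why grouping by a date derived from event_timestamp without this adjustment can disagree with UI numbers around midnight.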
Data Freshness and Delay
As a general rule, data freshness for Cortex Framework data sources is limited by what the upstream connection allows, as well as by the frequency of your DAG execution. Adjust your DAG execution frequency to align with upstream frequency, resource constraints, and your business needs.
With Google Analytics 4, BigQuery export data may be delayed up to a day depending on your time zone, unless you are using Fresh Daily Export.
Configurations
This section describes the configurations for the data process.
Cloud Composer connections
Create the following connections in Cloud Composer. See more details in the Manage Airflow connections documentation.
Connection Name | Purpose |
dv360_cdc_bq | For Raw dataset > CDC dataset transfer. |
dv360_reporting_bq | For CDC dataset > Reporting dataset transfer. |
Reporting settings
You can configure and control how Cortex Framework generates data for the GA4 final reporting layer using the reporting settings file src/GA4/config/reporting_settings.yaml. This file controls how reporting layer BigQuery objects (tables, views, functions or stored procedures) are generated. For more information, see Customizing reporting settings file.
What's next?
- For more information about other data sources and workloads, see Data sources and workloads.
- For more information about the steps for deployment in production environments, see Cortex Framework Data Foundation deployment prerequisites.