Step 5: Configure deployment
This page describes the fifth step to deploy Cortex Framework Data Foundation, the core of Cortex Framework. In this step, you modify the configuration file in the Cortex Framework Data Foundation repository to match your requirements.
Configuration file
The behavior of the deployment is controlled by the configuration file config.json
in the Cortex Framework Data Foundation repository. This file contains global configuration and configuration specific to each workload.
Edit the config.json file according to your needs with the following steps:
- Open the config.json file from Cloud Shell and edit it according to the following parameters:

Parameter | Meaning | Default Value | Description |
testData | Deploy Test Data | true | Project where the source dataset is and the build runs. Note: Test data deployment only executes if the raw dataset is empty and has no tables. |
deploySAP | Deploy SAP | true | Execute the deployment for the SAP workload (ECC or S/4 HANA). |
deploySFDC | Deploy Salesforce | true | Execute the deployment for the Salesforce workload. |
deployMarketing | Deploy Marketing | true | Execute the deployment for the Marketing sources (Google Ads, CM360, and TikTok). |
deployOracleEBS | Deploy Oracle EBS | true | Execute the deployment for the Oracle EBS workload. |
deployDataMesh | Deploy Data Mesh | true | Execute the deployment for Data Mesh. For more information, see the Data Mesh User Guide. |
turboMode | Deploy in Turbo mode | true | Execute all view builds as a step in the same Cloud Build process, in parallel, for a faster deployment. If set to false, each reporting view is generated in its own sequential build step. We recommend only setting it to true when using test data or after any mismatch between reporting columns and the source data has been resolved. |
projectIdSource | Source Project ID | - | Project where the source dataset is and the build runs. |
projectIdTarget | Target Project ID | - | Target project for user-facing datasets (reporting and ML datasets). |
targetBucket | Target bucket to store generated DAG scripts | - | Bucket created previously where DAGs (and Dataflow temp files) are generated. Avoid using the actual Airflow bucket. |
location | Location or Region | "US" | Location where the BigQuery datasets and Cloud Storage buckets are. See restrictions listed under BigQuery dataset locations. |
testDataProject | Source for test harness | kittycorn-public | Source of the test data for demo deployments. Applies when testData is true. Don't change this value unless you have your own test harness. |
k9.datasets.processing | K9 datasets - Processing | "K9_PROCESSING" | Execute cross-workload templates (for example, date dimension) as defined in the K9 configuration file. These templates are normally required by the downstream workloads. |
k9.datasets.reporting | K9 datasets - Reporting | "K9_REPORTING" | Execute cross-workload templates and external data sources (for example, weather) as defined in the K9 configuration file. Commented out by default. |
DataMesh.deployDescriptions | Data Mesh - Asset descriptions | true | Deploy BigQuery asset schema descriptions. |
DataMesh.deployLakes | Data Mesh - Lakes & Zones | false | Deploy Dataplex lakes and zones that organize tables by processing layer. Requires configuration before enabling. |
DataMesh.deployCatalog | Data Mesh - Catalog Tags and Templates | false | Deploy Data Catalog tags that allow custom metadata on BigQuery assets or fields. Requires configuration before enabling. |
DataMesh.deployACLs | Data Mesh - Access Control | false | Deploy asset, row, or column level access control on BigQuery assets. Requires configuration before enabling. |

Configure your required workloads as needed. You don't need to configure a workload if its deployment parameter (for example, deploySAP or deployMarketing) is set to false. For more information, see Step 3: Determine integration mechanism.
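As an illustration only, the following minimal sketch shows how these parameters might look in config.json. It assumes that the dotted parameter names in the table (such as k9.datasets.processing and DataMesh.deployDescriptions) map to nested JSON objects; the project IDs and bucket name are placeholders, and your copy of the file in the repository contains additional workload-specific sections that aren't shown here:

  {
    "testData": true,
    "deploySAP": true,
    "deploySFDC": false,
    "deployMarketing": false,
    "deployOracleEBS": false,
    "deployDataMesh": true,
    "turboMode": true,
    "projectIdSource": "my-source-project",
    "projectIdTarget": "my-target-project",
    "targetBucket": "my-dag-bucket",
    "location": "US",
    "testDataProject": "kittycorn-public",
    "k9": {
      "datasets": {
        "processing": "K9_PROCESSING",
        "reporting": "K9_REPORTING"
      }
    },
    "DataMesh": {
      "deployDescriptions": true,
      "deployLakes": false,
      "deployCatalog": false,
      "deployACLs": false
    }
  }

Because JSON doesn't allow comments, the placeholders are called out here instead: replace my-source-project, my-target-project, and my-dag-bucket with the values from your own environment, and verify the structure against the config.json file in your cloned repository.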
To further customize your deployment, see the following optional steps:
Performance optimization for reporting views
Reporting artifacts can be created as views or as tables that are refreshed regularly through DAGs. On one hand, views compute the data on each execution of a query, which keeps the results always fresh. On the other hand, a table runs the computation once, and the results can be queried multiple times without incurring higher compute costs, with faster runtimes. Each customer creates their own configuration according to their needs.
Materialized results are updated into a table. These tables can be further fine-tuned by adding partitioning and clustering properties.
The configuration files for each workload are located in the following paths within the Cortex Framework Data Foundation repository:
Data Source | Settings files |
Operational - SAP | src/SAP/SAP_REPORTING/reporting_settings_ecc.yaml |
Operational - Salesforce Sales Cloud | src/SFDC/config/reporting_settings.yaml |
Operational - Oracle EBS | src/oracleEBS/config/reporting_settings.yaml |
Marketing - Google Ads | src/marketing/src/GoogleAds/config/reporting_settings.yaml |
Marketing - CM360 | src/marketing/src/CM360/config/reporting_settings.yaml |
Marketing - Meta | src/marketing/src/Meta/config/reporting_settings.yaml |
Marketing - Salesforce Marketing Cloud | src/marketing/src/SFMC/config/reporting_settings.yaml |
Marketing - TikTok | src/marketing/src/TikTok/config/reporting_settings.yaml |
Marketing - YouTube (with DV360) | src/marketing/src/DV360/config/reporting_settings.yaml |
Marketing - Google Analytics 4 | src/marketing/src/GA4/config/reporting_settings.yaml |
Customizing reporting settings file
The reporting_settings file drives how the BigQuery objects
(tables or views) are created for reporting datasets. Customize your file with
the following parameter descriptions. This file contains two sections:
- bq_independent_objects: All BigQuery objects that can be created independently, without any other dependencies. When Turbo mode is enabled, these BigQuery objects are created in parallel during deployment, speeding up the deployment process.
- bq_dependent_objects: All BigQuery objects that need to be created in a specific order due to dependencies on other BigQuery objects. Turbo mode doesn't apply to this section.
The deployer first creates all the BigQuery objects listed
in bq_independent_objects, and then all the objects listed in
bq_dependent_objects. Define the following properties for each object, as shown in the sketch after this list:
- sql_file: Name of the SQL file that creates a given object.
- type: Type of BigQuery object. Possible values:
  - view: If you want the object to be a BigQuery view.
  - table: If you want the object to be a BigQuery table.
  - script: This is to create other types of objects (for example, BigQuery functions and stored procedures).
- If type is set to table, the following optional properties can be defined:
  - load_frequency: Frequency at which a Composer DAG is executed to refresh this table. See the Airflow documentation for details on possible values.
  - partition_details: How the table should be partitioned. This value is optional. For more information, see section Table partition.
  - cluster_details: How the table should be clustered. This value is optional. For more information, see section Cluster settings.
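For illustration, the following minimal, hypothetical sketch shows how these sections and properties might fit together in a reporting_settings.yaml file. The SQL file names and schedules are placeholders, not files from the repository:

  bq_independent_objects:
    - sql_file: currency_conversion.sql    # placeholder name; creates a standalone view
      type: view

  bq_dependent_objects:
    - sql_file: sales_orders.sql           # placeholder name; depends on objects created above
      type: table
      load_frequency: "@daily"             # Composer DAG refresh schedule
      partition_details: {
        column: "erdat", partition_type: "time", time_grain: "day" }
      cluster_details: {columns: ["vkorg"]}

Check the settings file for your workload in the repository for the actual object names and dependencies before editing.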
Table partition
Certain settings files let you configure materialized tables with custom
clustering and partitioning options. This can significantly improve query
performance for large datasets. This option applies only to the SAP cdc_settings.yaml
file and to all reporting_settings.yaml files.
Table partitioning can be enabled by specifying the following partition_details:
  - base_table: vbap
    load_frequency: "@daily"
    partition_details: {
      column: "erdat", partition_type: "time", time_grain: "day" }
Use the following parameters to control partitioning details for a given table:
Property | Description | Value |
column | Column by which the CDC table is partitioned. | Column name. |
partition_type | Type of partition. | "time" for a time-based partition (for more information, see Timestamp partitioned tables). "integer_range" for an integer-based partition (for more information, see Integer range documentation). |
time_grain | Time part to partition with. Required when partition_type = "time". | "hour", "day", "month", or "year". |
integer_range_bucket | Bucket range. Required when partition_type = "integer_range". | "start" = start value, "end" = end value, and "interval" = interval of the range. |
For more information about options and related limitations, see BigQuery Table Partition.
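As a hypothetical illustration of the integer_range option, an integer-partitioned entry could look like the following sketch; the table and column names are placeholders, not objects from the repository:

  - base_table: my_table              # placeholder table name
    load_frequency: "@daily"
    partition_details: {
      column: "doc_number", partition_type: "integer_range",
      integer_range_bucket: { start: 0, end: 100000, interval: 1000 } }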
Cluster settings
Table clustering can be enabled by specifying cluster_details:
  - base_table: vbak
    load_frequency: "@daily"
    cluster_details: {columns: ["vkorg"]}
Use the following parameters to control cluster details for a given table:
Property | Description | Value |
columns | Columns by which a table is clustered. | List of column names. For example, "mjahr" and "matnr". |
For more information about options and related limitations, see Table cluster documentation.
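For example, a hypothetical entry that clusters a table by the two columns mentioned in the table above might look like the following sketch; the base table name is a placeholder:

  - base_table: my_material_documents   # placeholder table name
    load_frequency: "@daily"
    cluster_details: {columns: ["mjahr", "matnr"]}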
Next steps
After you complete this step, move on to the following deployment step:
- Establish workloads.
- Clone repository.
- Determine integration mechanism.
- Set up components.
- Configure deployment (this page).
- Execute deployment.