External DAGs migration from v4.2 to v5.0
Warning: This page contains information specific to updating Google Cloud Cortex Framework from version 4.2 to 5.0. The content might not apply to other versions.
This guide outlines the steps necessary to relocate output tables from external Directed Acyclic Graphs (DAGs), such as Weather and Trends, to their new locations within the Cortex Data Foundation v5.0 architecture. It is specifically designed for users who have implemented External DAGs in previous Cortex Framework Data Foundation versions (4.2 to 5.0) and are now upgrading. If you haven't used External DAGs or haven't deployed SAP, this guide doesn't apply.
Context
Cortex Framework Data Foundation versions prior to 4.2 used a _GEN_EXT flag to manage the deployment of external data sources, with some sources tied to specific workloads (such as currency conversion for SAP). With version 5.0, this flag has been removed and replaced by a new module dedicated to managing DAGs that can serve multiple workloads. This guide outlines the steps to adjust your existing data pipelines to work with the new structure.
Cross-workload reusable DAGs
Cortex Framework Data Foundation v5.0 introduces K9, a new component responsible for ingesting, processing, and modeling reusable data elements that are shared across various data sources. Reporting views now reference the K9_PROCESSING dataset to access these reusable components, streamlining data access and reducing redundancy. The following external data sources are now deployed as part of K9, into the K9_PROCESSING dataset:
date_dimension
holiday_calendar
trends
weather
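For example, after a v5.0 deployment you can spot-check that these tables landed in the new dataset. The following is a minimal sketch using the bq command-line tool; the project ID is a placeholder for your environment:

  # List the reusable tables deployed by K9 (placeholder project ID)
  bq ls --project_id=your-source-project K9_PROCESSING

  # Inspect one of them, for example the holiday calendar
  bq show --format=prettyjson your-source-project:K9_PROCESSING.holiday_calendar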
SAP-dependent DAGs
The following SAP-dependent DAGs are still triggered by the generate_external_dags.sh script, but they now execute during the reporting build step and write into the SAP reporting dataset instead of the CDC (Change Data Capture) stage:
currency_conversion
inventory_snapshots
prod_hierarchy_texts
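Because these DAGs now write to the reporting dataset, one hedged way to verify the new behavior after migration is to check both locations with the bq tool; the project and dataset names below are placeholders:

  # The table should now exist in the SAP reporting dataset...
  bq show your-target-project:SAP_REPORTING.currency_conversion

  # ...while the old copy in the CDC dataset is the one you migrate away from
  bq ls your-source-project:CDC_PROCESSED | grep currency_conversion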
Migration Guide
This guide outlines the steps to upgrade your Cortex Framework Data Foundation to version 5.0.
Deploy Cortex Framework Data Foundation 5.0
First, deploy the newest version (v5.0) of Cortex Framework Data Foundation to your projects, following these guidelines:
1. Use your existing RAW and CDC datasets from prior development or staging deployments as the RAW and CDC datasets of this deployment, because no modification is made to them during deployment.
2. Set both testData and SAP.deployCDC to False in config/config.json (a quick verification sketch follows this list).
3. Create a new SAP Reporting project, separate from your existing v4.2 environment, for testing purposes. This lets you safely evaluate the upgrade process without impacting your current operations.
4. Optional: If you have active Airflow DAGs running for your previous Cortex Framework Data Foundation version, pause them before proceeding with the migration. You can do this through the Airflow UI. For detailed instructions, see the Open Airflow UI from Composer and Pause the DAG documentation.
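For step 2, a minimal, hedged way to confirm both flags before deploying, assuming the standard JSON layout of config/config.json and that the jq tool is installed:

  # Both values should print as false
  jq '.testData, .SAP.deployCDC' config/config.json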
By following these steps, you can safely transition to Cortex Framework Data Foundation version 5.0 and validate the new features and functionality.
Migrate existing tables
To migrate your existing tables to their new locations, use jinja-cli to format the provided migration script template and complete the migration.
Install jinja-cli with the following command:

  pip install jinja-cli
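You can then check that the jinja entry point is on your PATH (a hedged check; the exact output depends on your jinja-cli version):

  jinja --help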
Identify the following parameters from your existing version 4.2 deployment and your new version 5.0 deployment:
project_id_src: Source Google Cloud project. The project where the existing SAP CDC dataset from the version 4.2 deployment resides. The K9_PROCESSING dataset is also created in this project.
project_id_tgt: Target Google Cloud project. The project where the newly deployed SAP Reporting dataset from the new version 5.0 deployment resides. This might differ from the source project.
dataset_cdc_processed: CDC BigQuery dataset. The BigQuery dataset where the CDC-processed data lands the latest available records. This might be the same as the source dataset.
dataset_reporting_tgt: Target BigQuery reporting dataset. The BigQuery dataset where the Data Foundation for SAP predefined data models are deployed.
k9_datasets_processing: K9 BigQuery dataset. The BigQuery dataset where K9 (the augmented data sources) is deployed.
Create a JSON file (for example, data.json) with the required input data. Make sure to remove any DAGs that you don't want to migrate from the migrate_list section:
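  {
    "project_id_src": "your-source-project",
    "project_id_tgt": "your-target-project",
    "dataset_cdc_processed": "your-cdc-processed-dataset",
    "dataset_reporting_tgt": "your-reporting-target-dataset-OR-SAP_REPORTING",
    "k9_datasets_processing": "your-k9-processing-dataset-OR-K9_REPORTING",
    "migrate_list":
    [
      "holiday_calendar",
      "trends",
      "weather",
      "currency_conversion",
      "inventory_snapshots",
      "prod_hierarchy_texts"
    ]
  }

For example, if you want to remove weather and trends, the file would look like the following:

  {
    "project_id_src": "kittycorn-demo",
    "project_id_tgt": "kittycorn-demo",
    "dataset_cdc_processed": "CDC_PROCESSED",
    "dataset_reporting_tgt": "SAP_REPORTING",
    "k9_datasets_processing": "K9_PROCESSING",
    "migrate_list":
    [
      "holiday_calendar",
      "currency_conversion",
      "inventory_snapshots",
      "prod_hierarchy_texts"
    ]
  }

Create an output folder with the following command:

  mkdir output

Generate the parsed migration script with the following command (this command assumes you are at the root of the repository):

  jinja -d data.json -o output/migrate_external_dags.sql docs/external_dag_migration/scripts/migrate_external_dags.sql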
Examine the output SQL file and run it in BigQuery to migrate your tables to their new location.
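If you prefer to run the generated script from the command line rather than the BigQuery console, one hedged option is the bq tool, which reads the query from stdin when none is passed as an argument:

  # Run the generated multi-statement migration script
  bq query --use_legacy_sql=false < output/migrate_external_dags.sql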
Update and unpause the Airflow DAGs
Back up the current DAG files in your Airflow bucket. Then, replace them with the newly generated files from your Cortex Framework Data Foundation version 5.0 deployment. For detailed instructions, see the following documentation:
Open Airflow UI from Composer
Unpause the DAG
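A minimal sketch of the back-up-and-replace step using gsutil; the bucket names and the path to the newly generated DAG files are placeholders for your environment:

  # Back up the current DAG files from the Airflow (Composer) bucket
  gsutil -m cp -r gs://your-composer-bucket/dags gs://your-backup-bucket/dags_v4.2_backup

  # Upload the newly generated v5.0 DAG files in their place
  gsutil -m cp -r path/to/generated/dags/* gs://your-composer-bucket/dags/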
Validation and cleanup
The migration is now complete. You can now validate that all reporting views in the new v5.0 reporting deployment work correctly. If everything works properly, go through the process again, this time targeting the v5.0 deployment at your production Reporting set. Afterwards, remove all of the old tables using the following script:
Warning: This step permanently removes your old DAG tables and can't be undone after it is applied. Only execute it after all validation is complete. Consider taking backups of these tables first.
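  jinja -d data.json -o output/delete_old_dag_tables.sql docs/external_dag_migration/scripts/delete_old_dag_tables.sql

As with the migration script, examine the generated SQL file before executing it in BigQuery.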
[[["Mudah dipahami","easyToUnderstand","thumb-up"],["Memecahkan masalah saya","solvedMyProblem","thumb-up"],["Lainnya","otherUp","thumb-up"]],[["Sulit dipahami","hardToUnderstand","thumb-down"],["Informasi atau kode contoh salah","incorrectInformationOrSampleCode","thumb-down"],["Informasi/contoh yang saya butuhkan tidak ada","missingTheInformationSamplesINeed","thumb-down"],["Masalah terjemahan","translationIssue","thumb-down"],["Lainnya","otherDown","thumb-down"]],["Terakhir diperbarui pada 2025-09-04 UTC."],[[["\u003cp\u003eThis guide details the migration process for external Directed Acyclic Graphs (DAGs) when upgrading from Google Cloud Cortex Framework versions 4.2 to 5.0, which involves relocating output tables to the new Cortex Data Foundation v5.0 architecture.\u003c/p\u003e\n"],["\u003cp\u003eCortex Framework Data Foundation v5.0 introduces a new K9 module for managing cross-workload reusable data elements like \u003ccode\u003edate_dimension\u003c/code\u003e, \u003ccode\u003eholiday_calendar\u003c/code\u003e, \u003ccode\u003etrends\u003c/code\u003e, and \u003ccode\u003eweather\u003c/code\u003e in the \u003ccode\u003eK9_PROCESSING\u003c/code\u003e dataset, which replaces the \u003ccode\u003e_GEN_EXT\u003c/code\u003e flag used in prior versions.\u003c/p\u003e\n"],["\u003cp\u003eSAP-dependent DAGs, including \u003ccode\u003ecurrency_conversion\u003c/code\u003e, \u003ccode\u003einventory_snapshots\u003c/code\u003e, and \u003ccode\u003eprod_hierarchy_texts\u003c/code\u003e, are now triggered during the reporting build step and write to the SAP reporting dataset instead of the CDC stage.\u003c/p\u003e\n"],["\u003cp\u003eThe migration process requires deploying Cortex Framework Data Foundation 5.0, using existing RAW and CDC datasets, and creating a new SAP Reporting project, before migrating existing tables using \u003ccode\u003ejinja-cli\u003c/code\u003e and an outputted SQL file.\u003c/p\u003e\n"],["\u003cp\u003eAfter migration, users must update and unpause Airflow DAGs, validate the new v5.0 reporting deployment, and then optionally delete old DAG tables using a provided \u003ccode\u003ejinja\u003c/code\u003e command, ensuring backups are taken beforehand as this step is irreversible.\u003c/p\u003e\n"]]],[],null,["# External DAGs migration from v4.2 to v5.0\n=========================================\n\n| **Warning:** This page contains specific information to update only Google Cloud Cortex Framework versions 4.2 to 5.0. The content might not apply to other versions.\n\nThis guide outlines the steps necessary to relocate output tables from external\nDirected Acyclic Graphs (DAGs) to their new locations within the Cortex Data\nFoundation v5.0 architecture. For example, Weather and Trends. This guide is\nspecifically designed for users who have implemented External DAGs in previous\nCortex Framework Data Foundation versions (4.2 to 5.0) and are now upgrading. If\nyou haven't used External DAGs or haven't deployed SAP, this guide is not\napplicable.\n\nContext\n-------\n\nCortex Framework Data Foundation versions prior to 4.2 used a `_GEN_EXT` flag to manage\nthe deployment of external data sources, with some sources tied to specific\nworkloads (like currency conversion for SAP). However, with version 5.0, this\nflag has been removed. Now, there's a new module dedicated to managing DAGs\nthat can serve multiple workloads. 
This guide outlines steps to adjust your\nexisting data pipelines to work with this new structure.\n\n### Cross-workload reusable DAGs\n\nCortex Framework Data Foundation v5.0 introduces K9, a new component responsible for\ningesting, processing, and modeling reusable data elements that are shared\nacross various data sources. Reporting views are now reference the\n`K9_PROCESSING` dataset to access these reusable components, streamlining data\naccess and reducing redundancy. The following external data sources are now\ndeployed as a part of K9, into the `K9_PROCESSING` dataset:\n\n- `date_dimension`\n- `holiday_calendar`\n- `trends`\n- `weather`\n\n### SAP-dependent DAGs\n\nThe following SAP-dependent DAGs are still triggered by\n`generate_external_dags.sh` script, but now executes during the reporting build\nstep, and now write into the SAP reporting dataset instead of the CDC\n(Change Data Capture) stage.\n\n- `currency_conversion`\n- `inventory_snapshots`\n- `prod_hierarchy_texts`\n\nMigration Guide\n---------------\n\nThis guide outlines the steps to upgrade your Cortex Framework Data Foundation to version 5.0.\n\n### Deploy Cortex Framework Data Foundation 5.0\n\nFirst, deploy the newest version (v5.0) of Cortex Framework Data Foundation to your\nprojects, with the following guidelines:\n\n1. Use your existing RAW and CDC datasets from prior development or staging deployments as your RAW and CDC datasets of this deployment, as no modification is made to them during deployment.\n2. Set both `testData` and `SAP.deployCDC` to `False` in `config/config.json`.\n3. Create a new SAP Reporting project separate from your existing v4.2 environment for testing purposes. This safely evaluate the upgrade process without impacting your current operations.\n4. Optional. If you have active Airflow DAGs running for your previous Cortex Framework Data Foundation version, pause them before proceeding with the migration. This can be done through the Airflow UI. For detailed instructions see [Open Airflow UI from Composer](/composer/docs/how-to/accessing/airflow-web-interface) and [Pause the DAG](https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/dags.html#dag-pausing-deactivation-and-deletion) documentation.\n\nBy following these steps, you can safely transition to Cortex Framework Data Foundation\nversion 5.0 and validate the new features and functionalities.\n\n### Migrate existing tables\n\nTo migrate your existing tables to their new location, use `jinja-cli` to\nformat the provided migration script template to complete the migration.\n\n1. Install jinja-cli with the following command:\n\n pip install jinja-cli\n\n2. Identify the following parameters from your existing version 4.2 and new\n version 5.0 deployment:\n\n3. Create a JSON file with the required input data. 
Make sure to remove any\n DAGs you don't want to migrate from the `migrate_list` section:\n\n {\n \"project_id_src\": \"your-source-project\",\n \"project_id_tgt\": \"your-target-project\",\n \"dataset_cdc_processed\": \"your-cdc-processed-dataset\",\n \"dataset_reporting_tgt\": \"your-reporting-target-dataset-OR-SAP_REPORTING\",\n \"k9_datasets_processing\": \"your-k9-processing-dataset-OR-K9_REPORTING\",\n \"migrate_list\":\n [\n \"holiday_calendar\",\n \"trends\",\n \"weather\",\n \"currency_conversion\",\n \"inventory_snapshots\",\n \"prod_hierarchy_texts\"\n ]\n }\n EOF\n\n For example, if you want to remove `weather` and `trends`, the script would\n look like the following: \n\n {\n \"project_id_src\": \"kittycorn-demo\",\n \"project_id_tgt\": \"kittycorn-demo\",\n \"dataset_cdc_processed\": \"CDC_PROCESSED\",\n \"dataset_reporting_tgt\": \"SAP_REPORTING\",\n \"k9_datasets_processing\": \"K9_PROCESSING\",\n \"migrate_list\":\n [\n \"holiday_calendar\",\n \"currency_conversion\",\n \"inventory_snapshots\",\n \"prod_hierarchy_texts\"\n ]\n }\n\n4. Create an output folder with the following command:\n\n mkdir output\n\n5. Generate the parsed migration script with the following command (this command\n assumes you are at the root of the repository):\n\n jinja -d data.json -o output/migrate_external_dags.sql docs/external_dag_migration/scripts/migrate_external_dags.sql\n\n6. Examine the output SQL file and execute in BigQuery to migrate your tables to the new location.\n\n### Update and unpause the Airflow DAGs\n\nBack up the current DAG Files in your Airflow bucket. Then, replace them with\nthe newly generated files from your Cortex Framework Data Foundation version 5.0\ndeployment. For detail instructions, see the following documentation:\n\n- [Open Airflow UI from Composer](/composer/docs/how-to/accessing/airflow-web-interface)\n- [Unpause the DAG](https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/dags.html#dag-pausing-deactivation-and-deletion)\n\n### Validation and cleanup\n\nThe migration is now complete. You can now validate that all reporting views\nin the new v5.0 reporting deployment is working correctly. If everything works\nproperly, go through the process again, this time targeting the v5.0 deployment\nto your production Reporting set. Afterwards, feel free to remove all tables\nusing the following script:\n**Warning:** This step permanently removes your old DAG tables and can't be undone after is applied. Only execute this step after all validation is complete. Consider taking backups of these tables. \n\n jinja -d data.json -o output/delete_old_dag_tables.sql docs/external_dag_migration/scripts/delete_old_dag_tables.sql"]]