# Migrate a CDC table to another region

This page describes best practices for a use case where you've set up Datastream replication to BigQuery but configured the destination dataset in an incorrect region. You then want to move the dataset to another region (or multi-region) without having to re-synchronize all of the data from the source database to BigQuery.

Note: Querying the secondary region during the migration procedure might return incorrect or incomplete results. For more information about the limitations related to the migration procedure described on this page, see [Cross-region dataset replication](/bigquery/docs/data-replication#limitations).
## Before you begin
Before you start migrating your data to another region, consider the following:
- Migration takes time, and you must temporarily pause the stream during the operation. To maintain data integrity, the source database must retain the change logs while the stream is paused. To estimate how long to pause the stream, combine the value of `max_staleness` in the dataset and the longest-running merge operation:
  - For information about how long merge operations might take to finish, see [Recommended table `max_staleness` value](/bigquery/docs/change-data-capture#recommended-max-staleness).
  - To find the maximum `max_staleness` in the dataset, see [Determine the current `max_staleness` value of a table](/bigquery/docs/change-data-capture#determine-max-staleness) and adjust the query to your specific needs.
  - If the estimated pause is too long for your source database to support, consider temporarily reducing the value of `max_staleness` for the tables in the dataset.
- Verify that the user performing the migration has sufficient BigQuery resources in the destination region (query reservation and background reservation). For more information about reservations, see [Reservation assignments](/bigquery/docs/reservations-intro#assignments).
- Verify that the user performing the migration has sufficient permissions to perform this operation, such as [Identity and Access Management (IAM)](/iam) controls or [VPC Service Controls](/security/vpc-service-controls).
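The pause estimate described above is simply the sum of two durations. The following is a minimal sketch; the 15- and 20-minute values are hypothetical stand-ins for the results of the `max_staleness` and merge-duration queries:

```python
from datetime import timedelta

def estimate_pause_duration(dataset_max_staleness: timedelta,
                            longest_merge: timedelta) -> timedelta:
    """Combine the dataset's max_staleness with the longest-running
    merge operation to estimate how long the stream stays paused."""
    return dataset_max_staleness + longest_merge

# Hypothetical durations obtained from the queries referenced above.
print(estimate_pause_duration(timedelta(minutes=15),
                              timedelta(minutes=20)))  # 0:35:00
```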
## Migration steps

To initiate [dataset migration](/bigquery/docs/data-replication#migrate_datasets), use BigQuery data replication:

1. In the Google Cloud console, go to the **BigQuery Studio** page.

   [Go to BigQuery Studio](https://console.cloud.google.com/bigquery)

2. Create a BigQuery dataset replica in the new region:

        ALTER SCHEMA DATASET_NAME
        ADD REPLICA 'NEW_REGION'
        OPTIONS(location='NEW_REGION');

    Replace the following:

    - DATASET_NAME: the name of your dataset.
    - NEW_REGION: the name of the region where you want to create your dataset. For example, `region-us`.
3. Monitor the migration progress, and wait until the copy watermark in the replica is within a few minutes of the primary. You can run this query on the [BigQuery INFORMATION_SCHEMA](/bigquery/docs/information-schema-schemata-replicas#schema) to check the migration progress:

        SELECT
          catalog_name AS project_id,
          schema_name AS dataset_name,
          replication_time AS dataset_replica_staleness
        FROM
          'NEW_REGION'.INFORMATION_SCHEMA.SCHEMATA_REPLICAS
        WHERE
          catalog_name = PROJECT_ID
          AND schema_name = DATASET_NAME
          AND location = NEW_REGION;

    Replace the following:

    - PROJECT_ID: the ID of your Google Cloud project.
    - DATASET_NAME: the name of your dataset.
    - DATASET_REPLICA_STALENESS: the staleness configuration of the tables in the dataset replica that you created.
    - NEW_REGION: the region where you created your dataset.
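As an illustration of the readiness check applied to the `dataset_replica_staleness` value returned by this query, here is a sketch only; the five-minute threshold is an assumption matching "within a few minutes", not an official cutoff:

```python
from datetime import timedelta

def replica_caught_up(dataset_replica_staleness: timedelta,
                      threshold: timedelta = timedelta(minutes=5)) -> bool:
    """The replica is ready for the next step once the staleness
    reported by the query is within a few minutes of the primary."""
    return dataset_replica_staleness <= threshold

print(replica_caught_up(timedelta(minutes=2)))   # True
print(replica_caught_up(timedelta(minutes=42)))  # False
```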
4. Pause the existing Datastream stream. For more information, see [Pause the stream](/datastream/docs/run-a-stream#pauseastream).
5. Wait for the stream to drain, and take note of the time when the stream entered the `PAUSED` state.
6. Confirm that the latest CDC changes have been applied to the BigQuery table by checking the [`upsert_stream_apply_watermark`](/bigquery/docs/change-data-capture#monitor_table_upsert_operation_progress) for the table. Run the following query and ensure that the watermark timestamp is 10 minutes later than when the stream was paused:

        SELECT table_name, upsert_stream_apply_watermark
        FROM DATASET_NAME.INFORMATION_SCHEMA.TABLES
    To run the query only for a specific table, add the following `WHERE` clause:

        WHERE table_name = 'TABLE_NAME'
    Replace the following:

    - DATASET_NAME: the name of your dataset.
    - TABLE_NAME: optional. The table for which you want to check the `upsert_stream_apply_watermark`.
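The watermark check in this step can be sketched as follows; the timestamps are hypothetical values taken from the query result and from the pause step:

```python
from datetime import datetime, timedelta

def cdc_changes_applied(upsert_stream_apply_watermark: datetime,
                        paused_at: datetime) -> bool:
    """True once the table's upsert_stream_apply_watermark has advanced
    at least 10 minutes past the moment the stream entered PAUSED."""
    return upsert_stream_apply_watermark >= paused_at + timedelta(minutes=10)

# Hypothetical timestamps: the stream paused at 12:00, the watermark is 12:15.
paused_at = datetime(2025, 1, 1, 12, 0)
watermark = datetime(2025, 1, 1, 12, 15)
print(cdc_changes_applied(watermark, paused_at))  # True
```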
7. Use the query from step 3 to verify that the new region copy watermark is later than the `upsert_stream_apply_watermark` captured in step 6.
8. Optionally, manually compare several tables in the primary dataset in the original region with the replica in the new region to verify that all data was copied correctly.
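One lightweight way to do this spot check is to compare per-table row counts fetched from each region. The comparison logic is a sketch, and the counts are hypothetical values you would gather with your own `COUNT(*)` queries:

```python
def tables_match(primary_counts: dict, replica_counts: dict) -> list:
    """Return names of tables whose row counts differ between the
    primary and the replica (or that exist on only one side)."""
    all_tables = sorted(set(primary_counts) | set(replica_counts))
    return [t for t in all_tables
            if primary_counts.get(t) != replica_counts.get(t)]

# Hypothetical counts gathered with COUNT(*) in each region.
primary = {"orders": 1200, "customers": 87}
replica = {"orders": 1200, "customers": 87}
print(tables_match(primary, replica))  # []
```

An empty list means the spot check passed; any returned table names warrant a closer look before promoting the replica.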
9. Promote the BigQuery dataset replica by running the following command in BigQuery Studio:

        ALTER SCHEMA DATASET_NAME
        SET OPTIONS(primary_replica = 'NEW_REGION');

    Replace the following:

    - DATASET_NAME: the name of your dataset.
    - NEW_REGION: the region where you created your dataset.
10. Optionally, if you no longer need the original dataset (now the replica) and don't want to incur extra charges, go to BigQuery Studio and drop the original BigQuery dataset:

        ALTER SCHEMA DATASET_NAME DROP REPLICA IF EXISTS ORIGINAL_REGION;

    Replace the following:

    - DATASET_NAME: the name of the original dataset.
    - ORIGINAL_REGION: the region of the original dataset.
11. Create a new stream with the exact same configuration, but with the new BigQuery destination location.
12. Start the new stream.

    To prevent replicating duplicate events, start the stream from a specific position:
    - For MySQL and Oracle sources: you can identify the log position by examining the logs of the original stream and finding the last position from which the stream read successfully. For information about starting a stream from a specific position, see [Manage streams](/datastream/docs/manage-streams#startastreamfromspecific).
    - For PostgreSQL sources: the new stream starts reading changes from the first log sequence number (LSN) in the replication slot. Because the original stream might have already processed some of these changes, manually change the pointer of the replication slot to the last LSN from which Datastream read. You can find this LSN in the Datastream consumer logs.
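When moving the replication slot pointer, it helps to compare LSNs numerically rather than as strings. The following is a sketch; the LSN values are hypothetical, and the `X/Y` format is standard PostgreSQL hexadecimal notation (high 32 bits before the slash, low 32 bits after):

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '0/16B3748' to an integer.
    The part before the slash is the high 32 bits, the part after
    is the low 32 bits; both are hexadecimal."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def slot_must_be_advanced(slot_lsn: str, last_read_lsn: str) -> bool:
    """True if the replication slot still points before the last LSN
    Datastream read, meaning its pointer must be moved forward to
    avoid re-delivering already-processed changes."""
    return lsn_to_int(slot_lsn) < lsn_to_int(last_read_lsn)

# Hypothetical LSNs: the slot lags behind what Datastream already read.
print(slot_must_be_advanced("0/16B3748", "0/16B4000"))  # True
```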
[[["Mudah dipahami","easyToUnderstand","thumb-up"],["Memecahkan masalah saya","solvedMyProblem","thumb-up"],["Lainnya","otherUp","thumb-up"]],[["Sulit dipahami","hardToUnderstand","thumb-down"],["Informasi atau kode contoh salah","incorrectInformationOrSampleCode","thumb-down"],["Informasi/contoh yang saya butuhkan tidak ada","missingTheInformationSamplesINeed","thumb-down"],["Masalah terjemahan","translationIssue","thumb-down"],["Lainnya","otherDown","thumb-down"]],["Terakhir diperbarui pada 2025-09-04 UTC."],[[["\u003cp\u003eThis guide details how to migrate a BigQuery dataset to a new region without a full data re-synchronization when using Datastream replication.\u003c/p\u003e\n"],["\u003cp\u003eThe migration process involves creating a dataset replica in the new region, temporarily pausing the Datastream stream, and monitoring the data transfer progress.\u003c/p\u003e\n"],["\u003cp\u003eBefore initiating the migration, you must estimate the required stream pause duration based on the dataset's \u003ccode\u003emax_staleness\u003c/code\u003e and the merge operation time, while ensuring the source database retains change logs.\u003c/p\u003e\n"],["\u003cp\u003eOnce the replica's data is consistent and the stream is paused, the replica is promoted to the primary dataset and a new stream is created with the correct BigQuery destination.\u003c/p\u003e\n"],["\u003cp\u003eUsers should also ensure sufficient BigQuery resources and permissions are available in the destination region before commencing the dataset migration.\u003c/p\u003e\n"]]],[],null,["# Migrate a CDC table to another region\n\nThis page describes best practices for a use case where you've set up\nDatastream replication to BigQuery but configured the\ndestination dataset in an incorrect region. 
You then want to move the dataset to\nanother region (or multi-region) without having to re-synchronise all of the data\nfrom the source database to BigQuery.\n\n\u003cbr /\u003e\n\n| Querying the secondary region during the migration procedure might return incorrect or incomplete results. For more information about the limitations related to the migration procedure described on this page, see [Cross-region dataset replication](/bigquery/docs/data-replication#limitations).\n\n\u003cbr /\u003e\n\nBefore you begin\n----------------\n\nBefore you start migrating your data to another region, consider the\nfollowing:\n\n- Migration takes time, and you must temporarily pause the stream during the operation. To maintain data integrity, the source database must retain the change logs when the stream is paused. To estimate how long to pause the stream, combine the value of `max_staleness` in the dataset and the longest-running merge operation:\n - For information about how long it might take for merge operations to finish, see [Recommended table `max_staleness` value](/bigquery/docs/change-data-capture#recommended-max-staleness).\n - To find the maximum `max_staleness` in the dataset, see [Determine the current `max_staleness` value of a table](/bigquery/docs/change-data-capture#determine-max-staleness) and adjust the query to your specific needs.\n - If the estimated pause is too long for your source database to support, you might want to consider temporarily reducing the value of `max_staleness` for the tables in the dataset.\n- Verify that the user performing the migration has sufficient BigQuery resources in the destination region (query reservation and background reservation). 
For more information about reservations, see [Reservation assignments](/bigquery/docs/reservations-intro#assignments).\n- Verify that the user performing the migration has sufficient permissions to perform this operation, such as [Identity and Access Management (IAM)](/iam) controls or [VPC Service Controls](/security/vpc-service-controls).\n\nMigration steps\n---------------\n\nTo initiate [dataset migration](/bigquery/docs/data-replication#migrate_datasets),\nuse BigQuery data replication:\n\n1. In the Google Cloud console, go to the **BigQuery Studio** page.\n\n [Go to BigQuery Studio](https://console.cloud.google.com/bigquery)\n2. Create a BigQuery dataset replica in the new region:\n\n ALTER SCHEMA \u003cvar translate=\"no\"\u003e\u003cspan class=\"devsite-syntax-n\"\u003eDATASET_NAME\u003c/span\u003e\u003c/var\u003e\n ADD REPLICA '\u003cvar translate=\"no\"\u003eNEW_REGION\u003c/var\u003e'\n OPTIONS(location='\u003cvar translate=\"no\"\u003eNEW_REGION\u003c/var\u003e');\n\n Replace the following:\n - \u003cvar translate=\"no\"\u003eDATASET_NAME\u003c/var\u003e: the name of the dataset that you want to create.\n - \u003cvar translate=\"no\"\u003eNEW_REGION\u003c/var\u003e: the name of the region where you want to create your dataset. For example, `region-us`.\n3. Monitor the migration progress, and wait until the copy watermark in the\n replica is within a few minutes of the primary. 
You can run this query on\n the [BigQuery INFORMATION_SCHEMA](/bigquery/docs/information-schema-schemata-replicas#schema)\n to check the migration progress:\n\n SELECT\n catalog_name as project_id,\n schema_name as dataset_name,\n replication_time as dataset_replica_staleness\n FROM\n '\u003cvar translate=\"no\"\u003eNEW_REGION\u003c/var\u003e'.INFORMATION_SCHEMA.SCHEMATA_REPLICAS\n WHERE\n catalog_name = \u003cvar translate=\"no\"\u003e\u003cspan class=\"devsite-syntax-n\"\u003ePROJECT_ID\u003c/span\u003e\u003c/var\u003e\n AND schema_name = \u003cvar translate=\"no\"\u003e\u003cspan class=\"devsite-syntax-n\"\u003eDATASET_NAME\u003c/span\u003e\u003c/var\u003e\n AND location = \u003cvar translate=\"no\"\u003e\u003cspan class=\"devsite-syntax-n\"\u003eNEW_REGION\u003c/span\u003e\u003c/var\u003e;\n\n Replace the following:\n - \u003cvar translate=\"no\"\u003ePROJECT_ID\u003c/var\u003e: the ID of your Google Cloud project.\n - \u003cvar translate=\"no\"\u003eDATASET_NAME\u003c/var\u003e: the name of your dataset.\n - \u003cvar translate=\"no\"\u003eDATASET_REPLICA_STALENESS\u003c/var\u003e: the staleness configuration of the tables in the dataset replica that you created.\n - \u003cvar translate=\"no\"\u003eNEW_REGION\u003c/var\u003e: the region where you created your dataset.\n4. Pause the existing Datastream stream. For more information, see\n [Pause the stream](/datastream/docs/run-a-stream#pauseastream).\n\n5. Wait for the stream to drain and take note of the time when the stream entered the\n `PAUSED` state.\n\n6. Confirm that the latest CDC changes have been applied to the BigQuery\n table by checking the [`upsert_stream_apply_watermark`](/bigquery/docs/change-data-capture#monitor_table_upsert_operation_progress)\n for the table. 
Run the following query and ensure that the watermark timestamp\n is 10 minutes later then when the stream was paused:\n\n SELECT table_name, upsert_stream_apply_watermark\n FROM \u003cvar translate=\"no\"\u003e\u003cspan class=\"devsite-syntax-n\"\u003eDATASET_NAME\u003c/span\u003e\u003c/var\u003e.INFORMATION_SCHEMA.TABLES\n\n To run the query only for a specific table, add the following `WHERE` clause: \n\n WHERE table_name = '\u003cvar translate=\"no\"\u003eTABLE_NAME\u003c/var\u003e'\n\n Replace the following:\n - \u003cvar translate=\"no\"\u003eDATASET_NAME\u003c/var\u003e: the name of your dataset.\n - \u003cvar translate=\"no\"\u003eTABLE_NAME\u003c/var\u003e: optional. The table for which you want to check the `upsert_stream_apply_watermark`.\n7. Use the query from step 3 to verify that the new region copy watermark is\n later than the `upsert_stream_apply_watermark` captured in step 6.\n\n8. Optionally, manually compare several tables in the primary dataset in the\n original region with the replica in the new region to verify that all data\n is correctly copied.\n\n9. Promote the BigQuery dataset replica by running the following\n command in BigQuery Studio:\n\n ALTER SCHEMA \u003cvar translate=\"no\"\u003e\u003cspan class=\"devsite-syntax-n\"\u003eDATASET_NAME\u003c/span\u003e\u003c/var\u003e\n SET OPTIONS(primary_replica = '\u003cvar translate=\"no\"\u003eNEW_REGION\u003c/var\u003e');\n\n Replace the following:\n - \u003cvar translate=\"no\"\u003eDATASET_NAME\u003c/var\u003e: the name of your dataset.\n - \u003cvar translate=\"no\"\u003eNEW_REGION\u003c/var\u003e: the region where you created your dataset.\n10. 
Optionally, if you no longer need the original dataset (now the replica), and\n don't want to incur extra charges, then go to BigQuery Studio and drop\n the original BigQuery dataset:\n\n ALTER SCHEMA \u003cvar translate=\"no\"\u003e\u003cspan class=\"devsite-syntax-n\"\u003eDATASET_NAME\u003c/span\u003e\u003c/var\u003e DROP REPLICA IF EXISTS \u003cvar translate=\"no\"\u003e\u003cspan class=\"devsite-syntax-n\"\u003eORIGINAL_REGION\u003c/span\u003e\u003c/var\u003e;\n\n Replace the following:\n - \u003cvar translate=\"no\"\u003eDATASET_NAME\u003c/var\u003e: the name of the original dataset.\n - \u003cvar translate=\"no\"\u003eORIGINAL_REGION\u003c/var\u003e: the region of the original dataset.\n11. Create a new stream with the exact same configuration but with new BigQuery\n destination location.\n\n12. Start the new stream.\n\n To prevent replicating duplicate events, start\n the stream from a specific position:\n - For MySQL and Oracle sources: you can identify the log position by examining the logs of the original stream and finding the last position from which the stream read successfully. For information about starting the stream from a specific position, see [Manage streams](/datastream/docs/manage-streams#startastreamfromspecific).\n - For PostgreSQL sources: the new stream starts reading changes from the first log sequence number (LSN) in the replication slot. Because the original stream might have already processed some of these changes, manually change the pointer of the replication slot to the last LSN from which Datastream read. You can find this LSN in the Datastream consumer logs.\n13. Optionally, delete the original stream."]]