This section contains information about:
How Datastream handles data that it pulls from a source MySQL database
The versions of MySQL database that Datastream supports
Known limitations for using a MySQL database as a source
An overview of how to set up a source MySQL database so that data can be streamed from it to a destination
Behavior
This section describes the behavior of MySQL sources when you replicate data using Datastream. When you ingest data from MySQL databases, you can use binlog-based replication or global transaction identifier (GTID)-based replication. You select the change data capture (CDC) method when you create a stream.
Binlog-based replication
Datastream can read the binary log files in which MySQL keeps a record of data changes. The information contained in these log files is then replicated to the destination to reproduce the changes made on the source.
The key characteristics of binlog-based replication in Datastream are:
All databases or specific databases from a given MySQL source, as well as all tables from the databases or specific tables, can be selected.
All historical data is replicated.
All data manipulation language (DML) changes, such as inserts, updates, and deletes from the specified databases and tables, are replicated.
Only committed changes are replicated.
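The selection rule in the first characteristic (all databases or specific ones, all tables or specific ones) can be pictured with a small sketch. The `IncludeList` shape and the `is_selected` helper are hypothetical names chosen for illustration; they are not the Datastream API.

```python
from typing import Dict, Optional, Set

# Hypothetical include list: database name -> set of table names,
# or None to select every table in that database.
IncludeList = Dict[str, Optional[Set[str]]]


def is_selected(include: Optional[IncludeList], database: str, table: str) -> bool:
    """Return True if changes to database.table fall inside the selection."""
    if include is None:
        # No include list at all: every database and every table is selected.
        return True
    if database not in include:
        # The database itself was not selected.
        return False
    tables = include[database]
    # None means "all tables"; otherwise the table must be listed explicitly.
    return tables is None or table in tables


# Example: all tables in "sales", but only "employees" in "hr".
include = {"sales": None, "hr": {"employees"}}
print(is_selected(include, "sales", "orders"))
print(is_selected(include, "hr", "salaries"))
```

The same rule applies to GTID-based replication, which shares this characteristic.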
Global transaction identifier (GTID)-based replication
Datastream also supports global transaction identifier (GTID)-based replication.
A global transaction identifier (GTID) is a unique identifier that is created and associated with each transaction committed on a MySQL source. The identifier is unique not only on the source where it originated, but also across all servers in a given replication topology. In binary log-based replication, by contrast, each node in the database cluster maintains its own binlog files with its own numbering. Separate binlog files and numbering can become a problem during a failure or planned downtime, because binlog continuity is broken and binlog-based replication fails.
GTID-based replication supports failovers and self-managed database clusters, and continues to work irrespective of changes in the database cluster.
The key characteristics of GTID-based replication in Datastream are:
All databases or specific databases from a given MySQL source, as well as all tables from the databases or specific tables, can be selected.
All historical data is replicated.
All data manipulation language (DML) changes, such as inserts, updates, and deletes from the specified databases and tables, are replicated.
Only committed changes are replicated.
Seamless support for failovers.
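To make the contrast with binlog coordinates concrete, here is a minimal sketch. It relies on the standard MySQL GTID form `source_uuid:transaction_id`; the UUID, binlog file names, and offsets are made-up sample values, and `parse_gtid` is a hypothetical helper, not part of Datastream.

```python
import re
from typing import NamedTuple


class Gtid(NamedTuple):
    source_uuid: str  # UUID of the server where the transaction was committed
    sequence: int     # transaction number assigned on that server


# A single untagged GTID: 8-4-4-4-12 hex UUID, a colon, then a number.
_GTID_RE = re.compile(
    r"^([0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}):(\d+)$"
)


def parse_gtid(text: str) -> Gtid:
    """Parse one untagged GTID; raise ValueError for anything else."""
    match = _GTID_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a single untagged GTID: {text!r}")
    return Gtid(match.group(1).lower(), int(match.group(2)))


# The same transaction as seen on two nodes (sample values, not real data).
on_primary = {
    "gtid": "3E11FA47-71CA-11E1-9E33-C80AA9429562:23",
    "binlog": ("mysql-bin.000007", 1543),
}
on_replica = {
    "gtid": "3e11fa47-71ca-11e1-9e33-c80aa9429562:23",
    "binlog": ("mysql-bin.000002", 880),
}

# The GTID identifies the transaction on both nodes; the binlog file name
# and offset are local to each node and do not match.
same_by_gtid = parse_gtid(on_primary["gtid"]) == parse_gtid(on_replica["gtid"])
same_by_binlog = on_primary["binlog"] == on_replica["binlog"]
print(same_by_gtid, same_by_binlog)  # True False
```

This is why a reader that tracks its position by GTID can resume against a different node after a failover, while a position expressed as a binlog file and offset is only meaningful on the node that wrote it.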
Switch from binlog-based to GTID-based replication
If you want to update your stream and switch from binlog-based to GTID-based replication without the need to do a backfill, perform the following steps.
Note: These steps require database downtime. Similar steps might also be useful when you want to upgrade the major version of your MySQL source.
1. Ensure that all requirements for GTID-based replication are satisfied. For more information, see Configure a source MySQL database.
2. Optionally, create and run a test GTID-based stream. For more information, see Create a stream.
3. Create a GTID-based stream. Don't start it yet.
4. Stop application traffic to the source database.
5. Pause the existing binlog-based stream. For more information, see Pause the stream.
6. Wait a few minutes to ensure that Datastream has caught up with the database. You can check this using the metrics in the Monitoring tab, on the Stream details page for your stream. The values for Data freshness and Throughput need to be 0.
7. Start the GTID-based stream. For more information, see Start the stream.
8. Resume traffic to the source database.
If performing a backfill isn't an issue, you can truncate your tables in BigQuery, delete the old stream, and start a new one with backfill. For more information about managing backfill, see Manage backfill for the objects of a stream.
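The wait in the cutover procedure (both Data freshness and Throughput must read 0 before the GTID-based stream is started) can be expressed as a small polling gate. This is only a sketch: `read_metrics` is a hypothetical callable standing in for however you read the two values shown on the Monitoring tab, and the metric keys, poll interval, and retry count are assumptions.

```python
import time
from typing import Callable, Dict


def wait_until_drained(
    read_metrics: Callable[[], Dict[str, float]],
    poll_seconds: float = 30.0,
    max_polls: int = 20,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until data freshness and throughput are both 0.

    Returns True once both values are 0, or False if that never
    happens within max_polls readings.
    """
    for attempt in range(max_polls):
        metrics = read_metrics()
        if metrics["data_freshness"] == 0 and metrics["throughput"] == 0:
            return True
        if attempt < max_polls - 1:
            sleep(poll_seconds)
    return False


# Example with canned readings instead of a real monitoring call.
readings = iter([
    {"data_freshness": 42, "throughput": 310},
    {"data_freshness": 3, "throughput": 0},
    {"data_freshness": 0, "throughput": 0},
])
drained = wait_until_drained(lambda: next(readings), sleep=lambda _seconds: None)
print(drained)  # True
```

If the gate returns False, the safer choice is to keep application traffic stopped and investigate rather than start the GTID-based stream, because changes not yet read by the paused stream would otherwise be missed.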
Versions
Datastream supports the following versions of MySQL database:
MySQL 5.6
MySQL 5.7
MySQL 8.0
MySQL 8.4 (supported only for GTID-based replication)
Note: GTID-based replication is only supported for versions 5.7 and later.
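The version rules can be summarized as a lookup from MySQL version to usable CDC method. The sketch below encodes the list above and additionally assumes, per the page's version note, that GTID-based replication requires MySQL 5.7 or later; `cdc_methods_for` and the method labels are hypothetical names.

```python
from typing import Dict, FrozenSet

BINLOG = "binlog"
GTID = "gtid"

# Supported major.minor versions and the CDC methods usable with each.
SUPPORTED_METHODS: Dict[str, FrozenSet[str]] = {
    "5.6": frozenset({BINLOG}),        # GTID-based replication needs 5.7+
    "5.7": frozenset({BINLOG, GTID}),
    "8.0": frozenset({BINLOG, GTID}),
    "8.4": frozenset({GTID}),          # 8.4 is supported for GTID only
}


def cdc_methods_for(version: str) -> FrozenSet[str]:
    """Map a version string such as '8.0.36' to the usable CDC methods.

    Returns an empty set for versions that are not in the supported list.
    """
    parts = version.split(".")
    if len(parts) < 2:
        return frozenset()
    major_minor = ".".join(parts[:2])
    return SUPPORTED_METHODS.get(major_minor, frozenset())


for candidate in ("5.6.51", "8.0.36", "8.4.0", "9.0.1"):
    print(candidate, sorted(cdc_methods_for(candidate)))
```

A string such as the one returned by `SELECT VERSION()` can be passed directly, since only the first two dot-separated components are used.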
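Datastream supports the following types of MySQL database:
Self-hosted MySQL
Cloud SQL for MySQL (Cloud SQL for MySQL Enterprise Plus sources are supported when using GTID-based replication)
Amazon RDS for MySQL
Amazon Aurora MySQL
MariaDB
Alibaba Cloud PolarDB
Percona Server for MySQL
Known limitations
Known limitations for using MySQL database as a source include:
Streams are limited to 10,000 tables.
Tables that have a primary key defined as INVISIBLE can't be backfilled.
A table that has more than 500 million rows can't be backfilled unless all of the following conditions are met: the table has a unique index, none of the columns of the index are nullable, the index isn't descending, and all columns of the index are included in the stream.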
Datastream periodically fetches the latest schema from the source as events are processed. If a schema changes, Datastream detects the schema change and triggers a schema fetch. However, some events might get processed incorrectly or get dropped between the schema fetches, which can cause data discrepancies.
Not all changes to the source schema can be detected automatically, in which case data corruption may occur. The following schema changes may cause data corruption or failure to process the events downstream:
Dropping columns
Adding columns to the middle of a table
Changing the data type of a column
Reordering columns
Dropping tables (relevant if the same table is then recreated with new data added)
Truncating tables
Datastream doesn't support replicating views.
Datastream doesn't support columns of spatial data types. The values in these columns are replaced with NULL values.
Datastream doesn't support the zero value (0000-00-00 00:00:00) in columns of the DATETIME, DATE, or TIMESTAMP data types. The zero value is replaced with the NULL value.
Datastream doesn't support replicating rows that include the following values in JSON columns: DECIMAL, NEWDECIMAL, TIME, TIME2, DATETIME, DATETIME2, DATE, TIMESTAMP, or TIMESTAMP2. Events containing such values are discarded.
Datastream doesn't support binary log transaction compression.
Datastream doesn't support SSL certificate chains in the source MySQL connection profiles. Only single, x509 PEM-encoded certificates are supported.
Datastream doesn't support cascading deletes. Such events aren't written to the binary log, and as a result, aren't propagated to the destination.
Datastream doesn't support DROP PARTITION operations. Such operations are metadata-only operations and aren't replicated. Other events aren't affected and the stream runs successfully.
Because Datastream doesn't support failovers to replicas when using binary log-based replication, we recommend using GTID-based replication for Cloud SQL for MySQL Enterprise Plus sources. Cloud SQL Enterprise Plus instances are subject to near-zero downtime maintenance and fail over to a replica during maintenance.
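Because some schema changes listed above aren't detected automatically, it can help to flag them before they are applied to the source. The following is a minimal sketch that compares a table's column list before and after a planned change. The `(name, type)` tuples and the `risky_schema_changes` helper are illustrative assumptions, and data types are compared as exact strings.

```python
from typing import List, Tuple

Column = Tuple[str, str]  # (column name, data type), in table order


def risky_schema_changes(before: List[Column], after: List[Column]) -> List[str]:
    """List column-level changes that match the risky cases described above."""
    findings: List[str] = []
    before_types = dict(before)
    after_types = dict(after)
    before_names = [name for name, _ in before]
    after_names = [name for name, _ in after]

    # Dropped columns and changed data types.
    for name in before_names:
        if name not in after_types:
            findings.append(f"dropped column: {name}")
        elif before_types[name] != after_types[name]:
            findings.append(f"changed data type: {name}")

    # New columns that are followed by at least one pre-existing column,
    # that is, columns not simply appended at the end of the table.
    for index, name in enumerate(after_names):
        if name in before_types:
            continue
        if any(later in before_types for later in after_names[index + 1:]):
            findings.append(f"added column in the middle: {name}")

    # Reordering: the columns present on both sides appear in a different order.
    kept_before = [name for name in before_names if name in after_types]
    kept_after = [name for name in after_names if name in before_types]
    if kept_before != kept_after:
        findings.append("reordered columns")

    return findings


before = [("id", "int"), ("name", "varchar(50)"), ("created", "datetime")]
after = [("id", "bigint"), ("note", "text"), ("created", "datetime")]
for finding in risky_schema_changes(before, after):
    print(finding)
```

Appending a column at the end of the table produces no finding in this sketch. Dropping or truncating a table is a table-level operation and is outside what a column comparison can see.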
Additional limitations for GTID-based replication
Recovering streams that use GTID-based replication is only available when using the Datastream API.
Creating tables from other tables using CREATE TABLE ... SELECT statements isn't supported.
Datastream doesn't support tagged GTIDs.
For MySQL restrictions that apply to GTID-based replication, see the MySQL documentation.
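As a hedged illustration of the tagged-GTID limitation, the sketch below scans a GTID set string (for example, the value of the MySQL `gtid_executed` variable) for tags. It assumes the MySQL 8.4 layout in which a tag appears as an extra colon-separated segment after the source UUID, and that a tag never looks like a number or a numeric range. `find_gtid_tags` is a hypothetical helper, and the UUIDs and tag name are sample values.

```python
import re
from typing import List

# An interval is a single transaction number or a range such as 1-5.
_INTERVAL = re.compile(r"^\d+(-\d+)?$")


def find_gtid_tags(gtid_set: str) -> List[str]:
    """Return the distinct tags found in a GTID set string, in order seen."""
    tags: List[str] = []
    # MySQL may print one UUID set per line, separated by commas.
    for uuid_set in gtid_set.replace("\n", "").split(","):
        segments = uuid_set.strip().split(":")
        # segments[0] is the source UUID; the rest are intervals or tags.
        for segment in segments[1:]:
            if segment and not _INTERVAL.match(segment) and segment not in tags:
                tags.append(segment)
    return tags


untagged = "3e11fa47-71ca-11e1-9e33-c80aa9429562:1-5:11-18"
tagged = (
    "3e11fa47-71ca-11e1-9e33-c80aa9429562:1-5,\n"
    "4f22ab58-82db-22f2-af44-d91bba5306a3:maintenance:1-3"
)
print(find_gtid_tags(untagged))
print(find_gtid_tags(tagged))
```

An empty result suggests that no tagged GTIDs have been recorded in the set that was checked; it says nothing about the other restrictions in this list.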
Last updated 2025-09-04 UTC.