Datastream doesn't guarantee ordering, but each event contains the full row of data and the timestamp of when the data was written to the source. In BigQuery, out-of-order events are merged into the correct sequence automatically: BigQuery uses the event metadata and an internal change sequence number (CSN) to apply the events to the table in the correct order. In Cloud Storage, events from the same point in time can span more than one file.

Out-of-order events are expected by design, for example when events are backfilled during the initial backfill of data that takes place when the stream is started.

Ordering can be inferred on a source-by-source basis, as described in the following table; a sketch of how a consumer might apply this ordering follows the table.
| Source | Description |
| --- | --- |
| MySQL | Events that are part of the initial backfill have a `read_method` field that starts with `mysql-backfill`. Because these events can be consumed in any order, the order in which they're received within the backfill has no significance. Ordering can be inferred from the combination of the `log_file` and `log_position` fields, which identify the position of the operation in the database's binary log. |
| SQL Server | Events that are part of the initial backfill have a `read_method` field that starts with `sqlserver-backfill`. Because these events can be consumed in any order, the order in which they're received within the backfill has no significance. Events that are part of ongoing replication have their `read_method` field set to `sqlserver-cdc`. Ordering can be inferred from the combination of the `source_timestamp` and `lsn` (log sequence number) fields; this combination provides a unique, incrementing number that identifies the order of operations in the database. |
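For destinations such as Cloud Storage, where Datastream doesn't merge events for you, a downstream consumer can apply this ordering itself. The following is a minimal sketch (not part of Datastream) that assumes the events have already been parsed into Python dictionaries whose fields match the sample payloads shown on this page; the helper names are illustrative.

```python
# Minimal sketch: derive ordering keys for change events read out of order,
# for example from JSON files written to a Cloud Storage destination.
# Assumes each event is a dict shaped like the sample payloads on this page.

def sqlserver_sort_key(event):
    """Order key for a SQL Server source: source_timestamp, then lsn."""
    meta = event["source_metadata"]
    # Backfill events can be consumed in any order, so the timestamp alone is
    # enough for them; ties between CDC events are broken by the log sequence number.
    lsn = meta.get("lsn", "") if event["read_method"] == "sqlserver-cdc" else ""
    return (event["source_timestamp"], lsn)

def mysql_sort_key(event):
    """Order key for a MySQL source: source_timestamp, then binlog position."""
    meta = event["source_metadata"]
    return (event["source_timestamp"],
            meta.get("log_file", ""),
            meta.get("log_position", 0))

# Example usage: `events` is a list of parsed event dictionaries from one stream.
# ordered = sorted(events, key=sqlserver_sort_key)
```

For a BigQuery destination this step isn't necessary, because, as noted above, BigQuery merges out-of-order events automatically.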
### Consistency

Datastream guarantees that the data from the source database is delivered to the destination at least once. No event is missed, but duplicate events can occur in the stream. The window for duplicate events should be on the order of minutes, and the universally unique identifier (UUID) of the event in the event metadata can be used to detect duplicates.
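Because delivery is at-least-once, a consumer writing to a destination without built-in merging may want to drop duplicates itself. Here is a minimal sketch (illustrative, not Datastream tooling) that assumes each event is a Python dictionary carrying the `uuid` generic metadata field:

```python
def deduplicate(events):
    """Yield each event once, keyed by the uuid in the event metadata.

    Because duplicates arrive within a window of minutes, a production
    pipeline would expire old entries (for example, by read_timestamp)
    rather than keep every UUID in memory indefinitely.
    """
    seen = set()
    for event in events:
        if event["uuid"] in seen:
            continue  # Duplicate delivery of an already-processed event.
        seen.add(event["uuid"])
        yield event
```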
When database log files contain uncommitted transactions, if any transactions are rolled back, then the database reflects this in the log files as "reverse" data manipulation language (DML) operations. For example, a rolled-back `INSERT` operation has a corresponding `DELETE` operation. Datastream reads these operations from the log files.
About streams
-------------

Every stream has metadata that describes both the stream and the source from which it pulls data. This metadata includes information such as the stream name, and the source and destination connection profiles.
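As an illustration, this metadata can also be read programmatically. The sketch below uses the `google-cloud-datastream` Python client; the stream name is the placeholder from the sample payloads on this page, and the printed fields are only a small subset of the Stream resource.

```python
from google.cloud import datastream_v1

# Placeholder resource name; substitute your own project, location, and stream ID.
STREAM_NAME = "projects/myProj/locations/myLoc/streams/Oracle-to-Source"

client = datastream_v1.DatastreamClient()
stream = client.get_stream(name=STREAM_NAME)

# Metadata that describes the stream and the source it pulls data from.
print(stream.display_name)
print(stream.source_config.source_connection_profile)
print(stream.destination_config.destination_connection_profile)
print(stream.state)
```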
[[["이해하기 쉬움","easyToUnderstand","thumb-up"],["문제가 해결됨","solvedMyProblem","thumb-up"],["기타","otherUp","thumb-up"]],[["이해하기 어려움","hardToUnderstand","thumb-down"],["잘못된 정보 또는 샘플 코드","incorrectInformationOrSampleCode","thumb-down"],["필요한 정보/샘플이 없음","missingTheInformationSamplesINeed","thumb-down"],["번역 문제","translationIssue","thumb-down"],["기타","otherDown","thumb-down"]],["최종 업데이트: 2025-09-04(UTC)"],[[["\u003cp\u003eDatastream's data hierarchy consists of streams, which encompass a data source and destination, objects which represent portions of a stream like a table, and events which are the individual changes made to an object, such as a database insert.\u003c/p\u003e\n"],["\u003cp\u003eEach event in Datastream contains event data, which is the row that was modified; generic metadata, which is consistent across all events; and source-specific metadata, which varies depending on the data source.\u003c/p\u003e\n"],["\u003cp\u003eGeneric metadata fields included in every event provide information such as the stream name, the read method (e.g. CDC or backfill), the object name, the schema key, a unique event identifier, read and source timestamps, and sort keys.\u003c/p\u003e\n"],["\u003cp\u003eSource-specific metadata fields vary by source database (MySQL, Oracle, etc.) and can include details like log file, log position, transaction ID, or table name.\u003c/p\u003e\n"],["\u003cp\u003eWhile Datastream doesn't guarantee strict event ordering, each event contains a full data row and timestamps, and ordering can be inferred using various field combinations like \u003ccode\u003elog_file\u003c/code\u003e and \u003ccode\u003elog_position\u003c/code\u003e for MySQL, or \u003ccode\u003ers_id\u003c/code\u003e and \u003ccode\u003essn\u003c/code\u003e for Oracle.\u003c/p\u003e\n"]]],[],null,["# Events and streams\n\nThe data hierarchy in Datastream is:\n\n- A **stream**, which is comprised of a data source and a destination.\n- An **object**, which is a portion of a stream, such as a table from a specific database.\n- An **event**, which is a single change generated by a specific object, such as a database insert.\n\nStreams, objects, and events have data and metadata associated with them. This data and metadata can be used for different purposes.\n\nAbout events\n------------\n\nEach event consists of three types of data:\n\n- **Event data**: This represents the change to the data itself from the object originating from the stream source. Every event contains the entirety of the row that changed.\n- **Generic metadata**: This metadata appears on every event generated by Datastream which is used for actions, such as removing duplicate data in the destination.\n- **Source-specific metadata**: This metadata appears on every event generated by a specific stream source. This metadata varies by source.\n\n### Event data\n\nEvent data is the payload of every change from a given object originating from a stream source.\n\nEvents are in either the Avro or JSON format.\n\nWhen working with the Avro format, for each column, the event contains the column index and value. Using the column index, the column name and unified type can be retrieved from the schema in the Avro header.\n| **Note:** When Datastream converts data to the Avro format, special characters in column names are replaced with underscores. This can lead to duplicate column names and result in your streams failing. 
For more information about naming conventions in Avro, see the [Apache Avro documentation](https://avro.apache.org/docs/1.11.1/specification/#names).\n\nWhen working with the JSON format, for each column, the event contains the column name and value.\n\nEvent metadata can be used to collect information about the event's origin, as well as to remove duplicate data in the destination and order events by the downstream consumer.\n\nThe following tables list and describe the fields and data types for generic and source-specific event metadata.\n\n### Generic metadata\n\nThis metadata is consistent across streams of all types.\n\n### Source-specific metadata\n\nThis metadata is associated with CDC and backfill events from a source database. To view this metadata, select a source from the drop-down menu that follows. \nAll sources MySQL Oracle PostgreSQL SQL Server Salesforce MongoDB \n\n### Example of an event flow\n\nThis flow illustrates the events generated by three consecutive operations:\n`INSERT`, `UPDATE`, and `DELETE`, on a single row in a `SAMPLE` table for a source database.\n| **Note:** This sample event flow is for an Oracle database, but it's similar for a MySQL or PostgreSQL database.\n\n#### INSERT (T0)\n\nThe message payload consists of the entirety of the new row. \n\n {\n \"stream_name\": \"projects/myProj/locations/myLoc/streams/Oracle-to-Source\",\n \"read_method\": \"oracle-cdc-logminer\",\n \"object\": \"SAMPLE.TBL\",\n \"uuid\": \"d7989206-380f-0e81-8056-240501101100\",\n \"read_timestamp\": \"2019-11-07T07:37:16.808Z\",\n \"source_timestamp\": \"2019-11-07T02:15:39\", \n \"source_metadata\": {\n \"log_file\": \"\"\n \"scn\": 15869116216871,\n \"row_id\": \"AAAPwRAALAAMzMBABD\",\n \"is_deleted\": false,\n \"database\": \"DB1\",\n \"schema\": \"ROOT\",\n \"table\": \"SAMPLE\"\n \"change_type\": \"INSERT\",\n \"tx_id\": \n \"rs_id\": \"0x0073c9.000a4e4c.01d0\",\n \"ssn\": 67,\n },\n \"payload\": {\n \"THIS_IS_MY_PK\": \"1231535353\",\n \"FIELD1\": \"foo\",\n \"FIELD2\": \"TLV\",\n }\n }\n\n#### UPDATE (T1)\n\nThe message payload consists of the entirety of the new row. It doesn't include previous values. \n\n {\n \"stream_name\": \"projects/myProj/locations/myLoc/streams/Oracle-to-Source\",\n \"read_method\": \"oracle-cdc-logminer\",\n \"object\": \"SAMPLE.TBL\",\n \"uuid\": \"e6067366-1efc-0a10-a084-0d8701101101\",\n \"read_timestamp\": \"2019-11-07T07:37:18.808Z\",\n \"source_timestamp\": \"2019-11-07T02:17:39\", \n \"source_metadata\": {\n \"log_file\": \n \"scn\": 15869150473224,\n \"row_id\": \"AAAGYPAATAAPIC5AAB\",\n \"is_deleted\": false,\n \"database\":\n \"schema\": \"ROOT\",\n \"table\": \"SAMPLE\"\n \"change_type\": \"UPDATE\",\n \"tx_id\":\n \"rs_id\": \"0x006cf4.00056b26.0010\",\n \"ssn\": 0,\n },\n \"payload\": {\n \"THIS_IS_MY_PK\": \"1231535353\",\n \"FIELD1\": null,\n \"FIELD2\": \"TLV\",\n }\n }\n\n#### DELETE (T2)\n\nThe message payload consists of the entirety of the new row. 
To view the complete definition of the Stream object, see the [API Reference](/datastream/docs/reference/rest/v1/projects.locations.streams) documentation.

### Stream state and status

A stream can be in one of the following states:

- `Not started`
- `Starting`
- `Running`
- `Draining`
- `Paused`
- `Failed`
- `Failed permanently`

**Note:** For more information about these states, including how they're used in the lifecycle of a stream, see [Stream lifecycle](/datastream/docs/stream-states-and-actions).

You can use the logs to find additional status information, such as which tables are being backfilled or how many rows have been processed. You can also use the [`FetchStreamErrors`](/datastream/docs/manage-streams#fetchstreamerrors) API to retrieve errors.

Object metadata available using the discover API
-------------------------------------------------

The discover API returns objects that represent the structure of the objects defined in the data source or destination represented by the connection profile. Each object has metadata about the object itself, as well as about every field of data that it pulls. This metadata is available using the [discover API](/datastream/docs/reference/rest/v1/projects.locations.connectionProfiles/discover).

What's next
-----------

- To learn more about streams, see [Stream lifecycle](/datastream/docs/stream-states-and-actions).
- To learn how to create a stream, see [Create a stream](/datastream/docs/create-a-stream).