Flink Bigtable connector

Apache Flink is a stream-processing framework that lets you manipulate data in real time. If you have a Bigtable table, you can use a Flink Bigtable connector to stream, serialize, and write data from your specified data source to Bigtable. The connector lets you do the following, using either the Apache Flink Table API or the Datastream API:

1. Create a pipeline
2. Serialize the values from your data source into Bigtable mutation entries
3. Write those entries to your Bigtable table

Before you use the connector, you should be familiar with Apache Flink, the Bigtable storage model, and Bigtable writes.
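Taken together, these steps form a skeleton like the following sketch. Only the Flink APIs and the Bigtable client's RowMutationEntry shown here are standard calls; the sink wiring itself is connector-specific and is left as a comment, and the record format and names are illustrative assumptions.

```java
import com.google.cloud.bigtable.data.v2.models.RowMutationEntry;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class BigtablePipelineSketch {
  public static void main(String[] args) throws Exception {
    // Step 1: create a pipeline.
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    DataStream<String> lines = env.fromElements("sensor-1,21.5", "sensor-2,19.8");

    // Step 2: serialize each record into a Bigtable mutation entry. In a real
    // pipeline the connector's serializers (described below) do this for you;
    // this map only illustrates the shape of the conversion.
    DataStream<RowMutationEntry> entries =
        lines.map(
            new MapFunction<String, RowMutationEntry>() {
              @Override
              public RowMutationEntry map(String line) {
                String[] parts = line.split(",");
                return RowMutationEntry.create(parts[0])
                    .setCell("measurements", "temperature", parts[1]);
              }
            });

    // Step 3: write the entries to Bigtable. The sink's builder API is
    // connector-specific, so it is elided here; see the connector repository.
    // entries.sinkTo(yourBigtableSink);

    env.execute("flink-bigtable-sketch");
  }
}
```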
To use the connector, you must have a pre-existing Bigtable table to serve as your data sink. Create the table's column families before you start the pipeline; column families can't be created at write time. For more information, see Create and manage tables (/bigtable/docs/managing-tables).
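For example, you can create the sink table and its column families ahead of time with the Bigtable admin client for Java. The project, instance, table, and family names below are placeholders:

```java
import com.google.cloud.bigtable.admin.v2.BigtableTableAdminClient;
import com.google.cloud.bigtable.admin.v2.BigtableTableAdminSettings;
import com.google.cloud.bigtable.admin.v2.models.CreateTableRequest;

public class CreateSinkTable {
  public static void main(String[] args) throws Exception {
    BigtableTableAdminSettings settings =
        BigtableTableAdminSettings.newBuilder()
            .setProjectId("my-project")    // placeholder
            .setInstanceId("my-instance")  // placeholder
            .build();
    try (BigtableTableAdminClient admin = BigtableTableAdminClient.create(settings)) {
      // Create the table with every column family the pipeline will write to.
      admin.createTable(
          CreateTableRequest.of("my-table")
              .addFamily("measurements")
              .addFamily("metadata"));
    }
  }
}
```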
The connector is available on GitHub. For installation information, see the Flink Bigtable Connector repository (https://github.com/google/flink-connector-gcp/blob/main/connectors/bigtable/README.md). For code samples that demonstrate how to use the connector, see the flink-examples-gcp-bigtable directory.

Serializers

The Flink connector has three built-in serializers that you can use to convert data into Bigtable mutation entries:

GenericRecordToRowMutationSerializer: for Avro GenericRecord objects
RowDataToRowMutationSerializer: for Flink RowData objects
FunctionRowMutationSerializer: for custom serialization logic using a provided function

You can also create your own custom serializer that inherits from BaseRowMutationSerializer, as sketched below.
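As a rough illustration, a custom serializer might look like the following. This is a sketch only: the actual abstract methods of BaseRowMutationSerializer are defined in the connector repository, and the serialize method, the SensorReading type, and the family name shown here are assumptions.

```java
import com.google.cloud.bigtable.data.v2.models.RowMutationEntry;

// Hypothetical element type, for illustration only.
record SensorReading(String sensorId, double temperature) {}

// A sketch: the real BaseRowMutationSerializer contract may declare
// different method names and parameters.
public class SensorReadingSerializer extends BaseRowMutationSerializer<SensorReading> {
  @Override
  public RowMutationEntry serialize(SensorReading reading) {
    // Map one source element to one Bigtable mutation entry.
    return RowMutationEntry.create(reading.sensorId())
        .setCell("measurements", "temperature", Double.toString(reading.temperature()));
  }
}
```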
Serialization modes
When you use the Flink connector, you choose one of two serialization modes. The mode specifies how your source data is serialized to represent your Bigtable column families and then written to your Bigtable table. You must use one mode or the other.
Column family mode
In column family mode, all data is written to a single, specified column family. Nested fields aren't supported.
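For example, a flat record with the fields id, temperature, and humidity, written in column family mode with measurements as the configured family, is equivalent to a mutation entry like the following (all names and values are illustrative):

```java
import com.google.cloud.bigtable.data.v2.models.RowMutationEntry;

// Every field of the flat source record lands in the one configured family.
RowMutationEntry entry =
    RowMutationEntry.create("sensor-1")
        .setCell("measurements", "temperature", "21.5")
        .setCell("measurements", "humidity", "0.43");
```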
Nested-rows mode
In nested-rows mode, each top-level field represents a column family. The value of a top-level field is another row object, and that row object contains one field for each column in the corresponding Bigtable column family. In nested-rows mode, all fields other than the top-level row key field (RowKeyField) must be row objects. Double-nested rows aren't supported.
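For example, a nested row whose top-level fields are a row key plus the row objects measurements and metadata would serialize to one mutation entry spanning two column families (all names and values are illustrative):

```java
import com.google.cloud.bigtable.data.v2.models.RowMutationEntry;

// Each top-level field of the nested source row becomes a column family,
// and the fields of its row-object value become columns in that family.
RowMutationEntry entry =
    RowMutationEntry.create("sensor-1")
        .setCell("measurements", "temperature", "21.5")
        .setCell("measurements", "humidity", "0.43")
        .setCell("metadata", "location", "warehouse-7");
```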
Exactly-once processing
In Apache Flink, exactly once means that each data record in a stream is processed exactly one time, preventing duplicate processing and data loss even in the event of system failures.
A Bigtable mutateRow mutation is idempotent by default: a write request with the same row key, column family, column, timestamp, and value doesn't create a new cell, even if it's retried. This means that when you use Bigtable as the data sink for an Apache Flink pipeline, you get exactly-once behavior automatically, as long as you don't change the timestamp on retries and the rest of your pipeline also satisfies exactly-once requirements.
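In practice, this means deriving cell timestamps from your source data rather than from the wall clock at write time, so that a retried attempt emits an identical mutation. A minimal sketch using the Bigtable client's mutation model (names and values illustrative):

```java
import com.google.cloud.bigtable.data.v2.models.RowMutationEntry;

// Take the timestamp (in microseconds) from the record itself, never from
// System.currentTimeMillis() at write time, so a retry reproduces the
// exact same mutation.
long eventTimeMicros = 1725408000000000L;

// Retrying this mutation rewrites the same (row key, family, column,
// timestamp, value) cell instead of creating a duplicate.
RowMutationEntry entry =
    RowMutationEntry.create("sensor-1")
        .setCell("measurements", "temperature", eventTimeMicros, "21.5");
```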
[[["이해하기 쉬움","easyToUnderstand","thumb-up"],["문제가 해결됨","solvedMyProblem","thumb-up"],["기타","otherUp","thumb-up"]],[["이해하기 어려움","hardToUnderstand","thumb-down"],["잘못된 정보 또는 샘플 코드","incorrectInformationOrSampleCode","thumb-down"],["필요한 정보/샘플이 없음","missingTheInformationSamplesINeed","thumb-down"],["번역 문제","translationIssue","thumb-down"],["기타","otherDown","thumb-down"]],["최종 업데이트: 2025-09-04(UTC)"],[[["\u003cp\u003eThe Flink Bigtable connector enables real-time streaming, serialization, and writing of data from a specified data source to a Bigtable table using either the Apache Flink Table API or the Datastream API.\u003c/p\u003e\n"],["\u003cp\u003eTo use the connector, a pre-existing Bigtable table with predefined column families is required as the data sink.\u003c/p\u003e\n"],["\u003cp\u003eThe connector offers three built-in serializers for converting data into Bigtable mutation entries: \u003ccode\u003eGenericRecordToRowMutationSerializer\u003c/code\u003e, \u003ccode\u003eRowDataToRowMutationSerializer\u003c/code\u003e, and \u003ccode\u003eFunctionRowMutationSerializer\u003c/code\u003e, with the option for custom serializers as well.\u003c/p\u003e\n"],["\u003cp\u003eThere are two serialization modes available, column family mode, where all data is written to a single column family, and nested-rows mode, where each top-level field represents a column family.\u003c/p\u003e\n"],["\u003cp\u003eWhen using Bigtable as a data sink with the connector, exactly-once processing behavior is achieved automatically due to Bigtable's idempotent \u003ccode\u003emutateRow\u003c/code\u003e mutation, provided timestamps aren't changed on retries and the pipeline satisfies exactly-once semantics.\u003c/p\u003e\n"]]],[],null,["Flink Bigtable connector\n\nApache Flink is a stream-processing framework that lets you manipulate data in\nreal time. If you have a Bigtable table, you can use a *Flink\nBigtable connector* to stream, serialize, and write data from your\nspecified data source to Bigtable. The connector lets you do the\nfollowing, using either the Apache Flink Table API or the Datastream API:\n\n1. Create a pipeline\n2. Serialize the values from your data source into Bigtable mutation entries\n3. Write those entries to your Bigtable table\n\nThis document describes the Flink Bigtable connector and what you\nneed to know before you use it. Before you read this document, you should be\nfamiliar with\n[Apache Flink](https://nightlies.apache.org/flink/flink-docs-master/),\nthe\n[Bigtable storage model](/bigtable/docs/overview#storage-model), and\n[Bigtable writes](/bigtable/docs/writes).\n\nTo use the connector, you must have a pre-existing Bigtable table\nto serve as your data sink. You must create the table's column families before\nyou start the pipeline; column families can't be created on write. For more\ninformation, see\n[Create and manage tables](/bigtable/docs/managing-tables).\n\nThe connector is available on GitHub. For information about installing the\nconnector, see the\n[Flink Bigtable Connector](https://github.com/google/flink-connector-gcp/blob/main/connectors/bigtable/README.md)\nrepository. 
For code samples that demonstrate how to use the connector, see the\n[flink-examples-gcp-bigtable](https://github.com/google/flink-connector-gcp/tree/main/connectors/bigtable/flink-examples-gcp-bigtable)\ndirectory.\n\nSerializers\n\nThe Flink connector has three built-in serializers that you can use to convert\ndata into Bigtable mutation entries:\n\n- `GenericRecordToRowMutationSerializer`: For AVRO `GenericRecord` objects\n- `RowDataToRowMutationSerializer`: For Flink `RowData` objects\n- `FunctionRowMutationSerializer`: For custom serialization logic using a provided function\n\nYou can also choose to create your own custom serializer inheriting from\n`BaseRowMutationSerializer`.\n\nSerialization modes\n\nWhen you use the Flink connector, you choose one of two serialization modes. The\nmode specifies how your source data is serialized to represent your\nBigtable column families and then written your\nBigtable table. You must use either one mode or the other.\n\nColumn family mode\n\nIn column family mode, all data is written to a single specified column family.\nNested fields are not supported.\n\nNested-rows mode\n\nIn nested-rows mode, each top-level field represents a column family. The value\nof the top-level field (RowKeyField) is another field. The value of that field\nhas a row object for each column in the Bigtable column family. In\nnested rows mode, all fields other than the top-level field must be row objects.\nDouble-nested rows are not supported.\n\nExactly-once processing\n\nIn Apache Flink, *exactly once* means that each data record\nin a stream is processed exactly one time, preventing any duplicate processing\nor data loss, even in the event of system failures.\n\nA Bigtable `mutateRow` mutation is idempotent by default, so a\nwrite request that has the same row key, column family, column, timestamp, and\nvalue doesn't create a new cell, even if it's retried. This means that when you\nuse Bigtable as the data sink for an Apache Flink framework, you\nget exactly-once behavior automatically, as long as you don't change the\ntimestamp in retries and the rest of your pipeline also satisfies exactly-once\nrequirements.\n\nFor more information on exactly-once semantics, see\n[An overview of end-to-end exactly-once processing in Apache Flink](https://flink.apache.org/2018/02/28/an-overview-of-end-to-end-exactly-once-processing-in-apache-flink-with-apache-kafka-too/).\n\nWhat's next\n\n- [Bigtable Beam connector](/bigtable/docs/beam-connector)\n- [Bigtable Kafka Connect sink connector](/bigtable/docs/kafka-sink-connector)\n- [Integrations with Bigtable](/bigtable/docs/integrations)\n- [Datastream API reference](/datastream/docs/use-the-datastream-api)"]]