The Cloud Storage SequenceFile to Bigtable template is a pipeline that reads data from SequenceFiles in a Cloud Storage bucket and writes the data to a Bigtable table. You can use the template to copy data from Cloud Storage to Bigtable.
Pipeline requirements

- The Bigtable table must exist.
- The input SequenceFiles must exist in a Cloud Storage bucket before you run the pipeline. A quick way to verify the first two requirements is shown in the sketch after this list.
- The input SequenceFiles must have been exported from Bigtable or HBase.
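Before launching the job, you can check the first two requirements from the command line. The following is a minimal sketch, assuming the `cbt` CLI is installed; `my-project`, `my-instance`, `my-table`, and the bucket path are placeholders for your own values:

```bash
# Confirm that the destination table exists in the Bigtable instance.
cbt -project=my-project -instance=my-instance ls | grep my-table

# Confirm that SequenceFiles matching the source pattern exist in Cloud Storage.
gcloud storage ls 'gs://your-bucket/your-path/prefix*'
```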
Template parameters

Required parameters

- **bigtableProject**: The ID of the Google Cloud project that contains the Bigtable instance that you want to write data to.
- **bigtableInstanceId**: The ID of the Bigtable instance that contains the table.
- **bigtableTableId**: The ID of the Bigtable table to import.
- **sourcePattern**: The Cloud Storage path pattern to the location of the data. For example, `gs://your-bucket/your-path/prefix*`.
Optional parameters

- **bigtableAppProfileId**: The ID of the Bigtable application profile to use for the import. If you don't specify an application profile, Bigtable uses the instance's [default application profile](https://cloud.google.com/bigtable/docs/app-profiles#default-app-profile).
- **mutationThrottleLatencyMs**: Optional. Sets mutation latency throttling, which enables the feature. The value is in milliseconds. Defaults to 0.

Run the template

Console

1. Go to the Dataflow **Create job from template** page.
   [Go to Create job from template](https://console.cloud.google.com/dataflow/createjob)
2. In the **Job name** field, enter a unique job name.
3. Optional: For **Regional endpoint**, select a value from the drop-down menu. The default region is `us-central1`. For a list of regions where you can run a Dataflow job, see [Dataflow locations](/dataflow/docs/resources/locations).
4. From the **Dataflow template** drop-down menu, select the **SequenceFile Files on Cloud Storage to Cloud Bigtable** template.
5. In the provided parameter fields, enter your parameter values.
6. Click **Run job**.
gcloud

**Note:** To use the Google Cloud CLI to run classic templates, you must have [Google Cloud CLI](/sdk/docs/install) version 138.0.0 or later.

In your shell or terminal, run the template:

```bash
gcloud dataflow jobs run JOB_NAME \
    --gcs-location gs://dataflow-templates-REGION_NAME/VERSION/GCS_SequenceFile_to_Cloud_Bigtable \
    --region REGION_NAME \
    --parameters \
bigtableProject=BIGTABLE_PROJECT_ID,\
bigtableInstanceId=INSTANCE_ID,\
bigtableTableId=TABLE_ID,\
bigtableAppProfileId=APPLICATION_PROFILE_ID,\
sourcePattern=SOURCE_PATTERN
```

Replace the following:

- JOB_NAME: a unique job name of your choice
- VERSION: the version of the template that you want to use

  You can use the following values:
  - `latest` to use the latest version of the template, which is available in the non-dated parent folder in the bucket: [gs://dataflow-templates-REGION_NAME/latest/](https://console.cloud.google.com/storage/browser/dataflow-templates/latest)
  - the version name, like `2023-09-12-00_RC00`, to use a specific version of the template, which is nested in the respective dated parent folder in the bucket: [gs://dataflow-templates-REGION_NAME/](https://console.cloud.google.com/storage/browser/dataflow-templates)

  **Caution:** The latest version of templates might update with breaking changes. To prevent breaking changes from affecting your production workflows, use templates kept in the most recent dated parent folder.
- REGION_NAME: the [region](/dataflow/docs/resources/locations) where you want to deploy your Dataflow job, for example `us-central1`
- BIGTABLE_PROJECT_ID: the ID of the Google Cloud project of the Bigtable instance that you want to write data to
- INSTANCE_ID: the ID of the Bigtable instance that contains the table
- TABLE_ID: the ID of the Bigtable table to import
- APPLICATION_PROFILE_ID: the ID of the Bigtable application profile to be used for the import
- SOURCE_PATTERN: the Cloud Storage path pattern where the data is located, for example `gs://mybucket/somefolder/prefix*`
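The command returns as soon as the job is submitted, and the job runs asynchronously. One way to check on it afterwards is with the gcloud CLI. The following is a minimal sketch; `JOB_ID` and the region are placeholders for your own values:

```bash
# List active Dataflow jobs in the region to find the job ID.
gcloud dataflow jobs list --region=us-central1 --status=active

# Show the status of a specific job.
gcloud dataflow jobs show JOB_ID --region=us-central1
```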
API

To run the template using the REST API, send an HTTP POST request. For more information on the API and its authorization scopes, see [`projects.templates.launch`](/dataflow/docs/reference/rest/v1b3/projects.templates/launch).

```json
POST https://dataflow.googleapis.com/v1b3/projects/PROJECT_ID/locations/LOCATION/templates:launch?gcsPath=gs://dataflow-templates-LOCATION/VERSION/GCS_SequenceFile_to_Cloud_Bigtable
{
  "jobName": "JOB_NAME",
  "parameters": {
    "bigtableProject": "BIGTABLE_PROJECT_ID",
    "bigtableInstanceId": "INSTANCE_ID",
    "bigtableTableId": "TABLE_ID",
    "bigtableAppProfileId": "APPLICATION_PROFILE_ID",
    "sourcePattern": "SOURCE_PATTERN"
  },
  "environment": { "zone": "us-central1-f" }
}
```

Replace the following:

- PROJECT_ID: the Google Cloud project ID where you want to run the Dataflow job
- JOB_NAME: a unique job name of your choice
- VERSION: the version of the template that you want to use

  You can use the following values:
  - `latest` to use the latest version of the template, which is available in the non-dated parent folder in the bucket: [gs://dataflow-templates-REGION_NAME/latest/](https://console.cloud.google.com/storage/browser/dataflow-templates/latest)
  - the version name, like `2023-09-12-00_RC00`, to use a specific version of the template, which is nested in the respective dated parent folder in the bucket: [gs://dataflow-templates-REGION_NAME/](https://console.cloud.google.com/storage/browser/dataflow-templates)

  **Caution:** The latest version of templates might update with breaking changes. To prevent breaking changes from affecting your production workflows, use templates kept in the most recent dated parent folder.
- LOCATION: the [region](/dataflow/docs/resources/locations) where you want to deploy your Dataflow job, for example `us-central1`
- BIGTABLE_PROJECT_ID: the ID of the Google Cloud project of the Bigtable instance that you want to write data to
- INSTANCE_ID: the ID of the Bigtable instance that contains the table
- TABLE_ID: the ID of the Bigtable table to import
- APPLICATION_PROFILE_ID: the ID of the Bigtable application profile to be used for the import
- SOURCE_PATTERN: the Cloud Storage path pattern where the data is located, for example `gs://mybucket/somefolder/prefix*`
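One way to send this request is with `curl`, using an access token from the gcloud CLI. The following is a minimal sketch; the project ID, region, job name, and parameter values are placeholders for your own values:

```bash
# Launch the template via the REST API (all values below are placeholders).
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -d '{
        "jobName": "sequencefile-import-job",
        "parameters": {
          "bigtableProject": "my-project",
          "bigtableInstanceId": "my-instance",
          "bigtableTableId": "my-table",
          "sourcePattern": "gs://my-bucket/my-path/prefix*"
        }
      }' \
  "https://dataflow.googleapis.com/v1b3/projects/my-project/locations/us-central1/templates:launch?gcsPath=gs://dataflow-templates-us-central1/latest/GCS_SequenceFile_to_Cloud_Bigtable"
```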
Template source code

Java

This template's source code is in the [GoogleCloudPlatform/cloud-bigtable-client repository](https://github.com/GoogleCloudPlatform/cloud-bigtable-client/tree/master/bigtable-dataflow-parent/bigtable-beam-import/src/main/java/com/google/cloud/bigtable/beam/sequencefiles) on GitHub.

What's next

- Learn about [Dataflow templates](/dataflow/docs/concepts/dataflow-templates).
- See the list of [Google-provided templates](/dataflow/docs/guides/templates/provided-templates).