The "Bigtable to Cloud Storage SequenceFile" template is a pipeline that reads data from a Bigtable table and writes it to a Cloud Storage bucket in SequenceFile format. You can use the template to copy data from Bigtable to Cloud Storage.
Pipeline requirements
The Bigtable table must exist.
The output Cloud Storage bucket must exist before you run the pipeline.
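The bucket requirement can be checked before launching a job. The following is a minimal sketch, not part of the template: `bucket_of` and `bucket_exists` are hypothetical helper names, and the check assumes an installed, authenticated gcloud CLI that provides `gcloud storage buckets describe`. It covers only the bucket; the table check is left out.

```bash
# bucket_of: print the bucket name of a gs:// path (hypothetical helper).
bucket_of() {
  path=${1#gs://}               # drop the gs:// scheme
  printf '%s\n' "${path%%/*}"   # keep the text before the first slash
}

# bucket_exists: succeed only if gcloud can describe the bucket.
# Assumes gcloud is installed and authenticated; it is not called here.
bucket_exists() {
  gcloud storage buckets describe "gs://$(bucket_of "$1")" >/dev/null 2>&1
}

bucket_of "gs://your-bucket/your-path/"   # prints: your-bucket
```

A caller might run `bucket_exists "gs://your-bucket/your-path/" || echo "create the bucket first"` before starting the pipeline.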
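Template parameters
Required parameters
bigtableProject: The ID of the Google Cloud project that contains the Bigtable instance that you want to read data from.
bigtableInstanceId: The ID of the Bigtable instance that contains the table.
bigtableTableId: The ID of the Bigtable table to export.
destinationPath: The Cloud Storage path where data is written. For example, gs://your-bucket/your-path/
filenamePrefix: The prefix of the SequenceFile filename. For example, output-
Optional parameters
bigtableAppProfileId: The ID of the Bigtable application profile to use for the export. If you don't specify an app profile, Bigtable uses the instance's default app profile: https://cloud.google.com/bigtable/docs/app-profiles#default-app-profile
bigtableStartRow: The row at which to start the export. Defaults to the first row.
bigtableStopRow: The row at which to stop the export. Defaults to the last row.
bigtableMaxVersions: The maximum number of cell versions. Defaults to 2147483647.
bigtableFilter: A filter string. See http://hbase.apache.org/book.html#thrift. Defaults to empty.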
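On the gcloud command line, these parameters are passed as a single comma-separated list of key=value pairs. The sketch below shows one way to assemble that list; `join_params` is a hypothetical helper, and the values (my-project, my-instance, my-table) are placeholders rather than real resources.

```bash
# join_params: join key=value arguments with commas (hypothetical helper).
# Rejects any argument that is not of the form key=value.
join_params() {
  out=""
  for kv in "$@"; do
    case "$kv" in
      ?*=?*) ;;                                   # non-empty key and value
      *) echo "invalid parameter: $kv" >&2; return 1 ;;
    esac
    out="${out:+$out,}$kv"                        # add a comma between pairs
  done
  printf '%s\n' "$out"
}

# Placeholder values; replace them with your own.
join_params \
  "bigtableProject=my-project" \
  "bigtableInstanceId=my-instance" \
  "bigtableTableId=my-table" \
  "destinationPath=gs://your-bucket/your-path/" \
  "filenamePrefix=output-"
```

The helper does not escape values. A value that itself contains a comma needs gcloud's alternate-delimiter syntax, described in `gcloud topic escaping`.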
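Run the template
Console
1. Go to the Dataflow Create job from template page (https://console.cloud.google.com/dataflow/createjob).
2. In the Job name field, enter a unique job name.
3. Optional: For Regional endpoint, select a value from the drop-down menu. The default region is us-central1.
4. From the Dataflow template drop-down menu, select the Cloud Bigtable to SequenceFile Files on Cloud Storage template.
5. In the provided parameter fields, enter your parameter values.
6. Click Run job.
gcloud
Note: To use the Google Cloud CLI to run classic templates, you must have Google Cloud CLI version 138.0.0 or later.
The gcloud command for this template takes the following values:
JOB_NAME: a unique job name of your choice
VERSION: the version of the template that you want to use. Use latest for the latest version, which is in the non-dated parent folder in the bucket gs://dataflow-templates-REGION_NAME/latest/. Use a version name, such as 2023-09-12-00_RC00, for a specific version, which is nested in the respective dated parent folder in the bucket gs://dataflow-templates-REGION_NAME/
Caution: The latest version of a template might be updated with breaking changes. Production environments should use templates from the most recent dated parent folder so that these changes don't affect production workflows.
REGION_NAME: the region where you want to deploy your Dataflow job, for example, us-central1
BIGTABLE_PROJECT_ID: the ID of the Google Cloud project of the Bigtable instance that you want to read data from
INSTANCE_ID: the ID of the Bigtable instance that contains the table
TABLE_ID: the ID of the Bigtable table to export
APPLICATION_PROFILE_ID: the ID of the Bigtable application profile to use for the export
DESTINATION_PATH: the Cloud Storage path where data is written, for example, gs://mybucket/somefolder
FILENAME_PREFIX: the prefix of the SequenceFile filename, for example, output-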
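The values listed above slot into a `gcloud dataflow jobs run` invocation that points `--gcs-location` at the `Cloud_Bigtable_to_GCS_SequenceFile` template. The sketch below assembles that command with placeholder values. The `run_export` wrapper and the `DRY_RUN` switch are assumptions added for illustration, and by default the script only prints the command.

```bash
# Placeholder values (all hypothetical); replace them with your own.
JOB_NAME="bigtable-export"
VERSION="latest"
REGION_NAME="us-central1"
BIGTABLE_PROJECT_ID="my-project"
INSTANCE_ID="my-instance"
TABLE_ID="my-table"
APPLICATION_PROFILE_ID="my-app-profile"
DESTINATION_PATH="gs://mybucket/somefolder"
FILENAME_PREFIX="output-"

# Template location inside the regional dataflow-templates bucket.
TEMPLATE="gs://dataflow-templates-${REGION_NAME}/${VERSION}/Cloud_Bigtable_to_GCS_SequenceFile"

# Comma-separated template parameters.
PARAMS="bigtableProject=${BIGTABLE_PROJECT_ID}"
PARAMS="${PARAMS},bigtableInstanceId=${INSTANCE_ID}"
PARAMS="${PARAMS},bigtableTableId=${TABLE_ID}"
PARAMS="${PARAMS},bigtableAppProfileId=${APPLICATION_PROFILE_ID}"
PARAMS="${PARAMS},destinationPath=${DESTINATION_PATH}"
PARAMS="${PARAMS},filenamePrefix=${FILENAME_PREFIX}"

# run_export: print the command when DRY_RUN is 1 (the default here),
# otherwise execute it. The wrapper is illustrative, not part of gcloud.
run_export() {
  set -- gcloud dataflow jobs run "$JOB_NAME" \
    --gcs-location "$TEMPLATE" \
    --region "$REGION_NAME" \
    --parameters "$PARAMS"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}

run_export
```

Setting `DRY_RUN=0` before calling `run_export` submits the job, which requires an authenticated gcloud CLI and the permissions to create Dataflow jobs.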
API
To run the template using the REST API, send an HTTP POST request. For more information on the API and its authorization scopes, see projects.templates.launch.
The request takes the following values:
PROJECT_ID: the ID of the Google Cloud project where you want to run the Dataflow job
JOB_NAME: a unique job name of your choice
VERSION: the version of the template that you want to use. Use latest for the latest version, which is in the non-dated parent folder in the bucket gs://dataflow-templates-LOCATION/latest/. Use a version name, such as 2023-09-12-00_RC00, for a specific version, which is nested in the respective dated parent folder in the bucket gs://dataflow-templates-LOCATION/
LOCATION: the region where you want to deploy your Dataflow job, for example, us-central1
BIGTABLE_PROJECT_ID: the ID of the Google Cloud project of the Bigtable instance that you want to read data from
INSTANCE_ID: the ID of the Bigtable instance that contains the table
TABLE_ID: the ID of the Bigtable table to export
APPLICATION_PROFILE_ID: the ID of the Bigtable application profile to use for the export
DESTINATION_PATH: the Cloud Storage path where data is written, for example, gs://mybucket/somefolder
FILENAME_PREFIX: the prefix of the SequenceFile filename, for example, output-
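The values above fill in a POST to the `templates:launch` endpoint, with the template path passed in the `gcsPath` query parameter and the parameters in a JSON body. The sketch below builds both with placeholder values. The `launch` function is a hypothetical wrapper that assumes curl and an authenticated gcloud CLI (for `gcloud auth print-access-token`), and the `zone` value is only an example that should match your region. The script prints the request and does not send it.

```bash
# Placeholder values (all hypothetical); replace them with your own.
PROJECT_ID="my-project"
LOCATION="us-central1"
ZONE="us-central1-f"
VERSION="latest"
JOB_NAME="bigtable-export"
BIGTABLE_PROJECT_ID="my-project"
INSTANCE_ID="my-instance"
TABLE_ID="my-table"
APPLICATION_PROFILE_ID="my-app-profile"
DESTINATION_PATH="gs://mybucket/somefolder"
FILENAME_PREFIX="output-"

# Launch endpoint with the template path in the gcsPath query parameter.
URL="https://dataflow.googleapis.com/v1b3/projects/${PROJECT_ID}/locations/${LOCATION}/templates:launch?gcsPath=gs://dataflow-templates-${LOCATION}/${VERSION}/Cloud_Bigtable_to_GCS_SequenceFile"

# JSON request body; the last entry of each object has no trailing comma.
BODY=$(cat <<EOF
{
  "jobName": "${JOB_NAME}",
  "parameters": {
    "bigtableProject": "${BIGTABLE_PROJECT_ID}",
    "bigtableInstanceId": "${INSTANCE_ID}",
    "bigtableTableId": "${TABLE_ID}",
    "bigtableAppProfileId": "${APPLICATION_PROFILE_ID}",
    "destinationPath": "${DESTINATION_PATH}",
    "filenamePrefix": "${FILENAME_PREFIX}"
  },
  "environment": { "zone": "${ZONE}" }
}
EOF
)

# launch: send the request (hypothetical wrapper; not called by this script).
launch() {
  curl -X POST "$URL" \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -d "$BODY"
}

# Show what would be sent.
printf 'POST %s\n%s\n' "$URL" "$BODY"
```

Calling `launch` after reviewing the printed request submits the job. The heredoc does not escape its values, so a value that contains a double quote or backslash would produce invalid JSON.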
Last updated 2025-09-10 (UTC).
Template source code
Java
This template's source code is in the GoogleCloudPlatform/cloud-bigtable-client repository on GitHub: https://github.com/GoogleCloudPlatform/cloud-bigtable-client/tree/master/bigtable-dataflow-parent/bigtable-beam-import/src/main/java/com/google/cloud/bigtable/beam/sequencefiles
What's next
Learn about Dataflow templates.
See the list of Google-provided templates.