Template Cloud Storage Text to Datastore adalah pipeline batch yang membaca dari file teks yang tersimpan di Cloud Storage dan menulis Entity berenkode JSON ke Datastore.
Setiap baris dalam file teks input harus dalam
format JSON yang ditentukan.
Persyaratan pipeline
Datastore harus diaktifkan di project tujuan.
Parameter template
Parameter
Deskripsi
textReadPattern
Pola jalur Cloud Storage yang menentukan lokasi file data teks Anda.
Misalnya, gs://mybucket/somepath/*.json.
javascriptTextTransformGcsPath
(Opsional)
URI Cloud Storage dari file .js yang menentukan fungsi yang ditentukan pengguna (UDF) JavaScript yang ingin Anda gunakan. Misalnya, gs://my-bucket/my-udfs/my_file.js.
javascriptTextTransformFunctionName
(Opsional)
Nama fungsi yang ditentukan pengguna (UDF) JavaScript yang ingin Anda gunakan.
Misalnya, jika kode fungsi JavaScript Anda adalah myTransform(inJson) { /*...do stuff...*/ }, nama fungsi adalah myTransform. Untuk contoh UDF JavaScript, lihat
Contoh UDF.
datastoreWriteProjectId
ID project Google Cloud tempat menulis entity Datastore
datastoreHintNumWorkers
(Opsional) Petunjuk untuk jumlah pekerja yang diperkirakan dalam langkah throttling peningkatan Datastore. Default-nya adalah 500.
errorWritePath
File output log error yang akan digunakan untuk kegagalan tulis yang terjadi selama pemrosesan. Misalnya, gs://bucket-name/errors.txt.
nama versi, seperti 2023-09-12-00_RC00, untuk menggunakan versi template
tertentu, yang dapat ditemukan bertingkat di folder induk bertanggal masing-masing dalam bucket—
gs://dataflow-templates-REGION_NAME/
REGION_NAME: region tempat Anda ingin men-deploy tugas Dataflow, misalnya us-central1
PATH_TO_INPUT_TEXT_FILES: pola file input di Cloud Storage
JAVASCRIPT_FUNCTION:
nama fungsi yang ditentukan pengguna (UDF) JavaScript yang ingin Anda gunakan
Misalnya, jika kode fungsi JavaScript Anda adalah myTransform(inJson) { /*...do stuff...*/ }, nama fungsi adalah myTransform. Untuk contoh UDF JavaScript, lihat
Contoh UDF.
PATH_TO_JAVASCRIPT_UDF_FILE:
URI Cloud Storage dari file .js yang menentukan fungsi yang ditentukan pengguna (UDF) JavaScript yang ingin Anda gunakan—misalnya, gs://my-bucket/my-udfs/my_file.js
ERROR_FILE_WRITE_PATH: jalur yang Anda inginkan ke file error di Cloud Storage
API
Untuk menjalankan template menggunakan REST API, kirim permintaan HTTP POST. Untuk informasi selengkapnya tentang API dan cakupan otorisasinya, lihat projects.templates.launch.
nama versi, seperti 2023-09-12-00_RC00, untuk menggunakan versi template
tertentu, yang dapat ditemukan bertingkat di folder induk bertanggal masing-masing dalam bucket—
gs://dataflow-templates-REGION_NAME/
LOCATION: region tempat Anda ingin men-deploy tugas Dataflow, misalnya us-central1
PATH_TO_INPUT_TEXT_FILES: pola file input di Cloud Storage
JAVASCRIPT_FUNCTION:
nama fungsi yang ditentukan pengguna (UDF) JavaScript yang ingin Anda gunakan
Misalnya, jika kode fungsi JavaScript Anda adalah myTransform(inJson) { /*...do stuff...*/ }, nama fungsi adalah myTransform. Untuk contoh UDF JavaScript, lihat
Contoh UDF.
PATH_TO_JAVASCRIPT_UDF_FILE:
URI Cloud Storage dari file .js yang menentukan fungsi yang ditentukan pengguna (UDF) JavaScript yang ingin Anda gunakan—misalnya, gs://my-bucket/my-udfs/my_file.js
ERROR_FILE_WRITE_PATH: jalur yang Anda inginkan ke file error di Cloud Storage
Kode sumber template
Java
/*
* Copyright (C) 2018 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License"); you may not
* use this file except in compliance with the License. You may obtain a copy of
* the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
* WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
* License for the specific language governing permissions and limitations under
* the License.
*/
package com.google.cloud.teleport.templates;
import com.google.cloud.teleport.metadata.MultiTemplate;
import com.google.cloud.teleport.metadata.Template;
import com.google.cloud.teleport.metadata.TemplateCategory;
import com.google.cloud.teleport.templates.TextToDatastore.TextToDatastoreOptions;
import com.google.cloud.teleport.templates.common.DatastoreConverters.DatastoreWriteOptions;
import com.google.cloud.teleport.templates.common.DatastoreConverters.WriteJsonEntities;
import com.google.cloud.teleport.templates.common.ErrorConverters.ErrorWriteOptions;
import com.google.cloud.teleport.templates.common.ErrorConverters.LogErrors;
import com.google.cloud.teleport.templates.common.FirestoreNestedValueProvider;
import com.google.cloud.teleport.templates.common.JavascriptTextTransformer.JavascriptTextTransformerOptions;
import com.google.cloud.teleport.templates.common.JavascriptTextTransformer.TransformTextViaJavascript;
import com.google.cloud.teleport.templates.common.TextConverters.FilesystemReadOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.options.ValueProvider;
import org.apache.beam.sdk.values.TupleTag;
/**
* Dataflow template which reads from a Text Source and writes JSON encoded Entities into Datastore.
* The JSON is expected to be in the format of: <a
* href="https://cloud.google.com/datastore/docs/reference/rest/v1/Entity">Entity</a>
*
* <p>Check out <a
* href="https://github.com/GoogleCloudPlatform/DataflowTemplates/blob/main/v1/README_GCS_Text_to_Datastore.md">README
* Datastore</a> or <a
* href="https://github.com/GoogleCloudPlatform/DataflowTemplates/blob/main/v1/README_GCS_Text_to_Firestore.md">README
* Firestore</a> for instructions on how to use or modify this template.
*/
@MultiTemplate({
@Template(
name = "GCS_Text_to_Datastore",
category = TemplateCategory.LEGACY,
displayName = "Text Files on Cloud Storage to Datastore [Deprecated]",
description =
"The Cloud Storage Text to Datastore template is a batch pipeline that reads from text files stored in "
+ "Cloud Storage and writes JSON encoded Entities to Datastore. "
+ "Each line in the input text files must be in the <a href=\"https://cloud.google.com/datastore/docs/reference/rest/v1/Entity\">specified JSON format</a>.",
optionsClass = TextToDatastoreOptions.class,
skipOptions = {
"firestoreWriteProjectId",
"firestoreWriteEntityKind",
"firestoreWriteNamespace",
"firestoreHintNumWorkers",
"javascriptTextTransformReloadIntervalMinutes",
// This template doesn't use neither firestoreWriteEntityKind/Namespace
// nor datastoreWriteEntityKind/Namespace pipeline options.
"datastoreWriteEntityKind",
"datastoreWriteNamespace"
},
documentation =
"https://cloud.google.com/dataflow/docs/guides/templates/provided/cloud-storage-to-datastore",
contactInformation = "https://cloud.google.com/support",
preview = true,
requirements = {"Datastore must be enabled in the destination project."}),
@Template(
name = "GCS_Text_to_Firestore",
category = TemplateCategory.BATCH,
displayName = "Text Files on Cloud Storage to Firestore (Datastore mode)",
description =
"The Cloud Storage Text to Firestore template is a batch pipeline that reads from text files stored in "
+ "Cloud Storage and writes JSON encoded Entities to Firestore. "
+ "Each line in the input text files must be in the <a href=\"https://cloud.google.com/datastore/docs/reference/rest/v1/Entity\">specified JSON format</a>.",
optionsClass = TextToDatastoreOptions.class,
skipOptions = {
"datastoreWriteProjectId",
"datastoreWriteEntityKind",
"datastoreWriteNamespace",
"datastoreHintNumWorkers",
"javascriptTextTransformReloadIntervalMinutes",
// This template doesn't use neither firestoreWriteEntityKind/Namespace
// nor datastoreWriteEntityKind/Namespace pipeline options.
"firestoreWriteEntityKind",
"firestoreWriteNamespace"
},
documentation =
"https://cloud.google.com/dataflow/docs/guides/templates/provided/cloud-storage-to-firestore",
contactInformation = "https://cloud.google.com/support",
requirements = {"Firestore must be enabled in the destination project."})
})
public class TextToDatastore {
public static <T> ValueProvider<T> selectProvidedInput(
ValueProvider<T> datastoreInput, ValueProvider<T> firestoreInput) {
return new FirestoreNestedValueProvider(datastoreInput, firestoreInput);
}
/** TextToDatastore Pipeline Options. */
public interface TextToDatastoreOptions
extends PipelineOptions,
FilesystemReadOptions,
JavascriptTextTransformerOptions,
DatastoreWriteOptions,
ErrorWriteOptions {}
/**
* Runs a pipeline which reads from a Text Source, passes the Text to a Javascript UDF, writes the
* JSON encoded Entities to a TextIO sink.
*
* <p>If your Text Source does not contain JSON encoded Entities, then you'll need to supply a
* Javascript UDF which transforms your data to be JSON encoded Entities.
*
* @param args arguments to the pipeline
*/
public static void main(String[] args) {
TextToDatastoreOptions options =
PipelineOptionsFactory.fromArgs(args).withValidation().as(TextToDatastoreOptions.class);
TupleTag<String> errorTag = new TupleTag<String>("errors") {};
Pipeline pipeline = Pipeline.create(options);
pipeline
.apply(TextIO.read().from(options.getTextReadPattern()))
.apply(
TransformTextViaJavascript.newBuilder()
.setFileSystemPath(options.getJavascriptTextTransformGcsPath())
.setFunctionName(options.getJavascriptTextTransformFunctionName())
.build())
.apply(
WriteJsonEntities.newBuilder()
.setProjectId(
selectProvidedInput(
options.getDatastoreWriteProjectId(), options.getFirestoreWriteProjectId()))
.setHintNumWorkers(
selectProvidedInput(
options.getDatastoreHintNumWorkers(), options.getFirestoreHintNumWorkers()))
.setErrorTag(errorTag)
.build())
.apply(
LogErrors.newBuilder()
.setErrorWritePath(options.getErrorWritePath())
.setErrorTag(errorTag)
.build());
pipeline.run();
}
}
[[["Mudah dipahami","easyToUnderstand","thumb-up"],["Memecahkan masalah saya","solvedMyProblem","thumb-up"],["Lainnya","otherUp","thumb-up"]],[["Hard to understand","hardToUnderstand","thumb-down"],["Incorrect information or sample code","incorrectInformationOrSampleCode","thumb-down"],["Missing the information/samples I need","missingTheInformationSamplesINeed","thumb-down"],["Masalah terjemahan","translationIssue","thumb-down"],["Lainnya","otherDown","thumb-down"]],["Terakhir diperbarui pada 2024-03-29 UTC."],[],[]]