Google Cloud Dataflow SDK for Java, version 1.9.1
Class AvroIO
- java.lang.Object
-
- com.google.cloud.dataflow.sdk.io.AvroIO
-
public class AvroIO extends Object
PTransform
s for reading and writing Avro files.To read a
PCollection
from one or more Avro files, useAvroIO.Read
, specifyingAvroIO.Read.from(java.lang.String)
to specify the path of the file(s) to read from (e.g., a local filename or filename pattern if running locally, or a Google Cloud Storage filename or filename pattern of the form"gs://<bucket>/<filepath>"
), and optionallyAvroIO.Read.named(java.lang.String)
to specify the name of the pipeline step.It is required to specify
AvroIO.Read.withSchema(java.lang.Class<T>)
. To read specific records, such as Avro-generated classes, provide an Avro-generated class type. To readGenericRecords
, provide either aSchema
object or an Avro schema in a JSON-encoded string form. An exception will be thrown if a record doesn't match the specified schema.For example:
Pipeline p = ...; // A simple Read of a local file (only runs locally): PCollection<AvroAutoGenClass> records = p.apply(AvroIO.Read.from("/path/to/file.avro") .withSchema(AvroAutoGenClass.class)); // A Read from a GCS file (runs locally and via the Google Cloud // Dataflow service): Schema schema = new Schema.Parser().parse(new File("schema.avsc")); PCollection<GenericRecord> records = p.apply(AvroIO.Read.named("ReadFromAvro") .from("gs://my_bucket/path/to/records-*.avro") .withSchema(schema));
To write a
PCollection
to one or more Avro files, useAvroIO.Write
, specifyingAvroIO.Write.to(java.lang.String)
to specify the path of the file to write to (e.g., a local filename or sharded filename pattern if running locally, or a Google Cloud Storage filename or sharded filename pattern of the form"gs://<bucket>/<filepath>"
), and optionallyAvroIO.Write.named(java.lang.String)
to specify the name of the pipeline step.It is required to specify
AvroIO.Write.withSchema(java.lang.Class<T>)
. To write specific records, such as Avro-generated classes, provide an Avro-generated class type. To writeGenericRecords
, provide either aSchema
object or a schema in a JSON-encoded string form. An exception will be thrown if a record doesn't match the specified schema.For example:
// A simple Write to a local file (only runs locally): PCollection<AvroAutoGenClass> records = ...; records.apply(AvroIO.Write.to("/path/to/file.avro") .withSchema(AvroAutoGenClass.class)); // A Write to a sharded GCS file (runs locally and via the Google Cloud // Dataflow service): Schema schema = new Schema.Parser().parse(new File("schema.avsc")); PCollection<GenericRecord> records = ...; records.apply(AvroIO.Write.named("WriteToAvro") .to("gs://my_bucket/path/to/numbers") .withSchema(schema) .withSuffix(".avro"));
Permissions
Permission requirements depend on thePipelineRunner
that is used to execute the Dataflow job. Please refer to the documentation of correspondingPipelineRunner
s for more details.
-
-
Nested Class Summary
Nested Classes Modifier and Type Class and Description static class
AvroIO.Read
A rootPTransform
that reads from an Avro file (or multiple Avro files matching a pattern) and returns aPCollection
containing the decoding of each record.static class
AvroIO.Write
A rootPTransform
that writes aPCollection
to an Avro file (or multiple Avro files matching a sharding pattern).
-