AvroIO (Google Cloud Dataflow SDK 1.9.1 API)

Google Cloud Dataflow SDK for Java, version 1.9.1

com.google.cloud.dataflow.sdk.io

Class AvroIO



  • public class AvroIO
    extends Object
    PTransforms for reading and writing Avro files.

    To read a PCollection from one or more Avro files, use AvroIO.Read, specifying AvroIO.Read.from(java.lang.String) to specify the path of the file(s) to read from (e.g., a local filename or filename pattern if running locally, or a Google Cloud Storage filename or filename pattern of the form "gs://<bucket>/<filepath>"), and optionally AvroIO.Read.named(java.lang.String) to specify the name of the pipeline step.

    It is required to specify AvroIO.Read.withSchema(java.lang.Class<T>). To read specific records, such as Avro-generated classes, provide an Avro-generated class type. To read GenericRecords, provide either a Schema object or an Avro schema in a JSON-encoded string form. An exception will be thrown if a record doesn't match the specified schema.

    For example:

     
     Pipeline p = ...;
    
     // A simple Read of a local file (only runs locally):
     PCollection<AvroAutoGenClass> records =
         p.apply(AvroIO.Read.from("/path/to/file.avro")
                            .withSchema(AvroAutoGenClass.class));
    
     // A Read from a GCS file (runs locally and via the Google Cloud
     // Dataflow service):
     Schema schema = new Schema.Parser().parse(new File("schema.avsc"));
     PCollection<GenericRecord> records =
         p.apply(AvroIO.Read.named("ReadFromAvro")
                            .from("gs://my_bucket/path/to/records-*.avro")
                            .withSchema(schema));
      

    To write a PCollection to one or more Avro files, use AvroIO.Write, specifying AvroIO.Write.to(java.lang.String) to specify the path of the file to write to (e.g., a local filename or sharded filename pattern if running locally, or a Google Cloud Storage filename or sharded filename pattern of the form "gs://<bucket>/<filepath>"), and optionally AvroIO.Write.named(java.lang.String) to specify the name of the pipeline step.

    It is required to specify AvroIO.Write.withSchema(java.lang.Class<T>). To write specific records, such as Avro-generated classes, provide an Avro-generated class type. To write GenericRecords, provide either a Schema object or a schema in a JSON-encoded string form. An exception will be thrown if a record doesn't match the specified schema.

    For example:

     
     // A simple Write to a local file (only runs locally):
     PCollection<AvroAutoGenClass> records = ...;
     records.apply(AvroIO.Write.to("/path/to/file.avro")
                               .withSchema(AvroAutoGenClass.class));
    
     // A Write to a sharded GCS file (runs locally and via the Google Cloud
     // Dataflow service):
     Schema schema = new Schema.Parser().parse(new File("schema.avsc"));
     PCollection<GenericRecord> records = ...;
     records.apply(AvroIO.Write.named("WriteToAvro")
                               .to("gs://my_bucket/path/to/numbers")
                               .withSchema(schema)
                               .withSuffix(".avro"));
      

    Permissions

    Permission requirements depend on the PipelineRunner that is used to execute the Dataflow job. Please refer to the documentation of corresponding PipelineRunners for more details.


이 페이지가 도움이 되었나요? 평가를 부탁드립니다.

다음에 대한 의견 보내기...

도움이 필요하시나요? 지원 페이지를 방문하세요.