AvroCoder (Google Cloud Dataflow SDK 1.9.1 API)

Google Cloud Dataflow SDK for Java, version 1.9.1


Class AvroCoder<T>

  • Type Parameters:
    T - the type of elements handled by this coder
    All Implemented Interfaces:
    Coder<T>, Serializable

    public class AvroCoder<T>
    extends StandardCoder<T>
    A Coder using Avro binary format.

    Each instance of AvroCoder<T> encapsulates an Avro schema for objects of type T.

    The Avro schema may be provided explicitly via of(Class, Schema) or omitted via of(Class), in which case it will be inferred using Avro's ReflectData.

    For complete details about schema generation and how it can be controlled please see the org.apache.avro.reflect package. Only concrete classes with a no-argument constructor can be mapped to Avro records. All inherited fields that are not static or transient are included. Fields are not permitted to be null unless annotated by Nullable or a Union schema containing "null".

    To use, specify the Coder type on a PCollection:

     PCollection<MyCustomElement> records =

    or annotate the element class using @DefaultCoder.

     public class MyCustomElement {

    The implementation attempts to determine if the Avro encoding of the given type will satisfy the criteria of Coder.verifyDeterministic() by inspecting both the type and the Schema provided or generated by Avro. Only coders that are deterministic can be used in GroupByKey operations.

    See Also:
    Serialized Form
    • Constructor Detail

      • AvroCoder

        protected AvroCoder(Class<T> type,
                            Schema schema)
    • Method Detail

      • of

        public static <T> AvroCoder<T> of(TypeDescriptor<T> type)
        Returns an AvroCoder instance for the provided element type.
        Type Parameters:
        T - the element type
      • of

        public static <T> AvroCoder<T> of(Class<T> clazz)
        Returns an AvroCoder instance for the provided element class.
        Type Parameters:
        T - the element type
      • of

        public static AvroCoder<GenericRecord> of(Schema schema)
        Returns an AvroCoder instance for the Avro schema. The implicit type is GenericRecord.
      • of

        public static <T> AvroCoder<T> of(Class<T> type,
                                          Schema schema)
        Returns an AvroCoder instance for the provided element type using the provided Avro schema.

        If the type argument is GenericRecord, the schema may be arbitrary. Otherwise, the schema must correspond to the type provided.

        Type Parameters:
        T - the element type
      • getEncodingId

        public String getEncodingId()
        The encoding identifier is designed to support evolution as per the design of Avro In order to use this class effectively, carefully read the Avro documentation at Schema Resolution to ensure that the old and new schema match.

        In particular, this encoding identifier is guaranteed to be the same for AvroCoder instances of the same principal class, and otherwise distinct. The schema is not included in the identifier.

        When modifying a class to be encoded as Avro, here are some guidelines; see the above link for greater detail.

        • Avoid changing field names.
        • Never remove a required field.
        • Only add optional fields, with sensible defaults.
        • When changing the type of a field, consult the Avro documentation to ensure the new and old types are interchangeable.

        Code consuming this message class should be prepared to support all versions of the class until it is certain that no remaining serialized instances exist.

        If backwards incompatible changes must be made, the best recourse is to change the name of your class.

        Specified by:
        getEncodingId in interface Coder<T>
        getEncodingId in class StandardCoder<T>
      • getType

        public Class<T> getType()
        Returns the type this coder encodes/decodes.
      • encode

        public void encode(T value,
                           OutputStream outStream,
                           Coder.Context context)
                    throws IOException
        Description copied from interface: Coder
        Encodes the given value of type T onto the given output stream in the given context.
        IOException - if writing to the OutputStream fails for some reason
        CoderException - if the value could not be encoded for some reason
      • decode

        public T decode(InputStream inStream,
                        Coder.Context context)
                 throws IOException
        Description copied from interface: Coder
        Decodes a value of type T from the given input stream in the given context. Returns the decoded value.
        IOException - if reading from the InputStream fails for some reason
        CoderException - if the value could not be decoded for some reason
      • getCoderArguments

        public List<? extends Coder<?>> getCoderArguments()
        Description copied from interface: Coder
        If this is a Coder for a parameterized type, returns the list of Coders being used for each of the parameters, or returns null if this cannot be done or this is not a parameterized type.
      • asCloudObject

        public com.google.cloud.dataflow.sdk.util.CloudObject asCloudObject()
        Description copied from interface: Coder
        Returns the CloudObject that represents this Coder.
        Specified by:
        asCloudObject in interface Coder<T>
        asCloudObject in class StandardCoder<T>
      • verifyDeterministic

        public void verifyDeterministic()
                                 throws Coder.NonDeterministicException
        Description copied from interface: Coder
        Throw Coder.NonDeterministicException if the coding is not deterministic.

        In order for a Coder to be considered deterministic, the following must be true:

        • two values that compare as equal (via Object.equals() or Comparable.compareTo(), if supported) have the same encoding.
        • the Coder always produces a canonical encoding, which is the same for an instance of an object even if produced on different computers at different times.

        NonDeterministicException - when the type may not be deterministically encoded using the given Schema, the directBinaryEncoder, and the ReflectDatumWriter or GenericDatumWriter.
        Coder.NonDeterministicException - if this coder is not deterministic.
      • createDatumReader

        public DatumReader<T> createDatumReader()
        Deprecated. For AvroCoder internal use only.
        Returns a new DatumReader that can be used to read from an Avro file directly. Assumes the schema used to read is the same as the schema that was used when writing.
      • createDatumWriter

        public DatumWriter<T> createDatumWriter()
        Deprecated. For AvroCoder internal use only.
        Returns a new DatumWriter that can be used to write to an Avro file directly.
      • getSchema

        public Schema getSchema()
        Returns the schema used by this coder.