ProtoCoder (Google Cloud Dataflow SDK 1.9.1 API)

Google Cloud Dataflow SDK for Java, version 1.9.1

Class ProtoCoder<T extends Message>

  • Type Parameters:
    T - the Protocol Buffers Message handled by this Coder.
    All Implemented Interfaces:
    Coder<T>, Serializable

    public class ProtoCoder<T extends Message>
    extends AtomicCoder<T>
    A Coder using Google Protocol Buffers binary format. ProtoCoder supports both Protocol Buffers syntax versions 2 and 3.

    ProtoCoder is registered in the global CoderRegistry as the default Coder for any Message object. Custom message extensions are also supported, but these extensions must be registered for a particular ProtoCoder instance and that instance must be registered on the PCollection that needs the extensions:

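     import MyProtoFile;
     import MyProtoFile.MyMessage;

     // Register the extensions declared in MyProtoFile on this coder instance.
     Coder<MyMessage> coder = ProtoCoder.of(MyMessage.class).withExtensionsFrom(MyProtoFile.class);
     // Set the coder explicitly on the PCollection that needs the extensions.
     PCollection<MyMessage> records = input.apply(...).setCoder(coder);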


    ProtoCoder supports both versions 2 and 3 of the Protocol Buffers syntax. However, the Java runtime version of the library must exactly match the version of protoc that was used to produce the JAR files containing the compiled .proto messages.

    For more information, see the Protocol Buffers documentation.

    ProtoCoder and Determinism

    In general, Protocol Buffers messages can be encoded deterministically within a single pipeline as long as:

    • The encoded messages (and any transitively linked messages) do not use map fields.
    • Every Java VM that encodes or decodes the messages uses the same runtime version of the Protocol Buffers library and the same compiled .proto file JAR.
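
    The map-field restriction can be illustrated without Protocol Buffers at all. The following is a minimal sketch in plain Java, not ProtoCoder's implementation: MapOrderSketch and all of its methods are hypothetical names invented for this illustration. It shows one way an encoding can become non-deterministic: two maps that are equal as values are written in different orders because their iteration orders differ, while sorting the keys first produces identical bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names exist in the Dataflow SDK.
final class MapOrderSketch {

  // Writes each entry in whatever order the map happens to iterate.
  static byte[] encodeInIterationOrder(Map<String, Integer> map) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    try {
      for (Map.Entry<String, Integer> entry : map.entrySet()) {
        out.writeUTF(entry.getKey());
        out.writeInt(entry.getValue());
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e); // not expected for an in-memory stream
    }
    return bytes.toByteArray();
  }

  // Copies the entries into a TreeMap first, so keys are always written in sorted order.
  static byte[] encodeSorted(Map<String, Integer> map) {
    return encodeInIterationOrder(new TreeMap<>(map));
  }

  // "Aa" and "BB" have the same hashCode, so they share a HashMap bucket and
  // are iterated in the order in which they were inserted.
  static Map<String, Integer> insertAaFirst() {
    Map<String, Integer> map = new HashMap<>();
    map.put("Aa", 1);
    map.put("BB", 2);
    return map;
  }

  static Map<String, Integer> insertBbFirst() {
    Map<String, Integer> map = new HashMap<>();
    map.put("BB", 2);
    map.put("Aa", 1);
    return map;
  }

  public static void main(String[] args) {
    Map<String, Integer> a = insertAaFirst();
    Map<String, Integer> b = insertBbFirst();
    System.out.println("equal maps: " + a.equals(b));
    System.out.println("same bytes in iteration order: "
        + Arrays.equals(encodeInIterationOrder(a), encodeInIterationOrder(b)));
    System.out.println("same bytes with sorted keys: "
        + Arrays.equals(encodeSorted(a), encodeSorted(b)));
  }
}
```

    The bucket-order behavior relied on here is a property of the OpenJDK HashMap implementation rather than a documented guarantee, which is exactly why an encoding that depends on it cannot be treated as deterministic.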

    ProtoCoder and Encoding Stability

    When changing Protocol Buffers messages, follow the rules in the Protocol Buffers language guides for proto2 and proto3 syntaxes, depending on your message type. Following these guidelines will ensure that the old encoded data can be read by new versions of the code.

    Generally, any change to the message type, registered extensions, runtime library, or compiled proto JARs may change the encoding. Thus even if both the original and updated messages can be encoded deterministically within a single job, these deterministic encodings may not be the same across jobs.

    See Also:
    Serialized Form
    • Method Detail

      • of

        public static <T extends Message> ProtoCoder<T> of(Class<T> protoMessageClass)
        Returns a ProtoCoder for the given Protocol Buffers Message.
      • withExtensionsFrom

        public ProtoCoder<T> withExtensionsFrom(Iterable<Class<?>> moreExtensionHosts)
        Returns a ProtoCoder like this one, but with the extensions from the given classes registered.

        Each of the extension host classes must be a class automatically generated by the Protocol Buffers compiler, protoc, that contains messages.

        Does not modify this object.
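
        The "does not modify this object" contract is the usual immutable "with" pattern. A minimal sketch in plain Java follows, assuming a hypothetical SketchCoder class that stands in for ProtoCoder and tracks only the extension host classes; it does not reflect the SDK's internals.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in for illustration; this is not the SDK's ProtoCoder.
final class SketchCoder<T> {
  private final Class<T> messageType;
  private final Set<Class<?>> extensionHosts; // never mutated after construction

  private SketchCoder(Class<T> messageType, Set<Class<?>> extensionHosts) {
    this.messageType = messageType;
    this.extensionHosts = Collections.unmodifiableSet(extensionHosts);
  }

  static <T> SketchCoder<T> of(Class<T> messageType) {
    return new SketchCoder<T>(messageType, new LinkedHashSet<Class<?>>());
  }

  // Returns a new instance holding the union of the existing and the given host classes.
  SketchCoder<T> withExtensionsFrom(Iterable<Class<?>> moreExtensionHosts) {
    Set<Class<?>> merged = new LinkedHashSet<Class<?>>(extensionHosts);
    for (Class<?> host : moreExtensionHosts) {
      merged.add(host);
    }
    return new SketchCoder<T>(messageType, merged);
  }

  Class<T> getMessageType() {
    return messageType;
  }

  Set<Class<?>> getExtensionHosts() {
    return extensionHosts;
  }

  public static void main(String[] args) {
    SketchCoder<String> base = SketchCoder.of(String.class);
    // Integer and Long are arbitrary placeholder classes, not real extension hosts.
    SketchCoder<String> extended =
        base.withExtensionsFrom(Arrays.<Class<?>>asList(Integer.class, Long.class));
    System.out.println("base hosts: " + base.getExtensionHosts().size());
    System.out.println("extended hosts: " + extended.getExtensionHosts().size());
  }
}
```

        Because the original instance is left untouched, a coder that is already set on one PCollection can safely serve as the starting point for a differently configured coder elsewhere.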

      • encode

        public void encode(T value,
                           OutputStream outStream,
                           Coder.Context context)
                    throws IOException
        Description copied from interface: Coder
        Encodes the given value of type T onto the given output stream in the given context.
        Throws:
        IOException - if writing to the OutputStream fails for some reason
        CoderException - if the value could not be encoded for some reason
      • decode

        public T decode(InputStream inStream,
                        Coder.Context context)
                 throws IOException
        Description copied from interface: Coder
        Decodes a value of type T from the given input stream in the given context. Returns the decoded value.
        Throws:
        IOException - if reading from the InputStream fails for some reason
        CoderException - if the value could not be decoded for some reason
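
        The context argument matters because a value may be the last thing on a stream or may be followed by other values. The sketch below is plain Java under stated assumptions: ContextSketch is a hypothetical class, a boolean wholeStream flag stands in for Coder.Context, and the fixed 4-byte length prefix is chosen for simplicity and is not claimed to be the format ProtoCoder writes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical illustration; the boolean flag stands in for Coder.Context.
final class ContextSketch {

  // When more data may follow, prefix the value with its length so a reader knows where it ends.
  static void encode(byte[] value, OutputStream out, boolean wholeStream) throws IOException {
    if (!wholeStream) {
      new DataOutputStream(out).writeInt(value.length);
    }
    out.write(value);
  }

  static byte[] decode(InputStream in, boolean wholeStream) throws IOException {
    if (wholeStream) {
      // The value owns the rest of the stream, so read until end of input.
      ByteArrayOutputStream rest = new ByteArrayOutputStream();
      byte[] buffer = new byte[4096];
      int read;
      while ((read = in.read(buffer)) != -1) {
        rest.write(buffer, 0, read);
      }
      return rest.toByteArray();
    }
    DataInputStream data = new DataInputStream(in);
    byte[] value = new byte[data.readInt()];
    data.readFully(value);
    return value;
  }

  // Writes two values back to back and reads them again from a single stream.
  static byte[][] roundTrip(byte[] nestedValue, byte[] lastValue) {
    try {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      encode(nestedValue, out, false);
      encode(lastValue, out, true);
      InputStream in = new ByteArrayInputStream(out.toByteArray());
      return new byte[][] {decode(in, false), decode(in, true)};
    } catch (IOException e) {
      throw new UncheckedIOException(e); // not expected for in-memory streams
    }
  }

  public static void main(String[] args) {
    byte[][] result = roundTrip(new byte[] {1, 2}, new byte[] {3, 4, 5});
    System.out.println(Arrays.toString(result[0]) + " " + Arrays.toString(result[1]));
  }
}
```

        Without the length prefix on the first value, the reader would have no way to tell where it ends and the second value begins.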
      • equals

        public boolean equals(Object other)
        Description copied from class: StandardCoder
        Overrides:
        equals in class StandardCoder<T extends Message>
        Returns:
        true if the two StandardCoder instances have the same class and equal components.
      • hashCode

        public int hashCode()
        Overrides:
        hashCode in class StandardCoder<T extends Message>
      • getEncodingId

        public String getEncodingId()
        The encoding identifier is designed to support evolution as per the design of Protocol Buffers. In order to use this class effectively, carefully follow the advice in the Protocol Buffers documentation at Updating A Message Type.

        In particular, the encoding identifier is guaranteed to be the same for ProtoCoder instances of the same principal message class, with the same registered extension host classes, and otherwise distinct. Note that the encoding ID does not encode any version of the message or extensions, nor does it include the message schema.
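
        The identity rule above can be sketched in plain Java. This is an assumption-laden illustration, not the SDK's code: EncodingIdSketch and encodingId are hypothetical names, and the identifier format (class name followed by the sorted host class names) is chosen only to satisfy the stated properties.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration; the identifier format is an assumption, not the SDK's.
final class EncodingIdSketch {

  // Combines the message class name with the sorted set of extension host class names,
  // so the order in which hosts were registered does not affect the result.
  static String encodingId(Class<?> messageType, List<Class<?>> extensionHosts) {
    TreeSet<String> hostNames = new TreeSet<String>();
    for (Class<?> host : extensionHosts) {
      hostNames.add(host.getName());
    }
    return messageType.getName() + hostNames;
  }

  public static void main(String[] args) {
    // String, Integer and Long are arbitrary placeholder classes.
    List<Class<?>> hosts = Arrays.<Class<?>>asList(Integer.class, Long.class);
    List<Class<?>> reordered = Arrays.<Class<?>>asList(Long.class, Integer.class);
    System.out.println(encodingId(String.class, hosts));
    System.out.println(encodingId(String.class, hosts).equals(encodingId(String.class, reordered)));
  }
}
```

        Nothing in such an identifier reflects the fields of the message, so two versions of the same message class share an identifier even when their schemas differ. That is why the compatibility guidelines for message changes still apply.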

        When modifying a message class, here are the broadest guidelines; see the above link for greater detail.

        • Do not change the numeric tags for any fields.
        • Never remove a required field.
        • Only add optional or repeated fields, with sensible defaults.
        • When changing the type of a field, consult the Protocol Buffers documentation to ensure the new and old types are interchangeable.
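
        The reasoning behind these guidelines can be sketched with a toy tag-based format. The example below is plain Java and deliberately simplified: TagEvolutionSketch and its methods are hypothetical, every field is an int, and the one-byte-tag layout is not the Protocol Buffers wire format. It shows that a reader can skip tags it does not know, can fall back to a default for a tag that is absent, and loses data if a tag number is changed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical toy format: a sequence of (1-byte tag, 4-byte int value) records.
final class TagEvolutionSketch {

  // Builds a tag-to-value map from alternating tag and value arguments.
  static Map<Integer, Integer> fields(int... tagValuePairs) {
    Map<Integer, Integer> result = new LinkedHashMap<Integer, Integer>();
    for (int i = 0; i + 1 < tagValuePairs.length; i += 2) {
      result.put(tagValuePairs[i], tagValuePairs[i + 1]);
    }
    return result;
  }

  static byte[] encode(Map<Integer, Integer> fieldsByTag) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    try {
      for (Map.Entry<Integer, Integer> field : fieldsByTag.entrySet()) {
        out.writeByte(field.getKey());
        out.writeInt(field.getValue());
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e); // not expected for an in-memory stream
    }
    return bytes.toByteArray();
  }

  // Returns the value stored under wantedTag, skipping every other tag;
  // falls back to defaultValue when the tag is absent.
  static int readField(byte[] encoded, int wantedTag, int defaultValue) {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(encoded));
    try {
      while (in.available() > 0) {
        int tag = in.readUnsignedByte();
        int value = in.readInt();
        if (tag == wantedTag) {
          return value;
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return defaultValue;
  }

  public static void main(String[] args) {
    byte[] oldData = encode(fields(1, 7));       // written before field 2 existed
    byte[] newData = encode(fields(2, 3, 1, 7)); // written after field 2 was added
    byte[] renumbered = encode(fields(5, 7));    // field 1 moved to tag 5

    System.out.println("new reader, old data, field 2: " + readField(oldData, 2, 0));
    System.out.println("old reader, new data, field 1: " + readField(newData, 1, -1));
    System.out.println("old reader, renumbered data, field 1: " + readField(renumbered, 1, -1));
  }
}
```

        In this toy format the first two reads succeed because tags are stable and the added field has a default, while the third read silently falls back to the default because the tag was renumbered.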

        Code consuming this message class should be prepared to support all versions of the class until it is certain that no remaining serialized instances exist.

        If backwards incompatible changes must be made, the best recourse is to change the name of your Protocol Buffers message class.

        Specified by:
        getEncodingId in interface Coder<T extends Message>
        Overrides:
        getEncodingId in class StandardCoder<T extends Message>
      • getMessageType

        public Class<T> getMessageType()
        Returns the Protocol Buffers Message type this ProtoCoder supports.
      • getExtensionRegistry

        public ExtensionRegistry getExtensionRegistry()
        Returns the ExtensionRegistry listing all known Protocol Buffers extension messages to T registered with this ProtoCoder.
      • asCloudObject

        public CloudObject asCloudObject()
        Description copied from interface: Coder
        Returns the CloudObject that represents this Coder.
        Specified by:
        asCloudObject in interface Coder<T extends Message>
        Overrides:
        asCloudObject in class StandardCoder<T extends Message>
