XmlSink (Google Cloud Dataflow SDK 1.9.1 API)

Google Cloud Dataflow SDK for Java, version 1.9.1

com.google.cloud.dataflow.sdk.io

Class XmlSink



  • public class XmlSink
    extends Object
    A Sink that outputs records as XML-formatted elements. Writes a PCollection of records from JAXB-annotated classes to a single file location.

    Given a PCollection containing records of type T that can be marshalled to XML elements, this Sink will produce a single file consisting of a single root element that contains all of the elements in the PCollection.
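
    The layout described above, one root element wrapping every record, can be illustrated without the SDK. The sketch below is not how XmlSink is implemented; it only mimics the shape of the output using the JDK's StAX writer. The class name RootElementLayoutSketch, the method wrapInRoot, and the "record" element name are hypothetical and chosen for illustration.

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

/** Hypothetical illustration only; not part of the Dataflow SDK. */
class RootElementLayoutSketch {

  /** Writes one root element that contains one child element per record. */
  static String wrapInRoot(String rootName, List<String> records) {
    StringWriter out = new StringWriter();
    try {
      XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
      xml.writeStartElement(rootName);
      for (String record : records) {
        // "record" is an assumed element name; XmlSink derives the real
        // element from the JAXB annotations on the record class.
        xml.writeStartElement("record");
        xml.writeCharacters(record);
        xml.writeEndElement();
      }
      xml.writeEndElement();
      xml.close();
    } catch (XMLStreamException e) {
      throw new IllegalStateException("Could not write XML", e);
    }
    return out.toString();
  }

  public static void main(String[] args) {
    System.out.println(wrapInRoot("words", Arrays.asList("decreased", "War")));
  }
}
```

    In a real pipeline the per-record elements come from JAXB marshalling of the bound class rather than from a fixed element name.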

    XML Sinks are created with a base filename to write to, a root element name for the root element of the output files, and a class to bind to an XML element. This class is used to marshal the records of an input PCollection to their XML representation and must be bindable using JAXB annotations (checked at pipeline construction time).

    XML Sinks can be written to using the Write transform:

     p.apply(Write.to(
          XmlSink.ofRecordClass(Type.class)
              .withRootElement(root_element)
              .toFilenamePrefix(output_filename)));
     

    For example, consider the following class with JAXB annotations:

      import javax.xml.bind.annotation.XmlRootElement;
      import javax.xml.bind.annotation.XmlType;

      // Each record is marshalled to a <word_count_result> element.
      @XmlRootElement(name = "word_count_result")
      @XmlType(propOrder = {"word", "frequency"})
      public class WordFrequency {
        private String word;
        private long frequency;

        // JAXB requires a no-argument constructor.
        public WordFrequency() { }

        public WordFrequency(String word, long frequency) {
          this.word = word;
          this.frequency = frequency;
        }

        public void setWord(String word) {
          this.word = word;
        }

        public void setFrequency(long frequency) {
          this.frequency = frequency;
        }

        public long getFrequency() {
          return frequency;
        }

        public String getWord() {
          return word;
        }
      }
     

    The following will produce XML output with a root element named "words" from a PCollection of WordFrequency objects:

     p.apply(Write.to(
      XmlSink.ofRecordClass(WordFrequency.class)
          .withRootElement("words")
          .toFilenamePrefix(output_file)));
     

    The output will look like:

     
     <words>
    
      <word_count_result>
        <word>decreased</word>
        <frequency>1</frequency>
      </word_count_result>
    
      <word_count_result>
        <word>War</word>
        <frequency>4</frequency>
      </word_count_result>
    
      <word_count_result>
        <word>empress'</word>
        <frequency>14</frequency>
      </word_count_result>
    
      <word_count_result>
        <word>stoops</word>
        <frequency>6</frequency>
      </word_count_result>
    
      ...
     </words>
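
    To check output of this shape, the file can be read back with any XML parser. The following is a minimal sketch using the JDK's DOM parser, assuming the element names from the example above ("word_count_result", "word", "frequency"). The class WordCountOutputReaderSketch and the method readFrequencies are hypothetical names, not part of the SDK.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper for inspecting output shaped like the example. */
class WordCountOutputReaderSketch {

  /** Returns word -> frequency in document order. */
  static Map<String, Long> readFrequencies(String xml) {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
      // Every record sits directly under the single root element.
      NodeList results = doc.getElementsByTagName("word_count_result");
      Map<String, Long> frequencies = new LinkedHashMap<>();
      for (int i = 0; i < results.getLength(); i++) {
        Element result = (Element) results.item(i);
        String word = result.getElementsByTagName("word").item(0).getTextContent();
        String frequency = result.getElementsByTagName("frequency").item(0).getTextContent();
        frequencies.put(word, Long.parseLong(frequency.trim()));
      }
      return frequencies;
    } catch (Exception e) {
      throw new IllegalStateException("Could not parse XML output", e);
    }
  }

  public static void main(String[] args) {
    String xml = "<words>"
        + "<word_count_result><word>decreased</word><frequency>1</frequency></word_count_result>"
        + "<word_count_result><word>War</word><frequency>4</frequency></word_count_result>"
        + "</words>";
    System.out.println(readFrequencies(xml)); // prints {decreased=1, War=4}
  }
}
```

    DOM loads the whole document into memory, so for large outputs a streaming parser would be a better fit.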
     

