Google Cloud Dataflow SDK for Java, version 1.9.1
Package com.google.cloud.dataflow.sdk.io
Defines transforms for reading and writing common storage formats, including AvroIO, BigQueryIO, and TextIO.
Interface Summary
CompressedSource.DecompressingChannelFactory: Factory interface for creating channels that decompress the content of an underlying channel.
UnboundedSource.CheckpointMark: A marker representing the progress and state of an UnboundedSource.UnboundedReader.
Class Summary
AvroIO: PTransforms for reading and writing Avro files.
AvroIO.Read: A root PTransform that reads from an Avro file (or multiple Avro files matching a pattern) and returns a PCollection containing the decoding of each record.
AvroIO.Read.Bound<T>: A PTransform that reads from an Avro file (or multiple Avro files matching a pattern) and returns a bounded PCollection containing the decoding of each record.
AvroIO.Write: A root PTransform that writes a PCollection to an Avro file (or multiple Avro files matching a sharding pattern).
AvroIO.Write.Bound<T>: A PTransform that writes a bounded PCollection to an Avro file (or multiple Avro files matching a sharding pattern).
AvroSource<T>: A FileBasedSource for reading Avro files.
AvroSource.AvroReader<T>: A BlockBasedSource.BlockBasedReader for reading blocks from Avro files.
BigQueryIO: PTransforms for reading and writing BigQuery tables.
BigQueryIO.Read: A PTransform that reads from a BigQuery table and returns a PCollection of TableRows containing each of the rows of the table.
BigQueryIO.Read.Bound
BigQueryIO.Write
BigQueryIO.Write.Bound: A PTransform that can write either a bounded or unbounded PCollection of TableRows to a BigQuery table.
BlockBasedSource<T>: A BlockBasedSource is a FileBasedSource where a file consists of blocks of records.
BlockBasedSource.Block<T>: A Block represents a block of records that can be read.
BlockBasedSource.BlockBasedReader<T>: A Reader that reads records from a BlockBasedSource.
BoundedSource<T>: A Source that reads a finite amount of input and, because of that, supports some additional operations.
BoundedSource.BoundedReader<T>: A Reader that reads a bounded amount of input and supports some additional operations, such as progress estimation and dynamic work rebalancing.
CompressedSource<T>: A Source that reads from compressed files.
CompressedSource.CompressedReader<T>: Reader for a CompressedSource.
CountingInput: A PTransform that produces longs.
CountingInput.BoundedCountingInput: A PTransform that will produce a specified number of Longs starting from 0.
CountingInput.UnboundedCountingInput
CountingSource: A source that produces longs.
CountingSource.CounterMark: The checkpoint for an unbounded CountingSource is simply the last value produced.
DatastoreIO: Deprecated. Replaced by DatastoreIO.
DatastoreIO.DatastoreReader: A Source.Reader over the records from a query of the datastore.
DatastoreIO.Sink
DatastoreIO.Source: A Source that reads the result rows of a Datastore query as Entity objects.
FileBasedSink<T>: Abstract Sink for file-based output.
FileBasedSink.FileBasedWriteOperation<T>: Abstract Sink.WriteOperation that manages the process of writing to a FileBasedSink.
FileBasedSink.FileBasedWriter<T>: Abstract Sink.Writer that writes a bundle to a FileBasedSink.
FileBasedSink.FileResult: Result of a single bundle write.
FileBasedSource<T>: A common base class for all file-based Sources.
FileBasedSource.FileBasedReader<T>: A reader that implements code common to readers of FileBasedSources.
OffsetBasedSource<T>: A BoundedSource that uses offsets to define starting and ending positions.
OffsetBasedSource.OffsetBasedReader<T>: A Source.Reader that implements code common to readers of all OffsetBasedSources.
PubsubIO: Read and Write PTransforms for Cloud Pub/Sub streams.
PubsubIO.PubsubSubscription: Class representing a Cloud Pub/Sub Subscription.
PubsubIO.PubsubTopic: Class representing a Cloud Pub/Sub Topic.
PubsubIO.Read: A PTransform that continuously reads from a Cloud Pub/Sub stream and returns a PCollection of Strings containing the items from the stream.
PubsubIO.Read.Bound<T>: A PTransform that reads from a Cloud Pub/Sub source and returns an unbounded PCollection containing the items from the stream.
PubsubIO.Write
PubsubIO.Write.Bound<T>
PubsubUnboundedSink<T>: A PTransform which streams messages to Pubsub.
PubsubUnboundedSource<T>: A PTransform which streams messages from Pubsub.
Read: A PTransform for reading from a Source.
Read.Bounded<T>: PTransform that reads from a BoundedSource.
Read.Builder: Helper class for building Read transforms.
Read.Unbounded<T>: PTransform that reads from an UnboundedSource.
ShardNameTemplate: Standard shard naming templates.
Sink<T>: A Sink represents a resource that can be written to using the Write transform.
Sink.WriteOperation<T,WriteT>: A Sink.WriteOperation defines the process of a parallel write of objects to a Sink.
Sink.Writer<T,WriteT>: A Writer writes a bundle of elements from a PCollection to a sink.
Source<T>: Base class for defining input formats and creating a Source for reading the input.
Source.Reader<T>: The interface that readers of custom input sources must implement.
TextIO: PTransforms for reading and writing text files.
TextIO.Read: A PTransform that reads from a text file (or multiple text files matching a pattern) and returns a PCollection containing the decoding of each of the lines of the text file(s).
TextIO.Read.Bound<T>: A PTransform that reads from one or more text files and returns a bounded PCollection containing one element for each line of the input files.
TextIO.Write: A PTransform that writes a PCollection to a text file (or multiple text files matching a sharding pattern), with each element of the input collection encoded into its own line.
TextIO.Write.Bound<T>: A PTransform that writes a bounded PCollection to a text file (or multiple text files matching a sharding pattern), with each PCollection element being encoded into its own line.
UnboundedSource<OutputT,CheckpointMarkT extends UnboundedSource.CheckpointMark>: A Source that reads an unbounded amount of input and, because of that, supports some additional operations such as checkpointing, watermarks, and record ids.
UnboundedSource.UnboundedReader<OutputT>: A Reader that reads an unbounded amount of input.
Write: A PTransform that writes to a Sink.
Write.Bound<T>: A PTransform that writes to a Sink.
XmlSink: A Sink that outputs records as XML-formatted elements.
XmlSink.Bound<T>: A FileBasedSink that writes objects as XML elements.
XmlSink.XmlWriteOperation<T>: Sink.WriteOperation for XML Sinks.
XmlSink.XmlWriter<T>: A Sink.Writer that can write objects as XML elements.
XmlSource<T>: A source that can be used to read XML files.
Enum Summary
BigQueryIO.Write.CreateDisposition: An enumeration type for the BigQuery create disposition strings.
BigQueryIO.Write.WriteDisposition: An enumeration type for the BigQuery write disposition strings.
CompressedSource.CompressionMode: Default compression types supported by the CompressedSource.
FileBasedSink.FileBasedWriteOperation.TemporaryFileRetention: Options for handling of temporary output files.
FileBasedSource.Mode: A given FileBasedSource represents a file resource of one of these types.
TextIO.CompressionType: Possible text file compression types.
Package com.google.cloud.dataflow.sdk.io Description
Defines transforms for reading and writing common storage formats, including AvroIO, BigQueryIO, and TextIO.
The classes in this package provide Read transforms that create PCollections from existing storage:
PCollection<TableRow> inputData = pipeline.apply(
    BigQueryIO.Read.named("Read")
        .from("clouddataflow-readonly:samples.weather_stations"));
and Write transforms that persist PCollections to external storage:
PCollection<Integer> numbers = ...;
numbers.apply(TextIO.Write.named("WriteNumbers")
    .to("gs://my_bucket/path/to/numbers")
    .withCoder(TextualIntegerCoder.of()));
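The Write example names a single output prefix, while the class summaries describe output as multiple files matching a sharding pattern. As a rough illustration of how such a pattern could turn one prefix into per-shard file names, here is a minimal standalone sketch. It does not use or reproduce the SDK: the class ShardNameSketch and its methods are hypothetical, and the template string "-SSSSS-of-NNNNN" (a run of S for the shard index, a run of N for the shard count) and the ".txt" suffix are assumptions made for illustration. Consult ShardNameTemplate for the templates the SDK actually defines.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: expands a shard-name template into per-shard file names. */
final class ShardNameSketch {

    /**
     * Replaces each run of 'S' with the zero-padded shard index and each run
     * of 'N' with the zero-padded shard count. The run length sets the padding
     * width. All other template characters are copied unchanged.
     */
    static String expand(String prefix, String template, String suffix,
                         int shardIndex, int numShards) {
        StringBuilder out = new StringBuilder(prefix);
        int i = 0;
        while (i < template.length()) {
            char c = template.charAt(i);
            if (c == 'S' || c == 'N') {
                // Find the end of this run of identical placeholder letters.
                int j = i;
                while (j < template.length() && template.charAt(j) == c) {
                    j++;
                }
                int value = (c == 'S') ? shardIndex : numShards;
                out.append(String.format("%0" + (j - i) + "d", value));
                i = j;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.append(suffix).toString();
    }

    /** Returns the file names for shards 0 .. numShards - 1. */
    static List<String> allShards(String prefix, String template,
                                  String suffix, int numShards) {
        List<String> names = new ArrayList<>();
        for (int shard = 0; shard < numShards; shard++) {
            names.add(expand(prefix, template, suffix, shard, numShards));
        }
        return names;
    }

    public static void main(String[] args) {
        // Assumed template and suffix; the prefix is the one from the Write example.
        for (String name : allShards(
                "gs://my_bucket/path/to/numbers", "-SSSSS-of-NNNNN", ".txt", 3)) {
            System.out.println(name);
        }
    }
}
```

Under these assumptions, three shards yield gs://my_bucket/path/to/numbers-00000-of-00003.txt through gs://my_bucket/path/to/numbers-00002-of-00003.txt. A downstream reader would then match all of them with a file pattern rather than a single file name.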