Release Notes: Dataflow SDK 2.x for Java

In January 2016, Google announced the donation of the Cloud Dataflow SDKs to the Apache Software Foundation as part of the Apache Beam project. The Dataflow SDKs are now based on Apache Beam.

The Dataflow SDK for Java 2.0.0 is the first stable 2.x release of the Dataflow SDK for Java, based on a subset of the Apache Beam code base. For information about version 1.x of the Dataflow SDK for Java, see the Dataflow SDK 1.x for Java release notes.

The support page contains information about the support status of each release of the Dataflow SDK.

To install and use the Dataflow SDK, see the Dataflow SDK installation guide.

Warning for users upgrading from Dataflow SDK for Java 1.x:
This is a new major version, and therefore comes with the following caveats.
* Breaking Changes: The Dataflow SDK 2.x for Java has a number of breaking changes from the 1.x series of releases. Please see below for details.
* Update Incompatibility: The Dataflow SDK 2.x for Java is update-incompatible with Dataflow 1.x. Streaming jobs using a Dataflow 1.x SDK cannot be updated to use a Dataflow 2.x SDK. Dataflow 2.x pipelines may only be updated across versions starting with SDK version 2.0.0.

Cloud Dataflow SDK distribution contents

The Cloud Dataflow SDK distribution contains a subset of the Apache Beam ecosystem. This subset includes the necessary components to define your pipeline and execute it locally and on the Cloud Dataflow service, such as:

  • The core SDK
  • DirectRunner and DataflowRunner
  • I/O components for other Google Cloud Platform services

The Cloud Dataflow SDK distribution does not include other Beam components, such as:

  • Runners for other distributed processing engines (such as Apache Spark or Apache Flink)
  • I/O components for non-Cloud Platform services (such as Apache Kafka)

If your use case requires any components that are not included, you can use the appropriate Apache Beam modules directly and still run your pipeline on the Cloud Dataflow service.

Release notes

This section provides each version's most relevant changes for Cloud Dataflow customers.

2.4.0 (March 30, 2018)

Version 2.4.0 is based on a subset of Apache Beam 2.4.0. See the Apache Beam 2.4.0 release announcement for additional change information.

Added new Wait.on() transform which allows general-purpose sequencing of transforms. Wait.on() is usable with both batch and streaming pipelines.
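
For illustration, a minimal sketch of Wait.on() (the collection names and the QueryAfterWriteFn DoFn are hypothetical):

// "signal" is typically the output of an earlier write; "rows" is the main input.
rows.apply(Wait.on(signal))                          // holds rows until signal's windows are complete
    .apply(ParDo.of(new QueryAfterWriteFn()));       // hypothetical DoFn that must run after the write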

Increased scalability of watching for new files with I/O transforms such as TextIO.read().watchForNewFiles(). Tests successfully read up to 1 million files.
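
For reference, a sketch of a file watch configured along these lines (the file pattern is hypothetical; the two arguments are the poll interval and the termination condition):

PCollection<String> lines = p.apply(TextIO.read()
    .from("gs://my-bucket/path/*.txt")                                        // hypothetical pattern
    .watchForNewFiles(
        Duration.standardMinutes(1),                                          // poll for new files every minute
        Watch.Growth.afterTimeSinceNewOutput(Duration.standardHours(1))));    // stop after one quiet hour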

BigQueryIO.write() now supports column-based partitioning, which enables cheaper and faster loading of historical data into a time-partitioned table. Column-based partitioning uses one load job total, instead of one load job per partition.
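
A minimal sketch, assuming the withTimePartitioning method (available since 2.2.0) and a hypothetical table and partitioning column:

rows.apply(BigQueryIO.writeTableRows()
    .to("my-project:my_dataset.events")                                        // hypothetical table
    .withSchema(schema)
    .withTimePartitioning(new TimePartitioning().setType("DAY").setField("event_date")));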

Updated SpannerIO to use Cloud Spanner's BatchQuery API. The Cloud Spanner documentation contains more information about using the Cloud Dataflow Connector.

Updated Cloud Dataflow containers to support the Java Cryptography Extension (JCE) unlimited policy.

Removed deprecated method WindowedValue.valueInEmptyWindows().

Fixed a premature data discarding issue with ApproximateUnique.

Improved several stability, performance, and documentation issues.

2.3.0 (February 28, 2018)

Version 2.3.0 is based on a subset of Apache Beam 2.3.0. See the Apache Beam 2.3.0 release announcement for additional change information.

Dropped support for Java 7.

Added template support (ValueProvider parameters) to BigtableIO.

Added new general-purpose fluent API for writing to files: FileIO.write() and FileIO.writeDynamic(). DynamicDestinations APIs in TextIO and AvroIO are now deprecated.
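
A minimal sketch of the fluent API applied to a PCollection<String> (the output location is hypothetical):

lines.apply(FileIO.<String>write()
    .via(TextIO.sink())                        // write each element as a line of text
    .to("gs://my-bucket/output/")              // hypothetical output location
    .withSuffix(".txt"));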

MapElements and FlatMapElements now support side inputs.

Fixed issues in BigQueryIO when writing to tables with partition decorators.

2.2.0 (December 8, 2017)

Version 2.2.0 is based on a subset of Apache Beam 2.2.0. See the Apache Beam 2.2.0 release notes for additional change information.

Issues

Known issue: When running in batch mode, Gauge metrics are not reported.

Known issue: SQL support is not included in this release because it is experimental and not tested on Cloud Dataflow. Using SQL is not recommended.

Updates and improvements

Added the ability to set job labels in DataflowPipelineOptions.

Added support for reading Avro GenericRecords with BigQueryIO.

Added support for multi-byte custom separators to TextIO.

Added support for watching new files (TextIO.watchForNewFiles) to TextIO.

Added support for reading a PCollection of filenames (ReadAll) to TextIO and AvroIO.

Added support for disabling validation in BigtableIO.write().

Added support for using the setTimePartitioning method in BigQueryIO.write().

Improved several stability, performance, and documentation issues.

2.1.0 (September 1, 2017)

Version 2.1.0 is based on a subset of Apache Beam 2.1.0. See the Apache Beam 2.1.0 release notes for additional change information.

Issues

Known issue: When running in batch mode, Gauge metrics are not reported.

Updates and improvements

Added detection for potentially stuck code, which results in Processing lulls log entries.

Added Metrics support for DataflowRunner in streaming mode.

Added OnTimeBehavior to WindowingStrategy to control emitting of ON_TIME panes.

Added a default file name policy for windowed FileBasedSinks that consume windowed input.

Fixed an issue in which processing time timers for expired windows were ignored.

Fixed an issue in which DatastoreIO failed to make progress when Datastore was slow to respond.

Fixed an issue in which bzip2 files were being partially read; added support for concatenated bzip2 files.

Improved several stability, performance, and documentation issues.

2.0.0 (May 23, 2017)

Version 2.0.0 is based on a subset of Apache Beam 2.0.0. See the Apache Beam 2.0.0 release notes for additional change information.

Issues

Known issue: When running in streaming mode, metrics are not reported in the Dataflow UI.

Known issue: When running in batch mode, Gauge metrics are not reported.

Updates and improvements

Added support for using the Stackdriver Error Reporting Interface.

Added new API in BigQueryIO for writing into multiple tables, possibly with different schemas, based on data. See BigQueryIO.Write.to(SerializableFunction) and BigQueryIO.Write.to(DynamicDestinations).

Added new API for writing windowed and unbounded collections to TextIO and AvroIO. For example, see TextIO.Write.withWindowedWrites() and TextIO.Write.withFilenamePolicy(FilenamePolicy).

Added TFRecordIO to read and write TensorFlow TFRecord files.

Added the ability to automatically register CoderProviders in the default CoderRegistry. CoderProviders are registered by a ServiceLoader via concrete implementations of a CoderProviderRegistrar.

Added an additional Google API requirement. You must now also enable the Cloud Resource Manager API.

Changed order of parameters for ParDo with side inputs and outputs.

Changed order of parameters for MapElements and FlatMapElements transforms when specifying an output type.

Changed the pattern for reading and writing custom types to PubsubIO and KafkaIO.

Changed the syntax for reading from and writing to TextIO, AvroIO, TFRecordIO, KinesisIO, and BigQueryIO.

Changed syntax for configuring windowing parameters other than the WindowFn itself using the Window transform.

Consolidated XmlSource and XmlSink into XmlIO.

Renamed CountingInput to GenerateSequence and unified the syntax for producing bounded and unbounded sequences.

Renamed BoundedSource#splitIntoBundles to #split.

Renamed UnboundedSource#generateInitialSplits to #split.

Output from @StartBundle is no longer possible. Instead of accepting a parameter of type Context, this method may optionally accept an argument of type StartBundleContext to access PipelineOptions.

Output from @FinishBundle now always requires an explicit timestamp and window. Instead of accepting a parameter of type Context, this method may optionally accept an argument of type FinishBundleContext to access PipelineOptions and emit output to specific windows.

XmlIO is no longer part of the SDK core. It must be added manually using the new xml-io package.

Replaced Aggregator API with the new Metrics API to create user-defined metrics/counters.


Note: All 2.0.0-beta versions are DEPRECATED.

2.0.0-beta3 (March 17, 2017)

Version 2.0.0-beta3 is based on a subset of Apache Beam 0.6.0.

Changed TextIO to only operate on strings.

Changed KafkaIO to specify type parameters explicitly.

Renamed factory functions of ToString.

Changed Count, Latest, Sample, SortValues transforms.

Renamed Write.Bound to Write.

Renamed Flatten transform classes.

Split GroupByKey.create method into create and createWithFewKeys methods.

2.0.0-beta2 (February 2, 2017)

Version 2.0.0-beta2 is based on a subset of Apache Beam 0.5.0.

Added PubsubIO functionality that allows the Read and Write transforms to provide access to Cloud Pub/Sub message attributes.

Added support for stateful pipelines via the new State API.

Added support for timers via the new Timer API. Support for this feature is limited to the DirectRunner in this release.

2.0.0-beta1 (January 7, 2017)

Version 2.0.0-beta1 is based on a subset of Apache Beam 0.4.0.

Improved compression: CompressedSource supports reading ZIP-compressed files. TextIO.Write and AvroIO.Write support compressed output.

Added functionality to AvroIO: Write supports the addition of custom user metadata.

Added functionality to BigQueryIO: Write splits large (> 12 TiB) bulk imports into multiple BigQuery load jobs, allowing it to handle very large datasets.

Added functionality to BigtableIO: Write supports unbounded PCollections and can be used in streaming mode in the DataflowRunner.

For complete details, please see the release notes for Apache Beam 0.3.0-incubating, 0.4.0, and 0.5.0.

Other Apache Beam modules from the corresponding version can be used with this distribution, including additional I/O connectors like Java Message Service (JMS), Apache Kafka, Java Database Connectivity (JDBC), MongoDB, and Amazon Kinesis. Please see the Apache Beam site for details.

Major changes from Dataflow SDK 1.x for Java

This release has a number of significant changes from the 1.x series of releases.

NOTE: All users need to read and understand these changes in order to upgrade to 2.x versions.

Package rename and restructuring

As part of generalizing Apache Beam to work well with environments beyond Google Cloud Platform, the SDK code has been renamed and restructured.

Renamed com.google.cloud.dataflow to org.apache.beam

Users affected: All | Impact: Compile error | JIRA Issue: BEAM-78

The SDK is now declared in the package org.apache.beam instead of com.google.cloud.dataflow. You need to update all your import statements with this change.

New subpackages: runners.dataflow, runners.direct, and io.gcp

Users affected: All | Impact: Compile error | JIRA Issue: BEAM-77

Runners have been reorganized into their own packages, so many things from com.google.cloud.dataflow.sdk.runners have moved into either org.apache.beam.runners.direct or org.apache.beam.runners.dataflow.

Pipeline options specific to running on the Dataflow service have moved from com.google.cloud.dataflow.sdk.options to org.apache.beam.runners.dataflow.options.

Most I/O connectors to Google Cloud Platform services have been moved into subpackages. For example, BigQueryIO has moved from com.google.cloud.dataflow.sdk.io to org.apache.beam.sdk.io.gcp.bigquery.

Most IDEs will be able to help identify the new locations. To verify the new location of specific files, you can press t on GitHub to search the code in the repository. The Dataflow SDK 1.x for Java releases are built from the GoogleCloudPlatform/DataflowJavaSDK repository (master-1.x branch). The Dataflow SDK 2.x for Java releases correspond to code from the apache/beam repository.

Runners

Removed Pipeline from Runner names

Users affected: All | Impact: Compile error | JIRA Issue: BEAM-1185

The names of all Runners have been shortened by removing Pipeline from the names. For example, DirectPipelineRunner is now DirectRunner, and DataflowPipelineRunner is now DataflowRunner.

Require setting --tempLocation to a Google Cloud Storage path

Users affected: All | Impact: Compile error | JIRA Issue: BEAM-430

Previously, you could specify only one of --stagingLocation or --tempLocation and Dataflow would infer the other. The Dataflow service now requires --gcpTempLocation to be set to a Google Cloud Storage path; it can be inferred from the more general --tempLocation and, unless overridden, is also used for --stagingLocation.
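
For example, a sketch that sets the temp location programmatically (the bucket name is hypothetical); the same value can be passed as --tempLocation on the command line:

DataflowPipelineOptions options =
    PipelineOptionsFactory.fromArgs(args).withValidation().as(DataflowPipelineOptions.class);
options.setRunner(DataflowRunner.class);
options.setTempLocation("gs://my-bucket/temp");   // hypothetical bucket; also used for staging unless overridden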

DirectRunner supports unbounded PCollections

Users affected: All | Impact: Compile error | JIRA Issue: BEAM-243

The DirectRunner continues to run on a user's local machine, but now additionally supports multithreaded execution, unbounded PCollections, and triggers for speculative and late outputs. It more closely aligns to the documented Beam model, and may (correctly) cause additional unit test failures.

As this functionality is now in the DirectRunner, the InProcessPipelineRunner (Dataflow SDK 1.6+ for Java) has been removed.

Replaced BlockingDataflowPipelineRunner with PipelineResult.waitUntilFinish()

Users affected: All | Impact: Compile error

The BlockingDataflowPipelineRunner has been removed. If your code programmatically runs a pipeline and waits until it has terminated, it should use the DataflowRunner and explicitly call pipeline.run().waitUntilFinish().

If you used --runner BlockingDataflowPipelineRunner on the command line to interactively induce your main program to block until the pipeline has terminated, then this is a concern of the main program; it should provide an option such as --blockOnRun that will induce it to call waitUntilFinish().
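
For example, a minimal sketch of a main program that blocks on completion:

PipelineResult result = pipeline.run();
result.waitUntilFinish();   // blocks until the pipeline reaches a terminal state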

Replaced TemplatingDataflowPipelineRunner with --templateLocation

Users affected: All | Impact: Compile error | JIRA Issue: BEAM-551

The functionality in TemplatingDataflowPipelineRunner (Dataflow SDK 1.9+ for Java) has been replaced by using the --templateLocation with DataflowRunner.

ParDo and DoFn

DoFns use annotations instead of method overrides

Users affected: All | Impact: Compile error | JIRA Issue: BEAM-37

In order to allow for more flexibility and customization, DoFn now uses method annotations to customize processing instead of requiring users to override specific methods.

The differences between the new DoFn and the old are demonstrated in the following code sample. Previously (with Dataflow SDK 1.x for Java), your code would look like this:

new DoFn<Foo, Baz>() {
  @Override
  public void processElement(ProcessContext c) { … }
}

Now (with Dataflow SDK 2.x for Java), your code will look like this:

new DoFn<Foo, Baz>() {
  @ProcessElement   // <-- This is the only difference
  public void processElement(ProcessContext c) { … }
}

If your DoFn accessed ProcessContext#window(), then there is a further change. Instead of this:

public class MyDoFn extends DoFn<Foo, Baz> implements RequiresWindowAccess {
  @Override
  public void processElement(ProcessContext c) {
    … (MyWindowType) c.window() …
  }
}

you will write this:

public class MyDoFn extends DoFn<Foo, Baz> {
  @ProcessElement
  public void processElement(ProcessContext c, MyWindowType window) {
    … window …
  }
}

or:

return new DoFn<Foo, Baz>() {
  @ProcessElement
  public void processElement(ProcessContext c, MyWindowType window) {
    … window …
  }
}

The runtime will automatically provide the window to your DoFn.

DoFns are reused across multiple bundles

Users affected: All | Impact: Compile error | JIRA Issue: BEAM-378

In order to allow for performance improvements, the same DoFn may now be reused to process multiple bundles of elements, instead of guaranteeing a fresh instance per bundle. Any DoFn that keeps local state (e.g. instance variables) beyond the end of a bundle may encounter behavioral changes, as the next bundle will now start with that state instead of a fresh copy.

To manage the lifecycle, new @Setup and @Teardown methods have been added. The full lifecycle is as follows (while a failure may truncate the lifecycle at any point):

  • @Setup: Per-instance initialization of the DoFn, such as opening reusable connections.
  • Any number of the sequence:
    • @StartBundle: Per-bundle initialization, such as resetting the state of the DoFn.
    • @ProcessElement: The usual element processing.
    • @FinishBundle: Per-bundle concluding steps, such as flushing side effects.
  • @Teardown: Per-instance teardown of the resources held by the DoFn, such as closing reusable connections.

Note: This change is expected to have limited impact in practice. However, it does not generate a compile-time error and has the potential to silently cause unexpected results.
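
A sketch of a DoFn that uses the full lifecycle (the connection type and helper methods are hypothetical):

new DoFn<Foo, Baz>() {
  private transient MyConnection connection;   // hypothetical reusable resource
  private List<Baz> batch;

  @Setup
  public void setup() { connection = MyConnection.open(); }       // once per DoFn instance

  @StartBundle
  public void startBundle() { batch = new ArrayList<>(); }        // reset per-bundle state

  @ProcessElement
  public void processElement(ProcessContext c) { batch.add(convert(c.element())); }

  @FinishBundle
  public void finishBundle() { connection.writeAll(batch); }      // flush side effects for the bundle

  @Teardown
  public void teardown() { connection.close(); }                  // once per DoFn instance
}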

Changed order of parameters when specifying side inputs or outputs

Users affected: All | Impact: Compile error | JIRA Issue: BEAM-1422

The DoFn now always has to be specified first when applying a ParDo. Instead of this:

foos.apply(ParDo
    .withSideInputs(sideA, sideB)
    .withOutputTags(mainTag, sideTag)
    .of(new MyDoFn()))

you will write this:

foos.apply(ParDo
    .of(new MyDoFn())
    .withSideInputs(sideA, sideB)
    .withOutputTags(mainTag, sideTag))

PTransforms

Removed .named()

Users affected: All | Impact: Compile error | JIRA Issue: BEAM-370

Removed the .named() methods from PTransforms and sub-classes. Instead, use PCollection.apply("name", PTransform).
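
For example (ExtractWordsFn is a hypothetical DoFn):

PCollection<String> words =
    lines.apply("ExtractWords", ParDo.of(new ExtractWordsFn()));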

Renamed PTransform.apply() to PTransform.expand()

Users affected: All | Impact: Compile error | JIRA Issue: BEAM-438

PTransform.apply() was renamed to PTransform.expand() to avoid confusion with PCollection.apply(). All user-written composite transforms will need to rename the overridden apply() method to expand(). There is no change to how pipelines are constructed.
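
A minimal sketch of a composite transform under the new method name (ExtractWordsFn is hypothetical):

public class CountWords
    extends PTransform<PCollection<String>, PCollection<KV<String, Long>>> {
  @Override
  public PCollection<KV<String, Long>> expand(PCollection<String> lines) {
    return lines
        .apply(ParDo.of(new ExtractWordsFn()))
        .apply(Count.perElement());
  }
}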

Additional breaking changes

The following is a list of additional breaking changes and upcoming changes.

Individual API changes

Users affected: All | Impact: Compile error | JIRA Issue: BEAM-725

Removed the following GcpOptions: TokenServerUrl, CredentialDir, CredentialId, SecretsFile, ServiceAccountName, ServiceAccountKeyFile.

Use GoogleCredentials.fromStream(InputStream) for the credential instead. The stream can contain a service account key file in JSON format from the Google Developers Console or a stored user credential in the format supported by the Cloud SDK.

Changed --enableProfilingAgent to --saveProfilesToGcs

Users affected: All | Impact: Compile error | JIRA Issue: BEAM-1122

Moved --update to DataflowPipelineOptions

Users affected: All | Impact: Compile error | JIRA Issue: BEAM-81

Moved the --update PipelineOption from DataflowPipelineDebugOptions to DataflowPipelineOptions.

Removed BoundedSource.producesSortedKeys()

Users affected: All | Impact: Compile error | JIRA Issue: BEAM-1201

Removed producesSortedKeys() from BoundedSource.

Changed PubsubIO API

Users affected: All | Impact: Compile error | JIRA Issue: BEAM-974, BEAM-1415

Starting with 2.0.0-beta2, PubsubIO.Read and PubsubIO.Write must be instantiated using PubsubIO.<T>read() and PubsubIO.<T>write() instead of the static factory methods such as PubsubIO.Read.topic(String).

Methods for configuring PubsubIO have been renamed, e.g. PubsubIO.read().topic(String) was renamed to PubsubIO.read().fromTopic(). Likewise: subscription() to fromSubscription(), timestampLabel and idLabel respectively to withTimestampAttribute and withIdAttribute, PubsubIO.write().topic() to PubsubIO.write().to().

Instead of specifying Coder for parsing the message payload, PubsubIO exposes functions for reading and writing strings, Avro messages and Protobuf messages, e.g. PubsubIO.readStrings(), PubsubIO.writeAvros(). For reading and writing custom types, use PubsubIO.read/writeMessages() (and PubsubIO.readMessagesWithAttributes if message attributes should be included) and use ParDo or MapElements to convert between your custom type and PubsubMessage.
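
For example, a sketch of reading a custom type (MyEvent and its parseFrom method are hypothetical):

PCollection<MyEvent> events = pipeline
    .apply(PubsubIO.readMessages().fromSubscription("projects/my-project/subscriptions/my-sub"))
    .apply(MapElements.into(TypeDescriptor.of(MyEvent.class))
        .via((PubsubMessage message) -> MyEvent.parseFrom(message.getPayload())));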

Removed DatastoreIO support for the unsupported v1beta2 API

Users affected: All | Impact: Compile error | JIRA Issue: BEAM-354

DatastoreIO is now based on the Cloud Datastore API v1.

Changed DisplayData.Builder

Users affected: All | Impact: Compile error | JIRA Issue: BEAM-745

DisplayData.Builder.include(..) has a new required path parameter for registering sub-component display data. Builder APIs now return a DisplayData.ItemSpec<>, rather than DisplayData.Item.

Required FileBasedSink.getWriterResultCoder()

Users affected: All | Impact: Compile error | JIRA Issue: BEAM-887

Transformed FileBasedSink.getWriterResultCoder into an abstract method, which must be provided.

Renamed Filter.byPredicate() to Filter.by()

Users affected: All | Impact: Compile error | JIRA Issue: BEAM-342

Removed IntraBundleParallelization

Users affected: All | Impact: Compile error | JIRA Issue: BEAM-414

Renamed RemoveDuplicates to Distinct

Users affected: All | Impact: Compile error | JIRA Issue: BEAM-239

Changed TextIO to use a different syntax and to only operate on strings

Users affected: All | Impact: Compile error | JIRA Issue: BEAM-1354

TextIO.Read.from() is changed to TextIO.read().from(), likewise TextIO.Write.to() is changed to TextIO.write().to().

TextIO.Read now always returns a PCollection<String> and does not take .withCoder() to parse the strings. Instead, parse the strings by applying a ParDo or MapElements to the collection. Likewise, TextIO.Write now always takes a PCollection<String>, and to write something else to TextIO, convert it to String using a ParDo or MapElements.
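
For example, a sketch that reads lines and parses them into integers (the file pattern is hypothetical):

PCollection<Integer> numbers = pipeline
    .apply(TextIO.read().from("gs://my-bucket/numbers-*.txt"))
    .apply(MapElements.into(TypeDescriptors.integers())
        .via((String line) -> Integer.parseInt(line.trim())));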

Changed AvroIO to use a different syntax

Users affected: All | Impact: Compile error | JIRA Issue: BEAM-1402

For reading and writing Avro generated types, AvroIO.Read.from().withSchema(Foo.class) is changed to AvroIO.read(Foo.class).from(), likewise AvroIO.Write.

For reading and writing Avro generic records using a specified schema, AvroIO.Read.from().withSchema(Schema or String) is changed to AvroIO.readGenericRecords().from(), likewise AvroIO.Write.

Changed KafkaIO to specify type parameters explicitly and use Kafka serializers/deserializers

Users affected: All | Impact: Compile error | JIRA Issue: BEAM-1573, BEAM-2221

In KafkaIO, you now have to specify the key and value type parameters explicitly: e.g. KafkaIO.<Foo, Bar>read() and KafkaIO.<Foo, Bar>write().

Instead of using Coder for interpreting key and value bytes, use the standard Kafka Serializer and Deserializer classes. E.g.: instead of KafkaIO.read().withKeyCoder(StringUtf8Coder.of()), use KafkaIO.read().withKeyDeserializer(StringDeserializer.class), likewise for KafkaIO.write().
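
For example, a sketch that reads string keys and values (broker and topic names are hypothetical):

pipeline.apply(KafkaIO.<String, String>read()
    .withBootstrapServers("broker-1:9092")
    .withTopic("my-topic")
    .withKeyDeserializer(StringDeserializer.class)
    .withValueDeserializer(StringDeserializer.class));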

Changed syntax for BigQueryIO

Users affected: All | Impact: Compile error | JIRA Issue: BEAM-1427

Instead of BigQueryIO.Read.from() and BigQueryIO.Write.to(), use BigQueryIO.read().from() and BigQueryIO.write().to().

Changed syntax for KinesisIO.Read

Users affected: All | Impact: Compile error | JIRA Issue: BEAM-1428

Instead of KinesisIO.Read.from().using(), use KinesisIO.read().from().withClientProvider().

Changed syntax for TFRecordIO

Users affected: All | Impact: Compile error | JIRA Issue: BEAM-1913

Instead of TFRecordIO.Read.from() and TFRecordIO.Write.to(), use TFRecordIO.read().from() and TFRecordIO.write().to().

Consolidated XmlSource and XmlSink under XmlIO

Users affected: All | Impact: Compile error | JIRA Issue: BEAM-1914

Instead of using XmlSource and XmlSink directly, use XmlIO.

E.g.: instead of Read.from(XmlSource.from()), use XmlIO.read().from(); instead of Write.to(XmlSink.writeOf()), use XmlIO.write().to().

Renamed CountingInput to GenerateSequence and generalized it

Users affected: All | Impact: Compile error | JIRA Issue: BEAM-1414

Instead of CountingInput.unbounded(), use GenerateSequence.from(0). Instead of CountingInput.upTo(n), use GenerateSequence.from(0).to(n).

Changed Count, Latest, Sample

Users affected: All | Impact: Compile error | JIRA Issue: BEAM-1417, BEAM-1421, BEAM-1423

Classes Count.PerElement, Count.PerKey, and Count.Globally are now private, so you have to use the factory functions such as Count.perElement() (whereas previously you could use new Count.PerElement()). Additionally, you can no longer call methods such as .withHotKeyFanout() directly on the result of .apply(Count.perElement()). Instead, Count exposes its combine function as Count.combineFn(), and you should apply Combine.globally(Count.combineFn()) yourself and configure the combine there.

Similar changes apply to the Latest and Sample transforms.
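
For example, a minimal sketch of counting all elements with the exposed combine function:

PCollection<Long> totalWords =
    words.apply(Combine.globally(Count.<String>combineFn()));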

Changed order of parameters for MapElements and FlatMapElements

Users affected: All | Impact: Compile error | JIRA Issue: BEAM-1418

When using MapElements or FlatMapElements with an explicit output type, the type descriptor now has to be specified first: the old .via(SerializableFunction).withOutputType(TypeDescriptor) pattern becomes, for example, FlatMapElements.into(descriptor).via(fn).
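
For example, under the new ordering:

PCollection<String> words = lines.apply(
    FlatMapElements.into(TypeDescriptors.strings())
        .via((String line) -> Arrays.asList(line.split(" "))));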

Changed Window when configuring additional parameters

Users affected: All | Impact: Compile error | JIRA Issue: BEAM-1425

When using Window to configure something other than the WindowFn itself (Window.into()), use Window.configure(). E.g.: instead of Window.triggering(...), use Window.configure().triggering(...).
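
For example, a sketch that adjusts triggering without changing the WindowFn (the trigger and lateness values are illustrative):

windowedInput.apply(Window.<MyEvent>configure()
    .triggering(AfterWatermark.pastEndOfWindow())
    .withAllowedLateness(Duration.standardMinutes(10))
    .discardingFiredPanes());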

Renamed Write.Bound to Write

Users affected: All | Impact: Compile error | JIRA Issue: BEAM-1416

Class Write.Bound is now simply Write. This matters only if you were assigning the result of Write.to(Sink) to a variable: its type used to be Write.Bound<...> and is now Write<...>.

Renamed Flatten transform classes

Users affected: All | Impact: Compile error | JIRA Issue: BEAM-1419

Classes Flatten.FlattenIterables and Flatten.FlattenPCollectionList are renamed respectively to Flatten.Iterables and Flatten.PCollections.

Split GroupByKey.create(boolean) in two methods

Users affected: All | Impact: Compile error | JIRA Issue: BEAM-1420

GroupByKey.create(boolean fewKeys) has been split into GroupByKey.create() and GroupByKey.createWithFewKeys().

Changed SortValues

Users affected: All | Impact: Compile error | JIRA Issue: BEAM-1426

BufferedExternalSorter.Options setter methods are renamed from setSomeProperty to withSomeProperty.

Additional Google API dependency

Starting with SDK version 2.0.0, you must also enable the Cloud Resource Manager API.

Dependency upgrades

This release upgrades the pinned versions of most dependencies, including Avro, protobuf, and gRPC. Some of these dependencies may have made breaking changes of their own, which may cause issues if your code also directly depends on the dependency. Versions used in 2.0.0 can be found in the pom.xml or by using mvn dependency:tree.

Internal refactorings

There have been significant changes to the internal structure of the SDK. Any users who were relying on things beyond the public API (such as classes or methods ending in Internal or in util packages) may find that things have changed significantly.

  • If you were using StateInternals or TimerInternals: These internal APIs have been removed. You may now use the experimental State and Timer APIs for DoFn.
  • If you were using other Internal APIs: Please reach out to us at dataflow-feedback@google.com to discuss your use case.
