SDK and Worker Dependencies

The Apache Beam SDKs and Dataflow workers depend on common third-party components, which in turn import additional dependencies. The dependencies of the Apache Beam SDKs are preinstalled in the default Dataflow runtime environments.

Some data processing use cases benefit from using additional libraries or classes. In these cases, you might need to manage your pipeline dependencies. For more information about managing dependencies, see Manage pipeline dependencies in Dataflow.

This page contains dependency and worker package information for Apache Beam and Dataflow SDK releases:

Apache Beam 2.x SDKs

SDK for Go

Dependency information for Apache Beam SDKs for Go is listed on the Apache Beam SDK for Go dependencies page.

SDK for Java

Dependency information for Apache Beam SDKs for Java is listed on the Apache Beam SDK for Java dependencies page.

SDK for Python

Dependency information for Apache Beam SDKs for Python is listed on the Apache Beam SDK for Python dependencies page.

Worker dependencies

This section applies to Apache Beam 2.49.0 and earlier. The following tables provide information about the Python dependencies installed on the Dataflow-built workers.

Dataflow 2.x SDKs

SDK for Java

To determine whether your JAR uses a conflicting version of a library, inspect your project's dependency tree. You can generate the dependency tree with various build tools, such as Maven (for example, with the dependency:tree goal of the Maven dependency plugin).
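As a rough illustration of inspecting a dependency tree, the following Python sketch parses text in the style printed by mvn dependency:tree and reports libraries whose resolved version differs from the version you expect. The function names, the sample tree, and the expected versions are all hypothetical and are not taken from the Dataflow tables; the parser also assumes the usual group:artifact:type[:classifier]:version[:scope] coordinate layout.

```python
import re

# One Maven coordinate: group:artifact:type[:classifier]:version[:scope].
_COORD = re.compile(r"[\w.\-]+(?::[\w.\-]+){3,5}")

# Hypothetical sample in the style of `mvn dependency:tree` output.
SAMPLE_TREE = r"""
[INFO] com.example:my-app:jar:1.0
[INFO] +- com.google.guava:guava:jar:31.1-jre:compile
[INFO] |  \- com.google.code.findbugs:jsr305:jar:3.0.2:compile
[INFO] \- org.slf4j:slf4j-api:jar:1.7.36:compile
"""


def resolved_versions(tree_text):
    """Map 'groupId:artifactId' to the version shown in the tree text."""
    versions = {}
    for line in tree_text.splitlines():
        match = _COORD.search(line)
        if match is None:
            continue  # banner, blank, or summary line
        parts = match.group(0).split(":")
        # Assumption: six fields means a classifier is present,
        # which shifts the version from index 3 to index 4.
        version = parts[4] if len(parts) == 6 else parts[3]
        versions[f"{parts[0]}:{parts[1]}"] = version
    return versions


def find_mismatches(tree_text, expected):
    """Return {coordinate: (found, expected)} for versions that differ."""
    found = resolved_versions(tree_text)
    return {
        coord: (found[coord], wanted)
        for coord, wanted in expected.items()
        if coord in found and found[coord] != wanted
    }
```

To use it, you could capture the Maven output to a file, read it into a string, and pass it to find_mismatches together with a mapping built from the versions you want to compare against.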

Avoid specifying "latest" as the version in your pom.xml for the libraries in the following table. Specify an explicit version instead.
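One way to check a pom.xml for this is sketched below in Python. It assumes that "latest" refers to Maven's LATEST and RELEASE version keywords, and it only looks at dependency elements that carry an explicit version. The function names and the sample POM are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: these are the floating version keywords to flag.
FLOATING = {"LATEST", "RELEASE"}

# Hypothetical POM fragment used for illustration only.
SAMPLE_POM = """
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <dependencies>
    <dependency>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
      <version>LATEST</version>
    </dependency>
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-api</artifactId>
      <version>1.7.36</version>
    </dependency>
  </dependencies>
</project>
"""


def _local(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def floating_dependencies(pom_xml):
    """Return 'groupId:artifactId' for dependencies with a floating version."""
    root = ET.fromstring(pom_xml)
    flagged = []
    for elem in root.iter():
        if _local(elem.tag) != "dependency":
            continue
        # Collect the direct children (groupId, artifactId, version, ...).
        fields = {_local(c.tag): (c.text or "").strip() for c in elem}
        if fields.get("version", "").upper() in FLOATING:
            flagged.append(f"{fields.get('groupId')}:{fields.get('artifactId')}")
    return flagged
```

Running floating_dependencies on the contents of your own pom.xml returns the coordinates that need an explicit version; an empty list means no floating keyword was found.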

SDK for Python

Dataflow 1.x SDKs

SDK for Java