Apache Beam I/O connector best practices
Data processing in Dataflow can be highly parallelized. Much of this
parallelism is handled automatically by Dataflow. I/O connectors
sit at the boundary between your pipeline and other parts of your architecture,
such as file storage, databases, and messaging systems. As such, I/O connectors
often have specific considerations for achieving parallelism.
General best practices
The following list describes general best practices for using I/O connectors in
Dataflow.
Read the Javadoc, Pydoc, or Go documentation for the connectors in your
pipeline. For more information, see
I/O connectors
in the Apache Beam documentation.
Use the latest version of the Apache Beam SDK. I/O connectors are continually improved, with new features added and known issues fixed.
When developing a pipeline, it's important to balance the parallelism of the
job. If a job has too little parallelism, it can be slow, and data can build
up in the source. However, too much parallelism can overwhelm a sink with too
many requests.
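As an illustration, here is a minimal Java sketch of one common way to cap sink-side parallelism: fixing the shard count on a file write bounds the number of concurrent writers. The bucket path is a placeholder.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;

public class ShardedWrite {
  public static void main(String[] args) {
    Pipeline pipeline =
        Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    pipeline
        .apply(Create.of("a", "b", "c"))
        .apply(
            TextIO.write()
                // Placeholder output path.
                .to("gs://my-bucket/output")
                // Capping the shard count bounds the number of concurrent
                // writers, which protects a sink from too many requests.
                // Leaving it unset lets the runner choose the parallelism,
                // which usually maximizes throughput.
                .withNumShards(10));

    pipeline.run();
  }
}
```

Connectors expose different knobs for this; check each connector's documentation for batching or sharding settings before writing custom throttling logic.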
Don't rely on the ordering of elements. In general, Dataflow
does not guarantee the order of elements in a collection.
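If downstream logic needs a deterministic order, carry that order in the data itself rather than relying on delivery order. The following is a minimal Java sketch, assuming a hypothetical Event type that carries its own sequence number; a real pipeline would also need a coder registered for Event.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.GroupByKey;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;

public class SortWithinKeys {
  // Hypothetical element type that carries its own ordering information.
  public static class Event implements Serializable {
    long sequence;
    String payload;
  }

  static PCollection<KV<String, List<Event>>> sortPerKey(
      PCollection<KV<String, Event>> events) {
    return events
        .apply(GroupByKey.create())
        .apply(
            ParDo.of(
                new DoFn<KV<String, Iterable<Event>>, KV<String, List<Event>>>() {
                  @ProcessElement
                  public void process(ProcessContext c) {
                    List<Event> sorted = new ArrayList<>();
                    c.element().getValue().forEach(sorted::add);
                    // Order comes from the data, not from the order in which
                    // Dataflow happened to deliver the elements.
                    sorted.sort(Comparator.comparingLong(e -> e.sequence));
                    c.output(KV.of(c.element().getKey(), sorted));
                  }
                }));
  }
}
```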
If an I/O connector isn't available in your SDK of choice, consider using the cross-language framework to call an I/O connector from another SDK. In addition, connectors don't always have feature parity between SDKs. If a connector from another SDK provides a feature that you need, you can use it as a cross-language transform.
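For example, the Java SDK ships an extension module (beam-sdks-java-extensions-python) whose PythonExternalTransform can invoke a Python SDK transform by its fully qualified name. The sketch below is illustrative only: the transform name and type parameters are assumptions, and a Python expansion environment must be available to the runner.

```java
import org.apache.beam.sdk.extensions.python.PythonExternalTransform;
import org.apache.beam.sdk.values.PCollection;

public class CrossLanguageExample {
  static PCollection<String> applyPythonTransform(PCollection<String> lines) {
    // Expands and runs a Python-SDK transform from a Java pipeline.
    // "my_package.MyTransform" is a hypothetical fully qualified name.
    return lines.apply(
        PythonExternalTransform
            .<PCollection<String>, PCollection<String>>from("my_package.MyTransform"));
  }
}
```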
In general, writing custom I/O connectors is challenging. Use an existing connector whenever possible. If you need to implement a custom I/O connector, read Developing a new I/O connector.
If a pipeline fails, check for errors logged by I/O connectors. See Troubleshoot Dataflow errors.
When performing writes from Dataflow to a connector, consider using
an ErrorHandler
to handle any failed writes or malformed reads. This type of error handling is
supported for the following Java I/Os in Apache Beam versions 2.55.0 and later: BigQueryIO,
BigtableIO, PubSubIO, KafkaIO, FileIO, TextIO, and AvroIO.
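A minimal Java sketch of the pattern, assuming the BadRecordErrorHandler API described in the Beam javadoc: the handler is registered on the pipeline, passed to the write, and closed before the pipeline runs (try-with-resources handles the close). The dead-letter sink and table name are placeholders, and other write configuration, such as the schema, is elided.

```java
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.transforms.errorhandling.BadRecord;
import org.apache.beam.sdk.transforms.errorhandling.ErrorHandler.BadRecordErrorHandler;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PDone;

public class DeadLetterWrite {
  static void writeWithDeadLetter(
      Pipeline pipeline,
      PCollection<TableRow> rows,
      // Hypothetical sink that persists failed records, for example to a
      // dead-letter table or bucket.
      PTransform<PCollection<BadRecord>, PDone> deadLetterSink)
      throws Exception {
    // The handler must be closed before pipeline.run(); try-with-resources
    // takes care of that.
    try (BadRecordErrorHandler<PDone> errorHandler =
        pipeline.registerBadRecordErrorHandler(deadLetterSink)) {
      rows.apply(
          BigQueryIO.writeTableRows()
              .to("my-project:my_dataset.my_table") // placeholder destination
              .withErrorHandler(errorHandler));
    }
  }
}
```

With this pattern, failed writes are routed to the dead-letter sink as BadRecord elements rather than failing the bundle, so they can be inspected or replayed later.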
Best practices for individual I/O connectors
Separate topics describe best practices for individual I/O connectors.
Google-supported I/O connectors
The following table lists the Apache Beam I/O connectors supported by Dataflow. For a full list of Apache Beam I/O connectors, including those developed by the Apache Beam community and supported by other runners, see I/O connectors in the Apache Beam documentation.
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Hard to understand","hardToUnderstand","thumb-down"],["Incorrect information or sample code","incorrectInformationOrSampleCode","thumb-down"],["Missing the information/samples I need","missingTheInformationSamplesINeed","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2025-08-26 UTC."],[[["\u003cp\u003eDataflow handles parallelism automatically, but I/O connectors require specific considerations for optimal parallel performance when interacting with external systems.\u003c/p\u003e\n"],["\u003cp\u003eUsing the latest version of the Apache Beam SDK is advised to leverage ongoing improvements in features and fixes for I/O connectors.\u003c/p\u003e\n"],["\u003cp\u003eBalancing job parallelism is crucial; too little can cause delays and data buildup, while too much can overwhelm data sinks with excess requests.\u003c/p\u003e\n"],["\u003cp\u003eDataflow does not guarantee the order of elements in a collection, so ordering should not be relied upon.\u003c/p\u003e\n"],["\u003cp\u003eAn ErrorHandler can be utilized when writing data to a connector to manage failed writes or malformed reads, and it is supported for several Java I/Os in Apache Beam versions 2.55.0 and later.\u003c/p\u003e\n"]]],[],null,["# Apache Beam I/O connector best practices\n\nData processing in Dataflow can be highly parallelized. Much of this\nparallelism is handled automatically by Dataflow. I/O connectors\nsit at the boundary between your pipeline and other parts of your architecture,\nsuch as file storage, databases, and messaging systems. As such, I/O connectors\noften have specific considerations for achieving parallelism.\n\nGeneral best practices\n----------------------\n\nThe following list describes general best practices for using I/O connectors in\nDataflow.\n\n- Read the Javadoc, Pydoc, or Go documentation for the connectors in your\n pipeline. For more information, see\n [I/O connectors](https://beam.apache.org/documentation/io/connectors/)\n in the Apache Beam documentation.\n\n- Use the latest version of the Apache Beam SDK. I/O connectors are\n continually being improved, adding features and fixing known issues.\n\n- When developing a pipeline, it's important to balance the parallelism of the\n job. If a job has too little parallelism, it can be slow, and data can build\n up in the source. However, too much parallelism can overwhelm a sink with too\n many requests.\n\n- Don't rely on the ordering of elements. In general, Dataflow\n does not guarantee the order of elements in a collection.\n\n- If an I/O connector isn't available in your SDK of choice, consider using the\n [cross-language framework](https://beam.apache.org/documentation/programming-guide/#use-x-lang-transforms)\n to use an I/O connector from another SDK. In addition, connectors don't always\n have feature parity between SDKs. If a connector from another SDK provides a\n feature that you need, you can use it as a cross-language transform.\n\n- In general, writing custom I/O connectors is challenging. Use an existing\n connector whenever possible. If you need to implement a custom I/O connector,\n read\n [Developing a new I/O connector](https://beam.apache.org/documentation/io/developing-io-overview/).\n\n- If a pipeline fails, check for errors logged by I/O connectors. 
See\n [Troubleshoot Dataflow errors](/dataflow/docs/guides/common-errors).\n\n- When performing writes from Dataflow to a connector, consider using\n an [ErrorHandler](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/errorhandling/ErrorHandler.html)\n to handle any failed writes or malformed reads. This type of error handling is\n supported for the following Java I/Os in Apache Beam versions 2.55.0 and later: BigQueryIO,\n BigtableIO, PubSubIO, KafkaIO, FileIO, TextIO, and AvroIO.\n\nBest practices for individual I/O connectors\n--------------------------------------------\n\nThe following topics list best practices for individual I/O connectors:\n\nGoogle-supported I/O connectors\n-------------------------------\n\nThe following table lists the Apache Beam I/O connectors supported by\nDataflow. For a full list of Apache Beam I/O connectors,\nincluding those developed by the Apache Beam community and supported\nby other runners, see\n[I/O connectors](https://beam.apache.org/documentation/io/connectors/)\nin the Apache Beam documentation.\n\nWhat's next\n-----------\n\n- Read the Apache Beam documentation for [I/O connectors](https://beam.apache.org/documentation/io/connectors/)."]]