[[["容易理解","easyToUnderstand","thumb-up"],["確實解決了我的問題","solvedMyProblem","thumb-up"],["其他","otherUp","thumb-up"]],[["難以理解","hardToUnderstand","thumb-down"],["資訊或程式碼範例有誤","incorrectInformationOrSampleCode","thumb-down"],["缺少我需要的資訊/範例","missingTheInformationSamplesINeed","thumb-down"],["翻譯問題","translationIssue","thumb-down"],["其他","otherDown","thumb-down"]],["上次更新時間:2025-09-04 (世界標準時間)。"],[[["\u003cp\u003eThe Redshift source connector in Cloud Data Fusion allows users to sync tables from an Amazon Redshift dataset to destinations like BigQuery, and supports configurable SQL queries for data import.\u003c/p\u003e\n"],["\u003cp\u003eCloud Data Fusion versions 6.9.0 and later support the Redshift source connector, allowing users to select existing reusable connections or create new one-time connections.\u003c/p\u003e\n"],["\u003cp\u003eWhen configuring the Redshift connector, users can input connection details, including JDBC driver name, host, port, username, password, and database name, or use a pre-existing connection to auto populate credentials and schema.\u003c/p\u003e\n"],["\u003cp\u003eThe connector properties include options for specifying an import query, bounding query, split column, and number of splits, allowing for detailed customization of the data import process.\u003c/p\u003e\n"],["\u003cp\u003eBest practices for using the Redshift connector include enabling IP address allowlists for security and using bounding queries to manage multi-node Redshift clusters for efficient data distribution across multiple nodes.\u003c/p\u003e\n"]]],[],null,["# Redshift batch source\n\nThis page describes how to load data from an Amazon Redshift instance into\nGoogle Cloud with Cloud Data Fusion. The Redshift source connector lets you sync\ntables from your Redshift dataset to your destination, such as\nBigQuery. The connector also lets you create a configurable SQL query.\n\nBefore you begin\n----------------\n\n- Cloud Data Fusion versions 6.9.0 and later support the Redshift source.\n- When you configure the Redshift source connector, you can select an\n existing, reusable connection, or create a new, one-time connection. For\n more information, see [Manage connections](/data-fusion/docs/how-to/managing-connections). When you reuse a connection, note the\n following:\n\n - You don't have to provide credentials.\n - The existing connection provides the schema and table name information that's used to generate the import query.\n\nConfigure the plugin\n--------------------\n\n1. [Go to the Cloud Data Fusion web interface](/data-fusion/docs/create-data-pipeline#navigate_the_web_interface)\n and click **Studio**.\n\n2. Check that **Data Pipeline - Batch** is selected (not **Realtime**).\n\n3. In the **Source** menu, click **Redshift** . The Redshift node appears in\n your pipeline. If you don't see the Redshift source on the **Studio** page,\n [deploy the Redshift source connector from the Cloud Data Fusion Hub](/data-fusion/docs/how-to/deploy-a-plugin).\n\n4. To configure the source, go to the Redshift node and click **Properties**.\n\n5. Enter the following properties. For a complete list, see\n [Properties](#properties).\n\n 1. Enter a label for the Redshift node---for example, `Redshift\n tables`.\n 2. Enter the connection details. You can set up a new, one-time connection,\n or an existing, reusable connection.\n\n ### New connection\n\n\n To add a one-time connection to Redshift, follow these steps:\n 1. Keep **Use connection** turned off.\n 2. In the **JDBC driver name** field, enter the name of the driver. Redshift supports two types of JDBC drivers: CData and [Amazon](https://docs.aws.amazon.com/redshift/latest/mgmt/configuring-connections.html). For more information, see [Upload a JDBC driver](/data-fusion/docs/how-to/using-jdbc-drivers).\n 3. In the **Host** field, enter the endpoint of the Redshift cluster---for example, `cdf-redshift-new.example-endpoint.eu-west-1.redshift.amazonaws.com`.\n 4. Optional: In the **Port** field, enter a database port number---for example, `5439`.\n 5. If your Redshift database requires authentication, do the\n following:\n\n 1. In the **Username** field, enter the name for the database.\n 2. In the **Password** field, enter the password for the database.\n 3. Optional: In the **Arguments** field, enter key value arguments. To use the CData driver, provide the connection arguments, such as RTK or OEMKey, if applicable.\n 4. In the **Name** field, enter a name---for example, `SN-PC-Source-01-01-2024`.\n 5. Enter the target database name in the **Database** field---for example, `datafusiondb`.\n\n ### Reusable connection\n\n\n To reuse an existing connection, follow these steps:\n 1. Turn on **Use connection**.\n 2. Click **Browse connections**.\n 3. Click the connection name.\n\n | **Note:** For more information about adding, importing, and editing the connections that appear when you browse connections, see [Manage connections](/data-fusion/docs/how-to/managing-connections).\n 4. Optional: If a connection doesn't exist and you want to create a\n new reusable connection, click **Add connection** and refer to the\n steps in the **New connection** tab on this\n page.\n\n 3. In the **Import query** field, enter a query using the schema and table\n names from your Redshift source---for example, `Select * from\n \"public\".\"users\"`.\n\n 4. Optional: Enter **Advanced** properties, such as a bounding\n query or number of splits. For all property descriptions, see\n [Properties](#properties).\n\n6. Optional: Click **Validate** and address any errors found.\n\n7. Click\n close\n **Close**. Properties are saved and you can continue to build your data\n pipeline in the Cloud Data Fusion web interface.\n\nProperties\n----------\n\nData type mappings\n------------------\n\nThe following table is a list of Redshift data types with corresponding CDAP\ntypes:\n\nBest practices\n--------------\n\nThe following best practices apply when you connect to a Redshift cluster from\nGoogle Cloud.\n\n### Use IP address allowlists\n\nTo prevent access from unauthorized sources and restrict access to specific IP\naddresses, enable access controls on the Redshift cluster.\n\nIf you use Redshift access controls, to access the cluster in\nCloud Data Fusion, follow these steps:\n\n1. Obtain the external IP addresses of the services or machines on Google Cloud that must connect to the Redshift cluster, such as the Proxy Server IP (see [Viewing IP\n addresses](/compute/docs/instances/view-network-properties#view_ip_addresses)). For Dataproc clusters, obtain the IP addresses of all master and child nodes.\n2. Add the IP addresses to an allowlist in the security groups by creating\n the inbound rules for the Google Cloud machine IP addresses.\n\n3. Add the connection properties in Wrangler and test them:\n\n 1. Open the Cloud Data Fusion instance in the web interface.\n 2. Click **Wrangler \\\u003e Add connection** and create the new connection for Redshift.\n 3. Enter all connection properties.\n 4. Click **Test connection** and resolve any issues.\n\n### To create multiple splits, use bounding queries\n\nFor multiple splits, use bounding queries to manage the multi-node cluster. In\nscenarios where you extract data from Redshift and distribute the load uniformly\nacross each node, configure a bounding query in the Redshift source connector\nproperties.\n\n1. In your Cloud Data Fusion pipeline on the **Studio** page, go to the Redshift node and click **Properties**.\n2. In the **Advanced** properties, specify the following:\n\n 1. Enter the number of splits to create.\n 2. Enter the fetch size for each split.\n 3. Enter a bounding query to apply to the multi-node Redshift cluster.\n 4. Enter the **Split column** field name.\n\nFor example, assume you have the following use case:\n\n- You have a table that contains 10 million records.\n- It has a unique ID column called `id`.\n- The Redshift cluster has 4 nodes.\n- **Objective**: To take advantage of the cluster's potential, you plan to\n generate multiple splits. To achieve this, use the following property\n configurations:\n\n - In the **Bounding query** field, enter the following query:\n\n SELECT MIN(id), MAX(id) FROM tableName\n\n In this query, `id` is the name of the column where the splits are\n applied.\n - In the **Split column** field, enter the column name, `id`.\n\n - Enter the number of splits and fetch size. These properties are\n interconnected, letting you calculate splits based on a fetch size, or the\n other way around. For this example, enter the following.\n\n | **Important:** A larger fetch size can lead to a faster import. The tradeoff is the increased memory usage.\n - In the **Number of splits** field, enter `40`. In this example, where the\n table has ten million records, creating 40 splits results in each\n split containing 250,000 records.\n\n - In the **Fetch size** field, enter `250,000`.\n\n| **Note:** A bounding query doesn't guarantee absolute uniformity. Some splits might contain more records, based on the values in the corresponding column.\n\nWhat's next\n-----------\n\n- Look through the [Cloud Data Fusion plugins](/data-fusion/plugins)."]]