This page describes how to connect a private Cloud Data Fusion instance to a private Cloud SQL instance that hosts a MySQL database. A similar setup works for Postgres and SQL Server instances. Public connections are possible.
Recommended: For security reasons, use private instances of Cloud SQL and Cloud Data Fusion.
Create private instances in Cloud Data Fusion and Cloud SQL
To connect to a private Cloud SQL instance from a private Cloud Data Fusion instance, you use a proxy Compute Engine VM. A proxy is required because the Cloud SQL network is not directly peered with the Cloud Data Fusion network, and transitive peers cannot communicate with each other (see the VPC Network Peering overview).
To create the private instances, follow these steps:
If you don't already have one, create a private Cloud Data Fusion instance. This includes:
- Setting up a VPC network
- Allocating an IP range
Create a private Cloud SQL instance in the same VPC network as your Cloud Data Fusion instance.
After the instance is created, go to the Cloud SQL Instances page and expand the Instance details. In the Connect to this instance section, copy the IP and the connection name.
Create a private Compute Engine VM
To create the VM, you can enter the following commands in Cloud Shell or another environment where the Google Cloud SDK is installed.
The commands create a VM with both internal and public IP addresses
(--no-address
isn't specified in the sample VM creation command) so that the
VM can access the Container Registry to get the proxy image. You can use internal
IPs, but you also must
enable Private Google Access for
the subnet where the VM is located.
Export the following environmental variables:
export PROJECT=CUSTOMER_PROJECT export REGION=VM_REGION export ZONE=`gcloud compute zones list --filter="name=${REGION}" --limit 1 --uri --project=${PROJECT}| sed 's/.*\///'` export NETWORK=CUSTOMER_VPC_NETWORK_NAME export SUBNET=CUSTOMER_VPC_SUBNET_NAME export INSTANCE_NAME=COMPUTE_ENGINE_VM_NAME export SQL_CONN=SQL_INSTANCE_CONNECTION_NAME export CDF_IP_RANGE=CLOUD_DATA_FUSION_IP_RANGE export VM_IMAGE=$(gcloud compute images list --project=$PROJECT --filter=family:cos-stable --format='value(selfLink.scope())' --limit=1) export SQL_PORT=DB_PORT # MySQL 3306 # PostgreSQL 5432 # SQLServer 1433
Replace the following:
CUSTOMER_PROJECT
: Your project's nameVM_REGION
: The region where the Compute Engine VM is locatedCUSTOMER_VPC_NETWORK_NAME
: Your network's nameCUSTOMER_VPC_SUBNET_NAME
: Your subnetwork's nameCOMPUTE_ENGINE_VM_NAME
: The name of the Compute Engine VMSQL_INSTANCE_CONNECTION_NAME
: The connection name from the previous stepCLOUD_DATA_FUSION_IP_RANGE
: The IP address rangeDB_PORT
: The port number of the Cloud SQL database
Create a firewall rule to allow Cloud Data Fusion ingress traffic with the following gcloud CLI command:
gcloud compute firewall-rules create allow-private-cdf \ --allow=tcp:22,tcp:${SQL_PORT} \ --source-ranges=$CDF_IP_RANGE --network=$NETWORK --project=$PROJECT
Create the VM with the following gcloud CLI command:
gcloud compute --project=${PROJECT} instances create ${INSTANCE_NAME} \ --zone=${ZONE} \ --machine-type=g1-small \ --subnet=${SUBNET} \ --metadata=startup-script="docker run -d -p 0.0.0.0:${SQL_PORT}:${SQL_PORT} gcr.io/cloudsql-docker/gce-proxy:latest /cloud_sql_proxy -instances=${SQL_CONN}=tcp:0.0.0.0:${SQL_PORT}" \ --maintenance-policy=MIGRATE \ --scopes=https://www.googleapis.com/auth/cloud-platform \ --image=${VM_IMAGE} \ --image-project=cos-cloud
Get the VM internal IP with the following gcloud CLI command:
export IP=`gcloud compute \ --project=${PROJECT} instances describe ${INSTANCE_NAME} \ --zone ${ZONE} | grep "networkIP" | awk '{print $2}'`
Promote the VM internal IP to a static IP with the following gcloud CLI command:
gcloud compute --project=${PROJECT} addresses create mysql-proxy \ --addresses ${IP} --region ${REGION} --subnet ${SUBNET}
Get the IP to be used in Cloud Data Fusion MySQL JDBC connection string when you set up the connection:
echo ${IP}
After the VM is created with a static IP, use the IP as the host or enter it in the JDBC connection string to access the MySQL database from Cloud Data Fusion.
Access MySQL from within Cloud Data Fusion
Before you can connect to the MySQL instance from the Cloud Data Fusion instance, install the MySQL JDBC driver from the Cloud Data Fusion Hub (or directly from MySQL community downloads).
Enter the connection details for your MySQL database in Cloud Data Fusion. You can enter the details and test the connection on the Wrangler page.
Optional: Other ways to connect to MySQL
You can access MySQL in the following ways in Cloud Data Fusion:
- On the Studio page, click the MySQL source plugin and enter the connection details.
On the Studio page, instead use the Database source plugin, which has similar properties (Connection string, Username, Password, and Connection Arguments).
Other connections
Although connecting private instances is recommended, the following connections are possible:
- A public Cloud Data Fusion instance to a public Cloud SQL
instance (by using a public Cloud SQL proxy or allowlisting the
0.0.0.0/0
range in an authorized network). See the public IP connection option for Cloud SQL. - A private Cloud Data Fusion instance and public Cloud SQL instance (by using a public Cloud SQL proxy in an authorized network).
A public Cloud Data Fusion instance and a private Cloud SQL instance.