Overview
To use Datastream to create a stream from the source database to the destination, you must establish connectivity to the source database.
Datastream supports the IP allowlist, forward SSH tunnel, and VPC peering network connectivity methods.
Use the information in the following table to decide which method works best for your workload.
Networking method | Description | Advantages | Disadvantages |
---|---|---|---|
IP allowlist | Works by configuring the source database server to allow incoming connections from Datastream's external IP addresses. To find out the IP addresses for your regions, see IP allowlists and regions. | | |
Forward SSH tunnel | Establish an encrypted connection over public networks between Datastream and the source, through a forward-SSH tunnel. Learn more about SSH tunnels. | | |
VPC peering | Works by creating a private connectivity configuration. Datastream uses this configuration to communicate with the data source over a private network. This communication happens through a Virtual Private Cloud (VPC) peering connection. | | |
Configure connectivity using IP allowlists
For Datastream to transfer data from a source database to a destination, Datastream first needs to connect to this database.
One way to configure this connectivity is through IP allowlists. Public IP connectivity is most appropriate when the source database is external to Google Cloud and has an externally accessible IPv4 address and TCP port.
If your source database is external to Google Cloud, then add Datastream's public IP addresses as an inbound firewall rule on the source network. In generic terms (your specific network settings may differ), do the following:
Open your source database machine's network firewall rules.
Create an inbound rule.
Set the source IP addresses of the rule to Datastream's IP addresses.
Set the protocol to TCP.
Set the port associated with the TCP protocol. The default values are:
- 1521 for an Oracle database
- 3306 for a MySQL database
- 5432 for a PostgreSQL database
- 1433 for a SQL Server database
Save the firewall rule, and then exit.
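As one possible illustration of the generic steps above, the following sketch prints (but does not run) inbound rules for a Linux host that happens to use iptables. The function names default_port and print_allow_rules are hypothetical helpers, and the 203.0.113.x addresses are placeholders from a documentation-only range, not Datastream's addresses. Your firewall product and its syntax may differ.

```shell
# Default listener ports for each supported source, as listed in the steps above.
default_port() {
  case "$1" in
    oracle)     echo 1521 ;;
    mysql)      echo 3306 ;;
    postgresql) echo 5432 ;;
    sqlserver)  echo 1433 ;;
    *) echo "unknown database type: $1" >&2; return 1 ;;
  esac
}

# Usage: print_allow_rules DB_TYPE IP [IP...]
# Prints one inbound TCP allow rule per source address; nothing is applied.
print_allow_rules() {
  local port ip
  port="$(default_port "$1")" || return 1
  shift
  for ip in "$@"; do
    echo "iptables -A INPUT -p tcp -s ${ip} --dport ${port} -j ACCEPT"
  done
}

# Placeholder addresses; replace with the Datastream IP addresses for your region.
print_allow_rules mysql 203.0.113.10 203.0.113.11
```

Printing the rules first lets you review them before applying anything to a production firewall.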
Use an SSH tunnel
The following steps describe how to set up connectivity to a source database using a forward SSH tunnel.
Step 1: Choose a host on which to terminate the tunnel
The first step to set up SSH tunnel access for your database is to choose the host that will be used to terminate the tunnel. The tunnel can be terminated on either the database host itself, or on a separate host (the tunnel server).
Use the database server
Terminating the tunnel on the database server has the advantage of simplicity. There's one fewer host involved, so there are no additional machines and their associated costs. The disadvantage is that your database server might be on a protected network that can't be reached directly from the internet.
Use a tunnel server
Terminating the tunnel on a separate server has the advantage of keeping your database server inaccessible from the internet. If the tunnel server is compromised, then it's one step removed from the database server. We recommend that you remove all non-essential software and users from the tunnel server and closely monitor it with tools such as an intrusion detection system (IDS).
The tunnel server can be any Unix or Linux host that:
- Can be accessed from the internet using SSH.
- Can access the database.
Step 2: Create an IP allowlist
The second step to set up SSH tunnel access for your database is to allow network traffic to reach the tunnel server or the database host using SSH, which is generally on TCP port 22.
Allow network traffic from each of the IP addresses for the region where Datastream resources are created.
Step 3: Use the SSH tunnel
Provide the tunnel details in the connection profile configuration. For more information, see Create a connection profile.
To authenticate the SSH tunnel session, Datastream requires either the password for the tunnel account, or a unique private key. To use a unique private key, you can use OpenSSH or OpenSSL command-line tools to generate keys.
Datastream stores the private key securely as part of the Datastream connection profile configuration. You must add the public key manually to the bastion host's ~/.ssh/authorized_keys file.
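Adding the public key by hand can be sketched as follows. This is a minimal example, assuming you run it on the bastion host as the tunnel account. The function name install_pubkey and its optional second argument are hypothetical and exist only for illustration.

```shell
# Usage: install_pubkey PUBLIC_KEY_FILE [SSH_DIR]
# Appends the public key to authorized_keys unless the same line already exists.
install_pubkey() {
  local pubkey_file="$1"
  local ssh_dir="${2:-$HOME/.ssh}"
  local auth_file="${ssh_dir}/authorized_keys"
  local key

  key="$(cat "$pubkey_file")" || return 1

  # sshd typically rejects keys when these permissions are too open.
  mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1
  touch "$auth_file" && chmod 600 "$auth_file" || return 1

  # -x matches whole lines, -F treats the key as a fixed string.
  if ! grep -qxF -- "$key" "$auth_file"; then
    printf '%s\n' "$key" >> "$auth_file"
  fi
}
```

A call such as install_pubkey ./datastream_tunnel_key.pub (a placeholder filename) can be repeated safely, because the key is appended only once.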
Generate private and public keys
You can generate SSH keys using the following method:
ssh-keygen: An OpenSSH command-line tool to generate SSH key pairs.
Useful flags:
- -t: Specifies the type of key to create, for example:
  ssh-keygen -t rsa
  ssh-keygen -t ed25519
- -b: Specifies the number of bits in the key to create, for example:
  ssh-keygen -t rsa -b 2048
- -y: Reads a private OpenSSH format file and prints an OpenSSH public key to standard output.
- -f: Specifies the filename of the key file, for example:
  ssh-keygen -y [-f KEY_FILENAME]
For more information about supported flags, see OpenBSD documentation.
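Combined, the flags above might be used as in the following sketch. The function name generate_tunnel_key, the comment string datastream-tunnel, and the empty passphrase (-N "") are assumptions for illustration. Check the connection profile documentation for which key types and formats are accepted before relying on this.

```shell
# Usage: generate_tunnel_key KEY_PATH
# Creates KEY_PATH (private key) and KEY_PATH.pub (public key),
# then prints the public key to standard output.
generate_tunnel_key() {
  local key_path="$1"

  # -q: quiet, -t: key type, -f: output file,
  # -N "": empty passphrase (an assumption), -C: comment stored with the key
  ssh-keygen -q -t ed25519 -f "$key_path" -N "" -C "datastream-tunnel" || return 1

  # -y re-derives the public key from the private key file.
  ssh-keygen -y -f "$key_path"
}
```

If KEY_PATH already exists, ssh-keygen asks before overwriting it, so choose a new filename for each key pair.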
You can generate a private PEM key using the following method:
openssl genpkey: An OpenSSL command-line tool to generate a PEM private key.
Useful flags:
- -algorithm: Specifies the public key algorithm to use, for example:
  openssl genpkey -algorithm RSA
- -out: Specifies the filename to which to output the key, for example:
  openssl genpkey -algorithm RSA -out PRIVATE_KEY_FILENAME.pem
For more information about supported flags, see OpenSSL documentation.
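A PEM private key alone isn't enough for the authorized_keys file, which expects an OpenSSH-format public key. The following sketch generates the PEM key and then derives the public key with ssh-keygen -y. It assumes that your OpenSSH build can read the PEM file that openssl genpkey writes. The function name generate_pem_key and the 2048-bit key size are illustrative choices.

```shell
# Usage: generate_pem_key PRIVATE_KEY_FILENAME.pem
# Writes the PEM private key, plus an OpenSSH public key next to it (.pub).
generate_pem_key() {
  local pem_path="$1"

  (
    # Restrict the new private key file to its owner.
    umask 077
    openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:2048 -out "$pem_path"
  ) || return 1

  # Derive the OpenSSH public key from the PEM private key.
  ssh-keygen -y -f "$pem_path" > "${pem_path%.pem}.pub"
}
```

The subshell keeps the umask change from leaking into the rest of your session.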
Use private connectivity
Private connectivity is a connection between your VPC network and Datastream's private network, enabling Datastream to communicate with internal resources by using internal IP addresses. Using private connectivity establishes a dedicated connection on the Datastream network, meaning no other customers can share it.
If your source database is external to Google Cloud, then private connectivity enables Datastream to communicate with your database over VPN or Interconnect.
After a private connectivity configuration is created, a single configuration can service all streams in a project within a single region.
At a high level, establishing private connectivity requires:
- An existing Virtual Private Cloud (VPC)
- An available IP range with a CIDR block of /29
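To get a sense of how small a /29 range is, the number of addresses in an IPv4 CIDR block is 2 raised to the power of (32 minus the prefix length). The helper name cidr_size below is hypothetical and the snippet is only an arithmetic sketch.

```shell
# Usage: cidr_size PREFIX_LENGTH
# Prints the number of IPv4 addresses in a block with that prefix length.
cidr_size() {
  echo $(( 1 << (32 - $1) ))
}

cidr_size 29   # prints 8
```

A /29 block therefore spans 8 addresses, so it's easy to reserve without colliding with existing subnets.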
If your project is using a shared VPC, then you'll also need to enable the Datastream and Google Compute Engine APIs, as well as grant permissions to Datastream's service account on the host project.
Learn more about how to create a private connectivity configuration.
What's next
- Learn more about private connectivity.
- Learn how to create a private connectivity configuration.