PostgreSQL transfers
The BigQuery Data Transfer Service for PostgreSQL connector helps users create on-demand and recurring data transfer jobs to transfer data from their PostgreSQL instance into BigQuery. The connector supports PostgreSQL instances hosted in your on-premises environment, Cloud SQL, as well as other public cloud providers such as Amazon Web Services (AWS) and Microsoft Azure.
Limitations
PostgreSQL data transfers are subject to the following limitations:
- The maximum number of simultaneous transfer runs to a single PostgreSQL database is limited by the maximum number of concurrent connections that the database supports. Keep the number of concurrent transfer jobs below this maximum.
- Some PostgreSQL data types are mapped to the STRING type in BigQuery to avoid data loss. For example, numeric types that are defined in PostgreSQL without a precision and scale are mapped to the STRING type in BigQuery.
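To illustrate why the string mapping avoids data loss, the following standalone Python sketch (not part of the transfer service) shows that an unconstrained `numeric` value survives a round trip through a string exactly, while coercing it to a fixed-scale type would discard digits:

```python
from decimal import Decimal

# A PostgreSQL "numeric" column declared without precision and scale can
# carry more fractional digits than a fixed-scale target type allows.
raw = "1234567890.123456789012345"

# Transferred as a string, every digit survives exactly.
restored = Decimal(raw)
assert str(restored) == raw

# Coercing it to a fixed scale of 9 fractional digits (the scale of
# BigQuery's NUMERIC type) would round away the trailing digits.
rounded = restored.quantize(Decimal("1.000000000"))
assert str(rounded) == "1234567890.123456789"
```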
Before you begin
- Create a user in the PostgreSQL database.
- Verify that you have completed all the actions that are required to enable the BigQuery Data Transfer Service.
- Create a BigQuery dataset to store your data.
- Ensure you have the required roles to complete the tasks in this document.
Required roles
If you intend to set up transfer run notifications for Pub/Sub,
ensure that you have the pubsub.topics.setIamPolicy
Identity and Access Management (IAM)
permission. Pub/Sub permissions are not required if you only set up
email notifications. For more information, see
BigQuery Data Transfer Service run notifications.
To get the permissions that you need to create a PostgreSQL data transfer,
ask your administrator to grant you the
BigQuery Admin (roles/bigquery.admin) IAM role on your project.
For more information about granting roles, see Manage access to projects, folders, and organizations.
This predefined role contains the permissions required to create a PostgreSQL data transfer. To see the exact permissions that are required, expand the Required permissions section:
Required permissions
The following permissions are required to create a PostgreSQL data transfer:
- bigquery.transfers.update
- bigquery.datasets.get
You might also be able to get these permissions with custom roles or other predefined roles.
Network connections
If a public IP address is not available for the PostgreSQL database connection, you must set up a network attachment.
For detailed instructions on the required network setup, refer to the following documents:
- If you're transferring from Cloud SQL, see Configure Cloud SQL instance access.
- If you're transferring from AWS, see Set up the AWS-Google Cloud VPN and network attachment.
- If you're transferring from Azure, see Set up the Azure-Google Cloud VPN and network attachment.
Set up a PostgreSQL data transfer
Select one of the following options:
Console
Go to the Data transfers page.
Click Create transfer.
In the Source type section, for Source, select PostgreSQL.
In the Transfer config name section, for Display name, enter a name for the transfer. The transfer name can be any value that lets you identify the transfer if you need to modify it later.
In the Schedule options section, do the following:
- Select a repeat frequency. If you select the Hours, Days (default), Weeks, or Months option, you must also specify a frequency. You can also select the Custom option to create a more specific repeat frequency. If you select the On-demand option, this data transfer only runs when you manually trigger the transfer.
- If applicable, select either the Start now or Start at a set time option and provide a start date and run time.
In the Destination settings section, for Dataset, select the dataset that you created to store your data, or click Create new dataset and create one to use as the destination dataset.
In the Data source details section, do the following:
- For Network attachment, select an existing network attachment or click Create Network Attachment. For more information, see the Network connections section of this document.
- For Host, enter the hostname or IP address of the PostgreSQL database server.
- For Port number, enter the port number for the PostgreSQL database server.
- For Database name, enter the name of the PostgreSQL database.
- For Username, enter the username of the PostgreSQL user initiating the PostgreSQL database connection.
- For Password, enter the password of the PostgreSQL user initiating the PostgreSQL database connection.
For PostgreSQL objects to transfer, do one of the following:
- Click Browse to select the PostgreSQL tables that are required for the transfer, and then click Select.
- Manually enter the names of the tables in the PostgreSQL objects to transfer field.
Optional: In the Service account menu, specify a custom service account to authorize the transfer. Ensure that the service account that is used has all the necessary roles and permissions. For more information, see Transfer owner as a service account.
Optional: In the Notification options section, do the following:
- To enable email notifications, click the Email notifications toggle to the on position. When you enable this option, the transfer administrator receives an email notification when a transfer run fails.
- To configure Pub/Sub run notifications for your transfer, click the Pub/Sub notifications toggle to the on position. You can select your topic name or click Create a topic to create one.
Click Save.
bq
Enter the bq mk command and supply the transfer creation flag --transfer_config:

```shell
bq mk \
--transfer_config \
--project_id=PROJECT_ID \
--data_source=DATA_SOURCE \
--display_name=DISPLAY_NAME \
--target_dataset=DATASET \
--params='PARAMETERS'
```
Replace the following:
- PROJECT_ID (optional): your Google Cloud project ID. If the --project_id flag isn't supplied to specify a particular project, the default project is used.
- DATA_SOURCE: the data source, which is postgresql.
- DISPLAY_NAME: the display name for the data transfer configuration. The transfer name can be any value that lets you identify the transfer if you need to modify it later.
- DATASET: the target dataset for the data transfer configuration.
- PARAMETERS: the parameters for the created transfer configuration in JSON format. For example: --params='{"param":"param_value"}'. The following are the parameters for a PostgreSQL transfer:
  - networkAttachment (optional): the name of the network attachment to connect to the PostgreSQL database.
  - connector.database: the name of the PostgreSQL database.
  - connector.endpoint.host: the hostname or IP address of the database.
  - connector.endpoint.port: the port number of the database.
  - connector.authentication.username: the username of the database user.
  - connector.authentication.password: the password of the database user.
  - assets: a list of the names of the PostgreSQL tables to be transferred from the PostgreSQL database as part of the transfer.
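Quoting the --params JSON by hand is error-prone in shell scripts. As one option (an illustration, not a required step, with placeholder connection values), you can build the dictionary in Python and serialize it with json.dumps, which guarantees valid JSON:

```python
import json

# Hypothetical connection values for illustration only.
params = {
    "networkAttachment": "my-network-attachment",
    "connector.database": "DB1",
    "connector.endpoint.host": "192.168.0.1",
    "connector.endpoint.port": "5432",
    "connector.authentication.username": "User1",
    "connector.authentication.password": "ABC12345",
    "assets": ["DB1/PUBLIC/DEPARTMENT", "DB1/PUBLIC/EMPLOYEES"],
}

# json.dumps produces the single-line JSON string that --params expects.
params_arg = "--params=" + json.dumps(params)
print(params_arg)
```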
For example, the following command creates a PostgreSQL transfer called My Transfer:

```shell
bq mk \
--transfer_config \
--target_dataset=mydataset \
--data_source=postgresql \
--display_name='My Transfer' \
--params='{"assets":["DB1/PUBLIC/DEPARTMENT","DB1/PUBLIC/EMPLOYEES"],
"connector.authentication.username":"User1",
"connector.authentication.password":"ABC12345",
"connector.database":"DB1",
"connector.endpoint.host":"192.168.0.1",
"connector.endpoint.port":"1520"}'
```
API
Use the projects.locations.transferConfigs.create method and supply an instance of the TransferConfig resource.
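As a sketch, a TransferConfig resource body for this connector might look like the following (all field values are placeholders; see the TransferConfig reference for the full schema):

```json
{
  "destinationDatasetId": "mydataset",
  "displayName": "My Transfer",
  "dataSourceId": "postgresql",
  "params": {
    "connector.database": "DB1",
    "connector.endpoint.host": "192.168.0.1",
    "connector.endpoint.port": "5432",
    "connector.authentication.username": "User1",
    "connector.authentication.password": "ABC12345",
    "assets": ["DB1/PUBLIC/DEPARTMENT", "DB1/PUBLIC/EMPLOYEES"]
  }
}
```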
Data type mapping
The following table maps PostgreSQL data types to the corresponding BigQuery data types.
| PostgreSQL data type | BigQuery data type |
|---|---|
| bigint | INTEGER |
| bigserial | INTEGER |
| bit(n) | STRING |
| bit varying(n) | STRING |
| boolean | BOOLEAN |
| bytea | BYTES |
| character | STRING |
| character varying | STRING |
| double precision | FLOAT |
| integer | INTEGER |
| numeric(precision, scale)/decimal(precision, scale) | NUMERIC |
| real | FLOAT |
| smallint | INTEGER |
| smallserial | INTEGER |
| serial | INTEGER |
| text | STRING |
| date | DATE |
| time [ (p) ] [ without time zone ] | TIMESTAMP |
| time [ (p) ] with time zone | TIMESTAMP |
| timestamp [ (p) ] [ without time zone ] | TIMESTAMP |
| timestamp [ (p) ] with time zone | TIMESTAMP |
| xml | STRING |
| tsquery | STRING |
| tsvector | STRING |
| uuid | STRING |
| box | STRING |
| cidr | STRING |
| circle | STRING |
| inet | STRING |
| interval | STRING |
| json | STRING |
| jsonb | STRING |
| line | STRING |
| lseg | STRING |
| macaddr | STRING |
| macaddr8 | STRING |
| money | STRING |
| path | STRING |
| point | STRING |
| polygon | STRING |
Troubleshoot
If you are having issues setting up your data transfer, see PostgreSQL transfer issues.
What's next
- For an overview of the BigQuery Data Transfer Service, see What is BigQuery Data Transfer Service?.
- For information on using transfers, including getting information about a transfer configuration, listing transfer configurations, and viewing a transfer's run history, see Manage transfers.
- Learn how to load data with cross-cloud operations.