GoogleSQL for BigQuery supports the following federated query functions.
Function list
Name | Summary |
---|---|
EXTERNAL_QUERY
|
Executes a query on an external database and returns the results as a temporary table. |
EXTERNAL_QUERY
EXTERNAL_QUERY('connection_id', '''external_database_query'''[, 'options'])
Description
Executes a query on an external database and returns the results as a temporary table. The external database data type is converted to a GoogleSQL data type in the temporary result table with these data type mappings.
external_database_query
: The query to run on the external database.connection_id
: The ID of the connection resource. The connection resource contains settings for the connection between the external database and BigQuery. If you do not have a default project configured, prepend the project ID to the connection ID in following format:projects/PROJECT_ID/locations/LOCATION/connections/CONNECTION_ID
Replace the following:
- PROJECT_ID: The project ID.
- LOCATION: The location of the connection.
- CONNECTION_ID: The connection ID.
For example,
projects/example-project/locations/us/connections/sql-bq
.options
: An optional string of a JSON format map with key value pairs of option name and value (both are case sensitive).For example:
'{"default_type_for_decimal_columns":"numeric"}'
Supported options:
Option Name Description "default_type_for_decimal_columns" Can be "float64", "numeric", "bignumeric" or "string". With this option, the MySQL Decimal type or PostgreSQL Numeric type will be mapped to the provided BigQuery type. When this option is not provided, the MySQL Decimal type or PostgreSQL Numeric type will be mapped to BigQuery NUMERIC type. "query_execution_priority" Can be "low", "medium" or "high". Only supported in Spanner. Specifies priority for execution of the query. Execution priority is "medium" by default.
Additional notes:
- The
EXTERNAL_QUERY
function is usually used in aFROM
clause. - You can use the
EXTERNAL_QUERY()
function to access metadata about the external database. EXTERNAL_QUERY()
won't honor the ordering of the external query result, even if your external query includesORDER BY
.
Return Data Type
BigQuery table
Examples
Suppose you need the date of the first order for each of your customers to include in a report. This data is not currently in BigQuery but is available in your operational PostgreSQL database in . The following federated query example accomplishes this and includes 3 parts:
- Run the external query
SELECT customer_id, MIN(order_date) AS first_order_date FROM orders GROUP BY customer_id
in the operational PostgreSQL database to get the first order date for each customer through theEXTERNAL_QUERY()
function. - Join external query result table with customers table in
BigQuery by
customer_id
. - Select customer information and first order date.
SELECT
c.customer_id, c.name, SUM(t.amount) AS total_revenue, rq.first_order_date
FROM customers AS c
INNER JOIN transaction_fact AS t ON c.customer_id = t.customer_id
LEFT OUTER JOIN
EXTERNAL_QUERY(
'connection_id',
'''SELECT customer_id, MIN(order_date) AS first_order_date
FROM orders
GROUP BY customer_id'''
) AS rq
ON rq.customer_id = c.customer_id
GROUP BY c.customer_id, c.name, rq.first_order_date;
You can use the EXTERNAL_QUERY()
function to query information_schema tables
to access database metadata, such as list all tables in the database or show
table schema. The following example information_schema queries work in both
MySQL and
PostgreSQL.
-- List all tables in a database.
SELECT *
FROM
EXTERNAL_QUERY(
'connection_id',
'''SELECT * FROM information_schema.tables'''
);
-- List all columns in a table.
SELECT *
FROM
EXTERNAL_QUERY(
'connection_id',
'''SELECT * FROM information_schema.columns WHERE table_name='x';'''
);
EXTERNAL_QUERY()
won't honor the ordering of the external query result, even
if your external query includes ORDER BY
. The following example query orders
rows by customer ID in the external database, but BigQuery
will not output the result rows in that order.
-- ORDER BY will not order rows.
SELECT *
FROM
EXTERNAL_QUERY(
'connection_id',
'''SELECT * FROM customers AS c ORDER BY c.customer_id'''
);
Data type mappings
When you execute a federated query, the data from the external database are converted to GoogleSQL types. Below are the data type mappings from MySQL to BigQuery and PostgreSQL to BigQuery.
Things to know about mapping:
- Most MySQL data types can be matched to the same BigQuery data
type, with a few exceptions such as
decimal
,timestamp
, andtime
. - PostgreSQL supports many non-standard data types which are not supported
in BigQuery, for example
money
,path
,uuid
,boxer
, and others. - The numeric data types in MySQL and PostgreSQL will be mapped to
BigQuery
NUMERIC
value by default. The BigQueryNUMERIC
value range is smaller than in MySQL and PostgreSQL. It can also be mapped toBIGNUMERIC
,FLOAT64
, orSTRING
with "default_type_for_decimal_columns" inEXTERNAL_QUERY
options.
Error handling
If your external query contains a data type that is unsupported in BigQuery, the query will fail immediately. You can cast the unsupported data type to a different MySQL / PostgreSQL data type that is supported. See unsupported data types for more information on how to cast.
MySQL to BigQuery type mapping
MySQL type | MySQL Description | BigQuery type | Type difference |
---|---|---|---|
Integer | |||
INT | 4 bytes, 2^32 - 1 | INT64 | |
TINYINT | 1 byte, 2^8 - 1 | INT64 | |
SMALLINT | 2 bytes, 2^16 - 1 | INT64 | |
MEDIUMINT | 3 bytes, 2^24 - 1 | INT64 | |
BIGINT | 8 bytes, 2^64 - 1 | INT64 | |
UNSIGNED BIGINT | 8 bytes, 2^64 - 1 | NUMERIC | |
Exact numeric | |||
DECIMAL (M,D) | A decimal represents by (M,D) where M is the total number of digits and D is the number of decimals. M <= 65 | NUMERIC, BIGNUMERIC, FLOAT64, or STRING |
DECIMAL (M,D) will to mapped to NUMERIC by default, or can be mapped to BIGNUMERIC, FLOAT64, or STRING with default_type_for_decimal_columns. |
Approximate numeric | |||
FLOAT (M,D) | 4 bytes, M <= 23 | FLOAT64 | |
DOUBLE (M,D) | 8 bytes, M <= 53 | FLOAT64 | |
Date and time | |||
TIMESTAMP | '1970-01-01 00:00:01'UTC to '2038-01-19 03:14:07' UTC. | TIMESTAMP | MySQL TIMESTAMP is retrieved as UTC timezone no matter where user call BigQuery |
DATETIME | '1000-01-01 00:00:00' to '9999-12-31 23:59:59' | DATETIME | |
DATE | '1000-01-01' to '9999-12-31'. | DATE | |
TIME | Time in 'HH:MM:SS' format '-838:59:59' to '838:59:59'. |
TIME |
BigQuery TIME range is smaller, from 00:00:00 to 23:59:59 |
YEAR | INT64 | ||
Character and strings | |||
ENUM | string object with a value chosen from a list of permitted values | STRING | |
CHAR (M) | A fixed-length string between 1 and 255 characters | STRING | |
VARCHAR (M) | A variable-length string between 1 and 255 characters in length. | STRING | |
TEXT | A field with a maximum length of 65535 characters. | STRING | |
TINYTEXT | TEXT column with a maximum length of 255 characters. | STRING | |
MEDIUMTEXT | TEXT column with a maximum length of 16777215 characters. | STRING | |
LONGTEXT | TEXT column with a maximum length of 4294967295 characters. | STRING | |
Binary | |||
BLOB | A binary large object with a maximum length of 65535 characters. | BYTES | |
MEDIUM_BLOB | A BLOB with a maximum length of 16777215 characters. | BYTES | |
LONG_BLOB | A BLOB with a maximum length of 4294967295 characters. | BYTES | |
TINY_BLOB | A BLOB with a maximum length of 255 characters. | BYTES | |
BINARY | A fixed-length binary string between 1 and 255 characters. | BYTES | |
VARBINARY | A variable-length binary string between 1 and 255 characters. | BYTES | |
Other | |||
SET | when declare SET column, predefine some values. Then INSERT any set of predefined values into this column | STRING |
|
GEOMETRY | GEOGRAPHY | NOT YET SUPPORTED | |
BIT | INT64 | NOT YET SUPPORTED |
PostgreSQL to BigQuery type mapping
Name | Description | BigQuery type | Type difference |
---|---|---|---|
Integer | |||
smallint | 2 bytes, -32768 to +32767 | INT64 | |
smallserial | See smallint | INT64 | |
integer | 4 bytes, -2147483648 to +2147483647 | INT64 | |
serial | See integer | INT64 | |
bigint | 8 bytes, -9223372036854775808 to 9223372036854775807 | INT64 | |
bigserial | See bigint | INT64 | |
Exact numeric | |||
numeric [ (p, s) ] | Precision up to 1,000. | NUMERIC, BIGNUMERIC, FLOAT64, or STRING | numeric [ (p, s) ] will to mapped to NUMERIC by default, or can be mapped to BIGNUMERIC, FLOAT64, or STRING with default_type_for_decimal_columns. |
Decimal [ (p, s) ] | See numeric | NUMERIC | See numeric |
money | 8 bytes, 2 digit scale, -92233720368547758.08 to +92233720368547758.07 | NOT SUPPORTED | |
Approximate numeric | |||
real | 4 bytes, single precision floating-point number | FLOAT64 | |
double precision | 8 bytes, double precision floating-point number | FLOAT64 | |
Date and time | |||
date | calendar date (year, month, day) | DATE | |
time [ (p) ] [ without time zone ] | time of day (no time zone) | TIME | |
time [ (p) ] with time zone | time of day, including time zone | NOT SUPPORTED | |
timestamp [ (p) ] [ without time zone ] | date and time (no time zone) | DATETIME | |
timestamp [ (p) ] with time zone | date and time, including time zone | TIMESTAMP | PostgreSQL TIMESTAMP is retrieved as UTC timezone no matter where user call BigQuery |
interval | A time duration | NOT SUPPORTED | |
Character and strings | |||
character [ (n) ] | fixed-length character string | STRING | |
character varying [ (n) ] | variable-length character string | STRING | |
text | variable-length character string | STRING | |
Binary | |||
bytea | binary data ("byte array") | BYTES | |
bit [ (n) ] | fixed-length bit string | BYTES | |
bit varying [ (n) ] | variable-length bit string | BYTES | |
Other | |||
boolean | logical Boolean (true/false) | BOOL | |
inet | IPv4 or IPv6 host address | NOT SUPPORTED | |
path | geometric path on a plane | NOT SUPPORTED | |
pg_lsn | PostgreSQL Log Sequence Number | NOT SUPPORTED | |
point | geometric point on a plane | NOT SUPPORTED | |
polygon | closed geometric path on a plane | NOT SUPPORTED | |
tsquery | text search query | NOT SUPPORTED | |
tsvector | text search document | NOT SUPPORTED | |
txid_snapshot | user-level transaction ID snapshot | NOT SUPPORTED | |
uuid | universally unique identifier | NOT SUPPORTED | |
xml | XML data | STRING | |
box | rectangular box on a plane | NOT SUPPORTED | |
cidr | IPv4 or IPv6 network address | NOT SUPPORTED | |
circle | circle on a plane | NOT SUPPORTED | |
interval [ fields ] [ (p) ] | time span | NOT SUPPORTED | |
json | textual JSON data | STRING | |
jsonb | binary JSON data, decomposed | NOT SUPPORTED | |
line | infinite line on a plane | NOT SUPPORTED | |
lseg | line segment on a plane | NOT SUPPORTED | |
macaddr | MAC (Media Access Control) address | NOT SUPPORTED | |
macaddr8 | MAC (Media Access Control) address (EUI-64 format) | NOT SUPPORTED |
Unsupported MySQL and PostgreSQL data types
If your external query contains a data type that is unsupported in BigQuery, the query will fail immediately. You can cast the unsupported data type to a different supported MySQL / PostgreSQL data type.
- Unsupported MySQL data type
- Error message:
Invalid table-valued function external_query Found unsupported MySQL type in BigQuery. at [1:15]
- Unsupported type:
GEOMETRY
,BIT
- Resolution: Cast the unsupported data type to STRING.
- Example:
SELECT ST_AsText(ST_GeomFromText('POINT(1 1)'));
This command casts the unsupported data typeGEOMETRY
toSTRING
.
- Error message:
- Unsupported PostgreSQL data type
- Error message:
Invalid table-valued function external_query Postgres type (OID = 790) is not supported now at [1:15]
- Unsupported type:
money, time with time zone, inet, path, pg_lsn, point, polygon, tsquery, tsvector, txid_snapshot, uuid, box, cidr, circle, interval, jsonb, line, lseg, macaddr, macaddr8
- Resolution: Cast the unsupported data type to STRING.
- Example:
SELECT CAST('12.34'::float8::numeric::money AS varchar(30));
This command casts the unsupported data typemoney
tostring
.
- Error message:
Spanner to BigQuery type mapping
When you execute a Spanner federated query, the data from Spanner is converted to GoogleSQL types.
Spanner GoogleSQL type | Spanner PostgreSQL type | BigQuery type |
---|---|---|
ARRAY |
- | ARRAY |
BOOL |
bool |
BOOL |
BYTES |
bytea |
BYTES |
DATE |
date |
DATE |
FLOAT64 |
float8 |
FLOAT64 |
INT64 |
bigint |
INT64 |
JSON |
JSONB |
JSON |
NUMERIC |
numeric * |
NUMERIC |
STRING |
varchar |
STRING |
STRUCT
|
- | Not supported for Spanner federated queries |
TIMESTAMP |
timestamptz |
TIMESTAMP with nanoseconds truncated |
* PostgreSQL numeric values with a precision that is greater than
the precision that BigQuery supports are rounded. Values
that are larger than the maximum value generate an Invalid NUMERIC value
error.
If your external query contains a data type that is unsupported for federated queries, the query fails immediately. You can cast the unsupported data type to a supported data type.
SAP Datasphere to BigQuery type mapping
When you execute a SAP Datasphere federated query, the data from SAP Datasphere is converted to the following GoogleSQL types.
SAP Datasphere type | SAP Datasphere description | BigQuery type |
---|---|---|
Integer | ||
Integer | Standard signed integer. | INT64 |
Integer64 | Signed 64-bit integer. | BIGNUMERIC |
hana.SMALLINT | Signed 16-bit integer supporting the values -32,768 to 32,767. | INT64 |
hana.TINYINT | Unsigned 8-bit integer supporting the values 0 to 255. | INT64 |
Exact numeric | ||
Decimal (p, s) | Precision (p) defines the number of total digits and can be between 1 and 38. Scale (s) defines the number of digits after the decimal point and can be between 0 and p. |
BIGNUMERIC |
DecimalFloat | Decimal floating-point number with 34 mantissa digits. | BIGNUMERIC |
hana.SMALLDECIMAL | 64-bit decimal floating-point number, where (p) can be between 1 and 16 and s can be between -369 and 368. | BIGNUMERIC |
Approximate numeric | ||
Double | Double-precision, 64-bit floating-point number. | FLOAT64 |
hana.REAL | 32-bit binary floating-point number. | FLOAT64 |
Date and time | ||
Date | Default format YYYY-MM-DD. | DATE |
Datetime | Default format YYYY-MM-DD HH24:MI:SS. | TIMESTAMP |
Time | Default format HH24:MI:SS. | TIME |
Timestamp | Default format YYYY-MM-DD HH24:MI:SS. | TIMESTAMP |
Character and strings | ||
LargeString | Variable length string of up to 2GB. | STRING |
String (n) | Variable-length Unicode string of up to 5000 characters. | STRING |
Binary | ||
Binary (n) | Variable length byte string of up to 4000 bytes. | BYTES |
LargeBinary | Variable length byte string of up to 2GB. | BYTES |
hana.BINARY (n) | Byte string of fixed length (n). | STRING |
Other | ||
Boolean | TRUE, FALSE and UNKNOWN, where UNKNOWN is a synonym of NULL. | BOOL |
UUID | Universally unique identifier encoded as a 128-bit integer. | STRING |
hana.ST_GEOMETRY | Spatial data in any form, including 0-dimensional points, lines, multi-lines, and polygons. | NOT SUPPORTED |
hana.ST_POINT | Spatial data in the form of 0-dimensional points that represents a single location in coordinate space. | NOT SUPPORTED |