Federated Query Functions in GoogleSQL

BigQuery supports the following federated query functions.

EXTERNAL_QUERY

EXTERNAL_QUERY('connection_id', '''external_database_query'''[, 'options'])

Description

Executes a query on an external database and returns the results as a temporary table. The external database data type is converted to a GoogleSQL data type in the temporary result table with these data type mappings.

  • external_database_query: The query to run on the external database.
  • connection_id: The ID of the connection resource. The connection resource contains settings for the connection between the external database and BigQuery. If you do not have a default project configured, prepend the project ID to the connection ID in following format:

    projects/PROJECT_ID/location/LOCATION/connections/CONNECTION_ID
    

    Replace the following:

    • PROJECT_ID: The project ID.
    • LOCATION: The location of the connection.
    • CONNECTION_ID: The connection ID.

    For example, projects/example-project/locations/us/connections/sql-bq. For more information, see Create a connection resource.

  • options: An optional string of a JSON format map with key value pairs of option name and value (both are case sensitive).

    For example:: '{"default_type_for_decimal_columns":"numeric"}'

    Supported options:

    Option Name Description
    "default_type_for_decimal_columns" Can be "float64", "numeric", "bignumeric" or "string". With this option, the MySQL Decimal type or PostgreSQL Numeric type will be mapped to the provided BigQuery type. When this option is not provided, the MySQL Decimal type or PostgreSQL Numeric type will be mapped to BigQuery NUMERIC type.
    "query_execution_priority" Can be "low", "medium" or "high". Only supported in Spanner. Specifies priority for execution of the query. Execution priority is "medium" by default.

Additional notes:

  • The EXTERNAL_QUERY function is usually used in a FROM clause.
  • You can use the EXTERNAL_QUERY() function to access metadata about the external database.
  • EXTERNAL_QUERY() won't honor the ordering of the external query result, even if your external query includes ORDER BY.

Return Data Type

BigQuery table

Examples

Suppose you need the date of the first order for each of your customers to include in a report. This data is not currently in BigQuery but is available in your operational PostgreSQL database in Cloud SQL. The following federated query example accomplishes this and includes 3 parts:

  1. Run the external query SELECT customer_id, MIN(order_date) AS first_order_date FROM orders GROUP BY customer_id in the operational PostgreSQL database to get the first order date for each customer through the EXTERNAL_QUERY() function.
  2. Join external query result table with customers table in BigQuery by customer_id.
  3. Select customer information and first order date.
SELECT
  c.customer_id, c.name, SUM(t.amount) AS total_revenue, rq.first_order_date
FROM customers AS c
INNER JOIN transaction_fact AS t ON c.customer_id = t.customer_id
LEFT OUTER JOIN
  EXTERNAL_QUERY(
    'connection_id',
    '''SELECT customer_id, MIN(order_date) AS first_order_date
       FROM orders
       GROUP BY customer_id'''
  ) AS rq
  ON rq.customer_id = c.customer_id
GROUP BY c.customer_id, c.name, rq.first_order_date;

You can use the EXTERNAL_QUERY() function to query information_schema tables to access database metadata, such as list all tables in the database or show table schema. The following example information_schema queries work in both MySQL and PostgreSQL.

-- List all tables in a database.
SELECT *
FROM
  EXTERNAL_QUERY(
    'connection_id',
    '''SELECT * FROM information_schema.tables'''
  );
-- List all columns in a table.
SELECT *
FROM
  EXTERNAL_QUERY(
    'connection_id',
    '''SELECT * FROM information_schema.columns WHERE table_name='x';'''
  );

EXTERNAL_QUERY() won't honor the ordering of the external query result, even if your external query includes ORDER BY. The following example query orders rows by customer ID in the external database, but BigQuery will not output the result rows in that order.

-- ORDER BY will not order rows.
SELECT *
FROM
  EXTERNAL_QUERY(
    'connection_id',
    '''SELECT * FROM customers AS c ORDER BY c.customer_id'''
  );

Data type mappings

When you execute a federated query, the data from the external database are converted to GoogleSQL types. Below are the data type mappings from MySQL to BigQuery and PostgreSQL to BigQuery.

Things to know about mapping:

  • Most MySQL data types can be matched to the same BigQuery data type, with a few exceptions such as decimal, timestamp, and time.
  • PostgreSQL supports many non-standard data types which are not supported in BigQuery, for example money, path, uuid, boxer, and others.
  • The numeric data types in MySQL and PostgreSQL will be mapped to BigQuery NUMERIC value by default. The BigQuery NUMERIC value range is smaller than in MySQL and PostgreSQL. It can also be mapped to BIGNUMERIC, FLOAT64, or STRING with "default_type_for_decimal_columns" in EXTERNAL_QUERY options.

Error handling

If your external query contains a data type that is unsupported in BigQuery, the query will fail immediately. You can cast the unsupported data type to a different MySQL / PostgreSQL data type that is supported. See unsupported data types for more information on how to cast.

MySQL to BigQuery type mapping

MySQL type MySQL Description BigQuery type Type difference
Integer
INT 4 bytes, 2^32 - 1 INT64
TINYINT 1 byte, 2^8 - 1 INT64
SMALLINT 2 bytes, 2^16 - 1 INT64
MEDIUMINT 3 bytes, 2^24 - 1 INT64
BIGINT 8 bytes, 2^64 - 1 INT64
UNSIGNED BIGINT 8 bytes, 2^64 - 1 NUMERIC
Exact numeric
DECIMAL (M,D) A decimal represents by (M,D) where M is the total number of digits and D is the number of decimals. M <= 65 NUMERIC, BIGNUMERIC, FLOAT64, or STRING

DECIMAL (M,D) will to mapped to NUMERIC by default, or can be mapped to BIGNUMERIC, FLOAT64, or STRING with default_type_for_decimal_columns.
Approximate numeric
FLOAT (M,D) 4 bytes, M <= 23 FLOAT64
DOUBLE (M,D) 8 bytes, M <= 53 FLOAT64
Date and time
TIMESTAMP '1970-01-01 00:00:01'UTC to '2038-01-19 03:14:07' UTC. TIMESTAMP MySQL TIMESTAMP is retrieved as UTC timezone no matter where user call BigQuery
DATETIME '1000-01-01 00:00:00' to '9999-12-31 23:59:59' DATETIME
DATE '1000-01-01' to '9999-12-31'. DATE
TIME Time in 'HH:MM:SS' format
'-838:59:59' to '838:59:59'.
TIME
BigQuery TIME range is smaller, from 00:00:00 to 23:59:59
YEAR INT64
Character and strings
ENUM string object with a value chosen from a list of permitted values STRING
CHAR (M) A fixed-length string between 1 and 255 characters STRING
VARCHAR (M) A variable-length string between 1 and 255 characters in length. STRING
TEXT A field with a maximum length of 65535 characters. STRING
TINYTEXT TEXT column with a maximum length of 255 characters. STRING
MEDIUMTEXT TEXT column with a maximum length of 16777215 characters. STRING
LONGTEXT TEXT column with a maximum length of 4294967295 characters. STRING
Binary
BLOB A binary large object with a maximum length of 65535 characters. BYTES
MEDIUM_BLOB A BLOB with a maximum length of 16777215 characters. BYTES
LONG_BLOB A BLOB with a maximum length of 4294967295 characters. BYTES
TINY_BLOB A BLOB with a maximum length of 255 characters. BYTES
BINARY A fixed-length binary string between 1 and 255 characters. BYTES
VARBINARY A variable-length binary string between 1 and 255 characters. BYTES
Other
SET when declare SET column, predefine some values. Then INSERT any set of predefined values into this column STRING
GEOMETRY GEOGRAPHY NOT YET SUPPORTED
BIT INT64 NOT YET SUPPORTED

PostgreSQL to BigQuery type mapping

Name Description BigQuery type Type difference
Integer
smallint 2 bytes, -32768 to +32767 INT64
smallserial See smallint INT64
integer 4 bytes, -2147483648 to +2147483647 INT64
serial See integer INT64
bigint 8 bytes, -9223372036854775808 to 9223372036854775807 INT64
bigserial See bigint INT64
Exact numeric
numeric [ (p, s) ] Precision up to 1,000. NUMERIC, BIGNUMERIC, FLOAT64, or STRING numeric [ (p, s) ] will to mapped to NUMERIC by default, or can be mapped to BIGNUMERIC, FLOAT64, or STRING with default_type_for_decimal_columns.
Decimal [ (p, s) ] See numeric NUMERIC See numeric
money 8 bytes, 2 digit scale, -92233720368547758.08 to +92233720368547758.07 NOT SUPPORTED
Approximate numeric
real 4 bytes, single precision floating-point number FLOAT64
double precision 8 bytes, double precision floating-point number FLOAT64
Date and time
date calendar date (year, month, day) DATE
time [ (p) ] [ without time zone ] time of day (no time zone) TIME
time [ (p) ] with time zone time of day, including time zone NOT SUPPORTED
timestamp [ (p) ] [ without time zone ] date and time (no time zone) DATETIME
timestamp [ (p) ] with time zone date and time, including time zone TIMESTAMP PostgreSQL TIMESTAMP is retrieved as UTC timezone no matter where user call BigQuery
interval A time duration NOT SUPPORTED
Character and strings
character [ (n) ] fixed-length character string STRING
character varying [ (n) ] variable-length character string STRING
text variable-length character string STRING
Binary
bytea binary data ("byte array") BYTES
bit [ (n) ] fixed-length bit string BYTES
bit varying [ (n) ] variable-length bit string BYTES
Other
boolean logical Boolean (true/false) BOOL
inet IPv4 or IPv6 host address NOT SUPPORTED
path geometric path on a plane NOT SUPPORTED
pg_lsn PostgreSQL Log Sequence Number NOT SUPPORTED
point geometric point on a plane NOT SUPPORTED
polygon closed geometric path on a plane NOT SUPPORTED
tsquery text search query NOT SUPPORTED
ts