Federated Query Functions in Standard SQL

BigQuery supports the following federated query functions.

EXTERNAL_QUERY

EXTERNAL_QUERY('connection_id', '''external_database_query''')

Description

Executes a query on an external database and returns the results as a temporary table. The external database data type is converted to a Standard SQL data type in the temporary result table with these data type mappings.

  • external_database_query: The query to run on the external database.
  • connection_id: The ID of the connection resource. The connection resource contains settings for the connection between the external database and BigQuery.

Additional notes:

  • The EXTERNAL_QUERY function is usually used in a FROM clause.
  • You can use the EXTERNAL_QUERY() function to access metadata about the external database.
  • EXTERNAL_QUERY() won't honor the ordering of the external query result, even if your external query includes ORDER BY.

Return Data Type

BigQuery table

Examples

Suppose you need the date of the first order for each of your customers to include in a report. This data is not currently in BigQuery but is available in your operational PostgreSQL database in Cloud SQL. The following federated query example accomplishes this and includes 3 parts:

  1. Run the external query SELECT customer_id, MIN(order_date) AS first_order_date FROM orders GROUP BY customer_id in the operational PostgreSQL database to get the first order date for each customer through the EXTERNAL_QUERY() function.
  2. Join external query result table with customers table in BigQuery by customer_id.
  3. Select customer information and first order date.
SELECT
  c.customer_id, c.name, SUM(t.amount) AS total_revenue, rq.first_order_date
FROM customers AS c
INNER JOIN transaction_fact AS t ON c.customer_id = t.customer_id
LEFT OUTER JOIN
  EXTERNAL_QUERY(
    'connection_id',
    '''SELECT customer_id, MIN(order_date) AS first_order_date
       FROM orders
       GROUP BY customer_id'''
  ) AS rq
  ON rq.customer_id = c.customer_id
GROUP BY c.customer_id, c.name, rq.first_order_date;

You can use the EXTERNAL_QUERY() function to query information_schema tables to access database metadata, such as list all tables in the database or show table schema. The following example information_schema queries work in both MySQL and PostgreSQL.

-- List all tables in a database.
SELECT *
FROM
  EXTERNAL_QUERY(
    'connection_id',
    '''SELECT * FROM information_schema.tables'''
  );
-- List all columns in a table.
SELECT *
FROM
  EXTERNAL_QUERY(
    'connection_id',
    '''SELECT * FROM information_schema.columns WHERE table_name='x';'''
  );

EXTERNAL_QUERY() won't honor the ordering of the external query result, even if your external query includes ORDER BY. The following example query orders rows by customer ID in the external database, but BigQuery will not output the result rows in that order.

-- ORDER BY will not order rows.
SELECT *
FROM
  EXTERNAL_QUERY(
    'connection_id',
    '''SELECT * FROM customers AS c ORDER BY c.customer_id'''
  );

Data type mappings

When you execute a federated query, the data from the external database are converted to BigQuery standard SQL types. Below are the data type mappings from MySQL to BigQuery and PostgreSQL to BigQuery.

Things to know about mapping:

  • Most MySQL data types can be matched to the same BigQuery data type, with a few exceptions such as decimal, timestamp, and time.
  • PostgreSQL supports many non-standard data types which are not supported in BigQuery, for example money, path, uuid, boxer, and others.
  • The BigQuery numeric value range is smaller than in MySQL and PostgreSQL.

Error handling

If your external query contains a data type that is unsupported in BigQuery, the query will fail immediately. You can cast the unsupported data type to a different MySQL / PostgreSQL data type that is supported. See unsupported data types for more information on how to cast.

MySQL to BigQuery type mapping

MySQL type MySQL Description BigQuery type Type difference
Integer
INT 4 bytes, 2^32 - 1 INT64
TINYINT 1 byte, 2^8 - 1 INT64
SMALLINT 2 bytes, 2^16 - 1 INT64
MEDIUMINT 3 bytes, 2^24 - 1 INT64
BIGINT 8 bytes, 2^64 - 1 INT64
UNSIGNED BIGINT 8 bytes, 2^64 - 1 NUMERIC
Exact numeric
DECIMAL (M,D) A decimal represents by (M,D) where M is the total number of digits and D is the number of decimals. M <= 65 NUMERIC

BigQuery NUMERIC range is smaller, only supports 38 decimal digits of precision and 9 decimal digits of scale.
Approximate numeric
FLOAT (M,D) 4 bytes, M <= 23 FLOAT64
DOUBLE (M,D) 8 bytes, M <= 53 FLOAT64
Date and time
TIMESTAMP '1970-01-01 00:00:01'UTC to '2038-01-19 03:14:07' UTC. TIMESTAMP MySQL TIMESTAMP is retrieved as UTC timezone no matter where user call BigQuery
DATETIME '1000-01-01 00:00:00' to '9999-12-31 23:59:59' DATETIME
DATE '1000-01-01' to '9999-12-31'. DATE
TIME Time in 'HH:MM:SS' format
'-838:59:59' to '838:59:59'.
TIME
BigQuery TIME range is smaller, from 00:00:00 to 23:59:59
YEAR INT64
Character and strings
ENUM string object with a value chosen from a list of permitted values STRING
CHAR (M) A fixed-length string between 1 and 255 characters STRING
VARCHAR (M) A variable-length string between 1 and 255 characters in length. STRING
TEXT A field with a maximum length of 65535 characters. STRING
TINYTEXT TEXT column with a maximum length of 255 characters. STRING
MEDIUMTEXT TEXT column with a maximum length of 16777215 characters. STRING
LONGTEXT TEXT column with a maximum length of 4294967295 characters. STRING
Binary
BLOB A binary large object with a maximum length of 65535 characters. BYTE
MEDIUM_BLOB A BLOB with a maximum length of 16777215 characters. BYTE
LONG_BLOB A BLOB with a maximum length of 4294967295 characters. BYTE
TINY_BLOB A BLOB with a maximum length of 255 characters. BYTE
BINARY A fixed-length binary string between 1 and 255 characters. BYTE
VARBINARY A variable-length binary string between 1 and 255 characters. BYTE
Other
SET when declare SET column, predefine some values. Then INSERT any set of predefined values into this column STRING
GEOMETRY GEOMETRY NOT YET SUPPORTED
BIT INT64 NOT YET SUPPORTED

PostgreSQL to BigQuery type mapping

Name Description BigQuery type Type difference
Integer
smallint 2 bytes, -32768 to +32767 INT64
smallserial See smallint INT64
integer 4 bytes, -2147483648 to +2147483647 INT64
serial See integer INT64
bigint 8 bytes, -9223372036854775808 to 9223372036854775807 INT64
bigserial See bigint INT64
Exact numeric
numeric [ (p, s) ] Precision up to 1,000. NUMERIC BigQuery NUMERIC range is smaller, only supports 38 decimal digits of precision and 9 decimal digits of scale.
Decimal [ (p, s) ] See numeric NUMERIC See numeric
money 8 bytes, 2 digit scale, -92233720368547758.08 to +92233720368547758.07 NOT SUPPORTED
Approximate numeric
real 4 bytes, single precision floating-point number FLOAT64
double precision 8 bytes, double precision floating-point number FLOAT64
Date and time
date calendar date (year, month, day) DATE
time [ (p) ] [ without time zone ] time of day (no time zone) TIME
time [ (p) ] with time zone time of day, including time zone NOT SUPPORTED
timestamp [ (p) ] [ without time zone ] date and time (no time zone) DATETIME
timestamp [ (p) ] with time zone date and time, including time zone TIMESTAMP PostgreSQL TIMESTAMP is retrieved as UTC timezone no matter where user call BigQuery
interval A time duration NOT SUPPORTED
Character and strings
character [ (n) ] fixed-length character string STRING
character varying [ (n) ] variable-length character string STRING
text variable-length character string STRING
Binary
bytea binary data ("byte array") BYTES
bit [ (n) ] fixed-length bit string BYTES
bit varying [ (n) ] variable-length bit string BYTES
Other
boolean logical Boolean (true/false) BOOL
inet IPv4 or IPv6 host address NOT SUPPORTED
path geometric path on a plane NOT SUPPORTED
pg_lsn PostgreSQL Log Sequence Number NOT SUPPORTED
point geometric point on a plane NOT SUPPORTED
polygon closed geometric path on a plane NOT SUPPORTED
tsquery text search query NOT SUPPORTED
tsvector text search document NOT SUPPORTED
txid_snapshot user-level transaction ID snapshot NOT SUPPORTED
uuid universally unique identifier NOT SUPPORTED
xml XML data STRING
box rectangular box on a plane NOT SUPPORTED
cidr IPv4 or IPv6 network address NOT SUPPORTED
circle circle on a plane NOT SUPPORTED
interval [ fields ] [ (p) ] time span NOT SUPPORTED
json textual JSON data STRING
jsonb binary JSON data, decomposed NOT SUPPORTED
line infinite line on a plane NOT SUPPORTED
lseg line segment on a plane NOT SUPPORTED
macaddr MAC (Media Access Control) address NOT SUPPORTED
macaddr8 MAC (Media Access Control) address (EUI-64 format) NOT SUPPORTED

Unsupported data types

If your external query contains a data type that is unsupported in BigQuery, the query will fail immediately. You can cast the unsupported data type to a different supported MySQL / PostgreSQL data type.

  • Unsupported MySQL data type

    • Error message: Invalid table-valued function external_query Found unsupported MySQL type in BigQuery. at [1:15]
    • Unsupported type: GEOMETRY, BIT
    • Resolution: Cast the unsupported data type to STRING.
    • Example: SELECT ST_AsText(ST_GeomFromText('POINT(1 1)')); This command casts the unsupported data type GEOMETRY to STRING.
  • Unsupported PostgreSQL data type

    • Error message: Invalid table-valued function external_query Postgres type (OID = 790) is not supported now at [1:15]
    • Unsupported type: money, time with time zone, inet, path, pg_lsn, point, polygon, tsquery, tsvector, txid_snapshot, uuid, box, cidr, circle, interval, jsonb, line, lseg, macaddr, macaddr8
    • Resolution: Cast the unsupported data type to STRING.
    • Example: SELECT CAST('12.34'::float8::numeric::money AS varchar(30)); This command casts the unsupported data type money to string.