Data types in Standard SQL

BigQuery supports simple data types such as integers, as well as more complex types such as ARRAY and STRUCT. This page provides an overview of each data type, including allowed values. For information on data type literals and constructors, see Lexical Structure and Syntax.

Data type properties

When storing and querying data, it is helpful to keep the following data type properties in mind:

Property Description Applies To
Nullable NULL is a valid value. All data types, with the following exceptions:
  • ARRAYs cannot be NULL.
  • NULL ARRAY elements cannot persist to a table.
Orderable Can be used in an ORDER BY clause. All data types except for:
  • ARRAY
  • STRUCT
  • GEOGRAPHY
Groupable Can generally appear in an expression following
GROUP BY, DISTINCT, and PARTITION BY.
However, PARTITION BY expressions cannot include
floating point types.
All data types except for:
  • ARRAY
  • STRUCT
  • GEOGRAPHY
Comparable Values of the same type can be compared to each other. All data types, with the following exceptions: ARRAY comparisons are not supported.

Equality comparisons for STRUCTs are supported field by field, in field order. Field names are ignored. Less than and greater than comparisons are not supported.

GEOGRAPHY comparisons are not supported. To compare GEOGRAPHY values, use ST_Equals .

All types that support comparisons can be used in a JOIN condition. See JOIN Types for an explanation of join conditions.

Parameterized data types

Syntax:

DATA_TYPE(param[, ...])

You can use parameters to specify constraints for the following data types:

  • STRING
  • BYTES
  • NUMERIC
  • BIGNUMERIC

A data type that is declared with parameters is called a parameterized data type. You can only use parameterized data types with columns and script variables. A column with a parameterized data type is a parameterized column and a script variable with a parameterized data type is a parameterized script variable. Parameterized type constraints are enforced when writing a value to a parameterized column or when assigning a value to a parameterized script variable.

A data type's parameters are not propagated in an expression, only the data type is.

Examples

-- Declare a variable with type parameters.
DECLARE x STRING(10);

-- This is a valid assignment to x.
SET x = "hello";

-- This assignment to x violates the type parameter constraint and results in an OUT_OF_RANGE error.
SET x = "this string is too long"
-- Declare variables with type parameters.
DECLARE x NUMERIC(10) DEFAULT 12345;
DECLARE y NUMERIC(5, 2) DEFAULT 123.45;

-- The variable x is treated as a NUMERIC value when read, so the result of this query
-- is a NUMERIC without type parameters.
SELECT x;

-- Type parameters are not propagated within expressions, so variables x and y are treated
-- as NUMERIC values when read and the result of this query is a NUMERIC without type parameters.
SELECT x + y;

Array type

Name Description
ARRAY Ordered list of zero or more elements of any non-ARRAY type.

An ARRAY is an ordered list of zero or more elements of non-ARRAY values. ARRAYs of ARRAYs are not allowed. Queries that would produce an ARRAY of ARRAYs will return an error. Instead a STRUCT must be inserted between the ARRAYs using the SELECT AS STRUCT construct.

Currently, BigQuery has two following limitations with respect to NULLs and ARRAYs:

  • BigQuery raises an error if query result has ARRAYs which contain NULL elements, although such ARRAYs can be used inside the query.
  • BigQuery translates NULL ARRAY into empty ARRAY in the query result, although inside the query NULL and empty ARRAYs are two distinct values.

Declaring an ARRAY type

ARRAY<T>

ARRAY types are declared using the angle brackets (< and >). The type of the elements of an ARRAY can be arbitrarily complex with the exception that an ARRAY cannot directly contain another ARRAY.

Examples

Type Declaration Meaning
ARRAY<INT64> Simple ARRAY of 64-bit integers.
ARRAY<BYTES(5)> (preview) Simple ARRAY of parameterized bytes.
ARRAY<STRUCT<INT64, INT64>> An ARRAY of STRUCTs, each of which contains two 64-bit integers.
ARRAY<ARRAY<INT64>>
(not supported)
This is an invalid type declaration which is included here just in case you came looking for how to create a multi-level ARRAY. ARRAYs cannot contain ARRAYs directly. Instead see the next example.
ARRAY<STRUCT<ARRAY<INT64>>> An ARRAY of ARRAYS of 64-bit integers. Notice that there is a STRUCT between the two ARRAYs because ARRAYs cannot hold other ARRAYs directly.

Boolean type

Name Description
BOOL Boolean values are represented by the keywords TRUE and FALSE (case insensitive).

Boolean values are sorted in this order, from least to greatest:

  1. NULL
  2. FALSE
  3. TRUE

Bytes type

Name Description
BYTES Variable-length binary data.

STRING and BYTES are separate types that cannot be used interchangeably. Most functions on STRING are also defined on BYTES. The BYTES version operates on raw bytes rather than Unicode characters. Casts between STRING and BYTES enforce that the bytes are encoded using UTF-8.

Parameterized bytes type

Parameterized Type Description
BYTES(L) Sequence of bytes with a maximum of L bytes allowed in the binary string, where L is a positive INT64 value. If a sequence of bytes has more than L bytes, throws an OUT_OF_RANGE error.

See Parameterized Data Types for more information on parameterized types and where they can be used.

Date type

Name Range
DATE 0001-01-01 to 9999-12-31.

The DATE type represents a logical calendar date, independent of time zone. A DATE value does not represent a specific 24-hour time period. Rather, a given DATE value represents a different 24-hour period when interpreted in different time zones, and may represent a shorter or longer day during Daylight Savings Time transitions. To represent an absolute point in time, use a timestamp.

Canonical format
'YYYY-[M]M-[D]D'
  • YYYY: Four-digit year
  • [M]M: One or two digit month
  • [D]D: One or two digit day

Datetime type

Name Range
DATETIME 0001-01-01 00:00:00 to 9999-12-31 23:59:59.999999

A DATETIME object represents a date and time, as they might be displayed on a calendar or clock, independent of time zone. It includes the year, month, day, hour, minute, second, and subsecond. To represent an absolute point in time, use a timestamp.

Canonical format
YYYY-[M]M-[D]D[( |T)[H]H:[M]M:[S]S[.F]]
  • YYYY: Four-digit year
  • [M]M: One or two digit month
  • [D]D: One or two digit day
  • ( |T): A space or a `T` separator
  • [H]H: One or two digit hour (valid values from 00 to 23)
  • [M]M: One or two digit minutes (valid values from 00 to 59)
  • [S]S: One or two digit seconds (valid values from 00 to 59)
  • [.F]: Up to six fractional digits (microsecond precision)

Geography type

Name Description
GEOGRAPHY A collection of points, lines, and polygons, which is represented as a point set, or a subset of the surface of the Earth.

The GEOGRAPHY type is based on the OGC Simple Features specification (SFS), and is a collection of following objects:

  • Point geography: A single location in coordinate space. A point has an x-coordinate value and a y-coordinate value, where the x-coordinate is longitude and the y-coordinate is latitude of the point on the WGS84 reference ellipsoid.
  • Linestring geography: A one-dimensional geometric object, with a sequence of points and geodesic edges between them.
  • Polygon geography: A planar surface defined by 1 exterior boundary and 0 or more interior boundaries. Each interior boundary defines a hole in the polygon. The boundary loops of polygons are oriented so that if you traverse the boundary vertices in order, the interior of the polygon is on the left.

The points, linestrings and polygons of a GEOGRAPHY value form a simple arrangement on the WGS84 reference ellipsoid. A simple arrangement is one where no point on the WGS84 surface is contained by multiple elements of the collection. If self intersections exist, they are automatically removed.

The GEOGRAPHY that contains no points, linestrings or polygons is called an empty GEOGRAPHY.

A GEOGRAPHY is the result of, or an argument to, a Geography Function.

Numeric types

Numeric types include the following types:

  • INT64
  • NUMERIC with alias DECIMAL
  • BIGNUMERIC with alias BIGDECIMAL
  • FLOAT64

Integer type

Integers are numeric values that do not have fractional components.

Name Range
INT64 -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807

Decimal types

Decimal type values are numeric values with fixed precision and scale. Precision is the number of digits that the number contains. Scale is how many of these digits appear after the decimal point.

This type can represent decimal fractions exactly, and is suitable for financial calculations.

Name Precision, Scale, and Range
NUMERIC
DECIMAL
Precision: 38
Scale: 9
Min: -9.9999999999999999999999999999999999999E+28
Max: 9.9999999999999999999999999999999999999E+28
BIGNUMERIC
BIGDECIMAL
Precision: 76.76 (the 77th digit is partial)
Scale: 38
Min: -5.7896044618658097711785492504343953926634992332820282019728792003956564819968E+38
Max: 5.7896044618658097711785492504343953926634992332820282019728792003956564819967E+38

DECIMAL is an alias for NUMERIC. BIGDECIMAL is an alias for BIGNUMERIC.

Parameterized decimal type

Parameterized Type Description
NUMERIC(P[,S])
DECIMAL(P[,S])
A NUMERIC or DECIMAL type with a maximum precision of P and maximum scale of S, where P and S are INT64 types. S is interpreted to be 0 if unspecified.

Maximum scale range: 0 ≤ S ≤ 9
Maximum precision range: max(1, S) ≤ PS + 29
BIGNUMERIC(P[, S])
BIGDECIMAL(P[, S])
A BIGNUMERIC or BIGDECIMAL type with a maximum precision of P and maximum scale of S, where P and S are INT64 types. S is interpreted to be 0 if unspecified.

Maximum scale range: 0 ≤ S ≤ 38
Maximum precision range: max(1, S) ≤ PS + 38

If a value has more than S decimal digits, the value is rounded to S decimal digits. For example, inserting the value 1.125 into a NUMERIC(5, 2) column rounds 1.125 half-up to 1.13.

If a value has more than P digits, throws an OUT_OF_RANGE error. For example, inserting 1111 into a NUMERIC(5, 2) column returns an OUT_OF_RANGE error since 1111 is larger than 999.99, the maximum allowed value in a NUMERIC(5, 2) column.

See Parameterized Data Types for more information on parameterized types and where they can be used.

Floating point type

Floating point values are approximate numeric values with fractional components.

Name Description
FLOAT64 Double precision (approximate) numeric values.

Floating point semantics

When working with floating point numbers, there are special non-numeric values that need to be considered: NaN and +/-inf

Arithmetic operators provide standard IEEE-754 behavior for all finite input values that produce finite output and for all operations for which at least one input is non-finite.

Function calls and operators return an overflow error if the input is finite but the output would be non-finite. If the input contains non-finite values, the output can be non-finite. In general functions do not introduce NaNs or +/-inf. However, specific functions like IEEE_DIVIDE can return non-finite values on finite input. All such cases are noted explicitly in Mathematical functions.

Mathematical function examples
Left Term Operator Right Term Returns
Any value + NaN NaN
1.0 + +inf +inf
1.0 + -inf -inf
-inf + +inf NaN
Maximum FLOAT64 value + Maximum FLOAT64 value Overflow error
Minimum FLOAT64 value / 2.0 0.0
1.0 / 0.0 "Divide by zero" error

Comparison operators provide standard IEEE-754 behavior for floating point input.

Comparison operator examples
Left Term Operator Right Term Returns
NaN = Any value FALSE
NaN < Any value FALSE
Any value < NaN FALSE
-0.0 = 0.0 TRUE
-0.0 < 0.0 FALSE

Floating point values are sorted in this order, from least to greatest:

  1. NULL
  2. NaN — All NaN values are considered equal when sorting.
  3. -inf
  4. Negative numbers
  5. 0 or -0 — All zero values are considered equal when sorting.
  6. Positive numbers
  7. +inf

Special floating point values are grouped this way, including both grouping done by a GROUP BY clause and grouping done by the DISTINCT keyword:

  • NULL
  • NaN — All NaN values are considered equal when grouping.
  • -inf
  • 0 or -0 — All zero values are considered equal when grouping.
  • +inf

String type

Name Description
STRING Variable-length character (Unicode) data.

Input STRING values must be UTF-8 encoded and output STRING values will be UTF-8 encoded. Alternate encodings like CESU-8 and Modified UTF-8 are not treated as valid UTF-8.

All functions and operators that act on STRING values operate on Unicode characters rather than bytes. For example, functions like SUBSTR and LENGTH applied to STRING input count the number of characters, not bytes.

Each Unicode character has a numeric value called a code point assigned to it. Lower code points are assigned to lower characters. When characters are compared, the code points determine which characters are less than or greater than other characters.

Most functions on STRING are also defined on BYTES. The BYTES version operates on raw bytes rather than Unicode characters. STRING and BYTES are separate types that cannot be used interchangeably. There is no implicit casting in either direction. Explicit casting between STRING and BYTES does UTF-8 encoding and decoding. Casting BYTES to STRING returns an error if the bytes are not valid UTF-8.

Parameterized string type

Parameterized Type Description
STRING(L) String with a maximum of L Unicode characters allowed in the string, where L is a positive INT64 value. If a string with more than L Unicode characters is assigned, throws an OUT_OF_RANGE error.

See Parameterized Data Types for more information on parameterized types and where they can be used.

Struct type

Name Description
STRUCT Container of ordered fields each with a type (required) and field name (optional).

Declaring a STRUCT type

STRUCT<T>

STRUCT types are declared using the angle brackets (< and >). The type of the elements of a STRUCT can be arbitrarily complex.

Examples

Type Declaration Meaning
STRUCT<INT64> Simple STRUCT with a single unnamed 64-bit integer field.
STRUCT<x STRING(10)> (preview) Simple STRUCT with a single parameterized string field named x.
STRUCT<x STRUCT<y INT64, z INT64>> A STRUCT with a nested STRUCT named x inside it. The STRUCT x has two fields, y and z, both of which are 64-bit integers.
STRUCT<inner_array ARRAY<INT64>> A STRUCT containing an ARRAY named inner_array that holds 64-bit integer elements.

Constructing a STRUCT

Tuple syntax

(expr1, expr2 [, ... ])

The output type is an anonymous STRUCT type with anonymous fields with types matching the types of the input expressions. There must be at least two expressions specified. Otherwise this syntax is indistinguishable from an expression wrapped with parentheses.

Examples

Syntax Output Type Notes
(x, x+y) STRUCT<?,?> If column names are used (unquoted strings), the STRUCT field data type is derived from the column data type. x and y are columns, so the data types of the STRUCT fields are derived from the column types and the output type of the addition operator.

This syntax can also be used with STRUCT comparison for comparison expressions using multi-part keys, e.g. in a WHERE clause:

WHERE (Key1,Key2) IN ( (12,34), (56,78) )

Typeless struct syntax

STRUCT( expr1 [AS field_name] [, ... ])

Duplicate field names are allowed. Fields without names are considered anonymous fields and cannot be referenced by name. STRUCT values can be NULL, or can have NULL field values.

Examples

Syntax Output Type
STRUCT(1,2,3) STRUCT<int64,int64,int64>
STRUCT() STRUCT<>
STRUCT('abc') STRUCT<string>
STRUCT(1, t.str_col) STRUCT<int64, str_col string>
STRUCT(1 AS a, 'abc' AS b) STRUCT<a int64, b string>
STRUCT(str_col AS abc) STRUCT<abc string>

Typed struct syntax

STRUCT<[field_name] field_type, ...>( expr1 [, ... ])

Typed syntax allows constructing STRUCTs with an explicit STRUCT data type. The output type is exactly the field_type provided. The input expression is coerced to field_type if the two types are not the same, and an error is produced if the types are not compatible. AS alias is not allowed on the input expressions. The number of expressions must match the number of fields in the type, and the expression types must be coercible or literal-coercible to the field types.

Examples

Syntax Output Type
STRUCT<int64>(5) STRUCT<int64>
STRUCT<date>("2011-05-05") STRUCT<date>
STRUCT<x int64, y string>(1, t.str_col) STRUCT<x int64, y string>
STRUCT<int64>(int_col) STRUCT<int64>
STRUCT<x int64>(5 AS x) Error - Typed syntax does not allow AS

Limited comparisons for STRUCT

STRUCTs can be directly compared using equality operators:

  • Equal (=)
  • Not Equal (!= or <>)
  • [NOT] IN

Notice, though, that these direct equality comparisons compare the fields of the STRUCT pairwise in ordinal order ignoring any field names. If instead you want to compare identically named fields of a STRUCT, you can compare the individual fields directly.

Time type

Name Range
TIME 00:00:00 to 23:59:59.99999.

A TIME object represents a time, as might be displayed on a watch, independent of a specific date and timezone. To represent an absolute point in time, use a timestamp.

Canonical format
[H]H:[M]M:[S]S[.DDDDDD|.F]
  • [H]H: One or two digit hour (valid values from 00 to 23)
  • [M]M: One or two digit minutes (valid values from 00 to 59)
  • [S]S: One or two digit seconds (valid values from 00 to 59)
  • [.F]: Up to six fractional digits (microsecond precision)

Timestamp type

Name Range
TIMESTAMP 0001-01-01 00:00:00 to 9999-12-31 23:59:59.999999 UTC

A TIMESTAMP object represents an absolute point in time, independent of any time zone or convention such as Daylight Savings Time with microsecond precision.

  • To represent a date as it might appear on a calendar, use a DATE object.
  • To represent a time, as it might appear on a clock, use a TIME object.
  • To represent a date and time, as they might appear on a calendar and clock, use a DATETIME object.
Canonical format
YYYY-[M]M-[D]D[( |T)[H]H:[M]M:[S]S[.F]][time zone]
  • YYYY: Four-digit year
  • [M]M: One or two digit month
  • [D]D: One or two digit day
  • ( |T): A space or a `T` separator
  • [H]H: One or two digit hour (valid values from 00 to 23)
  • [M]M: One or two digit minutes (valid values from 00 to 59)
  • [S]S: One or two digit seconds (valid values from 00 to 59)
  • [.F]: Up to six fractional digits (microsecond precision)
  • [time zone]: String representing the time zone. When a time zone is not explicitly specified, the default time zone, UTC, is used. See the time zones section for details.

Time zones

Time zones are used when parsing timestamps or formatting timestamps for display. The timestamp value itself does not store a specific time zone, nor does it change when you apply a time zone offset.

Time zones are represented by strings in one of these two canonical formats:

  • Offset from Coordinated Universal Time (UTC), or the letter Z for UTC
  • Time zone name from the tz database

Offset from Coordinated Universal Time (UTC)

(+|-)H[H][:M[M]]
Z

Examples

-08:00
-8:15
+3:00
+07:30
-7
Z

When using this format, no space is allowed between the time zone and the rest of the timestamp.

2014-09-27 12:30:00.45-8:00
2014-09-27T12:30:00.45Z

Time zone name

continent/[region/]city

Time zone names are from the tz database. For a less comprehensive but simpler reference, see the List of tz database time zones on Wikipedia.

Examples

America/Los_Angeles
America/Argentina/Buenos_Aires

When using a time zone name, a space is required between the name and the rest of the timestamp:

2014-09-27 12:30:00.45 America/Los_Angeles

Note that not all time zone names are interchangeable even if they do happen to report the same time during a given part of the year. For example, America/Los_Angeles reports the same time as UTC-7:00 during Daylight Savings Time, but reports the same time as UTC-8:00 outside of Daylight Savings Time.

If a time zone is not specified, the default time zone value is used.

Leap seconds

A timestamp is simply an offset from 1970-01-01 00:00:00 UTC, assuming there are exactly 60 seconds per minute. Leap seconds are not represented as part of a stored timestamp.

If the input contains values that use ":60" in the seconds field to represent a leap second, that leap second is not preserved when converting to a timestamp value. Instead that value is interpreted as a timestamp with ":00" in the seconds field of the following minute.

Leap seconds do not affect timestamp computations. All timestamp computations are done using Unix-style timestamps, which do not reflect leap seconds. Leap seconds are only observable through functions that measure real-world time. In these functions, it is possible for a timestamp second to be skipped or repeated when there is a leap second.