Data types in GoogleSQL

This page provides an overview of all GoogleSQL for Spanner data types, including information about their value domains. For information on data type literals and constructors, see Lexical Structure and Syntax.

Data type properties

When storing and querying data, it is helpful to keep the following data type properties in mind:

Valid column types

All data types are valid column types, except for:

  • STRUCT

Valid key column types

All data types are valid key column types for primary keys, foreign keys, and secondary indexes, except for:

  • ARRAY
  • JSON
  • STRUCT

Storage size for data types

Each data type includes 8 bytes of storage overhead, in addition to the following values:

  • ARRAY: The sum of the size of its elements.
  • BOOL: 1 byte.
  • BYTES: The number of bytes.
  • DATE: 4 bytes.
  • FLOAT64: 8 bytes.
  • INT64: 8 bytes.
  • JSON: The number of bytes in UTF-8 encoding of the JSON-formatted string equivalent after canonicalization.
  • NUMERIC: A function of both the precision and scale of the value being stored. The value 0 is stored as 1 byte. The storage size for all other values varies between 6 and 22 bytes.
  • STRING: The number of bytes in its UTF-8 encoding.
  • STRUCT: The sum of its parts.
  • TIMESTAMP: 12 bytes.

Nullable data types

For nullable data types, NULL is a valid value. Currently, all existing data types are nullable.

Orderable data types

Expressions of orderable data types can be used in an ORDER BY clause. Applies to all data types except for:

  • ARRAY
  • STRUCT
  • JSON

Ordering NULLs

In the context of the ORDER BY clause, NULLs are the minimum possible value; that is, NULLs appear first in ASC sorts and last in DESC sorts.

To learn more about using ASC and DESC, see the ORDER BY clause.

Ordering floating points

Floating point values are sorted in this order, from least to greatest:

  1. NULL
  2. NaN — All NaN values are considered equal when sorting.
  3. -inf
  4. Negative numbers
  5. 0 or -0 — All zero values are considered equal when sorting.
  6. Positive numbers
  7. +inf

Groupable data types

Can generally appear in an expression following GROUP BY and DISTINCT. All data types are supported except for:

  • JSON
  • ARRAY
  • STRUCT

Grouping with floating point types

Groupable floating point types can appear in an expression following GROUP BY and DISTINCT.

Special floating point values are grouped in the following way, including both grouping done by a GROUP BY clause and grouping done by the DISTINCT keyword:

  • NULL
  • NaN — All NaN values are considered equal when grouping.
  • -inf
  • 0 or -0 — All zero values are considered equal when grouping.
  • +inf

Comparable data types

Values of the same comparable data type can be compared to each other. All data types are supported except for:

  • JSON

Notes:

  • Equality comparisons for array data types are supported as long as the element types are the same, and the element types are comparable. Less than and greater than comparisons are not supported.
  • Equality comparisons for structs are supported field by field, in field order. Field names are ignored. Less than and greater than comparisons are not supported.
  • All types that support comparisons can be used in a JOIN condition. See JOIN Types for an explanation of join conditions.

The maximum size of a column value is 10MiB, which applies to scalar and array types.

Array type

Name Description
ARRAY Ordered list of zero or more elements of any non-array type.

An array is an ordered list of zero or more elements of non-array values. Elements in an array must share the same type.

Arrays of arrays are not allowed. Queries that would produce an array of arrays will return an error. Instead, a struct must be inserted between the arrays using the SELECT AS STRUCT construct.

To learn more about the literal representation of an array type, see Array literals.

To learn more about using arrays in GoogleSQL, see Work with arrays.

NULLs and the array type

An empty array and a NULL array are two distinct values. Arrays can contain NULL elements.

Declaring an array type

ARRAY<T>

Array types are declared using the angle brackets (< and >). The type of the elements of an array can be arbitrarily complex with the exception that an array cannot directly contain another array.

Examples

Type Declaration Meaning
ARRAY<INT64> Simple array of 64-bit integers.
ARRAY<STRUCT<INT64, INT64>> An array of structs, each of which contains two 64-bit integers.
ARRAY<ARRAY<INT64>>
(not supported)
This is an invalid type declaration which is included here just in case you came looking for how to create a multi-level array. Arrays cannot contain arrays directly. Instead see the next example.
ARRAY<STRUCT<ARRAY<INT64>>> An array of arrays of 64-bit integers. Notice that there is a struct between the two arrays because arrays cannot hold other arrays directly.

Constructing an array

You can construct an array using array literals or array functions.

Using array literals

You can build an array literal in GoogleSQL using brackets ([ and ]). Each element in an array is separated by a comma.

SELECT [1, 2, 3] AS numbers;

SELECT ["apple", "pear", "orange"] AS fruit;

SELECT [true, false, true] AS booleans;

You can also create arrays from any expressions that have compatible types. For example:

SELECT [a, b, c]
FROM
  (SELECT 5 AS a,
          37 AS b,
          406 AS c);

SELECT [a, b, c]
FROM
  (SELECT CAST(5 AS INT64) AS a,
          CAST(37 AS FLOAT64) AS b,
          406 AS c);

Notice that the second example contains three expressions: one that returns an INT64, one that returns a FLOAT64, and one that declares a literal. This expression works because all three expressions share FLOAT64 as a supertype.

To declare a specific data type for an array, use angle brackets (< and >). For example:

SELECT ARRAY<FLOAT64>[1, 2, 3] AS floats;

Arrays of most data types, such as INT64 or STRING, don't require that you declare them first.

SELECT [1, 2, 3] AS numbers;

You can write an empty array of a specific type using ARRAY<type>[]. You can also write an untyped empty array using [], in which case GoogleSQL attempts to infer the array type from the surrounding context. If GoogleSQL cannot infer a type, the default type ARRAY<INT64> is used.

Using generated values

You can also construct an ARRAY with generated values.

Generating arrays of integers

GENERATE_ARRAY generates an array of values from a starting and ending value and a step value. For example, the following query generates an array that contains all of the odd integers from 11 to 33, inclusive:

SELECT GENERATE_ARRAY(11, 33, 2) AS odds;

/*--------------------------------------------------*
 | odds                                             |
 +--------------------------------------------------+
 | [11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33] |
 *--------------------------------------------------*/

You can also generate an array of values in descending order by giving a negative step value:

SELECT GENERATE_ARRAY(21, 14, -1) AS countdown;

/*----------------------------------*
 | countdown                        |
 +----------------------------------+
 | [21, 20, 19, 18, 17, 16, 15, 14] |
 *----------------------------------*/
Generating arrays of dates

GENERATE_DATE_ARRAY generates an array of DATEs from a starting and ending DATE and a step INTERVAL.

You can generate a set of DATE values using GENERATE_DATE_ARRAY. For example, this query returns the current DATE and the following DATEs at 1 WEEK intervals up to and including a later DATE:

SELECT
  GENERATE_DATE_ARRAY('2017-11-21', '2017-12-31', INTERVAL 1 WEEK)
    AS date_array;

/*--------------------------------------------------------------------------*
 | date_array                                                               |
 +--------------------------------------------------------------------------+
 | [2017-11-21, 2017-11-28, 2017-12-05, 2017-12-12, 2017-12-19, 2017-12-26] |
 *--------------------------------------------------------------------------*/

Boolean type

Name Description
BOOL Boolean values are represented by the keywords TRUE and FALSE (case-insensitive).

Boolean values are sorted in this order, from least to greatest:

  1. NULL
  2. FALSE
  3. TRUE

Bytes type

Name Description
BYTES Variable-length binary data.

String and bytes are separate types that cannot be used interchangeably. Most functions on strings are also defined on bytes. The bytes version operates on raw bytes rather than Unicode characters. Casts between string and bytes enforce that the bytes are encoded using UTF-8.

To learn more about the literal representation of a bytes type, see Bytes literals.

Date type

Name Range
DATE 0001-01-01 to 9999-12-31.

The date type represents a Gregorian calendar date, independent of time zone. A date value does not represent a specific 24-hour time period. Rather, a given date value represents a different 24-hour period when interpreted in different time zones, and may represent a shorter or longer day during daylight saving time (DST) transitions. To represent an absolute point in time, use a timestamp.

Canonical format
YYYY-[M]M-[D]D
  • YYYY: Four-digit year.
  • [M]M: One or two digit month.
  • [D]D: One or two digit day.

To learn more about the literal representation of a date type, see Date literals.

JSON type

Name Description
JSON Represents JSON, a lightweight data-interchange format.

Expect these canonicalization behaviors when creating a value of JSON type:

  • Booleans, strings, and nulls are preserved exactly.
  • Whitespace characters are not preserved.
  • A JSON value can store integers in the range of -9,223,372,036,854,775,808 (minimal signed 64-bit integer) to 18,446,744,073,709,551,615 (maximal unsigned 64-bit integer) and floating point numbers within a domain of FLOAT64.
  • The order of elements in an array is preserved exactly.
  • The order of the members of an object is lexicographically ordered.
  • If an object has duplicate keys, the first key that is found is preserved.
  • Up to 80 levels can be nested.
  • The format of the original string representation of a JSON number may not be preserved.

To learn more about the literal representation of a JSON type, see JSON literals.

Numeric types

Numeric types include the following types:

  • INT64
  • NUMERIC
  • FLOAT64

Integer type

Integers are numeric values that do not have fractional components.

Name Range
INT64 -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807

To learn more about the literal representation of an integer type, see Integer literals.

Decimal type

Decimal type values are numeric values with fixed decimal precision and scale. Precision is the number of digits that the number contains. Scale is how many of these digits appear after the decimal point.

This type can represent decimal fractions exactly, and is suitable for financial calculations.

Name Precision, Scale, and Range
NUMERIC Precision: 38
Scale: 9
Minimum value greater than 0 that can be handled: 1e-9
Min: -9.9999999999999999999999999999999999999E+28
Max: 9.9999999999999999999999999999999999999E+28

To learn more about the literal representation of a NUMERIC type, see NUMERIC literals.

Floating point type

Floating point values are approximate numeric values with fractional components.

Name Description
FLOAT64 Double precision (approximate) numeric values.

To learn more about the literal representation of a floating point type, see Floating point literals.

Floating point semantics

When working with floating point numbers, there are special non-numeric values that need to be considered: NaN and +/-inf

Arithmetic operators provide standard IEEE-754 behavior for all finite input values that produce finite output and for all operations for which at least one input is non-finite.

Function calls and operators return an overflow error if the input is finite but the output would be non-finite. If the input contains non-finite values, the output can be non-finite. In general functions do not introduce NaNs or +/-inf. However, specific functions like IEEE_DIVIDE can return non-finite values on finite input. All such cases are noted explicitly in Mathematical functions.

Floating point values are approximations.

  • The binary format used to represent floating point values can only represent a subset of the numbers between the most positive number and most negative number in the value range. This enables efficient handling of a much larger range than would be possible otherwise. Numbers that are not exactly representable are approximated by utilizing a close value instead. For example, 0.1 cannot be represented as an integer scaled by a power of 2. When this value is displayed as a string, it is rounded to a limited number of digits, and the value approximating 0.1 might appear as "0.1", hiding the fact that the value is not precise. In other situations, the approximation can be visible.
  • Summation of floating point values might produce surprising results because of limited precision. For example, (1e30 + 1) - 1e30 = 0, while (1e30 - 1e30) + 1 = 1.0. This is because the floating point value does not have enough precision to represent (1e30 + 1), and the result is rounded to 1e30. This example also shows that the result of the SUM aggregate function of floating points values depends on the order in which the values are accumulated. In general, this order is not deterministic and therefore the result is not deterministic. Thus, the resulting SUM of floating point values might not be deterministic and two executions of the same query on the same tables might produce different results.
  • If the above points are concerning, use a decimal type instead.
Mathematical function examples
Left Term Operator Right Term Returns
Any value + NaN NaN
1.0 + +inf +inf
1.0 + -inf -inf
-inf + +inf NaN
Maximum FLOAT64 value + Maximum FLOAT64 value Overflow error
Minimum FLOAT64 value / 2.0 0.0
1.0 / 0.0 "Divide by zero" error

Comparison operators provide standard IEEE-754 behavior for floating point input.

Comparison operator examples
Left Term Operator Right Term Returns
NaN = Any value FALSE
NaN < Any value FALSE
Any value < NaN FALSE
-0.0 = 0.0 TRUE
-0.0 < 0.0 FALSE

For more information on how these values are ordered and grouped so they can be compared, see Ordering floating point values.

String type

Name Description
STRING Variable-length character (Unicode) data.

Input string values must be UTF-8 encoded and output string values will be UTF-8 encoded. Alternate encodings like CESU-8 and Modified UTF-8 are not treated as valid UTF-8.

All functions and operators that act on string values operate on Unicode characters rather than bytes. For example, functions like SUBSTR and LENGTH applied to string input count the number of characters, not bytes.

Each Unicode character has a numeric value called a code point assigned to it. Lower code points are assigned to lower characters. When characters are compared, the code points determine which characters are less than or greater than other characters.

Most functions on strings are also defined on bytes. The bytes version operates on raw bytes rather than Unicode characters. Strings and bytes are separate types that cannot be used interchangeably. There is no implicit casting in either direction. Explicit casting between string and bytes does UTF-8 encoding and decoding. Casting bytes to string returns an error if the bytes are not valid UTF-8.

To learn more about the literal representation of a string type, see String literals.

Struct type

Name Description
STRUCT Container of ordered fields each with a type (required) and field name (optional).

To learn more about the literal representation of a struct type, see Struct literals.

Declaring a struct type

STRUCT<T>

Struct types are declared using the angle brackets (< and >). The type of the elements of a struct can be arbitrarily complex.

Examples

Type Declaration Meaning
STRUCT<INT64> Simple struct with a single unnamed 64-bit integer field.
STRUCT<x STRUCT<y INT64, z INT64>> A struct with a nested struct named x inside it. The struct x has two fields, y and z, both of which are 64-bit integers.
STRUCT<inner_array ARRAY<INT64>> A struct containing an array named inner_array that holds 64-bit integer elements.

Constructing a struct

Tuple syntax

(expr1, expr2 [, ... ])

The output type is an anonymous struct type with anonymous fields with types matching the types of the input expressions. There must be at least two expressions specified. Otherwise this syntax is indistinguishable from an expression wrapped with parentheses.

Examples

Syntax Output Type Notes
(x, x+y) STRUCT<?,?> If column names are used (unquoted strings), the struct field data type is derived from the column data type. x and y are columns, so the data types of the struct fields are derived from the column types and the output type of the addition operator.

This syntax can also be used with struct comparison for comparison expressions using multi-part keys, e.g., in a WHERE clause:

WHERE (Key1,Key2) IN ( (12,34), (56,78) )

Typeless struct syntax

STRUCT( expr1 [AS field_name] [, ... ])

Duplicate field names are allowed. Fields without names are considered anonymous fields and cannot be referenced by name. struct values can be NULL, or can have NULL field values.

Examples

Syntax Output Type
STRUCT(1,2,3) STRUCT<int64,int64,int64>
STRUCT() STRUCT<>
STRUCT('abc') STRUCT<string>
STRUCT(1, t.str_col) STRUCT<int64, str_col string>
STRUCT(1 AS a, 'abc' AS b) STRUCT<a int64, b string>
STRUCT(str_col AS abc) STRUCT<abc string>

Typed struct syntax

STRUCT<[field_name] field_type, ...>( expr1 [, ... ])

Typed syntax allows constructing structs with an explicit struct data type. The output type is exactly the field_type provided. The input expression is coerced to field_type if the two types are not the same, and an error is produced if the types are not compatible. AS alias is not allowed on the input expressions. The number of expressions must match the number of fields in the type, and the expression types must be coercible or literal-coercible to the field types.

Examples

Syntax Output Type
STRUCT<int64>(5) STRUCT<int64>
STRUCT<date>("2011-05-05") STRUCT<date>
STRUCT<x int64, y string>(1, t.str_col) STRUCT<x int64, y string>
STRUCT<int64>(int_col) STRUCT<int64>
STRUCT<x int64>(5 AS x) Error - Typed syntax does not allow AS

Limited comparisons for structs

Structs can be directly compared using equality operators:

  • Equal (=)
  • Not Equal (!= or <>)
  • [NOT] IN

Notice, though, that these direct equality comparisons compare the fields of the struct pairwise in ordinal order ignoring any field names. If instead you want to compare identically named fields of a struct, you can compare the individual fields directly.

Timestamp type

Name Range
TIMESTAMP 0001-01-01 00:00:00 to 9999-12-31 23:59:59.999999999 UTC.

A timestamp value represents an absolute point in time, independent of any time zone or convention such as daylight saving time (DST), with nanosecond precision.

A timestamp is typically represented internally as the number of elapsed nanoseconds since a fixed initial point in time.

Note that a timestamp itself does not have a time zone; it represents the same instant in time globally. However, the display of a timestamp for human readability usually includes a Gregorian date, a time, and a time zone, in an implementation-dependent format. For example, the displayed values "2020-01-01 00:00:00 UTC", "2019-12-31 19:00:00 America/New_York", and "2020-01-01 05:30:00 Asia/Kolkata" all represent the same instant in time and therefore represent the same timestamp value.

  • To represent a Gregorian date as it might appear on a calendar (a civil date), use a date value.
Canonical format

For Rest and RPC APIs

Follow the rules for encoding to and decoding from JSON values as described in TypeCode (RPC) and TypeCode (REST). In particular, the timestamp value must end with an uppercase literal "Z" to specify Zulu time (UTC-0).

For example:

2014-09-27T12:30:00.45Z

Timestamp values must be expressed in Zulu time and cannot include a UTC offset. For example, the following timestamp is not supported:

-- NOT SUPPORTED! TIMESTAMPS CANNOT INCLUDE A UTC OFFSET WHEN USED WITH THE REST AND RPC APIS
2014-09-27 12:30:00.45-8:00

For client libraries

Use the language-specific timestamp format.

For SQL queries

The canonical format for a timestamp literal has the following parts:

{
  civil_date_part[time_part [time_zone]] |
  civil_date_part[time_part[time_zone_offset]] |
  civil_date_part[time_part[utc_time_zone]]
}

civil_date_part:
    YYYY-[M]M-[D]D

time_part:
    { |T|t}[H]H:[M]M:[S]S[.F]
  • YYYY: Four-digit year.
  • [M]M: One or two digit month.
  • [D]D: One or two digit day.
  • { |T|t}: A space or a T or t separator. The T and t separators are flags for time.
  • [H]H: One or two digit hour (valid values from 00 to 23).
  • [M]M: One or two digit minutes (valid values from 00 to 59).
  • [S]S: One or two digit seconds (valid values from 00 to 60).
  • [.F]: Up to nine fractional digits (nanosecond precision).
  • [time_zone]: String representing the time zone. When a time zone is not explicitly specified, the default time zone, America/Los_Angeles, is used. For details, see time zones.
  • [time_zone_offset]: String representing the offset from the Coordinated Universal Time (UTC) time zone. For details, see time zones.
  • [utc_time_zone]: String representing the Coordinated Universal Time (UTC), usually the letter Z or z. For details, see time zones.

To learn more about the literal representation of a timestamp type, see Timestamp literals.

Time zones

A time zone is used when converting from a civil date or time (as might appear on a calendar or clock) to a timestamp (an absolute time), or vice versa. This includes the operation of parsing a string containing a civil date and time like "2020-01-01 00:00:00" and converting it to a timestamp. The resulting timestamp value itself does not store a specific time zone, because it represents one instant in time globally.

Time zones are represented by strings in one of these canonical formats:

  • Offset from Coordinated Universal Time (UTC), or the letter Z or z for UTC.
  • Time zone name from the tz database.

The following timestamps are identical because the time zone offset for America/Los_Angeles is -08 for the specified date and time.

SELECT UNIX_MILLIS(TIMESTAMP '2008-12-25 15:30:00 America/Los_Angeles') AS millis;
SELECT UNIX_MILLIS(TIMESTAMP '2008-12-25 15:30:00-08:00') AS millis;

Offset from Coordinated Universal Time (UTC)

Format:

{+|-}H[H][:M[M]]
{Z|z}

Examples:

-08:00
-8:15
+3:00
+07:30
-7
Z

When using this format, no space is allowed between the time zone and the rest of the timestamp.

2014-09-27 12:30:00.45-8:00
2014-09-27T12:30:00.45Z

Time zone name

Format:

tz_identifier

A time zone name is a tz identifier from the tz database. For a less comprehensive but simpler reference, see the List of tz database time zones on Wikipedia.

Examples:

America/Los_Angeles
America/Argentina/Buenos_Aires
Etc/UTC
Pacific/Auckland

When using a time zone name, a space is required between the name and the rest of the timestamp:

2014-09-27 12:30:00.45 America/Los_Angeles

Note that not all time zone names are interchangeable even if they do happen to report the same time during a given part of the year. For example, America/Los_Angeles reports the same time as UTC-7:00 during daylight saving time (DST), but reports the same time as UTC-8:00 outside of DST.

If a time zone is not specified, the default time zone value is used.

Leap seconds

A timestamp is simply an offset from 1970-01-01 00:00:00 UTC, assuming there are exactly 60 seconds per minute. Leap seconds are not represented as part of a stored timestamp.

If the input contains values that use ":60" in the seconds field to represent a leap second, that leap second is not preserved when converting to a timestamp value. Instead that value is interpreted as a timestamp with ":00" in the seconds field of the following minute.

Leap seconds do not affect timestamp computations. All timestamp computations are done using Unix-style timestamps, which do not reflect leap seconds. Leap seconds are only observable through functions that measure real-world time. In these functions, it is possible for a timestamp second to be skipped or repeated when there is a leap second.