Working with BigQuery GIS data

BigQuery GIS allows you to analyze geographic data in BigQuery. Geographic data is also known as geospatial data.

BigQuery GIS adds support for a GEOGRAPHY data type to standard SQL. The GEOGRAPHY data type represents a pointset on the Earth's surface. A pointset is a set of points, lines and polygons on the WGS84 reference spheroid, with geodesic edges.

You use the GEOGRAPHY data type by calling one of the standard SQL geography functions. The output of the geography functions is rendered as WKT (well-known text). WKT uses a longitude first, latitude second format.

Geospatial data formats

Single points on Earth can be described by just a (longitude, latitude) pair. For describing more complex geographies such as lines and polygons, BigQuery allows you to load geospatial data into a GEOGRAPHY column if the data is in one of the following supported formats:

  • Well-known text (WKT)
  • Well-known binary (WKB)
  • GeoJSON

Differences between WKT and GeoJSON

Geometry objects versus spatial features and feature collections

Common types of objects when working with spatial data include the following:

  • An individual geometry or GEOGRAPHY value represents a surface area on the Earth. It is often described using points, lines, polygons, or a collection of points, lines, and polygons. A geometry collection represents a spatial union of all shapes in the collection.
  • A spatial feature represents a logical spatial object. It combines geometry with arbitrary additional application-specific attributes.
  • A spatial feature collection is a set of feature objects.

WKT is a text format for describing individual geometry shapes using points, lines, polygons with optional holes, or a collection of points, lines, or polygons. For example, a point in WKT would look like the following:

POINT(-121 41)

To describe a spatial feature, WKT is usually embedded in a container file format, often a CSV file, or in a database table. A file row or table row usually corresponds to one spatial feature, and the whole file or table corresponds to the feature collection.

WKB is the binary version of the WKT format.

GeoJSON is a more complex, JSON-based format for geometries and spatial features. For example, a point in GeoJSON would look like the following:

{ "type": "Point", "coordinates": [-121,41] }

GeoJSON is used to describe one of the following:

  • An individual geometry object. A single object can have a complex spatial shape, described as a union of points, lines, and polygons with optional holes. This is similar to WKT format usage.
  • A Feature object. A feature object is an object with a geometry, plus arbitrary additional named properties. This is similar to one row in a CSV file with a WKT column.
  • A FeatureCollection. Feature collections are a set of feature objects similar to a table in a database or a CSV file with many rows and columns.

BigQuery GIS supports only individual geometry objects in GeoJSON. BigQuery GIS does not currently support GeoJSON feature objects, feature collections, or the GeoJSON file format.

Coordinate systems and edges

The WKT format does not provide a coordinate system, so BigQuery GIS defines one. In BigQuery GIS, WKT points are positions on the surface of a WGS84 spheroid, expressed as longitude and geodetic latitude. An edge is a spherical geodesic between two endpoints. In GeoJSON, the coordinate system is explicitly WGS84 coordinates with planar edges.

To convert between these two kinds of edges, BigQuery GIS adds additional points to the line where necessary so the converted sequence of edges remains within 10 meters of the original line. This is a process known as tessellation or non-uniform densification. Currently, you cannot directly control the tessellation process.
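The following Python sketch illustrates non-uniform densification of a single geodesic edge. It is not BigQuery's algorithm. It assumes a spherical Earth with the IUGG mean radius (6,371,008.8 m), and it treats the distance between an edge's planar (longitude/latitude) midpoint and its geodesic midpoint as a stand-in for the edge's maximum deviation. The helper names are hypothetical. Edges that cross the antimeridian or join antipodal points are not handled.

```python
import math

# Assumption for this sketch: a spherical Earth with the IUGG mean radius.
EARTH_RADIUS_M = 6_371_008.8


def _to_vec(p):
    """(lon, lat) in degrees -> unit vector on the sphere."""
    lon, lat = map(math.radians, p)
    return (math.cos(lat) * math.cos(lon), math.cos(lat) * math.sin(lon), math.sin(lat))


def _to_lonlat(v):
    """Non-zero vector -> (lon, lat) in degrees; the vector need not be unit length."""
    x, y, z = v
    return (math.degrees(math.atan2(y, x)), math.degrees(math.atan2(z, math.hypot(x, y))))


def geodesic_midpoint(a, b):
    """Midpoint of the great-circle arc from a to b (a and b not antipodal)."""
    va, vb = _to_vec(a), _to_vec(b)
    return _to_lonlat(tuple(p + q for p, q in zip(va, vb)))


def distance_m(a, b):
    """Great-circle distance in meters between two (lon, lat) points."""
    va, vb = _to_vec(a), _to_vec(b)
    cross = (va[1] * vb[2] - va[2] * vb[1],
             va[2] * vb[0] - va[0] * vb[2],
             va[0] * vb[1] - va[1] * vb[0])
    dot = sum(p * q for p, q in zip(va, vb))
    return EARTH_RADIUS_M * math.atan2(math.hypot(*cross), dot)


def densify(a, b, tolerance_m=10.0, max_depth=16):
    """Hypothetical helper: approximate the geodesic a->b with planar edges.

    Splits an edge at its geodesic midpoint while the edge's planar midpoint
    lies more than tolerance_m from that geodesic midpoint.
    """
    mid_geo = geodesic_midpoint(a, b)
    mid_planar = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
    if max_depth == 0 or distance_m(mid_geo, mid_planar) <= tolerance_m:
        return [a, b]
    left = densify(a, mid_geo, tolerance_m, max_depth - 1)
    right = densify(mid_geo, b, tolerance_m, max_depth - 1)
    return left + right[1:]  # drop the shared midpoint from the right half


print(densify((1.0, 1.0), (3.0, 2.0)))
```

The tolerance check runs per edge. Long edges get split repeatedly and short ones not at all, which is why the resulting point spacing is non-uniform.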

For importing geographies with spherical edges, use WKT as in the following example:

  SELECT ST_GeogFromText(wkt) AS g
  FROM mydataset.mytable  -- placeholder table name

For importing geographies with planar edges, often called "geometries", use GeoJSON as in the following example:

  SELECT ST_GeogFromGeoJSON(geocol) AS g
  FROM mydataset.mytable  -- placeholder table name

You can also replace the original GeoJSON string column with the converted GEOGRAPHY column by excluding the original from the results and reusing its name:

  SELECT
    * EXCEPT(geocol),
    ST_GeogFromGeoJSON(geocol) AS geocol
  FROM mydataset.mytable  -- placeholder table name

Be sure to use the proper format. Most systems either explicitly advertise support for parsing geography (spherical edges), as opposed to geometry (planar edges), from WKT, or they assume planar edges. In the latter case, use GeoJSON as the interchange format.

Your coordinates should be longitude first, latitude second. If the geography has any long segments or edges, they must be tessellated. BigQuery GIS interprets them as spherical geodesics, which might not correspond to the coordinate system where your data originated.
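A cheap pre-load check catches some swapped coordinates, because a latitude outside [-90, 90] cannot be valid. The hypothetical check_lon_lat helper below is a sketch of that idea. It cannot detect swapped pairs when both values fall within [-90, 90], so treat it as a sanity check rather than proof of correct ordering.

```python
from typing import Iterable, Tuple


def check_lon_lat(coords: Iterable[Tuple[float, float]]) -> None:
    """Hypothetical helper: raise ValueError if a (longitude, latitude) pair is out of range.

    A latitude outside [-90, 90] usually means the pair was written latitude
    first. Swapped pairs whose values both lie in [-90, 90] are not detected.
    """
    for i, (lon, lat) in enumerate(coords):
        if not -180.0 <= lon <= 180.0:
            raise ValueError(f"point {i}: longitude {lon} is outside [-180, 180]")
        if not -90.0 <= lat <= 90.0:
            raise ValueError(f"point {i}: latitude {lat} is outside [-90, 90]; "
                             "are the coordinates in latitude, longitude order?")


check_lon_lat([(-121.0, 41.0)])      # passes: longitude first
try:
    check_lon_lat([(41.0, -121.0)])  # latitude first: -121 cannot be a latitude
except ValueError as err:
    print(err)
```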

Polygon orientation

On a sphere, every polygon has a complementary polygon. For example, a polygon that describes the Earth's continents would have a complementary polygon that describes the Earth's oceans. Because the two polygons are described by the same boundary rings, rules are required to resolve which of the two polygons a given WKT string describes.

When you load WKT and WKB strings from files or by using streaming ingestion, BigQuery GIS assumes the polygons in the input are oriented as follows: if you traverse the boundary of the polygon in the order of the input vertices, the interior of the polygon is on the left. BigQuery GIS uses the same rule when exporting geography objects to WKT and WKB strings.
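To check ring orientation before loading, you can compute a ring's signed area with the shoelace formula. A positive area means the vertices run counterclockwise. Under the rule above, a shell should therefore run counterclockwise and a hole clockwise, so that the polygon's interior stays on the left.

This minimal Python sketch makes two simplifying assumptions:

  • It is a planar approximation in longitude/latitude, meaningful only for small polygons that do not cross the antimeridian or contain a pole. BigQuery itself works on the sphere.
  • The parser handles only a simple 2D POLYGON.

The function names are hypothetical.

```python
import re
from typing import List, Tuple

Ring = List[Tuple[float, ...]]


def parse_wkt_polygon(wkt: str) -> List[Ring]:
    """Hypothetical helper: parse a simple 2D WKT POLYGON into rings of (lon, lat)."""
    if not wkt.strip().upper().startswith("POLYGON"):
        raise ValueError("expected a WKT POLYGON")
    # Each innermost "( ... )" group is one ring: the shell first, then any holes.
    rings = re.findall(r"\(([^()]*)\)", wkt)
    return [[tuple(float(v) for v in point.split()) for point in ring.split(",")]
            for ring in rings]


def signed_area(ring: Ring) -> float:
    """Shoelace formula on a closed ring (last vertex equals the first)."""
    return 0.5 * sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(ring, ring[1:]))


def is_counterclockwise(ring: Ring) -> bool:
    return signed_area(ring) > 0


shell, hole = parse_wkt_polygon(
    "POLYGON((0 0, 4 0, 4 4, 0 4, 0 0), (1 1, 1 2, 2 2, 2 1, 1 1))")
print(is_counterclockwise(shell))  # True: interior on the left
print(is_counterclockwise(hole))   # False: clockwise hole, polygon interior still on the left
```

A shell that tests False here would, under the loading rule above, describe the large complementary polygon. Reversing its vertex order (ring[::-1]) flips the orientation.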

When you build geography objects from a WKT string by using the ST_GeogFromText function, there are two options to determine the polygon described by the WKT string:

  1. (Default) Interpret the input as the polygon with the smaller area. Do not assume oriented polygons (oriented = FALSE).

  2. Assume oriented polygons to allow loading polygons with an area larger than a hemisphere (oriented = TRUE).

These rules do not apply when you load GeoJSON strings. Because GeoJSON strings are defined on a planar map, the orientation can be determined without ambiguity. This holds even if the input does not follow the orientation rule in section 3.1.6 (Polygon) of GeoJSON RFC 7946, which specifies counterclockwise exterior rings and clockwise interior rings.

Loading BigQuery GIS data

When you load BigQuery GIS data into BigQuery, you can specify a GEOGRAPHY column in the table's schema definition. When you specify the column's data type as GEOGRAPHY, BigQuery GIS can detect whether the data is in WKT or GeoJSON format.

When you load GeoJSON geometry objects into a GEOGRAPHY column, they should be formatted as text strings, not as JSON objects. This is true even if the object is being loaded from a newline-delimited JSON file.
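BigQuery GIS accepts only GeoJSON geometry objects, and they must arrive as strings. A GeoJSON FeatureCollection therefore has to be flattened before loading. The sketch below is one way to do that with the standard library: each feature becomes one newline-delimited JSON row, its properties become columns, and its geometry is serialized to a JSON string. The function name and the `geom` column name are hypothetical. The target table would need a matching schema with `geom` declared as GEOGRAPHY.

```python
import json


def feature_collection_to_ndjson(fc_text: str, geo_column: str = "geom") -> str:
    """Hypothetical helper: flatten a GeoJSON FeatureCollection into NDJSON rows."""
    fc = json.loads(fc_text)
    if fc.get("type") != "FeatureCollection":
        raise ValueError("expected a GeoJSON FeatureCollection")
    lines = []
    for feature in fc["features"]:
        row = dict(feature.get("properties") or {})
        geometry = feature.get("geometry")
        # The geometry is written as a string containing GeoJSON, not as a
        # nested JSON object. A missing geometry becomes a JSON null.
        row[geo_column] = None if geometry is None else json.dumps(geometry)
        lines.append(json.dumps(row))
    return "\n".join(lines)


fc = json.dumps({
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "properties": {"name": "example"},
        "geometry": {"type": "Point", "coordinates": [-121, 41]},
    }],
})
print(feature_collection_to_ndjson(fc))
# prints: {"name": "example", "geom": "{\"type\": \"Point\", \"coordinates\": [-121, 41]}"}
```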

If you load data using schema auto-detect, geography values are loaded as STRINGs. Currently, schema auto-detect cannot detect geography columns.

For more information about loading data into BigQuery, see Introduction to Loading Data from Cloud Storage.

Transforming BigQuery GIS data

If your table contains separate columns for longitude and latitude, you can transform the values into geographies using standard SQL geography functions such as ST_GeogPoint. For example, if you have two FLOAT64 columns for longitude and latitude, you can create a geography column using the following query:

  SELECT ST_GeogPoint(longitude, latitude) AS g
  FROM mydataset.mytable  -- placeholder table name

BigQuery currently supports converting WKT and GeoJSON strings to geography types. Shapefiles and many other formats should be converted using external tools.

Dealing with improperly formatted spatial data

When you load data into BigQuery, you might encounter invalid WKT or GeoJSON data from other tools that cannot be converted to a GEOGRAPHY value. For example, an error such as "Edge K has duplicate vertex with edge N" indicates that the polygon has duplicate vertices other than the first and last.
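Before loading, you can look for this problem yourself. The hypothetical helper below is a minimal sketch that reports vertices repeated within one closed ring, ignoring the closing vertex that legitimately repeats the first. It compares coordinates exactly, so it does not catch near-duplicates that differ only by rounding.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def find_duplicate_vertices(ring: List[Point]) -> List[Tuple[int, int]]:
    """Hypothetical helper: (first_index, repeat_index) pairs for repeated vertices."""
    if len(ring) > 1 and ring[0] == ring[-1]:
        ring = ring[:-1]  # drop the closing vertex, which must equal the first
    first_seen: Dict[Point, int] = {}
    duplicates = []
    for i, vertex in enumerate(ring):
        if vertex in first_seen:
            duplicates.append((first_seen[vertex], i))
        else:
            first_seen[vertex] = i
    return duplicates


# (1, 0) appears twice in addition to the closing vertex, so this ring is invalid.
bad_ring = [(0, 0), (1, 0), (1, 1), (1, 0), (0, 1), (0, 0)]
print(find_duplicate_vertices(bad_ring))  # [(1, 3)]
```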

To avoid formatting issues, you can use a function that generates standards-compliant output. For example, when you export data from PostGIS, you can use the ST_MakeValid function to standardize the output.

To find or ignore the improperly formatted data, use the SAFE prefix (SAFE.), which makes a function return NULL instead of raising an error. For example, the following query uses the SAFE prefix to retrieve improperly formatted spatial data:

  SELECT geojson AS bad_geojson
  FROM mydataset.mytable  -- placeholder table name
  WHERE
    geojson IS NOT NULL
    AND SAFE.ST_GeogFromGeoJSON(geojson) IS NULL
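If you prefer to screen values before loading, the same idea can be sketched client-side. The sketch has two parts:

  • A parser that returns None instead of raising, mirroring the SAFE prefix.
  • A filter that mirrors the WHERE clause above.

This hypothetical check is structural only: valid JSON, a geometry type, and a coordinates or geometries array. BigQuery also validates the geometry itself, for example by rejecting duplicate vertices, so values that pass here can still fail to load.

```python
import json
from typing import Iterable, List, Optional

GEOMETRY_TYPES = {"Point", "MultiPoint", "LineString", "MultiLineString",
                  "Polygon", "MultiPolygon", "GeometryCollection"}


def safe_parse_geojson(text) -> Optional[dict]:
    """Hypothetical helper: parsed geometry, or None instead of an error (like SAFE.)."""
    try:
        obj = json.loads(text)
    except (TypeError, ValueError):  # not a string, or not valid JSON
        return None
    if (not isinstance(obj, dict) or not isinstance(obj.get("type"), str)
            or obj["type"] not in GEOMETRY_TYPES):
        return None  # e.g. a Feature or FeatureCollection, which BigQuery GIS rejects
    member = "geometries" if obj["type"] == "GeometryCollection" else "coordinates"
    if not isinstance(obj.get(member), list):
        return None
    return obj


def find_bad_geojson(values: Iterable[Optional[str]]) -> List[str]:
    """Mirror of: WHERE geojson IS NOT NULL AND SAFE.ST_GeogFromGeoJSON(geojson) IS NULL."""
    return [v for v in values if v is not None and safe_parse_geojson(v) is None]


rows = ['{"type": "Point", "coordinates": [-121, 41]}',
        None,
        '{"type": "Feature", "geometry": null, "properties": {}}',
        "POINT(-121 41)"]
print(find_bad_geojson(rows))  # the Feature and the WKT string are reported
```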

Partitioning and clustering BigQuery GIS data

You can partition and cluster tables that contain GEOGRAPHY columns. You can use a GEOGRAPHY column as a clustering column, but you cannot use a GEOGRAPHY column as a partitioning column.

If you store GEOGRAPHY data in a table that is partitioned or clustered and your queries filter data by using a spatial predicate, ensure your geography data is spatially compact. A spatial predicate calls a boolean geography function and has a GEOGRAPHY column as one of the arguments. The following sample shows a spatial predicate that uses the ST_DWithin function:

WHERE ST_DWithin(geo, ST_GeogPoint(longitude, latitude), 100)

For example, if you have a table with columns for COUNTRY, STATE, and ZIP, add a column to the table to store a concatenated version of these columns. The following query fragment demonstrates this:

CONCAT(country, '+', IFNULL(state, ''), '+', IFNULL(zip, '')) as loc

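In this example, IFNULL replaces NULL values with empty strings. Without it, CONCAT would return NULL for any row with a missing state or zip value, because CONCAT returns NULL when any argument is NULL. After you create the concatenated column, you can use it to cluster the table.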
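To get a feel for what the ST_DWithin predicate shown earlier in this section tests, here is a minimal point-to-point sketch using the haversine great-circle distance. It makes two simplifications:

  • It assumes a spherical Earth with a mean radius of 6,371,008.8 m, which may differ slightly from the model BigQuery uses.
  • It handles only points, whereas ST_DWithin accepts any geographies.

The function names are hypothetical.

```python
import math
from typing import Tuple

# Assumption: spherical Earth with the IUGG mean radius.
EARTH_RADIUS_M = 6_371_008.8


def haversine_m(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in meters between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(h)))


def dwithin(a: Tuple[float, float], b: Tuple[float, float], distance_m: float) -> bool:
    """Hypothetical point-only analogue of ST_DWithin(a, b, distance_m)."""
    return haversine_m(a, b) <= distance_m


print(dwithin((-121.0, 41.0), (-121.0, 41.0005), 100))  # True: about 56 m apart
print(dwithin((-121.0, 41.0), (-121.0, 41.001), 100))   # False: about 111 m apart
```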

Using JOINs with spatial data

Spatial JOINs are joins of two tables with a geographic predicate function in the WHERE clause. For example:

-- how many stations within 1 mile range of each zip code?
SELECT
    zipcode AS zip,
    ST_GeogFromText(ANY_VALUE(zip_codes.zipcode_geom)) AS polygon,
    COUNT(*) AS bike_stations
FROM
    `bigquery-public-data.new_york.citibike_stations` AS bike_stations,
    `bigquery-public-data.utility_us.zipcode_area` AS zip_codes
WHERE ST_DWithin(
         ST_GeogFromText(zip_codes.zipcode_geom),
         ST_GeogPoint(bike_stations.longitude, bike_stations.latitude),
         1609.34)  -- 1 mile in meters
GROUP BY zip
ORDER BY bike_stations DESC

Spatial joins perform better when your geography data is persisted. The example above computes its geography values inside the query. It is more efficient to compute them once and store them in a BigQuery table.

For example, the following query retrieves longitude, latitude pairs and converts them to geographic points. When you run this query, you specify a new destination table to store the query results:

  SELECT ST_GeogPoint(pLongitude, pLatitude) AS p
  FROM mydataset.mytable  -- placeholder source table

BigQuery implements optimized spatial JOINs for INNER JOIN and CROSS JOIN operators that use certain standard SQL geography predicate functions, such as ST_DWithin.

Spatial joins are not optimized:

  • For LEFT, RIGHT or FULL OUTER joins
  • In cases involving ANTI joins
  • When the spatial predicate is negated

A JOIN that uses the ST_DWithin predicate is optimized only when the distance parameter is a constant expression.
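To make the semantics of a spatial INNER JOIN concrete, the sketch below evaluates one naively. Every (polygon, point) pair is tested with a containment predicate, and the matches are then grouped and counted, much like the zip code and bike station query earlier in this section. This is an illustration only, not how BigQuery executes optimized joins, which avoid testing every pair. It uses a planar ray-casting test on longitude/latitude, an approximation that ignores geodesic edges. All names and data are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Ring = List[Point]


def contains(ring: Ring, point: Point) -> bool:
    """Planar ray-casting point-in-polygon test for a closed ring."""
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Toggle for every edge that a ray from the point toward +x crosses.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_points_per_polygon(polygons: Dict[str, Ring], points: List[Point]) -> Dict[str, int]:
    """Nested-loop spatial INNER JOIN, then GROUP BY polygon, ORDER BY count DESC."""
    counts = {}
    for name, ring in polygons.items():
        n = sum(1 for p in points if contains(ring, p))
        if n > 0:  # INNER JOIN: polygons without a matching point produce no row
            counts[name] = n
    return dict(sorted(counts.items(), key=lambda item: item[1], reverse=True))


def square(x0: float, y0: float) -> Ring:
    return [(x0, y0), (x0 + 1, y0), (x0 + 1, y0 + 1), (x0, y0 + 1), (x0, y0)]


zones = {"a": square(0, 0), "b": square(2, 2), "c": square(10, 10)}
stations = [(0.5, 0.5), (0.2, 0.8), (2.5, 2.5)]
print(count_points_per_polygon(zones, stations))  # {'a': 2, 'b': 1}
```

The cost of this naive version grows with the product of the two table sizes, which is what optimized spatial joins avoid.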

Exporting spatial data

When you export spatial data from BigQuery, GEOGRAPHY column values are always formatted as WKT strings. To export data in GeoJSON format, use the ST_AsGeoJSON function.

If the tools you're using to analyze the exported data do not understand the GEOGRAPHY data type, you can convert the column values to strings using a geographic function such as ST_AsText or ST_AsGeoJSON. When converting to GeoJSON, BigQuery GIS adds additional points to the line where necessary so that the converted sequence of planar edges remains within 10 meters of the original geodesic line.

For example, the following query uses ST_AsGeoJSON to convert a GEOGRAPHY value, a line built with ST_MakeLine, to a GeoJSON string:

  SELECT ST_AsGeoJSON(ST_MakeLine(ST_GeogPoint(1,1), ST_GeogPoint(3,2)))

The resulting data would look like the following:

{ "type": "LineString", "coordinates": [ [1, 1], [1.99977145571783, 1.50022838764041], [2.49981908082299, 1.75018082434274], [3, 2] ] }

The GeoJSON line has two additional points. BigQuery GIS adds these points so that the GeoJSON line closely follows the same path on the ground as the original line.
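You can check that the added points lie on the original geodesic rather than on the straight line in longitude/latitude. The sketch below measures the angular distance of each point from the great circle through the line's endpoints. It uses the coordinates from the sample output above and assumes a spherical Earth. `offset_from_great_circle` is a hypothetical helper.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def to_vec(lon: float, lat: float) -> Vec:
    """(lon, lat) in degrees -> unit vector on the sphere."""
    lon, lat = math.radians(lon), math.radians(lat)
    return (math.cos(lat) * math.cos(lon), math.cos(lat) * math.sin(lon), math.sin(lat))


def cross(u: Vec, v: Vec) -> Vec:
    return (u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0])


def offset_from_great_circle(start: Sequence[float], end: Sequence[float],
                             point: Sequence[float]) -> float:
    """Hypothetical helper: angular distance (radians) from point to the great circle."""
    n = cross(to_vec(*start), to_vec(*end))  # normal of the great-circle plane
    norm = math.sqrt(sum(c * c for c in n))
    p = to_vec(*point)
    return abs(math.asin(sum(a * b for a, b in zip(n, p)) / norm))


# Coordinates copied from the ST_AsGeoJSON sample output above.
coords = [[1, 1], [1.99977145571783, 1.50022838764041],
          [2.49981908082299, 1.75018082434274], [3, 2]]
for pt in coords[1:-1]:
    print(pt, offset_from_great_circle(coords[0], coords[-1], pt))
# For comparison, the planar midpoint (2, 1.5) lies measurably off the geodesic.
print((2, 1.5), offset_from_great_circle(coords[0], coords[-1], (2, 1.5)))
```

Multiplying an offset by the Earth's radius converts it to an approximate distance in meters.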

Working with geography in the BigQuery client libraries

Only the BigQuery client library for Python currently supports the GEOGRAPHY data type. For other client libraries, convert GEOGRAPHY values to strings by using the ST_AsText or ST_AsGeoJSON function. For example, to use ST_AsText: ST_AsText(ANY_VALUE(zip_regions_geometry.geometry)) AS geometry.

Converting with ST_AsText returns a single WKT string per value, and the resulting column is typed as STRING instead of GEOGRAPHY.

