Data Analytics

Querying the Stars with BigQuery GIS

QueryTheStars.png

Many organizations maintain large data warehouses full of analytics, sales numbers, performance metrics, and more. But nature gives us other massive datasets, including a night sky full of stars. While BigQuery GIS was explicitly designed to serve the needs of geospatial users here on Earth, its spherical coordinate systems and built-in transformation functions are equally well suited to another domain for spherical coordinates: astronomy.

What makes BigQuery a great platform for analyzing astronomy datasets?

  • BigQuery is intended for online analysis (OLAP), and optimized to work with massive datasets that are not transactional. That is true for most work with astronomy catalogs that are released every year or so, depending on the project.

  • BigQuery supports queries on spherical geometry, using BigQuery GIS. Locating objects on the celestial sphere requires spherical geometry.

  • BigQuery GIS can query astronomy data nearly as fast as more specialized database platforms, and may be faster when used to perform full table scans.

And there’s no lack of astronomy data to explore. For example, catalog data organizes the observations of a telescope project into giant tables. Some of the larger catalog datasets comprise a billion or so objects with many observed features, and for some features, these datasets include observations that span over the hours or years. WISE and Gaia are satellite-based telescopes that provide us with high resolution image data. LSST, a major new ground-based telescope, will soon come online. It is mandated to release catalogs of observed objects over the 10 year life of the project. Later in this post, we’ll explore how to use BigQuery GIS with this kind of catalog data.  

Understanding the celestial coordinate system

But before we show you examples of how to query astronomy catalog data with BigQuery, let’s take a step back and discuss the broad set of functions implemented in BigQuery GIS to support your GIS needs.

Look down for a second

Consider that the Earth is a sphere, and that you find yourself on the two-dimensional surface of our planet with latitude and longitude, easily obtained from a global positioning system (GPS) that locates you and guides you to where you want to go using "lat and long" coordinates.

If you want to find out how long a trip is, remembering your high school geometry, you might think you can find the total distance using the Pythagorean theorem. In some cases, that might seem to work at first, but the farther you travel, your situation quickly becomes more complex. First, you need to convert your source and destination, lat and long, to Cartesian coordinates on a Euclidean plane, and convert angles to meters or miles. And worse, Euclidean distance is all about planar geometry, but surface or the earth is not flat (rather, it’s spherical), so Pythagoras' theorem doesn’t work. The ancient Greek and Islamic mathematicians had most of the math worked out 1000 years ago, but that doesn't make it any easier. The good news is that BigQuery GIS takes advantage of Google’s S2 Geometry library that can help you perform these calculations, so you can access all that above-mentioned messy geometry in much simpler Standard SQL. You can calculate the distance between points on earth, and get fancier still doing work with regions, polygons and so on. It’s very powerful, and pretty easy to use.

Ad astra

Now that you have an understanding of terrestrial geometry, let’s look back up to the stars! BigQuery GIS uses the same basic concepts to track celestial bodies as it does to track things on Earth. In other words, to locate a star in the sky, you assign a coordinate, like lat and long, that points you to exactly where you will find the star in space. But hold on, space is not a sphere! Space is literally a fully three-dimensional sort-of-infinite expanse of stars, galaxies, black holes, planets, quasars, pulsars, and nebulae. They’re all spread out, light years away, not anything like the surface of the earth where I am trying to get from my house to the nearest Google office using GPS coordinates.

Here's where it gets interesting: all the celestial objects I describe above are so distant that we can't easily tell the difference between a closer object and a farther object. They might as well be points of light on a giant black sphere with the Earth at its center, which is kind of what it looks like at night when you look up at the sky. (Although we’re not here to discuss the history of astronomy, avid historians of science will recall that this is exactly the model the ancient Greeks—and up until quite recently all their intellectual descendents—used to describe the heavens. If you are interested, I recommend The Structure of Scientific Revolutions, by Thomas S Kuhn.)

So, back to the celestial sphere. If the night sky and all the celestial bodies are indistinguishable from a giant sphere with the Earth at its center, my earlier proposal to assign a latitude and longitude to locate objects seems reasonable. In fact, astronomers do exactly that. They assign what they call the coordinates right ascension (ra) and declination (dec). These coordinates work exactly like latitude and longitude. Sometimes, right ascension is written in more historical notation using hours, minutes, and seconds.

Let's look at an example. Vega (a star famous from the movie Contact) can be found at RA 18h 36m 56s,  Dec +38° 47′ 1″. Fortunately, modern astronomical data typically uses degrees and decimal points to store coordinates, just like modern geographers do. In modern notation, Vega has the same declination (+39°) as the longitude (39° N) of Kansas City. This means once a day people in Kansas city can look straight up to see Vega (if it's night time). This daily rotation clearly hints at the historical use of the 24 hour system for right ascension.

celestial coordinate system.png

As you can see, the celestial coordinate system is just like the geographic coordinate system, except in astronomy you are looking up and in geography you are looking down.

At this point we have established (somewhat loosely) that a spherical coordinate system using ra and dec is a valid way to locate objects on the celestial sphere, just as we use lat and long to locate objects on the surface of our spherical of the earth. It’s also important to note the following:

  1. The celestial sphere is exactly spherical, by design, so any correction available to the GIS system due to the earth being somewhat flattened (ellipsoidal) should be disabled. Conveniently, BigQuery GIS defaults to use an exact sphere.

  2. The poles of the celestial sphere align with the geographic poles of the earth. The coordinates (ra, dec) remain fixed with respect to the positions of the stars.

There are a wide variety of queries that an astronomer may need to perform. Here are some examples from LSST, or you can follow along below with an example on WISE data.  

An example and a data set

The WISE data set contains a table of objects and the multi-epoch (or time-series) data for those objects. These are typically called "light curves."  One interesting example is Beta Lyrae eclipsing binary AH Cep. Here’s the query to access the data for these light curves from the BigQuery AllWise dataset:

  SELECT
  w1mpro_ep,
  mjd,
  load_id,
  frame_id
FROM
  `bigquery-public-data.wise_all_sky_data_release.mep_wise`
WHERE
  source_id_mf='3425p651_ac51-032187'
ORDER BY
  mjd ASC

This returns the data plotted below using Data Studio.

data studio plot.png

For the purposes of benchmarking, we opted to demonstrate a realistic query, something an astronomer might be interested in doing. After initial tests with the raw tables as loaded, we applied four important optimizations:

  • We partitioned the tables

  • We clustered the data on the integer value Level 7 HTM spatial index key, a triangulation of the celestial sphere

  • We pre-calculated the location of objects using the POINT geometry type

  • We used ST_CONTAINS instead of ST_WITHIN to restrict the region of space to reduce the size of the data set

The final query is below. We chose it to be representative of the nearest neighbor type of query expected in astronomical queries.

  CREATE TEMP FUNCTION
  ArcSecondDistance(p1 GEOGRAPHY, p2 GEOGRAPHY,
    d FLOAT64) AS (ST_DISTANCE(p1,p2) < d * 30.8874796235);
SELECT
  source_id_mf,
  point
FROM
  `bigquery-public-data.wise_all_sky_data_release.mep_wise`
WHERE
  ArcSecondDistance(point,
    ST_GEOGPOINT(201.5, -2.6),60)
  AND ST_CONTAINS(
   ST_GEOGFROMTEXT('Polygon((201.00 -3.10,201.00 -2.10,202.00 -2.10,202.00 -3.10, 201.00 -3.10))'),point)

The combination of these four optimizations reduces the median query time on the 17 terabyte (TB) table from 60 seconds down to 4. This puts BigQuery very close to the performance of database platforms optimized to quickly retrieve information related to a single astronomical source. Additionally, when it comes to full table scans, BigQuery may show significant advantages.

Best of all, it’s still early days for BigQuery GIS and astronomy datasets. We are excited to bring more astronomy catalogs to the BigQuery public data sets. The WISE data set is only the first of several planned. To get started with BigQuery GIS, you can learn to analyze terrestrial data by checking out its documentation. If you’re interested in another example of using BigQuery to record natural phenomena, check out this excellent tutorial on using BigQuery GIS to plot a hurricane's path. To explore how you might run analytics on your business’s terrestrial GIS data, have a look at this tutorial on bicycles in New York City. We can’t wait to hear what you discover in your geospatial (or astronomical) data.