Analyzing Financial Time Series using BigQuery

If you're a quantitative analyst, you use a variety of tools and techniques to mine big data, such as market transaction histories, for information that can provide insight into market trends. Because quotes, trades, and other events happen at predictable intervals, such data represents a financial time series that you can analyze by using established techniques, including frequency analysis and moving averages.

But dealing with massive datasets can be challenging. Traditional tools might not scale as the dataset continually grows. Storage requirements can grow as fast as the dataset, so downloading data to your computer's hard drive is no longer a workable approach. And it can take a long time to retrieve the right subsets of data from a traditional database query.

Google BigQuery solves these issues and others by enabling you to run SQL-like queries against append-only tables and then returning results very quickly by using the processing power of Google's infrastructure. You can use BigQuery on the web, on the command line, and through APIs, including REST APIs. When combined with other components of Google Cloud Platform, and even third-party tools, BigQuery enables you to build the data-analysis solutions you need now, while being confident that you can scale your solutions in the future.

Security is always important when working with financial data. Google Cloud Platform helps to keep your data safe, secure, and private in several ways. For example, all data is encrypted during transmission and when at rest, and the Cloud Platform is ISO 27001, SOC3, FINRA, and PCI compliant.

Objectives

  • Load a dataset into BigQuery.
  • Use the BigQuery web UI to perform various financial time series queries.
  • Visualize the results of the queries.

Prerequisites

  • Enable the BigQuery API for a Google Cloud Platform Console project.
  • Enable billing.
  • Install the Google Cloud SDK on your client operating system.

Load the sample data

This tutorial uses foreign exchange (FX) spot-market data, which represents quotes of international currency exchange rates over time. These values record the prices offered by brokers for the kinds of currency trades that happen such as when people travel, when large financial institutions move funds between countries or regions, or when speculators seek to make a profit by trading currencies. The data uses the following format:

<venue>,
<from-currency>/<to-currency>,
<timestamp>,
<bid-price>,
<ask-price>

All quotes in the sample dataset are for exchanges between the British pound sterling (GBP) and the U.S. dollar (USD). Timestamps are always expressed in Greenwich Mean Time (GMT) and the last three digits are always zero, which makes the resolution of the time series one millisecond. The bid price represents what the buyers offered to pay for the currency, while the ask price represents the price sellers offered per unit.

To load the sample data, follow these steps.

  1. Set the current project. In a terminal window, enter the following command:

    gcloud config set project <your_project_id>
    
  2. Create a new dataset in BigQuery.

    bq mk timeseries
    
  3. Load the data from two CSV files. The first file contains the data for January, 2014, and the second file contains the data for February. It can take some time to load these datasets.

    bq load timeseries.gbpusd_0114 gs://solutions-public-assets/time-series-master/GBPUSD_2014_01.csv venue:STRING,currencies:STRING,time:TIMESTAMP,bid:FLOAT,ask:FLOAT
    
    bq load timeseries.gbpusd_0214 gs://solutions-public-assets/time-series-master/GBPUSD_2014_02.csv venue:STRING,currencies:STRING,time:TIMESTAMP,bid:FLOAT,ask:FLOAT
    
  4. Open the BigQuery web UI.

  5. View the table schema. Under your project name, expand the timeseries dataset and then click gbpusd_0114.

The following figure shows the table schema for the gbpusd_0114 table. The schema for the gbpusd_0214 table is identical.

Table details

Query for January quotes

Start with some simple queries that you can use to see what the FX data looks like.

  1. In the BigQuery web UI, click Compose Query.

  2. In the New Query text box, enter the following query.

    SELECT
     venue,
     currencies,
     time,
     bid,
     ask
    FROM
     timeseries.gbpusd_0114
    ORDER BY
     time ASC
    LIMIT   1000;
    
  3. Click Run Query.

    The query limits the number of results to the first 1000 rows to avoid returning a huge set of data in the browser and to reduce costs. Take a moment to browse the data by using the controls that the tool provides, such as the Next and Prev links. Notice that the format for the data in the time column doesn't show the full resolution of the underlying data, down to milliseconds:

    2014-01-01 00:10:10 UTC
    
  4. Modify the query to use a built-in function to reformat the time information. In the New Query text box, change the first part of the SELECT statement as follows.

    SELECT FORMAT_UTC_USEC(time) AS time,
    

    FORMAT_UTC_USEC is a built-in function that reformats the time data as Coordinated Universal Time (UTC). BigQuery provides many built-in functions that you can use to make it easier to query large data sets.

  5. Click Run Query to see the results. The data format is now in UTC showing the full resolution of the timestamp.

    2014-01-01 00:00:05.763000
    

View the bids within a time frame

Next, modify the query to restrict the results to a particular period of time. This will produce a more manageable set of results that can be downloaded quickly as a CSV file.

In the New Query text box, enter and run the following query. This query restricts the results to a 30-minute time frame on January 1st.

SELECT
  TIME(time) AS time,
  bid
FROM
  timeseries.gbpusd_0114
WHERE
  time BETWEEN TIMESTAMP("2014-01-01 00:00:00.000")
  AND TIMESTAMP("2014-01-01 00:29:59.999")
ORDER BY
  time ASC;

Notice that this query uses the TIME function to show an abbreviated time format. Because you know that that the results are from a particular day, you don't need to see the full timestamp, and this format will look better in the chart you create in the next step.

You can visualize this data by downloading the results as a CSV file and viewing a chart in Google Sheets. Follow these steps:

  1. Click Download as CSV.
  2. Open Google Sheets and create a new spreadsheet.
  3. Click File > Import. Import your results file by uploading the file from your computer.
  4. In the Import file dialog box, select Replace current sheet and then click Import.
  5. Press CTRL+A to select all the data in the spreadsheet.
  6. Click Insert > Chart to create a new chart from the data.
  7. On the Charts tab, select a line chart and then click Insert.

The following figure shows the data as a line chart.

Line chart showing bids

Now you can easily see that the dollar trended lower against the pound for most of the first 20 minutes of the day, and then recovered a bit for the next five minutes.

Combine tables

The data you loaded into BigQuery was split between two CSV files and now is in two tables. You can easily combine the data by embedding a table query, as shown in the following example. In this case, the SQL statement combines all the tables that have IDs containing "gbpusd", which is true for both of the tables you created. The query simply returns the first and last timestamps in the data set.

SELECT
  MIN(time) AS time1,
  MAX(time)AS time2
FROM
  (TABLE_QUERY(timeseries,
      'table_id CONTAINS "gbpusd"'));

You can also combine specific tables, by name. For example, the following query retrieves the ask prices around midnight GMT between January 31 and February 1. Note that when you list columns separated by commas in a the SELECT clause, BigQuery SQL performs a UNION, not the traditional JOIN operation that you might expect if you've used SQL before.

SELECT
  STRFTIME_UTC_USEC(time,"%m-%d-%Y %r") AS time,
  ask
FROM
  timeseries.gbpusd_0114,
  timeseries.gbpusd_0214
WHERE
  time BETWEEN TIMESTAMP("2014-01-31 23:30:00.000")
  AND TIMESTAMP("2014-02-01 00:29:59.999")
ORDER BY
  time ASC;

This query uses the STRFTIME_UTC_USEC function to format the timestamp for two reasons. First, the timestamp needs to be in a format that can be sorted across the two days. A format such as HOUR would not work because the sorted result would not take the change in day into account. Second, the label for the chart should be in an appropriate format. While you could, for example, label the chart with the full UTC time including microseconds, the addition timing information isn't particularly useful as a chart label.

The following chart shows a graph of the results.

Line chart across files

Now you can see exactly what happened to the ask price across the boundary between the two tables.

Analyze the frequency of data

In financial markets, sometimes the rate of change in quotes can yield interesting information. You can run queries to get information about the number of FX quotes, called ticks, in a given time period. It's important to note that a tick is not a consistent unit of time, like the ticking of a clock. One tick is a single quote that often represents a change in price.

The following query returns the number of ticks per hour on January 16, 2014.

SELECT
  HOUR(time) AS hour,
  COUNT(time) AS num_ticks
FROM
  timeseries.gbpusd_0114
WHERE
  time BETWEEN TIMESTAMP("2014-01-16 00:00:00.000")
  AND TIMESTAMP("2014-01-16 23:59:59.999")
GROUP BY
  hour
ORDER BY
  hour ASC;

The following bar chart shows the results.

Bar chart with frequency

You can see in the chart that the greatest amount of activity happened between noon and 4pm GMT on that day.

Next, run the following query to see the average number of ticks for each hour of the day over the entire month of January 2014. The query computes the average by dividing the total number of ticks by the number of trading days in the month and then grouping by the hour.

SELECT
  HOUR(time) AS hour,
  COUNT(time)/COUNT(DISTINCT DAY(time)) AS avg_ticks
FROM
  timeseries.gbpusd_0114
GROUP BY
  hour
ORDER BY
  hour ASC;

The following chart shows the distribution of the averages. You can see that the distribution is similar to the result you got for January 16, so that day was a fairly average day for that month when it comes to quote activity.

Bar chart showing average frequency

Compute a moving average

One fundamental technique for financial time-series analysis is the simple moving average (SMA). You can use BigQuery to compute an SMA of the FX data. The following query retrieves a time series for the 60-second SMA for bid prices in a three-hour period during the morning of January 16th.

SELECT
  TIME(S1.time) AS bid_time,
  AVG(S2.bid) AS avg_bid
FROM
  timeseries.gbpusd_0114 AS S1
JOIN EACH
  timeseries.gbpusd_0114 AS S2
ON
  S1.time = S2.time
WHERE
  TIMESTAMP_TO_SEC(S2.time)
  BETWEEN (TIMESTAMP_TO_SEC(S1.time) - 60)
  AND TIMESTAMP_TO_SEC(S1.time)
  AND
  S1.time
  BETWEEN TIMESTAMP("2014-01-16 09:00:00.000")
  AND TIMESTAMP("2014-01-16 11:59:59.999")
GROUP BY
  bid_time
ORDER BY
  bid_time ASC;

This query uses the built-in AVG function to compute the averages. It creates the window for the moving average by first performing a self join on the table's timestamps and then using the WHERE clause to limit the result of the join to a 60-second time range. In the JOIN clause, the query uses the EACH modifier, which gives BigQuery a hint that the join references two large tables. The query execution engine uses this hint to perform optimizations. Finally, the GROUP BY clause aggregates the moving averages and the ORDER BY clause sorts the results.

The following chart shows the line graph of the data.

Line chart showing moving average

Delete the test data

When you're done working with the test data, you should delete the dataset. Follow these steps:

  1. In the BigQuery web UI, to the right of the timeseries dataset name, click the down-arrow button.
  2. Click Delete dataset.
  3. When you're prompted to confirm the delete operation, click OK.

To learn more about BigQuery, read the BigQuery documentation.

Next steps

Try out other Google Cloud Platform features for yourself. Have a look at our tutorials.

Monitor your resources on the go

Get the Google Cloud Console app to help you manage your projects.

Send feedback about...