Hide
Google BigQuery

BigQuery API Quickstart

This page provides a 25 minute exercise in building a simple command-line application (in either Java or Python) with the Google BigQuery API. This simple application will run a query on one of the available Sample Datasets and display the result.

  1. Prerequisites
  2. Create a New BigQuery Project
  3. Generate a Client ID and Client Secret for your Application
  4. Import Google API Client Libraries
  5. Authorize Access to the BigQuery API
  6. Running Queries
  7. Next Steps
  8. Complete Source Code

Prerequisites

  1. Review basic BigQuery concepts and terminology

    See what is BigQuery for a list of basic terms and concepts to become familiar with, such as jobs and projects.

  2. Read about how to activate the BigQuery service from a Google Developers Console project

    If you are a member of an existing Google Developers Console project that already has BigQuery enabled, you can use that project to build your application. To learn how to sign up for BigQuery, see sign up.

  3. Be able to write and run a simple application in either Java or Python

    The BigQuery service provides a REST-based API that can be programmatically accessed using Java or Python. In addition to a basic understand of how to develop Java or Python applications, you should also be able to download and install additional libraries before attempting this tutorial.

Create a New BigQuery Project

Before you can use the BigQuery service from server-side code or installed applications, you must first create a Google Developers Console project. This project provides three unique values that you must use with your application to identify it to Google and authorize it to use the BigQuery API. The first is a project number, which identifies which project you are making queries from. The other two values are the client ID and a client secret. These values identify your application to Google to facilitate authorization of the BigQuery API. They also enable per-project accounting for quotas and billing.

  1. Visit the Google Developers Console, create a new project.
  2. Expand APIS & AUTH, and click on APIs. In that list, search for and enable the BigQuery API.
  3. Finally, note the number of the project you just created. You can find the project number listed at the top of the project overview page for your project: https://console.developers.google.com/project/apps~

Generate a Client ID and Client Secret for your Application

In order to use OAuth 2.0 for web server applications, your application will need a client ID and client secret.

  1. Visit the Google Developers Console and select the project you just created.
  2. Click on APIS & AUTH on the left, then select Credentials. Click on "Create new Client ID."
  3. Select Installed application.
  4. Click Create client ID, then click Download JSON to download the client file.
  5. Copy the downloaded file to a location accessible from your application, and give it a name like client_secrets.json. Your application will refer to this file as part of the flow of authorizing access to the BigQuery API. For more information about using the client_secrets.json format, see this page.

Import Google API Client Libraries

Our samples use the Google APIs Client libraries to make calls to the BigQuery API. Download and install the appropriate library from the client libraries page.

If you are developing with Java, Maven is a useful way to manage and organize dependencies. The Google APIs Client Libraries for Java are available in the central Maven repository.

Authorize Access to the BigQuery API

To authorize access to BigQuery, the first step is to write the code necessary to read the values from the client_secrets.json file. The application then prompts the user to authorize the application to access the BigQuery API, and uses the resulting authorization code to create a new BigQuery client. Much of the complexity of these tasks is handled by methods available in the Google APIs Client libraries.

The sample featured in this quick start guide uses an OAuth 2.0 installed application authorization flow. This is the suggested method for providing user-authorized access to the BigQuery API for installed or desktop applications. Here are the steps involved in this authorization flow:

  1. The application displays a URL to a Google API authorization page.
  2. The user authorizes access to the API from this page. If the user grants access to the API, an access code is displayed.
  3. The user pastes this access code into the command line prompt provided by the application.
  4. The application uses this access code to obtain a credential to create an authorized BigQuery API client, which can be used to run queries indefinitely unless the user revokes access via their Google account settings.

The user interface for authorizing installed applications can be improved to eliminate the manual access code copy-paste step, with additional code not covered in this tutorial. For more information, see Using OAuth 2.0 for Installed Applications. For information about other authorization flows, such as those for web applications or server-to-server authorization using service accounts, see authorizing access to the BigQuery API using OAuth 2.0.

Java

This sample uses the Google APIs Client Library for Java.

    
  

Python

This sample uses the Google APIs Client Library for Python.

import httplib2

from apiclient.discovery import build

from oauth2client.client import flow_from_clientsecrets
from oauth2client.file import Storage
from oauth2client import tools

# Enter your Google Developer Project number
PROJECT_NUMBER = '12345XXXXXXX'

FLOW = flow_from_clientsecrets('client_secrets.json',
                               scope='https://www.googleapis.com/auth/bigquery')

storage = Storage('bigquery_credentials.dat')
credentials = storage.get()

if credentials is None or credentials.invalid:
  # Run oauth2 flow with default arguments.
  credentials = tools.run_flow(FLOW, storage, tools.argparser.parse_args([]))

http = httplib2.Http()
http = credentials.authorize(http)

bigquery_service = build('bigquery', 'v2', http=http)
  

Running Queries

BigQuery provides two API methods for running queries. The synchronous query method involves a single API call, and will wait to provide a response until the query is complete (unless provided with an optional timeout value). The asynchronous query method will "insert" a query job, and immediately return an ID for that job. You then use this job ID to poll for the status of the query, and retrieve the query result if complete. This example uses the synchronous query method. For more information about different ways to query using BigQuery, see querying data.

Running the Query

To run a synchronous query, the application makes an API call that passes the query itself (as a string), along with the project number that the query will be run under for billing and quota purposes. The query in the example below finds the top 10 most revised articles on Wikipedia from a 350 million row dataset. BigQuery uses a SQL-like syntax, which is described in our query reference guide.

Java

This sample uses the Google APIs Client Library for Java.

    
  

Python

This sample uses the Google APIs Client Library for Python.

# Create a query statement and query request object
query_data = {'query':'SELECT TOP(title, 10) as title, COUNT(*) as revision_count FROM [publicdata:samples.wikipedia] WHERE wp_namespace = 0;'}
query_request = bigquery_service.jobs()

# Make a call to the BigQuery API
query_response = query_request.query(projectId=PROJECT_NUMBER,
                                     body=query_data).execute()
  

Displaying the query result

Once the query has completed, the API returns the result set as a JSON object. In addition to the actual query results, the JSON response contains metadata about the query job, including a unique job ID and the schema of the result set. The application parses the query response and displays the resulting values.

Java

This sample uses the Google APIs Client Library for Java.

    
  

Python

This sample uses the Google APIs Client Library for Python.

# Retrieve and print the result of the query API response
query_response = query_request.query(projectId=PROJECT_NUMBER,
                                     body=query_data).execute()
print 'Query Results:'
for row in query_response['rows']:
  result_row = []
  for field in row['f']:
    result_row.append(field['v'])
  print ('\t').join(result_row)
  

Next Steps

This tutorial covers only the most basic steps necessary to make calls to the BigQuery API from a command-line application. The BigQuery API also provides methods for running asynchronous queries, creating tables and datasets, listing projects, and more.

Complete Source Code

Here is the complete source code for the examples in this quick start guide, with relevant import statements, exception handling, and in the case of Java, authorization flow steps broken out into helper methods.

Java

This sample uses the Google APIs Client Library for Java.

    
  

Python

This sample uses the Google APIs Client Library for Python.

import httplib2
import pprint
import sys

from apiclient.discovery import build
from apiclient.errors import HttpError

from oauth2client.client import AccessTokenRefreshError
from oauth2client.client import OAuth2WebServerFlow
from oauth2client.client import flow_from_clientsecrets
from oauth2client.file import Storage
from oauth2client import tools


# Enter your Google Developer Project number
PROJECT_NUMBER = '12345XXXXXXX'

FLOW = flow_from_clientsecrets('client_secrets.json',
                               scope='https://www.googleapis.com/auth/bigquery')


def main():

  storage = Storage('bigquery_credentials.dat')
  credentials = storage.get()

  if credentials is None or credentials.invalid:
    # Run oauth2 flow with default arguments.
    credentials = tools.run_flow(FLOW, storage, tools.argparser.parse_args([]))

  http = httplib2.Http()
  http = credentials.authorize(http)

  bigquery_service = build('bigquery', 'v2', http=http)

  try:
    query_request = bigquery_service.jobs()
    query_data = {'query':'SELECT TOP( title, 10) as title, COUNT(*) as revision_count FROM [publicdata:samples.wikipedia] WHERE wp_namespace = 0;'}

    query_response = query_request.query(projectId=PROJECT_NUMBER,
                                         body=query_data).execute()
    print 'Query Results:'
    for row in query_response['rows']:
      result_row = []
      for field in row['f']:
        result_row.append(field['v'])
      print ('\t').join(result_row)

  except HttpError as err:
    print 'Error:', pprint.pprint(err.content)

  except AccessTokenRefreshError:
    print ("Credentials have been revoked or expired, please re-run"
           "the application to re-authorize")

if __name__ == '__main__':
  main()