BigQuery public datasets
A public dataset is any dataset that is stored in BigQuery and made available to the general public through the Google Cloud Public Dataset Program. BigQuery hosts these datasets so that you can access them and integrate them into your applications. Google pays for the storage of these datasets and provides public access to the data through a project. You pay only for the queries that you run on the data. The first 1 TB per month is free, subject to query pricing details.
Public datasets are available for you to analyze using either legacy SQL or GoogleSQL queries. Use a fully qualified table name when querying public datasets, for example bigquery-public-data.bbc_news.fulltext. If your organization restricts data access, for example with security perimeters, then you might need to contact your administrator for permission to access public datasets.
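As an illustration of the fully qualified name, the following minimal Python sketch builds a GoogleSQL table reference and a preview query. The helper names (`qualified_table`, `build_preview_query`, `run_query`) are hypothetical and exist only for this example. The `run_query` helper assumes that the `google-cloud-bigquery` package is installed and that Application Default Credentials and a default project are configured.

```python
def qualified_table(project: str, dataset: str, table: str) -> str:
    """Return a backtick-quoted GoogleSQL table reference.

    Backticks are needed because the project ID contains hyphens.
    """
    return f"`{project}.{dataset}.{table}`"


def build_preview_query(table_ref: str, limit: int = 10) -> str:
    """Build a simple GoogleSQL query that previews rows from a table."""
    return f"SELECT * FROM {table_ref} LIMIT {limit}"


def run_query(sql: str):
    """Run the query and return its rows as a list.

    Assumes google-cloud-bigquery is installed and credentials are set up.
    The query is billed to the client's default project, not to the
    project that hosts the public dataset.
    """
    from google.cloud import bigquery  # lazy import so the helpers work offline

    client = bigquery.Client()
    return list(client.query(sql).result())
```

For example, `run_query(build_preview_query(qualified_table("bigquery-public-data", "bbc_news", "fulltext")))` would return the first ten rows. Note that a `LIMIT` clause reduces the rows returned, not necessarily the bytes scanned.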
You can access BigQuery public datasets by using the Google Cloud console, by using the bq command-line tool, or by making calls to the BigQuery REST API using client libraries for languages such as Java, .NET, or Python. You can also view and query public datasets through Analytics Hub, a data exchange platform that helps you discover and access data libraries.
Public datasets are not accessible by default from within a VPC Service Controls perimeter. There is no service-level agreement (SLA) for the Public Dataset Program.
You can find more details about each individual dataset by clicking the dataset's name in the Datasets section of Cloud Marketplace.
Go to Datasets in Cloud Marketplace
Before you begin
To get started using a BigQuery public dataset, you must create or select a project. The first terabyte of data processed per month is free, so you can start querying public datasets without enabling billing. If you intend to go beyond the free tier, you must also enable billing.
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
- Make sure that billing is enabled for your Google Cloud project.
- BigQuery is automatically enabled in new projects. To activate BigQuery in a preexisting project, enable the BigQuery API.
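To see how much of the free allowance a query would use before you run it, one option is a dry run. The following is a sketch, not an official procedure. The helper names are hypothetical, the free tier is treated as 10^12 bytes (an assumption about how "1 TB" is measured; the pricing documentation is authoritative), and `estimate_bytes_processed` assumes that `google-cloud-bigquery` is installed with credentials configured.

```python
# Assumption: "1 TB" is read as 10**12 bytes; check the pricing page for the exact unit.
FREE_TIER_BYTES = 10**12


def fraction_of_free_tier(bytes_processed: int, free_bytes: int = FREE_TIER_BYTES) -> float:
    """Return the share of the monthly free allowance that a query would consume."""
    return bytes_processed / free_bytes


def estimate_bytes_processed(sql: str) -> int:
    """Dry-run a query and return the bytes it would process, without running it."""
    from google.cloud import bigquery  # lazy import; requires google-cloud-bigquery

    client = bigquery.Client()
    # dry_run=True validates the query and reports bytes without executing it;
    # use_query_cache=False avoids a cached result hiding the real scan size.
    job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=job_config)
    return job.total_bytes_processed
```

Under these assumptions, a query that scans 250 GB (250 × 10^9 bytes) would use a quarter of the monthly allowance.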
Public dataset locations
Each public dataset is stored in a specific location like US or EU. Currently, the BigQuery sample tables are stored in the US multi-region location. When you query a sample table, supply the --location=US flag on the command line, choose US as the processing location in the Google Cloud console, or specify the location property in the jobReference section of the job resource when you use the API. Because the sample tables are stored in the US, you cannot write sample table query results to a table in another region, and you cannot join sample tables with tables in another region.
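The API option described above can be sketched in Python as follows. `build_query_job` assembles a job resource body with the location property set in its jobReference section, and `run_in_location` passes the same location through the Python client. Both function names are hypothetical, the project ID is a placeholder, and `run_in_location` assumes that `google-cloud-bigquery` is installed with credentials configured.

```python
def build_query_job(project_id: str, sql: str, location: str = "US") -> dict:
    """Build a query job resource body with the location set in jobReference."""
    return {
        "jobReference": {"projectId": project_id, "location": location},
        "configuration": {"query": {"query": sql, "useLegacySql": False}},
    }


def run_in_location(sql: str, location: str = "US"):
    """Run a query with an explicit processing location via the Python client."""
    from google.cloud import bigquery  # lazy import; requires google-cloud-bigquery

    client = bigquery.Client()
    # The location argument plays the same role as --location=US on the command line.
    return client.query(sql, location=location).result()
```

The dictionary returned by `build_query_job` is only the request body; sending it to the API still requires an authenticated HTTP call or a client library.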
Access public datasets in the Google Cloud console
You can access public datasets in the Google Cloud console through the following methods:
- In the Explorer pane, view the bigquery-public-data project. For more information, see Open a public dataset.
- Use Analytics Hub to view and subscribe to public datasets.
To find out when a data table was last updated, go to the table's Details section as described in Getting table information, and view the Last modified field. For more information about selecting and removing projects, see Work with projects.
Other public datasets
There are many other public datasets available for you to query. Some are also hosted by Google, but many more are hosted by third parties. Other datasets include:
- Cloud Life Sciences public datasets
- NIH chest x-ray dataset
- The Cancer Imaging Archive (TCIA) dataset
- Dataset of release notes for the majority of generally available Google Cloud products.
Share a dataset with the public
You can share any of your datasets with the public by changing the dataset's access controls to allow access by "All Authenticated Users". For more information about setting dataset access controls, see Controlling access to datasets.
When you share a dataset with the public:
- Storage charges are incurred by the billing account attached to the project that contains the publicly shared dataset.
- Query charges are incurred by the billing account attached to the project where the query jobs are run.
For more information, see Overview of BigQuery pricing.
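The access-control change described in this section can be sketched in Python. `with_public_reader` works on the REST-style access list of a dataset and adds a reader entry for the allAuthenticatedUsers special group. `share_dataset_publicly` applies the equivalent change through the Python client. Both helper names are hypothetical, the READER role is an assumption about the level of access you want to grant, and the client helper assumes that `google-cloud-bigquery` is installed with credentials configured.

```python
# REST-style access entry granting read access to all authenticated users.
PUBLIC_READER = {"role": "READER", "specialGroup": "allAuthenticatedUsers"}


def with_public_reader(access: list) -> list:
    """Return a copy of a dataset access list with the public reader entry added once."""
    if PUBLIC_READER in access:
        return list(access)
    return [*access, dict(PUBLIC_READER)]


def share_dataset_publicly(dataset_id: str):
    """Grant READER access on a dataset to all authenticated users.

    dataset_id is a string such as "your-project.your_dataset" (placeholder).
    """
    from google.cloud import bigquery  # lazy import; requires google-cloud-bigquery

    client = bigquery.Client()
    dataset = client.get_dataset(dataset_id)
    entries = list(dataset.access_entries)
    entries.append(
        bigquery.AccessEntry(
            role="READER",
            entity_type="specialGroup",
            entity_id="allAuthenticatedUsers",
        )
    )
    dataset.access_entries = entries
    # Only the access_entries field is sent in the update.
    return client.update_dataset(dataset, ["access_entries"])
```

Because `with_public_reader` returns a new list and skips the entry if it is already present, calling it repeatedly does not create duplicates.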
Sample tables
In addition to the public datasets, BigQuery provides a limited number of sample tables that you can query. These tables are contained in the bigquery-public-data:samples dataset. The requirements for querying the BigQuery sample tables are the same as the requirements for querying the public datasets.
The bigquery-public-data:samples dataset includes the following tables:
Name | Description |
---|---|
gsod | Contains weather information collected by NOAA, such as precipitation amounts and wind speeds from late 1929 to early 2010. |
github_nested | Contains a timeline of actions such as pull requests and comments on GitHub repositories with a nested schema. Created in September 2012. |
github_timeline | Contains a timeline of actions such as pull requests and comments on GitHub repositories with a flat schema. Created in May 2012. |
natality | Describes all United States births registered in the 50 States, the District of Columbia, and New York City from 1969 to 2008. |
shakespeare | Contains a word index of the works of Shakespeare, giving the number of times each word appears in each corpus. |
trigrams | Contains English language trigrams from a sample of works published between 1520 and 2008. |
wikipedia | Contains the complete revision history for all Wikipedia articles up to April 2010. |
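Because sample tables can be queried with either legacy SQL or GoogleSQL, the table reference is written differently in each dialect. The following sketch formats both forms and builds a GoogleSQL query over the shakespeare table. The helper names are hypothetical, and the column names `word` and `word_count` are taken as an assumption about the table's schema; check the schema in the console before relying on them.

```python
def table_reference(project: str, dataset: str, table: str, legacy_sql: bool = False) -> str:
    """Format a table reference for legacy SQL or GoogleSQL."""
    if legacy_sql:
        # Legacy SQL uses square brackets and a colon after the project ID.
        return f"[{project}:{dataset}.{table}]"
    # GoogleSQL uses backticks and dots throughout.
    return f"`{project}.{dataset}.{table}`"


def top_words_query(limit: int = 10) -> str:
    """Build a GoogleSQL query for the most frequent words across all works.

    Assumes the shakespeare table has columns named word and word_count.
    """
    table = table_reference("bigquery-public-data", "samples", "shakespeare")
    lines = [
        "SELECT word, SUM(word_count) AS total",
        f"FROM {table}",
        "GROUP BY word",
        "ORDER BY total DESC",
        f"LIMIT {limit}",
    ]
    return "\n".join(lines)
```

With the Python client, a legacy SQL reference would also need `use_legacy_sql=True` set on the `QueryJobConfig`, since the client defaults to GoogleSQL.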
Contact us
If you have any questions about the BigQuery public dataset program, contact us at bq-public-data@google.com.
What's next
Learn how to query a table in a public dataset at Quickstart using the Google Cloud console.