A public dataset is any dataset that is stored in BigQuery and made available to the general public. The public datasets listed in the BigQuery documentation are datasets that Google BigQuery hosts for you to access and integrate into your applications. Google pays for the storage of these datasets and provides public access to the data via a project. You pay only for the queries that you perform on the data (the first 1 TB per month is free, subject to query pricing details).
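As a rough illustration of the free-tier arithmetic described above, the following sketch computes how many processed bytes fall outside the monthly allowance. The function names, the decimal definition of a terabyte, and the `price_per_tb` parameter are assumptions made for this example; the actual unit and rate are set by the query pricing details.

```python
# Assumption: 1 TB is treated as 10**12 bytes here; the pricing page
# defines the exact unit that applies to your account.
TB = 10**12
FREE_BYTES_PER_MONTH = 1 * TB  # "the first 1 TB per month is free"


def billable_bytes(bytes_processed_this_month: int,
                   free_bytes: int = FREE_BYTES_PER_MONTH) -> int:
    """Return the bytes processed beyond the monthly free allowance."""
    if bytes_processed_this_month < 0:
        raise ValueError("bytes_processed_this_month must be non-negative")
    return max(0, bytes_processed_this_month - free_bytes)


def estimated_cost(bytes_processed_this_month: int, price_per_tb: float) -> float:
    """Estimate the query cost for a month.

    price_per_tb is a hypothetical rate supplied by the caller; it is not
    taken from this document.
    """
    return billable_bytes(bytes_processed_this_month) / TB * price_per_tb
```

For example, a month with 0.5 TB of processed data yields zero billable bytes under these assumptions, while 1.5 TB yields 0.5 TB of billable data.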
How to query public datasets using BigQuery
BigQuery is a fully managed data warehouse and analytics platform. Public datasets are available for you to analyze using SQL queries. You can access BigQuery public datasets by using the web UI or the command-line tool, or by calling the BigQuery REST API through client libraries for languages such as Java, .NET, or Python.
Currently, BigQuery public datasets are stored in the US multi-region location. When you query a public dataset, supply the --location=US flag on the command line, choose US as the processing location in the BigQuery web UI, or specify the location property in the jobReference section of the job resource when you use the API. Because the public datasets are stored in the US, you cannot write public data query results to a table in another region, and you cannot join tables in public datasets with tables in another region.
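The two programmatic ways of setting the location described above can be sketched as follows. This is a minimal illustration, not an official client: the helper names, the `my-project` project ID, and the sample query against `bigquery-public-data.samples.shakespeare` are assumptions made for the example. The code only builds the job resource body and the command-line arguments; it does not send anything to BigQuery.

```python
import json
import shlex
import uuid

# Public datasets are stored in the US, so queries must run there.
PUBLIC_DATA_LOCATION = "US"


def build_query_job(project_id: str, sql: str,
                    location: str = PUBLIC_DATA_LOCATION) -> dict:
    """Build a job resource body with the location set in jobReference."""
    return {
        "jobReference": {
            "projectId": project_id,
            "jobId": f"job_{uuid.uuid4().hex}",  # hypothetical job ID scheme
            "location": location,
        },
        "configuration": {
            "query": {"query": sql, "useLegacySql": False},
        },
    }


def build_bq_command(sql: str, location: str = PUBLIC_DATA_LOCATION) -> list:
    """Build argv for the bq command-line tool with the --location flag."""
    return ["bq", f"--location={location}", "query",
            "--use_legacy_sql=false", sql]


if __name__ == "__main__":
    # Assumed example query against one of the sample tables.
    sql = (
        "SELECT word, SUM(word_count) AS total "
        "FROM `bigquery-public-data.samples.shakespeare` "
        "GROUP BY word ORDER BY total DESC LIMIT 10"
    )
    print(json.dumps(build_query_job("my-project", sql), indent=2))
    print(shlex.join(build_bq_command(sql)))
```

Keeping the location in one constant makes it harder to submit a job to a region where the public tables cannot be read or joined.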
To get started using a BigQuery public dataset, create or select a project. The first terabyte of data processed per month is free, so you can start querying public datasets without enabling billing. If you intend to go beyond the free tier, you should also enable billing.
- Sign in to your Google Account. If you don't already have one, sign up for a new account.
- Select or create a GCP project.
- If you intend to go beyond the free tier, make sure that billing is enabled for your project.
- BigQuery is automatically enabled in new projects. To activate BigQuery in a pre-existing project, enable the BigQuery API.
Other Public Datasets
There are many other public datasets available for you to query. Some are also hosted by Google, but many more are hosted by third parties. You can share any of your own datasets with the public by changing the sharing permissions associated with the dataset. For more information about sharing datasets, see Assigning access controls to datasets.
- Sample Tables
- Google Genomics Public Data
- Datasets publicly available on Google BigQuery (reddit.com)
- Google Patents Public Datasets
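Sharing one of your own datasets with the public, as mentioned above, comes down to adding an entry to the dataset's access list. The sketch below shows one way to express that change on a dataset resource represented as a plain dictionary. The helper name is hypothetical, and the choice of the `READER` role with the `allAuthenticatedUsers` special group is an assumption about what "public" should mean for your dataset; see Assigning access controls to datasets for the authoritative options.

```python
import copy

# Assumed access entry: read access for any signed-in Google account.
PUBLIC_READER_ENTRY = {"role": "READER", "specialGroup": "allAuthenticatedUsers"}


def with_public_read_access(dataset_resource: dict) -> dict:
    """Return a copy of the dataset resource with a public reader entry added.

    The input is left unmodified; an existing identical entry is not duplicated.
    """
    updated = copy.deepcopy(dataset_resource)
    access = updated.setdefault("access", [])
    if PUBLIC_READER_ENTRY not in access:
        access.append(dict(PUBLIC_READER_ENTRY))
    return updated
```

The updated resource would then be sent back through whichever interface you use to manage the dataset; that step is omitted here.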
If you have any questions about the BigQuery public dataset program,
contact us at