Storing and querying massive datasets can be time-consuming and expensive without the right hardware and infrastructure. Google BigQuery is an enterprise data warehouse that solves this problem by enabling super-fast SQL queries using the processing power of Google's infrastructure. Simply move your data into BigQuery and let us handle the hard work. You can control access to both the project and your data based on your business needs, such as giving others the ability to view or query your data.
You can access BigQuery by using a web UI or a command-line tool, or by making calls to the BigQuery REST API using client libraries for languages such as Java, .NET, or Python. There are also a variety of third-party tools that you can use to interact with BigQuery, for example to visualize or load data.
There are five main concepts you should understand when using BigQuery.
Projects are top-level containers in Google Cloud Platform. They store information about billing and authorized users, and they contain BigQuery data. Each project has a friendly name and a unique ID.
BigQuery bills on a per-project basis, so it’s usually easiest to create a single project for your company that’s maintained by your billing department. To enable billing, see Sign Up for BigQuery. For more information on how to grant access to your project, see Access Control.
Tables contain your data in BigQuery. Each table has a schema that describes field names, types, and other information. In addition to tables containing data stored in managed storage, BigQuery also supports both views, which are virtual tables defined by a SQL query, and external tables, which are tables defined over data stored in, for example, Cloud Storage.
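As an illustration of the three kinds of table described above, the following minimal sketch writes each one as a REST-style table resource. The project, dataset, table, and bucket names are hypothetical, and the field values are only examples; check the API reference before relying on any particular field.

```python
# Sketch of a managed table, a view, and an external table as table resources.
# Every project, dataset, table, and bucket name here is hypothetical.

def table_ref(table_id):
    """Build the reference that identifies a table inside a dataset."""
    return {"projectId": "my-project", "datasetId": "my_dataset", "tableId": table_id}

# A table in managed storage: the schema lists field names, types, and modes.
managed_table = {
    "tableReference": table_ref("events"),
    "schema": {"fields": [
        {"name": "user_id", "type": "STRING", "mode": "REQUIRED"},
        {"name": "event_time", "type": "TIMESTAMP", "mode": "NULLABLE"},
        {"name": "score", "type": "FLOAT", "mode": "NULLABLE"},
    ]},
}

# A view: a virtual table defined by a SQL query instead of stored rows.
view = {
    "tableReference": table_ref("recent_events"),
    "view": {
        "query": "SELECT user_id, event_time FROM `my-project.my_dataset.events`",
        "useLegacySql": False,
    },
}

# An external table: defined over files that stay in Cloud Storage.
external_table = {
    "tableReference": table_ref("events_csv"),
    "externalDataConfiguration": {
        "sourceFormat": "CSV",
        "sourceUris": ["gs://my-bucket/events/*.csv"],
        "autodetect": True,
    },
}
```

Only the managed table carries its own stored data; the view and the external table describe where the data comes from.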
Datasets allow you to organize and control access to your tables. Because tables are contained in datasets, you'll need to create at least one dataset before loading data into BigQuery.
You share BigQuery data with others by setting ACLs on datasets, not on the tables within them. For more information, see Access Control.
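Dataset-level sharing can be pictured as a list of access entries on the dataset resource. The sketch below assumes hypothetical project, dataset, and e-mail values, and the `readers` helper is purely illustrative rather than part of any client library.

```python
# Sketch of a dataset resource whose ACL grants access at the dataset level.
# The project ID, dataset ID, and e-mail addresses are hypothetical.
dataset = {
    "datasetReference": {"projectId": "my-project", "datasetId": "my_dataset"},
    "access": [
        {"role": "OWNER", "userByEmail": "owner@example.com"},
        {"role": "READER", "groupByEmail": "analysts@example.com"},
        {"role": "READER", "userByEmail": "auditor@example.com"},
    ],
}

def readers(dataset_body):
    """Return the user and group e-mails that hold the READER role."""
    found = []
    for entry in dataset_body.get("access", []):
        if entry.get("role") == "READER":
            found.append(entry.get("userByEmail") or entry.get("groupByEmail"))
    return found

print(readers(dataset))
```

Because the entries live on the dataset, every table inside it is shared with the same identities.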
Jobs are actions you construct and BigQuery executes on your behalf to load data, export data, query data, or copy data. Since jobs can potentially take a long time to complete, they execute asynchronously and can be polled for their status. BigQuery saves a history of all jobs associated with a project, accessible via the Google Cloud Platform Console.
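Polling an asynchronous job might look like the sketch below. It assumes a hypothetical `fetch_job` callable that returns the current job resource as a dictionary (for example, by wrapping a `jobs.get` call), and it treats a `status.state` of `DONE` as the end of the job; the helper name and its parameters are illustrative only.

```python
import time

def wait_for_job(fetch_job, poll_seconds=1.0, max_polls=60, sleep=time.sleep):
    """Poll a job until it is done, then return the final job resource.

    fetch_job is a hypothetical callable returning the job as a dict.
    Raises RuntimeError if the finished job reports an error, and
    TimeoutError if the job is still not done after max_polls checks.
    """
    for _ in range(max_polls):
        job = fetch_job()
        status = job.get("status", {})
        if status.get("state") == "DONE":
            error = status.get("errorResult")
            if error:
                raise RuntimeError(error.get("message", "job failed"))
            return job
        # The job is still pending or running; wait before checking again.
        sleep(poll_seconds)
    raise TimeoutError("job did not finish within the polling limit")
```

Passing `sleep` as a parameter keeps the helper easy to exercise without real delays.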
Under the hood, analytics throughput is measured in BigQuery slots. A BigQuery slot is a unit of computational capacity required to execute SQL queries. BigQuery automatically calculates how many slots are required by each query, depending on query size and complexity.
See Resource quotas for analytics for the per-account slot quota.
Most users find the default amount of analytics capacity more than sufficient. Access to more slots does not guarantee faster per-query performance. However, a larger pool of resources might improve performance of very large or very complex queries, as well as performance of highly concurrent workloads. To check how many slots your account uses, see Monitoring BigQuery Using Stackdriver.
BigQuery automatically manages your slot quota based on customer history, usage, and spend. For customers with at least $10,000 in monthly analytics spend, BigQuery offers several ways to increase the number of allocated slots. Contact your sales representative for more information.
Interacting with BigQuery
There are three main ways you interact with BigQuery.
Loading and exporting data
In most cases, you load data into BigQuery Storage. If you want to get the data back out of BigQuery, you can export the data. You can also set up a table as a federated data source, which allows you to use a query to transform your data as you load it.
Querying and viewing data
Once you load your data into BigQuery, there are a few ways to query or view the data in your tables:
- Calling the bigquery.jobs.query() method
- Calling the bigquery.jobs.insert() method with a query configuration
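The two methods listed above take differently shaped request bodies. The sketch below builds both as plain dictionaries; the helper names and the sample SQL are hypothetical, and the dictionaries are not sent anywhere.

```python
def sync_query_body(sql, timeout_ms=10000):
    """Request body for jobs.query: run the query and wait up to timeout_ms."""
    return {"query": sql, "useLegacySql": False, "timeoutMs": timeout_ms}

def query_job_body(sql):
    """Request body for jobs.insert: a job resource with a query configuration."""
    return {"configuration": {"query": {"query": sql, "useLegacySql": False}}}

# Hypothetical table name, used only for illustration.
sql = "SELECT COUNT(*) AS n FROM `my-project.my_dataset.events`"
print(sync_query_body(sql))
print(query_job_body(sql))
```

The first form suits short interactive queries, while the second returns a job that runs asynchronously and is polled for completion.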
Managing data
In addition to querying and viewing data, you can manage data in BigQuery in the following ways:
- Listing projects, jobs, tables and datasets
- Getting information about jobs, tables and datasets
- Defining, updating or patching tables and datasets
- Deleting tables and datasets
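The management operations above correspond to REST methods on the project, job, dataset, and table collections. The following sketch maps a few method names to an HTTP verb and URL template; the `request_line` helper and the sample IDs are hypothetical, and the table is a partial illustration rather than a complete list.

```python
BASE = "https://bigquery.googleapis.com/bigquery/v2"

# Partial, illustrative map from API method name to (HTTP verb, path template).
ENDPOINTS = {
    "projects.list": ("GET", "/projects"),
    "jobs.list": ("GET", "/projects/{projectId}/jobs"),
    "jobs.get": ("GET", "/projects/{projectId}/jobs/{jobId}"),
    "datasets.list": ("GET", "/projects/{projectId}/datasets"),
    "datasets.get": ("GET", "/projects/{projectId}/datasets/{datasetId}"),
    "datasets.insert": ("POST", "/projects/{projectId}/datasets"),
    "datasets.patch": ("PATCH", "/projects/{projectId}/datasets/{datasetId}"),
    "datasets.delete": ("DELETE", "/projects/{projectId}/datasets/{datasetId}"),
    "tables.list": ("GET", "/projects/{projectId}/datasets/{datasetId}/tables"),
    "tables.get": ("GET", "/projects/{projectId}/datasets/{datasetId}/tables/{tableId}"),
    "tables.insert": ("POST", "/projects/{projectId}/datasets/{datasetId}/tables"),
    "tables.update": ("PUT", "/projects/{projectId}/datasets/{datasetId}/tables/{tableId}"),
    "tables.delete": ("DELETE", "/projects/{projectId}/datasets/{datasetId}/tables/{tableId}"),
}

def request_line(method_name, **params):
    """Return the (verb, url) pair for a method, filling in the path parameters."""
    verb, path = ENDPOINTS[method_name]
    return verb, BASE + path.format(**params)

# Hypothetical identifiers, used only to show the resulting URL.
print(request_line("tables.get", projectId="my-project",
                   datasetId="my_dataset", tableId="events"))
```

Listing uses GET on a collection, defining uses POST, updating or patching uses PUT or PATCH on a single resource, and deleting uses DELETE.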
For more information, see the API reference.