Storing and querying massive datasets can be time-consuming and expensive without the right hardware and infrastructure. Google BigQuery is an enterprise data warehouse that solves this problem by enabling super-fast SQL queries using the processing power of Google's infrastructure. Simply move your data into BigQuery and let us handle the hard work. You can control access to both the project and your data based on your business needs, such as giving others the ability to view or query your data.
You can access BigQuery by using a web UI or a command-line tool, or by making calls to the BigQuery REST API using client libraries for languages such as Java, .NET, or Python. There are also a variety of third-party tools that you can use to interact with BigQuery, for tasks such as visualizing or loading data.
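As a rough illustration of the REST path, the sketch below builds (but does not send) a request for the API's `jobs.query` method using only the Python standard library. The project ID `my-project` and the function name `build_query_request` are hypothetical, and authentication is omitted entirely; in practice a client library would normally handle these details for you.

```python
import json
import urllib.request

# Base URL of the BigQuery REST API (v2).
API_ROOT = "https://bigquery.googleapis.com/bigquery/v2"


def build_query_request(project_id, sql):
    """Build (without sending) a POST request for the jobs.query method."""
    body = {
        "query": sql,
        "useLegacySql": False,  # ask for standard SQL rather than legacy SQL
    }
    return urllib.request.Request(
        url=f"{API_ROOT}/projects/{project_id}/queries",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "my-project" is a placeholder project ID, not a real project.
request = build_query_request("my-project", "SELECT 1")
print(request.get_method(), request.full_url)
```

Nothing is sent over the network here; the point is only to show that a query is an ordinary HTTP POST scoped to a project.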
BigQuery is fully managed; to get started, you don't need to deploy any resources, such as disks and virtual machines. This page discusses key concepts you should understand when using BigQuery.
Projects are top-level containers in Google Cloud Platform. They store information about billing and authorized users, and they contain BigQuery data. Each project has a friendly name and a unique ID.
To get started with BigQuery, and Google Cloud Platform in general, read about projects and then go to the Quickstart Using the Web UI. For more information on how to grant access to your project, see Access Control.
Tables contain your data in BigQuery. Each table has a schema that describes field names, types, and other information. BigQuery supports the following table types:
- Native tables: tables backed by native BigQuery storage.
- External tables: tables backed by storage external to BigQuery. For more information, see Creating and Querying Federated Data Sources.
- Views: virtual tables defined by a SQL query. For more information, see Creating views.
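To make the idea of a schema concrete, here is a small sketch that describes a table's fields as a list of name/type/mode entries, which mirrors the general shape of a BigQuery JSON schema, and checks a row against it. The field names and the `check_row` helper are invented for illustration, and only two types are handled; BigQuery performs its own validation when data is loaded.

```python
# A hypothetical schema: each field has a name, a type, and a mode.
SCHEMA = [
    {"name": "name", "type": "STRING", "mode": "REQUIRED"},
    {"name": "age", "type": "INTEGER", "mode": "NULLABLE"},
]

# Python types accepted for each schema type (in this sketch only).
PYTHON_TYPES = {"STRING": str, "INTEGER": int}


def check_row(row, schema=SCHEMA):
    """Return a list of problems found in `row`; an empty list means it fits."""
    problems = []
    for field in schema:
        value = row.get(field["name"])
        if value is None:
            # A missing value is only a problem for REQUIRED fields.
            if field["mode"] == "REQUIRED":
                problems.append(f"missing required field: {field['name']}")
        elif not isinstance(value, PYTHON_TYPES[field["type"]]):
            problems.append(f"wrong type for field: {field['name']}")
    return problems


print(check_row({"name": "Ada", "age": 36}))
print(check_row({"age": 36}))
```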
Datasets enable you to organize and control access to your tables. A table must belong to a dataset, so you'll need to create at least one dataset before loading data into BigQuery.
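Because a project contains datasets and a dataset contains tables, a table is commonly addressed by a three-part name. The sketch below parses a `project.dataset.table` string of the kind used in standard SQL. The `TableRef` class, the `parse_table_ref` helper, and the example IDs are hypothetical, and the parser assumes that none of the three parts contains a dot.

```python
from typing import NamedTuple


class TableRef(NamedTuple):
    """A table addressed by the project and dataset that contain it."""
    project: str
    dataset: str
    table: str

    def __str__(self):
        return f"{self.project}.{self.dataset}.{self.table}"


def parse_table_ref(text):
    """Split 'project.dataset.table' into its three parts."""
    parts = text.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected project.dataset.table, got: {text!r}")
    return TableRef(*parts)


ref = parse_table_ref("my-project.sales.orders")
print(ref.dataset, ref.table)
```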
You share BigQuery data with others by defining roles and setting permissions at the organization, project, and dataset levels. Permissions cannot be set on the individual tables within a dataset. For more information, see Access Control.
BigQuery manages the technical aspects of storing your structured data, including compression, encryption, replication, performance tuning, and scaling. BigQuery stores data in the Capacitor columnar data format, and offers the standard database concepts of tables, partitions, columns, and rows.
You can load data into BigQuery storage via batch loads or streaming, perform data operations such as copying tables, query tables using SQL, modify data through SQL DML, export data, or share stored data with others using Identity and Access Management (IAM) permissions.
BigQuery also supports querying data that’s not in BigQuery storage. For more information, see Creating and Querying Federated Data Sources.
Jobs are actions that you construct and that BigQuery executes on your behalf to load data, export data, query data, or copy data. Because jobs can take a long time to complete, they execute asynchronously and can be polled for their status. BigQuery saves a history of all jobs associated with a project, accessible via the Google Cloud Platform Console.
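Because jobs run asynchronously, a caller typically submits a job and then checks on it until it finishes. The loop below is a minimal sketch of that polling pattern. The `get_state` callable is a hypothetical stand-in for whatever call fetches the job's status, and the state names `PENDING`, `RUNNING`, and `DONE` are assumed to match those the BigQuery API reports for a job.

```python
import time


def wait_for_job(get_state, poll_seconds=1.0, timeout_seconds=60.0):
    """Poll get_state() until it returns "DONE" or the timeout elapses."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        state = get_state()
        if state == "DONE":
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {state} after {timeout_seconds}s")
        time.sleep(poll_seconds)  # wait before asking again


# Simulated job that reports PENDING, then RUNNING, then DONE.
states = iter(["PENDING", "RUNNING", "DONE"])
result = wait_for_job(lambda: next(states), poll_seconds=0.01)
print(result)
```

A `DONE` state only means that the job has finished; a real caller would also check whether the finished job reported an error.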
Under the hood, analytics throughput is measured in BigQuery slots. A BigQuery slot is a unit of computational capacity required to execute SQL queries. BigQuery automatically calculates how many slots are required by each query, depending on query size and complexity.
See the quota policy for queries for the per-account slot quota.
Most users find the default amount of analytics capacity more than sufficient. Access to more slots does not guarantee faster per-query performance. However, a larger pool of resources might improve performance of very large or very complex queries, as well as performance of highly concurrent workloads. To check how many slots your account uses, see Monitoring BigQuery Using Stackdriver.
BigQuery automatically manages your slot quota based on customer history, usage, and spend. For customers with at least $10,000 in monthly analytics spend, BigQuery offers several ways to increase the number of allocated slots. Contact your sales representative for more information.
Interacting with BigQuery
There are three main ways you interact with BigQuery: loading and exporting data, querying and viewing data, and managing data.
Loading and exporting data
In most cases, you load data into BigQuery storage. If you want to get the data back out of BigQuery, you can export it. You can also set up a table as a federated data source, which allows you to use a query to transform your data as you load it.
Querying and viewing data
Once you load your data into BigQuery, you can query or view the data in your tables. For example, you can:
- Run synchronous or asynchronous queries.
- Run interactive or batch queries.
- Create a view, which is a virtual table defined by a SQL query.
- Use partitioned tables to query a subset of your data.
Managing data
In addition to querying and viewing data, you can manage data in BigQuery in the following ways:
- Listing projects, jobs, datasets, and tables.
- Getting information about jobs, datasets, and tables.
- Defining, updating, or patching datasets and tables.
- Deleting datasets and tables.
- Managing table partitions.
For more information, see Managing Jobs, Datasets, and Projects.