BigQuery overview
BigQuery is a fully managed, AI-ready data platform that helps you manage and analyze your data with built-in features like machine learning, search, geospatial analysis, and business intelligence. BigQuery's serverless architecture lets you use languages like SQL and Python to answer your organization's biggest questions with zero infrastructure management.
BigQuery provides a uniform way to work with both structured and unstructured data and supports open table formats like Apache Iceberg, Delta, and Hudi. BigQuery streaming supports continuous data ingestion and analysis while BigQuery's scalable, distributed analysis engine lets you query terabytes in seconds and petabytes in minutes.
BigQuery's architecture consists of two parts: a storage layer that ingests, stores, and optimizes data and a compute layer that provides analytics capabilities. These compute and storage layers operate efficiently and independently of each other, thanks to Google's petabit-scale network, which enables the necessary communication between them.
Legacy databases usually have to share resources between read and write operations and analytical operations. This can result in resource conflicts and can slow queries while data is written to or read from storage. Shared resource pools can become further strained when resources are required for database management tasks such as assigning or revoking permissions. BigQuery's separation of compute and storage layers lets each layer dynamically allocate resources without impacting the performance or availability of the other.
This separation principle lets BigQuery innovate faster because storage and compute improvements can be deployed independently, without downtime or negative impact on system performance. It is also essential to offering a fully managed serverless data warehouse in which the BigQuery engineering team handles updates and maintenance. The result is that you don't need to provision or manually scale resources, leaving you free to focus on delivering value instead of traditional database management tasks.
BigQuery interfaces include the Google Cloud console and the BigQuery command-line tool. Developers and data scientists can use client libraries for familiar programming languages, including Python, Java, JavaScript, and Go, as well as BigQuery's REST API and RPC API, to transform and manage data. ODBC and JDBC drivers let existing applications, including third-party tools and utilities, interact with BigQuery.
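As a rough illustration of the REST API mentioned above, the following sketch builds (but does not send) an HTTP request for the `jobs.query` method. The project ID `my-project`, the `ACCESS_TOKEN` value, and the helper name `build_query_request` are placeholders assumed for this example; a real call needs an OAuth 2.0 access token for an authorized identity.

```python
import json
import urllib.request

# REST endpoint for the jobs.query method (BigQuery API v2).
QUERY_URL = "https://bigquery.googleapis.com/bigquery/v2/projects/{project_id}/queries"


def build_query_request(project_id: str, sql: str, access_token: str) -> urllib.request.Request:
    """Build an HTTP POST request that asks BigQuery to run a GoogleSQL query.

    project_id and access_token are placeholders supplied by the caller.
    """
    body = json.dumps({"query": sql, "useLegacySql": False}).encode("utf-8")
    return urllib.request.Request(
        QUERY_URL.format(project_id=project_id),
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical values; nothing is sent over the network here.
request = build_query_request("my-project", "SELECT 1 AS x", "ACCESS_TOKEN")
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it. In most applications the client libraries are the more convenient choice because they handle authentication, retries, and result paging.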
BigQuery helps data analysts, data engineers, data warehouse administrators, and data scientists load, process, and analyze data to inform critical business decisions.
Get started with BigQuery
You can start exploring BigQuery in minutes. Take advantage of BigQuery's free usage tier or no-cost sandbox to start loading and querying data.
- BigQuery's sandbox: Get started in the BigQuery sandbox, risk-free and at no cost.
- Google Cloud console quickstart: Familiarize yourself with the power of the BigQuery Console.
- Public datasets: Experience BigQuery's performance by exploring large, real-world data from the Public Datasets Program.
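A first query against a public dataset might look like the following sketch. It assumes the `google-cloud-bigquery` Python package is installed and that application default credentials are configured for a project; the function name `top_names` and the default state are illustrative choices.

```python
# Most common names for one state in the public USA names table,
# which is part of the BigQuery Public Datasets Program.
TOP_NAMES_SQL = """
SELECT name, SUM(number) AS total
FROM `bigquery-public-data.usa_names.usa_1910_2013`
WHERE state = @state
GROUP BY name
ORDER BY total DESC
LIMIT 10
"""


def top_names(state: str = "TX"):
    """Run the query with the Python client library and return (name, total) pairs.

    Assumes google-cloud-bigquery is installed and credentials are set up.
    """
    from google.cloud import bigquery  # third-party package, imported lazily

    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(
        query_parameters=[bigquery.ScalarQueryParameter("state", "STRING", state)]
    )
    rows = client.query(TOP_NAMES_SQL, job_config=job_config).result()
    return [(row["name"], row["total"]) for row in rows]
```

The query parameter `@state` keeps the user-supplied value out of the SQL text itself.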
Explore BigQuery
BigQuery's serverless infrastructure lets you focus on your data instead of resource management. BigQuery combines a cloud-based data warehouse and powerful analytic tools.
BigQuery storage
BigQuery stores data using a columnar storage format that is optimized for analytical queries. BigQuery presents data in tables, rows, and columns and provides full support for database transaction semantics (ACID). BigQuery storage is automatically replicated across multiple locations to provide high availability.
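To see why a columnar layout suits analytical queries, consider this toy comparison. It is a conceptual sketch only and does not represent BigQuery's actual storage format; the sample records and function names are made up for illustration.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"order_id": 1, "country": "US", "amount": 20.0},
    {"order_id": 2, "country": "DE", "amount": 35.5},
    {"order_id": 3, "country": "US", "amount": 12.5},
]

# Column-oriented layout: each field is stored as its own array.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def total_amount_row_store(records):
    """Visits every whole record even though only one field is needed."""
    return sum(record["amount"] for record in records)


def total_amount_column_store(cols):
    """Reads only the single column that the aggregate needs."""
    return sum(cols["amount"])


print(total_amount_row_store(rows), total_amount_column_store(columns))
```

Both functions return the same total, but the columnar version never touches `order_id` or `country`. An aggregate over a few columns of a wide table benefits from the same effect at much larger scale.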
- Learn about common patterns to organize BigQuery resources in the data warehouse and data marts.
- Learn about datasets, BigQuery's top-level container of tables and views.
- Load data into BigQuery in the following ways:
  - Stream data with the Storage Write API.
  - Batch-load data from local files or Cloud Storage in formats that include Avro, Parquet, ORC, CSV, and JSON, as well as Datastore and Firestore exports.
- BigQuery Data Transfer Service automates data ingestion.
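A batch load from Cloud Storage, one of the options listed above, could be sketched as follows with the Python client library. The bucket, table ID, and helper names are hypothetical, and the extension-to-format mapping is a convenience invented for this example rather than something BigQuery requires.

```python
import os

# Map file extensions to names of bigquery.SourceFormat members.
SOURCE_FORMATS = {
    ".avro": "AVRO",
    ".parquet": "PARQUET",
    ".orc": "ORC",
    ".csv": "CSV",
    ".json": "NEWLINE_DELIMITED_JSON",
}


def source_format_for(uri: str) -> str:
    """Pick a source format name from the file extension of a URI."""
    extension = os.path.splitext(uri)[1].lower()
    try:
        return SOURCE_FORMATS[extension]
    except KeyError:
        raise ValueError(f"No known load format for {uri!r}") from None


def load_from_cloud_storage(uri: str, table_id: str) -> int:
    """Start a batch load job, wait for it, and return the table's row count.

    uri (for example "gs://my-bucket/data.parquet") and table_id
    (for example "my-project.my_dataset.my_table") are placeholders.
    """
    from google.cloud import bigquery  # third-party package, imported lazily

    client = bigquery.Client()
    format_name = source_format_for(uri)
    job_config = bigquery.LoadJobConfig(
        source_format=getattr(bigquery.SourceFormat, format_name)
    )
    if format_name in ("CSV", "NEWLINE_DELIMITED_JSON"):
        job_config.autodetect = True  # these formats carry no embedded schema
    load_job = client.load_table_from_uri(uri, table_id, job_config=job_config)
    load_job.result()  # blocks until the job completes; raises on failure
    return client.get_table(table_id).num_rows
```

Avro, Parquet, and ORC files are self-describing, so the sketch turns on schema autodetection only for CSV and JSON.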
For more information, see Overview of BigQuery storage.
BigQuery analytics
You can use BigQuery for descriptive and prescriptive analysis, including business intelligence, ad hoc analysis, geospatial analytics, and machine learning. You can query data stored in BigQuery, or query data where it lives by using external tables or federated queries over sources such as Cloud Storage, Bigtable, Spanner, or Google Sheets stored in Google Drive.
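For instance, an external table over Parquet files in Cloud Storage can be defined with a DDL statement. The helper below is a minimal sketch that only builds the statement text; the dataset, table, and bucket names are hypothetical, and the helper does no escaping, so it should only be given trusted values.

```python
def external_table_ddl(table: str, uris: list[str], file_format: str = "PARQUET") -> str:
    """Build a CREATE EXTERNAL TABLE statement for files in Cloud Storage.

    table and uris are caller-supplied placeholders; no escaping is applied.
    """
    uri_list = ", ".join(f"'{uri}'" for uri in uris)
    return (
        f"CREATE OR REPLACE EXTERNAL TABLE `{table}`\n"
        f"OPTIONS (format = '{file_format}', uris = [{uri_list}])"
    )


ddl = external_table_ddl(
    "my_dataset.sales_external",
    ["gs://my-bucket/sales/*.parquet"],
)
print(ddl)
```

After the statement runs, the table can be queried like any other table while the data stays in Cloud Storage.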
- ANSI-standard SQL queries (SQL:2011 support), including support for joins, nested and repeated fields, analytic and aggregation functions, multi-statement queries, and a variety of spatial functions for geospatial analytics (Geographic Information Systems).
- Create views to share your analysis.
- Business intelligence tool support including BI Engine with Looker Studio, Looker, Google Sheets, and third-party tools like Tableau and Power BI.
- BigQuery ML provides machine learning and predictive analytics.
- BigQuery Studio offers features such as Python notebooks, and version control for both notebooks and saved queries. These features make it easier for you to complete your data analysis and machine learning (ML) workflows in BigQuery.
- Query data outside of BigQuery with external tables and federated queries.
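Two of the SQL features listed above, nested and repeated fields and analytic functions, can be tried with a query that defines its own inline data and therefore needs no existing table. The sample orders and the helper name `run_query` are assumptions made for this sketch.

```python
# Each order carries a repeated field (an array of structs). UNNEST flattens
# it for the per-order total, and RANK() is an analytic (window) function.
NESTED_ORDERS_SQL = """
WITH orders AS (
  SELECT 1 AS order_id, 'US' AS country,
         [STRUCT('pen' AS sku, 2 AS qty), STRUCT('ink' AS sku, 1 AS qty)] AS items
  UNION ALL
  SELECT 2, 'US', [STRUCT('pad' AS sku, 5 AS qty)]
  UNION ALL
  SELECT 3, 'DE', [STRUCT('pen' AS sku, 4 AS qty)]
)
SELECT
  order_id,
  country,
  (SELECT SUM(item.qty) FROM UNNEST(items) AS item) AS total_qty,
  RANK() OVER (PARTITION BY country ORDER BY order_id) AS order_rank
FROM orders
"""


def run_query(client, sql):
    """Run a query and return the result rows as plain dictionaries.

    client is expected to be a google.cloud.bigquery.Client, created for
    example with bigquery.Client() once credentials are configured.
    """
    return [dict(row.items()) for row in client.query(sql).result()]
```

Because the client is passed in, the same helper works with any object that offers a compatible `query(...).result()` interface, which also makes it easy to exercise without network access.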
For more information, see Overview of BigQuery analytics.
BigQuery administration
BigQuery provides centralized management of data and compute resources while Identity and Access Management (IAM) helps you secure those resources with the access model that's used throughout Google Cloud. Google Cloud security best practices provide a solid yet flexible approach that can include traditional perimeter security or a more complex and granular defense-in-depth approach.
- Intro to data security and governance helps you understand data governance, and what controls you might need to secure BigQuery resources.
- Jobs are actions that BigQuery runs on your behalf to load, export, query, or copy data.
- Reservations let you switch between on-demand pricing and capacity-based pricing.
For more information, see Introduction to BigQuery administration.
BigQuery resources
Explore BigQuery resources:
- Release notes provide change logs of features, changes, and deprecations.
- Pricing for analysis and storage. See also: BigQuery ML, BI Engine, and Data Transfer Service pricing.
- Locations define where you create and store datasets (regional and multi-region locations).
- Stack Overflow hosts an engaged community of developers and analysts working with BigQuery.
- BigQuery Support provides help with BigQuery.
- Google BigQuery: The Definitive Guide: Data Warehousing, Analytics, and Machine Learning at Scale by Valliappa Lakshmanan and Jordan Tigani, explains how BigQuery works and provides an end-to-end walkthrough on how to use the service.
APIs, tools, and references
Reference materials for BigQuery developers and analysts:
- SQL query syntax for details about using GoogleSQL.
- BigQuery API and client libraries present overviews of BigQuery's features and their use.
- BigQuery code samples provide hundreds of snippets for client libraries in C#, Go, Java, Node.js, Python, and Ruby. You can also view the sample browser.
- DML, DDL, and user-defined functions (UDF) syntax lets you manage and transform your BigQuery data.
- bq command-line tool reference documents the syntax, commands, flags, and arguments for the bq command-line interface.
- ODBC / JDBC integration connects BigQuery to your existing tooling and infrastructure.
BigQuery roles and resources
BigQuery addresses the needs of data professionals across the following roles and responsibilities.
Data Analyst
Task guidance to help if you need to do the following:
- Query BigQuery data with interactive or batch queries written in SQL query syntax
- Reference SQL expressions, functions, and operators to query data
- Use tools to analyze and visualize BigQuery data, including Looker, Looker Studio, and Google Sheets
- Use geospatial analytics to analyze and visualize geospatial data with BigQuery's Geographic Information Systems
- Optimize query performance using:
  - Partitioned tables: Prune large tables based on time or integer ranges.
  - Materialized views: Define cached views to optimize queries or provide persistent results.
  - BI Engine: BigQuery's fast, in-memory analysis service.
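The first two optimizations above can be sketched as DDL statements. The dataset, table, and column names are hypothetical, and `run_statements` is a small helper assumed for this example; it expects a `google.cloud.bigquery.Client`.

```python
# A table partitioned by the date of a TIMESTAMP column.
PARTITIONED_TABLE_DDL = """
CREATE TABLE `my_dataset.events` (
  event_ts TIMESTAMP,
  user_id STRING,
  amount NUMERIC
)
PARTITION BY DATE(event_ts)
"""

# A materialized view that keeps a precomputed daily aggregate.
MATERIALIZED_VIEW_DDL = """
CREATE MATERIALIZED VIEW `my_dataset.daily_totals` AS
SELECT DATE(event_ts) AS event_date, SUM(amount) AS total_amount
FROM `my_dataset.events`
GROUP BY event_date
"""

# Filtering on the partitioning column lets BigQuery skip other partitions.
PRUNED_QUERY = """
SELECT user_id, SUM(amount) AS total_amount
FROM `my_dataset.events`
WHERE event_ts >= TIMESTAMP('2024-01-01')
  AND event_ts < TIMESTAMP('2024-01-02')
GROUP BY user_id
"""


def run_statements(client, statements):
    """Run each statement in order and wait for it to finish."""
    for sql in statements:
        client.query(sql).result()
```

A call such as `run_statements(client, [PARTITIONED_TABLE_DDL, MATERIALIZED_VIEW_DDL])` would create both objects, after which `PRUNED_QUERY` scans only the partition for the requested day.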
To take a tour of BigQuery's data analytics features directly in the Google Cloud console, click Take the tour.
Data Administrator
Task guidance to help if you need to do the following:
- Manage costs with reservations to balance on-demand and capacity-based pricing.
- Understand data security and governance to help secure data by dataset, table, column, row, or view
- Back up data with table snapshots to preserve the contents of a table at a particular time.
- View BigQuery INFORMATION_SCHEMA to understand the metadata of datasets, jobs, access control, reservations, tables and more.
- Use jobs to have BigQuery load, export, query, or copy data on your behalf.
- Monitor logs and resources to understand BigQuery and workloads.
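As one example of the INFORMATION_SCHEMA views mentioned above, the following sketch builds a query that summarizes recent query jobs per user from the `JOBS_BY_PROJECT` view. The helper name, the default region, and the seven-day window are assumptions chosen for illustration.

```python
def recent_jobs_sql(region: str = "us", days: int = 7) -> str:
    """Build a query that summarizes recent query jobs per user.

    region is the location qualifier of the INFORMATION_SCHEMA view and
    days is the lookback window; both are illustrative defaults.
    """
    if days < 1:
        raise ValueError("days must be at least 1")
    return (
        "SELECT user_email,\n"
        "       COUNT(*) AS job_count,\n"
        "       SUM(total_bytes_processed) AS bytes_processed\n"
        f"FROM `region-{region}`.INFORMATION_SCHEMA.JOBS_BY_PROJECT\n"
        f"WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL {int(days)} DAY)\n"
        "  AND job_type = 'QUERY'\n"
        "GROUP BY user_email\n"
        "ORDER BY bytes_processed DESC"
    )


print(recent_jobs_sql())
```

The resulting text can be run in the console or passed to a client library; reading the view requires suitable permissions on the project.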
For more information, see Introduction to BigQuery administration.
To take a tour of BigQuery data administration features directly in the Google Cloud console, click Take the tour.
Data Scientist
Task guidance to help if you need to use BigQuery ML's machine learning to do the following:
- Understand the end-to-end user journey for machine learning models
- Manage access control for BigQuery ML
- Create and train BigQuery ML models, including:
  - Linear regression forecasting
  - Binary logistic and multiclass logistic regression classifications
  - K-means clustering for data segmentation
  - Time series forecasting with ARIMA_PLUS models
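Training and using a supervised model, such as the regression models listed above, comes down to two SQL statements. The helpers below are a minimal sketch that only builds the statement text; the model, table, and column names are hypothetical, and only two model types are covered to keep the example short.

```python
# Supervised model types handled by this sketch.
SUPERVISED_MODEL_TYPES = {"LINEAR_REG", "LOGISTIC_REG"}


def create_model_sql(model: str, model_type: str, label_column: str, training_query: str) -> str:
    """Build a CREATE MODEL statement for a supervised BigQuery ML model."""
    model_type = model_type.upper()
    if model_type not in SUPERVISED_MODEL_TYPES:
        raise ValueError(f"Unsupported model type for this sketch: {model_type}")
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (model_type = '{model_type}', input_label_cols = ['{label_column}'])\n"
        f"AS\n{training_query}"
    )


def predict_sql(model: str, input_query: str) -> str:
    """Build an ML.PREDICT query that scores the rows returned by input_query."""
    return f"SELECT * FROM ML.PREDICT(MODEL `{model}`, ({input_query}))"


train = create_model_sql(
    "my_dataset.price_model",
    "linear_reg",
    "price",
    "SELECT price, area, rooms FROM `my_dataset.listings`",
)
score = predict_sql("my_dataset.price_model", "SELECT area, rooms FROM `my_dataset.new_listings`")
print(train)
print(score)
```

Clustering and time series models take different options (for example, they have no label column), so they would need their own variants of `create_model_sql`.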
Data Developer
Task guidance to help if you need to do the following:
- Load data into BigQuery with:
  - Batch loads of Avro, Parquet, ORC, CSV, and JSON files and of Datastore and Firestore exports
  - BigQuery Data Transfer Service
  - BigQuery Storage Write API
- Use the code sample library, including:
  - Google Cloud sample browser (scoped for BigQuery)
BigQuery video tutorials
The following series of video tutorials get you started with BigQuery:
Title | Description |
---|---|
How to get started with BigQuery (17:18) | An overview that summarizes what BigQuery is and how to use it. Segments include ETL pipelines, pricing and optimization, BigQuery ML, and BI Engine, wrapping up with a demo of BigQuery in the Google Cloud console. |
What is BigQuery? (4:39) | An overview of how BigQuery is designed to ingest and store large amounts of data to help analysts and developers alike |
Using the BigQuery sandbox (3:05) | How to set up a BigQuery sandbox, letting you run queries without needing a credit card |
Asking questions, running queries (5:11) | How to write and run SQL queries in the BigQuery UI - plus picking a winning jersey number |
Loading data into BigQuery (5:31) | How to ingest and analyze data in real time, or just a one-time batch analysis of data - plus cats v. dogs |
Visualizing query results (5:38) | How data visualization is useful for making complex datasets easier to understand and internalize |
Managing access with IAM (5:23) | How to allow other users to query your datasets in BigQuery with IAM permissions and access control |
Saving and sharing queries (6:17) | How to save and share your queries in BigQuery hassle-free |
Protecting sensitive data with authorized views (7:12) | How to easily share datasets with different users by setting customized access controls |
Querying external data with BigQuery (5:49) | How to set up an external data source in BigQuery and query data from Cloud Storage, Cloud SQL, Google Drive, and more |
What are user-defined functions? (4:59) | How to create user-defined functions (UDFs) for analyzing datasets in BigQuery |
What's next
- For an overview of BigQuery storage, see Overview of BigQuery storage.
- For an overview of BigQuery queries, see Overview of BigQuery analytics.
- For an overview of BigQuery administration, see Introduction to BigQuery administration.
- For an overview of BigQuery security, see Overview of data security and governance.