Query BIG with BigQuery: A cheat sheet
Priyanka Vergadia
Staff Developer Advocate, Google Cloud
Organizations rely on data warehouses to aggregate data from disparate sources, process it, and make it available for data analysis in support of strategic decision-making. BigQuery is the Google Cloud enterprise data warehouse designed to help organizations to run large scale analytics with ease and quickly unlock actionable insights. You can ingest data into BigQuery either through batch uploading or by streaming data directly to unlock real-time insights. As a fully-managed data warehouse, BigQuery takes care of the infrastructure so you can focus on analyzing your data up to petabyte-scale. BigQuery supports SQL (Structured Query Language), which you’re likely already familiar with if you've worked with ANSI-compliant relational databases.
BigQuery unique features
BI Engine - BigQuery BI Engine is a fast, in-memory analysis service that provides subsecond query response times with high concurrency. BI Engine integrates with Google Data Studio and Looker for visualizing query results and enables integration with other popular business intelligence (BI) tools.
BigQuery ML: BigQuery ML is unlocking machine learning for millions of data analysts. It enables data analysts and data scientists to build and operationalize machine learning models directly within BigQuery, using simple SQL.
BigQuery Omni - BigQuery Omni is a flexible, multi-cloud analytics solution powered by Anthos that lets you cost-effectively access and securely analyze data across Google Cloud, Amazon Web Services (AWS), and Azure, without leaving the BigQuery user interface (UI). Using standard SQL and familiar BigQuery APIs, you can break down data silos and gain critical business insights from a single pane of glass.
Data QnA: Data QnA enables self-service analytics for business users on BigQuery data as well as federated data from Cloud Storage, Bigtable, Cloud SQL, or Google Drive. It uses Dialogflow and enables users to formulate free-form text analytical questions, with auto-suggested entities while users type a question.
Connected Sheets -The native integration between Sheets and BigQuery makes it possible for all business stakeholders, who are already quite familiar with spreadsheet tools, to get their own up-to-date insights at any time.
Geospatial data - BigQuery offers accurate and scalable geospatial analysis with geography data types. It supports core GIS functions – measurements, transforms, constructors, and more – using standard SQL.
How does it work?
Here’s how it works: You ingest your own data into BigQuery or use data from the public datasets. Storage and compute are decoupled and can scale independently on demand. This offers immense flexibility and cost control for your business as you don’t need to keep expensive compute resources up and running all the time. As a result, BigQuery is much more cost-effective than traditional node-based cloud data warehouse solutions or on-premises systems. BigQuery also provides automatic backup and restore of your data.
You can ingest data into BigQuery in batches or stream real-time data from web, IoT, or mobile devices via Pub/Sub. You can also use Data Transfer Service to ingest data from other clouds, on-premises systems or third-party services. BigQuery also supports ODBC and JDBC drivers to connect with existing tools and infrastructure.
Interacting with BigQuery to load data, run queries, or create ML models can be done in three different ways. You can use the UI in the Cloud Console, the BigQuery command-line tool, or the API via client libraries available in several languages.
When it comes time to visualize your data, BigQuery integrates with Looker as well as several other business intelligence tools across the Google partner ecosystem.
What about security?
BigQuery offers built-in data protection at scale. It provides security and governance tools to efficiently govern data and democratize insights within your organization.
- Within BigQuery, users can assign dataset-level and project-level permissions to help govern data access. Secure data sharing ensures you can collaborate and operate your business with trust.
- Data is automatically encrypted both while in transit and at rest, ensuring that your data is protected from intrusions, theft, and attacks.
- Cloud DLP helps you discover and classify sensitive data assets.
- Cloud IAM provides access control and visibility into security policies.
- Data Catalog helps you discover and manage data.
How much does it cost?
The BigQuery sandbox lets you explore BigQuery capabilities at no cost and confirm that BigQuery fits your needs. With BigQuery you get predictable price-performance: you pay for storing and querying data, and for streaming inserts. Loading and exporting data are free of charge. Storage costs are based on the amount of data stored, and have two rates based on how often the data is changing. Query costs can be either:
- On-demand – you are charged per query by the amount of data processed
- Flat-rate – if you prefer to purchase dedicated resources
You can start with the pay-as-you-go, on-demand option and later move to flat-rate if that better suits your usage. Or, start with flat-rate, get a better understanding of your usage and move to the pay-as-you-go models for additional workloads.
To explore BigQuery and its capabilities a bit more, check out the sandbox; and when you’re ready to modernize your data warehouse with BigQuery then check out the documentation to streamline your migration process here.
For more #GCPSketchnote, follow the GitHub repo. For similar cloud content follow me on Twitter @pvergadia and keep an eye out on thecloudgirl.dev.