Google BigQuery continues to define what it means to be fully managed
Posted by Tino Tereshko, BigQuery Technical Program Manager
Data professionals have a lot of options when it comes to managed cloud-based analytics warehouses. As the technical program manager for Google BigQuery, I may be biased, but when I look out at competitive offerings, it’s manageability that really sets BigQuery apart.
When it comes to cloud analytics services, the term “fully managed” tends to be used quite broadly. However, not all cloud data warehouses are created equal. BigQuery’s unique serverless architecture offers a high standard of what it means to be a “fully managed” technology. In the end, BigQuery users benefit from an always-improving, seamlessly scalable, fast and reliable service.
Let’s take a look at how BigQuery is architected, and how that translates into better manageability for end users.
Deployment and scaling
Under the hood, BigQuery employs a vast set of multi-tenant services driven by low-level Google infrastructure technologies like Dremel, Colossus, Jupiter and Borg.
Folks can start using BigQuery by simply loading data and running SQL commands. There's no need to build, deploy or provision clusters; no need to size VMs, storage or hardware resources; no need to set up disks, define replication, configure compression and encryption, and so forth.
Users are able to seamlessly scale to dozens of petabytes and back to zero because BigQuery engineers have already deployed the resources required to reach this scale. Therefore, scaling is simply a matter of using BigQuery more, rather than provisioning larger clusters. Folks just need to mind best practices and usage quotas.
BigQuery employs the Capacitor columnar storage format on top of the Colossus storage system, writing customer data in an opinionated fashion that's optimized for performance and durability. Under the hood, background processes continually study and optimize storage. BigQuery users are insulated from this underlying complexity.
BigQuery does not have a concept of primary keys, sort keys, indexes or distribution keys, simplifying database administration. One only needs to optimize for cost by defining partitioned tables, or perhaps employing a data sharding strategy.
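To make the cost argument for partitioning concrete, here is a toy model in Python. It is not BigQuery code: it assumes a hypothetical table with equally sized daily partitions, and it assumes that query cost tracks the number of bytes scanned. The function name `bytes_scanned` and all sizes are made up for illustration.

```python
from datetime import date, timedelta


def bytes_scanned(partitions, start=None, end=None):
    """Sum the sizes of the partitions that a date filter would touch.

    partitions: dict mapping a date to that day's size in bytes (hypothetical).
    With no filter, every partition is read, as with an unpartitioned table.
    """
    return sum(
        size
        for day, size in partitions.items()
        if (start is None or day >= start) and (end is None or day <= end)
    )


# Hypothetical table: 365 daily partitions of 10 GB each, covering 2015.
GB = 10**9
first_day = date(2015, 1, 1)
table = {first_day + timedelta(days=i): 10 * GB for i in range(365)}

# A query with no date filter reads every partition.
full_scan = bytes_scanned(table)

# A query restricted to the last seven days of the year reads only those partitions.
last_week = bytes_scanned(table, start=date(2015, 12, 25), end=date(2015, 12, 31))

print(full_scan // GB, "GB scanned without a date filter")
print(last_week // GB, "GB scanned with a one-week filter")
```

In this sketch, the filtered query touches 7 of 365 partitions, so it scans roughly 2% of the data. Real savings depend on how the data is distributed and how queries are written.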
Colossus storage is connected to the Dremel execution engine by the petabit Jupiter network, giving BigQuery I/O performance typical of in-memory databases. Finally, the newest version of Dremel executes SQL queries in memory.
Upgrades and maintenance
BigQuery rolls out improvements every week. In addition, Google is constantly deploying more infrastructure resources and performing maintenance. All this is done in the background, without user interaction and without the downtime typically associated with upgrades and maintenance. From the user’s perspective, things simply get easier, faster and cheaper.
[Figure: Number of monthly improvements rolled out by the BigQuery team]
Since its first release in 2012, BigQuery has reached numerous milestones. In the past year alone, some of the underlying components we’ve improved include:
- A new in-memory Dremel execution engine
- Capacitor, a next-generation storage engine
- Poseidon, a new data ingest engine
- Long Term Storage, a 50% discount on data older than 90 days
- An entirely new work scheduling service, dynamic pipelined execution and more
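As a rough illustration of the Long Term Storage item above, the arithmetic can be sketched in Python. The 50% discount and the 90-day threshold come from the list; the base price of $0.02 per GB per month, the function name `monthly_storage_cost` and the table sizes are assumptions made up for this example, so check current pricing before relying on any number here.

```python
def monthly_storage_cost(tables, price_per_gb=0.02, discount=0.5, threshold_days=90):
    """Estimate a monthly storage bill under a simple age-based discount.

    tables: iterable of (size_gb, age_days) pairs (hypothetical data).
    price_per_gb: assumed base price in dollars per GB per month.
    Data older than threshold_days is billed at the discounted rate.
    """
    total = 0.0
    for size_gb, age_days in tables:
        if age_days > threshold_days:
            rate = price_per_gb * (1 - discount)
        else:
            rate = price_per_gb
        total += size_gb * rate
    return total


# Hypothetical dataset: 1,000 GB of recent data and 4,000 GB of older data.
tables = [(1000, 10), (4000, 200)]

with_discount = monthly_storage_cost(tables)
without_discount = monthly_storage_cost(tables, discount=0.0)

print(f"with discount: ${with_discount:.2f}")
print(f"without discount: ${without_discount:.2f}")
```

Under these assumed numbers, the older 4,000 GB is billed at half the base rate, which brings the monthly total from about $100 down to about $60 without any action on the user's part.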
BigQuery has a geographically diverse team of Site Reliability Engineers (SREs) who monitor the service 24/7 for outages, performance degradation, latency and failures. SREs track the service against internal SLOs, which are often much stricter than public SLAs. We are also able to help customers research not-so-obvious SQL issues.
The BigQuery team works behind the scenes to help ensure that you get the most current software stack running on fantastic infrastructure. To that end, we may seamlessly migrate your queries to a different data center (while of course respecting the dataset location constraints you’ve set, e.g., if you’ve asked that it remain in Europe). This means that your BigQuery queries may run in one data center in your region in the morning, and in another data center in the afternoon, as we roll out a new version of Dremel, upgrade networking or hardware, or implement a new compression algorithm.
Google BigQuery continues to push the boundaries of what it means to be fully managed. For BigQuery users, this means less complexity, fewer tasks and quicker analytics. Folks can spend less time worrying about infrastructure, scale, operations, security and reliability, and more time understanding their data, deriving interesting intelligence and supporting their business with data-driven decision-making. Less tinkering, more doing.