This document provides high-level guidance to organizations migrating from Netezza to BigQuery. The document shows ways that organizations can rethink their existing data model and extract, transform, and load (ETL) processes to get the most out of BigQuery.
For decades, large organizations have relied on systems like Netezza to help store and analyze massive amounts of data. Although these systems are powerful, they require huge investments in hardware, maintenance, and licensing. Also, as the number of data sources and the volume of data increase, organizations face challenges around node management, volume of data per source, archiving costs, and overall scalability of the system.
As a result, more and more organizations are evaluating BigQuery to meet their need for a cloud-based enterprise data warehouse. BigQuery is Google's fully managed, petabyte-scale, serverless enterprise data warehouse (EDW) for analytics. There is no infrastructure to manage, and you don't need a database administrator. You can focus on analyzing data to find meaningful insights using familiar SQL.
BigQuery can scan billions of rows, without an index, in tens of seconds. BigQuery is a cloud-powered, massively parallel query service that shares Google's infrastructure, so it can parallelize each query and run it on tens of thousands of servers simultaneously. The two core technologies that differentiate BigQuery are columnar storage and tree architecture:
- Columnar storage: Data is stored in columns rather than rows, which makes it possible to achieve a very high compression ratio and scan throughput.
- Tree architecture: Queries are dispatched and results are aggregated across thousands of machines in a few seconds.
The technical architecture of BigQuery is explained in more detail in "An Inside Look at Google BigQuery".
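The two ideas in the list above can be illustrated with a small toy sketch. This is not how BigQuery is implemented; it only shows why a columnar layout scans and compresses well, and how partial results can be merged level by level in a tree. The table contents, the function names (`to_columns`, `run_length_encode`, `tree_sum`), and the use of run-length encoding as the compression scheme are all assumptions made for illustration.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical toy table, stored row by row.
ROWS: List[Dict[str, Any]] = [
    {"user": "a", "country": "US", "bytes": 120},
    {"user": "b", "country": "DE", "bytes": 80},
    {"user": "c", "country": "US", "bytes": 200},
    {"user": "d", "country": "US", "bytes": 50},
]


def to_columns(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot rows into one list per column (a columnar layout)."""
    columns: Dict[str, List[Any]] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def run_length_encode(values: List[Any]) -> List[Tuple[Any, int]]:
    """Compress runs of equal adjacent values into (value, count) pairs."""
    runs: List[Tuple[Any, int]] = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1] = (value, runs[-1][1] + 1)
        else:
            runs.append((value, 1))
    return runs


def tree_sum(shards: List[List[int]], fanout: int = 2) -> int:
    """Sum each shard at the leaves, then merge `fanout` partial sums per level."""
    partials = [sum(shard) for shard in shards]
    while len(partials) > 1:
        partials = [
            sum(partials[i:i + fanout])
            for i in range(0, len(partials), fanout)
        ]
    return partials[0] if partials else 0


columns = to_columns(ROWS)

# A query that only needs "bytes" touches a single column, not whole rows.
total_bytes = sum(columns["bytes"])

# Values within one column are similar, so they compress well once sorted.
encoded_country = run_length_encode(sorted(columns["country"]))

# Split the column into shards, as if each lived on a different machine.
sharded_total = tree_sum([[120, 80], [200], [50]])
```

In this sketch, summing `bytes` reads one list instead of every field of every row, and the tree aggregation returns the same total as the single-list sum while each merge step only handles a few partial results.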