Jump to

What is a Data Warehouse?

A data warehouse, also called an enterprise data warehouse (EDW), is an enterprise data platform used for the analysis and reporting of structured and semi-structured data from multiple data sources, such as point-of-sale transactions, marketing automation, customer relationship management, and more. 

Data warehouses include an analytical database and critical analytical components and procedures. They support ad hoc analysis and custom reporting, such as data pipelines, queries, and business applications. They can consolidate and integrate massive amounts of current and historical data in one place and are designed to give a long-range view of data over time. These data warehouse capabilities have made data warehousing a primary staple of enterprise analytics that help support informed business decisions.

Learn about BigQuery, Google Cloud’s cost-effective, serverless, multicloud enterprise data warehouse.

Data warehouse defined

A data warehouse is an enterprise system used for the analysis and reporting of structured and semi-structured data from multiple sources, such as point-of-sale transactions, marketing automation, customer relationship management, and more. A data warehouse is suited for ad hoc analysis as well custom reporting. A data warehouse can store both current and historical data in one place and is designed to give a long-range view of data over time, making it a primary component of business intelligence.

Traditional vs. cloud-based data warehouse

Traditional data warehouses are hosted on-premises, with data flowing in from relational databases, transactional systems, business applications, and other source systems. However, they are typically designed to capture a subset of data in batches and store it based on rigid schemas, making them unsuitable for spontaneous queries or real-time analysis. Companies also must purchase their own hardware and software with an on-premises data warehouse, making it expensive to scale and maintain. In a traditional warehouse, storage is typically limited compared to compute, so data is transformed quickly and then discarded to keep storage space free.

Today’s data analytics activities have transformed to the center of all core business activities, including revenue generation, cost containment, improving operations, and enhancing customer experiences. As data evolves and diversifies, organizations need more robust data warehouse solutions and advanced analytic tools for storing, managing, and analyzing large quantities of data across their organizations. 

These systems must be scalable, reliable, secure enough for regulated industries, and flexible enough to support a wide variety of data types and big data use cases. They also need to support flexible pricing and compute, so you only pay for what you need instead of guessing your capacity. The requirements go beyond the capabilities of most legacy data warehouses. As a result, many enterprises are turning to cloud-based data warehouse solutions.

A cloud data warehouse makes no trade-offs from a traditional data warehouse, but extends capabilities and runs on a fully managed service in the cloud. Cloud data warehousing offers instant scalability to meet changing business requirements and powerful data processing to support complex analytical queries. 

With a cloud data warehouse, you benefit from the inherent flexibility of a cloud environment with more predictable costs. The up-front investment is typically much lower and lead times are shorter with on-premises data warehouse solutions because the cloud service provider manages and maintains the physical infrastructure. 

How data warehousing works in the cloud

Like a traditional data warehouse, cloud data warehouses collect, integrate, and store data from internal and external data sources. Data is typically transferred from a source system using a data pipeline. The data is extracted from the source system, transformed, and then loaded into the data warehouse—a process known as ETL (extract, transform, load). Data can also be sent directly to a central repository and then converted using ELT (extract, load, transform) processes. From there, users can use different business intelligence (BI) tools to access, mine, and report on data. Cloud data warehouses should also support streaming use cases to activate on data in real or near-real time.

Cloud data warehouses offer structured and semi-structured data storage, processing, integration, cleansing, loading, and so on within a public cloud environment. You can also use them with a cloud data lake to collect and store unstructured data. With some providers, it’s even possible to unify your data warehouse and data lake to maintain and centrally manage a single copy of your enterprise data. 

Different cloud providers may take various approaches when it comes to cloud data warehouse services. For example, some cloud data warehouses may use a cluster-based architecture similar to a traditional data warehouse. In contrast, others adopt a modern serverless architecture, which further minimizes data management responsibilities. However, most cloud data warehouses provide built-in data storage and capacity management features and automatic upgrades.

Other key capabilities that cloud data warehouses include: 

  • Massively parallel processing (MPP)
  • Columnar data stores
  • Self-service ETL and ELT data integration  
  • Disaster recovery features and automatic backups
  • Compliance and data governance tools
  • Built-in integrations for BI, AI, and machine learning

Advantages of data warehousing in the cloud

Companies are increasingly moving away from traditional data warehouses and migrating to the cloud, leveraging the cost savings and scalability that managed services can provide. 

Here are the primary advantages of cloud data warehousing.

Built for scale

Cloud data warehouses are elastic, providing nearly limitless storage and capacity. You can scale them up or down easily as your business needs change and only pay for what you use. 

Machine learning and AI initiatives

Customers can quickly unlock and operationalize machine learning models and AI technologies against cloud data warehouses for data mining, predicting business outcomes, and optimizing other areas, from data life cycle management to business processes to operational costs.

Better uptime

Cloud providers are obligated to meet SLAs and provide better uptime with reliable cloud infrastructure that scales seamlessly. On-premises data warehouses have scale and resource limitations that could impact performance.

Cost predictability

With cloud, you gain more flexible and predictable pricing. Some providers charge by throughput or per hour per node. Others charge a fixed price for a certain amount of resources. In every case, you avoid the mammoth costs incurred by an on-premises data warehouse that runs 24 hours a day, seven days a week, regardless of whether resources are in use or not.

Operational savings

A cloud data warehouse is fully managed, allowing you to outsource management hassles to cloud providers who must meet service level agreements (SLAs). This provides operational savings and can keep your in-house team focused on growth initiatives.

Real-time analytics

Cloud data warehouses provide more powerful computing that supports streaming data, allowing you to query data in real time. As a result, you can access and use data much faster than with an on-premises data warehouse, allowing you to get more accurate insights faster and make more informed business decisions.

What is a data warehouse used for?

Cloud data warehousing offers a range of solutions that can benefit an organization. Here are some of the most common data warehouse use cases:

Making real-time decisions: Analyze data in real time to proactively address challenges, identify opportunities, gain efficiency, reduce costs, and proactively respond to business events.

Consolidating siloed data: Quickly pull data from multiple structured sources across your organization, such as point-of-sale systems, websites, and email lists, and bring it together into one location so that you can perform analysis and get insights.

Enabling business reporting and ad hoc analysis: Keep historical data on a separate server from operational data so that end users can access it and run their own queries and reports without impacting the performance of operational systems or waiting to get help from IT.

Implementing machine learning and AI: Collect historical and real-time data to develop algorithms that can provide predictive insights, such as anticipating traffic spikes or suggesting relevant products to a customer browsing a website.

Many businesses and industries require data analysis that is not only massive in scale, but also ongoing and in real time. For example, some service providers use real-time data to dynamically adjust prices throughout the day. Insurance companies track policies, sales, claims, payroll, and more. They also use machine learning to predict fraud. Gaming companies must track and react to user behavior in real time to enhance the player’s experience. Data warehouses make all of these activities possible.

If your organization has or does any of the following, you’re probably a good candidate for a data warehouse:

  • Multiple sources of disparate data
  • Big-data analysis and visualization—both asynchronously and in real time
  • Machine learning models and other AI-driven processes
  • Streaming analytics
  • Custom report generation and ad hoc analysis
  • Data mining
  • Data science and geospatial analysis

How to choose a cloud-based data warehouse solution

When choosing a cloud-based data warehouse, it’s critical to evaluate how solutions work and have a deep understanding of the existing use cases your cloud data warehouse will need to support. 

There are many considerations beyond warehousing capabilities to take into account when choosing between different providers, including differences in architecture, scalability, security, pricing, performance, and more. For instance, you might find that a solution that is simple to implement isn’t as easy to scale or you might have to retrain all data analysts and buy additional licenses to upgrade your current system. 

Beyond looking at the differences between vendors, it’s also important to consider what your migration to a cloud data warehouse will specifically entail and how that relates to your existing IT investments and specific business needs. 

Enterprise data warehouses play a central role in an organization’s decision-making. Therefore, you’ll need to ensure you have a deep understanding of business requirements, current use cases, and any gaps with existing solutions. It can be helpful to involve key stakeholders early in the process to help figure out the implications of replacing a legacy data warehouse solution, the functional requirements to meet challenges, and detailed technical information about data sources, tools, frameworks, and more.

Solve your business challenges with Google Cloud

New customers get $300 in free credits to spend on Google Cloud.
Get started
Talk to a Google Cloud sales specialist to discuss your unique challenge in more detail.
Contact us

BigQuery is Google Cloud’s fully managed, serverless, multicloud enterprise data warehouse solution. It’s designed to power your data-driven innovation by helping you transform data into actionable insights and incorporate AI and machine learning that enables you to make informed decisions quickly. With no infrastructure to set up or manage, you can accelerate data analysis cost-effectively, rapidly share and analyze across complex datasets, and drive your digital transformation journey with ease.