Data warehouse migration challenges and how to meet them
Ryan McDowell
Strategic Cloud Engineer
Nitin Motgi
Group Product Manager
Editor’s note: This is the second in a series on modernizing your data warehouse. Find part 1 here.
In the last blog post, we discussed why legacy data warehouses are not cutting it any more and why organizations are moving their data warehouses to cloud. We often hear that customers feel that migration is an uphill battle because the migration strategy was not deliberately considered.
Migrating to a modern data warehouse from a legacy environment can require a massive up-front investment in time and resources. There’s a lot to think about before and during the process, so your organization has to take a strategic approach to streamline the process. At Google Cloud, we work with enterprises shifting data to our BigQuery data warehouse, and we’ve helped companies of all kinds successfully migrate to cloud. Here are some of the questions we frequently hear around migrating a data warehouse to the cloud:
How do we minimize any migration risks or security challenges?
How much will it cost?
How do we migrate our data to the target data warehouse?
How quickly will we see equal or better performance?
These are big, important questions to ask—and have answered—when you’re starting your migration. Let’s take them in order.
How do we minimize any migration risks or security challenges?
It’s easy to consider an on-premises data warehouse secure because, well, it’s on-site and you can manage its data protection. But if scaling up an on-prem data warehouse is difficult, so is securing it as your business scales.
We’ve built in multiple features to secure BigQuery. For enterprise users, Cloud Identity and Access Management (Cloud IAM) is key to setting appropriate role-based user access to data. You can also take advantage of SQL’s security views within BigQuery. And all BigQuery data is encrypted at rest and in transit. You can add the protection of customer-managed encryption keys to establish even stronger security measures. Using virtual private cloud (VPC) security controls can secure your migration path, since it helps reduce data exfiltration risks.
How much will it cost?
The cost of a cloud data warehouse has a different structure from what you’re likely used to with a legacy data warehouse. An on-prem system like Teradata may depend on your IT team paying every three years for the hardware, then paying for licenses for users who need to access the system. Capacity increases come at an additional cost outside of that hardware budget.
With cloud, you’ve got a lot more options for cost and scale. Instead of a fixed set of costs, you’re now working on a price-utility gradient, where if you want to get more out of your data warehouse, you can spend more to do so immediately, or vice versa. While cloud data warehouses help reduce or eliminate capital and fixed costs, they are not all the same. You’ll find varying levels of simplicity and cost savings across vendors, so it’s important to check out the operational costs of each data warehouse in relation to its performance.
With a cloud data warehouse like BigQuery, TCO becomes an important metric for customers when they’ve migrated to BigQuery (check out ESG’s report on that), and Google Cloud’s flexibility makes it easy to optimize costs.
How do we migrate all of our data to the target data warehouse?
This question encompasses both migrating your extract, transform, load (ETL) jobs and SAS/BI application workloads to the target data warehouse, as well as migrating all your queries, stored procedures, and other extract, load, transform (ELT) jobs.
Actually getting all of a company’s data into the cloud can seem daunting at the outset of the migration journey. We know that most businesses have a lot of siloed data. That might be multiple data lakes set up over the years for various teams, or systems acquired through acquisition that handle just one or two crucial applications. You may be moving data from an on-prem or cloud data warehouse to BigQuery and type systems or representations don’t match up.
One big step you can take to prepare for a successful migration is to do some workload and use case discovery. That might involve auditing which use cases exist today and whether those use cases are part of a bigger workload, as well as identifying which datasets, tables, and schemas underpin each use case. Use cases will vary by industry and by job role. So, for example, a retail pricing analyst may want to analyze past product price changes to calculate future pricing. Use cases may include the need to ingest data from a transactional database, transforming data into a single time series per product, storing the results in a data warehouse table, and more.
After the preparation and discovery phase, you should assess the current state of your legacy environment to plan for your migration. This includes cataloging and prioritizing your use cases, auditing data to decide what will be moved and what won’t, and evaluating data formats across your organization to decide what you’ll need to convert or rewrite. Once that’s decided, choose your ingest and pipeline methods. All of these tasks take both technology and people management, and require some organizational consensus on what success will look like once the migration is complete.
How quickly will we see equal or better performance?
Managing a legacy data warehouse isn’t usually synonymous with speed. Performance often comes at the cost of capacity, so users can’t do the analysis they need till other queries have finished running. Reporting and other analytics functions may take hours or days, which is especially true for running large reports with a lot of data, like an end-of-quarter sales calculation. As the amount of data and number of users rapidly grows, performance begins to melt down and organizations often face disruptive outages.
However, with a modern cloud data warehouse like BigQuery, compute and storage are decoupled, so you can scale immediately without facing capital infrastructure constraints. BigQuery helps you modernize because it uses a familiar SQL interface, so users can run queries in seconds and share insights right away. Home Depot is an example of a customer that migrated their warehouse and reduced eight-hour workloads to five minutes.
Moving to cloud may seem daunting, especially when you’re migrating an entrenched legacy system. But it brings the benefits of adopting technology that lets the business grow, rather than simply adopting a tool. It’s likely you’ve already seen that the business demand exists. Now it’s time to stop standing in the way of that demand and instead make way for growth.