IT prediction: Unified data pipelines will foster more real-time insights
Irina Farooq
Sr. Director, Product Management
Editor’s note: This post is part of an ongoing series on IT predictions from Google Cloud experts. Check out the full list of our predictions on how IT will change in the coming years.
Prediction: By 2025, 90% of data will be actionable in real time using ML
A recent survey found that only one-third of companies are able to realize tangible value from their data. Instead of extracting that value, many organizations are saddled with the operational burden of managing data infrastructure, moving and duplicating data, and making it available to the right users in the right tools.
At Google, data is in our DNA, and we want to make the same solutions that help us innovate available to our customers. For example, we helped Vodafone unify all their data so that thousands of their employees can innovate across 700 different use cases and 5,000 different data feeds. They now develop AI 80% faster and more cost-effectively, all without compromising governance or reliability.
In our own experience building data infrastructure, we’ve found the following principles to be helpful for overcoming barriers to value and innovation:
You have to be able to see and trust your data. First, spend less time searching for your data. Then, use automation and intelligence to catalog it so you can be sure you can trust it. Automatic cataloging tools like Dataplex let you discover, manage, monitor, and govern your data from one place, no matter where it’s stored. Instead of spending days searching for the right data, you can find it when you need it and spend more time actually working with it. Built-in data quality and lineage capabilities also help automate data quality checks and troubleshoot data issues.
You have to be able to work with your data. Adopt the best proprietary and open source tools that allow your teams to work across all of your data, from structured to semi-structured to unstructured. The key is to combine the best of open source, such as Apache Spark, with enterprise solutions so you can deliver reliability and performance at scale. Imagine what becomes possible when you can harness Google Cloud infrastructure without forking the open source code.
You need to act on today’s data today, not tomorrow. Apply streaming analytics so you can work with data as it’s collected. Building unified batch and real-time pipelines lets you process events as they happen and deliver in-context experiences. For example, with a streaming service like Dataflow, you can use Apache Beam to write a pipeline once and run it in either batch or real-time (streaming) mode.
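The write-once idea behind unified batch and streaming pipelines can be illustrated without Beam itself. The following is a minimal, stdlib-only Python sketch, not Beam’s or Dataflow’s API: one hypothetical pipeline function sums amounts per user over 60-second fixed windows, and the same function runs unchanged over a bounded list (batch) and an unbounded generator (streaming). The event schema, the window size, and the names pipeline, window_start, and live_source are illustrative assumptions. The sketch also assumes events arrive in timestamp order and simply drops late events; Beam itself handles out-of-order data with watermarks and triggers.

```python
from collections import defaultdict
from itertools import islice
from typing import Iterable, Iterator, Optional, Tuple

# Hypothetical event schema: (user_id, timestamp_seconds, amount)
Event = Tuple[str, int, float]
Result = Tuple[str, int, float]  # (user_id, window_start, total)

WINDOW_SECONDS = 60  # hypothetical fixed-window size


def window_start(ts: int) -> int:
    """Map a timestamp to the start of its fixed window."""
    return ts - ts % WINDOW_SECONDS


def pipeline(events: Iterable[Event]) -> Iterator[Result]:
    """Transform logic written once: sum amounts per (user, window).

    Assumes events arrive in timestamp order, so a window is complete as soon
    as an event from a later window appears. Late events are dropped here.
    """
    totals = defaultdict(float)
    current: Optional[int] = None
    for user, ts, amount in events:
        w = window_start(ts)
        if current is not None and w < current:
            continue  # late event: dropped in this sketch
        if current is not None and w > current:
            # The open window is complete: emit it before starting the next one.
            for key in sorted(totals):
                yield key, current, totals[key]
            totals.clear()
        current = w
        totals[user] += amount
    # A bounded (batch) input ends, so flush the last open window.
    for key in sorted(totals):
        yield key, current, totals[key]


batch = [("a", 0, 1.0), ("b", 10, 2.0), ("a", 30, 3.0), ("a", 65, 4.0)]
batch_results = list(pipeline(batch))


def live_source() -> Iterator[Event]:
    """Simulated unbounded source: replays the batch, then never stops."""
    yield from batch
    t = 120
    while True:
        yield ("c", t, 0.5)
        t += 1


# Same function, unbounded input: take results as they are emitted.
stream_results = list(islice(pipeline(live_source()), 3))

print(batch_results)   # [('a', 0, 4.0), ('b', 0, 2.0), ('a', 60, 4.0)]
print(stream_results)  # same three results, although the stream never ends
```

Because the streaming run emits each window’s totals as soon as an event from a later window arrives, its first three results match the batch results even though the source is unbounded. In a real Beam pipeline the same logic would be expressed with windowing and per-key combining transforms, and whether it runs in batch or streaming mode depends on its sources and pipeline options rather than on rewriting the logic.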
When you can see your data, trust it, and work with it as it’s collected, it becomes easy to see how 90% of data will become actionable in real time using ML, and how much innovation that will unlock.