Building your data lake on Google Cloud

Store, process, and analyze all your data in a cost-efficient and agile way.

Cloud Data Lake Overview

Turn raw data into innovation

Where does your data live today, and are you making the most of it? Load all your structured or unstructured data into Google Cloud, and our processing, analytics, and machine learning tools will turn it into insights that drive growth across your business.

From ingest to insight

Data in GCP Data Lake

Easy migration to the cloud

Is your data batch or streaming? Are you migrating across your network, using an offline transfer appliance, or capturing real-time streams? Wherever your data lives today and however you need to manage your migration, we make it easy to move it to Google Cloud, where you can count on Cloud Storage’s 99.999999999% durability.
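To put the durability figure in concrete terms, here is a minimal arithmetic sketch. It assumes the 99.999999999% figure is an annual, per-object durability and that object losses are independent; the ten-million-object bucket is a hypothetical example, not a number from this page.

```python
from fractions import Fraction

# Assumption: 99.999999999% is an annual, per-object durability figure.
# 1 - 0.99999999999 is kept as an exact fraction to avoid float rounding.
ANNUAL_LOSS_PROBABILITY = Fraction(1, 10**11)


def expected_annual_losses(object_count: int) -> Fraction:
    """Expected number of objects lost per year for a given object count."""
    return object_count * ANNUAL_LOSS_PROBABILITY


def years_per_expected_loss(object_count: int) -> Fraction:
    """Average number of years between single-object losses."""
    return 1 / expected_annual_losses(object_count)


if __name__ == "__main__":
    stored = 10_000_000  # hypothetical bucket holding ten million objects
    print(float(expected_annual_losses(stored)))
    print(int(years_per_expected_loss(stored)))
```

Under these assumptions, ten million stored objects work out to one expected object loss roughly every 10,000 years.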

Storing Data at Petabyte Scale

Flexible, powerful, cost-effective storage

Because storage is decoupled from compute, you can add data in increments as small as one gigabyte and pay only for what you store. Multiple storage classes let you optimize for cost and availability, with no schema design required. And compatibility with the rest of Google Cloud Platform helps you quickly experiment with new analytics and data to support any use case.
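One way to use multiple storage classes is a bucket lifecycle policy that moves aging objects to colder classes. The sketch below builds such a policy in the JSON shape used by Cloud Storage lifecycle configuration files. The 30-day and 365-day thresholds and the helper names are illustrative assumptions, not recommendations from this page.

```python
import json


def lifecycle_rule(storage_class: str, age_days: int) -> dict:
    """One rule: move objects older than age_days to storage_class."""
    return {
        "action": {"type": "SetStorageClass", "storageClass": storage_class},
        "condition": {"age": age_days},
    }


def build_lifecycle_config() -> dict:
    """Hypothetical policy: Nearline after 30 days, Archive after a year."""
    return {
        "rule": [
            lifecycle_rule("NEARLINE", 30),   # assumed threshold
            lifecycle_rule("ARCHIVE", 365),   # assumed threshold
        ]
    }


if __name__ == "__main__":
    # Print the policy as JSON, ready to save to a lifecycle config file.
    print(json.dumps(build_lifecycle_config(), indent=2))
```

The printed JSON could then be applied to a bucket with your usual tooling; the right thresholds depend on how often the data is actually read.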

Process Data

Process your data your way

With Cloud Storage, you can process data in whatever way makes sense for your business. Use Cloud Dataproc, our fully managed Apache Hadoop and Apache Spark service, to spin up clusters in seconds and pay only for the time it takes jobs to run. Our fully managed Apache Beam service, Cloud Dataflow, lets you work with stream and batch workloads in a serverless experience that eliminates provisioning and management complexities.

Serverless Data Warehouse

Fast dashboards and visualizations

Want to perform structured-data analytics at blazing speeds against massive volumes? With BigQuery, Google Cloud’s serverless petabyte-scale data warehouse, you can set up your warehouse in seconds, start querying data immediately, and create instant enterprise reporting and business intelligence with the in-memory BigQuery BI Engine.

Advanced Analytics using ML

New machine learning insights

Our native integrations with Cloud AI open your data lake to the vast potential of machine learning, from unlocking the insights hidden in your images and video to deploying large-scale ML algorithms. Our easy-to-use, built-in BigQuery ML feature democratizes machine learning and supports a data-driven culture inside your company by empowering anyone to build and deploy models.
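As a rough illustration of building a model with BigQuery ML, the sketch below assembles a `CREATE MODEL` statement for a logistic regression. The dataset, model, table, and column names are hypothetical, and the helper only builds the SQL text; submitting it to BigQuery is left to whichever client or console you already use.

```python
import re

# Conservative identifier check so the generated SQL stays well-formed.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _check(name: str) -> str:
    if not _IDENTIFIER.match(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def create_model_sql(dataset, model, source_table, label_column, feature_columns):
    """Build a BigQuery ML CREATE MODEL statement (logistic regression)."""
    for name in (dataset, model, source_table, label_column, *feature_columns):
        _check(name)
    columns = ", ".join([*feature_columns, label_column])
    return (
        f"CREATE OR REPLACE MODEL `{dataset}.{model}`\n"
        f"OPTIONS (model_type = 'logistic_reg', "
        f"input_label_cols = ['{label_column}'])\n"
        f"AS SELECT {columns}\n"
        f"FROM `{dataset}.{source_table}`"
    )


if __name__ == "__main__":
    # All names below are hypothetical placeholders.
    print(create_model_sql(
        "lake_demo", "churn_model", "customer_events",
        "churned", ["tenure_months", "monthly_spend"],
    ))
```

Keeping the statement as plain SQL means the same text can be pasted into the BigQuery console or sent through a client library.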

Ready to create your data lake?

Map on-premises Hadoop data lake workloads to GCP products

Building a cloud data lake on GCP

Starting point: I'm running an Apache Hadoop cluster on-premises.

I'm processing streaming data
    We use Apache Beam: Cloud Dataflow
    We use Apache Spark or Kafka: Cloud Dataproc

I'm doing interactive data analysis or ad-hoc querying
    We use Apache Spark with interactive web notebooks: Cloud Dataproc in combination with Jupyter or Zeppelin optional components
    We use SQL with Apache Hive, Apache Drill, Impala, Presto or similar. Are you interested in keeping these SQL queries as they are?
        Yes: Cloud Dataproc
        No, I'm interested in learning more about a serverless solution: BigQuery

I'm doing ELT/ETL or batch processing
    We use MapReduce, Spark, Pig, or Hive: Cloud Dataproc
    We use Oozie for workflow orchestration. Are you interested in keeping these workflow jobs as they are?
        Yes: Cloud Dataproc
        No, I'm interested in learning more about a managed solution: Cloud Composer

I'm supporting NoSQL workloads
    We use Apache Accumulo: Cloud Dataproc
    We use Apache HBase. Need to use coprocessors or SQL with Apache Phoenix?
        Yes: Cloud Dataproc
        No: Cloud Bigtable
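The flowchart's mapping from on-premises Hadoop workloads to GCP products can also be sketched as a small lookup. The workload and tool keys below (such as `"streaming"` and `"hbase"`) are hypothetical labels chosen for this example, and the branch assignments reflect one reading of the chart rather than an official API.

```python
from typing import Optional

DATAPROC = "Cloud Dataproc"

# (workload, tool) -> product, or -> (product if yes, product if no)
# when the chart asks a follow-up yes/no question for that branch.
MAPPING = {
    ("streaming", "beam"): "Cloud Dataflow",
    ("streaming", "spark_or_kafka"): DATAPROC,
    ("interactive", "spark_notebooks"):
        DATAPROC + " with Jupyter or Zeppelin optional components",
    # Follow-up: keep the SQL queries as they are?
    ("interactive", "sql_engine"): (DATAPROC, "BigQuery"),
    ("batch", "mapreduce_spark_pig_hive"): DATAPROC,
    # Follow-up: keep the workflow jobs as they are?
    ("batch", "oozie"): (DATAPROC, "Cloud Composer"),
    ("nosql", "accumulo"): DATAPROC,
    # Follow-up: need coprocessors or SQL with Apache Phoenix?
    ("nosql", "hbase"): (DATAPROC, "Cloud Bigtable"),
}


def recommend(workload: str, tool: str, answer_yes: Optional[bool] = None) -> str:
    """Return the suggested GCP product for one branch of the chart."""
    target = MAPPING[(workload, tool)]  # KeyError for an unknown branch
    if isinstance(target, tuple):
        if answer_yes is None:
            raise ValueError("this branch needs a yes/no answer")
        return target[0] if answer_yes else target[1]
    return target


if __name__ == "__main__":
    print(recommend("streaming", "beam"))
    print(recommend("nosql", "hbase", answer_yes=False))
```

Encoding the chart as data rather than nested conditionals keeps each branch visible in one place and makes it easy to adjust if your reading of the chart differs.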