- Automated Cluster Management
- Managed deployment, logging, and monitoring let you focus on your data, not on your cluster. Dataproc clusters are stable, scalable, and speedy.
- Resizable Clusters
- Create and scale clusters quickly with various virtual machine types, disk sizes, number of nodes, and networking options.
- Autoscaling Clusters
- Dataproc Autoscaling automates cluster resource management by automatically adding and removing cluster workers (nodes).
- Cloud Integrated
- Built-in integration with Cloud Storage, BigQuery, Bigtable, Stackdriver Logging, Stackdriver Monitoring, and AI Hub, giving you a complete and robust data platform.
- Image Versioning
- Image versioning allows you to switch between different versions of Apache Spark, Apache Hadoop, and other tools.
- Highly Available
- Run clusters in high availability mode with multiple master nodes, and set jobs to restart on failure to ensure your clusters and jobs are highly available.
- Enterprise Security
- When you create a Dataproc cluster, you can enable Hadoop Secure Mode via Kerberos by adding a Security Configuration. GCP and Dataproc also offer additional security features that help protect your data. The GCP-specific security features most commonly used with Dataproc include default at-rest encryption, OS Login, VPC Service Controls, and Customer Managed Encryption Keys (CMEK).
- Cluster Scheduled Deletion
- To help avoid incurring charges for an inactive cluster, you can use Dataproc's scheduled deletion, which provides options to delete a cluster after a specified cluster idle period, at a specified future time, or after a specified time period.
- Automatic or Manual Configuration
- Dataproc automatically configures hardware and software, but also gives you manual control.
- Developer Tools
- Multiple ways to manage a cluster, including an easy-to-use web UI, the Cloud SDK, RESTful APIs, and SSH access.
- Initialization Actions
- Run initialization actions to install or customize the settings and libraries you need when your cluster is created.
- Optional Components
- Use optional components to install and configure additional components on the cluster. Optional components are integrated with Dataproc components, and offer fully configured environments for Zeppelin, Druid, Presto, and other open source software components related to the Apache Hadoop and Apache Spark ecosystem.
- Custom Images
- Dataproc clusters can be provisioned with a custom image that includes your pre-installed Linux operating system packages.
- Flexible Virtual Machines
- Clusters can use custom machine types and preemptible virtual machines to make them the perfect size for your needs.
- Component Gateway and Notebook Access
- Dataproc Component Gateway enables secure, one-click access to Dataproc default and optional component web interfaces running on the cluster.
- Workflow Templates
- Dataproc workflow templates provide a flexible and easy-to-use mechanism for managing and executing workflows. A Workflow Template is a reusable workflow configuration that defines a graph of jobs with information on where to run those jobs.
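
To make the "graph of jobs" idea concrete, here is a minimal local sketch in Python. It does not call Dataproc. It builds a dictionary shaped like a workflow template and works out an order in which the steps could run. The template ID, cluster name, bucket paths, and step names are hypothetical, and the field names (`placement`, `managedCluster`, `jobs`, `stepId`, `prerequisiteStepIds`) are assumed to match the Dataproc REST `WorkflowTemplate` resource, so check them against the API reference before relying on them. The helper `execution_order` is illustrative only; the service handles job ordering itself when a template is run.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical template for illustration only. IDs, cluster name, and
# gs:// paths are made up; field names are assumed to follow the
# Dataproc REST WorkflowTemplate resource.
workflow_template = {
    "id": "example-etl",
    # "Where to run those jobs": here, a cluster managed by the workflow.
    "placement": {
        "managedCluster": {"clusterName": "example-ephemeral-cluster"},
    },
    # The graph of jobs: each step may name the steps it depends on.
    "jobs": [
        {
            "stepId": "ingest",
            "pysparkJob": {"mainPythonFileUri": "gs://example-bucket/ingest.py"},
        },
        {
            "stepId": "clean",
            "prerequisiteStepIds": ["ingest"],
            "pysparkJob": {"mainPythonFileUri": "gs://example-bucket/clean.py"},
        },
        {
            "stepId": "enrich",
            "prerequisiteStepIds": ["ingest"],
            "pysparkJob": {"mainPythonFileUri": "gs://example-bucket/enrich.py"},
        },
        {
            "stepId": "report",
            "prerequisiteStepIds": ["clean", "enrich"],
            "pysparkJob": {"mainPythonFileUri": "gs://example-bucket/report.py"},
        },
    ],
}


def execution_order(template):
    """Return step IDs in an order that respects prerequisiteStepIds.

    Raises ValueError if a step depends on an unknown step or if the
    dependencies form a cycle (graphlib.CycleError subclasses ValueError).
    """
    graph = {
        job["stepId"]: set(job.get("prerequisiteStepIds", []))
        for job in template["jobs"]
    }
    unknown = set().union(*graph.values()) - set(graph)
    if unknown:
        raise ValueError(f"unknown prerequisite step(s): {sorted(unknown)}")
    # TopologicalSorter expects a mapping of node -> predecessors, which
    # is exactly the shape of the prerequisite lists above.
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(execution_order(workflow_template))
```

In this sketch, `clean` and `enrich` depend only on `ingest`, so neither has to wait for the other, while `report` waits for both. Checking a template locally like this can catch a misspelled step ID or a circular dependency before the template is submitted.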