Dataproc optional Anaconda component

You can install additional components like Anaconda when you create a Dataproc cluster using the Optional components feature. This page describes the Anaconda component.

The Anaconda component is a Python distribution and package manager with over 1000 popular data science packages. The component is installed on all cluster nodes in /opt/conda/anaconda and becomes the default Python interpreter. For additional installation information, see Configure Dataproc Python environment.
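To confirm the install location on a node, you can run a quick check after connecting to it (for example, over SSH to the master node). This is a minimal sketch, not an official verification procedure. It assumes the standard conda layout, where the interpreter lives at bin/python under the documented /opt/conda/anaconda prefix. The function name check_anaconda_python is hypothetical.

```shell
#!/usr/bin/env bash
# Documented install prefix for the Anaconda component.
ANACONDA_PREFIX="/opt/conda/anaconda"

# Hypothetical helper: run on a cluster node. Assumes the standard conda
# layout (interpreter at $ANACONDA_PREFIX/bin/python).
check_anaconda_python() {
  local expected="${ANACONDA_PREFIX}/bin/python"
  if [ ! -x "$expected" ]; then
    echo "Anaconda Python not found at ${expected}" >&2
    return 1
  fi
  # Print the interpreter version and what "python" resolves to on PATH.
  "$expected" --version
  echo "python on PATH: $(command -v python || echo 'none')"
}
```

The PATH seen by an interactive login shell can differ from the interpreter configured for Spark jobs, so treat the second line of output as informational.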

Install the component

Install the component when you create a Dataproc cluster. See Supported Dataproc versions for the component version included in each Dataproc image release.

gcloud command

To create a Dataproc cluster that includes the Anaconda component, use the gcloud dataproc clusters create cluster-name command with the --optional-components flag.

gcloud dataproc clusters create cluster-name \
    --region=region \
    --optional-components=ANACONDA \
    ... other args
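After the cluster is created, you can check which optional components it was created with by reading the cluster's software config. This is a sketch that uses the gcloud dataproc clusters describe command with a value() format projection. The variable values mirror the placeholders above, and the function name show_optional_components is hypothetical.

```shell
#!/usr/bin/env bash
# Placeholder values; replace with your cluster name and region.
CLUSTER_NAME="cluster-name"
REGION="region"

# Hypothetical helper: print the optional components recorded in the
# cluster's config.softwareConfig.optionalComponents field.
show_optional_components() {
  gcloud dataproc clusters describe "${CLUSTER_NAME}" \
    --region="${REGION}" \
    --format="value(config.softwareConfig.optionalComponents)"
}
```

If the component was installed, ANACONDA should appear in the printed list.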

REST API

You can specify the Anaconda component through the Dataproc API by listing ANACONDA in SoftwareConfig.optionalComponents (a list of SoftwareConfig.Component values) as part of a clusters.create request.
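A minimal sketch of such a request using curl against the v1 REST endpoint follows. It assumes you are authenticated with gcloud, and it includes only the fields needed to request the component. A real cluster usually needs more configuration, such as an image version that supports Anaconda. The placeholder values and the function names build_request_body and create_cluster are hypothetical.

```shell
#!/usr/bin/env bash
# Placeholder values; replace with your own.
PROJECT_ID="project-id"
REGION="region"
CLUSTER_NAME="cluster-name"

# Hypothetical helper: print a minimal clusters.create request body that
# lists ANACONDA in config.softwareConfig.optionalComponents.
build_request_body() {
  cat <<EOF
{
  "projectId": "${PROJECT_ID}",
  "clusterName": "${CLUSTER_NAME}",
  "config": {
    "softwareConfig": {
      "optionalComponents": ["ANACONDA"]
    }
  }
}
EOF
}

# Hypothetical helper: POST the body to the clusters.create endpoint.
# Requires network access and gcloud credentials.
create_cluster() {
  build_request_body | curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json; charset=utf-8" \
    --data-binary @- \
    "https://dataproc.googleapis.com/v1/projects/${PROJECT_ID}/regions/${REGION}/clusters"
}
```

Keeping the body in its own function lets you inspect it with build_request_body before sending it with create_cluster.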

Console

  1. Enable the component.
    • In the Google Cloud console, open the Dataproc Create a cluster page. The Set up cluster panel is selected.
    • In the Components section:
      • Under Optional components, select Anaconda and other optional components to install on your cluster.