You can install additional components like Anaconda when you create a Dataproc cluster using the Optional components feature. This page describes the Anaconda component.
Anaconda is a Python distribution and package manager with over 1,000 popular data science packages. For additional installation information, see Configure Dataproc Python environment.
Install the component
Install the component when you create a Dataproc cluster. See Supported Dataproc versions for the component version included in each Dataproc image release.
gcloud command
To create a Dataproc cluster that includes the Anaconda component, use the gcloud dataproc clusters create cluster-name command with the --optional-components flag.
gcloud dataproc clusters create cluster-name \
    --region=region \
    --optional-components=ANACONDA \
    ... other args
REST API
The Anaconda component can be specified through the Dataproc API using SoftwareConfig.Component as part of a clusters.create request.
Console
- Enable the component.
  - In the Google Cloud console, open the Dataproc Create a cluster page. The Set up cluster panel is selected.
  - In the Components section:
    - Under Optional components, select Anaconda and other optional components to install on your cluster.
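For the REST API option described earlier, the following is a minimal sketch of what a clusters.create request might look like with the Anaconda component enabled. The helper name build_create_cluster_request and the project, region, and cluster values are hypothetical placeholders, not part of the Dataproc API. The sketch only builds the URL and JSON body; authentication and actually sending the request are left out.

```python
import json


def build_create_cluster_request(project_id, region, cluster_name,
                                 components=("ANACONDA",)):
    """Build the URL and JSON body for a Dataproc clusters.create call.

    Hypothetical helper for illustration only. It does not send the
    request and does not handle authentication.
    """
    url = (
        "https://dataproc.googleapis.com/v1/"
        f"projects/{project_id}/regions/{region}/clusters"
    )
    body = {
        "projectId": project_id,
        "clusterName": cluster_name,
        "config": {
            "softwareConfig": {
                # Values come from the SoftwareConfig.Component enum.
                "optionalComponents": list(components),
            }
        },
    }
    return url, body


if __name__ == "__main__":
    # Placeholder values; substitute your own project, region, and name.
    url, body = build_create_cluster_request(
        "my-project", "us-central1", "my-cluster"
    )
    print("POST", url)
    print(json.dumps(body, indent=2))
```

The body would be sent as an authenticated POST to the printed URL. Other cluster settings, such as an image version that includes the component, would be added to the config object as needed.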