You can install additional components when you create a Dataproc cluster using the Optional Components feature. This page describes the Anaconda component.
The Anaconda component is a Python distribution and package manager with over 1000 popular data science packages. The component is installed on all cluster nodes and becomes the default Python interpreter. For additional installation information, see Configure the cluster's Python environment.
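As a quick check of which interpreter is the default, the following minimal sketch reports what a command name resolves to on the PATH. It is meant to be run on a cluster node. The helper name which_interpreter is hypothetical and not part of Dataproc or Anaconda. On a node where the component is installed, the reported path would be expected to point into the Anaconda installation.

```shell
# Sketch: report which executable a command name resolves to on the PATH.
# `which_interpreter` is a hypothetical helper, not a Dataproc or Anaconda tool.
which_interpreter() {
  name="${1:-python}"                               # default to "python"
  path="$(command -v "$name" 2>/dev/null || true)"  # empty if not found
  if [ -n "$path" ]; then
    echo "$name resolves to: $path"
  else
    echo "$name not found on PATH"
  fi
}

# Check the default Python interpreter on this machine.
which_interpreter python
```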
Install the component
Install the component when you create a Dataproc cluster. See Supported Dataproc versions for the component version included in each Dataproc image release.
To create a Dataproc cluster that includes the Anaconda component, use the gcloud dataproc clusters create cluster-name command with the --optional-components flag.
gcloud dataproc clusters create cluster-name \
    --region=region \
    --optional-components=ANACONDA \
    ... other args
REST API
The Anaconda component can be specified through the Dataproc API using SoftwareConfig.Component as part of a clusters.create request.
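The request described above can be sketched with curl. This is an illustration under stated assumptions, not a complete cluster definition: project-id, region, and cluster-name are placeholders, the create_cluster helper is hypothetical, and authentication is assumed to use an access token from gcloud auth print-access-token. The script only prints the request, and nothing is sent until create_cluster is called.

```shell
# Placeholder values (assumptions): replace before sending the request.
PROJECT_ID="project-id"
REGION="region"
CLUSTER_NAME="cluster-name"

# Endpoint for the clusters.create method of the Dataproc v1 API.
URL="https://dataproc.googleapis.com/v1/projects/${PROJECT_ID}/regions/${REGION}/clusters"

# Request body: ANACONDA is listed under config.softwareConfig.optionalComponents.
BODY=$(cat <<EOF
{
  "clusterName": "${CLUSTER_NAME}",
  "config": {
    "softwareConfig": {
      "optionalComponents": ["ANACONDA"]
    }
  }
}
EOF
)

# Hypothetical helper: sends the request with a gcloud access token.
# It is defined but not called, so no cluster is created by this script.
create_cluster() {
  curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -d "${BODY}" \
    "${URL}"
}

# Print the request for review before sending it.
echo "POST ${URL}"
echo "${BODY}"
```

After replacing the placeholders and reviewing the printed request, run create_cluster to send it.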
Console
- Enable the component.
  - In the Cloud Console, open the Dataproc Create a cluster page. The Set up cluster panel is selected.
  - In the Components section:
    - Under Optional components, select Anaconda and other optional components to install on your cluster.