You can install additional components like Apache Pig
when you create a Dataproc cluster using the
Optional components
feature. This page describes the Pig component, an open source platform for
analyzing large data sets.
Install the component
Install the component when you create a Dataproc cluster.
Apache Pig is an optional component in Dataproc 2.3 and later
image versions.
Note: Apache Pig is automatically installed on Dataproc 2.2 and earlier image versions.
See Supported Dataproc versions for component versions included in the
latest Dataproc image releases.
gcloud
To create a Dataproc cluster that includes the Pig component,
use the
gcloud dataproc clusters create CLUSTER_NAME
command with the --optional-components flag (using image version
2.3 or later).
```
gcloud dataproc clusters create CLUSTER_NAME \
    --region=REGION \
    --optional-components=PIG \
    --image-version=2.3 \
    ... other flags
```
REST API
The Pig component can be specified through the Dataproc API
using SoftwareConfig.Component as part of a clusters.create request.
Console
Enable the component:
1. In the Google Cloud console, open the Dataproc Create a cluster page (https://console.cloud.google.com/dataproc/clustersAdd). The Set up cluster panel is selected.
2. In the Components section, under Optional components, select Pig and other optional components to install on your cluster.
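For the REST API option, a clusters.create request body might look like the following minimal sketch. It only builds and prints the request; the curl call is left commented out because it needs credentials. The default values for PROJECT_ID, REGION, and CLUSTER_NAME are hypothetical placeholders, and the use of "PIG" as the Component enum value is an assumption that mirrors the gcloud flag value, so check it against the SoftwareConfig.Component reference before relying on it.

```shell
#!/bin/sh
# Sketch: build a clusters.create request body that requests the Pig component.
# The default values below are hypothetical placeholders; override them through
# the environment. "PIG" as the enum value is an assumption mirroring the
# --optional-components=PIG gcloud flag.
set -eu

PROJECT_ID="${PROJECT_ID:-my-project}"
REGION="${REGION:-us-central1}"
CLUSTER_NAME="${CLUSTER_NAME:-my-cluster}"

# Request body: the image version and optional component go under softwareConfig.
REQUEST_BODY=$(cat <<EOF
{
  "clusterName": "${CLUSTER_NAME}",
  "config": {
    "softwareConfig": {
      "imageVersion": "2.3",
      "optionalComponents": ["PIG"]
    }
  }
}
EOF
)

URL="https://dataproc.googleapis.com/v1/projects/${PROJECT_ID}/regions/${REGION}/clusters"

# Print the request so it can be reviewed before sending.
echo "POST ${URL}"
echo "${REQUEST_BODY}"

# To send the request, uncomment the lines below (requires gcloud credentials):
# curl -X POST \
#   -H "Authorization: Bearer $(gcloud auth print-access-token)" \
#   -H "Content-Type: application/json" \
#   -d "${REQUEST_BODY}" \
#   "${URL}"
```

Printing the request first makes it easy to confirm that the image version is 2.3 or later before any cluster is created.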