Organization security policies, regulatory compliance rules, and other considerations can prompt you to "rotate" your Dataproc clusters at regular intervals by deleting and then recreating them on a schedule.
As part of cluster rotation, new clusters can be provisioned with the latest Dataproc image versions while retaining the configuration settings of the replaced clusters.
This page shows you how to set up clusters that you plan to rotate ("rotated clusters"), submit jobs to them, and then rotate the clusters as needed.
Custom image cluster rotation:
You can apply previous or new customizations to a previous or new Dataproc base image when recreating a custom image cluster.
Set up rotated clusters
To set up rotated clusters, create unique, timestamp-suffixed cluster names to distinguish previous clusters from new ones, and then attach labels that indicate whether a cluster is part of a rotated cluster pool and is actively receiving new job submissions. This example uses the cluster-pool and cluster-state=active labels for these purposes, but you can use your own label names.
1. Set environment variables:

    PROJECT_ID=project-id
    REGION=region
    CLUSTER_POOL=cluster-pool-name
    CLUSTER_NAME=$CLUSTER_POOL-$(date '+%Y%m%d%H%M')
    BUCKET=bucket-name

   Notes:
   - cluster-pool-name: The name of the cluster pool associated with one or more clusters. This name is used in the cluster name and in the cluster-pool label attached to the cluster to identify the cluster as part of the pool.
2. Create the cluster. You can add arguments and use different labels.

    gcloud dataproc clusters create ${CLUSTER_NAME} \
        --project=${PROJECT_ID} \
        --region=${REGION} \
        --bucket=${BUCKET} \
        --labels="cluster-pool=${CLUSTER_POOL},cluster-state=active"
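If rotation is driven from a Python script rather than an interactive shell, the naming and labeling convention can be sketched as follows. This is a minimal sketch under stated assumptions: the helper names (`rotated_cluster_name`, `create_cluster_command`) and the sample values are hypothetical, the timestamp format mirrors `date '+%Y%m%d%H%M'`, and the function only builds the command line without calling gcloud.

```python
from datetime import datetime
import shlex


def rotated_cluster_name(pool: str, now: datetime) -> str:
    """Return the pool name suffixed with a minute-resolution timestamp."""
    return f"{pool}-{now.strftime('%Y%m%d%H%M')}"


def create_cluster_command(project_id, region, bucket, pool, now):
    """Build (but do not run) a gcloud command that creates an active pool member."""
    labels = f"cluster-pool={pool},cluster-state=active"
    return [
        "gcloud", "dataproc", "clusters", "create",
        rotated_cluster_name(pool, now),
        f"--project={project_id}",
        f"--region={region}",
        f"--bucket={bucket}",
        f"--labels={labels}",
    ]


# Hypothetical values, for illustration only.
command = create_cluster_command(
    "my-project", "us-central1", "my-bucket", "etl-pool", datetime(2025, 9, 4, 13, 5)
)
print(shlex.join(command))
```

Passing the timestamp in as an argument, instead of reading the clock inside the helper, keeps the name reproducible in tests and in retried runs.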
Submit jobs to clusters
The following Google Cloud CLI and Apache Airflow directed acyclic graph (DAG) examples submit an Apache Pig job to a cluster. Cluster labels are used to submit the job to an active cluster within a cluster pool.
gcloud
Submit an Apache Pig job located in Cloud Storage. Pick the cluster using labels.

    gcloud dataproc jobs submit pig \
        --region=${REGION} \
        --file=gs://${BUCKET}/scripts/script.pig \
        --cluster-labels="cluster-pool=${CLUSTER_POOL},cluster-state=active"

Airflow
Submit an Apache Pig job located in Cloud Storage using Airflow.
Pick the cluster using labels.

    from datetime import datetime

    from airflow import DAG
    from airflow.providers.google.cloud.operators.dataproc import DataprocSubmitJobOperator

    # Declare variables. Replace the placeholder values with your own.
    project_id = "my-project"
    region = "us-central1"
    dag_id = "pig_wordcount"
    cluster_pool = "cluster-pool-name"  # Value of the cluster-pool label
    cluster_labels = {"cluster-pool": cluster_pool, "cluster-state": "active"}
    wordcount_script = "gs://bucket-name/scripts/wordcount.pig"

    # Define the DAG. With no schedule, it runs only when triggered.
    # schedule_interval is deprecated in Airflow 2.4 and later; use schedule=None there.
    dag = DAG(
        dag_id,
        schedule_interval=None,
        start_date=datetime(2023, 8, 16),
        catchup=False,
    )

    # Dataproc job definition. The placement selects a cluster by label
    # instead of by cluster name.
    PIG_JOB = {
        "reference": {"project_id": project_id},
        "placement": {"cluster_labels": cluster_labels},
        "pig_job": {"query_file_uri": wordcount_script},
    }

    wordcount_task = DataprocSubmitJobOperator(
        task_id="wordcount",
        region=region,
        project_id=project_id,
        job=PIG_JOB,
        dag=dag,
    )

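The label-based routing that these examples rely on can be pictured with a small model. The sketch below is an illustration only, not Dataproc's implementation: it assumes a job can be placed on a cluster when the cluster carries every requested label with the same value. The `matches` helper and the cluster names are hypothetical.

```python
def matches(cluster_labels: dict, requested: dict) -> bool:
    """Return True when the cluster has every requested label with the same value."""
    return all(cluster_labels.get(key) == value for key, value in requested.items())


# Hypothetical pool: one cluster whose state label is no longer "active",
# and one replacement cluster that accepts new jobs.
clusters = {
    "etl-pool-202508010900": {"cluster-pool": "etl-pool", "cluster-state": "pendingfordeletion"},
    "etl-pool-202509010900": {"cluster-pool": "etl-pool", "cluster-state": "active"},
}
requested = {"cluster-pool": "etl-pool", "cluster-state": "active"}

eligible = [name for name, labels in clusters.items() if matches(labels, requested)]
print(eligible)  # ['etl-pool-202509010900']
```

Under this model, changing a cluster's label changes which clusters a job can land on, while the job definition itself stays the same.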
Rotate clusters
1. Update the cluster labels attached to the clusters you are rotating out. This example uses the cluster-state=pendingfordeletion label to signify that the cluster is not receiving new job submissions and is being rotated out, but you can use your own label for this purpose.

    gcloud dataproc clusters update ${CLUSTER_NAME} \
        --region=${REGION} \
        --update-labels="cluster-state=pendingfordeletion"

   After the cluster label is updated, the cluster does not receive new jobs because jobs are submitted only to clusters in the cluster pool that have the active label (see Submit jobs to clusters).
2. Delete the clusters you are rotating out after they finish running jobs.
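The deletion step lends itself to a small script. The following is a minimal sketch, not an official tool: it assumes the gcloud CLI is installed and authenticated, and the helper names (`gcloud_lines`, `delete_idle_clusters`), the filter expression, and the output formats are assumptions to verify against your gcloud version. The `run` parameter exists only so the command runner can be swapped out for testing.

```python
import subprocess


def gcloud_lines(args, run=subprocess.run):
    """Run a gcloud command and return its non-empty output lines."""
    result = run(["gcloud", *args], capture_output=True, text=True, check=True)
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def delete_idle_clusters(region, label="cluster-state=pendingfordeletion",
                         run=subprocess.run):
    """Delete labeled clusters that have no active jobs; return the deleted names."""
    deleted = []
    names = gcloud_lines(
        ["dataproc", "clusters", "list", f"--region={region}",
         f"--filter=labels.{label}", "--format=value(clusterName)"], run)
    for name in names:
        active_jobs = gcloud_lines(
            ["dataproc", "jobs", "list", f"--region={region}", f"--cluster={name}",
             "--state-filter=active", "--format=value(reference.jobId)"], run)
        if active_jobs:
            continue  # Still busy; check again on the next pass.
        gcloud_lines(["dataproc", "clusters", "delete", name,
                      f"--region={region}", "--quiet"], run)
        deleted.append(name)
    return deleted
```

Calling `delete_idle_clusters("us-central1")` from a scheduled job would make one pass over the pool. Clusters that still have active jobs are skipped and picked up on a later pass.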
Note: You can automate this step with a monitoring script that fetches clusters with the cluster-state=pendingfordeletion label (or whichever label you applied in the previous step), checks that no jobs are running on the cluster, and then deletes the cluster.

Last updated 2025-09-04 UTC.