# Train a TensorFlow model with Keras on Google Kubernetes Engine
The following section provides an example of
[fine-tuning a BERT model](https://huggingface.co/docs/transformers/training#train-a-tensorflow-model-with-keras)
for sequence classification using the
[Hugging Face Transformers](https://github.com/huggingface/transformers) library
with TensorFlow. The dataset is downloaded into a mounted
Parallelstore-backed volume, allowing the model training to read data
directly from the volume.
Prerequisites
-------------

- Ensure your node has at least 8 GiB of memory available.
- [Create a PersistentVolumeClaim requesting a Parallelstore-backed volume](/kubernetes-engine/docs/how-to/persistent-volumes/parallelstore-csi-new-volume#pvc); a sketch of such a claim follows this list.
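For illustration, a minimal sketch of such a PersistentVolumeClaim is shown below. It reuses the `parallelstore-pvc` name that the Job manifest references; the StorageClass name and requested capacity are assumptions for this sketch, so follow the linked guide for the values supported in your cluster.

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: parallelstore-pvc          # must match the claimName referenced by the Job below
    spec:
      accessModes:
        - ReadWriteMany                # Parallelstore volumes can be shared across Pods
      storageClassName: parallelstore-class   # assumed StorageClass name for this sketch
      resources:
        requests:
          storage: 12Ti                # illustrative size; check supported Parallelstore capacities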
[[["Facile da capire","easyToUnderstand","thumb-up"],["Il problema è stato risolto","solvedMyProblem","thumb-up"],["Altra","otherUp","thumb-up"]],[["Difficile da capire","hardToUnderstand","thumb-down"],["Informazioni o codice di esempio errati","incorrectInformationOrSampleCode","thumb-down"],["Mancano le informazioni o gli esempi di cui ho bisogno","missingTheInformationSamplesINeed","thumb-down"],["Problema di traduzione","translationIssue","thumb-down"],["Altra","otherDown","thumb-down"]],["Ultimo aggiornamento 2025-09-04 UTC."],[],[],null,["# Train a TensorFlow model with Keras on Google Kubernetes Engine\n\nThe following section provides an example of\n[fine-tuning a BERT model](https://huggingface.co/docs/transformers/training#train-a-tensorflow-model-with-keras)\nfor sequence classification using the\n[Hugging Face transformers](https://github.com/huggingface/transformers) library\nwith TensorFlow. The dataset is downloaded into a mounted\nParallelstore-backed volume, allowing the model training to directly read data\nfrom the volume.\n\nPrerequisites\n-------------\n\n- Ensure your node has at least 8 GiB of memory available.\n- [Create a PersistentVolumeClaim requesting for a Parallelstore-backed volume](/kubernetes-engine/docs/how-to/persistent-volumes/parallelstore-csi-new-volume#pvc).\n\nSave the following YAML manifest (`parallelstore-csi-job-example.yaml`) for your model training Job. \n\n apiVersion: batch/v1\n kind: Job\n metadata:\n name: parallelstore-csi-job-example\n spec:\n template:\n metadata:\n annotations:\n gke-parallelstore/cpu-limit: \"0\"\n gke-parallelstore/memory-limit: \"0\"\n spec:\n securityContext:\n runAsUser: 1000\n runAsGroup: 100\n fsGroup: 100\n containers:\n - name: tensorflow\n image: jupyter/tensorflow-notebook@sha256:173f124f638efe870bb2b535e01a76a80a95217e66ed00751058c51c09d6d85d\n command: [\"bash\", \"-c\"]\n args:\n - |\n pip install transformers datasets\n python - \u003c\u003cEOF\n from datasets import load_dataset\n dataset = load_dataset(\"glue\", \"cola\", cache_dir='/data')\n dataset = dataset[\"train\"]\n from transformers import AutoTokenizer\n import numpy as np\n tokenizer = AutoTokenizer.from_pretrained(\"bert-base-cased\")\n tokenized_data = tokenizer(dataset[\"sentence\"], return_tensors=\"np\", padding=True)\n tokenized_data = dict(tokenized_data)\n labels = np.array(dataset[\"label\"])\n from transformers import TFAutoModelForSequenceClassification\n from tensorflow.keras.optimizers import Adam\n model = TFAutoModelForSequenceClassification.from_pretrained(\"bert-base-cased\")\n model.compile(optimizer=Adam(3e-5))\n model.fit(tokenized_data, labels)\n EOF\n volumeMounts:\n - name: parallelstore-volume\n mountPath: /data\n volumes:\n - name: parallelstore-volume\n persistentVolumeClaim:\n claimName: parallelstore-pvc\n restartPolicy: Never\n backoffLimit: 1\n\nApply the YAML manifest to the cluster.\n\n`kubectl apply -f parallelstore-csi-job-example.yaml`\n\nCheck your data loading and model training progress with the following command: \n\n POD_NAME=$(kubectl get pod | grep 'parallelstore-csi-job-example' | awk '{print $1}')\n kubectl logs -f $POD_NAME -c tensorflow\n\n| **Note:** The model training takes approximately five minutes to complete."]]