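Vertex AI provides a managed training service that lets you operationalize large-scale model training. You can enable experiment tracking with the Vertex AI SDK for Python to capture parameters and performance metrics when you submit a custom training job.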
This feature isn't available when you:
- submit a training job through the Google Cloud console or the Google Cloud CLI,
- use TPUs in the training job, or
- use distributed training in the training job.
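As a rough pre-submission check, the three exclusions above can be encoded as a function over a job's worker pool specs. This is a minimal sketch under stated assumptions. The dict shape mirrors the `worker_pool_specs` argument of `aiplatform.CustomJob`. "Distributed" is taken to mean more than one worker pool or more than one replica. TPU use is detected by a `TPU_` accelerator type or the `cloud-tpu` machine type. `experiment_tracking_available` is a hypothetical helper, not an SDK function.

```python
from typing import Any, Dict, List


def experiment_tracking_available(
    submitted_with_python_sdk: bool,
    worker_pool_specs: List[Dict[str, Any]],
) -> bool:
    """Hypothetical pre-flight check for the three unsupported cases."""
    # Jobs submitted through the Google Cloud console or the gcloud CLI are excluded.
    if not submitted_with_python_sdk:
        return False
    # Assumption: more than one worker pool means distributed training.
    if len(worker_pool_specs) > 1:
        return False
    for pool in worker_pool_specs:
        # Assumption: more than one replica in a pool also means distributed training.
        if pool.get("replica_count", 1) > 1:
            return False
        machine_spec = pool.get("machine_spec", {})
        accelerator = machine_spec.get("accelerator_type") or ""
        # Assumption: TPU jobs use a TPU_* accelerator type or the cloud-tpu machine type.
        if accelerator.startswith("TPU_") or machine_spec.get("machine_type") == "cloud-tpu":
            return False
    return True


if __name__ == "__main__":
    single_gpu_job = [{
        "machine_spec": {
            "machine_type": "n1-standard-8",
            "accelerator_type": "NVIDIA_TESLA_T4",
            "accelerator_count": 1,
        },
        "replica_count": 1,
    }]
    print(experiment_tracking_available(True, single_gpu_job))
```

The check is deliberately conservative. It only flags the cases named above, and it does not replace the service's own validation.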
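Both prebuilt training containers and custom containers are supported.
Required: The Vertex AI SDK for Python (`google-cloud-aiplatform`) must be installed at a version higher than 1.24.1. If you train with TensorFlow, make sure that a protobuf version lower than 4.0 is installed to avoid conflicts.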
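The sketch below uses only the standard library to check these version requirements in your environment. `parse_version` is a simplified parser that stops at suffixes such as `rc1`. `meets_requirements` is a hypothetical helper that encodes the two rules above: `google-cloud-aiplatform` newer than 1.24.1, and protobuf below 4.0 when TensorFlow is used. For production checks, the comparison in the `packaging` library is more robust.

```python
from importlib import metadata
from typing import List, Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Simplified parser: '1.25.0rc1' -> (1, 25, 0)."""
    parts: List[int] = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) != len(piece):
            # A suffix such as "rc1" ends the numeric release part.
            break
    return tuple(parts)


def meets_requirements(
    aiplatform_version: str,
    protobuf_version: Optional[str],
    uses_tensorflow: bool,
) -> bool:
    """Hypothetical helper encoding the version rules stated above."""
    # google-cloud-aiplatform must be higher than 1.24.1.
    if parse_version(aiplatform_version) <= (1, 24, 1):
        return False
    # With TensorFlow, protobuf must be lower than 4.0.
    if uses_tensorflow and protobuf_version is not None:
        if parse_version(protobuf_version) >= (4,):
            return False
    return True


def installed_version(package: str) -> Optional[str]:
    """Return the installed distribution version, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    sdk = installed_version("google-cloud-aiplatform")
    if sdk is None:
        print("google-cloud-aiplatform is not installed")
    else:
        uses_tf = installed_version("tensorflow") is not None
        print(meets_requirements(sdk, installed_version("protobuf"), uses_tf))
```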
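There are two options for logging data to Vertex AI Experiments: autologging and manual logging.
Autologging is recommended if you use one of these supported frameworks: Fastai, Gluon, Keras, LightGBM, PyTorch Lightning, Scikit-learn, Spark, Statsmodels, XGBoost. If your framework isn't supported, or if you have custom metrics that you want to log to your experiment run, you can adapt your training script manually to log parameters, metrics, and artifacts.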
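The recommendation above can be summarized as a small decision function. The framework names are the ones listed in this section, lowercased. `choose_logging` is hypothetical and only encodes the guidance. It does not reflect any SDK behavior.

```python
# Frameworks listed in this section as supported by autologging (lowercased).
AUTOLOG_FRAMEWORKS = frozenset({
    "fastai", "gluon", "keras", "lightgbm", "pytorch lightning",
    "scikit-learn", "spark", "statsmodels", "xgboost",
})


def choose_logging(framework: str, has_custom_metrics: bool = False) -> str:
    """Hypothetical helper: return "autolog" or "manual" per the guidance above."""
    supported = framework.strip().lower() in AUTOLOG_FRAMEWORKS
    if supported and not has_custom_metrics:
        return "autolog"
    # Unsupported framework, or custom metrics to record: adapt the script manually.
    return "manual"
```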
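Autolog data
To enable autologging, set `enable_autolog=True` when you create the job with `from_local_script`.
Creating an experiment run is optional. If an experiment name isn't specified, one is created for you.
The Vertex AI SDK for Python handles creating `ExperimentRun` resources for you.
The experiment-related parameters are:
- `experiment`: Provide a name for your experiment. The experiment must have a TensorBoard instance. You can find your list of experiments in the Google Cloud console by selecting Experiments in the section navigation menu.
- `experiment_run`: (Optional) Specify a run name. If not specified, a run is created automatically.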
Manually log data
Use the manual logging option to add logging calls to your training script.
Here's how to change the training script:
```python
import os
import pickle

import pandas as pd
from sklearn.linear_model import LinearRegression

# To use manual logging APIs, import aiplatform
from google.cloud import aiplatform

# Create dataset
data = {'A': [1.1, 2.2, 4.1, 5.2],
        'B': [200, 212.12, 22, 123],
        'Y': [1, 0, 1, 0]}
df = pd.DataFrame(data)
X = df[['A', 'B']]
Y = df['Y']

# Train model
model = LinearRegression().fit(X, Y)

# Save the model to Cloud Storage. AIP_MODEL_DIR holds a gs:// URI, and
# Cloud Storage is mounted under /gcs/ inside the training container.
model_dir = os.getenv('AIP_MODEL_DIR')
model_gcs = model_dir.replace('gs://', '/gcs/')
model_name = 'model.joblib'
os.makedirs(model_gcs, exist_ok=True)
with open(os.path.join(model_gcs, model_name), 'wb') as f:
    pickle.dump(model, f)

# Call aiplatform's logging APIs to save data to Vertex AI Experiments.
params = model.get_params()
aiplatform.log_params(params)
# LinearRegression.score returns the R^2 coefficient of determination.
metrics = {"training_accuracy": model.score(X, Y)}
aiplatform.log_metrics(metrics)
```
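The script passes `model.get_params()` straight to `aiplatform.log_params`. The SDK's type hints for `log_params` describe values as `float`, `int`, or `str`, but `get_params()` can include `None`. For example, `LinearRegression` has `n_jobs=None` by default. If your estimator's parameters cause type errors at logging time, a small conversion step can help. The following is a minimal sketch; `to_loggable_params` is a hypothetical helper, not part of the SDK.

```python
from typing import Any, Dict, Mapping, Union

# Value types named in the type hints of aiplatform.log_params.
ParamValue = Union[float, int, str]


def to_loggable_params(params: Mapping[str, Any]) -> Dict[str, ParamValue]:
    """Hypothetical helper: coerce parameter values to float, int, or str.

    Numbers and strings (bools included, since bool is a subclass of int)
    pass through unchanged; anything else, such as None or a nested
    estimator, is replaced by its string form.
    """
    loggable: Dict[str, ParamValue] = {}
    for key, value in params.items():
        if isinstance(value, (int, float, str)):
            loggable[str(key)] = value
        else:
            loggable[str(key)] = str(value)
    return loggable


if __name__ == "__main__":
    # A dict shaped like an estimator's get_params() output.
    example = {"fit_intercept": True, "n_jobs": None, "positive": False}
    print(to_loggable_params(example))
```

In the training script, the call would become `aiplatform.log_params(to_loggable_params(model.get_params()))`.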
Creating an experiment run is optional. If an experiment name isn't specified, one is created for you.
Python sample: submit a job with autologging
```python
from typing import Optional

from google.cloud import aiplatform


def create_custom_job_with_experiment_autologging_sample(
    project: str,
    location: str,
    staging_bucket: str,
    display_name: str,
    script_path: str,
    container_uri: str,
    service_account: str,
    experiment: str,
    experiment_run: Optional[str] = None,
) -> None:
    aiplatform.init(
        project=project,
        location=location,
        staging_bucket=staging_bucket,
        experiment=experiment,
    )

    job = aiplatform.CustomJob.from_local_script(
        display_name=display_name,
        script_path=script_path,
        container_uri=container_uri,
        enable_autolog=True,
    )

    job.run(
        service_account=service_account,
        experiment=experiment,
        experiment_run=experiment_run,
    )
```
- `project`: Your project ID. You can find your project IDs on the Google Cloud console [welcome](https://console.cloud.google.com/welcome) page.
- `location`: See [List of available locations](/vertex-ai/docs/general/locations).
- `staging_bucket`: The name you gave your bucket, for example, `my_bucket`.
- `display_name`: The user-defined name of the [CustomJob](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.CustomJob).
- `script_path`: The path, relative to the working directory on your local file system, to the script that is the entry point for your training code.
- `container_uri`: The URI of the training container image, which can be a Vertex AI [prebuilt training container](/vertex-ai/docs/training/pre-built-containers) or a [custom container](/vertex-ai/docs/training/containers-overview).
- `service_account`: See [Create a service account with required permissions](/vertex-ai/docs/experiments/tensorboard-training#create_a_service_account_with_required_permissions).
- `experiment` and `experiment_run`: See the parameter descriptions in the Autolog data section.

Python sample: submit a job with manual logging
To learn more, see [Manually log data to an experiment run](/vertex-ai/docs/experiments/log-data).
```python
from typing import Optional

from google.cloud import aiplatform


def create_custom_job_with_experiment_sample(
    project: str,
    location: str,
    staging_bucket: str,
    display_name: str,
    script_path: str,
    container_uri: str,
    service_account: str,
    experiment: str,
    experiment_run: Optional[str] = None,
) -> None:
    aiplatform.init(
        project=project,
        location=location,
        staging_bucket=staging_bucket,
        experiment=experiment,
    )

    job = aiplatform.CustomJob.from_local_script(
        display_name=display_name,
        script_path=script_path,
        container_uri=container_uri,
    )

    job.run(
        service_account=service_account,
        experiment=experiment,
        experiment_run=experiment_run,
    )
```
- `project`, `location`, `staging_bucket`, `display_name`, `script_path`, `service_account`: Same as in the autologging sample.
- `container_uri`: The URI of the training container image, which can be a Vertex AI [prebuilt training container](/vertex-ai/docs/training/pre-built-containers) or a [custom container](/vertex-ai/docs/training/containers-overview). If you are using a custom container, make sure that [`google-cloud-aiplatform>=1.24.0`](/vertex-ai/docs/start/install-sdk#install-python-sdk) is installed.
- `experiment`: Provide a name for your experiment. You can find your list of experiments in the Google Cloud console by selecting Experiments in the section navigation menu.
- `experiment_run`: Specify a run name. If not specified, a run is created automatically.

View autologged parameters and metrics
Use the Vertex AI SDK for Python to [compare runs](/vertex-ai/docs/experiments/compare-analyze-runs#compare-runs) and get run data. The [Google Cloud console](/vertex-ai/docs/experiments/compare-analyze-runs#console-compare-analyze-runs) provides an easy way to compare these runs.

What's next
- [Log data to an experiment run](/vertex-ai/docs/experiments/log-data)

Relevant notebook sample
- [Custom training autologging](/vertex-ai/docs/experiments/user-journey/uj-custom-training-autologging)

Last updated 2025-09-04 (UTC).