This document describes the logs and metrics that Gemini on Google Distributed Cloud connected API collects and exports.
Configure logging and monitoring
Before you can start gathering logs and metrics, you must do the following:
Enable the required logging and monitoring APIs by using the following commands:
gcloud services enable opsconfigmonitoring.googleapis.com --project PROJECT_ID
gcloud services enable logging.googleapis.com --project PROJECT_ID
gcloud services enable monitoring.googleapis.com --project PROJECT_ID
Replace PROJECT_ID with the ID of the target Google Cloud project.
Grant the roles required to write logs and metrics:
gcloud projects add-iam-policy-binding PROJECT_ID \
    --role roles/opsconfigmonitoring.resourceMetadata.writer \
    --member "serviceAccount:PROJECT_ID.svc.id.goog[kube-system/metadata-agent]"
gcloud projects add-iam-policy-binding PROJECT_ID \
    --role roles/logging.logWriter \
    --member "serviceAccount:PROJECT_ID.svc.id.goog[kube-system/stackdriver-log-forwarder]"
gcloud projects add-iam-policy-binding PROJECT_ID \
    --role roles/monitoring.metricWriter \
    --member "serviceAccount:PROJECT_ID.svc.id.goog[kube-system/gke-metrics-agent]"
Replace PROJECT_ID with the ID of the target Google Cloud project.
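The setup steps above repeat the same project ID nine times, so it can help to generate the commands from a single value. The following is a minimal sketch, assuming bash; the function name `print_setup_commands` and the project ID `my-project` are hypothetical. The function only prints the commands so that they can be reviewed before they are run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the setup commands for one project.
# Nothing is executed here; pipe the output to `bash` to run the commands.
print_setup_commands() {
  local project_id="$1"
  local api pair role ksa

  # The three APIs enabled in the first step.
  for api in opsconfigmonitoring logging monitoring; do
    echo "gcloud services enable ${api}.googleapis.com --project ${project_id}"
  done

  # Role and Kubernetes service account pairs from the second step.
  for pair in \
    "roles/opsconfigmonitoring.resourceMetadata.writer metadata-agent" \
    "roles/logging.logWriter stackdriver-log-forwarder" \
    "roles/monitoring.metricWriter gke-metrics-agent"; do
    role="${pair%% *}"   # text before the space
    ksa="${pair##* }"    # text after the space
    echo "gcloud projects add-iam-policy-binding ${project_id} --role ${role} --member \"serviceAccount:${project_id}.svc.id.goog[kube-system/${ksa}]\""
  done
}

print_setup_commands "my-project"   # placeholder project ID
```

After checking the printed commands, one way to apply them is `print_setup_commands "my-project" | bash`, assuming the gcloud CLI is installed and authenticated.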
Logs
This section lists the Cloud Logging resource types supported by Gemini on GDC connected API. To view Gemini on GDC connected API logs, use the Logs Explorer in the Google Cloud console. Gemini on GDC connected API logging is always enabled.
The logged resource type for Gemini on GDC connected API is aiplatform.googleapis.com/Endpoint.
You can also capture and retrieve Gemini on GDC connected API logs by using the Cloud Logging API. For information about how to configure this logging mechanism, see the documentation for Cloud Logging client libraries.
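Logs can also be read from the command line by filtering on the resource type. The following is a sketch, assuming bash and an authenticated gcloud CLI; the function names `endpoint_log_filter` and `read_endpoint_logs` are hypothetical, and the default limit of 10 entries is arbitrary.

```shell
#!/usr/bin/env bash
# Logging query-language filter for the resource type named in this section.
endpoint_log_filter() {
  echo 'resource.type="aiplatform.googleapis.com/Endpoint"'
}

# Hypothetical wrapper: read the most recent matching entries for a project.
# Usage: read_endpoint_logs PROJECT_ID [LIMIT]
read_endpoint_logs() {
  local project_id="$1"
  local limit="${2:-10}"
  gcloud logging read "$(endpoint_log_filter)" \
    --project "${project_id}" \
    --limit "${limit}" \
    --format json
}
```

The string printed by `endpoint_log_filter` can also be pasted into the query field of the Logs Explorer.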
Metrics
This section lists the Cloud Monitoring metrics supported by Gemini on GDC connected API. To view Gemini on GDC connected API metrics, use the Metrics Explorer in the Google Cloud console.
Distributed Cloud connected cluster metrics
Gemini on GDC connected API endpoints are deployed on Distributed Cloud connected clusters. For information about the logs and metrics of Distributed Cloud connected itself, see Logs and metrics.
Inference Gateway metrics
Prometheus Metric Name | Metrics Type | Datatype | Labels | Chemist type | Chemist metric_kind | Chemist value_type | Chemist labels |
---|---|---|---|---|---|---|---|
ig_ops_successful_incoming_requests | Counter | | model | aiplatform.googleapis.com/prediction/internal/gdc/ig/successful_requests | CUMULATIVE | INT64 | model |
ig_ops_unique_users | Counter | | model | aiplatform.googleapis.com/prediction/internal/gdc/ig/unique_users | CUMULATIVE | INT64 | model |
ig_tokens_per_minute | Histogram | double | model | aiplatform.googleapis.com/prediction/internal/gdc/ig/tokens_per_min | CUMULATIVE | DISTRIBUTION | model |
ig_total_response_time | Histogram | double | model | aiplatform.googleapis.com/prediction/internal/gdc/ig/response_time | CUMULATIVE | DISTRIBUTION | model |
ig_ops_ffmpeg_image_latency | Histogram | double | model | aiplatform.googleapis.com/prediction/internal/gdc/ig/ffmpeg_image_latencies | CUMULATIVE | DISTRIBUTION | model |
ig_ops_ffmpeg_video_latency | Histogram | double | model | aiplatform.googleapis.com/prediction/internal/gdc/ig/ffmpeg_video_latencies | CUMULATIVE | DISTRIBUTION | model |
ig_ops_ffmpeg_audio_latency | Histogram | double | model | aiplatform.googleapis.com/prediction/internal/gdc/ig/ffmpeg_audio_latencies | CUMULATIVE | DISTRIBUTION | model |
ig_time_to_first_token | Histogram | double | model context_window | aiplatform.googleapis.com/prediction/internal/gdc/ig/ttft | CUMULATIVE | DISTRIBUTION | model context_window |
ig_time_per_output_token | Histogram | double | model context_window | aiplatform.googleapis.com/prediction/internal/gdc/ig/tpot | CUMULATIVE | DISTRIBUTION | model context_window |
ig_cache_hit | Counter | | model | aiplatform.googleapis.com/prediction/internal/gdc/ig/cache_hit_count | CUMULATIVE | DISTRIBUTION | model _gdch_project |
ig_cache_miss | Counter | | model | aiplatform.googleapis.com/prediction/internal/gdc/ig/cache_miss_count | CUMULATIVE | DISTRIBUTION | model _gdch_project |
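The metric types in the "Chemist type" column share a common prefix, so a Cloud Monitoring filter for any of them can be assembled from the component and the metric suffix. The following is a sketch, assuming bash; the helper name `metric_filter`, the variable `METRIC_PREFIX`, and the label value `example-model` are hypothetical, and whether a given metric is visible in a project is not confirmed here.

```shell
#!/usr/bin/env bash
# Prefix shared by the metric types listed in the tables of this document.
METRIC_PREFIX="aiplatform.googleapis.com/prediction/internal/gdc"

# Hypothetical helper: build a Monitoring filter from a component (ig, gair,
# or gpu) and a metric suffix, optionally restricted to one model label value.
# Usage: metric_filter COMPONENT SUFFIX [MODEL]
metric_filter() {
  local component="$1" suffix="$2" model="${3:-}"
  local filter="metric.type = \"${METRIC_PREFIX}/${component}/${suffix}\""
  if [ -n "${model}" ]; then
    filter="${filter} AND metric.labels.model = \"${model}\""
  fi
  echo "${filter}"
}

metric_filter ig successful_requests
metric_filter ig ttft "example-model"   # placeholder label value
```

The resulting string follows the Monitoring filter syntax and can be used wherever a time series filter is accepted, for example in the `filter` parameter of the Monitoring API `timeSeries.list` method.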
GenAI Router metrics
Prometheus Metric Name | Metrics Type | Datatype | Labels | Chemist type | Chemist metric_kind | Chemist value_type | Chemist labels |
---|---|---|---|---|---|---|---|
llm_total_request_latency_milliseconds | Histogram | double | context_window model | aiplatform.googleapis.com/prediction/internal/gdc/gair/total_request_latencies | CUMULATIVE | DISTRIBUTION | context_window model |
llm_unary_request_latency_milliseconds | Histogram | double | context_window model | aiplatform.googleapis.com/prediction/internal/gdc/gair/unary_request_latencies | CUMULATIVE | DISTRIBUTION | context_window model |
llm_streaming_ttft_milliseconds | Histogram | double | context_window model | aiplatform.googleapis.com/prediction/internal/gdc/gair/ttft_ms | CUMULATIVE | DISTRIBUTION | context_window model |
llm_streaming_tpot_milliseconds | Histogram | double | context_window model | aiplatform.googleapis.com/prediction/internal/gdc/gair/tpot_ms | CUMULATIVE | DISTRIBUTION | context_window model |
llm_input_token_count | Histogram | double | model | aiplatform.googleapis.com/prediction/internal/gdc/gair/input_token_count | CUMULATIVE | DISTRIBUTION | model |
llm_output_token_count | Histogram | double | model | aiplatform.googleapis.com/prediction/internal/gdc/gair/output_token_count | CUMULATIVE | DISTRIBUTION | model |
llm_success_response_count | Counter | double | model | aiplatform.googleapis.com/prediction/internal/gdc/gair/success_response_count | CUMULATIVE | INT64 | model |
llm_failure_response_count | Counter | double | model | aiplatform.googleapis.com/prediction/internal/gdc/gair/failure_response_count | CUMULATIVE | INT64 | model |
llm_text_tokenization_latency_milliseconds | Histogram | double | model | aiplatform.googleapis.com/prediction/internal/gdc/gair/text_tokenization_latencies | CUMULATIVE | DISTRIBUTION | model |
llm_image_tokenization_latency_milliseconds | Histogram | double | | aiplatform.googleapis.com/prediction/internal/gdc/gair/image_tokenization_latencies | CUMULATIVE | DISTRIBUTION | |
llm_audio_tokenization_latency_milliseconds | Histogram | double | | aiplatform.googleapis.com/prediction/internal/gdc/gair/audio_tokenization_latencies | CUMULATIVE | DISTRIBUTION | |
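Because llm_success_response_count and llm_failure_response_count are CUMULATIVE counters, a failure ratio for a time window is derived from the difference between two samples of each counter, not from the raw values. The following is a sketch of that arithmetic, assuming bash with awk; the function name `failure_ratio` and the sample values are hypothetical and do not come from a real deployment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: failure ratio over a window from two samples of each
# cumulative counter.
# Usage: failure_ratio SUCCESS_START SUCCESS_END FAILURE_START FAILURE_END
failure_ratio() {
  awk -v s0="$1" -v s1="$2" -v f0="$3" -v f1="$4" 'BEGIN {
    ds = s1 - s0          # successful responses in the window
    df = f1 - f0          # failed responses in the window
    total = ds + df
    if (total <= 0) { print "n/a"; exit }   # no traffic in the window
    printf "%.4f\n", df / total
  }'
}

# 900 successes and 100 failures in the window.
failure_ratio 1000 1900 20 120   # prints 0.1000
```

This sketch does not handle counter resets; a sample that is lower than the previous one should be treated as the start of a new series.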
GPU metrics
Prometheus Metric Name | Metrics Type | Datatype | Labels | Chemist type | Chemist metric_kind | Chemist value_type | Chemist labels |
---|---|---|---|---|---|---|---|
DCGM_FI_DEV_MEM_COPY_UTIL | Gauge | int64 | gpu UUID pci_bus_id device modelName Hostname DCGM_FI_DRIVER_VERSION | aiplatform.googleapis.com/prediction/internal/gdc/gpu/memory_util | GAUGE | INT64 | uuid gpu_model |
DCGM_FI_DEV_MEMORY_TEMP | Gauge | int64 | Same as Above | aiplatform.googleapis.com/prediction/internal/gdc/gpu/memory_temp | GAUGE | INT64 | Same as Above |
DCGM_FI_DEV_POWER_USAGE | Gauge | double | Same as Above | aiplatform.googleapis.com/prediction/internal/gdc/gpu/power_usage | GAUGE | DOUBLE | Same as Above |
DCGM_FI_DEV_GPU_TEMP | Gauge | double | Same as Above | aiplatform.googleapis.com/prediction/internal/gdc/gpu/gpu_temp | GAUGE | INT64 | Same as Above |
DCGM_FI_DEV_GPU_UTIL | Gauge | double | Same as Above | aiplatform.googleapis.com/prediction/internal/gdc/gpu/gpu_util | GAUGE | INT64 | Same as Above |
DCGM_FI_DEV_ENC_UTIL | Gauge | int64 | Same as Above | aiplatform.googleapis.com/prediction/internal/gdc/gpu/encode_util | GAUGE | INT64 | Same as Above |
DCGM_FI_DEV_XID_ERRORS | Counter | int64 | Same as Above | aiplatform.googleapis.com/prediction/internal/gdc/gpu/xid_errors | CUMULATIVE | INT64 | Same as Above |
DCGM_FI_DEV_POWER_VIOLATION | Counter | int64 | Same as Above | aiplatform.googleapis.com/prediction/internal/gdc/gpu/violation_power | CUMULATIVE | INT64 | Same as Above |
DCGM_FI_DEV_THERMAL_VIOLATION | Counter | int64 | Same as Above | aiplatform.googleapis.com/prediction/internal/gdc/gpu/violation_thermal | CUMULATIVE | INT64 | Same as Above |
DCGM_FI_DEV_SYNC_BOOST_VIOLATION | Counter | int64 | Same as Above | aiplatform.googleapis.com/prediction/internal/gdc/gpu/violation_sync_boost | CUMULATIVE | INT64 | Same as Above |
DCGM_FI_DEV_BOARD_LIMIT_VIOLATION | Counter | int64 | Same as Above | aiplatform.googleapis.com/prediction/internal/gdc/gpu/violation_board_limit | CUMULATIVE | INT64 | Same as Above |
DCGM_FI_DEV_LOW_UTIL_VIOLATION | Counter | int64 | Same as Above | aiplatform.googleapis.com/prediction/internal/gdc/gpu/violation_low_util | CUMULATIVE | INT64 | Same as Above |
DCGM_FI_DEV_RELIABILITY_VIOLATION | Counter | int64 | Same as Above | aiplatform.googleapis.com/prediction/internal/gdc/gpu/violation_reliability | CUMULATIVE | INT64 | Same as Above |