Class PerformanceStats (0.1.0)

PerformanceStats(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Performance statistics for a model deployment.

Attributes
Name	Description
`queries_per_second`	`float` Output only. The number of queries per second. Note: This metric can vary widely based on context length and may not be a reliable measure of LLM throughput.
`output_tokens_per_second`	`int` Output only. The number of output tokens per second. This is the throughput, measured as `total_output_tokens_generated_by_server / elapsed_time_in_seconds`.
`ntpot_milliseconds`	`int` Output only. The Normalized Time Per Output Token (NTPOT) in milliseconds. This is the request latency normalized by the number of output tokens, measured as `request_latency / total_output_tokens`.
`ttft_milliseconds`	`int` Output only. The Time To First Token (TTFT) in milliseconds. This is the time it takes to generate the first token for a request.
`cost`	`MutableSequence[google.cloud.gkerecommender_v1.types.Cost]` Output only. The cost of running the model deployment.
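
The two formulas given for `output_tokens_per_second` and `ntpot_milliseconds` can be illustrated with a minimal sketch. It does not call the library. The helper names and sample numbers below are hypothetical and only mirror the definitions in the table. The service reports these fields as `int`, while the sketch returns unrounded floats.

```python
# Illustrative only: these helpers are hypothetical and are not part of
# google.cloud.gkerecommender_v1. They restate the formulas from the table.

def output_tokens_per_second(total_output_tokens: int, elapsed_seconds: float) -> float:
    """Throughput: total output tokens generated by the server / elapsed seconds."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return total_output_tokens / elapsed_seconds


def ntpot_milliseconds(request_latency_ms: float, output_tokens: int) -> float:
    """NTPOT: request latency (in ms) / number of output tokens in that request."""
    if output_tokens <= 0:
        raise ValueError("output_tokens must be positive")
    return request_latency_ms / output_tokens


# Made-up sample measurements, not real benchmark data.
throughput = output_tokens_per_second(total_output_tokens=1200, elapsed_seconds=60.0)
ntpot = ntpot_milliseconds(request_latency_ms=2000.0, output_tokens=100)
print(throughput, ntpot)
```

Note that the two metrics use different denominators: throughput is aggregated over all tokens the server produced in a time window, while NTPOT is a per-request latency divided by that request's output length.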
