Class PerformanceRange (0.1.0)

PerformanceRange(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Performance range for a model deployment.

Attributes

Name (Type): Description

throughput_output_range (google.cloud.gkerecommender_v1.types.TokensPerSecondRange)
    Output only. The range of throughput in output tokens per second. This is measured as total_output_tokens_generated_by_server / elapsed_time_in_seconds.

ttft_range (google.cloud.gkerecommender_v1.types.MillisecondRange)
    Output only. The range of TTFT (Time To First Token) in milliseconds. TTFT is the time it takes to generate the first token for a request.

ntpot_range (google.cloud.gkerecommender_v1.types.MillisecondRange)
    Output only. The range of NTPOT (Normalized Time Per Output Token) in milliseconds. NTPOT is the request latency normalized by the number of output tokens, measured as request_latency / total_output_tokens.
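
Example

The two ratios quoted in the attribute descriptions can be illustrated with plain arithmetic. The sketch below is not part of the library: the helper names output_throughput and ntpot_ms and the sample numbers are hypothetical, and it assumes request latency is supplied in milliseconds so that NTPOT comes out in milliseconds.

```python
def output_throughput(total_output_tokens: int, elapsed_seconds: float) -> float:
    """Output tokens per second: tokens generated by the server / elapsed time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return total_output_tokens / elapsed_seconds


def ntpot_ms(request_latency_ms: float, total_output_tokens: int) -> float:
    """Normalized time per output token: request latency / output token count."""
    if total_output_tokens <= 0:
        raise ValueError("total_output_tokens must be positive")
    return request_latency_ms / total_output_tokens


# Hypothetical measurements, for illustration only.
throughput = output_throughput(total_output_tokens=12_000, elapsed_seconds=60.0)
ntpot = ntpot_ms(request_latency_ms=2_500.0, total_output_tokens=100)

print(throughput)  # 200.0 output tokens per second
print(ntpot)       # 25.0 milliseconds per output token
```

Note that throughput is a server-wide figure (all output tokens over a time window), while NTPOT is a per-request figure, so the two are not simple reciprocals of each other when requests are served concurrently.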