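Spark metrics

By default, Dataproc Serverless enables the collection of the available Spark metrics, unless you use Spark metrics collection properties to disable or override the collection of one or more Spark metrics.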

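Spark metrics collection properties

You can use the properties listed in this section to disable or override the collection of one or more of the available Spark metrics.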

Property Description
spark.dataproc.driver.metrics Use to disable or override Spark driver metrics.
spark.dataproc.executor.metrics Use to disable or override Spark executor metrics.
spark.dataproc.system.metrics Use to disable Spark system metrics.

gcloud CLI examples:

  • Disable Spark driver metric collection:

    gcloud dataproc batches submit spark \
        --properties spark.dataproc.driver.metrics="" \
        --region=region \
        other args ...
    
  • Override the default Spark driver metric collection to collect only the BlockManager:disk.diskSpaceUsed_MB and DAGScheduler:stage.failedStages metrics:

    gcloud dataproc batches submit spark \
        --properties=^~^spark.dataproc.driver.metrics="BlockManager:disk.diskSpaceUsed_MB,DAGScheduler:stage.failedStages" \
        --region=region \
        other args ...
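
The second example relies on the gcloud CLI alternate-delimiter syntax: the `^~^` prefix makes `~` the separator between properties, so the commas inside the metric list are passed through as part of the value. As a minimal sketch, the flag can be assembled from a list of metric names. `build_metrics_flag` is a hypothetical helper name, not part of the gcloud CLI, and the script only prints the flag; it does not submit a batch.

```shell
# Hypothetical helper (not part of gcloud): join metric names with commas and
# build a --properties flag that uses "~" as the gcloud list delimiter.
build_metrics_flag() {
  local property="$1"
  shift
  # "$*" joins the remaining arguments with the first character of IFS.
  local IFS=','
  printf -- '--properties=^~^%s=%s\n' "$property" "$*"
}

# Print the flag for the two driver metrics used in the example above.
build_metrics_flag spark.dataproc.driver.metrics \
  "BlockManager:disk.diskSpaceUsed_MB" \
  "DAGScheduler:stage.failedStages"
```

Called with only the property name, the helper prints the flag with an empty value, which corresponds to the disable example.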
    

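Available Spark metrics

Dataproc Serverless collects the Spark metrics listed in this section unless you use Spark metrics collection properties to disable or override their collection.

The full metric name has the following form, where METRIC_EXPLORER_NAME is the value in the Metrics Explorer name column of the tables in this section:

custom.googleapis.com/METRIC_EXPLORER_NAME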
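
As a small illustration of the template above, the following sketch prefixes a Metrics Explorer name with the `custom.googleapis.com/` domain. `metric_type` is a hypothetical helper name introduced here for illustration only.

```shell
# Hypothetical helper: prefix a Metrics Explorer name with the custom metric
# domain shown in the template above.
metric_type() {
  printf 'custom.googleapis.com/%s\n' "$1"
}

metric_type "spark/driver/DAGScheduler/stage/failedStages"
# prints: custom.googleapis.com/spark/driver/DAGScheduler/stage/failedStages
```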

Spark driver metrics

Metric Metrics Explorer name
BlockManager:disk.diskSpaceUsed_MB spark/driver/BlockManager/disk/diskSpaceUsed_MB
BlockManager:memory.maxMem_MB spark/driver/BlockManager/memory/maxMem_MB
BlockManager:memory.memUsed_MB spark/driver/BlockManager/memory/memUsed_MB
DAGScheduler:job.activeJobs spark/driver/DAGScheduler/job/activeJobs
DAGScheduler:job.allJobs spark/driver/DAGScheduler/job/allJobs
DAGScheduler:messageProcessingTime spark/driver/DAGScheduler/messageProcessingTime
DAGScheduler:stage.failedStages spark/driver/DAGScheduler/stage/failedStages
DAGScheduler:stage.runningStages spark/driver/DAGScheduler/stage/runningStages
DAGScheduler:stage.waitingStages spark/driver/DAGScheduler/stage/waitingStages

Spark executor metrics

Metric Metrics Explorer name
ExecutorAllocationManager:executors.numberExecutorsDecommissionUnfinished spark/driver/ExecutorAllocationManager/executors/numberExecutorsDecommissionUnfinished
ExecutorAllocationManager:executors.numberExecutorsExitedUnexpectedly spark/driver/ExecutorAllocationManager/executors/numberExecutorsExitedUnexpectedly
ExecutorAllocationManager:executors.numberExecutorsGracefullyDecommissioned spark/driver/ExecutorAllocationManager/executors/numberExecutorsGracefullyDecommissioned
ExecutorAllocationManager:executors.numberExecutorsKilledByDriver spark/driver/ExecutorAllocationManager/executors/numberExecutorsKilledByDriver
LiveListenerBus:queue.executorManagement.listenerProcessingTime spark/driver/LiveListenerBus/queue/executorManagement/listenerProcessingTime
executor:bytesRead spark/executor/bytesRead
executor:bytesWritten spark/executor/bytesWritten
executor:cpuTime spark/executor/cpuTime
executor:diskBytesSpilled spark/executor/diskBytesSpilled
executor:jvmGCTime spark/executor/jvmGCTime
executor:memoryBytesSpilled spark/executor/memoryBytesSpilled
executor:recordsRead spark/executor/recordsRead
executor:recordsWritten spark/executor/recordsWritten
executor:runTime spark/executor/runTime
executor:shuffleFetchWaitTime spark/executor/shuffleFetchWaitTime
executor:shuffleRecordsRead spark/executor/shuffleRecordsRead
executor:shuffleRecordsWritten spark/executor/shuffleRecordsWritten
executor:shuffleRemoteBytesReadToDisk spark/executor/shuffleRemoteBytesReadToDisk
executor:shuffleWriteTime spark/executor/shuffleWriteTime
executor:succeededTasks spark/executor/succeededTasks
ExecutorMetrics:MajorGCTime spark/executor/ExecutorMetrics/MajorGCTime
ExecutorMetrics:MinorGCTime spark/executor/ExecutorMetrics/MinorGCTime

System metrics

Metric Metrics Explorer name
agent:uptime agent/uptime
cpu:utilization cpu/utilization
disk:bytes_used disk/bytes_used
disk:percent_used disk/percent_used
memory:bytes_used memory/bytes_used
memory:percent_used memory/percent_used
network:tcp_connections network/tcp_connections

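View Spark metrics

To view batch metrics, click a batch ID on the Dataproc Batches page in the Google Cloud console to open the batch Details page. The Monitoring tab on that page displays metric graphs for the batch workload.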

See Dataproc Cloud Monitoring for more information on how to view collected metrics.
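
Outside the console, collected metrics can in principle be queried by metric type. The following is a sketch only, assuming the metrics are exposed under the `custom.googleapis.com/` prefix described earlier and queried through the Cloud Monitoring API `timeSeries.list` method. `monitoring_filter` is a hypothetical helper name, and PROJECT_ID, START_TIME, and END_TIME are placeholders. The script itself only prints the filter; the API call is left commented out.

```shell
# Hypothetical helper: build a Cloud Monitoring filter for one collected
# metric, given its Metrics Explorer name.
monitoring_filter() {
  printf 'metric.type = "custom.googleapis.com/%s"\n' "$1"
}

FILTER="$(monitoring_filter "spark/executor/cpuTime")"
echo "$FILTER"

# Assumed usage with the Cloud Monitoring API timeSeries.list method.
# PROJECT_ID, START_TIME, and END_TIME are placeholders (RFC 3339 timestamps).
# curl -G "https://monitoring.googleapis.com/v3/projects/PROJECT_ID/timeSeries" \
#   -H "Authorization: Bearer $(gcloud auth print-access-token)" \
#   --data-urlencode "filter=${FILTER}" \
#   --data-urlencode "interval.startTime=START_TIME" \
#   --data-urlencode "interval.endTime=END_TIME"
```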