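Spark metrics

This document describes Spark metrics. By default, Dataproc Serverless enables collection of the available Spark metrics unless you use Spark metrics collection properties to disable or override the collection of one or more of them.

For additional properties that you can set when you submit a Dataproc Serverless Spark batch workload, see Spark properties.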

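Spark metrics collection properties

Use the properties listed in this section to disable or override the collection of one or more of the available Spark metrics.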

Property                          Description
spark.dataproc.driver.metrics     Disables or overrides Spark driver metrics
spark.dataproc.executor.metrics   Disables or overrides Spark executor metrics
spark.dataproc.system.metrics     Disables Spark system metrics

gcloud CLI examples:

  • Disable Spark driver metric collection:

    gcloud dataproc batches submit spark \
        --properties spark.dataproc.driver.metrics="" \
        --region=region \
        other args ...
    
  • Override the default Spark driver metric collection to collect only the BlockManager:disk.diskSpaceUsed_MB and DAGScheduler:stage.failedStages metrics:

    gcloud dataproc batches submit spark \
        --properties=^~^spark.dataproc.driver.metrics="BlockManager:disk.diskSpaceUsed_MB,DAGScheduler:stage.failedStages" \
        --region=region \
        other args ...
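
The override example above relies on the gcloud CLI's alternate-delimiter syntax: the leading ^~^ changes the list delimiter of --properties from a comma to ~, so the commas inside the metric list are kept as part of the property value. A minimal sketch of assembling that flag from a list of metric names follows. The helper names join_metrics and driver_metrics_flag are hypothetical and are not part of gcloud; the script only prints the flag and does not submit a batch.

```shell
# Hypothetical helper (not part of gcloud): join metric names with commas.
join_metrics() {
  local IFS=','
  printf '%s\n' "$*"
}

# Hypothetical helper: build the --properties flag for the driver metrics
# property. The ^~^ prefix switches the gcloud list delimiter to '~' so the
# commas in the metric list survive as part of the value.
driver_metrics_flag() {
  printf '%s%s\n' '--properties=^~^spark.dataproc.driver.metrics=' "$(join_metrics "$@")"
}

# Dry run: print the flag instead of calling gcloud.
driver_metrics_flag "BlockManager:disk.diskSpaceUsed_MB" "DAGScheduler:stage.failedStages"
```

Called with no arguments, driver_metrics_flag prints the property with an empty value, which corresponds to the disable example above.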
    

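Available Spark metrics

Dataproc Serverless collects the Spark metrics listed in this section unless you use Spark metrics collection properties to disable or override their collection.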

Metric name format: custom.googleapis.com/METRIC_EXPLORER_NAME

Spark driver metrics

Metric    Metrics Explorer name
BlockManager:disk.diskSpaceUsed_MB spark/driver/BlockManager/disk/diskSpaceUsed_MB
BlockManager:memory.maxMem_MB spark/driver/BlockManager/memory/maxMem_MB
BlockManager:memory.memUsed_MB spark/driver/BlockManager/memory/memUsed_MB
DAGScheduler:job.activeJobs spark/driver/DAGScheduler/job/activeJobs
DAGScheduler:job.allJobs spark/driver/DAGScheduler/job/allJobs
DAGScheduler:messageProcessingTime spark/driver/DAGScheduler/messageProcessingTime
DAGScheduler:stage.failedStages spark/driver/DAGScheduler/stage/failedStages
DAGScheduler:stage.runningStages spark/driver/DAGScheduler/stage/runningStages
DAGScheduler:stage.waitingStages spark/driver/DAGScheduler/stage/waitingStages
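
In the driver table above, each Metrics Explorer name appears to be the metric name with ":" and "." replaced by "/" and the prefix spark/driver/ added. The sketch below applies that pattern and prepends custom.googleapis.com/. The helper name driver_metric_type is hypothetical, and the rule is inferred from the rows of this table rather than from a documented guarantee. Rows that begin with executor: in the executor table use a different prefix, so the helper is not meant for them.

```shell
# Hypothetical helper: derive the full metric name for a Spark driver metric.
# Assumption: the mapping is inferred from the driver metrics table
# (':' and '.' become '/', prefixed with custom.googleapis.com/spark/driver/).
driver_metric_type() {
  local path
  path="$(printf '%s' "$1" | tr ':.' '//')"
  printf 'custom.googleapis.com/spark/driver/%s\n' "$path"
}

# Example: print the derived name for one driver metric.
driver_metric_type "DAGScheduler:stage.failedStages"
```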

Spark executor metrics

Metric    Metrics Explorer name
ExecutorAllocationManager:executors.numberExecutorsDecommissionUnfinished spark/driver/ExecutorAllocationManager/executors/numberExecutorsDecommissionUnfinished
ExecutorAllocationManager:executors.numberExecutorsExitedUnexpectedly spark/driver/ExecutorAllocationManager/executors/numberExecutorsExitedUnexpectedly
ExecutorAllocationManager:executors.numberExecutorsGracefullyDecommissioned spark/driver/ExecutorAllocationManager/executors/numberExecutorsGracefullyDecommissioned
ExecutorAllocationManager:executors.numberExecutorsKilledByDriver spark/driver/ExecutorAllocationManager/executors/numberExecutorsKilledByDriver
LiveListenerBus:queue.executorManagement.listenerProcessingTime spark/driver/LiveListenerBus/queue/executorManagement/listenerProcessingTime
executor:bytesRead spark/executor/bytesRead
executor:bytesWritten spark/executor/bytesWritten
executor:cpuTime spark/executor/cpuTime
executor:diskBytesSpilled spark/executor/diskBytesSpilled
executor:jvmGCTime spark/executor/jvmGCTime
executor:memoryBytesSpilled spark/executor/memoryBytesSpilled
executor:recordsRead spark/executor/recordsRead
executor:recordsWritten spark/executor/recordsWritten
executor:runTime spark/executor/runTime
executor:shuffleFetchWaitTime spark/executor/shuffleFetchWaitTime
executor:shuffleRecordsRead spark/executor/shuffleRecordsRead
executor:shuffleRecordsWritten spark/executor/shuffleRecordsWritten
executor:shuffleRemoteBytesReadToDisk spark/executor/shuffleRemoteBytesReadToDisk
executor:shuffleWriteTime spark/executor/shuffleWriteTime
executor:succeededTasks spark/executor/succeededTasks
ExecutorMetrics:MajorGCTime spark/executor/ExecutorMetrics/MajorGCTime
ExecutorMetrics:MinorGCTime spark/executor/ExecutorMetrics/MinorGCTime

System metrics

Metric    Metrics Explorer name
agent:uptime agent/uptime
cpu:utilization cpu/utilization
disk:bytes_used disk/bytes_used
disk:percent_used disk/percent_used
memory:bytes_used memory/bytes_used
memory:percent_used memory/percent_used
network:tcp_connections network/tcp_connections

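View Spark metrics

To view batch metrics, click a batch ID on the Dataproc Batches page in the Google Cloud console to open the Batch details page, which displays metric charts for the batch workload under the Monitoring tab.

Figure 1. Spark metrics charts for a batch workload.

For more information about viewing collected metrics, see Dataproc Cloud Monitoring.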