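Spark metrics

By default, Dataproc Serverless collects the available Spark metrics, unless you use Spark metrics collection properties to disable or override the collection of one or more of them.

Spark metrics collection properties

Use the properties listed in this section to disable or override the collection of one or more available Spark metrics.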

Property                           Description
spark.dataproc.driver.metrics      Disables or overrides Spark driver metrics.
spark.dataproc.executor.metrics    Disables or overrides Spark executor metrics.
spark.dataproc.system.metrics      Disables Spark system metrics.

gcloud CLI examples:

  • Disable Spark driver metric collection:

    gcloud dataproc batches submit spark \
        --properties spark.dataproc.driver.metrics="" \
        --region=region \
        other args ...
    
  • Override the default Spark driver metric collection to collect only the BlockManager:disk.diskSpaceUsed_MB and DAGScheduler:stage.failedStages metrics:

    gcloud dataproc batches submit spark \
        --properties=spark.dataproc.driver.metrics="BlockManager:disk.diskSpaceUsed_MB,DAGScheduler:stage.failedStages" \
        --region=region \
        other args ...
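
The same pattern can be extended to the other two properties. The following is a minimal sketch, assuming that setting spark.dataproc.executor.metrics and spark.dataproc.system.metrics to an empty string disables them in the same way as the driver example above. The helper name build_disable_props is hypothetical, and the script only prints the command rather than running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of gcloud): builds a comma-separated
# --properties value that sets each named metric group to an empty string.
# Assumption: an empty value disables executor and system metrics the same
# way it does for driver metrics.
build_disable_props() {
  local props=""
  local group
  for group in "$@"; do
    # Property names follow the pattern spark.dataproc.<group>.metrics.
    props="${props:+${props},}spark.dataproc.${group}.metrics="
  done
  printf '%s\n' "$props"
}

# Dry run: echo the command so it can be reviewed before submitting.
echo gcloud dataproc batches submit spark \
    --properties="$(build_disable_props executor system)" \
    --region=region \
    "other args ..."
```

Remove the leading echo to submit the batch once the printed command looks right.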
    

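Available Spark metrics

Dataproc Serverless collects the Spark metrics listed in this section unless you use Spark metrics collection properties to disable or override their collection.

In Cloud Monitoring, the full metric type takes the form custom.googleapis.com/METRIC_EXPLORER_NAME.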

Spark driver metrics

Metric    Metrics Explorer name
BlockManager:disk.diskSpaceUsed_MB spark/driver/BlockManager/disk/diskSpaceUsed_MB
BlockManager:memory.maxMem_MB spark/driver/BlockManager/memory/maxMem_MB
BlockManager:memory.memUsed_MB spark/driver/BlockManager/memory/memUsed_MB
DAGScheduler:job.activeJobs spark/driver/DAGScheduler/job/activeJobs
DAGScheduler:job.allJobs spark/driver/DAGScheduler/job/allJobs
DAGScheduler:messageProcessingTime spark/driver/DAGScheduler/messageProcessingTime
DAGScheduler:stage.failedStages spark/driver/DAGScheduler/stage/failedStages
DAGScheduler:stage.runningStages spark/driver/DAGScheduler/stage/runningStages
DAGScheduler:stage.waitingStages spark/driver/DAGScheduler/stage/waitingStages
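
The driver rows above appear to follow one naming pattern: the ":" and "." separators in the metric identifier become "/" and the result is prefixed with spark/driver/. The sketch below applies that observed pattern and then prepends custom.googleapis.com/. Both function names are hypothetical, and the pattern is inferred from the driver table only, so treat the table as the authoritative mapping.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive the Metrics Explorer name of a driver metric
# by replacing ':' and '.' with '/' and adding the spark/driver/ prefix.
# Assumption: every driver metric follows the pattern seen in the table.
driver_metric_to_explorer_name() {
  printf 'spark/driver/%s\n' "$(printf '%s' "$1" | tr ':.' '//')"
}

# Hypothetical helper: prepend custom.googleapis.com/ to get the full
# metric type.
driver_metric_type() {
  printf 'custom.googleapis.com/%s\n' "$(driver_metric_to_explorer_name "$1")"
}

driver_metric_type "DAGScheduler:stage.failedStages"
# prints custom.googleapis.com/spark/driver/DAGScheduler/stage/failedStages
```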

Spark executor metrics

Metric    Metrics Explorer name
ExecutorAllocationManager:executors.numberExecutorsDecommissionUnfinished spark/driver/ExecutorAllocationManager/executors/numberExecutorsDecommissionUnfinished
ExecutorAllocationManager:executors.numberExecutorsExitedUnexpectedly spark/driver/ExecutorAllocationManager/executors/numberExecutorsExitedUnexpectedly
ExecutorAllocationManager:executors.numberExecutorsGracefullyDecommissioned spark/driver/ExecutorAllocationManager/executors/numberExecutorsGracefullyDecommissioned
ExecutorAllocationManager:executors.numberExecutorsKilledByDriver spark/driver/ExecutorAllocationManager/executors/numberExecutorsKilledByDriver
LiveListenerBus:queue.executorManagement.listenerProcessingTime spark/driver/LiveListenerBus/queue/executorManagement/listenerProcessingTime
executor:bytesRead spark/executor/bytesRead
executor:bytesWritten spark/executor/bytesWritten
executor:cpuTime spark/executor/cpuTime
executor:diskBytesSpilled spark/executor/diskBytesSpilled
executor:jvmGCTime spark/executor/jvmGCTime
executor:memoryBytesSpilled spark/executor/memoryBytesSpilled
executor:recordsRead spark/executor/recordsRead
executor:recordsWritten spark/executor/recordsWritten
executor:runTime spark/executor/runTime
executor:shuffleFetchWaitTime spark/executor/shuffleFetchWaitTime
executor:shuffleRecordsRead spark/executor/shuffleRecordsRead
executor:shuffleRecordsWritten spark/executor/shuffleRecordsWritten
executor:shuffleRemoteBytesReadToDisk spark/executor/shuffleRemoteBytesReadToDisk
executor:shuffleWriteTime spark/executor/shuffleWriteTime
executor:succeededTasks spark/executor/succeededTasks
ExecutorMetrics:MajorGCTime spark/executor/ExecutorMetrics/MajorGCTime
ExecutorMetrics:MinorGCTime spark/executor/ExecutorMetrics/MinorGCTime

System metrics

Metric    Metrics Explorer name
agent:uptime agent/uptime
cpu:utilization cpu/utilization
disk:bytes_used disk/bytes_used
disk:percent_used disk/percent_used
memory:bytes_used memory/bytes_used
memory:percent_used memory/percent_used
network:tcp_connections network/tcp_connections

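View Spark metrics

To view batch metrics, click a batch ID on the Dataproc Batches page in the Google Cloud console to open the batch details page, which displays metric charts for the batch workload under the Monitoring tab.

For more information about viewing collected metrics, see Dataproc Cloud Monitoring.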