This document describes Spark metrics. By default, Serverless for Apache Spark enables the collection of available Spark metrics, unless you use Spark metric collection properties to disable or override the collection of one or more Spark metrics.

For other properties that you can set when you submit a Serverless for Apache Spark batch workload, see Spark properties.
Spark metric collection properties

You can use the properties listed in this section to disable or override the collection of one or more of the available Spark metrics.
| Property | Description |
|---|---|
| spark.dataproc.driver.metrics | Use to disable or override Spark driver metrics. |
| spark.dataproc.executor.metrics | Use to disable or override Spark executor metrics. |
| spark.dataproc.system.metrics | Use to disable Spark system metrics. |
gcloud CLI examples:
- Disable Spark driver metric collection:

  ```
  gcloud dataproc batches submit spark \
      --properties spark.dataproc.driver.metrics="" \
      --region=region \
      other args ...
  ```

- Override the default Spark driver metric collection to collect only the `BlockManager:disk.diskSpaceUsed_MB` and `DAGScheduler:stage.failedStages` metrics:

  ```
  gcloud dataproc batches submit spark \
      --properties=^~^spark.dataproc.driver.metrics="BlockManager:disk.diskSpaceUsed_MB,DAGScheduler:stage.failedStages" \
      --region=region \
      other args ...
  ```
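The `^~^` prefix in the override example deserves a note: the metric list itself contains a comma, which gcloud would otherwise treat as a separator between `key=value` pairs in `--properties`. The `^~^` prefix switches that separator to `~`, so the comma-separated metric list survives intact. A minimal sketch of how such a flag value is assembled (the metric names come from the example above; everything else here is a placeholder, not a value from this page):

```shell
#!/bin/sh
# Build a --properties value whose metric list contains commas.
# The leading ^~^ tells gcloud to use "~" (instead of ",") as the
# separator between key=value pairs, so the commas inside the metric
# list are passed through to the property value unchanged.
DRIVER_METRICS="BlockManager:disk.diskSpaceUsed_MB,DAGScheduler:stage.failedStages"
PROPERTIES="^~^spark.dataproc.driver.metrics=${DRIVER_METRICS}"
echo "${PROPERTIES}"
# Passed to gcloud as (placeholders for region and other args):
#   gcloud dataproc batches submit spark --properties="${PROPERTIES}" --region=region other args ...
```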
Available Spark metrics

Serverless for Apache Spark collects the Spark metrics listed in this section, unless you use Spark metric collection properties to disable or override their collection.
In Metrics Explorer, collected metrics are reported under custom.googleapis.com/METRIC_EXPLORER_NAME.
Spark driver metrics

| Metric | Metrics Explorer name |
|---|---|
| BlockManager:disk.diskSpaceUsed_MB | spark/driver/BlockManager/disk/diskSpaceUsed_MB | 
| BlockManager:memory.maxMem_MB | spark/driver/BlockManager/memory/maxMem_MB | 
| BlockManager:memory.memUsed_MB | spark/driver/BlockManager/memory/memUsed_MB | 
| DAGScheduler:job.activeJobs | spark/driver/DAGScheduler/job/activeJobs | 
| DAGScheduler:job.allJobs | spark/driver/DAGScheduler/job/allJobs | 
| DAGScheduler:messageProcessingTime | spark/driver/DAGScheduler/messageProcessingTime | 
| DAGScheduler:stage.failedStages | spark/driver/DAGScheduler/stage/failedStages | 
| DAGScheduler:stage.runningStages | spark/driver/DAGScheduler/stage/runningStages | 
| DAGScheduler:stage.waitingStages | spark/driver/DAGScheduler/stage/waitingStages | 
Spark executor metrics

| Metric | Metrics Explorer name |
|---|---|
| ExecutorAllocationManager:executors.numberExecutorsDecommissionUnfinished | spark/driver/ExecutorAllocationManager/executors/numberExecutorsDecommissionUnfinished | 
| ExecutorAllocationManager:executors.numberExecutorsExitedUnexpectedly | spark/driver/ExecutorAllocationManager/executors/numberExecutorsExitedUnexpectedly | 
| ExecutorAllocationManager:executors.numberExecutorsGracefullyDecommissioned | spark/driver/ExecutorAllocationManager/executors/numberExecutorsGracefullyDecommissioned | 
| ExecutorAllocationManager:executors.numberExecutorsKilledByDriver | spark/driver/ExecutorAllocationManager/executors/numberExecutorsKilledByDriver | 
| LiveListenerBus:queue.executorManagement.listenerProcessingTime | spark/driver/LiveListenerBus/queue/executorManagement/listenerProcessingTime | 
| executor:bytesRead | spark/executor/bytesRead | 
| executor:bytesWritten | spark/executor/bytesWritten | 
| executor:cpuTime | spark/executor/cpuTime | 
| executor:diskBytesSpilled | spark/executor/diskBytesSpilled | 
| executor:jvmGCTime | spark/executor/jvmGCTime | 
| executor:memoryBytesSpilled | spark/executor/memoryBytesSpilled | 
| executor:recordsRead | spark/executor/recordsRead | 
| executor:recordsWritten | spark/executor/recordsWritten | 
| executor:runTime | spark/executor/runTime | 
| executor:shuffleFetchWaitTime | spark/executor/shuffleFetchWaitTime | 
| executor:shuffleRecordsRead | spark/executor/shuffleRecordsRead | 
| executor:shuffleRecordsWritten | spark/executor/shuffleRecordsWritten | 
| executor:shuffleRemoteBytesReadToDisk | spark/executor/shuffleRemoteBytesReadToDisk | 
| executor:shuffleWriteTime | spark/executor/shuffleWriteTime | 
| executor:succeededTasks | spark/executor/succeededTasks | 
| ExecutorMetrics:MajorGCTime | spark/executor/ExecutorMetrics/MajorGCTime | 
| ExecutorMetrics:MinorGCTime | spark/executor/ExecutorMetrics/MinorGCTime | 
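As with driver metrics, the default executor collection can be narrowed with `spark.dataproc.executor.metrics`. A hedged sketch (the metric names come from the table above; region and other arguments are placeholders) that keeps only the two shuffle record counters:

```shell
#!/bin/sh
# Sketch: collect only two shuffle metrics from the executor table.
# The ^~^ prefix keeps the comma inside the metric list from being
# parsed as a separator between key=value pairs.
PROPERTIES="^~^spark.dataproc.executor.metrics=executor:shuffleRecordsRead,executor:shuffleRecordsWritten"
echo "${PROPERTIES}"
# Submitted as (placeholders for region and other args):
#   gcloud dataproc batches submit spark --properties="${PROPERTIES}" --region=region other args ...
```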
System metrics

| Metric | Metrics Explorer name |
|---|---|
| agent:uptime | agent/uptime | 
| cpu:utilization | cpu/utilization | 
| disk:bytes_used | disk/bytes_used | 
| disk:percent_used | disk/percent_used | 
| memory:bytes_used | memory/bytes_used | 
| memory:percent_used | memory/percent_used | 
| network:tcp_connections | network/tcp_connections | 
View Spark metrics

To view batch metrics, click a batch ID on the Dataproc Batches page in the Google Cloud console to open the batch Details page, which displays metric charts for the batch workload under the Monitoring tab.
 
For more information about viewing collected metrics, see Dataproc Cloud Monitoring.
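Besides the console charts, collected metrics can also be read programmatically through the Cloud Monitoring API (`projects.timeSeries.list`). A hedged sketch, not a verified recipe: the project ID and time interval are placeholders, and the actual request (commented out) requires an authenticated token. The metric type follows the custom.googleapis.com/METRIC_EXPLORER_NAME pattern described earlier:

```shell
#!/bin/sh
# Sketch: build the request pieces for reading one collected driver metric
# via the Cloud Monitoring timeSeries.list API. PROJECT_ID, START, and END
# are placeholders.
PROJECT_ID="my-project"
METRIC="custom.googleapis.com/spark/driver/DAGScheduler/stage/failedStages"
URL="https://monitoring.googleapis.com/v3/projects/${PROJECT_ID}/timeSeries"
echo "${URL}"
# curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
#   "${URL}?filter=metric.type%3D%22${METRIC}%22&interval.startTime=START&interval.endTime=END"
```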