Cloud Monitoring provides visibility into the performance, uptime, and overall health of cloud-powered applications. Google Cloud Observability collects and ingests metrics, events, and metadata from Dataproc clusters, including per-cluster HDFS, YARN, job, and operation metrics, to generate insights through dashboards and charts (see Cloud Monitoring Dataproc metrics).
See Cloud Monitoring pricing to understand your costs.
For information on metric data retention, see Monitoring quotas and limits.
Dataproc resource metrics collection
Cloud Monitoring collects metrics for the following Dataproc resources:
- Cloud Dataproc Cluster
- Cloud Dataproc Job
- Cloud Dataproc Batch
- Cloud Dataproc Session
Dataproc resource metrics are collected in the following format:
dataproc.googleapis.com/RESOURCE/METRIC
This collection includes several OSS metrics.
View Dataproc resource metrics
You can select and view Dataproc resource metrics in Metrics Explorer by typing "dataproc" in the Filter by resource or metric name box and then selecting a "Cloud Dataproc" resource.
Custom metrics collection
When you create a Dataproc cluster, you can enable collection of metrics from one or more custom metric sources. A standard set of metrics is collected from each enabled metric source unless you specify which metrics to collect from that source (user-specified metrics are called "metric overrides").
Custom OSS metrics are collected in the following format:
custom.googleapis.com/OSS_COMPONENT/METRIC
Examples of custom OSS metrics:
custom.googleapis.com/spark/driver/DAGScheduler/job/allJobs
custom.googleapis.com/hiveserver2/memory/MaxNonHeapMemory
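As a rough illustration of the custom metric naming format, the following Python sketch assembles a metric type string from its two parts. The helper name `custom_metric_type` is hypothetical and is not part of any Google client library.

```python
def custom_metric_type(oss_component: str, metric: str) -> str:
    """Build a metric type of the form custom.googleapis.com/OSS_COMPONENT/METRIC."""
    # Strip stray slashes so the parts join with exactly one separator.
    return f"custom.googleapis.com/{oss_component.strip('/')}/{metric.strip('/')}"


if __name__ == "__main__":
    print(custom_metric_type("spark", "driver/DAGScheduler/job/allJobs"))
    print(custom_metric_type("hiveserver2", "memory/MaxNonHeapMemory"))
```

The two calls in the main block reproduce the example metric types listed above.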
Enable custom metrics collection
You can use the gcloud CLI or the Dataproc API to enable custom metrics collection from one or more metric sources.
gcloud CLI
Custom metrics collection
Use the gcloud dataproc clusters create --metric-sources flag to enable custom metrics collection from one or more metric sources.
gcloud dataproc clusters create cluster-name \
    --metric-sources=METRIC_SOURCE(s) \
    ... other flags
Notes
- --metric-sources: Required to enable custom metrics collection. Specify one or more of the following metric sources: spark, flink, hdfs, yarn, spark-history-server, hiveserver2, hivemetastore, and monitoring-agent-defaults. The metric source name is case insensitive; for example, either "yarn" or "YARN" is acceptable.
- monitoring-agent-defaults is not available in 2.2 image version clusters unless Ops Agent is installed.
Override metrics collection
Optionally, add the --metric-overrides or --metric-overrides-file flag to enable collection of one or more custom metrics from one or more metric sources.
- Any custom metric, and any Spark metric, can be listed for collection as a metric override. Override metric values are case sensitive and must be provided, where appropriate, in CamelCase format.
  Examples:
  sparkHistoryServer:JVM:Memory:NonHeapMemoryUsage.committed
  hiveserver2:JVM:Memory:NonHeapMemoryUsage.used
  yarn:ResourceManager:JvmMetrics:MemHeapMaxM
- Only the specified overridden metrics are collected from a given metric source. For example, if one or more spark:executor metrics are listed as metric overrides, other SPARK metrics are not collected. Collection of custom metrics from other metric sources is not affected. For example, if both the SPARK and YARN metric sources are enabled and overrides are provided for Spark metrics only, the standard set of enabled YARN metrics is collected.
- The source of a specified metric override must be enabled. For example, if one or more spark:driver metrics are provided as metric overrides, the spark metric source must be enabled (--metric-sources=spark).
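The override rules above can be modeled with a short Python sketch. It is a simplified illustration, not Dataproc's implementation: `DEFAULT_METRICS` is a hypothetical, abbreviated stand-in for the default sets, and the hyphen-dropping normalization of source names is an assumption made so that a source such as spark-history-server lines up with the sparkHistoryServer override prefix.

```python
from typing import Dict, Iterable, Set

# Hypothetical, abbreviated default sets for two sources. The real defaults
# are the rows marked "y" in the metric tables on this page.
DEFAULT_METRICS: Dict[str, Set[str]] = {
    "spark": {
        "spark:driver:DAGScheduler:job.allJobs",
        "spark:executor:executor:cpuTime",
    },
    "yarn": {"yarn:ResourceManager:ClusterMetrics:NumActiveNMs"},
}


def normalize(name: str) -> str:
    """Lowercase a source name and drop hyphens (an assumption, see lead-in)."""
    return name.replace("-", "").lower()


def collected_metrics(
    sources: Iterable[str], overrides: Iterable[str]
) -> Dict[str, Set[str]]:
    """Return the metrics collected per enabled source under the rules above."""
    enabled = [normalize(s) for s in sources]
    overridden: Dict[str, Set[str]] = {}
    for override in overrides:
        source = normalize(override.split(":", 1)[0])
        # Rule: the source of an override must itself be enabled.
        if source not in enabled:
            raise ValueError(f"metric source for {override!r} is not enabled")
        overridden.setdefault(source, set()).add(override)
    # Rule: a source with overrides collects only those overrides;
    # every other enabled source keeps its standard set.
    return {
        source: overridden.get(source, set(DEFAULT_METRICS.get(source, set())))
        for source in enabled
    }
```

For example, calling `collected_metrics(["SPARK", "YARN"], ["spark:executor:executor:bytesRead"])` returns only the single overridden metric for spark while yarn keeps its default set.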
Override metrics list
gcloud dataproc clusters create cluster-name \
    --metric-sources=METRIC_SOURCE(s) \
    --metric-overrides=LIST_OF_METRIC_OVERRIDES \
    ... other flags
Notes
- --metric-sources: Required to enable custom metrics collection. Specify one or more of the following metric sources: spark, flink, hdfs, yarn, spark-history-server, hiveserver2, hivemetastore, and monitoring-agent-defaults. The metric source name is case insensitive; for example, either "yarn" or "YARN" is acceptable.
- --metric-overrides: Provide a list of metrics in the following format: METRIC_SOURCE:INSTANCE:GROUP:METRIC
  Example:
  --metric-overrides=sparkHistoryServer:JVM:Memory:NonHeapMemoryUsage.committed
  This flag is an alternative to the --metric-overrides-file flag and cannot be used together with it.
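Before passing overrides to the command, it can help to check their shape. The Python sketch below validates each entry and assembles the flag value. The function names are hypothetical. Joining entries with commas follows the usual gcloud convention for list flags and is an assumption here, as is accepting three fields, which is done only because the Flink entries in the tables on this page have three fields rather than four.

```python
from typing import List


def check_override(text: str) -> List[str]:
    """Split an override into its colon-separated fields and validate it."""
    fields = text.split(":")
    # Expected shape: METRIC_SOURCE:INSTANCE:GROUP:METRIC (Flink rows have 3 fields).
    if len(fields) not in (3, 4) or not all(fields):
        raise ValueError(
            f"expected METRIC_SOURCE:INSTANCE:GROUP:METRIC, got {text!r}"
        )
    return fields


def metric_overrides_flag(overrides: List[str]) -> str:
    """Return a --metric-overrides flag for the given override entries."""
    for item in overrides:
        check_override(item)
    # Case is preserved because override values are case sensitive.
    return "--metric-overrides=" + ",".join(overrides)


if __name__ == "__main__":
    print(
        metric_overrides_flag(
            [
                "sparkHistoryServer:JVM:Memory:NonHeapMemoryUsage.committed",
                "yarn:ResourceManager:JvmMetrics:MemHeapMaxM",
            ]
        )
    )
```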
Override metrics file
gcloud dataproc clusters create cluster-name \
    --metric-sources=METRIC_SOURCE(s) \
    --metric-overrides-file=METRIC_OVERRIDES_FILENAME \
    ... other flags
Notes
- --metric-sources: Required to enable custom metrics collection. Specify one or more of the following metric sources: spark, flink, hdfs, yarn, spark-history-server, hiveserver2, hivemetastore, and monitoring-agent-defaults. The metric source name is case insensitive; for example, either "yarn" or "YARN" is acceptable.
- --metric-overrides-file: Specify a local or Cloud Storage file (gs://bucket/filename) that contains one or more metrics in the following format: METRIC_SOURCE:INSTANCE:GROUP:METRIC
  Use the appropriate CamelCase format.
  Examples:
  --metric-overrides-file=gs://my-bucket/my-filename.txt
  --metric-overrides-file=./local-directory/local-filename.txt
  This flag is an alternative to the --metric-overrides flag and cannot be used together with it.
REST API
Use DataprocMetricConfig as part of a clusters.create request to enable custom metrics collection. Note: monitoring-agent-defaults is not available in 2.2 image version clusters unless Ops Agent is installed.
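As a sketch of what such a request body might look like, the Python snippet below builds the JSON for a clusters.create call. The field names (`dataprocMetricConfig`, `metrics`, `metricSource`, `metricOverrides`) and the upper-case source values are recalled from the Dataproc v1 API and should be checked against the API reference. The function name and the cluster name are hypothetical.

```python
import json
from typing import Dict, List


def cluster_request_body(cluster_name: str, sources: Dict[str, List[str]]) -> dict:
    """Build a clusters.create body that enables the given metric sources.

    `sources` maps a metric source (for example "SPARK") to its list of
    metric overrides; an empty list keeps that source's standard set.
    """
    metrics = []
    for source, overrides in sources.items():
        entry: dict = {"metricSource": source}
        if overrides:
            entry["metricOverrides"] = list(overrides)
        metrics.append(entry)
    return {
        "clusterName": cluster_name,
        "config": {"dataprocMetricConfig": {"metrics": metrics}},
    }


if __name__ == "__main__":
    body = cluster_request_body(
        "example-cluster",  # hypothetical cluster name
        {"SPARK": ["spark:driver:DAGScheduler:job.allJobs"], "YARN": []},
    )
    print(json.dumps(body, indent=2))
```

A real request would also carry the rest of the cluster configuration; only the metric-related portion is shown.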
View custom metrics
You can select and view custom Dataproc metrics in Metrics Explorer by selecting the VM Instance resource and then Custom metrics.
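The same selection can be expressed as a Cloud Monitoring filter string, for example when querying time series outside the console. The Python sketch below assumes that the VM Instance resource corresponds to the monitored resource type gce_instance. The helper name is hypothetical.

```python
def custom_metric_filter(metric_type: str) -> str:
    """Return a Monitoring filter for one custom metric on VM instances."""
    # Assumption: "VM Instance" in Metrics Explorer is resource type gce_instance.
    return f'metric.type = "{metric_type}" AND resource.type = "gce_instance"'


if __name__ == "__main__":
    print(
        custom_metric_filter(
            "custom.googleapis.com/spark/driver/DAGScheduler/job/allJobs"
        )
    )
```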
Custom metrics
You can enable Dataproc to collect the custom metrics listed in the following tables.
The Enabled column is marked "y" if Dataproc collects the metric when you enable the associated metric source.
Any of the metrics listed for a metric source, and any Spark metric, can be enabled for collection if you override the standard set of enabled metrics for that metric source (see Enable custom metrics collection).
Dataproc uses the monitoring agent to collect metrics. Enabling any metric source also enables collection of agent metrics. These metrics are not billed to users; Dataproc uses them to diagnose metric collection problems.
Hadoop metrics
HDFS metrics
Metric | Metrics Explorer name | Enabled |
---|---|---|
hdfs:NameNode:FSNamesystem:CapacityTotalGB | dfs/FSNamesystem/CapacityTotalGB | y |
hdfs:NameNode:FSNamesystem:CapacityUsedGB | dfs/FSNamesystem/CapacityUsedGB | y |
hdfs:NameNode:FSNamesystem:CapacityRemainingGB | dfs/FSNamesystem/CapacityRemainingGB | y |
hdfs:NameNode:FSNamesystem:FilesTotal | dfs/FSNamesystem/FilesTotal | y |
hdfs:NameNode:FSNamesystem:MissingBlocks | dfs/FSNamesystem/MissingBlocks | n |
hdfs:NameNode:FSNamesystem:ExpiredHeartbeats | dfs/FSNamesystem/ExpiredHeartbeats | n |
hdfs:NameNode:FSNamesystem:TransactionsSinceLastCheckpoint | dfs/FSNamesystem/TransactionsSinceLastCheckpoint | n |
hdfs:NameNode:FSNamesystem:TransactionsSinceLastLogRoll | dfs/FSNamesystem/TransactionsSinceLastLogRoll | n |
hdfs:NameNode:FSNamesystem:LastWrittenTransactionId | dfs/FSNamesystem/LastWrittenTransactionId | n |
hdfs:NameNode:FSNamesystem:CapacityTotal | dfs/FSNamesystem/CapacityTotal | n |
hdfs:NameNode:FSNamesystem:CapacityUsed | dfs/FSNamesystem/CapacityUsed | n |
hdfs:NameNode:FSNamesystem:CapacityRemaining | dfs/FSNamesystem/CapacityRemaining | n |
hdfs:NameNode:FSNamesystem:CapacityUsedNonDFS | dfs/FSNamesystem/CapacityUsedNonDFS | n |
hdfs:NameNode:FSNamesystem:TotalLoad | dfs/FSNamesystem/TotalLoad | n |
hdfs:NameNode:FSNamesystem:SnapshottableDirectories | dfs/FSNamesystem/SnapshottableDirectories | n |
hdfs:NameNode:FSNamesystem:Snapshots | dfs/FSNamesystem/Snapshots | n |
hdfs:NameNode:FSNamesystem:BlocksTotal | dfs/FSNamesystem/BlocksTotal | n |
hdfs:NameNode:FSNamesystem:PendingReplicationBlocks | dfs/FSNamesystem/PendingReplicationBlocks | n |
hdfs:NameNode:FSNamesystem:UnderReplicatedBlocks | dfs/FSNamesystem/UnderReplicatedBlocks | n |
hdfs:NameNode:FSNamesystem:CorruptBlocks | dfs/FSNamesystem/CorruptBlocks | n |
hdfs:NameNode:FSNamesystem:ScheduledReplicationBlocks | dfs/FSNamesystem/ScheduledReplicationBlocks | n |
hdfs:NameNode:FSNamesystem:PendingDeletionBlocks | dfs/FSNamesystem/PendingDeletionBlocks | n |
hdfs:NameNode:FSNamesystem:ExcessBlocks | dfs/FSNamesystem/ExcessBlocks | n |
hdfs:NameNode:FSNamesystem:PostponedMisreplicatedBlocks | dfs/FSNamesystem/PostponedMisreplicatedBlocks | n |
hdfs:NameNode:FSNamesystem:PendingDataNodeMessageCount | dfs/FSNamesystem/PendingDataNodeMessageCount | n |
hdfs:NameNode:FSNamesystem:MillisSinceLastLoadedEdits | dfs/FSNamesystem/MillisSinceLastLoadedEdits | n |
hdfs:NameNode:FSNamesystem:BlockCapacity | dfs/FSNamesystem/BlockCapacity | n |
hdfs:NameNode:FSNamesystem:StaleDataNodes | dfs/FSNamesystem/StaleDataNodes | n |
hdfs:NameNode:FSNamesystem:TotalFiles | dfs/FSNamesystem/TotalFiles | n |
hdfs:NameNode:JvmMetrics:MemHeapUsedM | dfs/jvm/MemHeapUsedM | n |
hdfs:NameNode:JvmMetrics:MemHeapCommittedM | dfs/jvm/MemHeapCommittedM | n |
hdfs:NameNode:JvmMetrics:MemHeapMaxM | dfs/jvm/MemHeapMaxM | n |
hdfs:NameNode:JvmMetrics:MemMaxM | dfs/jvm/MemMaxM | n |
YARN metrics
Metric | Metrics Explorer name | Enabled |
---|---|---|
yarn:ResourceManager:ClusterMetrics:NumActiveNMs | yarn/ClusterMetrics/NumActiveNMs | y |
yarn:ResourceManager:ClusterMetrics:NumDecommissionedNMs | yarn/ClusterMetrics/NumDecommissionedNMs | n |
yarn:ResourceManager:ClusterMetrics:NumLostNMs | yarn/ClusterMetrics/NumLostNMs | n |
yarn:ResourceManager:ClusterMetrics:NumUnhealthyNMs | yarn/ClusterMetrics/NumUnhealthyNMs | n |
yarn:ResourceManager:ClusterMetrics:NumRebootedNMs | yarn/ClusterMetrics/NumRebootedNMs | n |
yarn:ResourceManager:QueueMetrics:running_0 | yarn/QueueMetrics/running_0 | y |
yarn:ResourceManager:QueueMetrics:running_60 | yarn/QueueMetrics/running_60 | y |
yarn:ResourceManager:QueueMetrics:running_300 | yarn/QueueMetrics/running_300 | y |
yarn:ResourceManager:QueueMetrics:running_1440 | yarn/QueueMetrics/running_1440 | y |
yarn:ResourceManager:QueueMetrics:AppsSubmitted | yarn/QueueMetrics/AppsSubmitted | y |
yarn:ResourceManager:QueueMetrics:AvailableMB | yarn/QueueMetrics/AvailableMB | y |
yarn:ResourceManager:QueueMetrics:PendingContainers | yarn/QueueMetrics/PendingContainers | y |
yarn:ResourceManager:QueueMetrics:AppsRunning | yarn/QueueMetrics/AppsRunning | n |
yarn:ResourceManager:QueueMetrics:AppsPending | yarn/QueueMetrics/AppsPending | n |
yarn:ResourceManager:QueueMetrics:AppsCompleted | yarn/QueueMetrics/AppsCompleted | n |
yarn:ResourceManager:QueueMetrics:AppsKilled | yarn/QueueMetrics/AppsKilled | n |
yarn:ResourceManager:QueueMetrics:AppsFailed | yarn/QueueMetrics/AppsFailed | n |
yarn:ResourceManager:QueueMetrics:AllocatedMB | yarn/QueueMetrics/AllocatedMB | n |
yarn:ResourceManager:QueueMetrics:AllocatedVCores | yarn/QueueMetrics/AllocatedVCores | n |
yarn:ResourceManager:QueueMetrics:AllocatedContainers | yarn/QueueMetrics/AllocatedContainers | n |
yarn:ResourceManager:QueueMetrics:AggregateContainersAllocated | yarn/QueueMetrics/AggregateContainersAllocated | n |
yarn:ResourceManager:QueueMetrics:AggregateContainersReleased | yarn/QueueMetrics/AggregateContainersReleased | n |
yarn:ResourceManager:QueueMetrics:AvailableVCores | yarn/QueueMetrics/AvailableVCores | n |
yarn:ResourceManager:QueueMetrics:PendingMB | yarn/QueueMetrics/PendingMB | n |
yarn:ResourceManager:QueueMetrics:PendingVCores | yarn/QueueMetrics/PendingVCores | n |
yarn:ResourceManager:QueueMetrics:ReservedMB | yarn/QueueMetrics/ReservedMB | n |
yarn:ResourceManager:QueueMetrics:ReservedVCores | yarn/QueueMetrics/ReservedVCores | n |
yarn:ResourceManager:QueueMetrics:ReservedContainers | yarn/QueueMetrics/ReservedContainers | n |
yarn:ResourceManager:QueueMetrics:ActiveUsers | yarn/QueueMetrics/ActiveUsers | n |
yarn:ResourceManager:QueueMetrics:ActiveApplications | yarn/QueueMetrics/ActiveApplications | n |
yarn:ResourceManager:QueueMetrics:FairShareMB | yarn/QueueMetrics/FairShareMB | n |
yarn:ResourceManager:QueueMetrics:FairShareVCores | yarn/QueueMetrics/FairShareVCores | n |
yarn:ResourceManager:QueueMetrics:MinShareMB | yarn/QueueMetrics/MinShareMB | n |
yarn:ResourceManager:QueueMetrics:MinShareVCores | yarn/QueueMetrics/MinShareVCores | n |
yarn:ResourceManager:QueueMetrics:MaxShareMB | yarn/QueueMetrics/MaxShareMB | n |
yarn:ResourceManager:QueueMetrics:MaxShareVCores | yarn/QueueMetrics/MaxShareVCores | n |
yarn:ResourceManager:JvmMetrics:MemHeapUsedM | yarn/jvm/MemHeapUsedM | n |
yarn:ResourceManager:JvmMetrics:MemHeapCommittedM | yarn/jvm/MemHeapCommittedM | n |
yarn:ResourceManager:JvmMetrics:MemHeapMaxM | yarn/jvm/MemHeapMaxM | n |
yarn:ResourceManager:JvmMetrics:MemMaxM | yarn/jvm/MemMaxM | n |
Spark metrics
Spark driver metrics
Metric | Metrics Explorer name | Enabled |
---|---|---|
spark:driver:BlockManager:disk.diskSpaceUsed_MB | spark/driver/BlockManager/disk/diskSpaceUsed_MB | y |
spark:driver:BlockManager:memory.maxMem_MB | spark/driver/BlockManager/memory/maxMem_MB | y |
spark:driver:BlockManager:memory.memUsed_MB | spark/driver/BlockManager/memory/memUsed_MB | y |
spark:driver:DAGScheduler:job.allJobs | spark/driver/DAGScheduler/job/allJobs | y |
spark:driver:DAGScheduler:stage.failedStages | spark/driver/DAGScheduler/stage/failedStages | y |
spark:driver:DAGScheduler:stage.waitingStages | spark/driver/DAGScheduler/stage/waitingStages | y |
Spark executor metrics
Metric | Metrics Explorer name | Enabled |
---|---|---|
spark:executor:executor:bytesRead | spark/executor/bytesRead | y |
spark:executor:executor:bytesWritten | spark/executor/bytesWritten | y |
spark:executor:executor:cpuTime | spark/executor/cpuTime | y |
spark:executor:executor:diskBytesSpilled | spark/executor/diskBytesSpilled | y |
spark:executor:executor:recordsRead | spark/executor/recordsRead | y |
spark:executor:executor:recordsWritten | spark/executor/recordsWritten | y |
spark:executor:executor:runTime | spark/executor/runTime | y |
spark:executor:executor:shuffleRecordsRead | spark/executor/shuffleRecordsRead | y |
spark:executor:executor:shuffleRecordsWritten | spark/executor/shuffleRecordsWritten | y |
Flink metrics
Metric | Metrics Explorer name | Enabled |
---|---|---|
flink:jobmanager:numRegisteredTaskManagers | flink/jobmanager/numRegisteredTaskManagers | n |
flink:jobmanager:numRunningJobs | flink/jobmanager/numRunningJobs | n |
flink:jobmanager:Status.JVM.ClassLoader.ClassesLoaded | flink/jobmanager/Status.JVM.ClassLoader.ClassesLoaded | n |
flink:jobmanager:Status.JVM.ClassLoader.ClassesUnloaded | flink/jobmanager/Status.JVM.ClassLoader.ClassesUnloaded | n |
flink:jobmanager:Status.JVM.CPU.Load | flink/jobmanager/Status.JVM.CPU.Load | n |
flink:jobmanager:Status.JVM.CPU.Time | flink/jobmanager/Status.JVM.CPU.Time | y |
flink:jobmanager:Status.JVM.GarbageCollector.PSMarkSweep.Count | flink/jobmanager/Status.JVM.GarbageCollector.PSMarkSweep.Count | n |
flink:jobmanager:Status.JVM.GarbageCollector.PSMarkSweep.Time | flink/jobmanager/Status.JVM.GarbageCollector.PSMarkSweep.Time | n |
flink:jobmanager:Status.JVM.GarbageCollector.PSScavenge.Count | flink/jobmanager/Status.JVM.GarbageCollector.PSScavenge.Count | n |
flink:jobmanager:Status.JVM.GarbageCollector.PSScavenge.Time | flink/jobmanager/Status.JVM.GarbageCollector.PSScavenge.Time | n |
flink:jobmanager:Status.JVM.Memory.Direct.Count | flink/jobmanager/Status.JVM.Memory.Direct.Count | y |
flink:jobmanager:Status.JVM.Memory.Direct.MemoryUsed | flink/jobmanager/Status.JVM.Memory.Direct.MemoryUsed | y |
flink:jobmanager:Status.JVM.Memory.Direct.TotalCapacity | flink/jobmanager/Status.JVM.Memory.Direct.TotalCapacity | y |
flink:jobmanager:Status.JVM.Memory.Heap.Committed | flink/jobmanager/Status.JVM.Memory.Heap.Committed | y |
flink:jobmanager:Status.JVM.Memory.Heap.Max | flink/jobmanager/Status.JVM.Memory.Heap.Max | y |
flink:jobmanager:Status.JVM.Memory.Heap.Used | flink/jobmanager/Status.JVM.Memory.Heap.Used | y |
flink:jobmanager:Status.JVM.Memory.Mapped.Count | flink/jobmanager/Status.JVM.Memory.Mapped.Count | y |
flink:jobmanager:Status.JVM.Memory.Mapped.MemoryUsed | flink/jobmanager/Status.JVM.Memory.Mapped.MemoryUsed | y |
flink:jobmanager:Status.JVM.Memory.Mapped.TotalCapacity | flink/jobmanager/Status.JVM.Memory.Mapped.TotalCapacity | y |
flink:jobmanager:Status.JVM.Memory.Metaspace.Committed | flink/jobmanager/Status.JVM.Memory.Metaspace.Committed | n |
flink:jobmanager:Status.JVM.Memory.Metaspace.Max | flink/jobmanager/Status.JVM.Memory.Metaspace.Max | n |
flink:jobmanager:Status.JVM.Memory.Metaspace.Used | flink/jobmanager/Status.JVM.Memory.Metaspace.Used | n |
flink:jobmanager:Status.JVM.Memory.NonHeap.Committed | flink/jobmanager/Status.JVM.Memory.NonHeap.Committed | n |
flink:jobmanager:Status.JVM.Memory.NonHeap.Max | flink/jobmanager/Status.JVM.Memory.NonHeap.Max | n |
flink:jobmanager:Status.JVM.Memory.NonHeap.Used | flink/jobmanager/Status.JVM.Memory.NonHeap.Used | n |
flink:jobmanager:Status.JVM.Threads.Count | flink/jobmanager/Status.JVM.Threads.Count | n |
flink:jobmanager:taskSlotsAvailable | flink/jobmanager/taskSlotsAvailable | y |
flink:jobmanager:taskSlotsTotal | flink/jobmanager/taskSlotsTotal | y |
flink:operator:numRecordsIn | flink/operator/numRecordsIn | n |
flink:operator:numRecordsInPerSecond.count | flink/operator/numRecordsInPerSecond.count | n |
flink:operator:numRecordsInPerSecond.rate | flink/operator/numRecordsInPerSecond.rate | n |
flink:operator:numRecordsOut | flink/operator/numRecordsOut | n |
flink:operator:numRecordsOutPerSecond.count | flink/operator/numRecordsOutPerSecond.count | n |
flink:operator:numRecordsOutPerSecond.rate | flink/operator/numRecordsOutPerSecond.rate | n |
flink:operator:numSplitsProcessed | flink/operator/numSplitsProcessed | n |
flink:task:buffers.inPoolUsage | flink/task/buffers.inPoolUsage | n |
flink:task:buffers.inputExclusiveBuffersUsage | flink/task/buffers.inputExclusiveBuffersUsage | n |
flink:task:buffers.inputFloatingBuffersUsage | flink/task/buffers.inputFloatingBuffersUsage | n |
flink:task:buffers.inputQueueLength | flink/task/buffers.inputQueueLength | n |
flink:task:buffers.outPoolUsage | flink/task/buffers.outPoolUsage | n |
flink:task:buffers.outputQueueLength | flink/task/buffers.outputQueueLength | n |
flink:task:idleTimeMsPerSecond.count | flink/task/idleTimeMsPerSecond.count | n |
flink:task:idleTimeMsPerSecond.rate | flink/task/idleTimeMsPerSecond.rate | n |
flink:task:numBuffersInLocal | flink/task/numBuffersInLocal | n |
flink:task:numBuffersInLocalPerSecond.count | flink/task/numBuffersInLocalPerSecond.count | n |
flink:task:numBuffersInLocalPerSecond.rate | flink/task/numBuffersInLocalPerSecond.rate | n |
flink:task:numBuffersInRemote | flink/task/numBuffersInRemote | n |
flink:task:numBuffersInRemotePerSecond.count | flink/task/numBuffersInRemotePerSecond.count | n |
flink:task:numBuffersInRemotePerSecond.rate | flink/task/numBuffersInRemotePerSecond.rate | n |
flink:task:numBuffersOut | flink/task/numBuffersOut | n |
flink:task:numBuffersOutPerSecond.count | flink/task/numBuffersOutPerSecond.count | n |
flink:task:numBuffersOutPerSecond.rate | flink/task/numBuffersOutPerSecond.rate | n |
flink:task:numBytesIn | flink/task/numBytesIn | n |
flink:task:numBytesInLocal | flink/task/numBytesInLocal | n |
flink:task:numBytesInLocalPerSecond.count | flink/task/numBytesInLocalPerSecond.count | n |
flink:task:numBytesInLocalPerSecond.rate | flink/task/numBytesInLocalPerSecond.rate | n |
flink:task:numBytesInPerSecond.count | flink/task/numBytesInPerSecond.count | n |
flink:task:numBytesInPerSecond.rate | flink/task/numBytesInPerSecond.rate | n |
flink:task:numBytesInRemote | flink/task/numBytesInRemote | n |
flink:task:numBytesInRemotePerSecond.count | flink/task/numBytesInRemotePerSecond.count | n |
flink:task:numBytesInRemotePerSecond.rate | flink/task/numBytesInRemotePerSecond.rate | n |
flink:task:numBytesOut | flink/task/numBytesOut | n |
flink:task:numBytesOutPerSecond.count | flink/task/numBytesOutPerSecond.count | n |
flink:task:numBytesOutPerSecond.rate | flink/task/numBytesOutPerSecond.rate | n |
flink:task:numRecordsIn | flink/task/numRecordsIn | n |
flink:task:numRecordsInPerSecond.count | flink/task/numRecordsInPerSecond.count | n |
flink:task:numRecordsInPerSecond.rate | flink/task/numRecordsInPerSecond.rate | n |
flink:task:numRecordsOut | flink/task/numRecordsOut | n |
flink:task:numRecordsOutPerSecond.count | flink/task/numRecordsOutPerSecond.count | n |
flink:task:numRecordsOutPerSecond.rate | flink/task/numRecordsOutPerSecond.rate | n |
flink:task:shuffle.Netty.Input.Buffers.inPoolUsage | flink/task/shuffle.Netty.Input.Buffers.inPoolUsage | n |
flink:task:shuffle.Netty.Input.Buffers.inputExclusiveBuffersUsage | flink/task/shuffle.Netty.Input.Buffers.inputExclusiveBuffersUsage | n |
flink:task:Shuffle.Netty.Input.Buffers.inputFloatingBuffersUsage | flink/task/shuffle.Netty.Input.Buffers.inputFloatingBuffersUsage | n |
flink:task:shuffle.Netty.Input.Buffers.inputQueueLength | flink/task/shuffle.Netty.Input.Buffers.inputQueueLength | n |
flink:task:shuffle.Netty.Input.numBuffersInLocal | flink/task/shuffle.Netty.Input.numBuffersInLocal | n |
flink:task:shuffle.Netty.Input.numBuffersInLocalPerSecond.count | flink/task/shuffle.Netty.Input.numBuffersInLocalPerSecond.count | n |
flink:task:shuffle.Netty.Input.numBuffersInLocalPerSecond.rate | flink/task/shuffle.Netty.Input.numBuffersInLocalPerSecond.rate | n |
flink:task:shuffle.Netty.Input.numBuffersInRemote | flink/task/shuffle.Netty.Input.numBuffersInRemote | n |
flink:task:shuffle.Netty.Input.numBuffersInRemotePerSecond.count | flink/task/shuffle.Netty.Input.numBuffersInRemotePerSecond.count | n |
flink:task:shuffle.Netty.Input.numBuffersInRemotePerSecond.rate | flink/task/shuffle.Netty.Input.numBuffersInRemotePerSecond.rate | n |
flink:task:shuffle.Netty.Input.numBytesInLocal | flink/task/shuffle.Netty.Input.numBytesInLocal | n |
flink:task:shuffle.Netty.Input.numBytesInLocalPerSecond.count | flink/task/Shuffle.Netty.Input.numBytesInLocalPerSecond.count | n |
flink:task:shuffle.Netty.Input.numBytesInLocalPerSecond.rate | flink/task/shuffle.Netty.Input.numBytesInLocalPerSecond.rate | n |
flink:task:shuffle.Netty.Input.numBytesInRemote | flink/task/shuffle.Netty.Input.numBytesInRemote | n |
flink:task:shuffle.Netty.Input.numBytesInRemotePerSecond.count | flink/task/shuffle.Netty.Input.numBytesInRemotePerSecond.count | n |
flink:task:shuffle.Netty.Input.numBytesInRemotePerSecond.rate | flink/task/shuffle.Netty.Input.numBytesInRemotePerSecond.rate | n |
flink:task:shuffle.Netty.Output.Buffers.outPoolUsage | flink/task/shuffle.Netty.Output.Buffers.outPoolUsage | n |
flink:task:shuffle.Netty.Output.Buffers.outputQueueLength | flink/task/shuffle.Netty.Output.Buffers.outputQueueLength | n |
flink:taskmanager:Status.flink.Memory.Managed.Total | flink/taskmanager/Status.flink.Memory.Managed.Total | n |
flink:taskmanager:Status.flink.Memory.Managed.Used | flink/taskmanager/Status.flink.Memory.Managed.Used | n |
flink:taskmanager:Status.JVM.ClassLoader.ClassesLoaded | flink/taskmanager/Status.JVM.ClassLoader.ClassesLoaded | n |
flink:taskmanager:Status.JVM.ClassLoader.ClassesUnloaded | flink/taskmanager/Status.JVM.ClassLoader.ClassesUnloaded | n |
flink:taskmanager:Status.JVM.CPU.Load | flink/taskmanager/Status.JVM.CPU.Load | n |
flink:taskmanager:Status.JVM.CPU.Time | flink/taskmanager/Status.JVM.CPU.Time | y |
flink:taskmanager:Status.JVM.GarbageCollector.PSMarkSweep.Count | flink/taskmanager/Status.JVM.GarbageCollector.PSMarkSweep.Count | n |
flink:taskmanager:Status.JVM.GarbageCollector.PSMarkSweep.Time | flink/taskmanager/Status.JVM.GarbageCollector.PSMarkSweep.Time | n |
flink:taskmanager:Status.JVM.GarbageCollector.PSScavenge.Count | flink/taskmanager/Status.JVM.GarbageCollector.PSScavenge.Count | n |
flink:taskmanager:Status.JVM.GarbageCollector.PSScavenge.Time | flink/taskmanager/Status.JVM.GarbageCollector.PSScavenge.Time | n |
flink:taskmanager:Status.JVM.Memory.Direct.Count | flink/taskmanager/Status.JVM.Memory.Direct.Count | y |
flink:taskmanager:Status.JVM.Memory.Direct.MemoryUsed | flink/taskmanager/Status.JVM.Memory.Direct.MemoryUsed | y |
flink:taskmanager:Status.JVM.Memory.Direct.TotalCapacity | flink/taskmanager/Status.JVM.Memory.Direct.TotalCapacity | y |
flink:taskmanager:Status.JVM.Memory.Heap.Committed | flink/taskmanager/Status.JVM.Memory.Heap.Committed | y |
flink:taskmanager:Status.JVM.Memory.Heap.Max | flink/taskmanager/Status.JVM.Memory.Heap.Max | y |
flink:taskmanager:Status.JVM.Memory.Heap.Used | flink/taskmanager/Status.JVM.Memory.Heap.Used | y |
flink:taskmanager:Status.JVM.Memory.Mapped.Count | flink/taskmanager/Status.JVM.Memory.Mapped.Count | y |
flink:taskmanager:Status.JVM.Memory.Mapped.MemoryUsed | flink/taskmanager/Status.JVM.Memory.Mapped.MemoryUsed | y |
flink:taskmanager:Status.JVM.Memory.Mapped.TotalCapacity | flink/taskmanager/Status.JVM.Memory.Mapped.TotalCapacity | y |
flink:taskmanager:Status.JVM.Memory.Metaspace.Committed | flink/taskmanager/Status.JVM.Memory.Metaspace.Committed | n |
flink:taskmanager:Status.JVM.Memory.Metaspace.Max | flink/taskmanager/Status.JVM.Memory.Metaspace.Max | n |
flink:taskmanager:Status.JVM.Memory.Metaspace.Used | flink/taskmanager/Status.JVM.Memory.Metaspace.Used | n |
flink:taskmanager:Status.JVM.Memory.NonHeap.Committed | flink/taskmanager/Status.JVM.Memory.NonHeap.Committed | n |
flink:taskmanager:Status.JVM.Memory.NonHeap.Max | flink/taskmanager/Status.JVM.Memory.NonHeap.Max | n |
flink:taskmanager:Status.JVM.Memory.NonHeap.Used | flink/taskmanager/Status.JVM.Memory.NonHeap.Used | n |
flink:taskmanager:Status.JVM.Threads.Count | flink/taskmanager/Status.JVM.Threads.Count | n |
flink:taskmanager:Status.Network.AvailableMemorySegments | flink/taskmanager/Status.Network.AvailableMemorySegments | n |
flink:taskmanager:Status.Network.TotalMemorySegments | flink/taskmanager/Status.Network.TotalMemorySegments | n |
flink:taskmanager:Status.shuffle.Netty.AvailableMemory | flink/taskmanager/Status.Shuffle.Netty.AvailableMemory | n |
flink:taskmanager:Status.shuffle.Netty.AvailableMemorySegments | flink/taskmanager/Status.shuffle.Netty.AvailableMemorySegments | n |
flink:taskmanager:Status.shuffle.Netty.TotalMemory | flink/taskmanager/Status.Shuffle.Netty.TotalMemory | n |
flink:taskmanager:Status.shuffle.Netty.TotalMemorySegments | flink/taskmanager/Status.Shuffle.Netty.TotalMemorySegments | n |
flink:taskmanager:Status.shuffle.Netty.UsedMemory | flink/taskmanager/Status.shuffle.Netty.UsedMemory | n |
flink:taskmanager:Status.Shuffle.Netty.UsedMemorySegments | flink/taskmanager/Status.Shuffle.Netty.UsedMemorySegments | n |
Spark history server metrics
Dataproc collects the following Spark history service JVM memory metrics:
Metric | Metrics Explorer name | Enabled |
---|---|---|
sparkHistoryServer:JVM:Memory:HeapMemoryUsage.committed | sparkHistoryServer/memory/CommittedHeapMemory | y |
sparkHistoryServer:JVM:Memory:HeapMemoryUsage.used | sparkHistoryServer/memory/UsedHeapMemory | y |
sparkHistoryServer:JVM:Memory:HeapMemoryUsage.max | sparkHistoryServer/memory/MaxHeapMemory | y |
sparkHistoryServer:JVM:Memory:NonHeapMemoryUsage.committed | sparkHistoryServer/memory/CommittedNonHeapMemory | y |
sparkHistoryServer:JVM:Memory:NonHeapMemoryUsage.used | sparkHistoryServer/memory/UsedNonHeapMemory | y |
sparkHistoryServer:JVM:Memory:NonHeapMemoryUsage.max | sparkHistoryServer/memory/MaxNonHeapMemory | y |
HiveServer2 metrics
Metric | Metrics Explorer name | Enabled |
---|---|---|
hiveserver2:JVM:Memory:HeapMemoryUsage.committed | hiveserver2/memory/CommittedHeapMemory | y |
hiveserver2:JVM:Memory:HeapMemoryUsage.used | hiveserver2/memory/UsedHeapMemory | y |
hiveserver2:JVM:Memory:HeapMemoryUsage.max | hiveserver2/memory/MaxHeapMemory | y |
hiveserver2:JVM:Memory:NonHeapMemoryUsage.committed | hiveserver2/memory/CommittedNonHeapMemory | y |
hiveserver2:JVM:Memory:NonHeapMemoryUsage.used | hiveserver2/memory/UsedNonHeapMemory | y |
hiveserver2:JVM:Memory:NonHeapMemoryUsage.max | hiveserver2/memory/MaxNonHeapMemory | y |
Hive Metastore metrics
Metric | Metrics Explorer name | Enabled |
---|---|---|
hivemetastore:API:GetDatabase:Mean | hivemetastore/get_database/mean | y |
hivemetastore:API:CreateDatabase:Mean | hivemetastore/create_database/mean | y |
hivemetastore:API:DropDatabase:Mean | hivemetastore/drop_database/mean | y |
hivemetastore:API:AlterDatabase:Mean | hivemetastore/alter_database/mean | y |
hivemetastore:API:GetAllDatabases:Mean | hivemetastore/get_all_databases/mean | y |
hivemetastore:API:CreateTable:Mean | hivemetastore/create_table/mean | y |
hivemetastore:API:DropTable:Mean | hivemetastore/drop_table/mean | y |
hivemetastore:API:AlterTable:Mean | hivemetastore/alter_table/mean | y |
hivemetastore:API:GetTable:Mean | hivemetastore/get_table/mean | y |
hivemetastore:API:GetAllTables:Mean | hivemetastore/get_all_tables/mean | y |
hivemetastore:API:AddPartitionsReq:Mean | hivemetastore/add_partitions_req/mean | y |
hivemetastore:API:DropPartition:Mean | hivemetastore/drop_partition/mean | y |
hivemetastore:API:AlterPartition:Mean | hivemetastore/alter_partition/mean | y |
hivemetastore:API:GetPartition:Mean | hivemetastore/get_partition/mean | y |
hivemetastore:API:GetPartitionNames:Mean | hivemetastore/get_partition_names/mean | y |
hivemetastore:API:GetPartitionsPs:Mean | hivemetastore/get_partitions_ps/mean | y |
hivemetastore:API:GetPartitionsPsWithAuth:Mean | hivemetastore/get_partitions_ps_with_auth/mean | y |
Hive Metastore metric measures
Statistical measure | Sample metric | Sample metric name |
---|---|---|
Max | hivemetastore:API:GetDatabase:Max | hivemetastore/get_database/max |
Min | hivemetastore:API:GetDatabase:Min | hivemetastore/get_database/min |
Mean | hivemetastore:API:GetDatabase:Mean | hivemetastore/get_database/mean |
Count | hivemetastore:API:GetDatabase:Count | hivemetastore/get_database/count |
50th percentile | hivemetastore:API:GetDatabase:50thPercentile | hivemetastore/get_database/median |
75th percentile | hivemetastore:API:GetDatabase:75thPercentile | hivemetastore/get_database/75th_percentile |
95th percentile | hivemetastore:API:GetDatabase:95thPercentile | hivemetastore/get_database/95th_percentile |
98th percentile | hivemetastore:API:GetDatabase:98thPercentile | hivemetastore/get_database/98th_percentile |
99th percentile | hivemetastore:API:GetDatabase:99thPercentile | hivemetastore/get_database/99th_percentile |
999th percentile | hivemetastore:API:GetDatabase:999thPercentile | hivemetastore/get_database/999th_percentile |
StdDev | hivemetastore:API:GetDatabase:StdDev | hivemetastore/get_database/stddev |
FifteenMinuteRate | hivemetastore:API:GetDatabase:FifteenMinuteRate | hivemetastore/get_database/15min_rate |
FiveMinuteRate | hivemetastore:API:GetDatabase:FiveMinuteRate | hivemetastore/get_database/5min_rate |
OneMinuteRate | hivemetastore:API:GetDatabase:OneMinuteRate | hivemetastore/get_database/1min_rate |
MeanRate | hivemetastore:API:GetDatabase:MeanRate | hivemetastore/get_database/mean_rate |
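The relationship between a Hive Metastore override and its Metrics Explorer name appears to follow a regular pattern: the API call name is converted from CamelCase to snake_case and the statistical measure maps to a fixed suffix. The Python sketch below is inferred from the tables on this page rather than from a documented rule, covers only a subset of the measures, and uses hypothetical names throughout.

```python
import re

# Suffixes inferred from the sample rows above; only a subset is covered.
STAT_SUFFIX = {
    "Max": "max",
    "Min": "min",
    "Mean": "mean",
    "Count": "count",
    "50thPercentile": "median",
    "StdDev": "stddev",
    "FifteenMinuteRate": "15min_rate",
    "OneMinuteRate": "1min_rate",
    "MeanRate": "mean_rate",
}


def explorer_name(metric: str) -> str:
    """Map e.g. hivemetastore:API:GetDatabase:Mean to its Explorer name."""
    source, _group, call, stat = metric.split(":")
    # Insert "_" before each capital letter except the first, then lowercase.
    snake_call = re.sub(r"(?<!^)(?=[A-Z])", "_", call).lower()
    return f"{source}/{snake_call}/{STAT_SUFFIX[stat]}"


if __name__ == "__main__":
    print(explorer_name("hivemetastore:API:GetDatabase:Mean"))
```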
Dataproc monitoring agent metrics
Dataproc collects the following Dataproc monitoring agent metrics when you set --metric-sources=monitoring-agent-defaults.
These metrics are published with the agent.googleapis.com prefix.
CPU
agent.googleapis.com/cpu/load_15m
agent.googleapis.com/cpu/load_1m
agent.googleapis.com/cpu/load_5m
agent.googleapis.com/cpu/usage_time*
agent.googleapis.com/cpu/utilization*
Disk
agent.googleapis.com/disk/bytes_used
agent.googleapis.com/disk/io_time
agent.googleapis.com/disk/merged_operations
agent.googleapis.com/disk/operation_count
agent.googleapis.com/disk/operation_time
agent.googleapis.com/disk/pending_operations
agent.googleapis.com/disk/percent_used
agent.googleapis.com/disk/read_bytes_count
Swap
agent.googleapis.com/swap/bytes_used
agent.googleapis.com/swap/io
agent.googleapis.com/swap/percent_used
Memory
agent.googleapis.com/memory/bytes_used
agent.googleapis.com/memory/percent_used
Processes (follows a slightly different quota policy for some attributes)
agent.googleapis.com/processes/count_by_state
agent.googleapis.com/processes/cpu_time
agent.googleapis.com/processes/disk/read_bytes_count
agent.googleapis.com/processes/disk/write_bytes_count
agent.processogoogleapis.com/processes_usages/processes_googleapis.com/processes_usage.googleapis.pro
Interface
agent.googleapis.com/interface/errors
agent.googleapis.com/interface/packets
agent.googleapis.com/interface/traffic
Network
agent.googleapis.com/network/tcp_connections
Create a Monitoring dashboard
You can create a Monitoring dashboard that displays charts of selected Dataproc metrics.
Select + CREATE DASHBOARD on the Monitoring Dashboards Overview page. Enter a name for the dashboard, then click Add Chart in the top-right menu to open the Add Chart window. Select "Cloud Dataproc Cluster" as the resource type. Select one or more metrics, along with the metric and chart properties. Then save the chart.
You can add more charts to your dashboard. After you save the dashboard, its title appears on the Monitoring Dashboards Overview page. Dashboard charts can be viewed, updated, and deleted from the dashboard display page.
What's next
- See the Cloud Monitoring documentation
- Learn how to create alerts for Dataproc metrics