Professional Cloud DevOps Engineer

Certification exam guide

Professional Cloud DevOps Engineers implement processes throughout the systems development lifecycle using Google-recommended methodologies and tools. They build and deploy software and infrastructure delivery pipelines, optimize and maintain production systems and services, and balance service reliability with delivery speed.

Section 1: Bootstrapping a Google Cloud organization for DevOps (~17% of the exam)

1.1 Designing the overall resource hierarchy for an organization. Considerations include:

● Projects and folders

● Shared networking

● Identity and Access Management (IAM) roles and organization-level policies

● Creating and managing service accounts

1.2 Managing infrastructure as code. Considerations include:

● Infrastructure as code tooling (e.g., Cloud Foundation Toolkit, Config Connector, Terraform, Helm)

● Making infrastructure changes using Google-recommended practices and infrastructure as code blueprints

● Immutable architecture

1.3 Designing a CI/CD architecture stack in Google Cloud, hybrid, and multi-cloud environments. Considerations include:

● CI with Cloud Build

● CD with Google Cloud Deploy

● Widely used third-party tooling (e.g., Jenkins, Git, ArgoCD, Packer)

● Security of CI/CD tooling

1.4 Managing multiple environments (e.g., staging, production). Considerations include:

● Determining the number of environments and their purpose

● Creating environments dynamically for each feature branch with Google Kubernetes Engine (GKE) and Terraform

● Config Management

Section 2: Building and implementing CI/CD pipelines for a service (~23% of the exam)

2.1 Designing and managing CI/CD pipelines. Considerations include:

● Artifact management with Artifact Registry

● Deployment to hybrid and multi-cloud environments (e.g., Anthos, GKE)

● CI/CD pipeline triggers

● Testing a new application version in the pipeline

● Configuring deployment processes (e.g., approval flows)

● CI/CD of serverless applications

2.2 Implementing CI/CD pipelines. Considerations include:

● Auditing and tracking deployments (e.g., Artifact Registry, Cloud Build, Google Cloud Deploy, Cloud Audit Logs)

● Deployment strategies (e.g., canary, blue/green, rolling, traffic splitting)

● Rollback strategies

● Troubleshooting deployment issues

2.3 Managing CI/CD configuration and secrets. Considerations include:

● Secure storage methods and key rotation services (e.g., Cloud Key Management Service, Secret Manager)

● Secret management

● Build versus runtime secret injection

2.4 Securing the CI/CD deployment pipeline. Considerations include:

● Vulnerability analysis with Artifact Registry

● Binary Authorization

● IAM policies per environment

Section 3: Applying site reliability engineering practices to a service (~23% of the exam)

3.1 Balancing change, velocity, and reliability of the service. Considerations include:

● Discovering SLIs (e.g., availability, latency)

● Defining SLOs and understanding SLAs

● Error budgets

● Toil automation

● Opportunity cost of risk and reliability (e.g., number of “nines”)
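
The relationship between an SLO, its error budget, and the number of "nines" is simple arithmetic. The following is a minimal sketch, not an official formula from this guide: the function name error_budget and the 30-day window are assumptions chosen for illustration.

```python
from datetime import timedelta


def error_budget(slo: float, window: timedelta = timedelta(days=30)) -> timedelta:
    """Return the total downtime an availability SLO allows over a window.

    `slo` is a fraction (for example 0.999 for "three nines").
    The 30-day default window is an assumption for this sketch.
    """
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction strictly between 0 and 1")
    # The error budget is the complement of the SLO, applied to the window.
    return window * (1.0 - slo)


if __name__ == "__main__":
    for slo in (0.99, 0.999, 0.9999):
        minutes = error_budget(slo).total_seconds() / 60
        print(f"SLO {slo:.2%}: about {minutes:.1f} minutes of downtime per 30 days")
```

Under these assumptions, each additional nine cuts the budget by a factor of ten: a 99.9% SLO leaves roughly 43.2 minutes per 30 days, which illustrates why higher reliability targets carry a steep opportunity cost.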

3.2 Managing service lifecycle. Considerations include:

● Service management (e.g., introducing a new service by using a pre-service onboarding checklist, launch plan, or deployment plan; deployment; maintenance; and retirement)

● Capacity planning (e.g., quotas and limits management)

● Autoscaling using managed instance groups, Cloud Run, Cloud Functions, or GKE

● Implementing feedback loops to improve a service
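
As an illustration of how metric-based autoscaling reasons about capacity, the sketch below applies the proportional rule documented for the Kubernetes Horizontal Pod Autoscaler, which GKE uses. It is a simplified sketch: the function name desired_replicas is hypothetical, and the real autoscaler also applies a tolerance band, stabilization windows, and min/max replica bounds that are omitted here.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float) -> int:
    """Scale the replica count in proportion to how far the metric is from target.

    Simplified: ignores tolerance, stabilization, and min/max bounds.
    """
    if current_replicas < 1:
        raise ValueError("current_replicas must be at least 1")
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    # Round up so the workload is never left under-provisioned.
    return math.ceil(current_replicas * current_metric / target_metric)


if __name__ == "__main__":
    # 4 replicas averaging 90% CPU against a 60% target.
    print(desired_replicas(4, 90.0, 60.0))
```

Managed instance groups, Cloud Run, and Cloud Functions each use their own scaling signals, so this rule should be read as one example rather than a description of every platform.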

3.3 Ensuring healthy communication and collaboration for operations. Considerations include:

● Preventing burnout (e.g., setting up automation processes)

● Fostering a culture of learning and blamelessness

● Establishing joint ownership of services to eliminate team silos

3.4 Mitigating incident impact on users. Considerations include:

● Communicating during an incident

● Draining/redirecting traffic

● Adding capacity

3.5 Conducting a postmortem. Considerations include:

● Documenting root causes

● Creating and prioritizing action items

● Communicating the postmortem to stakeholders

Section 4: Implementing service monitoring strategies (~21% of the exam)

4.1 Managing logs. Considerations include:

● Collecting structured and unstructured logs from Compute Engine, GKE, and serverless platforms using Cloud Logging

● Configuring the Cloud Logging agent

● Collecting logs from outside Google Cloud

● Sending application logs directly to the Cloud Logging API

● Log levels (e.g., info, error, debug, fatal)

● Optimizing logs (e.g., multiline logging, exceptions, size, cost)
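
Structured logging, log levels, and multiline handling can be shown together in a short sketch. It assumes the workload runs on a platform whose logging agent parses JSON lines written to standard output (as GKE and Cloud Run are documented to do) and treats the "severity" and "message" fields specially. The helper names format_entry and log, and the subset of severities listed, are assumptions for illustration.

```python
import json
import sys
import traceback

# A subset of severity names; the full list is longer.
SEVERITIES = ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL")


def format_entry(severity: str, message: str, **fields) -> str:
    """Serialize one log entry as a single JSON line."""
    if severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {severity}")
    entry = {"severity": severity, "message": message, **fields}
    # json.dumps escapes newlines, so a multiline message stays one entry.
    return json.dumps(entry)


def log(severity: str, message: str, **fields) -> None:
    """Write the entry to stdout, where the platform agent can collect it."""
    print(format_entry(severity, message, **fields), file=sys.stdout, flush=True)


if __name__ == "__main__":
    log("INFO", "order accepted", order_id=1234)
    try:
        1 / 0
    except ZeroDivisionError:
        # The whole traceback travels in one entry instead of one per line.
        log("ERROR", traceback.format_exc(), order_id=1234)
```

Keeping each exception in a single JSON line avoids the common problem of a stack trace being split into many unrelated entries, and extra fields such as order_id become queryable without text parsing.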

4.2 Managing metrics with Cloud Monitoring. Considerations include:

● Collecting and analyzing application and platform metrics

● Collecting networking and service mesh metrics

● Using Metrics Explorer for ad hoc metric analysis

● Creating custom metrics from logs

4.3 Managing dashboards and alerts in Cloud Monitoring. Considerations include:

● Creating a monitoring dashboard

● Filtering and sharing dashboards

● Configuring alerting

● Defining alerting policies based on SLOs and SLIs

● Automating alerting policy definition using Terraform

● Using Google Cloud Managed Service for Prometheus to collect metrics and set up monitoring and alerting
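
Alerting policies based on SLOs are commonly expressed in terms of burn rate: how fast the error budget is being consumed relative to the rate that would exhaust it exactly at the end of the SLO window. The sketch below shows only the arithmetic. The function names and the threshold value are hypothetical, and it is not a Cloud Monitoring API call.

```python
def burn_rate(bad_events: int, total_events: int, slo: float) -> float:
    """Return the error-budget burn rate for an observation window.

    A value of 1.0 means the budget would be used up exactly at the end
    of the SLO period; higher values mean it runs out sooner.
    """
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction strictly between 0 and 1")
    error_rate = bad_events / total_events
    return error_rate / (1.0 - slo)


def should_alert(bad_events: int, total_events: int, slo: float, threshold: float) -> bool:
    """Fire when the burn rate exceeds a chosen threshold (an assumption here)."""
    return burn_rate(bad_events, total_events, slo) > threshold


if __name__ == "__main__":
    # 10 failed requests out of 1,000 against a 99.9% SLO.
    print(round(burn_rate(10, 1000, 0.999), 2))
    print(should_alert(10, 1000, 0.999, threshold=5.0))
```

Alerting on burn rate rather than on raw error counts ties the alert directly to the SLO, so the same policy remains meaningful as traffic volume changes.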

4.4 Managing the Cloud Logging platform. Considerations include:

● Enabling data access logs (e.g., Cloud Audit Logs)

● Enabling VPC Flow Logs

● Viewing logs in the Google Cloud console

● Using basic versus advanced log filters

● Logs exclusion versus logs export

● Project-level versus organization-level export

● Managing and viewing log exports

● Sending logs to an external logging platform

● Filtering and redacting sensitive data (e.g., personally identifiable information [PII], protected health information [PHI])
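
One way to keep sensitive data out of logs is to redact it in the application before the entry is written. The following is a minimal sketch using regular expressions. The two patterns (email addresses and US Social Security number shapes) and the placeholder strings are assumptions chosen for illustration; they are far from exhaustive, and a production system would more likely rely on a dedicated inspection service.

```python
import re

# Illustrative patterns only; real PII detection needs much broader coverage.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact(text: str) -> str:
    """Replace matched sensitive values with fixed placeholders."""
    text = EMAIL.sub("[REDACTED_EMAIL]", text)
    return SSN.sub("[REDACTED_SSN]", text)


if __name__ == "__main__":
    print(redact("signup by alice@example.com with id 123-45-6789"))
```

Redacting at the source complements exclusion filters on the logging side: data that is never emitted cannot be exported or retained by mistake.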

4.5 Implementing logging and monitoring access controls. Considerations include:

● Restricting access to audit logs and VPC Flow Logs with Cloud Logging

● Restricting export configuration with Cloud Logging

● Allowing metric and log writing with Cloud Monitoring

Section 5: Optimizing the service performance (~16% of the exam)

5.1 Identifying service performance issues. Considerations include:

● Using Google Cloud’s operations suite to identify cloud resource utilization

● Interpreting service mesh telemetry

● Troubleshooting issues with compute resources

● Troubleshooting deploy time and runtime issues with applications

● Troubleshooting network issues (e.g., VPC Flow Logs, firewall logs, latency, network details)

5.2 Implementing debugging tools in Google Cloud. Considerations include:

● Application instrumentation

● Cloud Logging

● Cloud Trace

● Error Reporting

● Cloud Profiler

● Cloud Monitoring

5.3 Optimizing resource utilization and costs. Considerations include:

● Preemptible/Spot virtual machines (VMs)

● Committed-use discounts (e.g., flexible, resource-based)

● Sustained-use discounts

● Network tiers

● Sizing recommendations
