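Deploy a highly available PostgreSQL database on GKE

PostgreSQL is an open source object-relational database known for reliability and data integrity. It is ACID-compliant and supports foreign keys, joins, views, triggers, and stored procedures.

This document is intended for database administrators, cloud architects, and operations professionals interested in deploying a highly available PostgreSQL topology on Google Kubernetes Engine (GKE).

Create your cluster infrastructure

In this section, you run a Terraform script to create a custom Virtual Private Cloud (VPC), an Artifact Registry repository to store PostgreSQL images, and two regional GKE clusters. One cluster is deployed in us-central1, and the second cluster, used for backup, is deployed in us-west1.

To create the clusters, complete the following steps:

Autopilot

In Cloud Shell, run the following commands: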

terraform -chdir=terraform/gke-autopilot init
terraform -chdir=terraform/gke-autopilot apply -var project_id=$PROJECT_ID

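When prompted, type yes.

Understand the Terraform configuration

The Terraform configuration files create the following resources to deploy your infrastructure:

Create an Artifact Registry repository to store the Docker images.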

resource "google_artifact_registry_repository" "main" {
  location      = "us"
  repository_id = "main"
  format        = "DOCKER"
  project       = var.project_id
}

Create the VPC network and subnets for the VMs' network interfaces.

module "gcp-network" {
  source  = "terraform-google-modules/network/google"
  version = "< 8.0.0"

  project_id   = var.project_id
  network_name = "vpc-gke-postgresql"

  subnets = [
    {
      subnet_name           = "snet-gke-postgresql-us-central1"
      subnet_ip             = "10.0.0.0/17"
      subnet_region         = "us-central1"
      subnet_private_access = true
    },
    {
      subnet_name           = "snet-gke-postgresql-us-west1"
      subnet_ip             = "10.0.128.0/17"
      subnet_region         = "us-west1"
      subnet_private_access = true
    },
  ]

  secondary_ranges = {
    ("snet-gke-postgresql-us-central1") = [
      {
        range_name    = "ip-range-pods-db1"
        ip_cidr_range = "192.168.0.0/18"
      },
      {
        range_name    = "ip-range-svc-db1"
        ip_cidr_range = "192.168.64.0/18"
      },
    ],
    ("snet-gke-postgresql-us-west1") = [
      {
        range_name    = "ip-range-pods-db2"
        ip_cidr_range = "192.168.128.0/18"
      },
      {
        range_name    = "ip-range-svc-db2"
        ip_cidr_range = "192.168.192.0/18"
      },
    ]
  }
}

output "network_name" {
  value = module.gcp-network.network_name
}

output "primary_subnet_name" {
  value = module.gcp-network.subnets_names[0]
}

output "secondary_subnet_name" {
  value = module.gcp-network.subnets_names[1]
}
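
The primary and secondary ranges in this configuration must not overlap one another. As a quick sanity check before applying, the following is a minimal Bash sketch that tests whether two IPv4 CIDR blocks overlap. It is not part of the tutorial's repository: the function names `ip_to_int`, `cidr_bounds`, and `cidrs_overlap` are hypothetical, and the sketch assumes each block's base address is already aligned to its prefix length.

```bash
#!/usr/bin/env bash

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Print "first last" (as integers) for a CIDR block such as 10.0.0.0/17.
# Assumption: the base address is aligned to the prefix length.
cidr_bounds() {
  local base=${1%/*} bits=${1#*/}
  local start size
  start=$(ip_to_int "$base")
  size=$(( 1 << (32 - bits) ))
  echo "$start $(( start + size - 1 ))"
}

# Return 0 if the two CIDR blocks overlap, 1 otherwise.
cidrs_overlap() {
  local s1 e1 s2 e2
  read -r s1 e1 <<< "$(cidr_bounds "$1")"
  read -r s2 e2 <<< "$(cidr_bounds "$2")"
  (( s1 <= e2 && s2 <= e1 ))
}

# Example: the two primary subnet ranges from the configuration above.
if cidrs_overlap "10.0.0.0/17" "10.0.128.0/17"; then
  echo "overlap"
else
  echo "no overlap"
fi
```

The same check could be applied to each pair of Pod and Service ranges, and to the two `master_ipv4_cidr_block` values used by the clusters.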

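Create a primary GKE cluster.

Terraform creates a private cluster in the us-central1 region, and enables Backup for GKE for disaster recovery and Managed Service for Prometheus for cluster monitoring.

Managed Service for Prometheus is only supported on Autopilot clusters running GKE version 1.25 or later.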

module "gke-db1-autopilot" {
  source                          = "../modules/beta-autopilot-private-cluster"
  project_id                      = var.project_id
  name                            = "cluster-db1"
  kubernetes_version              = "1.25" # Will be ignored if use "REGULAR" release_channel
  region                          = "us-central1"
  regional                        = true
  zones                           = ["us-central1-a", "us-central1-b", "us-central1-c"]
  network                         = module.network.network_name
  subnetwork                      = module.network.primary_subnet_name
  ip_range_pods                   = "ip-range-pods-db1"
  ip_range_services               = "ip-range-svc-db1"
  horizontal_pod_autoscaling      = true
  release_channel                 = "RAPID" # Default version is 1.22 in REGULAR. GMP on Autopilot requires V1.25 via var.kubernetes_version
  enable_vertical_pod_autoscaling = true
  enable_private_endpoint         = false
  enable_private_nodes            = true
  master_ipv4_cidr_block          = "172.16.0.0/28"
  create_service_account          = false
}

Create a backup cluster in the us-west1 region for disaster recovery.

module "gke-db2-autopilot" {
  source                          = "../modules/beta-autopilot-private-cluster"
  project_id                      = var.project_id
  name                            = "cluster-db2"
  kubernetes_version              = "1.25" # Will be ignored if use "REGULAR" release_channel
  region                          = "us-west1"
  regional                        = true
  zones                           = ["us-west1-a", "us-west1-b", "us-west1-c"]
  network                         = module.network.network_name
  subnetwork                      = module.network.secondary_subnet_name
  ip_range_pods                   = "ip-range-pods-db2"
  ip_range_services               = "ip-range-svc-db2"
  horizontal_pod_autoscaling      = true
  release_channel                 = "RAPID" # Default version is 1.22 in REGULAR. GMP on Autopilot requires V1.25 via var.kubernetes_version
  enable_vertical_pod_autoscaling = true
  enable_private_endpoint         = false
  enable_private_nodes            = true
  master_ipv4_cidr_block          = "172.16.0.16/28"
  create_service_account          = false
}

Standard

In Cloud Shell, run the following commands:

terraform -chdir=terraform/gke-standard init
terraform -chdir=terraform/gke-standard apply -var project_id=$PROJECT_ID

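When prompted, type yes.

Understand the Terraform configuration

The Terraform configuration files create the following resources to deploy your infrastructure:

Create an Artifact Registry repository to store the Docker images.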

resource "google_artifact_registry_repository" "main" {
  location      = "us"
  repository_id = "main"
  format        = "DOCKER"
  project       = var.project_id
}
resource "google_artifact_registry_repository_iam_binding" "binding" {
  provider   = google-beta
  project    = google_artifact_registry_repository.main.project
  location   = google_artifact_registry_repository.main.location
  repository = google_artifact_registry_repository.main.name
  role       = "roles/artifactregistry.reader"
  members = [
    "serviceAccount:${module.gke-db1.service_account}",
  ]
}

Create the VPC network and subnets for the VMs' network interfaces.

module "gcp-network" {
  source  = "terraform-google-modules/network/google"
  version = "< 8.0.0"

  project_id   = var.project_id
  network_name = "vpc-gke-postgresql"

  subnets = [
    {
      subnet_name           = "snet-gke-postgresql-us-central1"
      subnet_ip             = "10.0.0.0/17"
      subnet_region         = "us-central1"
      subnet_private_access = true
    },
    {
      subnet_name           = "snet-gke-postgresql-us-west1"
      subnet_ip             = "10.0.128.0/17"
      subnet_region         = "us-west1"
      subnet_private_access = true
    },
  ]

  secondary_ranges = {
    ("snet-gke-postgresql-us-central1") = [
      {
        range_name    = "ip-range-pods-db1"
        ip_cidr_range = "192.168.0.0/18"
      },
      {
        range_name    = "ip-range-svc-db1"
        ip_cidr_range = "192.168.64.0/18"
      },
    ],
    ("snet-gke-postgresql-us-west1") = [
      {
        range_name    = "ip-range-pods-db2"
        ip_cidr_range = "192.168.128.0/18"
      },
      {
        range_name    = "ip-range-svc-db2"
        ip_cidr_range = "192.168.192.0/18"
      },
    ]
  }
}

output "network_name" {
  value = module.gcp-network.network_name
}

output "primary_subnet_name" {
  value = module.gcp-network.subnets_names[0]
}

output "secondary_subnet_name" {
  value = module.gcp-network.subnets_names[1]
}

Create a primary GKE cluster.

Terraform creates a private cluster in the us-central1 region, and enables Backup for GKE for disaster recovery and Managed Service for Prometheus for cluster monitoring.

module "gke-db1" {
  source                   = "../modules/beta-private-cluster"
  project_id               = var.project_id
  name                     = "cluster-db1"
  regional                 = true
  region                   = "us-central1"
  network                  = module.network.network_name
  subnetwork               = module.network.primary_subnet_name
  ip_range_pods            = "ip-range-pods-db1"
  ip_range_services        = "ip-range-svc-db1"
  create_service_account   = true
  enable_private_endpoint  = false
  enable_private_nodes     = true
  master_ipv4_cidr_block   = "172.16.0.0/28"
  network_policy           = true
  cluster_autoscaling = {
    "autoscaling_profile": "OPTIMIZE_UTILIZATION",
    "enabled" : true,
    "gpu_resources" : [],
    "min_cpu_cores" : 36,
    "min_memory_gb" : 144,
    "max_cpu_cores" : 48,
    "max_memory_gb" : 192,
  }
  monitoring_enable_managed_prometheus = true
  gke_backup_agent_config = true

  node_pools = [
    {
      name            = "pool-sys"
      autoscaling     = true
      min_count       = 1
      max_count       = 3
      max_surge       = 1
      max_unavailable = 0
      machine_type    = "e2-standard-4"
      node_locations  = "us-central1-a,us-central1-b,us-central1-c"
      auto_repair     = true
    },
    {
      name            = "pool-db"
      autoscaling     = true
      max_surge       = 1
      max_unavailable = 0
      machine_type    = "e2-standard-8"
      node_locations  = "us-central1-a,us-central1-b,us-central1-c"
      auto_repair     = true
    },
  ]
  node_pools_labels = {
    all = {}
    pool-db = {
      "app.stateful/component" = "postgresql"
    }
    pool-sys = {
      "app.stateful/component" = "postgresql-pgpool"
    }
  }
  node_pools_taints = {
    all = []
    pool-db = [
      {
        key    = "app.stateful/component"
        value  = "postgresql"
        effect = "NO_SCHEDULE"
      },
    ],
    pool-sys = [
      {
        key    = "app.stateful/component"
        value  = "postgresql-pgpool"
        effect = "NO_SCHEDULE"
      },
    ],
  }
  gce_pd_csi_driver = true
}

Create a backup cluster in the us-west1 region for disaster recovery.

module "gke-db2" {
  source                   = "../modules/beta-private-cluster"
  project_id               = var.project_id
  name                     = "cluster-db2"
  regional                 = true
  region                   = "us-west1"
  network                  = module.network.network_name
  subnetwork               = module.network.secondary_subnet_name
  ip_range_pods            = "ip-range-pods-db2"
  ip_range_services        = "ip-range-svc-db2"
  create_service_account   = false
  service_account          = module.gke-db1.service_account
  enable_private_endpoint  = false
  enable_private_nodes     = true
  master_ipv4_cidr_block   = "172.16.0.16/28"
  network_policy           = true
  cluster_autoscaling = {
    "autoscaling_profile": "OPTIMIZE_UTILIZATION",
    "enabled" : true,
    "gpu_resources" : [],
    "min_cpu_cores" : 10,
    "min_memory_gb" : 144,
    "max_cpu_cores" : 48,
    "max_memory_gb" : 192,
  }
  monitoring_enable_managed_prometheus = true
  gke_backup_agent_config = true
  node_pools = [
    {
      name            = "pool-sys"
      autoscaling     = true
      min_count       = 1
      max_count       = 3
      max_surge       = 1
      max_unavailable = 0
      machine_type    = "e2-standard-4"
      node_locations  = "us-west1-a,us-west1-b,us-west1-c"
      auto_repair     = true
    },
    {
      name            = "pool-db"
      autoscaling     = true
      max_surge       = 1
      max_unavailable = 0
      machine_type    = "e2-standard-8"
      node_locations  = "us-west1-a,us-west1-b,us-west1-c"
      auto_repair     = true
    },
  ]
  node_pools_labels = {
    all = {}
    pool-db = {
      "app.stateful/component" = "postgresql"
    }
    pool-sys = {
      "app.stateful/component" = "postgresql-pgpool"
    }
  }
  node_pools_taints = {
    all = []
    pool-db = [
      {
        key    = "app.stateful/component"
        value  = "postgresql"
        effect = "NO_SCHEDULE"
      },
    ],
    pool-sys = [
      {
        key    = "app.stateful/component"
        value  = "postgresql-pgpool"
        effect = "NO_SCHEDULE"
      },
    ],
  }
  gce_pd_csi_driver = true
}

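Deploy PostgreSQL on your cluster

In this section, you use a Helm chart to deploy a PostgreSQL database instance that runs on GKE.

Install PostgreSQL

To install PostgreSQL on your cluster, complete the following steps.

Configure Docker access.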

gcloud auth configure-docker us-docker.pkg.dev

Populate Artifact Registry with the required PostgreSQL Docker images.
```
./scripts/gcr.sh bitnami/postgresql-repmgr 15.1.0-debian-11-r0
./scripts/gcr.sh bitnami/postgres-exporter 0.11.1-debian-11-r27
./scripts/gcr.sh bitnami/pgpool 4.3.3-debian-11-r28
```
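The script pushes the following Bitnami images to Artifact Registry for Helm to install:
- postgresql-repmgr: This PostgreSQL cluster solution includes the PostgreSQL replication manager (repmgr), an open source tool for managing replication and failover on PostgreSQL clusters.
- postgres-exporter: PostgreSQL Exporter gathers PostgreSQL metrics for Prometheus consumption.
- pgpool: Pgpool-II is the PostgreSQL proxy. It provides connection pooling and load balancing.

Verify that the correct images are stored in the repository.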

gcloud artifacts docker images list us-docker.pkg.dev/$PROJECT_ID/main \
    --format="flattened(package)"

The output is similar to the following:

---
image: us-docker.pkg.dev/[PROJECT_ID]/main/bitnami/pgpool
---
image: us-docker.pkg.dev/[PROJECT_ID]/main/bitnami/postgres-exporter
---
image: us-docker.pkg.dev/[PROJECT_ID]/main/bitnami/postgresql-repmgr

Configure kubectl command-line access to the primary cluster.

gcloud container clusters get-credentials $SOURCE_CLUSTER \
--location=$REGION --project=$PROJECT_ID

Create a namespace.

export NAMESPACE=postgresql
kubectl create namespace $NAMESPACE

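If you are deploying to an Autopilot cluster, configure node provisioning across three zones. You can skip this step if you are deploying to a Standard cluster.

By default, Autopilot provisions resources in only two zones. The Deployment defined in prepareforha.yaml ensures that Autopilot provisions nodes across three zones in your cluster by setting these values:

replicas: 3
podAntiAffinity with requiredDuringSchedulingIgnoredDuringExecution and topologyKey: "topology.kubernetes.io/zone"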

kubectl -n $NAMESPACE apply -f scripts/prepareforha.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: prepare-three-zone-ha
  labels:
    app: prepare-three-zone-ha
    app.kubernetes.io/name: postgresql-ha
spec:
  replicas: 3
  selector:
    matchLabels:
      app: prepare-three-zone-ha
      app.kubernetes.io/name: postgresql-ha
  template:
    metadata:
      labels:
        app: prepare-three-zone-ha
        app.kubernetes.io/name: postgresql-ha
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - prepare-three-zone-ha
            topologyKey: "topology.kubernetes.io/zone"
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - preference:
              matchExpressions:
              - key: cloud.google.com/compute-class
                operator: In
                values:
                - "Scale-Out"
            weight: 1
      nodeSelector:
        app.stateful/component: postgresql
      tolerations:
      - effect: NoSchedule
        key: app.stateful/component
        operator: Equal
        value: postgresql
      containers:
      - name: prepare-three-zone-ha
        image: busybox:latest
        command:
            - "/bin/sh"
            - "-c"
            - "while true; do sleep 3600; done"
        resources:
          limits:
            cpu: "500m"
            ephemeral-storage: "10Mi"
            memory: "0.5Gi"
          requests:
            cpu: "500m"
            ephemeral-storage: "10Mi"
            memory: "0.5Gi"
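
To check that the Deployment above actually caused nodes to be provisioned in three zones, you could count the distinct zones among the cluster's nodes. The following is a minimal Bash sketch, not part of the tutorial's repository. `count_zones` is a hypothetical helper, and it assumes its input is the output of `kubectl get nodes -L topology.kubernetes.io/zone --no-headers`, in which the zone is the last column of each line.

```bash
#!/usr/bin/env bash

# Count the distinct values found in the last column of the input.
# Assumption: every input line describes one node and ends with its zone,
# as in: kubectl get nodes -L topology.kubernetes.io/zone --no-headers
count_zones() {
  awk 'NF > 0 { zones[$NF] = 1 }
       END { n = 0; for (z in zones) n++; print n }'
}
```

For example, `kubectl get nodes -L topology.kubernetes.io/zone --no-headers | count_zones` would be expected to print 3 once nodes exist in all three zones. A node without the zone label would be miscounted, because its last column would then be a different field.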

Update the Helm dependencies.

cd helm/postgresql-bootstrap
helm dependency update

Inspect and verify the charts that Helm will install.

helm -n postgresql template postgresql . \
  --set global.imageRegistry="us-docker.pkg.dev/$PROJECT_ID/main"

Install the Helm chart.

helm -n postgresql upgrade --install postgresql . \
    --set global.imageRegistry="us-docker.pkg.dev/$PROJECT_ID/main"

The output is similar to the following:

NAMESPACE: postgresql
STATUS: deployed
REVISION: 1
TEST SUITE: None

Verify that the PostgreSQL replicas are running.

kubectl get all -n $NAMESPACE

The output is similar to the following:

NAME                                                          READY   STATUS    RESTARTS   AGE
pod/postgresql-postgresql-bootstrap-pgpool-75664444cb-dkl24   1/1     Running   0          8m39s
pod/postgresql-postgresql-ha-pgpool-6d86bf9b58-ff2bg          1/1     Running   0          8m39s
pod/postgresql-postgresql-ha-postgresql-0                     2/2     Running   0          8m39s
pod/postgresql-postgresql-ha-postgresql-1                     2/2     Running   0          8m39s
pod/postgresql-postgresql-ha-postgresql-2                     2/2     Running   0          8m38s

NAME                                                   TYPE        CLUSTER-IP        EXTERNAL-IP   PORT(S)    AGE
service/postgresql-postgresql-ha-pgpool                ClusterIP   192.168.99.236    <none>        5432/TCP   8m39s
service/postgresql-postgresql-ha-postgresql            ClusterIP   192.168.90.20     <none>        5432/TCP   8m39s
service/postgresql-postgresql-ha-postgresql-headless   ClusterIP   None              <none>        5432/TCP   8m39s
service/postgresql-postgresql-ha-postgresql-metrics    ClusterIP   192.168.127.198   <none>        9187/TCP   8m39s

NAME                                                     READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/postgresql-postgresql-bootstrap-pgpool   1/1     1            1           8m39s
deployment.apps/postgresql-postgresql-ha-pgpool          1/1     1            1           8m39s

NAME                                                                DESIRED   CURRENT   READY   AGE
replicaset.apps/postgresql-postgresql-bootstrap-pgpool-75664444cb   1         1         1       8m39s
replicaset.apps/postgresql-postgresql-ha-pgpool-6d86bf9b58          1         1         1       8m39s

NAME                                                   READY   AGE
statefulset.apps/postgresql-postgresql-ha-postgresql   3/3     8m39s

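Create a test dataset

In this section, you create a database and tables with sample values. The database serves as a test dataset for the failover process that you test later in this tutorial.

Connect to your PostgreSQL instance.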

cd ../../
./scripts/launch-client.sh

The output is similar to the following:

Launching Pod pg-client in the namespace postgresql ...
pod/pg-client created
waiting for the Pod to be ready
Copying script files to the target Pod pg-client ...
Pod: pg-client is healthy

Start a shell session.

kubectl exec -it pg-client -n postgresql -- /bin/bash

Create a database and tables, and then insert some test rows.

psql -h $HOST_PGPOOL -U postgres -a -q -f /tmp/scripts/generate-db.sql

Verify the number of rows in each table.
```
psql -h $HOST_PGPOOL -U postgres -a -q -f /tmp/scripts/count-rows.sql
```
The output is similar to the following:
```
select COUNT(*) from tb01;
 count
--------
 300000
(1 row)

select COUNT(*) from tb02;
 count
--------
 300000
(1 row)
```
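Tip: You can also use pgbench to create dummy data. However, to make query request traffic easier to distinguish, we recommend using the provided script to create the database and tables that you query during read/write testing.

Generate test data.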

export DB=postgres
pgbench -i -h $HOST_PGPOOL -U postgres $DB -s 50

The output is similar to the following:

dropping old tables...
creating tables...
generating data (client-side)...
5000000 of 5000000 tuples (100%) done (elapsed 29.85 s, remaining 0.00 s)
vacuuming...
creating primary keys...
done in 36.86 s (drop tables 0.00 s, create tables 0.01 s, client-side generate 31.10 s, vacuum 1.88 s, primary keys 3.86 s).
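
The tuple count in this output follows from the scale factor: pgbench creates 100,000 rows in its pgbench_accounts table per unit of scale, so `-s 50` yields 5,000,000 rows. A minimal Bash sketch of that arithmetic, with a hypothetical function name:

```bash
#!/usr/bin/env bash

# Rows that `pgbench -i -s <scale>` generates in pgbench_accounts
# (100,000 rows per unit of scale factor).
pgbench_account_rows() {
  local scale=$1
  echo $(( scale * 100000 ))
}

pgbench_account_rows 50   # prints 5000000
```

This can help when choosing a scale factor that produces enough load for a test without exceeding the available disk space.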

Exit the postgres client Pod.
```
exit
```

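Monitor PostgreSQL

In this section, you view metrics and set up alerts for your PostgreSQL instance. You use Google Cloud Managed Service for Prometheus to perform monitoring and alerting.

View metrics

Your PostgreSQL deployment includes a postgresql-exporter sidecar container. This container exposes a /metrics endpoint. Google Cloud Managed Service for Prometheus is configured to monitor the PostgreSQL Pods on this endpoint. You can view these metrics through Google Cloud console dashboards.

The Google Cloud console provides a few ways to create and save dashboard configurations:

Creation and export: You can create dashboards directly in the Google Cloud console, then export and store them in a code repository. To do this, open the JSON editor in the dashboard toolbar and download the dashboard JSON file.
Storage and import: You can import a dashboard from a JSON file by clicking +Create Dashboard and uploading the dashboard's JSON content through the JSON editor menu.

To visualize data from your PostgreSQL application and GKE cluster, follow these steps:

Create the following dashboards.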

cd monitoring
gcloud monitoring dashboards create \
        --config-from-file=dashboard/postgresql-overview.json \
        --project=$PROJECT_ID
gcloud monitoring dashboards create \
        --config-from-file dashboard/gke-postgresql.json \
        --project $PROJECT_ID

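In the Google Cloud console, go to the Cloud Monitoring Dashboards page.
Select Custom from the dashboard list. The following dashboards appear:
- PostgreSQL Overview: Displays metrics from the PostgreSQL application, including database uptime, database size, and transaction latency.
- GKE PostgreSQL Cluster: Displays metrics from the GKE cluster that PostgreSQL runs on, including CPU usage, memory usage, and volume utilization.
Click each link to examine the generated dashboards.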

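Set up alerts

Alerting gives you timely awareness of problems in your applications so that you can resolve them quickly. You can create an alerting policy to specify when you want to be alerted and how you want to be notified. You can also create notification channels to select where alerts are sent.

In this section, you use Terraform to configure the following example alerts:

db_max_transaction: Monitors the maximum lag of transactions, in seconds. An alert is triggered if the value is greater than 10.
db_node_up: Monitors the status of the database Pods. A value of 0 means that a Pod is down, which triggers an alert.

To set up alerts, follow these steps:

Configure alerts with Terraform.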

EMAIL=YOUR_EMAIL
cd alerting/terraform
terraform init
terraform plan -var project_id=$PROJECT_ID -var email_address=$EMAIL
terraform apply -var project_id=$PROJECT_ID -var email_address=$EMAIL

Replace the following values:

YOUR_EMAIL: your email address.

The output is similar to the following:

Apply complete! Resources: 3 added, 0 changed, 0 destroyed.

Connect to the client Pod.

cd ../../../
kubectl exec -it --namespace postgresql pg-client -- /bin/bash

Generate a load test to test the db_max_transaction alert.

pgbench -i -h $HOST_PGPOOL -U postgres -s 200 postgres

The output is similar to the following:

dropping old tables...
creating tables...
generating data (client-side)...
20000000 of 20000000 tuples (100%) done (elapsed 163.22 s, remaining 0.00 s)
vacuuming...
creating primary keys...
done in 191.30 s (drop tables 0.14 s, create tables 0.01 s, client-side generate 165.62 s, vacuum 4.52 s, primary keys 21.00 s).

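The alert triggers and sends an email to YOUR_EMAIL with a subject line that starts with "[ALERT] Max Lag of transaction".

In the Google Cloud console, go to the Alert Policy page.

Select db_max_transaction from the listed policies. In the chart, you should see a spike from the load test that exceeds the threshold of 10 for the Prometheus metric pg_stat_activity_max_tx_duration/gauge.
Exit the postgres client Pod.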
```
exit
```

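Manage PostgreSQL and GKE upgrades

Version updates for both PostgreSQL and Kubernetes are released on a regular schedule. Follow operational best practices to update your software environment regularly. By default, GKE manages cluster and node pool upgrades for you.

Upgrade PostgreSQL

This section shows how to perform a version upgrade for PostgreSQL. In this tutorial, you use a rolling update strategy to upgrade your Pods, so that at no point are all of the Pods down.

To perform a version upgrade, follow these steps:

Push an updated version of the postgresql-repmgr image to Artifact Registry. Define the new version (for example, postgresql-repmgr 15.1.0-debian-11-r1).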

NEW_IMAGE=us-docker.pkg.dev/$PROJECT_ID/main/bitnami/postgresql-repmgr:15.1.0-debian-11-r1
./scripts/gcr.sh bitnami/postgresql-repmgr 15.1.0-debian-11-r1

Trigger a rolling update by using kubectl.

kubectl set image statefulset -n postgresql postgresql-postgresql-ha-postgresql postgresql=$NEW_IMAGE
kubectl rollout restart statefulsets -n postgresql postgresql-postgresql-ha-postgresql
kubectl rollout status statefulset -n postgresql postgresql-postgresql-ha-postgresql

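You see the StatefulSet complete a rolling update, starting with the highest-ordinal replica and proceeding to the lowest.

The output is similar to the following: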

Waiting for 1 pods to be ready...
waiting for statefulset rolling update to complete 1 pods at revision postgresql-postgresql-ha-postgresql-5c566ccf49...
Waiting for 1 pods to be ready...
Waiting for 1 pods to be ready...
waiting for statefulset rolling update to complete 2 pods at revision postgresql-postgresql-ha-postgresql-5c566ccf49...
Waiting for 1 pods to be ready...
Waiting for 1 pods to be ready...
statefulset rolling update complete 3 pods at revision postgresql-postgresql-ha-postgresql-5c566ccf49...
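
To make the update order concrete, the following minimal Bash sketch prints the Pod names in the order a rolling update replaces them, from the highest ordinal to the lowest. `rolling_update_order` is a hypothetical helper written for illustration, and it assumes the StatefulSet's Pods are named `<statefulset-name>-<ordinal>` with ordinals starting at 0.

```bash
#!/usr/bin/env bash

# Print the Pod names of a StatefulSet in the order a rolling update
# replaces them: highest ordinal first, lowest ordinal last.
rolling_update_order() {
  local name=$1 replicas=$2 i
  for (( i = replicas - 1; i >= 0; i-- )); do
    echo "${name}-${i}"
  done
}

# Example: the three-replica StatefulSet from this tutorial.
rolling_update_order postgresql-postgresql-ha-postgresql 3
```

For three replicas, this lists the Pod ending in -2 first and the Pod ending in -0 last, which is the order to expect when watching the rollout.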

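Plan for GKE upgrades on Standard clusters

This section applies if you are running Standard clusters. When you run stateful services, you can take proactive steps and set configurations to mitigate risk and make cluster upgrades smoother, including the following:

Follow GKE best practices for upgrading clusters. Choose an appropriate upgrade strategy to ensure that upgrades happen during the maintenance window:
- Choose surge upgrades if cost optimization is important and your workloads can tolerate a graceful shutdown in less than 60 minutes.
- Choose blue-green upgrades if your workloads are less tolerant of disruptions and a temporary cost increase due to higher resource usage is acceptable.
To learn more, see "Upgrade a cluster running a stateful workload".
Use the Recommender service to check for deprecation insights and recommendations to avoid service interruptions.
Use maintenance windows to ensure that upgrades happen when you intend them to. Before the maintenance window, ensure that your database backups are successful.
Before allowing traffic to the upgraded nodes, use readiness and liveness probes to ensure that they are ready for traffic.
Create probes that assess whether replication is in sync before accepting traffic. Depending on the complexity and scale of your database, you can do this through custom scripts.

Verify database availability during Standard cluster upgrades

This section applies if you are running Standard clusters. To verify PostgreSQL availability during upgrades, the general process is to generate traffic against the PostgreSQL database during the upgrade. You then use pgbench to check that the database can handle a baseline level of traffic during the upgrade, compared to when the database is fully available.

Connect to your PostgreSQL instance.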

./scripts/launch-client.sh

The output is similar to the following:

Launching Pod pg-client in the namespace postgresql ...
pod/pg-client created
waiting for the Pod to be ready
Copying script files to the target Pod pg-client ...
Pod: pg-client is healthy

In Cloud Shell, open a shell in the client Pod.

kubectl exec -it -n postgresql pg-client -- /bin/bash

Initialize pgbench.

pgbench -i -h $HOST_PGPOOL -U postgres postgres

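Use the following command to get baseline results for confirming that your PostgreSQL application stays highly available during the upgrade window. To get a baseline result, test with multiple connections through multiple jobs (threads) for 30 seconds.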

pgbench -h $HOST_PGPOOL -U postgres postgres -c10 -j4 -T 30 -R 200

The output is similar to the following:

pgbench (14.5)
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 4
duration: 30 s
number of transactions actually processed: 5980
latency average = 7.613 ms
latency stddev = 2.898 ms
rate limit schedule lag: avg 0.256 (max 36.613) ms
initial connection time = 397.804 ms
tps = 201.955497 (without initial connection time)

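To check availability during upgrades, you can generate some load against your database and confirm that the PostgreSQL application provides a consistent response rate throughout the upgrade. To perform this test, use the pgbench command to generate traffic against the database. The following command runs pgbench for one hour, targeting 200 TPS (transactions per second), and reports the request rate every 2 seconds.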
```
pgbench -h $HOST_PGPOOL -U postgres postgres --client=10 --jobs=4 --rate=200 --time=3600 --progress=2 --select-only
```
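Where:
- --client: Number of clients simulated, that is, the number of concurrent database sessions.
- --jobs: Number of worker threads within pgbench. Using more than one thread can be helpful on multi-CPU machines. Clients are distributed as evenly as possible among the available threads. The default is 1.
- --rate: The target rate, given in transactions per second.
- --time: How long to run the test, in seconds.
- --progress: Show a progress report at the given interval, in seconds.
- --select-only: Run the built-in select-only (read-only) script.
The output is similar to the following: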
```
pgbench (14.5)
starting vacuum...end.
progress: 5.0 s, 354.8 tps, lat 25.222 ms stddev 15.038
progress: 10.0 s, 393.8 tps, lat 25.396 ms stddev 16.459
progress: 15.0 s, 412.8 tps, lat 24.216 ms stddev 14.548
progress: 20.0 s, 405.0 tps, lat 24.656 ms stddev 14.066
```
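
To spot dips in throughput during an upgrade without reading every progress line, you could filter the pgbench output. The following is a minimal Bash sketch, not part of the tutorial's scripts. `low_tps_intervals` is a hypothetical helper that assumes progress lines have the format shown in the sample output above, and the minimum TPS value is one you choose yourself.

```bash
#!/usr/bin/env bash

# Read pgbench output on stdin and print "<elapsed seconds> <tps>" for each
# progress line whose throughput is below the given minimum.
# Assumption: progress lines look like
#   progress: 5.0 s, 354.8 tps, lat 25.222 ms stddev 15.038
low_tps_intervals() {
  local min_tps=$1
  awk -v min="$min_tps" '
    $1 == "progress:" && $5 == "tps," && ($4 + 0) < (min + 0) { print $2, $4 }
  '
}
```

A possible use is to append `2>&1 | low_tps_intervals 150` to the pgbench command. The `2>&1` is included on the assumption that pgbench writes its progress reports to standard error. An empty result would then mean that no reporting interval fell below 150 TPS.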
In the Google Cloud console, navigate back to the PostgreSQL Overview dashboard in Cloud Monitoring. Notice the spike on the Connection per DB and Connection per Pod graphs.
Exit the client Pod.
```
exit
```

Delete the client Pod.

kubectl delete pod -n postgresql pg-client

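Simulate a PostgreSQL service disruption

In this section, you simulate a service disruption in one of the PostgreSQL replicas by stopping the replication manager service. This prevents the Pod from serving traffic to its peer replicas and causes its liveness probes to fail.

Open a new Cloud Shell session and configure kubectl command-line access to the primary cluster.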

gcloud container clusters get-credentials $SOURCE_CLUSTER \
--location=$REGION --project=$PROJECT_ID

View the PostgreSQL events emitted in Kubernetes.

kubectl get events -n postgresql --field-selector=involvedObject.name=postgresql-postgresql-ha-postgresql-0 --watch

In the earlier Cloud Shell session, simulate a service failure by stopping PostgreSQL repmgr.

Attach your session to the database container.

kubectl exec -it -n $NAMESPACE postgresql-postgresql-ha-postgresql-0 -c postgresql -- /bin/bash

Stop the service by using repmgr. Run the command with the --checkpoint argument and without the --dry-run argument.

export ENTRY='/opt/bitnami/scripts/postgresql-repmgr/entrypoint.sh'
export RCONF='/opt/bitnami/repmgr/conf/repmgr.conf'
$ENTRY repmgr -f $RCONF node service --action=stop --checkpoint

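The liveness probe configured for the PostgreSQL container starts to fail within five seconds. The probe repeats every ten seconds until the failure threshold of six failures is reached. Once the failureThreshold value is reached, the container is restarted. You can configure these parameters to decrease the liveness probe tolerance and tune it to the SLO requirements of your deployment.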
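
As a rough illustration of how these parameters interact, the following minimal Bash sketch estimates the window between a container becoming unhealthy and the threshold being reached, for a given probe period and failure threshold. `restart_window` is a hypothetical helper. The estimate assumes probes fail immediately once the service is down, and it ignores probe timeouts and the time the restart itself takes.

```bash
#!/usr/bin/env bash

# Print "<min> <max>": the approximate number of seconds between a container
# becoming unhealthy and its liveness probe reaching failureThreshold.
# The first failed probe happens between 0 and <period> seconds after the
# failure, and (threshold - 1) more failed probes follow, one per period.
restart_window() {
  local period=$1 threshold=$2
  echo "$(( (threshold - 1) * period )) $(( threshold * period ))"
}

# Example: a probe every 10 seconds with a failure threshold of 6.
restart_window 10 6   # prints "50 60"
```

Under these assumptions, halving the failure threshold to 3 would shorten the window to roughly 20 to 30 seconds, at the cost of restarting the container on shorter transient failures.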

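In the event stream, you see the Pod's liveness and readiness probes fail, and a message that the container needs to be restarted. The output is similar to the following: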

0s          Normal    Killing                pod/postgresql-postgresql-ha-postgresql-0   Container postgresql failed liveness probe, will be restarted
0s          Warning   Unhealthy              pod/postgresql-postgresql-ha-postgresql-0   Readiness probe failed: psql: error: connection to server at "127.0.0.1", port 5432 failed: Connection refused...
0s          Normal    Pulled                 pod/postgresql-postgresql-ha-postgresql-0   Container image "us-docker.pkg.dev/psch-gke-dev/main/bitnami/postgresql-repmgr:14.5.0-debian-11-r10" already present on machine
0s          Normal    Created                pod/postgresql-postgresql-ha-postgresql-0   Created container postgresql
0s          Normal    Started                pod/postgresql-postgresql-ha-postgresql-0   Started container postgresql

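Prepare for disaster recovery

To ensure that your production workloads remain available during a service-interrupting event, you should prepare a disaster recovery (DR) plan. To learn more about DR planning, see the "Disaster recovery planning guide".

Disaster recovery for Kubernetes can be implemented in two phases:

Backup involves creating a point-in-time snapshot of your state or data before a service-interrupting event occurs.
Recovery involves restoring your state or data from a backup copy after a disaster occurs.

To back up and restore workloads on GKE clusters, you can use Backup for GKE. You can enable this service on new and existing clusters. Enabling it deploys a Backup for GKE agent that runs in your clusters. The agent is responsible for capturing configuration and volume backup data and for orchestrating recovery.

Backups and restores can be scoped to an entire cluster, a namespace, or an application (defined by selectors such as matchLabels).

Example PostgreSQL backup and restore scenario

The example in this section shows how to perform a backup and restore operation at the application scope by using the ProtectedApplication custom resource.

The following diagram shows the component resources in the ProtectedApplication, namely a StatefulSet that represents the postgresql-ha application and a Deployment of pgpool, both of which use the same label (app.kubernetes.io/name: postgresql-ha).

**Figure 2**: Example backup and recovery solution for a highly available PostgreSQL cluster.

To prepare to back up and restore your PostgreSQL workload, follow these steps:

Set up the environment variables. In this example, you use a ProtectedApplication to back up the PostgreSQL workload and its volumes from the source GKE cluster (us-central1), and then restore them to another GKE cluster in a different region (us-west1).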

export SOURCE_CLUSTER=cluster-db1
export TARGET_CLUSTER=cluster-db2
export REGION=us-central1
export DR_REGION=us-west1
export NAME_PREFIX=g-db-protected-app
export BACKUP_PLAN_NAME=$NAME_PREFIX-bkp-plan-01
export BACKUP_NAME=bkp-$BACKUP_PLAN_NAME
export RESTORE_PLAN_NAME=$NAME_PREFIX-rest-plan-01
export RESTORE_NAME=rest-$RESTORE_PLAN_NAME
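
Because these variables are derived from one another, it can help to see what they expand to. With the values above, BACKUP_PLAN_NAME becomes g-db-protected-app-bkp-plan-01 and BACKUP_NAME becomes bkp-g-db-protected-app-bkp-plan-01. The following minimal Bash sketch builds the full backup resource name in the form that the restore command later in this tutorial passes to --backup. `backup_resource_name` is a hypothetical helper, and my-project is a placeholder project ID.

```bash
#!/usr/bin/env bash

# Build the full resource name of a backup from its parts:
# project ID, backup plan location, backup plan name, and backup name.
backup_resource_name() {
  local project=$1 location=$2 plan=$3 backup=$4
  echo "projects/${project}/locations/${location}/backupPlans/${plan}/backups/${backup}"
}

# Example with the names derived from NAME_PREFIX=g-db-protected-app.
# "my-project" is a placeholder, not a real project ID.
backup_resource_name my-project us-west1 \
  g-db-protected-app-bkp-plan-01 bkp-g-db-protected-app-bkp-plan-01
```

Note that the location in this name is the region where the backup plan is stored (DR_REGION in this tutorial), not the region of the source cluster.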

Verify that Backup for GKE is enabled on your cluster. It should already be enabled as part of the Terraform setup that you performed earlier.

gcloud container clusters describe $SOURCE_CLUSTER \
    --project=$PROJECT_ID  \
    --location=$REGION \
    --format='value(addonsConfig.gkeBackupAgentConfig)'

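If Backup for GKE is enabled, the output of the command shows enabled=True.

Set up a backup plan and perform a restore

Backup for GKE lets you create a backup plan that runs on a cron schedule. A backup plan contains a backup configuration, including the source cluster, the selection of workloads to back up, and the region in which the backup artifacts produced under this plan are stored.

To perform a backup and restore, follow these steps:

Verify the status of ProtectedApplication on cluster-db1.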

kubectl get ProtectedApplication -A

The output is similar to the following:

NAMESPACE    NAME            READY TO BACKUP
postgresql   postgresql-ha   true

Create a backup plan for the ProtectedApplication.

export NAMESPACE=postgresql
export PROTECTED_APP=$(kubectl get ProtectedApplication -n $NAMESPACE | grep -v 'NAME' | awk '{ print $1 }')

gcloud beta container backup-restore backup-plans create $BACKUP_PLAN_NAME \
--project=$PROJECT_ID \
--location=$DR_REGION \
--cluster=projects/$PROJECT_ID/locations/$REGION/clusters/$SOURCE_CLUSTER \
--selected-applications=$NAMESPACE/$PROTECTED_APP \
--include-secrets \
--include-volume-data \
--cron-schedule="0 3 * * *" \
--backup-retain-days=7 \
--backup-delete-lock-days=0
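
The --cron-schedule value uses the standard five-field cron format. As an illustration, the following minimal Bash sketch splits the expression into labelled fields. `describe_cron` is a hypothetical helper written for this example, not part of gcloud.

```bash
#!/usr/bin/env bash

# Split a five-field cron expression into its labelled fields.
describe_cron() {
  local minute hour dom month dow
  read -r minute hour dom month dow <<< "$1"
  echo "minute=$minute hour=$hour day-of-month=$dom month=$month day-of-week=$dow"
}

# Example: the schedule used by the backup plan above.
describe_cron "0 3 * * *"
```

For "0 3 * * *", the minute is 0, the hour is 3, and the remaining fields are wildcards, so the plan runs once a day at 03:00. Combined with --backup-retain-days=7, roughly a week of daily backups would be retained at any time.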

Manually create a backup.

gcloud beta container backup-restore backups create $BACKUP_NAME \
--project=$PROJECT_ID \
--location=$DR_REGION \
--backup-plan=$BACKUP_PLAN_NAME \
--wait-for-completion

Set up a restore plan.

gcloud beta container backup-restore restore-plans create $RESTORE_PLAN_NAME \
  --project=$PROJECT_ID \
  --location=$DR_REGION \
  --backup-plan=projects/$PROJECT_ID/locations/$DR_REGION/backupPlans/$BACKUP_PLAN_NAME \
  --cluster=projects/$PROJECT_ID/locations/$DR_REGION/clusters/$TARGET_CLUSTER \
  --cluster-resource-conflict-policy=use-existing-version \
  --namespaced-resource-restore-mode=delete-and-restore \
  --volume-data-restore-policy=restore-volume-data-from-backup \
  --selected-applications=$NAMESPACE/$PROTECTED_APP \
  --cluster-resource-scope-selected-group-kinds="storage.k8s.io/StorageClass","scheduling.k8s.io/PriorityClass"

Restore from the backup.

gcloud beta container backup-restore restores create $RESTORE_NAME \
  --project=$PROJECT_ID \
  --location=$DR_REGION \
  --restore-plan=$RESTORE_PLAN_NAME \
  --backup=projects/$PROJECT_ID/locations/$DR_REGION/backupPlans/$BACKUP_PLAN_NAME/backups/$BACKUP_NAME \
  --wait-for-completion

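Verify that your cluster is restored

To verify that the restored cluster contains all the expected Pod, PersistentVolume, and StorageClass resources, follow these steps:

Configure kubectl command-line access to the backup cluster cluster-db2.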

gcloud container clusters get-credentials $TARGET_CLUSTER --location $DR_REGION --project $PROJECT_ID

Verify that the StatefulSet is ready with 3/3 Pods.

kubectl get all -n $NAMESPACE

The output is similar to the following:

NAME                                                   READY   STATUS    RESTARTS        AGE
pod/postgresql-postgresql-ha-pgpool-778798b5bd-k2q4b   1/1     Running   0               4m49s
pod/postgresql-postgresql-ha-postgresql-0              2/2     Running   2 (4m13s ago)   4m49s
pod/postgresql-postgresql-ha-postgresql-1              2/2     Running   0               4m49s
pod/postgresql-postgresql-ha-postgresql-2              2/2     Running   0               4m49s

NAME                                                   TYPE        CLUSTER-IP        EXTERNAL-IP   PORT(S)    AGE
service/postgresql-postgresql-ha-pgpool                ClusterIP   192.168.241.46    <none>        5432/TCP   4m49s
service/postgresql-postgresql-ha-postgresql            ClusterIP   192.168.220.20    <none>        5432/TCP   4m49s
service/postgresql-postgresql-ha-postgresql-headless   ClusterIP   None              <none>        5432/TCP   4m49s
service/postgresql-postgresql-ha-postgresql-metrics    ClusterIP   192.168.226.235   <none>        9187/TCP   4m49s

NAME                                              READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/postgresql-postgresql-ha-pgpool   1/1     1            1           4m49s

NAME                                                         DESIRED   CURRENT   READY   AGE
replicaset.apps/postgresql-postgresql-ha-pgpool-778798b5bd   1         1         1       4m49s

NAME                                                   READY   AGE
statefulset.apps/postgresql-postgresql-ha-postgresql   3/3     4m49s

Verify that all Pods in the postgresql namespace are running.

kubectl get pods -n $NAMESPACE

The output is similar to the following:

postgresql-postgresql-ha-pgpool-569d7b8dfc-2f9zx   1/1     Running   0          7m56s
postgresql-postgresql-ha-postgresql-0              2/2     Running   0          7m56s
postgresql-postgresql-ha-postgresql-1              2/2     Running   0          7m56s
postgresql-postgresql-ha-postgresql-2              2/2     Running   0          7m56s

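Verify the PersistentVolumes and StorageClass. During the restore process, Backup for GKE creates a proxy class in the target workload to replace the StorageClass provisioned in the source workload (gce-pd-gkebackup-dn in the example output).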

kubectl get pvc -n $NAMESPACE

The output is similar to the following:

NAME                                         STATUS   VOLUME                 CAPACITY   ACCESS MODES   STORAGECLASS          AGE
data-postgresql-postgresql-ha-postgresql-0   Bound    pvc-be91c361e9303f96   8Gi        RWO            gce-pd-gkebackup-dn   10m
data-postgresql-postgresql-ha-postgresql-1   Bound    pvc-6523044f8ce927d3   8Gi        RWO            gce-pd-gkebackup-dn   10m
data-postgresql-postgresql-ha-postgresql-2   Bound    pvc-c9e71a99ccb99a4c   8Gi        RWO            gce-pd-gkebackup-dn   10m

Validate that the data is restored as expected

To validate that the data is restored as expected, follow these steps:

Connect to your PostgreSQL instance.

./scripts/launch-client.sh
kubectl exec -it pg-client -n postgresql -- /bin/bash

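Verify the number of rows in each table.
```
psql -h $HOST_PGPOOL -U postgres -a -q -f /tmp/scripts/count-rows.sql
```
You should see results that match the data you wrote earlier in "Create a test dataset". The output is similar to the following:
```
select COUNT(*) from tb01;
300000
(1 row)
```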
Exit the client Pod.
```
exit
```
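
To compare row counts before the backup and after the restore without reading the psql output by eye, you could extract just the numbers. The following is a minimal Bash sketch, not part of the tutorial's scripts. `extract_counts` is a hypothetical helper that assumes the psql output format shown in "Create a test dataset", where each count appears alone on its own line.

```bash
#!/usr/bin/env bash

# Read psql output on stdin and print only the lines that consist of a
# single non-negative integer, which are the COUNT(*) results.
extract_counts() {
  awk 'NF == 1 && $1 ~ /^[0-9]+$/ { print $1 }'
}
```

For example, if you save the count-rows.sql output from the source cluster and from the restored cluster into two files (the file names are up to you), running `extract_counts` on each and comparing the results with diff shows whether every table has the same number of rows in both clusters.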