Disable soft delete

This page describes how to disable soft delete for new and existing buckets in your organization.

Soft delete is enabled on new buckets by default to prevent data loss. If needed, you can disable soft delete for existing buckets by modifying the soft delete policy, and you can disable soft delete by default for new buckets by setting an organization-wide default tag. Note that after you disable soft delete, deleted data can't be recovered, including data that was deleted accidentally or maliciously.
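
Before you change anything, you can inspect a bucket's current soft delete configuration. The following is a minimal check using the gcloud storage buckets describe command; the --format projection only narrows the output, and a retentionDurationSeconds of 0 means soft delete is already disabled:

gcloud storage buckets describe gs://BUCKET_NAME --format="default(softDeletePolicy)"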

Required roles

To get the permissions that you need to disable soft delete, ask your administrator to grant you the following IAM roles at the organization level:

  • Storage Admin (roles/storage.admin)
  • Tag Admin (roles/resourcemanager.tagAdmin)

These predefined roles contain the permissions required to disable soft delete. To see the exact permissions that are required, expand the Required permissions section:

Required permissions

To disable soft delete, you need the following permissions:

  • storage.buckets.get
  • storage.buckets.update
  • storage.buckets.list (only needed if you plan to use the Google Cloud console to follow the instructions on this page)

    For information about the permissions included in the Tag Admin (roles/resourcemanager.tagAdmin) role, see Required permissions for administering tags.

For information about granting roles, see Use IAM with buckets or Manage access to projects.

Disable soft delete for a specific bucket

Before you begin, consider the following:

  • If you disable a soft delete policy while your bucket contains soft-deleted objects, those objects are retained until their previously applied retention duration expires (one way to check for such objects is shown in the sketch after this list).

  • After you disable a soft delete policy on a bucket, Cloud Storage doesn't retain newly deleted objects.
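
To check whether a bucket currently holds soft-deleted objects before you disable the policy, one option is the gcloud storage ls command with the --soft-deleted flag (a minimal sketch, assuming a gcloud CLI version recent enough to support this flag):

gcloud storage ls --soft-deleted gs://BUCKET_NAME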

To disable soft delete for a specific bucket, follow these instructions:

Console

  1. In the Google Cloud console, go to the Cloud Storage Buckets page.

    Go to Buckets

  2. In the list of buckets, click the name of the bucket whose soft delete policy you want to disable.

  3. Click the Protection tab.

  4. In the Soft delete policy section, click Disable to disable the soft delete policy.

  5. Click Confirm.

To learn how to get detailed error information about failed Cloud Storage operations in the Google Cloud console, see Troubleshooting.

Command line

Run the gcloud storage buckets update command with the --clear-soft-delete flag:

gcloud storage buckets update --clear-soft-delete gs://BUCKET_NAME

Where:

  • BUCKET_NAME is the name of the bucket. For example, my-bucket.
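
The update command accepts more than one bucket URL in a single invocation, so as a sketch you can clear the policy on several named buckets at once (both bucket names below are placeholders):

gcloud storage buckets update --clear-soft-delete gs://BUCKET_NAME_1 gs://BUCKET_NAME_2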

REST API

JSON API

  1. Have the gcloud CLI installed and initialized, which lets you generate an access token for the Authorization header.

  2. Create a JSON file that contains the following information:

    {
      "softDeletePolicy": {
        "retentionDurationSeconds": "0"
      }
    }
  3. Use cURL to call the JSON API with a PATCH Bucket request:

    curl -X PATCH --data-binary @JSON_FILE_NAME \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      "https://storage.googleapis.com/storage/v1/b/BUCKET_NAME"

    Where:

    • JSON_FILE_NAME is the path to the JSON file that you created in Step 2.
    • BUCKET_NAME is the name of the relevant bucket. For example, my-bucket.
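
To confirm that the policy was cleared, you can read it back with a GET Bucket request. The following sketch uses the JSON API's standard fields query parameter to narrow the response; a retentionDurationSeconds of "0" indicates that soft delete is disabled:

curl -X GET \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://storage.googleapis.com/storage/v1/b/BUCKET_NAME?fields=softDeletePolicy"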

Disable soft delete for the 100 largest buckets in a project

Using the Google Cloud console, you can disable soft delete for up to 100 buckets at a time, with buckets sorted by either the most soft-deleted bytes or the highest ratio of soft-deleted bytes to live bytes, letting you manage the buckets with the greatest impact on your soft delete costs.

  1. In the Google Cloud console, go to the Cloud Storage Buckets page.

    Go to Buckets

  2. On the Cloud Storage page, click Settings.

  3. Click the Soft delete tab.

  4. In the Top buckets by deleted bytes list, select the buckets for which you want to disable soft delete.

  5. Click Turn off soft delete.

    Soft delete is disabled for the buckets that you selected.

Disable soft delete for multiple or all buckets in a project

Using the Google Cloud CLI, run the gcloud storage buckets update command with the --project flag and the * wildcard to disable soft delete for multiple or all buckets in a project:

gcloud storage buckets update --project=PROJECT_ID --clear-soft-delete gs://*

Where:

  • PROJECT_ID is the ID of the project. For example, my-project.
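
The * wildcard can also match part of a bucket name. As a sketch, assuming your buckets share a common naming prefix such as my-prefix, you can limit the update to that subset:

gcloud storage buckets update --project=PROJECT_ID --clear-soft-delete gs://my-prefix*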

Disable soft delete for all buckets in a folder

Using the Google Cloud CLI, run the gcloud projects list and gcloud storage buckets update commands to list all the projects in the specified folder, and then disable soft delete for every bucket in those projects:

gcloud projects list --filter="parent.id: FOLDER_ID" --format="value(projectId)" | while read project
do
  gcloud storage buckets update --project=$project --clear-soft-delete gs://*
done

Where:

  • FOLDER_ID is the ID of the folder. For example, 123456.
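
Because newly deleted data can't be recovered once the policy is cleared, you may want to preview the affected buckets first. The following sketch reuses the same project listing but only prints bucket names instead of updating anything:

gcloud projects list --filter="parent.id: FOLDER_ID" --format="value(projectId)" | while read project
do
  gcloud storage buckets list --project=$project --format="value(name)"
done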

Disable soft delete at the organization level

Using the Google Cloud CLI, run the gcloud storage buckets update command with the --clear-soft-delete flag and the * wildcard to disable soft delete for all the buckets in your organization:

gcloud projects list --format="value(projectId)" | while read project
do
  gcloud storage buckets update --project=$project --clear-soft-delete gs://*
done

Cloud Storage disables soft delete for your existing buckets. Objects that were already soft-deleted remain in their buckets until their soft delete retention duration ends, after which they're permanently deleted.

Disable soft delete for new buckets

Soft delete is enabled on new buckets by default, but you can prevent it from being enabled by default using a tag. The tag uses the storage.defaultSoftDeletePolicy key to apply a 0d (zero-day) soft delete policy at the organization level, which disables the feature and prevents the future retention of deleted data.

To disable soft delete by default when you create new buckets, follow the instructions below. Note that these instructions aren't equivalent to setting an organization policy that enforces a particular soft delete policy, meaning you can still enable soft delete on specific buckets by specifying a policy if needed.

  1. Using the Google Cloud CLI, create the storage.defaultSoftDeletePolicy tag, which is used to change the default soft delete retention duration for new buckets. Note that only the storage.defaultSoftDeletePolicy tag name updates the default soft delete retention duration.

    Create a tag key using the gcloud resource-manager tags keys create command:

    gcloud resource-manager tags keys create storage.defaultSoftDeletePolicy \
     --parent=organizations/ORGANIZATION_ID \
     --description="Configures the default softDeletePolicy for new Storage buckets."
    

    Where:

    • ORGANIZATION_ID is the numeric ID of the organization for which you're setting the default soft delete retention duration. For example, 12345678901. To learn how to find your organization ID, see Getting your organization resource ID.
  2. Using the gcloud resource-manager tags values create command, create a tag value of 0d (zero days) to disable the soft delete retention duration by default for new buckets:

    gcloud resource-manager tags values create 0d \
      --parent=ORGANIZATION_ID/storage.defaultSoftDeletePolicy \
      --description="Disables soft delete for new Storage buckets."
    

    Where:

    • ORGANIZATION_ID is the numeric ID of the organization for which you're setting the default soft delete retention duration. For example, 12345678901.
  3. Attach the tag to your resource using the gcloud resource-manager tags bindings create command:

    gcloud resource-manager tags bindings create \
     --tag-value=ORGANIZATION_ID/storage.defaultSoftDeletePolicy/0d \
     --parent=RESOURCE_ID
    

    Where:

    • ORGANIZATION_ID is the numeric ID of the organization under which the tag was created. For example, 12345678901.

    • RESOURCE_ID is the full name of the organization for which you're creating the tag binding. For example, to attach a tag to organizations/7890123456, enter //cloudresourcemanager.googleapis.com/organizations/7890123456.
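
To verify that the tag is attached, you can list the tag bindings on the resource with the gcloud resource-manager tags bindings list command, passing the same RESOURCE_ID used in step 3:

gcloud resource-manager tags bindings list \
  --parent=RESOURCE_ID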

Disable soft delete for buckets that exceed a specified cost threshold

Using the Cloud Client Libraries for Python, you can disable soft delete for buckets that exceed a specified relative cost threshold by using a Python client library sample. The sample does the following:

  1. Calculates the relative storage cost of each storage class.

  2. Assesses the soft delete cost accrued by your buckets.

  3. Sets a cost threshold for soft delete usage, lists the buckets that exceed the threshold you set, and disables soft delete for those buckets.

To learn more about setting up the Python client library and using the sample, see the Cloud Storage soft delete cost analyzer README.md page.

The following sample disables soft delete for buckets that exceed a specified cost threshold:

from __future__ import annotations

import argparse
import json
import google.cloud.monitoring_v3 as monitoring_client


def get_relative_cost(storage_class: str) -> float:
    """Retrieves the relative cost for a given storage class.

    Args:
        storage_class: The storage class (for example, 'STANDARD', 'NEARLINE').

    Returns:
        The price per GB from https://cloud.google.com/storage/pricing, divided
        by the price per GB of the STANDARD storage class.
    """
    relative_cost = {
        "STANDARD": 0.023 / 0.023,
        "NEARLINE": 0.013 / 0.023,
        "COLDLINE": 0.007 / 0.023,
        "ARCHIVE": 0.0025 / 0.023,
    }

    return relative_cost.get(storage_class, 1.0)


def get_soft_delete_cost(
    project_name: str,
    soft_delete_window: float,
    agg_days: int,
    lookback_days: int,
) -> dict[str, list[dict[str, float]]]:
    """Calculates soft delete costs for buckets in a Google Cloud project.

    Args:
        project_name: The name of the Google Cloud project.
        soft_delete_window: The time window in seconds for considering
          soft-deleted objects (default is 7 days).
        agg_days: Number of days over which to aggregate results (default 30).
        lookback_days: Number of days to look back (default 360).

    Returns:
        A dictionary with bucket names as keys and cost data for each bucket,
        broken down by storage class.
    """

    query_client = monitoring_client.QueryServiceClient()

    # Step 1: Get storage class ratios for each bucket.
    storage_ratios_by_bucket = get_storage_class_ratio(
        project_name, query_client, agg_days, lookback_days
    )

    # Step 2: Fetch soft-deleted bytes and calculate costs using Monitoring API.
    soft_deleted_costs = calculate_soft_delete_costs(
        project_name,
        query_client,
        soft_delete_window,
        storage_ratios_by_bucket,
        agg_days,
        lookback_days,
    )

    return soft_deleted_costs


def calculate_soft_delete_costs(
    project_name: str,
    query_client: monitoring_client.QueryServiceClient,
    soft_delete_window: float,
    storage_ratios_by_bucket: dict[str, float],
    agg_days: int,
    lookback_days: int,
) -> dict[str, list[dict[str, float]]]:
    """Calculates the relative cost of enabling soft delete for each bucket in a
       project over a given time window in seconds.

    Args:
        project_name: The name of the Google Cloud project.
        query_client: A Monitoring API query client.
        soft_delete_window: The time window in seconds for considering
          soft-deleted objects (default is 7 days).
        storage_ratios_by_bucket: A dictionary of storage class ratios per bucket.
        agg_days: Number of days over which to aggregate results (default 30).
        lookback_days: Number of days to look back (default 360).

    Returns:
        A dictionary with bucket names as keys and a list of cost data
        dictionaries
        for each bucket, broken down by storage class.
    """
    soft_deleted_bytes_time = query_client.query_time_series(
        monitoring_client.QueryTimeSeriesRequest(
            name=f"projects/{project_name}",
            query=f"""
                    {{  # Fetch 1: Soft-deleted (bytes seconds)
                        fetch gcs_bucket :: storage.googleapis.com/storage/v2/deleted_bytes
                        | value val(0) * {soft_delete_window}'s'  # Multiply by soft delete window
                        | group_by [resource.bucket_name, metric.storage_class], window(), .sum;

                        # Fetch 2: Total byte-seconds (active objects)
                        fetch gcs_bucket :: storage.googleapis.com/storage/v2/total_byte_seconds
                        | filter metric.type != 'soft-deleted-object'
                        | group_by [resource.bucket_name, metric.storage_class], window(1d), .mean  # Daily average
                        | group_by [resource.bucket_name, metric.storage_class], window(), .sum  # Total over window

                    }}  # End query definition
                    | every {agg_days}d  # Aggregate over larger time intervals
                    | within {lookback_days}d  # Limit data range for analysis
                    | ratio  # Calculate ratio (soft-deleted (bytes seconds)/ total (bytes seconds))
                    """,
        )
    )

    buckets: dict[str, list[dict[str, float]]] = {}
    missing_distribution_storage_class = []
    for data_point in soft_deleted_bytes_time.time_series_data:
        bucket_name = data_point.label_values[0].string_value
        storage_class = data_point.label_values[1].string_value
        # To include location-based cost analysis:
        # 1. Uncomment the line below:
        # location = data_point.label_values[2].string_value
        # 2. Update how you calculate 'relative_storage_class_cost' to factor in location
        soft_delete_ratio = data_point.point_data[0].values[0].double_value
        distribution_storage_class = bucket_name + " - " + storage_class
        storage_class_ratio = storage_ratios_by_bucket.get(
            distribution_storage_class
        )
        if storage_class_ratio is None:
            missing_distribution_storage_class.append(
                distribution_storage_class)
        buckets.setdefault(bucket_name, []).append({
            # Include storage class and location data for additional plotting dimensions.
            # "storage_class": storage_class,
            # 'location': location,
            "soft_delete_ratio": soft_delete_ratio,
            "storage_class_ratio": storage_class_ratio,
            "relative_storage_class_cost": get_relative_cost(storage_class),
        })

    if missing_distribution_storage_class:
        print(
            "Missing storage class for following buckets:",
            missing_distribution_storage_class,
        )
        raise ValueError("Cannot proceed with missing storage class ratios.")

    return buckets


def get_storage_class_ratio(
    project_name: str,
    query_client: monitoring_client.QueryServiceClient,
    agg_days: int,
    lookback_days: int,
) -> dict[str, float]:
    """Calculates storage class ratios for each bucket in a project.

    This information helps determine the relative cost contribution of each
    storage class to the overall soft-delete cost.

    Args:
        project_name: The Google Cloud project name.
        query_client: Google Cloud's Monitoring Client's QueryServiceClient.
        agg_days: Number of days over which to aggregate results (default 30).
        lookback_days: Number of days to look back (default 360).

    Returns:
        Ratio of Storage classes within a bucket.
    """
    request = monitoring_client.QueryTimeSeriesRequest(
        name=f"projects/{project_name}",
        query=f"""
            {{
            # Fetch total byte-seconds for each bucket and storage class
            fetch gcs_bucket :: storage.googleapis.com/storage/v2/total_byte_seconds
            | group_by [resource.bucket_name, metric.storage_class], window(), .sum;
            # Fetch total byte-seconds for each bucket (regardless of class)
            fetch gcs_bucket :: storage.googleapis.com/storage/v2/total_byte_seconds
            | group_by [resource.bucket_name], window(), .sum
            }}
            | ratio  # Calculate ratios of storage class size to total size
            | every {agg_days}d
            | within {lookback_days}d
            """,
    )

    storage_class_ratio = query_client.query_time_series(request)

    storage_ratios_by_bucket = {}
    for time_series in storage_class_ratio.time_series_data:
        bucket_name = time_series.label_values[0].string_value
        storage_class = time_series.label_values[1].string_value
        ratio = time_series.point_data[0].values[0].double_value

        # Create a descriptive key for the dictionary
        key = f"{bucket_name} - {storage_class}"
        storage_ratios_by_bucket[key] = ratio

    return storage_ratios_by_bucket


def soft_delete_relative_cost_analyzer(
    project_name: str,
    cost_threshold: float = 0.0,
    soft_delete_window: float = 604800,
    agg_days: int = 30,
    lookback_days: int = 360,
    list_buckets: bool = False,
) -> str:
    """Identifies buckets exceeding the relative cost threshold for enabling soft delete.

    Args:
        project_name: The Google Cloud project name.
        cost_threshold: Threshold above which to consider removing soft delete.
        soft_delete_window: Time window for calculating soft-delete costs (in
          seconds).
        agg_days: Aggregate results over this time period (in days).
        lookback_days: Look back up to this many days.
        list_buckets: Return a list of bucket names (True) or JSON (False,
          default).

    Returns:
        JSON formatted results of buckets exceeding the threshold and costs
        *or* a space-separated string of bucket names.
    """

    buckets: dict[str, float] = {}
    for bucket_name, storage_sources in get_soft_delete_cost(
        project_name, soft_delete_window, agg_days, lookback_days
    ).items():
        bucket_cost = 0.0
        for storage_source in storage_sources:
            bucket_cost += (
                storage_source["soft_delete_ratio"]
                * storage_source["storage_class_ratio"]
                * storage_source["relative_storage_class_cost"]
            )
        if bucket_cost > cost_threshold:
            buckets[bucket_name] = round(bucket_cost, 4)

    if list_buckets:
        return " ".join(buckets.keys())  # Space-separated bucket names
    else:
        return json.dumps(buckets, indent=2)  # JSON output


def soft_delete_relative_cost_analyzer_main() -> None:
    # Sample run: python storage_soft_delete_relative_cost_analyzer.py <Project Name>
    parser = argparse.ArgumentParser(
        description="Analyze and manage Google Cloud Storage soft-delete costs."
    )
    parser.add_argument(
        "project_name", help="The name of the Google Cloud project to analyze."
    )
    parser.add_argument(
        "--cost_threshold",
        type=float,
        default=0.0,
        help="Relative Cost threshold.",
    )
    parser.add_argument(
        "--soft_delete_window",
        type=float,
        default=604800.0,
        help="Time window (in seconds) for considering soft-deleted objects.",
    )
    parser.add_argument(
        "--agg_days",
        type=int,
        default=30,
        help=(
            "Time window (in days) for aggregating results over a time period,"
            " defaults to 30-day period"
        ),
    )
    parser.add_argument(
        "--lookback_days",
        type=int,
        default=360,
        help=(
            "Number of days of metric history to consider, defaults to 360."
        ),
    )
    parser.add_argument(
        "--list",
        action="store_true",  # store_true avoids the argparse type=bool pitfall where any non-empty string parses as True
        default=False,
        help="Return the list of bucket names separated by spaces.",
    )

    args = parser.parse_args()

    response = soft_delete_relative_cost_analyzer(
        args.project_name,
        args.cost_threshold,
        args.soft_delete_window,
        args.agg_days,
        args.lookback_days,
        args.list,
    )
    if not args.list:
        print(
            "To remove soft-delete policy from the listed buckets run:\n"
            # Capture output
            "python storage_soft_delete_relative_cost_analyzer.py"
            " [your-project-name] --[OTHER_OPTIONS] --list > list_of_buckets.txt \n"
            "cat list_of_buckets.txt | gcloud storage buckets update -I "
            "--clear-soft-delete",
            response,
        )
        return
    print(response)


if __name__ == "__main__":
    soft_delete_relative_cost_analyzer_main()
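
As a usage sketch (my-project and the 0.15 threshold are example values rather than part of the sample), the workflow that the script's help output describes looks like the following: run the analyzer with --list to capture the bucket names, then pipe them to gcloud:

python storage_soft_delete_relative_cost_analyzer.py my-project --cost_threshold=0.15 --list > list_of_buckets.txt
cat list_of_buckets.txt | gcloud storage buckets update -I --clear-soft-delete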

What's next