Disable soft delete


This page describes how to disable soft delete for new and existing buckets in your organization.

Soft delete is enabled on new buckets by default to protect against data loss. If needed, you can disable soft delete for existing buckets by modifying the soft delete policy, and you can disable soft delete by default for new buckets by setting an organization-wide default tag. Note that after you disable soft delete, you can't recover deleted data, including data that was deleted accidentally or maliciously.

Required roles

To get the permissions that you need to disable soft delete, ask your administrator to grant you the following IAM roles at the organization level:

These predefined roles contain the permissions required to disable soft delete. To see the exact permissions that are required, expand the Required permissions section:

Required permissions

The following permissions are required to disable soft delete:

  • storage.buckets.get
  • storage.buckets.update
  • storage.buckets.list (this permission is only required if you plan to use the Google Cloud console to perform the instructions on this page)

    For information about the required permissions that are included as part of the Tag Admin (roles/resourcemanager.tagAdmin) role, see Required permissions for administering tags.

For information about granting roles, see Use IAM with buckets and Manage access to projects.
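
Your administrator might grant such a role with a command along the following lines. This is a minimal sketch using the gcloud CLI; ORGANIZATION_ID and USER_EMAIL are placeholders, and roles/resourcemanager.tagAdmin is shown only as an example:

  # Example: grant the Tag Admin role at the organization level.
  gcloud organizations add-iam-policy-binding ORGANIZATION_ID \
    --member="user:USER_EMAIL" \
    --role="roles/resourcemanager.tagAdmin"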

Disable soft delete for a specific bucket

Before you begin, consider the following:

  • If you disable the soft delete policy on a bucket that contains soft-deleted objects at the time of disablement, the existing soft-deleted objects are retained until their previously applied retention duration expires.

  • After the soft delete policy is disabled on a bucket, Cloud Storage doesn't retain newly deleted objects.
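
If you want to check a bucket's current soft delete policy before disabling it, one option is to describe the bucket and read its softDeletePolicy field. This is a minimal sketch using the gcloud CLI, with my-bucket as a placeholder; a retentionDurationSeconds value of 0 means soft delete is already disabled:

  gcloud storage buckets describe gs://my-bucket \
    --format="default(softDeletePolicy)"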

To disable soft delete for a specific bucket, follow these instructions:

Console

  1. In the Google Cloud console, go to the Cloud Storage Buckets page.

    Go to Buckets

  2. In the list of buckets, click the name of the bucket whose soft delete policy you want to disable.

  3. Click the Protection tab.

  4. In the Soft delete policy section, click Disable to disable the soft delete policy.

  5. Click Confirm.

To learn how to get detailed error information about failed Cloud Storage operations in the Google Cloud console, see Troubleshooting.

Command line

Run the gcloud storage buckets update command with the --clear-soft-delete flag:

  gcloud storage buckets update --clear-soft-delete gs://BUCKET_NAME

Where:

  • BUCKET_NAME is the name of the bucket. For example, my-bucket.

REST API

JSON API

  1. Install and initialize the gcloud CLI, which lets you generate an access token for the Authorization header.

  2. Create a JSON file that contains the following information:

    {
      "softDeletePolicy": {
        "retentionDurationSeconds": "0"
      }
    }
  3. Use cURL to call the JSON API with a PATCH Bucket request:

    curl -X PATCH --data-binary @JSON_FILE_NAME \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      "https://storage.googleapis.com/storage/v1/b/BUCKET_NAME"

    Where:

    • JSON_FILE_NAME is the path for the JSON file that you created in Step 2.
    • BUCKET_NAME is the name of the relevant bucket. For example, my-bucket.
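
    To confirm the change, you can read the bucket's soft delete policy back through the JSON API. This is a minimal sketch using a GET Bucket request with the fields query parameter; BUCKET_NAME is a placeholder:

    curl -X GET \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      "https://storage.googleapis.com/storage/v1/b/BUCKET_NAME?fields=softDeletePolicy"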

Disable soft delete for the 100 largest buckets in a project

Using the Google Cloud console, you can disable soft delete for up to 100 buckets at a time, with buckets sorted by the most soft-deleted bytes or the highest ratio of soft-deleted bytes to live bytes, letting you manage the buckets with the greatest impact on your soft delete costs.

  1. In the Google Cloud console, go to the Cloud Storage Buckets page.

    Go to Buckets

  2. On the Cloud Storage page, click Settings.

  3. Click the Soft delete tab.

  4. In the Buckets with the most deleted bytes list, select the buckets for which you want to disable soft delete.

  5. Click Turn off soft delete.

    Soft delete is disabled on the buckets you selected.

Disable soft delete for multiple or all buckets within a project

Using the Google Cloud CLI, run the gcloud storage buckets update command with the --project flag and the * wildcard to bulk disable soft delete for multiple or all buckets within a project:

gcloud storage buckets update --project=PROJECT_ID --clear-soft-delete gs://*

Where:

  • PROJECT_ID is the ID of the project. For example, my-project.
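
If you want to review the buckets in the project and their current soft delete settings before running a bulk update, one option is to list them with a custom format. This is a minimal sketch; the softDeletePolicy.retentionDurationSeconds field path is an assumption based on the bucket resource fields shown by gcloud:

  gcloud storage buckets list --project=PROJECT_ID \
    --format="table(name, softDeletePolicy.retentionDurationSeconds)"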

Disable soft delete for all buckets within a folder

Using the Google Cloud CLI, run the gcloud projects list and gcloud storage buckets update commands to disable soft delete for buckets across all the projects within a specified folder.

  1. Run the gcloud projects list and gcloud storage buckets update commands to list all the buckets under a specified folder and then disable soft delete for all the buckets within the folder:

    gcloud projects list --filter="parent.id: FOLDER_ID" --format="value(projectId)" | while read project
    do
    gcloud storage buckets update --project=$project --clear-soft-delete gs://*
    done
    

    Where:

    • FOLDER_ID is the ID of the folder. For example, 123456.
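
If you need to look up a folder's ID, one option is to list the folders in your organization. This is a minimal sketch; ORGANIZATION_ID is a placeholder for your organization's numeric ID:

  gcloud resource-manager folders list --organization=ORGANIZATION_ID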

Disable soft delete at the organization level

Using the Google Cloud CLI, run the gcloud storage buckets update command with the --clear-soft-delete flag and the * wildcard to disable soft delete at the organization level:

  1. Run the gcloud storage buckets update command with the --clear-soft-delete flag and the * wildcard to disable soft delete for all the buckets in your organization:

    gcloud projects list --format="value(projectId)" | while read project
    do
    gcloud storage buckets update --project=$project --clear-soft-delete gs://*
    done
    

Cloud Storage disables soft delete for your existing buckets. Objects that have already been soft-deleted remain in the buckets until their soft delete retention duration ends, after which they are permanently deleted.

Disable soft delete for new buckets

While new buckets have soft delete enabled by default, you can prevent soft delete from being enabled by default using tags. Tags use the storage.defaultSoftDeletePolicy key to apply a 0d (zero-day) soft delete policy at the organization level, which disables the feature and prevents future retention of deleted data.

Follow these instructions to disable soft delete by default when you create new buckets. Note that the following instructions aren't the equivalent of setting an organization policy that mandates a particular soft delete policy, which means you can still enable soft delete on specific buckets by specifying a policy if needed.

  1. Using the Google Cloud CLI, create the storage.defaultSoftDeletePolicy tag, which is used to change the default soft delete retention duration for new buckets. Note that only the storage.defaultSoftDeletePolicy tag name updates the default soft delete retention duration.

    Create a tag key using the gcloud resource-manager tags keys create command:

     gcloud resource-manager tags keys create storage.defaultSoftDeletePolicy \
      --parent=organizations/ORGANIZATION_ID \
      --description="Configures the default softDeletePolicy for new Storage buckets."
    

    Where:

    • ORGANIZATION_ID is the numeric ID of the organization for which you want to set the default soft delete retention duration. For example, 12345678901. To learn how to find your organization ID, see Getting your organization resource ID.
  2. Create a tag value of 0d (zero days) using the gcloud resource-manager tags values create command to disable the soft delete retention duration by default on new buckets:

      gcloud resource-manager tags values create 0d \
       --parent=ORGANIZATION_ID/storage.defaultSoftDeletePolicy \
       --description="Disables soft delete for new Storage buckets."
    

    Where:

    • ORGANIZATION_ID is the numeric ID of the organization for which you want to set the default soft delete retention duration. For example, 12345678901.
  3. Attach the tag to your resource using the gcloud resource-manager tags bindings create command:

     gcloud resource-manager tags bindings create \
       --tag-value=ORGANIZATION_ID/storage.defaultSoftDeletePolicy/0d \
       --parent=RESOURCE_ID
    

    Where:

    • ORGANIZATION_ID is the numeric ID of the organization under which the tag was created. For example, 12345678901.

    • RESOURCE_ID is the full name of the organization for which you are creating the tag binding. For example, to attach a tag to organizations/7890123456, enter //cloudresourcemanager.googleapis.com/organizations/7890123456.
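
    To confirm that the tag is attached, you can list the tag bindings on the organization. This is a minimal sketch that reuses the example resource name from the step above:

    gcloud resource-manager tags bindings list \
      --parent=//cloudresourcemanager.googleapis.com/organizations/7890123456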

Disable soft delete for buckets that exceed a specified cost threshold

Using the Cloud Client Libraries for Python, you can disable soft delete for buckets that exceed a specified relative cost threshold by using the following Python client library sample. The sample does the following:

  1. Calculates the relative cost of storage for each storage class.

  2. Assesses the soft delete cost accumulated by your buckets.

  3. Sets a cost threshold for soft delete usage, lists the buckets that exceed the threshold you set, and lets you disable soft delete for the buckets that exceed the threshold.

To learn more about setting up the Python client library and using the sample, see the Cloud Storage soft delete cost analyzer README.md page.

The following sample disables soft delete for buckets that exceed a specified cost threshold:

from __future__ import annotations

import argparse
import json
import google.cloud.monitoring_v3 as monitoring_client


def get_relative_cost(storage_class: str) -> float:
    """Retrieves the relative cost for a given storage class and location.

    Args:
        storage_class: The storage class (e.g., 'standard', 'nearline').

    Returns:
        The price per GB from https://cloud.google.com/storage/pricing,
        divided by the price of the Standard storage class.
    """
    relative_cost = {
        "STANDARD": 0.023 / 0.023,
        "NEARLINE": 0.013 / 0.023,
        "COLDLINE": 0.007 / 0.023,
        "ARCHIVE": 0.0025 / 0.023,
    }

    return relative_cost.get(storage_class, 1.0)


def get_soft_delete_cost(
    project_name: str,
    soft_delete_window: float,
    agg_days: int,
    lookback_days: int,
) -> dict[str, list[dict[str, float]]]:
    """Calculates soft delete costs for buckets in a Google Cloud project.

    Args:
        project_name: The name of the Google Cloud project.
        soft_delete_window: The time window in seconds for considering
          soft-deleted objects (default is 7 days).
        agg_days: Aggregate results over this time period in days; defaults to
          a 30-day period.
        lookback_days: Look back up to this many days; defaults to 360 days.

    Returns:
        A dictionary with bucket names as keys and cost data for each bucket,
        broken down by storage class.
    """

    query_client = monitoring_client.QueryServiceClient()

    # Step 1: Get storage class ratios for each bucket.
    storage_ratios_by_bucket = get_storage_class_ratio(
        project_name, query_client, agg_days, lookback_days
    )

    # Step 2: Fetch soft-deleted bytes and calculate costs using Monitoring API.
    soft_deleted_costs = calculate_soft_delete_costs(
        project_name,
        query_client,
        soft_delete_window,
        storage_ratios_by_bucket,
        agg_days,
        lookback_days,
    )

    return soft_deleted_costs


def calculate_soft_delete_costs(
    project_name: str,
    query_client: monitoring_client.QueryServiceClient,
    soft_delete_window: float,
    storage_ratios_by_bucket: dict[str, float],
    agg_days: int,
    lookback_days: int,
) -> dict[str, list[dict[str, float]]]:
    """Calculates the relative cost of enabling soft delete for each bucket in a
       project for certain time frame in secs.

    Args:
        project_name: The name of the Google Cloud project.
        query_client: A Monitoring API query client.
        soft_delete_window: The time window in seconds for considering
          soft-deleted objects (default is 7 days).
        storage_ratios_by_bucket: A dictionary of storage class ratios per bucket.
        agg_days: Aggregate results over this time period in days; defaults to
          a 30-day period.
        lookback_days: Look back up to this many days; defaults to 360 days.

    Returns:
        A dictionary with bucket names as keys and a list of cost data
        dictionaries
        for each bucket, broken down by storage class.
    """
    soft_deleted_bytes_time = query_client.query_time_series(
        monitoring_client.QueryTimeSeriesRequest(
            name=f"projects/{project_name}",
            query=f"""
                    {{  # Fetch 1: Soft-deleted (bytes seconds)
                        fetch gcs_bucket :: storage.googleapis.com/storage/v2/deleted_bytes
                        | value val(0) * {soft_delete_window}\'s\'  # Multiply by soft delete window
                        | group_by [resource.bucket_name, metric.storage_class], window(), .sum;

                        # Fetch 2: Total byte-seconds (active objects)
                        fetch gcs_bucket :: storage.googleapis.com/storage/v2/total_byte_seconds
                        | filter metric.type != 'soft-deleted-object'
                        | group_by [resource.bucket_name, metric.storage_class], window(1d), .mean  # Daily average
                        | group_by [resource.bucket_name, metric.storage_class], window(), .sum  # Total over window

                    }}  # End query definition
                    | every {agg_days}d  # Aggregate over larger time intervals
                    | within {lookback_days}d  # Limit data range for analysis
                    | ratio  # Calculate ratio (soft-deleted (bytes seconds)/ total (bytes seconds))
                    """,
        )
    )

    buckets: dict[str, list[dict[str, float]]] = {}
    missing_distribution_storage_class = []
    for data_point in soft_deleted_bytes_time.time_series_data:
        bucket_name = data_point.label_values[0].string_value
        storage_class = data_point.label_values[1].string_value
        # To include location-based cost analysis:
        # 1. Uncomment the line below:
        # location = data_point.label_values[2].string_value
        # 2. Update how you calculate 'relative_storage_class_cost' to factor in location
        soft_delete_ratio = data_point.point_data[0].values[0].double_value
        distribution_storage_class = bucket_name + " - " + storage_class
        storage_class_ratio = storage_ratios_by_bucket.get(
            distribution_storage_class
        )
        if storage_class_ratio is None:
            missing_distribution_storage_class.append(
                distribution_storage_class)
        buckets.setdefault(bucket_name, []).append({
            # Include storage class and location data for additional plotting dimensions.
            # "storage_class": storage_class,
            # 'location': location,
            "soft_delete_ratio": soft_delete_ratio,
            "storage_class_ratio": storage_class_ratio,
            "relative_storage_class_cost": get_relative_cost(storage_class),
        })

    if missing_distribution_storage_class:
        print(
            "Missing storage class for following buckets:",
            missing_distribution_storage_class,
        )
        raise ValueError("Cannot proceed with missing storage class ratios.")

    return buckets


def get_storage_class_ratio(
    project_name: str,
    query_client: monitoring_client.QueryServiceClient,
    agg_days: int,
    lookback_days: int,
) -> dict[str, float]:
    """Calculates storage class ratios for each bucket in a project.

    This information helps determine the relative cost contribution of each
    storage class to the overall soft-delete cost.

    Args:
        project_name: The Google Cloud project name.
        query_client: Google Cloud's Monitoring Client's QueryServiceClient.
        agg_days: Aggregate results over this time period in days; defaults to
          a 30-day period.
        lookback_days: Look back up to this many days; defaults to 360 days.

    Returns:
        Ratio of Storage classes within a bucket.
    """
    request = monitoring_client.QueryTimeSeriesRequest(
        name=f"projects/{project_name}",
        query=f"""
            {{
            # Fetch total byte-seconds for each bucket and storage class
            fetch gcs_bucket :: storage.googleapis.com/storage/v2/total_byte_seconds
            | group_by [resource.bucket_name, metric.storage_class], window(), .sum;
            # Fetch total byte-seconds for each bucket (regardless of class)
            fetch gcs_bucket :: storage.googleapis.com/storage/v2/total_byte_seconds
            | group_by [resource.bucket_name], window(), .sum
            }}
            | ratio  # Calculate ratios of storage class size to total size
            | every {agg_days}d
            | within {lookback_days}d
            """,
    )

    storage_class_ratio = query_client.query_time_series(request)

    storage_ratios_by_bucket = {}
    for time_series in storage_class_ratio.time_series_data:
        bucket_name = time_series.label_values[0].string_value
        storage_class = time_series.label_values[1].string_value
        ratio = time_series.point_data[0].values[0].double_value

        # Create a descriptive key for the dictionary
        key = f"{bucket_name} - {storage_class}"
        storage_ratios_by_bucket[key] = ratio

    return storage_ratios_by_bucket


def soft_delete_relative_cost_analyzer(
    project_name: str,
    cost_threshold: float = 0.0,
    soft_delete_window: float = 604800,
    agg_days: int = 30,
    lookback_days: int = 360,
    list_buckets: bool = False,
    ) -> str | dict[str, float]: # Note potential string output
    """Identifies buckets exceeding the relative cost threshold for enabling soft delete.

    Args:
        project_name: The Google Cloud project name.
        cost_threshold: Threshold above which to consider removing soft delete.
        soft_delete_window: Time window for calculating soft-delete costs (in
          seconds).
        agg_days: Aggregate results over this time period (in days).
        lookback_days: Look back up to this many days.
        list_buckets: Return a list of bucket names (True) or JSON (False,
          default).

    Returns:
        JSON formatted results of buckets exceeding the threshold and costs
        *or* a space-separated string of bucket names.
    """

    buckets: dict[str, float] = {}
    for bucket_name, storage_sources in get_soft_delete_cost(
        project_name, soft_delete_window, agg_days, lookback_days
    ).items():
        bucket_cost = 0.0
        for storage_source in storage_sources:
            bucket_cost += (
                storage_source["soft_delete_ratio"]
                * storage_source["storage_class_ratio"]
                * storage_source["relative_storage_class_cost"]
            )
        if bucket_cost > cost_threshold:
            buckets[bucket_name] = round(bucket_cost, 4)

    if list_buckets:
        return " ".join(buckets.keys())  # Space-separated bucket names
    else:
        return json.dumps(buckets, indent=2)  # JSON output


def soft_delete_relative_cost_analyzer_main() -> None:
    # Sample run: python storage_soft_delete_relative_cost_analyzer.py <Project Name>
    parser = argparse.ArgumentParser(
        description="Analyze and manage Google Cloud Storage soft-delete costs."
    )
    parser.add_argument(
        "project_name", help="The name of the Google Cloud project to analyze."
    )
    parser.add_argument(
        "--cost_threshold",
        type=float,
        default=0.0,
        help="Relative Cost threshold.",
    )
    parser.add_argument(
        "--soft_delete_window",
        type=float,
        default=604800.0,
        help="Time window (in seconds) for considering soft-deleted objects.",
    )
    parser.add_argument(
        "--agg_days",
        type=int,
        default=30,
        help=(
            "Time window (in days) for aggregating results over a time period,"
            " defaults to 30-day period"
        ),
    )
    parser.add_argument(
        "--lookback_days",
        type=int,
        default=360,
        help=(
            "Time window (in days) for considering the how old the bucket to be."
        ),
    )
    parser.add_argument(
        "--list",
        action="store_true",
        default=False,
        help="Return the list of bucket names separated by spaces.",
    )

    args = parser.parse_args()

    response = soft_delete_relative_cost_analyzer(
        args.project_name,
        args.cost_threshold,
        args.soft_delete_window,
        args.agg_days,
        args.lookback_days,
        args.list,
    )
    if not args.list:
        print(
            "To remove soft-delete policy from the listed buckets run:\n"
            # Capture output
            "python storage_soft_delete_relative_cost_analyzer.py"
            " [your-project-name] --[OTHER_OPTIONS] --list > list_of_buckets.txt \n"
            "cat list_of_buckets.txt | gcloud storage buckets update -I "
            "--clear-soft-delete",
            response,
        )
        return
    print(response)


if __name__ == "__main__":
    soft_delete_relative_cost_analyzer_main()
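
For reference, an end-to-end run might look like the following. This is a hypothetical sketch that mirrors the workflow printed by the script itself; it assumes the sample is saved as storage_soft_delete_relative_cost_analyzer.py, that Application Default Credentials are configured, and that my-project is a placeholder project name:

  # List buckets whose relative soft delete cost exceeds the threshold.
  python storage_soft_delete_relative_cost_analyzer.py my-project --cost_threshold=0.15 --list > list_of_buckets.txt

  # Clear the soft delete policy on the listed buckets.
  cat list_of_buckets.txt | gcloud storage buckets update -I --clear-soft-delete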

What's next