Use soft delete

This page describes how to enable, disable, and update the soft delete policy on a bucket, and how to check its status. To learn how to list and restore soft-deleted objects, see Use soft-deleted objects. For more information about soft delete, see the Overview.

Required roles

To get the permissions that you need to create and manage soft delete policies, ask your administrator to grant you the Storage Admin (roles/storage.admin) IAM role on the bucket or on the project that contains the bucket.

This predefined role contains the permissions required to create and manage soft delete policies. To see the exact permissions that are required, expand the Required permissions section:

Required permissions

The following permissions are required to create and manage soft delete policies:

  • storage.buckets.get
  • storage.buckets.update
  • storage.buckets.list (only needed if you plan to use the Google Cloud console to perform the instructions on this page)

For information about granting roles, see Use IAM with buckets and Manage access to projects.
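
For example, an administrator could grant the role on a single bucket with the gcloud CLI. The following is a minimal sketch; USER_EMAIL is a placeholder introduced here for illustration:

  gcloud storage buckets add-iam-policy-binding gs://BUCKET_NAME \
      --member=user:USER_EMAIL --role=roles/storage.admin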

Edit a bucket's soft delete policy

To enable, disable, or update the soft delete policy on a bucket, complete the following steps:

Console

  1. In the Google Cloud console, go to the Cloud Storage Buckets page.

    Go to Buckets

  2. In the list of buckets, click the name of the bucket whose soft delete policy you want to manage.

  3. Click the Protection tab.

  4. In the Soft delete policy section, do one of the following:

    • If the bucket doesn't have a soft delete policy, click Edit, select a unit of time and a length of time for the retention duration, and then click Save.

    • If the bucket has a soft delete policy, click Edit to change the unit of time and length of time for the retention duration.

To learn how to get detailed error information about failed Cloud Storage operations in the Google Cloud console, see Troubleshooting.

Command line

To add or modify the soft delete policy on a bucket, use the gcloud storage buckets update command with the --soft-delete-duration flag:

  gcloud storage buckets update gs://BUCKET_NAME --soft-delete-duration=SOFT_DELETE_DURATION

Where:

  • BUCKET_NAME is the name of the bucket. For example, my-bucket.
  • SOFT_DELETE_DURATION specifies how long to retain soft-deleted objects. For example, 2w1d is two weeks and one day. For more information, see Soft delete retention duration.
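
For example, to retain soft-deleted objects in a hypothetical bucket named my-bucket for ten days, you could run:

  gcloud storage buckets update gs://my-bucket --soft-delete-duration=10d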

REST API

JSON API

  1. Have gcloud CLI installed and initialized, which lets you generate an access token for the Authorization header.

    Alternatively, you can create an access token using the OAuth 2.0 Playground and include it in the Authorization header.

  2. Create a JSON file that contains the following information:

    {
      "softDeletePolicy": {
        "retentionDurationSeconds": "TIME_IN_SECONDS"
      }
    }

    Where TIME_IN_SECONDS is the amount of time, in seconds, that you want to retain soft-deleted objects. For example, 2678400. For more information, see Soft delete retention duration.

  3. Use cURL to call the JSON API with a PATCH Bucket request:

    curl -X PATCH --data-binary @JSON_FILE_NAME \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      "https://storage.googleapis.com/storage/v1/b/BUCKET_NAME"

    Where:

    • JSON_FILE_NAME is the path for the JSON file that you created in Step 2.
    • BUCKET_NAME is the name of the relevant bucket. For example, my-bucket.
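
If you prefer a client library, the following Python sketch applies the same change. It is a minimal sketch, assuming a recent google-cloud-storage release that exposes the soft_delete_policy bucket property; the bucket name and duration are illustrative:

  from google.cloud import storage

  def set_soft_delete_policy(bucket_name: str, duration_seconds: int) -> None:
      """Sets the bucket's soft delete retention duration, in seconds."""
      client = storage.Client()
      bucket = client.get_bucket(bucket_name)
      # Assumes the installed library version exposes bucket.soft_delete_policy.
      bucket.soft_delete_policy.retention_duration_seconds = duration_seconds
      bucket.patch()

  set_soft_delete_policy("my-bucket", 2678400)  # 31 days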

Remove a bucket's soft delete policy

Console

  1. In the Google Cloud console, go to the Cloud Storage Buckets page.

    Go to Buckets

  2. In the list of buckets, click the name of the bucket whose soft delete policy you want to remove.

  3. Click the Protection tab.

  4. In the Soft delete policy section, click Disable to remove the soft delete policy from the bucket.

  5. Click Confirm.

You can also disable the policy by setting its duration to 0 days. Additionally, you can disable soft delete policies for multiple buckets at once.

To learn how to get detailed error information about failed Cloud Storage operations in the Google Cloud console, see Troubleshooting.

Command line

To remove the soft delete policy from a bucket, use the gcloud storage buckets update command with the --clear-soft-delete flag:

  gcloud storage buckets update gs://BUCKET_NAME --clear-soft-delete

Where:

  • BUCKET_NAME is the name of the bucket. For example, my-bucket.
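
Setting the retention duration to zero has the same effect as clearing the policy; for example, a sketch using the hypothetical bucket name from above:

  gcloud storage buckets update gs://my-bucket --soft-delete-duration=0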

REST API

JSON API

  1. Have gcloud CLI installed and initialized, which lets you generate an access token for the Authorization header.

    Alternatively, you can create an access token using the OAuth 2.0 Playground and include it in the Authorization header.

  2. Create a JSON file that contains the following information:

    {
      "softDeletePolicy": {
        "retentionDurationSeconds": "TIME_IN_SECONDS"
      }
    }

    To disable the soft delete policy for a bucket, use a value of 0 for TIME_IN_SECONDS.

  3. Use cURL to call the JSON API with a PATCH Bucket request:

    curl -X PATCH --data-binary @JSON_FILE_NAME \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      "https://storage.googleapis.com/storage/v1/b/BUCKET_NAME"

    Where:

    • JSON_FILE_NAME is the path for the JSON file that you created in Step 2.
    • BUCKET_NAME is the name of the relevant bucket. For example, my-bucket.
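
As with enabling the policy, a client library can make the same change. A minimal Python sketch, assuming the google-cloud-storage library version in use exposes the soft_delete_policy property:

  from google.cloud import storage

  client = storage.Client()
  bucket = client.get_bucket("my-bucket")
  # A retention duration of 0 seconds disables soft delete for the bucket.
  bucket.soft_delete_policy.retention_duration_seconds = 0
  bucket.patch()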

Check whether a bucket has a soft delete policy enabled

Console

  1. In the Google Cloud console, go to the Cloud Storage Buckets page.

    Go to Buckets

  2. In the list of buckets, click the name of the bucket whose soft delete policy you want to check.

  3. Click the Protection tab.

    The status appears in the Soft delete policy (for data recovery) section.

You can also use the Protection tab to check whether a bucket has a soft delete policy.

To learn how to get detailed error information about failed Cloud Storage operations in the Google Cloud console, see Troubleshooting.

Command line

To check the status of a bucket's soft delete policy, use the gcloud storage buckets describe command:

  gcloud storage buckets describe gs://BUCKET_NAME \
      --format="default(soft_delete_policy)"

Where:

  • BUCKET_NAME is the name of the bucket. For example, my-bucket.

REST API

JSON API

  1. Have gcloud CLI installed and initialized, which lets you generate an access token for the Authorization header.

    Alternatively, you can create an access token using the OAuth 2.0 Playground and include it in the Authorization header.

  2. Use cURL to call the JSON API with a GET Bucket request:

    curl -X GET \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      "https://storage.googleapis.com/storage/v1/b/BUCKET_NAME?fields=softDeletePolicy"

    Where BUCKET_NAME is the name of the relevant bucket. For example, my-bucket.
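
The same field can also be read through a client library. A minimal Python sketch, assuming the google-cloud-storage library version in use exposes the soft_delete_policy property:

  from google.cloud import storage

  client = storage.Client()
  bucket = client.get_bucket("my-bucket")
  policy = bucket.soft_delete_policy
  # A retention duration of 0 seconds means soft delete is disabled.
  print(policy.retention_duration_seconds)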

Disable soft delete for multiple or all buckets in a project

Console

Using the Google Cloud console, you can disable soft delete for the buckets that have the most soft-deleted bytes or the highest ratio of soft-deleted bytes to live bytes, which lets you manage the costs that come from using soft delete.

  1. In the Google Cloud console, go to the Cloud Storage Buckets page.

    Go to Buckets

  2. On the Cloud Storage page, click Settings.

  3. Click the Soft delete tab.

  4. In the Buckets with most deleted bytes list, select the buckets for which you want to disable soft delete.

  5. Click Turn off soft delete.

    Soft delete is disabled for the selected buckets.

Command line

To disable soft delete for all buckets in your project, run the gcloud storage buckets update command with the --clear-soft-delete flag:

  gcloud storage buckets update --clear-soft-delete gs://*
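
To limit the change to specific buckets rather than every bucket in the project, you can pipe a list of bucket URLs to the command through the -I flag, which is the same pattern the analyzer script below prints in its instructions. A sketch, assuming a hypothetical buckets.txt file with one gs:// URL per line:

  cat buckets.txt | gcloud storage buckets update -I --clear-soft-delete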

Client libraries

Python

For more information, see the Cloud Storage Python API reference documentation.

To authenticate to Cloud Storage, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.


import argparse
import json
from typing import Dict, List
import google.cloud.monitoring_v3 as monitoring_client


def get_relative_cost(storage_class: str) -> float:
    """Retrieves the relative cost for a given storage class and location.

    Args:
        storage_class: The storage class (e.g., 'standard', 'nearline').

    Returns:
        The price per GB from the https://cloud.google.com/storage/pricing,
        divided by the standard storage class.
    """
    relative_cost = {
        "STANDARD": 0.023 / 0.023,
        "NEARLINE": 0.013 / 0.023,
        "COLDLINE": 0.007 / 0.023,
        "ARCHIVE": 0.0025 / 0.023,
    }

    return relative_cost.get(storage_class, 1.0)


def get_soft_delete_cost(
    project_name: str,
    soft_delete_window: float,
    agg_days: int,
    lookback_days: int,
) -> Dict[str, List[Dict[str, float]]]:
    """Calculates soft delete costs for buckets in a Google Cloud project.

    Args:
        project_name: The name of the Google Cloud project.
        soft_delete_window: The time window in seconds for considering
          soft-deleted objects (default is 7 days).
        agg_days: Aggregate results over this time period in days, defaults to 30.
        lookback_days: Look back up to this many days, defaults to 360.

    Returns:
        A dictionary with bucket names as keys and cost data for each bucket,
        broken down by storage class.
    """

    query_client = monitoring_client.QueryServiceClient()

    # Step 1: Get storage class ratios for each bucket.
    storage_ratios_by_bucket = get_storage_class_ratio(
        project_name, query_client, agg_days, lookback_days
    )

    # Step 2: Fetch soft-deleted bytes and calculate costs using Monitoring API.
    soft_deleted_costs = calculate_soft_delete_costs(
        project_name,
        query_client,
        soft_delete_window,
        storage_ratios_by_bucket,
        agg_days,
        lookback_days,
    )

    return soft_deleted_costs


def calculate_soft_delete_costs(
    project_name: str,
    query_client: monitoring_client.QueryServiceClient,
    soft_delete_window: float,
    storage_ratios_by_bucket: Dict[str, float],
    agg_days: int,
    lookback_days: int,
) -> Dict[str, List[Dict[str, float]]]:
    """Calculates the relative cost of enabling soft delete for each bucket in a
       project for certain time frame in secs.

    Args:
        project_name: The name of the Google Cloud project.
        query_client: A Monitoring API query client.
        soft_delete_window: The time window in seconds for considering
          soft-deleted objects (default is 7 days).
        storage_ratios_by_bucket: A dictionary of storage class ratios per bucket.
        agg_days: Aggregate results over this time period in days, defaults to 30.
        lookback_days: Look back up to this many days, defaults to 360.

    Returns:
        A dictionary with bucket names as keys and a list of cost data
        dictionaries
        for each bucket, broken down by storage class.
    """
    soft_deleted_bytes_time = query_client.query_time_series(
        monitoring_client.QueryTimeSeriesRequest(
            name=f"projects/{project_name}",
            query=f"""
                    {{  # Fetch 1: Soft-deleted (bytes seconds)
                        fetch gcs_bucket :: storage.googleapis.com/storage/v2/deleted_bytes
                        | value val(0) * {soft_delete_window}\'s\'  # Multiply by soft delete window
                        | group_by [resource.bucket_name, metric.storage_class], window(), .sum;

                        # Fetch 2: Total byte-seconds (active objects)
                        fetch gcs_bucket :: storage.googleapis.com/storage/v2/total_byte_seconds 
                        | filter metric.type != 'soft-deleted-object'
                        | group_by [resource.bucket_name, metric.storage_class], window(1d), .mean  # Daily average
                        | group_by [resource.bucket_name, metric.storage_class], window(), .sum  # Total over window

                    }}  # End query definition
                    | every {agg_days}d  # Aggregate over larger time intervals
                    | within {lookback_days}d  # Limit data range for analysis
                    | ratio  # Calculate ratio (soft-deleted (bytes seconds)/ total (bytes seconds))
                    """,
        )
    )

    buckets: Dict[str, List[Dict[str, float]]] = {}
    missing_distribution_storage_class = []
    for data_point in soft_deleted_bytes_time.time_series_data:
        bucket_name = data_point.label_values[0].string_value
        storage_class = data_point.label_values[1].string_value
        # To include location-based cost analysis:
        # 1. Uncomment the line below:
        # location = data_point.label_values[2].string_value
        # 2. Update how you calculate 'relative_storage_class_cost' to factor in location
        soft_delete_ratio = data_point.point_data[0].values[0].double_value
        distribution_storage_class = bucket_name + " - " + storage_class
        storage_class_ratio = storage_ratios_by_bucket.get(
            distribution_storage_class
        )
        if storage_class_ratio is None:
            missing_distribution_storage_class.append(
                distribution_storage_class)
        buckets.setdefault(bucket_name, []).append({
            # Include storage class and location data for additional plotting dimensions.
            # "storage_class": storage_class,
            # 'location': location,
            "soft_delete_ratio": soft_delete_ratio,
            "storage_class_ratio": storage_class_ratio,
            "relative_storage_class_cost": get_relative_cost(storage_class),
        })

    if missing_distribution_storage_class:
        print(
            "Missing storage class for following buckets:",
            missing_distribution_storage_class,
        )
        raise ValueError("Cannot proceed with missing storage class ratios.")

    return buckets


def get_storage_class_ratio(
    project_name: str,
    query_client: monitoring_client.QueryServiceClient,
    agg_days: int,
    lookback_days: int,
) -> Dict[str, float]:
    """Calculates storage class ratios for each bucket in a project.

    This information helps determine the relative cost contribution of each
    storage class to the overall soft-delete cost.

    Args:
        project_name: The Google Cloud project name.
        query_client: Google Cloud's Monitoring Client's QueryServiceClient.
        agg_days: Aggregate results over this time period in days, defaults to 30.
        lookback_days: Look back up to this many days, defaults to 360.

    Returns:
        Ratio of each storage class within a bucket, keyed by "bucket - storage class".
    """
    request = monitoring_client.QueryTimeSeriesRequest(
        name=f"projects/{project_name}",
        query=f"""
            {{
            # Fetch total byte-seconds for each bucket and storage class
            fetch gcs_bucket :: storage.googleapis.com/storage/v2/total_byte_seconds
            | group_by [resource.bucket_name, metric.storage_class], window(), .sum;
            # Fetch total byte-seconds for each bucket (regardless of class)
            fetch gcs_bucket :: storage.googleapis.com/storage/v2/total_byte_seconds
            | group_by [resource.bucket_name], window(), .sum
            }}
            | ratio  # Calculate ratios of storage class size to total size
            | every {agg_days}d
            | within {lookback_days}d
            """,
    )

    storage_class_ratio = query_client.query_time_series(request)

    storage_ratios_by_bucket = {}
    for time_series in storage_class_ratio.time_series_data:
        bucket_name = time_series.label_values[0].string_value
        storage_class = time_series.label_values[1].string_value
        ratio = time_series.point_data[0].values[0].double_value

        # Create a descriptive key for the dictionary
        key = f"{bucket_name} - {storage_class}"
        storage_ratios_by_bucket[key] = ratio

    return storage_ratios_by_bucket


def soft_delete_relative_cost_analyzer(
    project_name: str,
    cost_threshold: float = 0.0,
    soft_delete_window: float = 604800,
    agg_days: int = 30,
    lookback_days: int = 360,
    list_buckets: bool = False,
) -> str | Dict[str, float]:  # Note potential string output
    """Identifies buckets exceeding the relative cost threshold for enabling soft delete.

    Args:
        project_name: The Google Cloud project name.
        cost_threshold: Threshold above which to consider removing soft delete.
        soft_delete_window: Time window for calculating soft-delete costs (in
          seconds).
        agg_days: Aggregate results over this time period (in days).
        lookback_days: Look back up to this many days.
        list_buckets: Return a list of bucket names (True) or JSON (False,
          default).

    Returns:
        JSON formatted results of buckets exceeding the threshold and costs
        *or* a space-separated string of bucket names.
    """

    buckets: Dict[str, float] = {}
    for bucket_name, storage_sources in get_soft_delete_cost(
        project_name, soft_delete_window, agg_days, lookback_days
    ).items():
        bucket_cost = 0.0
        for storage_source in storage_sources:
            bucket_cost += (
                storage_source["soft_delete_ratio"]
                * storage_source["storage_class_ratio"]
                * storage_source["relative_storage_class_cost"]
            )
        if bucket_cost > cost_threshold:
            buckets[bucket_name] = round(bucket_cost, 4)

    if list_buckets:
        return " ".join(buckets.keys())  # Space-separated bucket names
    else:
        return json.dumps(buckets, indent=2)  # JSON output


def soft_delete_relative_cost_analyzer_main() -> None:
    # Sample run: python storage_soft_delete_relative_cost_analyzer.py <Project Name>
    parser = argparse.ArgumentParser(
        description="Analyze and manage Google Cloud Storage soft-delete costs."
    )
    parser.add_argument(
        "project_name", help="The name of the Google Cloud project to analyze."
    )
    parser.add_argument(
        "--cost_threshold",
        type=float,
        default=0.0,
        help="Relative Cost threshold.",
    )
    parser.add_argument(
        "--soft_delete_window",
        type=float,
        default=604800.0,
        help="Time window (in seconds) for considering soft-deleted objects.",
    )
    parser.add_argument(
        "--agg_days",
        type=int,
        default=30,
        help=(
            "Time window (in days) for aggregating results over a time period,"
            " defaults to 30-day period"
        ),
    )
    parser.add_argument(
        "--lookback_days",
        type=int,
        default=360,
        help=(
            "Time window (in days) for considering the how old the bucket to be."
        ),
    )
    parser.add_argument(
        "--list",
        action="store_true",
        default=False,
        help="Return the list of bucket names separated by spaces.",
    )

    args = parser.parse_args()

    response = soft_delete_relative_cost_analyzer(
        args.project_name,
        args.cost_threshold,
        args.soft_delete_window,
        args.agg_days,
        args.lookback_days,
        args.list,
    )
    if not args.list:
        print(
            "To remove soft-delete policy from the listed buckets run:\n"
            # Capture output
            "python storage_soft_delete_relative_cost_analyzer.py"
            " [your-project-name] --[OTHER_OPTIONS] --list > list_of_buckets.txt \n"
            "cat list_of_buckets.txt | gcloud storage buckets update -I "
            "--clear-soft-delete",
            response,
        )
        return
    print(response)


if __name__ == "__main__":
    soft_delete_relative_cost_analyzer_main()

What's next