Get an estimate of monthly storage costs

Advanced website indexing incurs monthly data storage charges based on the size of the web data that you import into your data store. To estimate the size of your web data before you import it, call the estimateDataSize method and specify the web pages that you want to import. The estimateDataSize method starts a long-running operation that runs until the estimate is complete. This can take from a few minutes to over an hour, depending on the number of web pages that you specify. After you have an estimate of the size of your web data, you can estimate your monthly data storage costs with the Vertex AI Search and Conversation pricing page (see the Data Index pricing section) or Google Cloud's pricing calculator (search for Vertex AI Search and Conversation).
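estimateDataSize returns a long-running operation rather than the estimate itself, so a client has to poll until the operation reports "done": true. The following is a minimal Python sketch of that start-then-poll flow using only the standard library. It mirrors the curl commands in the procedure below, including the v1alpha and v1 endpoints they use. It assumes the gcloud CLI is installed and authenticated. The helper names (access_token, call, estimate_data_size) are hypothetical. The 60-second poll interval and 3-hour timeout are arbitrary choices, not documented limits. The send and sleep parameters exist only so that the loop can be exercised without network access.

```python
import json
import subprocess
import time
import urllib.request

API = "https://discoveryengine.googleapis.com"


def access_token():
    """Get a token the same way the curl examples do (requires the gcloud CLI)."""
    return subprocess.run(
        ["gcloud", "auth", "application-default", "print-access-token"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()


def call(method, url, body=None):
    """Send an authenticated JSON request and return the decoded response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(url, data=data, method=method, headers={
        "Authorization": f"Bearer {access_token()}",
        "Content-Type": "application/json",
    })
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def estimate_data_size(project_id, body, send=call, sleep=time.sleep,
                       interval_s=60, timeout_s=3 * 60 * 60):
    """Start estimateDataSize, then poll operations.get until "done": true."""
    url = (f"{API}/v1alpha/projects/{project_id}"
           "/locations/global:estimateDataSize")
    operation = send("POST", url, body)
    name = operation["name"]  # name of the long-running operation
    deadline = time.monotonic() + timeout_s
    while not operation.get("done"):
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{name} still running after {timeout_s} s")
        sleep(interval_s)  # wait before asking again
        operation = send("GET", f"{API}/v1/{name}")
    return operation
```

Keeping the HTTP call behind a send parameter separates the polling logic from authentication and transport. This makes it straightforward to swap in a different HTTP client.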

Before you begin

Determine the URL patterns for the websites that you intend to include (and optionally exclude) when you import web data into your data store. You specify these URL patterns when you call the estimateDataSize method.
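If you have many URL patterns, you might build the request body for step 1 programmatically. The following Python sketch does this. It assumes that estimator_uri_patterns accepts a JSON array of pattern objects, since it is a repeated field. It reuses the field names from the curl request in step 1. The function name build_request_body is hypothetical. For simplicity, it applies one exact_match value to every pattern.

```python
import json


def build_request_body(include, exclude=(), exact_match=False):
    """Build an estimateDataSize request body from lists of URL patterns.

    include: patterns to estimate, for example "www.mysite.com/faq".
    exclude: patterns to leave out; these entries get "exclusive": true.
    exact_match: applied to every pattern here for simplicity.
    """
    patterns = [
        {"provided_uri_pattern": uri, "exact_match": exact_match}
        for uri in include
    ]
    patterns += [
        {"provided_uri_pattern": uri, "exact_match": exact_match, "exclusive": True}
        for uri in exclude
    ]
    return {"website_data_source": {"estimator_uri_patterns": patterns}}


body = build_request_body(["*.mysite.com"], exclude=["www.mysite.com/faq"])
print(json.dumps(body, indent=2))
```

Include patterns omit "exclusive" because its default is false.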

Procedure

To get an estimate of the size of your web data, follow these steps:

  1. Call the estimateDataSize method.

    # Remove the second pattern object if you don't exclude any web pages.
    curl -X POST \
    -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
    -H "Content-Type: application/json" \
    "https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global:estimateDataSize" \
    -d '{
      "website_data_source": {
        "estimator_uri_patterns": [
          {
            "provided_uri_pattern": "URI_PATTERN_TO_INCLUDE",
            "exact_match": EXACT_MATCH_BOOLEAN
          },
          {
            "provided_uri_pattern": "URI_PATTERN_TO_EXCLUDE",
            "exact_match": EXACT_MATCH_BOOLEAN,
            "exclusive": EXCLUSIVE_BOOLEAN
          }
        ]
      }
    }'
    

    Replace the following:

    • PROJECT_ID: The ID of your project.

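    • URI_PATTERN_TO_INCLUDE: A URL pattern for the web pages that you want to include in your data size estimate.

    • URI_PATTERN_TO_EXCLUDE: (Optional) A URL pattern for the web pages that you want to exclude from your data size estimate. If you specify one, set EXCLUSIVE_BOOLEAN to true for that pattern. Otherwise, remove the second pattern object from the request.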

      For URI_PATTERN_TO_INCLUDE and URI_PATTERN_TO_EXCLUDE, you can use patterns similar to the following:

      • Entire website: www.mysite.com
      • Parts of a website: www.mysite.com/faq
      • Entire domain: mysite.com or *.mysite.com

    • EXCLUSIVE_BOOLEAN: (Optional) If true, then the provided URI pattern represents web pages that are excluded from your data size estimate. The default is false, which means that the provided URI pattern represents web pages that are included in your data size estimate.

    • EXACT_MATCH_BOOLEAN: (Optional) If true, then the provided URI pattern represents a single web page, instead of the web page and all of its children. The default is false, which means that the provided URI pattern represents the web page and all of its children.

    The output is similar to the following:

    {
      "name": "projects/PROJECT_ID/locations/global/operations/estimate-data-size-01234567890123456789",
      "metadata": {
        "@type": "type.googleapis.com/google.cloud.discoveryengine.v1alpha.EstimateDataSizeMetadata"
      }
    }
    

    This output includes the name field, which is the name of the long-running operation. Save the name value to use in the following step.

  2. Poll the operations.get method.

    curl -X GET \
    -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
    "https://discoveryengine.googleapis.com/v1/OPERATION_NAME"
    

    Replace OPERATION_NAME with the name value that you saved in the previous step. You can also get the operation name by listing long-running operations.

  3. Evaluate each response.

    • If a response does not contain "done": true, then the process for estimating the data size is not complete. Continue polling.

      The output is similar to the following:

      {
        "name": "projects/PROJECT_ID/locations/global/operations/estimate-data-size-01234567890123456789",
        "metadata": {
          "@type": "type.googleapis.com/google.cloud.discoveryengine.v1alpha.EstimateDataSizeMetadata"
        }
      }
      
    • If a response contains "done": true, then the process for estimating the data size is complete. Save the DATA_SIZE_BYTES value from the response to use in the following step.

      The output is similar to the following:

      {
        "name": "projects/PROJECT_ID/locations/global/operations/estimate-data-size-01234567890123456789",
        "metadata": {
          "@type": "type.googleapis.com/google.cloud.discoveryengine.v1alpha.EstimateDataSizeMetadata",
          "createTime": "2023-12-08T19:54:06.911248Z"
        },
        "done": true,
        "response": {
          "@type": "type.googleapis.com/google.cloud.discoveryengine.v1alpha.EstimateDataSizeResponse",
          "dataSizeBytes": DATA_SIZE_BYTES,
          "documentCount": DOCUMENT_COUNT
        }
      }
      

      This output includes the following values:

      • DATA_SIZE_BYTES: The estimated size of your web data, in bytes.

      • DOCUMENT_COUNT: The estimated number of web pages in your web data.

  4. Divide the DATA_SIZE_BYTES value from the previous step by 1,000,000,000 to get gigabytes. Save this value for the following step.

  5. To get an estimate for your monthly data storage costs:

    1. Go to Google Cloud's pricing calculator.

    2. Click Add to estimate.

    3. Search for Vertex AI Search and Conversation and then click the Vertex AI Search and Conversation box.

    4. In the Data Index box, enter the estimated size of your web data, in gigabytes, from the previous step.

      See the Estimated cost box for your estimated data storage cost.
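You can also script steps 3 and 4, reading the estimate from a finished operation and converting it to gigabytes. The following Python sketch does this under a few assumptions. The helper names are hypothetical, and the sample operation is made-up data shaped like the step 3 output. A finished long-running operation carries either a response or an error field, so the sketch checks for both. dataSizeBytes and documentCount may arrive as JSON strings, because the proto3 JSON mapping encodes 64-bit integers as strings, so the values are converted explicitly.

```python
def read_estimate(operation):
    """Return (data_size_bytes, document_count) from a finished operation."""
    if not operation.get("done"):
        raise ValueError("the estimate is not complete yet; keep polling")
    if "error" in operation:
        raise RuntimeError(f"the estimate failed: {operation['error']}")
    result = operation["response"]
    # 64-bit integers may be serialized as strings, so convert explicitly.
    # Fields with a zero value may be omitted, hence the 0 defaults.
    return int(result.get("dataSizeBytes", 0)), int(result.get("documentCount", 0))


def bytes_to_gb(data_size_bytes):
    """Step 4: convert bytes to gigabytes (1 GB = 1,000,000,000 bytes)."""
    return data_size_bytes / 1_000_000_000


# Hypothetical finished operation, shaped like the output in step 3.
finished = {
    "done": True,
    "response": {"dataSizeBytes": "2500000000", "documentCount": "1200"},
}
size_bytes, pages = read_estimate(finished)
print(f"{pages} pages, {bytes_to_gb(size_bytes):.2f} GB")  # 1200 pages, 2.50 GB
```

You would enter the resulting gigabyte figure in the Data Index box of the pricing calculator.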