Creating a URL list

Storage Transfer Service can transfer data from a list of public data locations to a Cloud Storage bucket. This page explains how to create that URL list and how to test your process for generating MD5 checksums. When you configure a transfer, you refer to the URL list.

Requirements

The following are requirements of URL lists:

  • The URL list must be a tab-separated values (TSV) file.

  • URLs must be sorted in UTF-8 lexicographical order.

  • The server hosting the URL list sets a strong ETag header in the HTTP response when it returns the URL list.

  • The URL list is accessible from a URL beginning with either http or https.
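
The last two requirements can be checked before you configure a transfer. The following is a minimal sketch, not part of Storage Transfer Service: the function names are hypothetical, and it assumes you have already read the ETag header value from the response that returns your URL list. It treats any non-empty value without the weak-validator prefix W/ as strong.

```python
from urllib.parse import urlsplit


def is_strong_etag(etag):
    """Return True if the ETag header value is present and not marked weak."""
    if etag is None:
        return False
    value = etag.strip()
    # Weak validators carry a case-sensitive "W/" prefix; strong ones do not.
    return bool(value) and not value.startswith("W/")


def has_supported_scheme(url):
    """Return True if the URL list location uses http or https."""
    # urlsplit lowercases the scheme, so "HTTPS://..." is also accepted.
    return urlsplit(url).scheme in ("http", "https")
```

For example, is_strong_etag('W/"33a64df5"') is False, while is_strong_etag('"33a64df5"') is True.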

To ensure your data is transferable, verify the following:

  • Each URL you specify is publicly accessible.

    For example, in Cloud Storage, you can share an object publicly and get a link to it.

  • The server's robots.txt file allows access to each URL.

  • The server hosting each object:

    • Supports Range requests.
    • Returns a Content-Length header in each response.
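
One way to check the last two items is to request a single byte of an object and inspect the response. The sketch below is an illustration under stated assumptions: the function names are hypothetical, it treats a 206 Partial Content status as evidence of Range support, and it does not check robots.txt. The probe function needs network access and raises an error for 4xx and 5xx responses.

```python
import urllib.request


def evaluate_probe(status, headers):
    """Return a list of problems found in the response to a one-byte Range request."""
    # Header names are case-insensitive, so compare them in lowercase.
    lowered = {name.lower(): value for name, value in headers.items()}
    problems = []
    # 206 Partial Content means the server honored the Range header.
    if status != 206:
        problems.append(f"expected 206 Partial Content, got {status}")
    if "content-length" not in lowered:
        problems.append("missing Content-Length header")
    return problems


def probe(url, timeout=10):
    """Request the first byte of url and evaluate the response (needs network access)."""
    request = urllib.request.Request(url, headers={"Range": "bytes=0-0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return evaluate_probe(response.status, dict(response.headers.items()))
```

An empty list from probe suggests that the server meets both conditions for that URL.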

Formatting the URL list

Do the following to format a URL list:

  1. Create a tab-separated values (TSV) file.

  2. Insert the format specifier TsvHttpData-1.0 on the first line.

  3. Add one line for each object to transfer. Include the following tab-separated fields, in order, on each line:

    • The HTTP or HTTPS URL of a source object.

      When an object located at http(s)://[HOSTNAME]:[PORT]/[URL_PATH] is transferred to Cloud Storage, the name of the object in Cloud Storage is [HOSTNAME]/[URL_PATH].

    • The size of the object in bytes.

      Ensure that the specified size matches the actual size of the object when it is fetched. If the size of the object received by Cloud Storage does not match the size specified, the object transfer will fail.

    • The Base64-encoded MD5 checksum of the object.

      Ensure that the specified MD5 checksum matches the MD5 checksum computed from the transferred bytes. If the MD5 checksum of the object received by Cloud Storage does not match the MD5 checksum specified, the object transfer will fail.

      See Generating MD5 checksums for information on generating MD5 checksums.

    The following is a sample TSV file that specifies two objects to transfer:

    TsvHttpData-1.0
    https://example.com/buckets/obj1      1357      wHENa08V36iPYAsOa2JAdw==
    https://example.com/buckets/obj2      2468      R9acAaveoPd2y8nniLUYbw==
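
The formatting steps above can also be scripted. The following is a minimal sketch with hypothetical function names. It assumes you already know each object's size and Base64-encoded MD5 checksum, and it ends the file with a trailing newline, which is an assumption rather than a documented requirement. The second function illustrates the [HOSTNAME]/[URL_PATH] naming rule and assumes URLs without query strings.

```python
from urllib.parse import urlsplit

FORMAT_SPECIFIER = "TsvHttpData-1.0"


def build_url_list(entries):
    """Return URL list text for (url, size_in_bytes, base64_md5) tuples."""
    # Sorting by the UTF-8 encoding of each URL gives UTF-8 lexicographical order.
    ordered = sorted(entries, key=lambda entry: entry[0].encode("utf-8"))
    lines = [FORMAT_SPECIFIER]
    for url, size, md5_b64 in ordered:
        # Fields are separated by a single tab character.
        lines.append(f"{url}\t{size}\t{md5_b64}")
    return "\n".join(lines) + "\n"


def destination_object_name(url):
    """Return [HOSTNAME]/[URL_PATH] for a source URL, dropping scheme and port."""
    parts = urlsplit(url)
    # Note: urlsplit returns the hostname in lowercase.
    return f"{parts.hostname}{parts.path}"
```

Passing the two sample entries in any order yields the sample file shown above, with a tab between fields, and destination_object_name("https://example.com:8080/buckets/obj1") returns example.com/buckets/obj1.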
    

Generating MD5 checksums

Cloud Storage uses the MD5 checksum you provide for each object to verify your data's integrity.

Use the following public object to verify that you are generating MD5 checksums correctly:

https://storage.googleapis.com/md5-test/md5-test

This object has a Base64-encoded MD5 checksum of BfnRTwvHpofMOn2Pq7EVyQ==.

Copy the object to a local file called md5-test, and verify the checksum using OpenSSL:

openssl md5 -binary md5-test | openssl enc -base64
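
If you generate URL lists from a script, the same value can be computed without OpenSSL. The sketch below is one possible approach using Python's standard library. The function names are hypothetical. It returns both the size in bytes and the Base64-encoded MD5 checksum, which are the second and third fields of a URL list line.

```python
import base64
import hashlib


def size_and_md5_of_stream(stream, chunk_size=1024 * 1024):
    """Return (size_in_bytes, base64_md5) for a binary file-like object."""
    digest = hashlib.md5()
    size = 0
    # Read in chunks so that large objects do not need to fit in memory.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
        size += len(chunk)
    # Encode the raw 16-byte digest, not its hexadecimal form.
    return size, base64.b64encode(digest.digest()).decode("ascii")


def size_and_md5(path):
    """Return (size_in_bytes, base64_md5) for a local file."""
    with open(path, "rb") as stream:
        return size_and_md5_of_stream(stream)
```

Calling size_and_md5("md5-test") on the downloaded test object should return the checksum listed above as its second element.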

What's next