Creating a URL List

This page explains how to create a URL list and how to test your process for generating MD5 hashes. Storage Transfer Service can transfer data from a list of public data locations to a Google Cloud Storage bucket. After you create the list, you must host it at a URL that begins with http:// or https://. When you configure the transfer, you refer to the URL of the list.

Creating the URL list

Use the following process to create a URL list:

  1. Create a tab-separated values (TSV) file.

    The file lists the URLs of the objects that you want to transfer.

  2. Set the first line to the format specifier, TsvHttpData-1.0.

  3. Add one line for each object to transfer. Include the following tab-separated fields, in order, on each line:

    1. The HTTP or HTTPS URL of a source object. Ensure that each URL you specify is publicly accessible. For example, in Cloud Storage, you can share an object publicly and get a link to it. Also, ensure that the server's robots.txt file allows access to the URL.

      Check that the server that hosts each object supports Range requests and returns a Content-Length header in each response.

    2. The size of the object in bytes. Make sure that the specified size matches the actual size of the object when it is fetched. If the size does not match, the object will not be transferred.

    3. The Base64-encoded MD5 hash of the object. Make sure that the specified MD5 matches the MD5 computed from the transferred bytes. If the MD5 does not match, the object transfer will fail. For more information, see Generating MD5 hashes.

  4. Ensure that when your web server returns the URL list, it sets a strong ETag header in the HTTP response.
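
The header requirements in the steps above can be spot-checked before you start a transfer. The following Python sketch is one possible way to do that and is not part of Storage Transfer Service. The function names are hypothetical. It assumes that a server advertises Range support through an Accept-Ranges: bytes header (a server can support Range requests without sending it), and it treats an ETag as strong when it is a quoted string without the W/ prefix.

```python
from urllib.request import Request, urlopen


def fetch_headers(url, timeout=10):
    """Send a HEAD request and return the response headers as a dict.

    Assumption: the server answers HEAD with the same headers as GET.
    """
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return dict(response.headers.items())


def _normalize(headers):
    # HTTP header names are case-insensitive, so compare in lowercase.
    return {name.lower(): value.strip() for name, value in headers.items()}


def check_object_headers(headers):
    """Return a list of problems for a source object's response headers."""
    h = _normalize(headers)
    problems = []
    if not h.get("content-length", "").isdigit():
        problems.append("missing or non-numeric Content-Length")
    # Assumption: Range support is inferred from Accept-Ranges only.
    if h.get("accept-ranges", "").lower() != "bytes":
        problems.append("server does not advertise Accept-Ranges: bytes")
    return problems


def has_strong_etag(headers):
    """True if the ETag is a quoted string without the weak 'W/' prefix."""
    etag = _normalize(headers).get("etag", "")
    return len(etag) >= 2 and etag.startswith('"') and etag.endswith('"')
```

To use it, pass the result of fetch_headers for each object URL to check_object_headers, and pass the result of fetch_headers for the URL list itself to has_strong_etag. An empty problem list only means that the advertised headers look right; it does not prove that the transfer will succeed.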

The following example shows a TSV file that identifies two objects to transfer. The fields on each line are separated by tab characters:

TsvHttpData-1.0
https://example.com/buckets/obj1      1357      wHENa08V36iPYAsOa2JAdw==
https://example.com/buckets/obj2      2468      R9acAaveoPd2y8nniLUYbw==
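
Before you host a URL list, it can help to check that the file follows the format described above. The following Python sketch is a minimal local check under stated assumptions, not the validation that Storage Transfer Service performs: validate_url_list is a hypothetical helper, it skips blank lines (the service may treat them differently), and it only checks syntax, so it cannot tell whether a size or hash matches the real object.

```python
import base64
import binascii
from urllib.parse import urlparse

FORMAT_SPECIFIER = "TsvHttpData-1.0"


def validate_url_list(text):
    """Return (line_number, message) pairs; an empty list means no problems found."""
    problems = []
    lines = text.splitlines()
    if not lines or lines[0] != FORMAT_SPECIFIER:
        problems.append((1, "first line must be " + FORMAT_SPECIFIER))
    for number, line in enumerate(lines[1:], start=2):
        if not line:
            continue  # Assumption: blank lines are ignored in this sketch.
        fields = line.split("\t")
        if len(fields) != 3:
            problems.append((number, "expected 3 tab-separated fields"))
            continue
        url, size, md5 = fields
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append((number, "URL must be an absolute http or https URL"))
        if not (size.isascii() and size.isdigit()):
            problems.append((number, "size must be a non-negative integer"))
        try:
            digest = base64.b64decode(md5, validate=True)
        except (binascii.Error, ValueError):
            digest = b""
        # An MD5 digest is always 16 bytes long.
        if len(digest) != 16:
            problems.append((number, "MD5 must be the Base64 form of a 16-byte digest"))
    return problems
```

A common mistake this catches is a file whose fields are separated by spaces instead of tabs: such a line splits into one field and is reported with its line number.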

Generating MD5 hashes

As noted above, your URL list must provide an MD5 hash for each object that is being transferred.

Use the following public object to verify that you are generating MD5 hashes correctly:

https://storage.googleapis.com/md5-test/md5-test

This object has a Base64-encoded MD5 hash of BfnRTwvHpofMOn2Pq7EVyQ==.

Copy the object to a local file called md5-test, then compute its hash with OpenSSL. The output should match the value above:

openssl md5 -binary md5-test | openssl enc -base64
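
If you generate URL lists from a script, the same hash can be computed without OpenSSL. The following Python sketch mirrors the command above using only the standard library; base64_md5 and url_list_line are hypothetical helper names, and the sketch assumes that you have a local copy of each object whose bytes are identical to what the URL serves.

```python
import base64
import hashlib
import os


def base64_md5(path, chunk_size=1024 * 1024):
    """Return the Base64-encoded MD5 digest of the file at path."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large objects do not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    # Encode the raw 16-byte digest, not its hexadecimal text form.
    return base64.b64encode(md5.digest()).decode("ascii")


def url_list_line(url, path):
    """Return one URL-list line: URL, size in bytes, Base64 MD5, tab-separated."""
    size = os.path.getsize(path)
    return "\t".join([url, str(size), base64_md5(path)])
```

To test the process, call base64_md5("md5-test") on the local copy of the test object and compare the result with the hash given above. Encoding the hexadecimal digest string instead of the raw digest bytes is a frequent source of mismatches.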
